| Safe Haskell | None |
|---|---|
| Language | GHC2021 |
Ecluse.Telemetry.Resolve
Description
Telemetry configuration resolution and export-failure routing — the boot-time substrate that sits between the operator's environment and the OpenTelemetry SDK.
Écluse's maintainer runs Datadog, but the project is vendor-neutral, so an operator
may describe the same telemetry identity in either dialect: a Datadog shop sets the
DD_* variables, a plain OpenTelemetry shop sets the OTEL_* ones. This module is
the self-aligning resolver that collapses both into one answer, so logs and
traces share a single identity whichever dialect was provided.
The resolver
resolveTelemetry is a bounded precedence table over exactly four fields
(service.name, deployment.environment, service.version, and the OTLP export
endpoint), each resolved in the order Datadog value, then vanilla OpenTelemetry value, then default.
It is deliberately not a general per-variable merge: only these four cross between
the dialects, and only their fixed precedence is encoded. The DD_API_KEY /
DD_SITE agentless-SaaS credentials are never read — Écluse exports to an
operator-declared, node-local collector/Agent, never directly to a vendor's
cloud, so there is no path by which a key in the environment turns into off-cluster
egress. The endpoint itself is a declared destination (like the mirror queue), not an
attack surface, so it is normalised and used as given, not classified or gated.
The resolved ResolvedTelemetry is the single source of truth for both halves
of the telemetry stack: otelEnvironmentOverrides projects it back to the canonical
OTEL_* variables the env-driven SDK reads (so a DD_*-only deployment still
configures the exporter), and the same record feeds the dd log object that stitches
a log line to its trace.
Export-failure routing
Telemetry failures must stay off the request path and out of raw stderr. The SDK's
batch exporter runs asynchronously, so an unreachable collector never touches a served
request. This module owns the shared throttle those failures coalesce through: an
ExportFailureSink carries one throttle plus a katip target, and routeExportFailure
surfaces the first failure plainly, then a periodic heartbeat carrying the suppressed
count, so a persistently unreachable endpoint is one visible warning and a heartbeat,
not a per-flush flood. The exporter wrappers (Ecluse.Telemetry) feed the sink through
observeExportResult; installExportErrorHandler routes the SDK's own diagnostic
stream through the same sink.
The configuration model and the export-failure mechanism are described in
docs/architecture/observability.md.
Synopsis
- data ResolvedTelemetry = ResolvedTelemetry {}
- data TelemetryEndpoint = TelemetryEndpoint { teUrl :: Text, teSource :: EndpointSource }
- data EndpointSource
- resolveTelemetry :: [(String, String)] -> ResolvedTelemetry
- otelEnvironmentOverrides :: [(String, String)] -> [(String, String)]
- data ThrottleState = ThrottleState {}
- data ThrottleEmit
- initialThrottle :: ThrottleState
- throttleInterval :: NominalDiffTime
- throttleStep :: NominalDiffTime -> UTCTime -> ThrottleState -> (ThrottleState, ThrottleEmit)
- data ExportFailureSink
- newExportFailureSink :: IO UTCTime -> (Severity -> Text -> IO ()) -> IO ExportFailureSink
- exportFailureSink :: LogEnv -> IO ExportFailureSink
- routeExportFailure :: ExportFailureSink -> Text -> IO ()
- observeExportResult :: ExportFailureSink -> Text -> ExportResult -> IO ()
- installExportErrorHandler :: ExportFailureSink -> IO ()
- prepareTelemetry :: LogEnv -> [(String, String)] -> IO ()
The resolved telemetry identity
data ResolvedTelemetry
The telemetry identity resolved from the environment: the single source of
truth for both the SDK configuration and the dd log object. rtEnvironment and
rtVersion are Nothing when the operator named neither dialect's form — they are
genuinely optional resource attributes, not defaulted to a placeholder.
Constructors
| ResolvedTelemetry | |
Fields
| |
Instances
| Instance | Defined in | Methods |
|---|---|---|
| Show ResolvedTelemetry | Ecluse.Telemetry.Resolve | showsPrec :: Int -> ResolvedTelemetry -> ShowS; show :: ResolvedTelemetry -> String; showList :: [ResolvedTelemetry] -> ShowS |
| Eq ResolvedTelemetry | Ecluse.Telemetry.Resolve | (==) :: ResolvedTelemetry -> ResolvedTelemetry -> Bool; (/=) :: ResolvedTelemetry -> ResolvedTelemetry -> Bool |
data TelemetryEndpoint
A resolved OTLP export endpoint and the source it was resolved from.
Constructors
| TelemetryEndpoint | |
Fields
| teUrl :: Text | |
| teSource :: EndpointSource | |
Instances
| Instance | Defined in | Methods |
|---|---|---|
| Show TelemetryEndpoint | Ecluse.Telemetry.Resolve | showsPrec :: Int -> TelemetryEndpoint -> ShowS; show :: TelemetryEndpoint -> String; showList :: [TelemetryEndpoint] -> ShowS |
| Eq TelemetryEndpoint | Ecluse.Telemetry.Resolve | (==) :: TelemetryEndpoint -> TelemetryEndpoint -> Bool; (/=) :: TelemetryEndpoint -> TelemetryEndpoint -> Bool |
data EndpointSource
Where a resolved OTLP endpoint came from, so the boot path can distinguish a deliberately-configured target from the silent default and warn on the latter.
Constructors
| FromDdAgentHost | Derived from |
| FromOtelEndpoint | Taken verbatim from |
| DefaultedEndpoint | No endpoint was configured; the |
Instances
| Instance | Defined in | Methods |
|---|---|---|
| Show EndpointSource | Ecluse.Telemetry.Resolve | showsPrec :: Int -> EndpointSource -> ShowS; show :: EndpointSource -> String; showList :: [EndpointSource] -> ShowS |
| Eq EndpointSource | Ecluse.Telemetry.Resolve | (==) :: EndpointSource -> EndpointSource -> Bool; (/=) :: EndpointSource -> EndpointSource -> Bool |
resolveTelemetry :: [(String, String)] -> ResolvedTelemetry
Resolve the telemetry identity from an environment list, each field
Datadog-value-wins → vanilla OpenTelemetry → default. service.name falls
DD_SERVICE → OTEL_SERVICE_NAME → service.name in OTEL_RESOURCE_ATTRIBUTES →
ecluse; deployment.environment and service.version fall DD_ENV/DD_VERSION
→ the matching OTEL_RESOURCE_ATTRIBUTES key → unset; the endpoint is DD_AGENT_HOST
(as http://{host}:4318) → OTEL_EXPORTER_OTLP_ENDPOINT → http://localhost:4318.
A value present but blank is treated as unset, so an empty DD_ENV= does not stamp an
empty environment onto every signal. DD_API_KEY and DD_SITE are never consulted.
>>> rtServiceName (resolveTelemetry [("DD_SERVICE", "api"), ("OTEL_SERVICE_NAME", "ignored")])
"api"
>>> teUrl (rtEndpoint (resolveTelemetry []))
"http://localhost:4318"
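The precedence described above can be sketched with nothing but base. This is an illustration under stated assumptions, not the module's implementation: the names lookupNonBlank, attrPairs, resourceAttr, sketchServiceName and sketchEnvironment are hypothetical, values are modelled as String, and OTEL_RESOURCE_ATTRIBUTES is assumed to be a plain comma-separated k=v list with no escaping.

```haskell
import Data.Char (isSpace)
import Data.Foldable (asum)
import Data.List (dropWhileEnd)
import Data.Maybe (fromMaybe)

type Env = [(String, String)]

-- | Look a variable up, treating a present-but-blank value as unset.
lookupNonBlank :: String -> Env -> Maybe String
lookupNonBlank key env =
  case lookup key env of
    Just v | not (all isSpace v) -> Just v
    _ -> Nothing

-- | Split a string on a separator character (assumption: no escaping).
splitOn :: Char -> String -> [String]
splitOn sep s = case break (== sep) s of
  (chunk, []) -> [chunk]
  (chunk, _ : rest) -> chunk : splitOn sep rest

-- | Parse a comma-separated k=v list, skipping entries without '='.
attrPairs :: String -> Env
attrPairs raw =
  [ (trim k, trim v)
  | entry <- splitOn ',' raw
  , (k, '=' : v) <- [break (== '=') entry]
  ]
  where
    trim = dropWhileEnd isSpace . dropWhile isSpace

-- | Read one key out of OTEL_RESOURCE_ATTRIBUTES, if present and non-blank.
resourceAttr :: String -> Env -> Maybe String
resourceAttr key env =
  lookupNonBlank "OTEL_RESOURCE_ATTRIBUTES" env >>= lookupNonBlank key . attrPairs

-- | service.name: Datadog, then OTEL_SERVICE_NAME, then the resource
-- attribute, then the "ecluse" default.
sketchServiceName :: Env -> String
sketchServiceName env =
  fromMaybe "ecluse" $
    asum
      [ lookupNonBlank "DD_SERVICE" env
      , lookupNonBlank "OTEL_SERVICE_NAME" env
      , resourceAttr "service.name" env
      ]

-- | deployment.environment: Datadog, then the resource attribute, else unset.
sketchEnvironment :: Env -> Maybe String
sketchEnvironment env =
  asum
    [ lookupNonBlank "DD_ENV" env
    , resourceAttr "deployment.environment" env
    ]
```

Because every candidate goes through lookupNonBlank, an empty DD_ENV= falls through to the next source instead of winning with an empty value.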
Canonical OTEL_* projection
otelEnvironmentOverrides :: [(String, String)] -> [(String, String)]
Project the resolved identity back to the canonical OTEL_* variables the
env-driven SDK reads, so a DD_*-only deployment still configures the exporter. The
overrides set OTEL_SERVICE_NAME, the OTLP endpoint, the http/protobuf protocol
(the only transport built — gRPC is behind a disabled cabal flag), and an
OTEL_RESOURCE_ATTRIBUTES whose service.name/deployment.environment/
service.version keys are overlaid by the resolution while any other operator-set
attributes are preserved.
Applied with setEnv before the SDK initialises (see
prepareTelemetry); idempotent for a vanilla deployment that already set the same
OTEL_* values.
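The overlay of OTEL_RESOURCE_ATTRIBUTES described above might look like the following minimal sketch. The helpers parseAttrs, renderAttrs and overlayAttrs are hypothetical names, not part of this module, and the sketch assumes a plain comma-separated k=v encoding without escaping; the real projection may order or normalise the keys differently.

```haskell
import Data.List (intercalate)

-- | Split a string on a separator character (assumption: no escaping).
splitOn :: Char -> String -> [String]
splitOn sep s = case break (== sep) s of
  (chunk, []) -> [chunk]
  (chunk, _ : rest) -> chunk : splitOn sep rest

-- | Parse a comma-separated k=v list, dropping malformed entries.
parseAttrs :: String -> [(String, String)]
parseAttrs raw =
  [ (k, v)
  | entry <- splitOn ',' raw
  , (k, '=' : v) <- [break (== '=') entry]
  , not (null k)
  ]

-- | Render attributes back to the k=v,k=v form.
renderAttrs :: [(String, String)] -> String
renderAttrs = intercalate "," . map (\(k, v) -> k ++ "=" ++ v)

-- | Resolved keys replace any operator-set value for the same key;
-- every other operator-set attribute is preserved.
overlayAttrs :: [(String, String)] -> [(String, String)] -> [(String, String)]
overlayAttrs resolved existing =
  resolved ++ [kv | kv@(k, _) <- existing, k `notElem` map fst resolved]
```

Applying the overlay twice with the same resolved keys yields the same attribute list, which is the property that keeps the projection idempotent for a deployment that already set the canonical values.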
Export-failure throttle (pure core)
data ThrottleState
The throttle state for SDK export-error routing: when an error was last logged, and how many have been suppressed since. Exposed so the throttle decision is unit-tested without wall-clock timing.
Constructors
| ThrottleState | |
Fields
| |
Instances
| Instance | Defined in | Methods |
|---|---|---|
| Show ThrottleState | Ecluse.Telemetry.Resolve | showsPrec :: Int -> ThrottleState -> ShowS; show :: ThrottleState -> String; showList :: [ThrottleState] -> ShowS |
| Eq ThrottleState | Ecluse.Telemetry.Resolve | (==) :: ThrottleState -> ThrottleState -> Bool; (/=) :: ThrottleState -> ThrottleState -> Bool |
data ThrottleEmit
What throttleStep decided to do with an export error.
Constructors
| Constructor | Description |
|---|---|
| EmitFirst | The first error: surface it plainly. |
| EmitHeartbeat Int | The throttle window elapsed: surface a heartbeat carrying the count of errors since the last surfaced one (this one included). |
| EmitSuppress | Within the window: suppress and count. |
Instances
| Instance | Defined in | Methods |
|---|---|---|
| Show ThrottleEmit | Ecluse.Telemetry.Resolve | showsPrec :: Int -> ThrottleEmit -> ShowS; show :: ThrottleEmit -> String; showList :: [ThrottleEmit] -> ShowS |
| Eq ThrottleEmit | Ecluse.Telemetry.Resolve | |
initialThrottle :: ThrottleState
The initial throttle state: nothing logged, nothing suppressed.
throttleInterval :: NominalDiffTime
How long export errors are coalesced between surfaced heartbeats.
throttleStep :: NominalDiffTime -> UTCTime -> ThrottleState -> (ThrottleState, ThrottleEmit)
Advance the throttle for one export error at now: surface the first error,
surface a heartbeat once the throttleInterval has elapsed since the last surfaced
one (resetting the suppressed count), and otherwise suppress while counting. Pure,
so a sequence of (time, decision) steps is asserted directly.
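A pure step function of this shape could be written as below. This is a sketch under assumptions: the fields of ThrottleState are not shown on this page, so SketchState and its fields lastSurfaced and suppressed are hypothetical stand-ins, as are SketchEmit, sketchStep, decisions and the at helper. It depends on the time package for UTCTime and NominalDiffTime.

```haskell
import Data.Time.Calendar (fromGregorian)
import Data.Time.Clock (NominalDiffTime, UTCTime (..), addUTCTime, diffUTCTime)

-- | Hypothetical state: when a failure was last surfaced, and how many
-- have been suppressed since.
data SketchState = SketchState
  { lastSurfaced :: Maybe UTCTime
  , suppressed :: Int
  }
  deriving (Eq, Show)

data SketchEmit = First | Heartbeat Int | Suppress
  deriving (Eq, Show)

sketchInitial :: SketchState
sketchInitial = SketchState Nothing 0

-- | Advance the throttle for one failure observed at 'now'.
sketchStep :: NominalDiffTime -> UTCTime -> SketchState -> (SketchState, SketchEmit)
sketchStep interval now st =
  case lastSurfaced st of
    -- Nothing surfaced yet: this is the first failure.
    Nothing -> (SketchState (Just now) 0, First)
    Just t
      -- Window elapsed: heartbeat counts the suppressed failures plus this one.
      | diffUTCTime now t >= interval ->
          (SketchState (Just now) 0, Heartbeat (suppressed st + 1))
      -- Still inside the window: count and stay quiet.
      | otherwise -> (st {suppressed = suppressed st + 1}, Suppress)

-- | Run a sequence of failure times through the throttle.
decisions :: NominalDiffTime -> [UTCTime] -> [SketchEmit]
decisions interval = go sketchInitial
  where
    go _ [] = []
    go st (t : ts) =
      let (st', emit) = sketchStep interval t st
       in emit : go st' ts

-- | Test helper: a time the given number of seconds after a fixed origin.
at :: NominalDiffTime -> UTCTime
at secs = addUTCTime secs (UTCTime (fromGregorian 2024 1 1) 0)
```

With a 60-second interval, `decisions 60 (map at [0, 10, 20, 60, 70, 125])` evaluates to `[First, Suppress, Suppress, Heartbeat 3, Suppress, Heartbeat 2]`, which shows how a (time, decision) sequence can be asserted without touching the wall clock.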
Export-failure routing
data ExportFailureSink
The shared export-failure sink: a single throttle plus the katip target that
every export failure feeds — the span exporter, the metric exporter, and the SDK's own
diagnostic stream — so a persistently unreachable collector is one coalesced stream (the
first failure plainly, then a periodic heartbeat) rather than several independent floods.
The clock and the surfacing action are injected so the throttle decision is unit-tested
without wall-clock timing or a live katip scribe (mirroring the pure throttleStep
tests); exportFailureSink wires the production clock and katip target.
newExportFailureSink :: IO UTCTime -> (Severity -> Text -> IO ()) -> IO ExportFailureSink
Build an export-failure sink over an injected clock and surfacing action.
routeExportFailure :: ExportFailureSink -> Text -> IO ()
Route one export-failure diagnostic through the shared throttle into katip: the
first surfaced plainly, a heartbeat carrying the suppressed count once throttleInterval
has elapsed since the last surfaced one, otherwise suppressed and counted.
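The combination of an injected clock, an injected surfacing action and one shared throttle can be illustrated as follows. This is a simplified sketch, not the module's code: SketchSink, newSketchSink, routeSketch and the heartbeat wording are hypothetical, time is modelled as Int seconds rather than UTCTime, the interval is passed explicitly, and the surfacing action takes only a String where the real one also takes a katip Severity.

```haskell
import Data.IORef

data Decision = Plain | Beat Int | Drop
  deriving (Eq, Show)

-- | (time the last failure was surfaced, failures suppressed since).
type State = (Maybe Int, Int)

-- | Pure throttle decision, in seconds for the purposes of the sketch.
step :: Int -> Int -> State -> (State, Decision)
step interval now (lastAt, n) =
  case lastAt of
    Nothing -> ((Just now, 0), Plain)
    Just t
      | now - t >= interval -> ((Just now, 0), Beat (n + 1))
      | otherwise -> ((lastAt, n + 1), Drop)

-- | One throttle shared by every feed, plus the injected effects.
data SketchSink = SketchSink
  { sinkState :: IORef State
  , sinkClock :: IO Int
  , sinkEmit :: String -> IO ()
  }

newSketchSink :: IO Int -> (String -> IO ()) -> IO SketchSink
newSketchSink clock emit = do
  ref <- newIORef (Nothing, 0)
  pure (SketchSink ref clock emit)

-- | Route one failure: update the shared state atomically, then surface
-- or suppress according to the pure decision.
routeSketch :: Int -> SketchSink -> String -> IO ()
routeSketch interval sink msg = do
  now <- sinkClock sink
  decision <- atomicModifyIORef' (sinkState sink) (step interval now)
  case decision of
    Plain -> sinkEmit sink msg
    Beat n -> sinkEmit sink (msg ++ " (" ++ show n ++ " failures since last report)")
    Drop -> pure ()
```

Because the clock and the emit action are plain arguments, a test can supply an IORef-backed fake clock and collect emitted messages in a list, with no scribe and no sleeping.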
observeExportResult :: ExportFailureSink -> Text -> ExportResult -> IO ()
Observe one exporter's ExportResult, routing a Failure through the sink and
ignoring a Success. This only observes the failure — the inner result is the
caller's to return unchanged, so export semantics are untouched (a failed export stays
off the request path). The Text argument, signal, names the failing exporter (span or metric).
installExportErrorHandler :: ExportFailureSink -> IO ()
Install a process-global handler for the SDK's own diagnostic stream, routed through
the shared sink so it coalesces with the exporter-failure feed. In hs-opentelemetry
1.0.0.0 the only caller of this handler is the SDK's internal logging — a failed OTLP
export is dropped there rather than routed here — so the export-failure feed comes from
the exporter wrappers (observeExportResult); this handler is kept for the SDK-internal
diagnostics it still serves.
The forwarded diagnostic String is the SDK's own text and is trusted not to carry
secrets: this module never reads the credential-bearing telemetry inputs
(OTEL_EXPORTER_OTLP_HEADERS, DD_API_KEY, DD_SITE), so the only residual channel is
whatever the SDK itself chooses to log, which the upstream exporter keeps to
endpoint/status diagnostics.
Boot wiring
prepareTelemetry :: LogEnv -> [(String, String)] -> IO ()
Prepare the telemetry substrate at boot, before the SDK initialises: resolve the
identity and normalise the canonical OTEL_* environment the env-driven SDK reads (so a
DD_*-only deployment still configures the exporter). The export-failure observation
itself is wired when the substrate stands up ("Ecluse.Telemetry.withTelemetry"), which
builds the shared sink and installs the exporter wrappers and the SDK error handler.
A defaulted endpoint — neither DD_AGENT_HOST nor OTEL_EXPORTER_OTLP_ENDPOINT set —
is surfaced through katip as one boot warning and falls back to
http://localhost:4318; it is never a failure. The OTLP endpoint is an
operator-declared destination (like the mirror queue), so it is normalised and used
as given, not classified or gated.
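The endpoint fallback and the defaulted-endpoint warning can be sketched as below. The names SketchSource, sketchEndpoint and bootWarning are hypothetical, the warning text is invented for illustration, and the sketch skips whatever normalisation the real resolver applies to the URL.

```haskell
import Data.Char (isSpace)

-- | Hypothetical stand-in for EndpointSource.
data SketchSource = FromAgentHost | FromOtel | Defaulted
  deriving (Eq, Show)

-- | Resolve the OTLP endpoint: DD_AGENT_HOST first, then
-- OTEL_EXPORTER_OTLP_ENDPOINT, then the localhost default.
sketchEndpoint :: [(String, String)] -> (String, SketchSource)
sketchEndpoint env
  | Just host <- nonBlank "DD_AGENT_HOST" = ("http://" ++ host ++ ":4318", FromAgentHost)
  | Just url <- nonBlank "OTEL_EXPORTER_OTLP_ENDPOINT" = (url, FromOtel)
  | otherwise = ("http://localhost:4318", Defaulted)
  where
    -- A present-but-blank value counts as unset.
    nonBlank key = case lookup key env of
      Just v | not (all isSpace v) -> Just v
      _ -> Nothing

-- | Only a defaulted endpoint produces a boot warning; it is never a failure.
bootWarning :: SketchSource -> Maybe String
bootWarning Defaulted =
  Just "no telemetry endpoint configured; falling back to http://localhost:4318"
bootWarning _ = Nothing
```

Carrying the source alongside the URL is what lets the boot path tell a deliberately configured target from the silent default without re-reading the environment.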