| Safe Haskell | None |
|---|---|
| Language | GHC2021 |
Ecluse.Server.Pipeline
Description
The serve paths behind the package routes: the packument merge behind
GET /{pkg} and the artifact relay behind GET /{pkg}/-/{file}.tgz.
This is the data-plane handler module. It composes the
slices that decide what to serve — the registry client
(Ecluse.Registry.Npm), the per-version rules (Ecluse.Rules), the structural
filter (Ecluse.Registry.Npm.Filter), the cross-upstream merge
(Ecluse.Package.Merge), the metadata cache (Ecluse.Server.Cache), the
own-ETag conditional (Ecluse.Server.Conditional), and the serve-outcome status
(Ecluse.Server.Response) — into one action in the
Handler reader, reading its mount's serve dependencies and
the composition-root Env from the request's
RequestCtx.
Credential authority
This handler implements the default passthrough credential posture (see
docs/architecture/access-model.md). The invariant that holds under every
strategy is the public strip: the client's credential is __stripped before any
public-upstream fetch__, which is always anonymous — sending an internal token to the
public registry would be a credential disclosure, so the public-upstream fetch is built
with no token at all. Under passthrough the client's own credential is additionally
forwarded verbatim to the private upstream, which is the authority for who may
read what. The two origins are fetched concurrently, each with its own credential
posture; nothing shares a token across the trust split.
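The credential split above can be pictured as a pure function from origin to outbound token. This is a minimal sketch under the passthrough posture; `Origin`, `Token`, and `outboundCredential` are hypothetical names introduced for illustration, not this module's API.

```haskell
-- Hypothetical types for illustration only; not the module's real API.
data Origin = PrivateOrigin | PublicOrigin
  deriving (Eq, Show)

newtype Token = Token String
  deriving (Eq, Show)

-- | The credential to attach to an upstream fetch under passthrough.
-- The public leg is always anonymous, whatever the client sent; the
-- private leg receives the client's own credential unchanged.
outboundCredential :: Origin -> Maybe Token -> Maybe Token
outboundCredential PublicOrigin  _      = Nothing
outboundCredential PrivateOrigin client = client
```

Making the public case ignore its second argument means no later code path can leak a token to the public registry by accident.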
Because passthrough makes the private upstream the per-client authority, its
metadata is not cached across clients here: the private origin is fetched and parsed on
every request with that client's own credential, so the upstream re-authorises each
client itself, and only the anonymous public origin is cached (one shared document, no
per-client authority to preserve). Caching the private origin keyed by base URL alone
would let one client's cached entry serve another client's private document within the
TTL, bypassing the upstream's authorisation — a cross-client disclosure. (Other
strategies make the private origin shareable by authorising each serve differently; the
metadata cache itself stays credential-free regardless — see
docs/architecture/access-model.md → Caching.)
Merge, not fallback
A packument is the set of available versions, spread across upstreams, so it is
merged rather than short-circuited on a private hit (see
docs/architecture/registry-model.md → "Packument merge across upstreams").
Private versions are trusted and enter unfiltered; public versions are gated
through the rules and the structural filter (filterPlan decides, applyFilterPlan
replays) before they enter; the two are combined, private winning a collision and
an integrity divergence flagged. If one upstream
is unavailable while the other succeeds, the best-effort union of what resolved is
served — only when nothing resolves does the request error.
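The merge rules above (private wins a collision, an integrity divergence is flagged, a missing upstream degrades to a best-effort union) might be sketched as follows. The types are simplified and hypothetical; the real merge lives in Ecluse.Package.Merge and works over the typed PackageInfo.

```haskell
import qualified Data.Map.Strict as Map

-- Hypothetical, simplified shapes for illustration only.
type Version   = String
type Integrity = String

data Source = Private | Public
  deriving (Eq, Show)

data Merged = Merged
  { winners   :: Map.Map Version (Source, Integrity)  -- winning source per version
  , divergent :: [Version]                            -- same version, different integrity
  } deriving (Eq, Show)

-- | Union the trusted private versions with the already-gated public ones.
-- Nothing models an upstream that did not resolve; the overall result is
-- Nothing only when neither side resolved.
mergeVersions
  :: Maybe (Map.Map Version Integrity)  -- private contribution
  -> Maybe (Map.Map Version Integrity)  -- public contribution, after gating
  -> Maybe Merged
mergeVersions Nothing Nothing = Nothing
mergeVersions priv pub = Just (Merged won flagged)
  where
    p = maybe Map.empty id priv
    q = maybe Map.empty id pub
    -- Map.union is left-biased, so a private entry wins a collision.
    won = Map.union (Map.map ((,) Private) p) (Map.map ((,) Public) q)
    -- Versions present on both sides whose integrity values disagree.
    flagged = Map.keys (Map.filter id (Map.intersectionWith (/=) p q))
```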
Decision surface vs served surface
The merge and filter reason over the typed PackageInfo but the document served
is the raw upstream JSON, edited in place, so every unmodeled wire key
survives (see docs/architecture/registry-model.md → "Decision surface vs
served surface"). The MergePlan names, for each surviving version, the source
that won it; the served body is assembled by taking each survivor's object from
the raw Value of its winning source, carrying the reconciled dist-tags and
time, and relaying every other top-level key from the precedence-winning
document. The typed model is never re-serialised. The two fields the merge owns as
a decision — dist-tags.latest and the time instants — are re-rendered from that
decision (the times as normalised ISO-8601), so they may differ byte-for-byte from
any single upstream while denoting the same value; integrity-bearing fields
(dist.integrity, dist.tarball) are relayed raw and untouched. The served bytes
get our own ETag, since a merged/filtered body matches no single upstream's.
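Replaying a plan onto the raw documents could look roughly like this. It is a sketch only: `RawJson` is an opaque String standing in for a raw JSON Value subtree, and `assembleVersions` is a hypothetical name. The dist-tags and time reconciliation is not modelled.

```haskell
import qualified Data.Map.Strict as Map

-- Hypothetical shapes for illustration only.
type Version = String
type RawJson = String  -- opaque stand-in for a raw JSON Value subtree

data Source = Private | Public
  deriving (Eq, Show)

-- | For each surviving version, copy its raw object from the source the plan
-- names as winner. Nothing is re-rendered, so unmodeled keys survive. A
-- version absent from the plan is dropped, even if an upstream carried it.
assembleVersions
  :: Map.Map Version Source    -- plan: winning source per survivor
  -> Map.Map Version RawJson   -- raw private "versions" object
  -> Map.Map Version RawJson   -- raw public "versions" object
  -> Map.Map Version RawJson
assembleVersions plan rawPriv rawPub = Map.mapMaybeWithKey pick plan
  where
    pick v Private = Map.lookup v rawPriv
    pick v Public  = Map.lookup v rawPub
```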
Ecosystem coupling
This is the npm packument pipeline: it reaches for the npm registry
client, projection, and structural filter directly, so it is the one
Ecluse.Server.* module that depends on a concrete adapter. The coupling is
expedient, not intended: agnostic handles for dispatching through an adapter
(a per-adapter router, and an ecosystem-neutral filter/projection) would
let a second ecosystem reuse this orchestration unchanged.
Artifact path
The tarball handler (serveTarball) is the demand-driven artifact relay. It fetches
each tarball from its authoritative upstream location — the Artifact.artUrl the
projection preserved from the upstream dist.tarball, selected from the gated
version by the requested filename — rather than reconstructing
{base}/{pkg}/-/{file} by npm convention. Honouring the upstream-declared
location is what lets the proxy front a registry that serves its artifacts from a
separate host or an off-convention path (a CDN/files host, a signed URL); a
reconstruction would silently fetch the wrong place. The location is gated, not
trusted: it is fetched only when the tarball-host policy
(tarballHostAllowed, per PROXY_RESPECT_UPSTREAM_TARBALL_HOST)
admits its host — the default refuses a cross-host dist.tarball — and the
untrusted egress additionally carries the resolved-IP recheck.
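A same-host check of the kind described above might be sketched like this. `hostOf` and `hostAdmitted` are hypothetical helpers, not the real tarballHostAllowed. The sketch assumes the flag simply lifts the same-host requirement, and the real policy may be stricter. The resolved-IP recheck is not modelled.

```haskell
import Data.Char (toLower)
import Data.List (stripPrefix)

-- | Extract the lower-cased host of an http(s) URL, dropping any port.
-- A deliberately small parser for illustration: it refuses userinfo and
-- IPv6 literals rather than trying to handle them. Real code would use a
-- URI library.
hostOf :: String -> Maybe String
hostOf url = do
  rest <- maybe (stripPrefix "http://" url) Just (stripPrefix "https://" url)
  let authority = takeWhile (`notElem` "/?#") rest
      host      = map toLower (takeWhile (/= ':') authority)
  if null host || any (`elem` "@[") authority then Nothing else Just host

-- | Hypothetical stand-in for the tarball-host policy. With the flag off
-- (the default described above) only an artifact URL on the upstream's own
-- host is admitted; an unparseable URL is always refused.
hostAdmitted :: Bool -> String -> String -> Bool
hostAdmitted respectUpstreamHost baseUrl artUrl =
  case (hostOf baseUrl, hostOf artUrl) of
    (Just b, Just a) -> respectUpstreamHost || a == b
    _                -> False
```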
The private origin is tried first, uncached, forwarding the client's credential:
its packument is fetched, the requested artifact selected, and its artUrl fetched
over the trusted manager; a hit streams the artifact through with __bounded
memory__ (the withResponse/responseStream relay, never a buffering fetch), and
any non-served outcome — the packument not resolving, no artifact matching the
filename, the policy refusing the host, a non-2xx — falls through. The public
origin is anonymous: it gates that one version against the rules (the same
machinery the packument path gates the whole set with) and selects the artifact, and
on an admit __streams the public bytes from artUrl and enqueues a
MirrorJob__ (naming that authoritative URL) for the worker to
back-fill the mirror target; on a reject — including a host the tarball-host policy
refuses — it renders the serve error model (403/503/500/404) through the
mount's renderer. The enqueue is __serve-then-enqueue, best-effort and
non-blocking__: the artifact reaches the client first, and an enqueue failure is
swallowed rather than failing or delaying the response. Mirroring is
demand-driven — a job is enqueued only here, on a tarball-path admit, never when
a packument is filtered. The serve path does not verify dist.integrity; the
client checks the artifact's own hash and the worker re-verifies before publishing.
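The serve-then-enqueue ordering can be illustrated with a small combinator. This is a sketch with a hypothetical name, `serveThenEnqueue`; it shows only the ordering and the swallowed failure, not the non-blocking queue write or the streaming relay.

```haskell
import Control.Exception (SomeException, try)

-- | Run the serve action to completion first, then attempt the enqueue.
-- Any exception the enqueue throws is swallowed, so a queue failure never
-- changes what the client received.
serveThenEnqueue :: IO a -> IO () -> IO a
serveThenEnqueue serve enqueue = do
  result  <- serve
  outcome <- try enqueue :: IO (Either SomeException ())
  case outcome of
    Left _   -> pure ()  -- best-effort: a real handler would log here
    Right () -> pure ()
  pure result
```

Because the enqueue runs only after `serve` returns, a serve that throws never produces a job.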
An artifact is a pass-through body — served byte-identical to upstream's — so its
conditional-GET handling relays rather than computing an own ETag (see
docs/architecture/web-layer.md → "Middleware and helper libraries", and contrast
the merged-packument own-ETag path): the client's If-None-Match/If-Modified-Since
are forwarded onto the upstream artifact request on both legs (forwardValidators),
and an upstream 304 Not Modified is relayed straight back to the client as a bodiless
304 (isNotModified via the relay's accept predicate) rather than re-downloading the
tarball — the cheap freshness check on the hot artifact path.
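The validator relay can be pictured with two small pure helpers. The names `validatorsOnly`, `Relay`, and `classify` are hypothetical and do not reflect the signatures of forwardValidators or isNotModified.

```haskell
import Data.Char (toLower)

type Header = (String, String)

-- | Select the conditional-request validators from the client's headers,
-- matching names case-insensitively. This helper only selects validators;
-- how other headers are handled is outside the sketch.
validatorsOnly :: [Header] -> [Header]
validatorsOnly = filter (\(name, _) -> map toLower name `elem` wanted)
  where
    wanted = ["if-none-match", "if-modified-since"]

-- | How to answer the client from the upstream artifact status.
data Relay = RelayNotModified | StreamBody | NotServed
  deriving (Eq, Show)

-- | A 304 is relayed bodiless, a 2xx streams the body, anything else is
-- treated as not served.
classify :: Int -> Relay
classify 304 = RelayNotModified
classify s
  | s >= 200 && s < 300 = StreamBody
  | otherwise           = NotServed
```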
Synopsis
- servePackument :: PackageName -> Request -> (Response -> IO ResponseReceived) -> Handler ResponseReceived
- headPackument :: PackageName -> Request -> (Response -> IO ResponseReceived) -> Handler ResponseReceived
- serveTarball :: PackageName -> Version -> Filename -> Request -> (Response -> IO ResponseReceived) -> Handler ResponseReceived
- headTarball :: PackageName -> Version -> Filename -> Request -> (Response -> IO ResponseReceived) -> Handler ResponseReceived
The packument handler
servePackument :: PackageName -> Request -> (Response -> IO ResponseReceived) -> Handler ResponseReceived Source #
Serve a GET /{pkg} packument request end to end, over the request's
RequestCtx.
The mount's PackumentDeps and error renderer are read from the matched
MountBinding in context, not threaded as arguments. When the mount has no
packument-serve dependencies wired, the route is recognised but not served — a
501 in the mount's surface — rather than fabricating a result.
With dependencies wired: the edge token, if configured, is validated before any
upstream is touched. Then the private and public upstreams are fetched
concurrently — the client's credential forwarded to the private origin, the public
origin anonymous — each parse failure or unavailable upstream degrading to a missing
contribution rather than an error. Private versions are trusted as-is; public
versions are gated through the rules and the structural filter (filterPlan then
applyFilterPlan); the surviving sets are merged (mergePackuments) and the
MergePlan replayed onto the raw upstream Values to assemble the served body,
which is then answered against the client's conditional request with our own ETag.
When nothing survives, the status follows the most recoverable cause via
packumentStatus. An origin whose self-reported packument name disagrees with the
route is validated out — dropped as untrusted for this request and logged — so a
single misreporting upstream never denies a package another upstream serves; when
that leaves no valid origin, the request is a 502 (a responding upstream
returned an invalid response), distinct from a genuine absence. Every refusal — the
edge 401 and the no-survivors 403/503/502/500 — is rendered through the
mount's MountRenderer.
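The name validation described above might be sketched as a partition over the responding origins. All names here are hypothetical and the types are simplified to strings. How the remaining no-survivors statuses are chosen is left to packumentStatus and is not modelled.

```haskell
import Data.List (partition)

-- Hypothetical, simplified shapes for illustration only.
type PackageName = String

data Source = Private | Public
  deriving (Eq, Show)

data Validated
  = Valid [Source]   -- origins whose self-reported name matches the route
  | AllInvalid       -- every responding origin misreported: a 502
  | NoneResponded    -- nothing to validate; the status is decided elsewhere
  deriving (Eq, Show)

-- | Drop any origin whose self-reported packument name disagrees with the
-- routed name, so one misreporting upstream cannot deny the package.
validateNames :: PackageName -> [(Source, PackageName)] -> Validated
validateNames _ [] = NoneResponded
validateNames route reported =
  case partition ((== route) . snd) reported of
    ([], _)    -> AllInvalid
    (ok, _bad) -> Valid (map fst ok)
```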
headPackument :: PackageName -> Request -> (Response -> IO ResponseReceived) -> Handler ResponseReceived
Serve a HEAD /{pkg} packument request: the identical pipeline and gating as
servePackument — the same fetch, merge, filter, rule decision, and no-survivors
status — answered with the identical status and headers as the GET (the would-be
merged body's Content-Length and the own ETag the conditional-request machinery
computes), but with the body suppressed (bodiless), as HTTP semantics require of a
HEAD reply.
A packument body is assembled locally (a metadata fetch plus the cross-upstream
merge), so — unlike the tarball HEAD (headTarball) — answering it pumps __no
artifact body__ and carries no egress-amplification risk: this is the HTTP-correctness
half of the explicit-HEAD handling, not the DoS lever the tarball path closes. The
merged body is still materialised, to size it and compute its ETag; only the bytes
are withheld from the reply.
The tarball handler
serveTarball :: PackageName -> Version -> Filename -> Request -> (Response -> IO ResponseReceived) -> Handler ResponseReceived
Serve a GET /{pkg}/-/{file}.tgz artifact request end to end, over the
request's RequestCtx.
The mount's PackumentDeps and error renderer are read from the matched
MountBinding; an unwired mount is the recognised-but-unserved 501 stub (as for
servePackument). With dependencies wired and the edge token (if any) validated,
the artifact is fetched by the preserved Filename — never a name rebuilt from
the coordinate:
- the private upstream is tried first, uncached, forwarding the client's
credential; a 2xx streams the bytes through with bounded memory, any other
status falls through;
- on a private miss the public version metadata is fetched anonymously and
that one version gated against the rules; an admit streams the public bytes
and enqueues a MirrorJob (serve-then-enqueue, the enqueue best-effort and
non-blocking), a reject renders the serve error model (403/503/500/404)
through the mount's renderer.
The public-upstream fetch is always anonymous (the client credential is never sent to the
public upstream); the mirror job carries no credential. The serve path does not
verify dist.integrity (see the module header → "Artifact path").
headTarball :: PackageName -> Version -> Filename -> Request -> (Response -> IO ResponseReceived) -> Handler ResponseReceived
Serve a HEAD /{pkg}/-/{file}.tgz artifact request end to end, over the
request's RequestCtx.
A HEAD must never run the full-GET streaming pump: a bodiless HEAD would
otherwise open the upstream artifact connection and pump a whole artifact body that
the reply then discards — wasted upstream egress and a DoS-amplification lever (a
client forcing arbitrary full-artifact fetches with cheap HEADs). So this handler
gates the artifact through the identical pipeline as serveTarball — the same
edge auth, host-allowlist, internal-range, and tarball-host policy, and the same
upstream-request construction — but issues the upstream request as a HEAD and relays
its status and safe response headers (relayArtifact) with no body
(probeUpstreamWhen). On an admit no MirrorJob is enqueued: a
HEAD serves no bytes, so there is nothing to back-fill (mirroring stays demand-driven
on the GET path). A refusal renders the same serve error model with an empty body.
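The difference between the two tarball handlers can be summarised as a per-method plan. This is a sketch with hypothetical names (`UpstreamPlan`, `planFor`); the gating shared by both handlers is omitted.

```haskell
data Method = GET | HEAD
  deriving (Eq, Show)

data UpstreamPlan = UpstreamPlan
  { upstreamMethod :: Method  -- method used for the upstream artifact request
  , relayBody      :: Bool    -- whether body bytes are pumped to the client
  , enqueueMirror  :: Bool    -- whether a public admit enqueues a mirror job
  } deriving (Eq, Show)

-- | A client HEAD becomes an upstream HEAD: no body is pumped and no mirror
-- job is enqueued, so a cheap HEAD cannot force a full artifact fetch.
planFor :: Method -> UpstreamPlan
planFor GET  = UpstreamPlan GET  True  True
planFor HEAD = UpstreamPlan HEAD False False
```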