ecluse
Safe Haskell: None
Language: GHC2021

Ecluse.Server.Pipeline

Description

The serve paths behind the package routes: the packument merge behind GET /{pkg} and the artifact relay behind GET /{pkg}/-/{file}.tgz.

This is the data-plane handler module. It composes the slices that decide what to serve — the registry client (Ecluse.Registry.Npm), the per-version rules (Ecluse.Rules), the structural filter (Ecluse.Registry.Npm.Filter), the cross-upstream merge (Ecluse.Package.Merge), the metadata cache (Ecluse.Server.Cache), the own-ETag conditional (Ecluse.Server.Conditional), and the serve-outcome status (Ecluse.Server.Response) — into one action in the Handler reader, reading its mount's serve dependencies and the composition-root Env from the request's RequestCtx.

Credential authority

This handler implements the default passthrough credential posture (see docs/architecture/access-model.md). The invariant that holds under every strategy is the public strip: the client's credential is __stripped before any public-upstream fetch__, which is always anonymous — sending an internal token to the public registry would be a credential disclosure, so the public-upstream fetch is built with no token at all. Under passthrough the client's own credential is additionally forwarded verbatim to the private upstream, which is the authority for who may read what. The two origins are fetched concurrently, each with its own credential posture; nothing shares a token across the trust split.
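The trust split described above can be sketched as a pure function from origin to outgoing credential. This is a minimal illustration, not the module's code: `Token`, `Origin`, and `upstreamCredential` are hypothetical names introduced here.

```haskell
-- Hypothetical stand-ins for illustration; not the module's real types.
newtype Token = Token String
  deriving (Eq, Show)

data Origin = PrivateOrigin | PublicOrigin
  deriving (Eq, Show)

-- | The credential to attach to an upstream request under passthrough.
-- The public leg is always anonymous, whatever the client sent;
-- the private leg forwards the client's credential verbatim.
upstreamCredential :: Origin -> Maybe Token -> Maybe Token
upstreamCredential PublicOrigin  _      = Nothing
upstreamCredential PrivateOrigin client = client
```

Making the public case ignore its second argument entirely means no later change to the client-credential handling can leak a token onto the public leg by accident.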

Because passthrough makes the private upstream the per-client authority, its metadata is not cached across clients here. The private origin is fetched and parsed on every request with that client's own credential, so the upstream re-authorises each client itself; only the anonymous public origin is cached (one shared document, no per-client authority to preserve). Caching the private origin keyed by base URL alone would let one client's cached entry serve another client's private document within the TTL, bypassing the upstream's authorisation: a cross-client disclosure. (Other strategies make the private origin shareable by authorising each serve differently. The metadata cache itself stays credential-free regardless; see docs/architecture/access-model.md → "Caching".)

Merge, not fallback

A packument is the set of available versions, spread across upstreams, so it is merged rather than short-circuited on a private hit (see docs/architecture/registry-model.md → "Packument merge across upstreams"). Private versions are trusted and enter unfiltered; public versions are gated through the rules and the structural filter (filterPlan decides, applyFilterPlan replays) before they enter; the two are combined, private winning a collision and an integrity divergence flagged. If one upstream is unavailable while the other succeeds, the best-effort union of what resolved is served — only when nothing resolves does the request error.
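The merge rule above (union of versions, private winning a collision, integrity divergence flagged, error only when nothing resolves) can be sketched over plain maps. This is a simplified model under stated assumptions: versions and integrity strings are plain `String`s, the public map is assumed to be already gated, and `Merged`, `mergeVersions`, and `mergeResolved` are hypothetical names rather than the API of Ecluse.Package.Merge.

```haskell
import qualified Data.Map.Strict as Map
import Data.Maybe (fromMaybe)

-- Hypothetical simplified types; the real module works over PackageInfo.
type Version   = String
type Integrity = String

data Source = Private | Public
  deriving (Eq, Show)

data Merged = Merged
  { winners   :: Map.Map Version (Source, Integrity)
  , divergent :: [Version]  -- present on both sides with differing integrity
  } deriving (Eq, Show)

-- | Union of both version sets. Map.union is left-biased, so a version
-- present on both sides is won by the private source.
mergeVersions :: Map.Map Version Integrity -> Map.Map Version Integrity -> Merged
mergeVersions private publicAdmitted = Merged
  { winners   = Map.union (fmap ((,) Private) private)
                          (fmap ((,) Public) publicAdmitted)
  , divergent = Map.keys (Map.filter id
                  (Map.intersectionWith (/=) private publicAdmitted))
  }

-- | Best-effort union: an unavailable upstream (Nothing) contributes no
-- versions; only when neither side resolves is there nothing to serve.
mergeResolved
  :: Maybe (Map.Map Version Integrity)
  -> Maybe (Map.Map Version Integrity)
  -> Maybe Merged
mergeResolved Nothing Nothing = Nothing
mergeResolved private publicAdmitted =
  Just (mergeVersions (fromMaybe Map.empty private)
                      (fromMaybe Map.empty publicAdmitted))
```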

Decision surface vs served surface

The merge and filter reason over the typed PackageInfo but the document served is the raw upstream JSON, edited in place, so every unmodeled wire key survives (see docs/architecture/registry-model.md → "Decision surface vs served surface"). The MergePlan names, for each surviving version, the source that won it; the served body is assembled by taking each survivor's object from the raw Value of its winning source, carrying the reconciled dist-tags and time, and relaying every other top-level key from the precedence-winning document. The typed model is never re-serialised. The two fields the merge owns as a decision — dist-tags.latest and the time instants — are re-rendered from that decision (the times as normalised ISO-8601), so they may differ byte-for-byte from any single upstream while denoting the same value; integrity-bearing fields (dist.integrity, dist.tarball) are relayed raw and untouched. The served bytes get our own ETag, since a merged/filtered body matches no single upstream's.
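Replaying a plan onto raw documents, as described above, might look like the following sketch. It is an illustration only: `RawObject` is a `String` standing in for a raw JSON value, the plan is reduced to a map from version to winning source, and `replayPlan` is a hypothetical name. Reconciling dist-tags and time is not modelled.

```haskell
import qualified Data.Map.Strict as Map

data Source = Private | Public
  deriving (Eq, Show)

type Version = String

-- Stand-in for a raw JSON object as received from an upstream.
type RawObject = String

-- | For every surviving version, take the raw object from the source that
-- won it. The typed model only decides; the bytes come from upstream, so
-- keys the model never parsed are carried through untouched.
replayPlan
  :: Map.Map Version Source     -- plan: survivor -> winning source
  -> Map.Map Version RawObject  -- raw versions of the private document
  -> Map.Map Version RawObject  -- raw versions of the public document
  -> Map.Map Version RawObject
replayPlan plan rawPrivate rawPublic = Map.mapMaybeWithKey pick plan
  where
    pick v Private = Map.lookup v rawPrivate
    pick v Public  = Map.lookup v rawPublic
```

A version absent from the plan (for example, a public version the rules rejected) never reaches the served body, even though its raw object is present in the input.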

Ecosystem coupling

This is the npm packument pipeline: it reaches for the npm registry client, projection, and structural filter directly, so it is the one Ecluse.Server.* module that depends on a concrete adapter. The coupling is expedient, not intended: agnostic handles for dispatching through an adapter (a per-adapter router, and an ecosystem-neutral filter/projection) would let a second ecosystem reuse this orchestration unchanged.

Artifact path

The tarball handler (serveTarball) is the demand-driven artifact relay. It fetches each tarball from its authoritative upstream location — the Artifact.artUrl the projection preserved from the upstream dist.tarball, selected from the gated version by the requested filename — rather than reconstructing {base}/{pkg}/-/{file} by npm convention. Honouring the upstream-declared location is what lets the proxy front a registry that serves its artifacts from a separate host or an off-convention path (a CDN/files host, a signed URL); a reconstruction would silently fetch the wrong place. The location is gated, not trusted: it is fetched only when the tarball-host policy (tarballHostAllowed, per PROXY_RESPECT_UPSTREAM_TARBALL_HOST) admits its host — the default refuses a cross-host dist.tarball — and the untrusted egress additionally carries the resolved-IP recheck.
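A host gate of the kind described above can be sketched as a pure predicate. This is not the signature of tarballHostAllowed, which the text does not give: `TarballHostPolicy` and `hostAdmitted` are hypothetical, hosts are passed as already-extracted strings, and the sketch assumes that the opt-in setting admits any host, which the source does not spell out. The resolved-IP recheck is a separate step and is not modelled.

```haskell
import Data.Char (toLower)

-- Hypothetical policy type; the default refuses a cross-host dist.tarball.
data TarballHostPolicy
  = SameHostOnly         -- default: tarball host must equal the upstream's
  | RespectUpstreamHost  -- opt-in: honour the upstream-declared host (assumed)
  deriving (Eq, Show)

-- | Decide whether an upstream-declared tarball host may be fetched.
-- Host names are compared case-insensitively.
hostAdmitted :: TarballHostPolicy -> String -> String -> Bool
hostAdmitted RespectUpstreamHost _ _ = True
hostAdmitted SameHostOnly upstreamHost tarballHost =
  map toLower upstreamHost == map toLower tarballHost
```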

The private origin is tried first, uncached, forwarding the client's credential: its packument is fetched, the requested artifact selected, and its artUrl fetched over the trusted manager; a hit streams the artifact through with __bounded memory__ (the withResponse/responseStream relay, never a buffering fetch), and any non-served outcome — the packument not resolving, no artifact matching the filename, the policy refusing the host, a non-2xx — falls through. The public origin is anonymous: it gates that one version against the rules (the same machinery the packument path gates the whole set with) and selects the artifact, and on an admit __streams the public bytes from artUrl and enqueues a MirrorJob__ (naming that authoritative URL) for the worker to back-fill the mirror target; on a reject — including a host the tarball-host policy refuses — it renders the serve error model (403/503/500/404) through the mount's renderer. The enqueue is __serve-then-enqueue, best-effort and non-blocking__: the artifact reaches the client first, and an enqueue failure is swallowed rather than failing or delaying the response. Mirroring is demand-driven — a job is enqueued only here, on a tarball-path admit, never when a packument is filtered. The serve path does not verify dist.integrity; the client checks the artifact's own hash and the worker re-verifies before publishing.
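The serve-then-enqueue ordering can be sketched with two opaque actions. This is a minimal illustration under assumptions: `serveThenEnqueue` and `demo` are hypothetical names, and the sketch only shows ordering and failure swallowing. A real non-blocking enqueue would also need a queue write that cannot stall, and a production handler would likely be more selective than `SomeException`, which also covers asynchronous exceptions.

```haskell
import Control.Exception (SomeException, try)
import Data.IORef (modifyIORef, newIORef, readIORef)

-- | Run the serve action first, then attempt the enqueue. A failing
-- enqueue is swallowed: it never fails a response already delivered.
serveThenEnqueue :: IO () -> IO () -> IO ()
serveThenEnqueue serve enqueue = do
  serve
  outcome <- try enqueue :: IO (Either SomeException ())
  case outcome of
    Left _   -> pure ()  -- best-effort: drop the mirror job
    Right () -> pure ()

-- | Usage: the client is served even though the enqueue throws.
demo :: IO [String]
demo = do
  events <- newIORef []
  serveThenEnqueue
    (modifyIORef events ("served" :))
    (ioError (userError "queue full"))
  reverse <$> readIORef events
```

Because the enqueue only runs after the serve action returns, a slow or broken queue cannot delay the bytes reaching the client in this arrangement.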

An artifact is a pass-through body — served byte-identical to upstream's — so its conditional-GET handling relays rather than computing an own ETag (see docs/architecture/web-layer.md → "Middleware and helper libraries", and contrast the merged-packument own-ETag path): the client's If-None-Match/If-Modified-Since are forwarded onto the upstream artifact request on both legs (forwardValidators), and an upstream 304 Not Modified is relayed straight back to the client as a bodiless 304 (isNotModified via the relay's accept predicate) rather than re-downloading the tarball — the cheap freshness check on the hot artifact path.
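The relay behaviour above has two halves: selecting which client headers travel upstream, and classifying the upstream status. The sketch below models both with plain strings and integers. The names `conditionalValidators`, `Relay`, and `relayFor` are hypothetical and are not the signatures of forwardValidators or isNotModified; the fall-through case reflects the private leg, and the public leg would render an error instead.

```haskell
import Data.Char (toLower)

type Header = (String, String)

-- | Keep only the conditional-request validators from the client's headers.
-- Header names are matched case-insensitively.
conditionalValidators :: [Header] -> [Header]
conditionalValidators = filter isValidator
  where
    isValidator (name, _) =
      map toLower name `elem` ["if-none-match", "if-modified-since"]

data Relay
  = RelayNotModified  -- upstream 304: answer a bodiless 304
  | RelayBody         -- upstream 2xx: stream the artifact through
  | NotServed         -- anything else: this leg did not serve
  deriving (Eq, Show)

-- | Classify an upstream status code for the artifact relay.
relayFor :: Int -> Relay
relayFor status
  | status == 304                 = RelayNotModified
  | status >= 200 && status < 300 = RelayBody
  | otherwise                     = NotServed
```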

Synopsis

The packument handler

servePackument :: PackageName -> Request -> (Response -> IO ResponseReceived) -> Handler ResponseReceived

Serve a GET /{pkg} packument request end to end, over the request's RequestCtx.

The mount's PackumentDeps and error renderer are read from the matched MountBinding in context, not threaded as arguments. When the mount has no packument-serve dependencies wired, the route is recognised but not served — a 501 in the mount's surface — rather than fabricating a result.

With dependencies wired: the edge token, if configured, is validated before any upstream is touched. Then the private and public upstreams are fetched concurrently — the client's credential forwarded to the private origin, the public origin anonymous — each parse failure or unavailable upstream degrading to a missing contribution rather than an error. Private versions are trusted as-is; public versions are gated through the rules and the structural filter (filterPlan then applyFilterPlan); the surviving sets are merged (mergePackuments) and the MergePlan replayed onto the raw upstream Values to assemble the served body, which is then answered against the client's conditional request with our own ETag. When nothing survives, the status follows the most recoverable cause via packumentStatus. An origin whose self-reported packument name disagrees with the route is validated out — dropped as untrusted for this request and logged — so a single misreporting upstream never denies a package another upstream serves; when that leaves no valid origin, the request is a 502 (a responding upstream returned an invalid response), distinct from a genuine absence. Every refusal — the edge 401 and the no-survivors 403/503/502/500 — is rendered through the mount's MountRenderer.
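The name-validation step described above can be sketched as a small classifier over the origins that responded. This is an assumption-laden illustration: `Validation` and `validateNames` are hypothetical, origins are labelled by plain strings, and the case where no origin responded at all is left to the status logic (packumentStatus), which this sketch does not model.

```haskell
-- Hypothetical outcome of validating self-reported packument names.
data Validation
  = Valid [String]  -- labels of the origins whose name matches the route
  | AllInvalid      -- origins responded, none matched: the 502 case
  | NoneResponded   -- nothing to validate; decided elsewhere
  deriving (Eq, Show)

-- | Drop every origin whose self-reported name disagrees with the route.
-- Each input pair is (origin label, name the origin reported).
validateNames :: String -> [(String, String)] -> Validation
validateNames route origins
  | null origins = NoneResponded
  | null valid   = AllInvalid
  | otherwise    = Valid valid
  where
    valid = [origin | (origin, reported) <- origins, reported == route]
```

A single misreporting origin is dropped while the other continues to serve; only when every responding origin misreports does the outcome become the invalid-response case.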

headPackument :: PackageName -> Request -> (Response -> IO ResponseReceived) -> Handler ResponseReceived

Serve a HEAD /{pkg} packument request: the identical pipeline and gating as servePackument — the same fetch, merge, filter, rule decision, and no-survivors status — answered with the identical status and headers as the GET (the would-be merged body's Content-Length and the own ETag the conditional-request machinery computes), but with the body suppressed (bodiless), as HTTP semantics require of a HEAD reply.

A packument body is assembled locally (a metadata fetch plus the cross-upstream merge), so — unlike the tarball HEAD (headTarball) — answering it pumps __no artifact body__ and carries no egress-amplification risk: this is the HTTP-correctness half of the explicit-HEAD handling, not the DoS lever the tarball path closes. The merged body is still materialised, to size it and compute its ETag; only the bytes are withheld from the reply.

The tarball handler

serveTarball :: PackageName -> Version -> Filename -> Request -> (Response -> IO ResponseReceived) -> Handler ResponseReceived

Serve a GET /{pkg}/-/{file}.tgz artifact request end to end, over the request's RequestCtx.

The mount's PackumentDeps and error renderer are read from the matched MountBinding; an unwired mount is the recognised-but-unserved 501 stub (as for servePackument). With dependencies wired and the edge token (if any) validated, the artifact is fetched by the preserved Filename — never a name rebuilt from the coordinate:

  • the private upstream is tried first, uncached, forwarding the client's credential; a 2xx streams the bytes through with bounded memory, any other status falls through;
  • on a private miss the public version metadata is fetched anonymously and that one version gated against the rules; an admit streams the public bytes and enqueues a MirrorJob (serve-then-enqueue, the enqueue best-effort and non-blocking), a reject renders the serve error model (403/503/500/404) through the mount's renderer.

The public-upstream fetch is always anonymous (the client credential is never sent to the public upstream); the mirror job carries no credential. The serve path does not verify dist.integrity (see the module header → "Artifact path").

headTarball :: PackageName -> Version -> Filename -> Request -> (Response -> IO ResponseReceived) -> Handler ResponseReceived

Serve a HEAD /{pkg}/-/{file}.tgz artifact request end to end, over the request's RequestCtx.

A HEAD must never run the full-GET streaming pump: a bodiless HEAD would otherwise open the upstream artifact connection and pump a whole artifact body that the reply then discards — wasted upstream egress and a DoS-amplification lever (a client forcing arbitrary full-artifact fetches with cheap HEADs). So this handler gates the artifact through the identical pipeline as serveTarball — the same edge auth, host-allowlist, internal-range, and tarball-host policy, and the same upstream-request construction — but issues the upstream request as a HEAD and relays its status and safe response headers (relayArtifact) with no body (probeUpstreamWhen). On an admit no MirrorJob is enqueued: a HEAD serves no bytes, so there is nothing to back-fill (mirroring stays demand-driven on the GET path). A refusal renders the same serve error model with an empty body.