ecluse
Safe Haskell: None
Language: GHC2021

Ecluse.Registry.Npm.Filter

Description

The two pure transforms a single public-upstream npm packument needs before Écluse serves it: rewrite the embedded artifact URLs under the mount's prefix, and replay a FilterPlan's verdicts across every version.

Both transforms operate structurally over the raw aeson Value, never by re-serialising a typed model. This is load-bearing: the served packument is an open document — its schema is additionalProperties: true (see docs/architecture/api-surface.md → "The synthesized-packument schema") — so any field Écluse does not model (author keys, registry bookkeeping, per-version extras) must be relayed unchanged. Editing the Value in place removes denied versions and rewrites dist.tarball while leaving every unmodelled key untouched; rebuilding the body from Ecluse.Package would silently drop them.
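
The difference between the two approaches can be illustrated with a small sketch. The `Modelled` type, the `funding` key, and both helper functions are hypothetical stand-ins invented for this example (they are not the real Ecluse.Package model); the sketch assumes aeson 2.x and its `Data.Aeson.KeyMap` API.

```haskell
{-# LANGUAGE DeriveGeneric #-}
{-# LANGUAGE OverloadedStrings #-}

import Data.Aeson (FromJSON, Result (..), ToJSON, Value (..), fromJSON, object, toJSON, (.=))
import qualified Data.Aeson.Key as K
import qualified Data.Aeson.KeyMap as KM
import Data.Map.Strict (Map)
import qualified Data.Map.Strict as Map
import Data.Text (Text)
import GHC.Generics (Generic)

-- Hypothetical typed view: it models only `name` and `versions`.
data Modelled = Modelled
  { name :: Text
  , versions :: Map Text Value
  }
  deriving (Show, Generic)

instance FromJSON Modelled
instance ToJSON Modelled

-- Lossy: parse into the typed view, edit it, re-serialise.
-- Any key the view does not model is gone from the result.
dropViaModel :: Text -> Value -> Maybe Value
dropViaModel v doc = case fromJSON doc of
  Success m -> Just (toJSON (m {versions = Map.delete v (versions m)}))
  Error _ -> Nothing

-- Lossless: edit the raw Value, touching only versions.<v>.
dropInPlace :: Text -> Value -> Value
dropInPlace v (Object doc) = case KM.lookup "versions" doc of
  Just (Object vs) ->
    Object (KM.insert "versions" (Object (KM.delete (K.fromText v) vs)) doc)
  _ -> Object doc
dropInPlace _ other = other

-- A tiny packument-shaped document with one unmodelled key.
sample :: Value
sample =
  object
    [ "name" .= ("demo" :: Text)
    , "funding" .= ("https://example.invalid/fund" :: Text) -- unmodelled
    , "versions" .= object ["1.0.0" .= object [], "2.0.0" .= object []]
    ]
```

Applied to `sample`, `dropInPlace "2.0.0"` keeps the `funding` key, while `dropViaModel "2.0.0"` returns a body that carries only `name` and `versions`.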

The decision/replay split

Which versions survive, where dist-tags.latest resolves, and each version's denial Decision together form the ecosystem-agnostic filtering decision, taken over the typed PackageInfo by Ecluse.Package.Filter and handed here as a FilterPlan. This module owns only the npm wire-shape replay: restrict versions/time to the surviving keys, rebuild dist-tags, and rewrite tarball URLs over the raw upstream bytes. The npm wire knowledge lives here; the decision logic does not (it is reused by every ecosystem). See docs/architecture/registry-model.md → "Decision surface vs served surface".

URL rewriting

rewriteTarballUrls rewrites each version's dist.tarball to {mount-base}/{pkg}/-/{file}, so a client resolving metadata through the proxy also downloads the bytes through it rather than going straight to upstream and bypassing the gate (see docs/architecture/hosting.md → "The load-bearing requirement: URL rewriting"). Keeping artifacts same-host also keeps npm's auth flowing, which a separate artifact host would silently drop. The mount's externally-visible base URL is supplied by the caller; this transform performs no IO. It is idempotent, so a later assembly pass that rewrites the merged body again is a no-op on an already-rewritten URL.
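
As a rough illustration of the URL shape and of why a second pass is a no-op, the string-level step might look like the sketch below. `rewriteUrl` is a hypothetical helper, not the module's API; it assumes the filename is everything after the last `/` and ignores query strings and fragments. The host names are placeholders.

```haskell
{-# LANGUAGE OverloadedStrings #-}

import Data.Text (Text)
import qualified Data.Text as T

-- Build {base}/{pkg}/-/{file} from an upstream tarball URL.
-- Returns Nothing when the URL has no filename segment.
rewriteUrl :: Text -> Text -> Text -> Maybe Text
rewriteUrl base pkg url
  | T.null file = Nothing
  | otherwise =
      Just (T.intercalate "/" [T.dropWhileEnd (== '/') base, pkg, "-", file])
  where
    -- Assumption: the artifact filename is the last path segment.
    file = T.takeWhileEnd (/= '/') url

upstreamUrl :: Text
upstreamUrl = "https://registry.example/@scope/name/-/name-1.0.0.tgz"

-- First pass through a mount at https://proxy.example/npm/
once :: Maybe Text
once = rewriteUrl "https://proxy.example/npm/" "@scope/name" upstreamUrl

-- Second pass over the already-rewritten URL.
twice :: Maybe Text
twice = once >>= rewriteUrl "https://proxy.example/npm/" "@scope/name"
```

Because the rewritten URL ends in the same filename and the base and package name are unchanged, `twice` equals `once`.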

Replaying the filter plan

applyFilterPlan replays a FilterPlan onto the raw Value: a version not in the plan's survivors is removed from both versions and time, so a client's resolver only ever sees admitted versions (presence in the packument is availability — see docs/research/reverse-engineering/npm.md §8). dist-tags.latest is repointed at the plan's resolved latest, and any other tag whose target did not survive is dropped, never repointed. Finally tarball URLs are rewritten under the mount base. The result is coherent: dist-tags.latest is always a key of versions, and time has an entry for exactly the surviving versions.

When the plan has no survivors, the replay returns NoSurvivors carrying the plan's per-version denial Decisions; the serve layer maps that to a status, which this module deliberately does not choose. A body that is not even a JSON object is not a packument we can replay onto — it carries no versions to serve, so it yields NoSurvivors with no decisions.

URL rewriting

rewriteTarballUrls :: Text -> Value -> Value

Rewrite every version's dist.tarball to {base}/{pkg}/-/{file}, so the artifact is fetched back through this mount rather than directly from upstream.

base is the mount's externally-visible base URL (including any path prefix), supplied by the caller; a trailing slash on it is ignored. {pkg} is the packument's own name (the scoped @scope/name form npm uses in URLs), read from the document so the transform is self-contained. {file} is the upstream tarball URL's last path segment — the artifact filename — preserved verbatim so the bytes a client integrity-checks are unchanged.

Total and lossless: a version with no dist object, no tarball string, or a tarball with no filename segment is left untouched, as is a document with no usable name; every unmodelled key is relayed unchanged. Rewriting is idempotent — a second pass derives the same {pkg} and {file} and so produces the same URL.
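
The "total and lossless" traversal can be sketched as follows. `mapTarballs` and `overObject` are hypothetical names for this example, and the URL transformation itself is passed in as a function so the sketch shows only the structural walk. It assumes aeson 2.x.

```haskell
{-# LANGUAGE OverloadedStrings #-}

import Data.Aeson (Value (..), object, (.=))
import Data.Aeson.Key (Key)
import qualified Data.Aeson.KeyMap as KM
import Data.Text (Text)

-- Apply f to the object stored under key k; anything else is left untouched.
overObject :: Key -> (KM.KeyMap Value -> KM.KeyMap Value) -> KM.KeyMap Value -> KM.KeyMap Value
overObject k f m = case KM.lookup k m of
  Just (Object o) -> KM.insert k (Object (f o)) m
  _ -> m

-- Rewrite versions.*.dist.tarball with f; every other key is relayed as-is.
mapTarballs :: (Text -> Text) -> Value -> Value
mapTarballs f (Object doc) = Object (overObject "versions" (KM.map perVersion) doc)
  where
    perVersion (Object ver) = Object (overObject "dist" rewriteDist ver)
    perVersion other = other -- not an object: untouched
    rewriteDist dist = case KM.lookup "tarball" dist of
      Just (String url) -> KM.insert "tarball" (String (f url)) dist
      _ -> dist -- no tarball string: untouched
mapTarballs _ other = other -- not a JSON object: untouched

-- One version with a dist.tarball, one without a dist object at all.
sample :: Value
sample =
  object
    [ "name" .= ("demo" :: Text)
    , "versions"
        .= object
          [ "1.0.0"
              .= object
                [ "dist"
                    .= object
                      [ "tarball" .= ("https://registry.example/demo/-/demo-1.0.0.tgz" :: Text)
                      , "shasum" .= ("abc" :: Text)
                      ]
                ]
          , "2.0.0" .= object ["extra" .= True]
          ]
    ]
```

Every fall-through branch returns its input unchanged, which is what makes the walk total: a missing `dist`, a non-string `tarball`, or a non-object body never fails and never loses a key.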

The name is upstream-controlled (it is the packument's own field), so each of its structural components — the scope and base name either side of a @scope/ prefix — is gated through "Ecluse.Server.Route.isSafeComponent" before it is interpolated. A name carrying a traversal, an embedded separator, or a control character is rejected and the document is left untouched rather than emit a dist.tarball that aims a client outside the package's own path.
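
A gate of this kind might be sketched as below. The real Ecluse.Server.Route.isSafeComponent is not shown in this document, so `safeComponent` and `safeName` are hypothetical approximations, and the exact rule set (non-empty, not `.` or `..`, no separators, no control characters) is an assumption.

```haskell
{-# LANGUAGE OverloadedStrings #-}

import Data.Char (isControl)
import Data.Text (Text)
import qualified Data.Text as T

-- Assumed rules for one path component: non-empty, not a traversal
-- segment, and free of separators and control characters.
safeComponent :: Text -> Bool
safeComponent c =
  not (T.null c)
    && c `notElem` [".", ".."]
    && not (T.any unsafeChar c)
  where
    unsafeChar ch = ch == '/' || ch == '\\' || isControl ch

-- Check a package name: either a bare name, or @scope/name with
-- both halves checked separately.
safeName :: Text -> Bool
safeName n = case T.stripPrefix "@" n of
  Just scoped ->
    let (scope, rest) = T.breakOn "/" scoped
     in safeComponent scope
          && maybe False safeComponent (T.stripPrefix "/" rest)
  Nothing -> safeComponent n
```

Under these assumed rules, `safeName "@scope/name"` and `safeName "left-pad"` hold, while `safeName "@scope/../x"` and `safeName "../etc"` do not, so a caller would leave such a document untouched.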

Filtering

applyFilterPlan :: Text -> FilterPlan -> Value -> FilterResult

Replay a FilterPlan onto the raw packument Value, removing every non-surviving version, repairing cross-field coherence, and rewriting tarball URLs under base (the mount's externally-visible base URL).

The plan was decided over the projected PackageInfo (the typed view of the same document), but the edits land on the raw Value, so unmodelled fields survive (see the module header). A version key is kept iff it is in the plan's survivors.

When survivors remain the body is returned Filtered with:

  • versions and time restricted to the surviving version keys (time is pruned by removal of the denied keys, so its unmodelled created/modified bookkeeping is relayed);
  • dist-tags.latest pointed at the plan's resolved latest (fpLatest) — the kept upstream latest, or its downward repoint when the upstream latest was denied;
  • every other dist-tags entry whose target did not survive dropped (never repointed — repointing beta at a stable release would misrepresent it);
  • every surviving version's dist.tarball rewritten under base. The rewrite is idempotent, so a later cross-upstream assembly pass that rewrites the merged body again leaves these URLs unchanged.

When the plan has no survivors, NoSurvivors carries its per-version decisions. A non-object body is not a packument we can replay onto; with no versions it has no survivors and no decisions to report.
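
The coherence rules above can be sketched in a simplified form. `Plan`, `replay`, and `overObject` are hypothetical stand-ins: the real FilterPlan and FilterResult carry Decisions, which are omitted here, so `Nothing` plays the role of NoSurvivors. Tarball rewriting is also left out to keep the sketch focused on versions, time, and dist-tags. It assumes aeson 2.x.

```haskell
{-# LANGUAGE OverloadedStrings #-}

import Data.Aeson (Value (..), object, (.=))
import Data.Aeson.Key (Key)
import qualified Data.Aeson.Key as K
import qualified Data.Aeson.KeyMap as KM
import Data.Set (Set)
import qualified Data.Set as Set
import Data.Text (Text)

-- Hypothetical stand-in for FilterPlan: survivors plus the resolved latest.
data Plan = Plan
  { planSurvivors :: Set Text
  , planLatest :: Text
  }

-- Apply f to the object under key k; anything else is left untouched.
overObject :: Key -> (KM.KeyMap Value -> KM.KeyMap Value) -> KM.KeyMap Value -> KM.KeyMap Value
overObject k f m = case KM.lookup k m of
  Just (Object o) -> KM.insert k (Object (f o)) m
  _ -> m

-- Nothing stands in for NoSurvivors (decisions omitted in this sketch).
replay :: Plan -> Value -> Maybe Value
replay plan (Object doc)
  | Set.null (planSurvivors plan) = Nothing
  | otherwise =
      Just . Object
        . overObject "dist-tags" fixTags
        . overObject "time" (KM.filterWithKey (\k _ -> k `notElem` denied))
        . overObject "versions" (KM.filterWithKey (\k _ -> kept k))
        $ doc
  where
    kept k = K.toText k `Set.member` planSurvivors plan
    -- Version keys present upstream but absent from the survivors.
    -- Removing only these from `time` relays created/modified.
    denied = case KM.lookup "versions" doc of
      Just (Object vs) -> filter (not . kept) (KM.keys vs)
      _ -> []
    -- Drop tags whose target was denied, then pin latest to the plan's choice.
    fixTags = KM.insert "latest" (String (planLatest plan)) . KM.filter targetKept
    targetKept (String v) = v `Set.member` planSurvivors plan
    targetKept _ = True -- non-string tag values are relayed as-is here
replay _ _ = Nothing -- not a JSON object: nothing to replay onto

-- Upstream has two versions; latest and beta both point at 2.0.0.
sample :: Value
sample =
  object
    [ "name" .= ("demo" :: Text)
    , "versions" .= object ["1.0.0" .= object [], "2.0.0" .= object []]
    , "time"
        .= object
          [ "created" .= ("t0" :: Text)
          , "1.0.0" .= ("t1" :: Text)
          , "2.0.0" .= ("t2" :: Text)
          ]
    , "dist-tags" .= object ["latest" .= ("2.0.0" :: Text), "beta" .= ("2.0.0" :: Text)]
    ]
```

With a plan whose only survivor is `1.0.0` and whose latest is `1.0.0`, the result keeps `time.created`, drops `2.0.0` from both `versions` and `time`, drops `beta`, and sets `dist-tags.latest` to `1.0.0`.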

data FilterResult

The outcome of replaying a FilterPlan onto a packument.

A Filtered body still has at least one admitted version and is internally coherent. NoSurvivors means every version was rejected; it carries each version's Decision so the serve layer can render the denial and choose the status (403 for an all-policy denial, 503 for a transient or undecidable cause). Choosing that status is not this module's job.

Constructors

Filtered Value

At least one version survived; the coherent, filtered packument body.

NoSurvivors [Decision]

No version survived; each rejected version's decision, for the serve layer to map to a status and a denial body.
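
For orientation only, a serve-layer mapping of the kind described above might look like this sketch. It lives outside this module, and the `Cause` type is a hypothetical simplification of Decision, whose real shape is not shown here. The empty-list case (a non-object body) returns `Nothing` because this document does not say which status the serve layer picks for it.

```haskell
-- Hypothetical simplification of a per-version denial Decision.
data Cause
  = PolicyDenied -- rejected by policy
  | Transient -- a temporary failure while deciding
  | Undecidable -- the decision could not be made
  deriving (Eq, Show)

-- Map the decisions carried by NoSurvivors to an HTTP status:
-- 403 when every denial is a policy denial, 503 otherwise.
statusFor :: [Cause] -> Maybe Int
statusFor [] = Nothing -- unspecified here; left to the serve layer
statusFor causes
  | all (== PolicyDenied) causes = Just 403
  | otherwise = Just 503
```

Keeping this choice in the serve layer means the replay stays a pure function of the plan and the body, with no HTTP knowledge.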

Instances

Show FilterResult

Defined in Ecluse.Registry.Npm.Filter

Eq FilterResult

Defined in Ecluse.Registry.Npm.Filter