| Safe Haskell | None |
|---|---|
| Language | GHC2021 |
Ecluse.Registry.Npm.Filter
Description
The two pure transforms a single public-upstream npm packument needs
before Écluse serves it: rewrite the embedded artifact URLs under the mount's
prefix, and replay a FilterPlan's verdicts across every version.
Both transforms operate structurally over the raw aeson Value, never by
re-serialising a typed model. This is load-bearing: the served packument is an
open document — its schema is additionalProperties: true (see
docs/architecture/api-surface.md → "The synthesized-packument schema") — so
any field Écluse does not model (author keys, registry bookkeeping, per-version
extras) must be relayed unchanged. Editing the Value in place removes denied
versions and rewrites dist.tarball while leaving every unmodelled key untouched;
rebuilding the body from Ecluse.Package would silently drop them.
The decision/replay split
Which versions survive, where dist-tags.latest resolves, and each version's
denial Decision together form the ecosystem-agnostic filtering decision, taken over the
typed PackageInfo by Ecluse.Package.Filter and handed here as a
FilterPlan. This module owns only the npm wire-shape
replay: restrict versions/time to the surviving keys, rebuild dist-tags,
and rewrite tarball URLs over the raw upstream bytes. The npm wire knowledge lives
here; the decision logic does not (it is reused by every ecosystem). See
docs/architecture/registry-model.md → "Decision surface vs served surface".
URL rewriting
rewriteTarballUrls rewrites each version's dist.tarball to
{mount-base}/{pkg}/-/{file}, so a client resolving metadata through the
proxy also downloads the bytes through it rather than going straight to upstream
and bypassing the gate (see docs/architecture/hosting.md → "The load-bearing
requirement: URL rewriting"). Keeping artifacts same-host also keeps npm's auth
flowing, which a separate artifact host would silently drop. The mount's
externally-visible base URL is supplied by the caller; this
transform performs no IO. It is idempotent, so a later assembly pass that
rewrites the merged body again is a no-op on an already-rewritten URL.
Replaying the filter plan
applyFilterPlan replays a FilterPlan onto the raw Value: a version not in the
plan's survivors is removed from both versions and time, so a client's resolver
only ever sees admitted versions (presence in the packument is availability — see
docs/research/reverse-engineering/npm.md §8). dist-tags.latest is repointed
at the plan's resolved latest, and any other tag whose target did not survive is
dropped, never repointed. Finally tarball URLs are rewritten under the mount
base. The result is coherent: dist-tags.latest is always a key of versions, and
time has an entry for exactly the surviving versions.
When the plan has no survivors, the replay returns NoSurvivors carrying the
plan's per-version denial Decisions; the serve layer maps that to a status, which
this module deliberately does not choose. A body that is not even a JSON object is
not a packument we can replay onto — it carries no versions to serve, so it yields
NoSurvivors with no decisions.
Synopsis
- rewriteTarballUrls :: Text -> Value -> Value
- applyFilterPlan :: Text -> FilterPlan -> Value -> FilterResult
- data FilterResult
- = Filtered Value
- | NoSurvivors [Decision]
URL rewriting
rewriteTarballUrls :: Text -> Value -> Value
Rewrite every version's dist.tarball to {base}/{pkg}/-/{file}, so the
artifact is fetched back through this mount rather than directly from upstream.
base is the mount's externally-visible base URL (including any path prefix),
supplied by the caller; a trailing slash on it is ignored. {pkg} is the
packument's own name (the scoped @scope/name form npm uses in URLs), read
from the document so the transform is self-contained. {file} is the upstream
tarball URL's last path segment — the artifact filename — preserved verbatim so
the bytes a client integrity-checks are unchanged.
Total and lossless: a version with no dist object, no tarball string, or a
tarball with no filename segment is left untouched, as is a document with no
usable name; every unmodelled key is relayed unchanged. Rewriting is
idempotent — a second pass derives the same {pkg} and {file} and so
produces the same URL.
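The URL construction described above can be sketched on plain strings. This is a minimal illustration, not the module's implementation: `lastSegment` and `rewriteTarball` are hypothetical names, the real transform works on the aeson Value and Text, and the sketch assumes the upstream URL carries no query string.

```haskell
import Data.List (dropWhileEnd)

-- | Last path segment of a URL: everything after the final '/'.
-- Nothing when that segment is empty (for example, the URL ends in '/').
lastSegment :: String -> Maybe String
lastSegment url =
  case reverse (takeWhile (/= '/') (reverse url)) of
    ""   -> Nothing
    file -> Just file

-- | Build {base}/{pkg}/-/{file}. Trailing slashes on the base are ignored.
-- Nothing means "no filename segment, leave the original URL untouched".
rewriteTarball :: String -> String -> String -> Maybe String
rewriteTarball base pkg upstream = do
  file <- lastSegment upstream
  pure (dropWhileEnd (== '/') base ++ "/" ++ pkg ++ "/-/" ++ file)
```

Because the output URL ends in the same filename segment, feeding it back in with the same base and package name yields the same URL, which is the idempotence property in miniature.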
The name is upstream-controlled (it is the packument's own field), so each
of its structural components — the scope and base name either side of a @scope/
prefix — is gated through "Ecluse.Server.Route.isSafeComponent" before it is
interpolated. A name carrying a traversal, an embedded separator, or a control
character is rejected and the document is left untouched rather than emitting a
dist.tarball that aims a client outside the package's own path.
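The gating step can be pictured with the sketch below. It is an assumption-laden stand-in: `safeComponent` is a hypothetical predicate written for this example and may be stricter or looser than the real isSafeComponent, whose exact rules are not shown here.

```haskell
import Data.Char (isControl)

-- | Hypothetical component check (NOT the real isSafeComponent):
-- rejects empty components, "." and "..", separators, and control characters.
safeComponent :: String -> Bool
safeComponent c =
  not (null c)
    && c `notElem` [".", ".."]
    && not (any (\ch -> ch == '/' || ch == '\\' || isControl ch) c)

-- | Split a package name into its structural components:
-- "@scope/name" gives [scope, name]; an unscoped "name" gives [name].
-- A leading '@' with no '/' separator is not a valid scoped name.
nameComponents :: String -> Maybe [String]
nameComponents ('@' : rest) =
  case break (== '/') rest of
    (scope, '/' : base) -> Just [scope, base]
    _                   -> Nothing
nameComponents name = Just [name]

-- | A name is usable only if every component passes the check.
safeName :: String -> Bool
safeName name = maybe False (all safeComponent) (nameComponents name)
```

Checking each component separately, rather than the whole name, is what lets the single `/` of a scoped name through while still rejecting any further separator hidden in the base name.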
Filtering
applyFilterPlan :: Text -> FilterPlan -> Value -> FilterResult
Replay a FilterPlan onto the raw packument Value, removing every
non-surviving version, repairing cross-field coherence, and rewriting tarball URLs
under base (the mount's externally-visible base URL).
The plan was decided over the projected PackageInfo (the typed
view of the same document), but the edits land on the raw Value, so unmodelled
fields survive (see the module header). A version key is kept iff it is in the
plan's survivors.
When survivors remain the body is returned Filtered with:
- versions and time restricted to the surviving version keys (time is pruned by removal of the denied keys, so its unmodelled created/modified bookkeeping is relayed);
- dist-tags.latest pointed at the plan's resolved latest (fpLatest): the kept upstream latest, or its downward repoint when the upstream latest was denied;
- every other dist-tags entry whose target did not survive dropped (never repointed, since repointing beta at a stable release would misrepresent it);
- every surviving version's dist.tarball rewritten under base. The rewrite is idempotent, so a later cross-upstream assembly pass that rewrites the merged body again leaves these URLs unchanged.
When the plan has no survivors, NoSurvivors carries its per-version decisions. A
non-object body is not a packument we can replay onto; with no versions it has no
survivors and no decisions to report.
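The coherence rules can be illustrated on a deliberately simplified model. Everything in this sketch is hypothetical: `Packument` and `Plan` are toy records over `Data.Map` rather than the raw aeson Value and the real FilterPlan, the no-survivor case is reduced to `Nothing` instead of NoSurvivors with its decisions, and the plan's latest is assumed to be one of the survivors.

```haskell
import qualified Data.Map.Strict as Map
import qualified Data.Set as Set

-- | Toy packument: version -> body, time key -> timestamp, tag -> version.
data Packument = Packument
  { versions :: Map.Map String String
  , time     :: Map.Map String String
  , distTags :: Map.Map String String
  } deriving (Eq, Show)

-- | Toy plan: the surviving version keys and the resolved latest.
data Plan = Plan
  { survivors :: Set.Set String
  , latest    :: String
  }

-- | Replay the plan; Nothing stands in for the no-survivor outcome.
replay :: Plan -> Packument -> Maybe Packument
replay plan p
  | Map.null kept = Nothing
  | otherwise     = Just (p { versions = kept, time = keptTime, distTags = tags })
  where
    kept     = Map.filterWithKey (\v _ -> v `Set.member` survivors plan) (versions p)
    -- Prune time by removing denied keys only, so created/modified are relayed.
    denied   = Map.keysSet (versions p) `Set.difference` Map.keysSet kept
    keptTime = Map.withoutKeys (time p) denied
    -- Drop other tags whose target was removed; set latest from the plan.
    others   = Map.filter (`Map.member` kept) (Map.delete "latest" (distTags p))
    tags     = Map.insert "latest" (latest plan) others
```

Pruning `time` by subtraction rather than by intersection with the survivors is the detail that keeps the `created` and `modified` entries, which are not version keys, in the output.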
data FilterResult
The outcome of replaying a FilterPlan onto a packument.
A Filtered body still has at least one admitted version and is internally
coherent. NoSurvivors means every version was rejected; it carries each
version's Decision so the serve layer can render the denial and choose the
status (403 for an all-policy denial, 503 for a transient or undecidable cause).
Choosing that status is not this module's job.
Constructors
| Filtered Value | At least one version survived; the coherent, filtered packument body. |
| NoSurvivors [Decision] | No version survived; each rejected version's decision, for the serve layer to map to a status and a denial body. |
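How a serve layer might consume the NoSurvivors payload is sketched below. This is illustrative only: `Denial` is a hypothetical two-case stand-in for the real Decision type, `statusFor` is not part of this module, and treating an empty decision list as 503 is an assumption of the sketch, not something the documentation above specifies.

```haskell
-- | Hypothetical stand-in for a per-version denial Decision.
data Denial
  = PolicyDenied   -- rejected by policy
  | Undecidable    -- transient or undecidable cause
  deriving (Eq, Show)

-- | Map the per-version denials to an HTTP status:
-- 403 only when every denial is a policy denial, 503 otherwise.
-- ASSUMPTION: an empty list (for example, a non-object body) is treated as 503.
statusFor :: [Denial] -> Int
statusFor ds
  | not (null ds) && all (== PolicyDenied) ds = 403
  | otherwise                                 = 503
```

Keeping this mapping outside the filter module means the replay stays a pure description of what happened, and the HTTP-facing policy can change without touching the npm wire logic.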
Instances
| Show FilterResult | Defined in Ecluse.Registry.Npm.Filter. Methods: showsPrec :: Int -> FilterResult -> ShowS; show :: FilterResult -> String; showList :: [FilterResult] -> ShowS |
| Eq FilterResult | Defined in Ecluse.Registry.Npm.Filter |