| Safe Haskell | None |
|---|---|
| Language | GHC2021 |
Ecluse.Registry.Npm
Description
The npm data plane: the effectful Ecluse.Registry fields over
http-client.
This module is the network half of the npm protocol boundary. Where
Ecluse.Registry.Npm.Wire and Ecluse.Registry.Npm.Project are the pure decode
and projection, this is the side-effecting fetch and publish: newNpmClient
assembles an Ecluse.Registry.RegistryClient whose effectful fields talk to a
registry over plain HTTP, and whose parse* fields are the pure projection
re-exported through the handle.
It speaks the npm registry protocol directly with http-client, never
amazonka: the control plane (the GetAuthorizationToken mint, the mirror
queue) is amazonka's job behind separate handles, but the data plane — fetch
metadata, stream a tarball, publish — is ordinary HTTPS+JSON, identical across
every npm-speaking backend (npmjs.org, CodeArtifact, Artifact Registry, a
self-hosted Verdaccio). Keeping the streaming path off amazonka's
conduit/ResourceT machinery is exactly what makes bounded-memory artifact
proxying tractable (see docs/architecture/web-layer.md → "Control plane vs.
data plane").
Request shaping
Three details of the wire protocol are load-bearing and handled here (see
docs/research/reverse-engineering/npm.md → "Transport & conventions"):
- Content negotiation. Metadata comes in two forms selected by Accept: the abbreviated install view (application/vnd.npm.install-v1+json), which the proxy treats as primary, and the full packument (application/json), needed when a rule reasons over publish age (the abbreviated form drops the time map). MetadataForm selects between them; both request Accept-Encoding: gzip, since popular packuments can run to megabytes.
- Scoped-name path encoding. A scoped name @scope/name is encoded on the wire as @scope%2Fname: the scope separator is percent-encoded but the leading @ is not. metadataRequest builds this from an already-parsed PackageName, never from raw client path segments.
- Idempotent publish. A PUT /{pkg} that re-publishes an existing version returns 409 Conflict; because versions are immutable, that conflict is success-equivalent for a redelivered mirror job (the artifact is already present), so publishArtifact treats it as Right, not an error.
Streaming and buffering
artifactRequest marks its request non-decompressing (decompress returns
False): a tarball is opaque binary that must reach the client byte-for-byte, so
the .tgz is never gunzipped in flight (and its dist.integrity stays valid).
The request is exposed so the web layer (see
docs/architecture/web-layer.md → "Streaming and resource lifetime") can
bracket it with withResponse/responseStream and relay the open body
without buffering the whole artifact in memory. The handle's
fetchArtifact field, by contrast, buffers (its RegistryResponse
return is whole bytes) and is for the mirror worker, which must read the entire
artifact to verify its integrity before publishing.
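The split between the streamed and the buffered path can be modelled without any HTTP. In this sketch, a chunk source follows the convention of http-client's `brRead`: an empty chunk marks the end of the body. Strings stand in for ByteStrings, and `relayChunks` and `listSource` are hypothetical names, not this module's API. Each chunk goes to the sink as soon as it is read, so only one chunk is held at a time:

```haskell
import Data.IORef (modifyIORef', newIORef, readIORef, writeIORef)

-- Pump chunks from a source to a sink until the source signals end of body
-- (an empty chunk). Only the current chunk is held; the byte total is
-- returned so a caller could log it.
relayChunks :: IO String -> (String -> IO ()) -> IO Int
relayChunks next sink = go 0
  where
    go total = do
      chunk <- next
      if null chunk
        then pure total
        else do
          sink chunk
          let total' = total + length chunk
          total' `seq` go total'

-- A test double for an open response body: yields the chunks, then "".
listSource :: [String] -> IO (IO String)
listSource chunks = do
  ref <- newIORef chunks
  pure $ do
    remaining <- readIORef ref
    case remaining of
      [] -> pure ""
      (c : cs) -> writeIORef ref cs >> pure c
```

In the web layer, the sink would be the streaming response's write action. The buffered fetchArtifact path makes the opposite choice: it keeps every chunk so the whole artifact can be checked against its integrity.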
Authentication
The client accepts an injected bearer token and attaches it to every
request; it never originates credential policy. Which token to send on which request is
the request pipeline's authority model, decided upstream of this module: always
strip the client's token before any public fetch, and use the minted mirror
token only to write. Whether the client's own token is forwarded to the private
upstream (the default passthrough) or Écluse's own read token is used instead is
the mount's access strategy (see docs/architecture/access-model.md). A client
with no token sends none.
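The token rules above can be written as a small total function. This is a hypothetical model of the upstream decision; the names `Target`, `AccessStrategy`, `Tokens` and `chooseToken` are illustrative. It is paired with the only part that belongs to this module: attaching whatever token it was handed, or nothing.

```haskell
-- Hypothetical model of the upstream token decision; not this module's code.
data Target = PublicRead | PrivateRead | MirrorWrite
  deriving (Show, Eq)

data AccessStrategy = Passthrough | OwnReadToken
  deriving (Show, Eq)

data Tokens = Tokens
  { clientToken :: Maybe String -- what the client sent, if anything
  , readToken :: Maybe String   -- Ecluse's own read token (hypothetical)
  , mirrorToken :: Maybe String -- the minted write token
  }

chooseToken :: AccessStrategy -> Target -> Tokens -> Maybe String
chooseToken _ PublicRead _ = Nothing                  -- always strip before a public fetch
chooseToken _ MirrorWrite t = mirrorToken t           -- the minted token is used only to write
chooseToken Passthrough PrivateRead t = clientToken t -- default: forward the client's own token
chooseToken OwnReadToken PrivateRead t = readToken t  -- or substitute Ecluse's read token

-- This module's part: attach the token it was given, or send none.
authHeaders :: Maybe String -> [(String, String)]
authHeaders = maybe [] (\tok -> [("Authorization", "Bearer " ++ tok)])
```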
Synopsis
- data NpmClientConfig = NpmClientConfig { npmBaseUrl :: Text, npmManager :: Manager, npmToken :: Maybe Secret, npmLimits :: Limits }
- defaultNpmConfig :: Manager -> NpmClientConfig
- publicRegistryBaseUrl :: Text
- newNpmClient :: NpmClientConfig -> IO RegistryClient
- data MetadataForm = Abbreviated | Full
- metadataAccept :: MetadataForm -> ByteString
- data Validators = Validators {}
- noValidators :: Validators
- metadataRequest :: NpmClientConfig -> MetadataForm -> Validators -> PackageName -> Either UrlFormationError Request
- artifactRequest :: NpmClientConfig -> PackageName -> Version -> Either UrlFormationError Request
- artifactRequestByFile :: NpmClientConfig -> PackageName -> Text -> Either UrlFormationError Request
- artifactRequestByUrl :: NpmClientConfig -> Text -> Either UrlFormationError Request
- artifactFileUrl :: Text -> PackageName -> Text -> Either UrlFormationError Text
- publishRequest :: NpmClientConfig -> PackageName -> ByteString -> Either UrlFormationError Request
- npmPublishDocument :: PackageName -> Version -> Text -> Maybe Text -> Maybe Text -> ByteString -> ByteString
- fetchMetadataForm :: NpmClientConfig -> MetadataForm -> Validators -> PackageName -> IO RegistryResponse
- newtype ResponseBoundExceeded = ResponseBoundExceeded LimitError
Construction
data NpmClientConfig
Everything newNpmClient needs to talk to one npm-speaking registry: the
base URL, the shared HTTP Manager, an optional injected bearer token, and the
response bounds (npmLimits).
The Manager is shared (it owns the connection pool), so it is taken rather than
built here: the composition root reuses the same one across requests. The token
is whatever the request pipeline decided this client should present; this module
never chooses it (see the module header → Authentication).
Constructors
| NpmClientConfig | |
Fields
| npmBaseUrl :: Text | |
| npmManager :: Manager | |
| npmToken :: Maybe Secret | |
| npmLimits :: Limits | |
defaultNpmConfig :: Manager -> NpmClientConfig
An anonymous client config against the public registry (publicRegistryBaseUrl),
using the given shared Manager and the secure-default response bounds
(defaultLimits). Override npmBaseUrl/npmToken/npmLimits for
a managed backend or a per-deployment budget.
publicRegistryBaseUrl :: Text
The canonical public npm registry base URL, https://registry.npmjs.org.
The default target when no managed backend is configured.
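The following is a dependency-free sketch of the defaults-plus-override pattern, using a cut-down record. The Manager is omitted and Limits is reduced to one body-size cap. The cap value and the managed-backend URL are placeholders, not the module's real values, and `Config`, `defaultConfig` and `managedConfig` are hypothetical names:

```haskell
-- Hypothetical stand-in for NpmClientConfig (no Manager; one limit).
data Config = Config
  { baseUrl :: String
  , token :: Maybe String
  , maxBodyBytes :: Int
  } deriving (Show, Eq)

publicRegistry :: String
publicRegistry = "https://registry.npmjs.org"

-- Anonymous, public-registry defaults. The cap is an illustrative number,
-- not the real defaultLimits value.
defaultConfig :: Config
defaultConfig = Config
  { baseUrl = publicRegistry
  , token = Nothing
  , maxBodyBytes = 64 * 1024 * 1024
  }

-- A managed backend is a record update over the defaults; the URL is a
-- placeholder. Fields not mentioned (here the limit) keep their defaults.
managedConfig :: String -> Config
managedConfig tok = defaultConfig
  { baseUrl = "https://npm.example.internal"
  , token = Just tok
  }
```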
newNpmClient :: NpmClientConfig -> IO RegistryClient
Assemble an Ecluse.Registry.RegistryClient for the npm protocol over the given configuration.
The effectful fields close over the config's Manager and token and speak npm
over HTTP; the parse* fields are the pure projection from
Ecluse.Registry.Npm.Project, re-exported through the handle. The handle's
fetchMetadata requests the Abbreviated form
unconditionally; the richer fetchMetadataForm (for the full packument and
relayed validators) is exposed separately for the request pipeline.
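The following sketch shows the handle shape using a two-field record; the real RegistryClient has more fields. `ClientHandle`, `newClientHandle` and the injected transport are hypothetical. The effectful field closes over what it was built with, and the pure field is just a function:

```haskell
data Form = Abbreviated | Full
  deriving (Show, Eq)

-- A cut-down handle: one effectful field, one pure field.
data ClientHandle = ClientHandle
  { fetchMetadata :: String -> IO String -- effectful: talks to a registry
  , parseVersions :: String -> [String]  -- pure: projection only
  }

-- The transport stands in for http-client over the config's Manager and
-- token; passing it in keeps the sketch runnable without a network.
newClientHandle :: (Form -> String -> IO String) -> ClientHandle
newClientHandle transport = ClientHandle
  { fetchMetadata = transport Abbreviated -- the handle always asks for the abbreviated form
  , parseVersions = words                 -- toy projection: whitespace-separated versions
  }
```

A caller needing the full packument would call the transport with `Full` directly, which mirrors fetchMetadataForm sitting beside the handle.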
Content negotiation
data MetadataForm
Which of npm's two metadata documents to request, selected by the Accept
header (see metadataAccept).
Constructors
| Abbreviated | The install-optimised abbreviated packument (application/vnd.npm.install-v1+json). |
| Full | The full packument (application/json). |
Instances
| Show MetadataForm | Defined in Ecluse.Registry.Npm. Methods: showsPrec :: Int -> MetadataForm -> ShowS; show :: MetadataForm -> String; showList :: [MetadataForm] -> ShowS |
| Eq MetadataForm | Defined in Ecluse.Registry.Npm |
metadataAccept :: MetadataForm -> ByteString
The Accept header value selecting a MetadataForm.
>>> metadataAccept Abbreviated
"application/vnd.npm.install-v1+json"
>>> metadataAccept Full
"application/json"
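The headers a metadata request carries can be sketched as a pure function. `acceptFor` mirrors metadataAccept, with Strings in place of ByteStrings, and `metadataHeaders` is a hypothetical name:

```haskell
data MetadataForm = Abbreviated | Full
  deriving (Show, Eq)

-- Same mapping as metadataAccept.
acceptFor :: MetadataForm -> String
acceptFor Abbreviated = "application/vnd.npm.install-v1+json"
acceptFor Full = "application/json"

-- Both forms also request gzip, since large packuments compress well.
metadataHeaders :: MetadataForm -> [(String, String)]
metadataHeaders form =
  [ ("Accept", acceptFor form)
  , ("Accept-Encoding", "gzip")
  ]
```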
Conditional-GET validators
data Validators
The conditional-GET validators to relay on a metadata fetch. Replaying an
upstream's ETag as If-None-Match (or its Last-Modified as
If-Modified-Since) lets the upstream answer 304 Not Modified with no body —
the cheap freshness check the proxy uses on a cache revalidation. Both are
forwarded only when present.
Constructors
| Validators | |
Fields
| |
Instances
| Show Validators | Defined in Ecluse.Registry.Npm. Methods: showsPrec :: Int -> Validators -> ShowS; show :: Validators -> String; showList :: [Validators] -> ShowS |
| Eq Validators | Defined in Ecluse.Registry.Npm |
noValidators :: Validators
No conditional-GET validators — an unconditional fetch.
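The following sketches validator relaying. The field names `etag` and `lastModified` are hypothetical, since the real fields are not listed here. Each validator becomes its conditional header only when present, and a 304 is read as "keep the cached copy". `conditionalHeaders`, `Freshness` and `freshness` are also illustrative names:

```haskell
-- Hypothetical field names; the real Validators fields are not shown here.
data Validators = Validators
  { etag :: Maybe String         -- the upstream's ETag, if it sent one
  , lastModified :: Maybe String -- the upstream's Last-Modified, if any
  } deriving (Show, Eq)

noValidators :: Validators
noValidators = Validators Nothing Nothing

-- Each validator is forwarded only when present.
conditionalHeaders :: Validators -> [(String, String)]
conditionalHeaders v =
  [("If-None-Match", e) | Just e <- [etag v]]
    ++ [("If-Modified-Since", m) | Just m <- [lastModified v]]

-- A 304 carries no body: the cached copy is still fresh.
data Freshness = NotModified | Fresh String
  deriving (Show, Eq)

freshness :: Int -> String -> Freshness
freshness 304 _ = NotModified
freshness _ body = Fresh body
```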
Request building
metadataRequest :: NpmClientConfig -> MetadataForm -> Validators -> PackageName -> Either UrlFormationError Request
Build the metadata GET request for a package: the URL is
{baseUrl}/{encoded-name} with the Accept header for the chosen
MetadataForm, Accept-Encoding: gzip, an optional bearer token, and any
relayed conditional-GET Validators.
The package path is derived from an already-parsed PackageName, then the
scope separator is percent-encoded (@scope/name → @scope%2Fname). Fails
with a UrlFormationError only when the URL cannot be formed (an empty base URL).
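The following sketches URL formation and its single failure case, using String inputs. Trimming trailing slashes on the base URL is an assumption, not something stated above, and `metadataUrl`, `encodeName` and `UrlError` are hypothetical:

```haskell
data UrlError = EmptyBaseUrl
  deriving (Show, Eq)

-- Percent-encode only the scope separator of a scoped name.
encodeName :: String -> String
encodeName ('@' : rest) = '@' : concatMap (\c -> if c == '/' then "%2F" else [c]) rest
encodeName name = name

-- {baseUrl}/{encoded-name}; the only failure is an unusable (empty) base.
metadataUrl :: String -> String -> Either UrlError String
metadataUrl base name
  | null trimmed = Left EmptyBaseUrl
  | otherwise = Right (trimmed ++ "/" ++ encodeName name)
  where
    trimmed = reverse (dropWhile (== '/') (reverse base)) -- assumption: trim trailing '/'
```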
artifactRequest :: NpmClientConfig -> PackageName -> Version -> Either UrlFormationError Request
Build the artifact GET request for one version's tarball.
The request is marked non-decompressing (decompress returns False) so the
.tgz bytes are streamed through verbatim — a tarball is opaque binary and must
reach the client byte-for-byte for its dist.integrity to verify. The artifact
URL is the registry-served tarball location, derived like metadataRequest but
addressing the version's artifact path. Exposed so the web layer can bracket it
for bounded-memory streaming (see the module header).
Fails with a UrlFormationError only when the URL cannot be formed.
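A reconstruction from (package, version) depends on npm's conventional tarball naming. The following sketches that convention, assuming the standard name-version.tgz layout; the helper names are hypothetical. The file name uses the unscoped part of a scoped name, so @babel/core at 7.24.0 maps to core-7.24.0.tgz:

```haskell
-- Strip the scope: "@scope/pkg" -> "pkg"; unscoped names are unchanged.
unscopedName :: String -> String
unscopedName name = case break (== '/') name of
  ('@' : _, '/' : bare) -> bare
  _ -> name

tarballFileName :: String -> String -> String
tarballFileName name version = unscopedName name ++ "-" ++ version ++ ".tgz"

-- /{encoded-pkg}/-/{file}: the package segment is scope-percent-encoded,
-- as for metadata requests.
artifactPath :: String -> String -> String
artifactPath name version =
  "/" ++ encodeName name ++ "/-/" ++ tarballFileName name version
  where
    encodeName n = case break (== '/') n of
      ('@' : scope, '/' : bare) -> '@' : scope ++ "%2F" ++ bare
      _ -> n
```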
artifactRequestByFile :: NpmClientConfig -> PackageName -> Text -> Either UrlFormationError Request
Build the artifact GET request addressing a tarball by its preserved
on-the-wire filename, at {baseUrl}/{encoded-pkg}/-/{filename}.
The serve path fetches an artifact by the exact filename the client requested —
the authoritative name for the bytes — rather than reconstructing it from
(package, version) as artifactRequest does, so a registry whose tarball naming
differs from the proxy's own convention still resolves. The filename is taken
verbatim (the classifier has already passed it through the component-safety gate),
and the package segment is the same scope-percent-encoded path artifactRequest
uses. The request is marked non-decompressing for the same reason: a .tgz is
opaque binary streamed byte-for-byte so its dist.integrity verifies. Exposed so
the web layer can bracket it for bounded-memory streaming (see the module header).
Fails with a UrlFormationError only when the URL cannot be formed.
artifactRequestByUrl :: NpmClientConfig -> Text -> Either UrlFormationError Request
Build the artifact GET request addressing a tarball at its authoritative
upstream location (the absolute URL the projection preserved from the
upstream's dist.tarball) rather than reconstructing it from (base, package,
file).
The artifact location is server-chosen data, not a derivable fact: a registry may
serve a version's tarball from a different host or a path the npm /-/ convention
cannot rebuild (a separate CDN/files host, server-generated segments, a signed
query string). Honouring the preserved location is what lets Écluse front those
registries; the URL it fetches is the same one the served packument's
dist.integrity is paired with, so the bytes still verify. The egress gate
(tarballHostAllowed plus the resolved-IP recheck) decides
whether that location may be fetched; this builder only forms the request once
it is permitted. The NpmClientConfig's npmBaseUrl is unused here (the URL is
absolute) but its Manager and token are not — the manager carries the trust
context and the token the credential posture.
The request is marked non-decompressing for the same reason as artifactRequest:
a .tgz is opaque binary streamed byte-for-byte. Fails with a UrlFormationError
only when the url cannot be parsed into a request.
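The following sketches the kind of host check an egress gate performs before this builder runs. It is not tarballHostAllowed: `httpsHost` and `hostAllowed` are hypothetical, the https-only rule is an assumption, and the resolved-IP recheck (which needs DNS) is not modelled. Parsing fails closed, so anything unexpected yields False:

```haskell
import Data.Char (toLower)
import Data.List (stripPrefix)

-- Host of an absolute https URL, or Nothing. A missing https scheme, an
-- empty host, or any userinfo ("name@host") yields Nothing, so
-- "https://allowed.example@evil.example/" cannot pass as allowed.
httpsHost :: String -> Maybe String
httpsHost url = case stripPrefix "https://" url of
  Nothing -> Nothing
  Just rest ->
    let authority = takeWhile (`notElem` "/?#") rest
        host = map toLower (takeWhile (/= ':') authority)
     in if null host || '@' `elem` authority then Nothing else Just host

-- The allowlist is assumed to hold lowercase host names.
hostAllowed :: [String] -> String -> Bool
hostAllowed allowlist url = maybe False (`elem` allowlist) (httpsHost url)
```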
artifactFileUrl :: Text -> PackageName -> Text -> Either UrlFormationError Text
The artifact (tarball) URL addressing a preserved filename:
{baseUrl}/{encoded-name}/-/{encoded-filename}. The filename is the exact
on-the-wire name (not {base}-{version}.tgz rebuilt from the coordinate), so the
bytes are fetched by the name the client requested; it is percent-encoded as a
single component (encodeComponent) so a once-decoded escape
in it cannot reach the upstream raw. Exposed so the serve path can record the
public artifact location on a mirror job (the same URL its public fetch targets).
Fails with a UrlFormationError only when the URL cannot be formed.
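The following sketches single-component percent-encoding in the spirit of encodeComponent. It assumes RFC 3986's unreserved set and ASCII-only input; non-ASCII is rejected rather than guessed at. `encodeComponentSketch` and `fileUrl` are hypothetical. Encoding '%' itself is what keeps an already-decoded escape from reaching the upstream raw:

```haskell
import Data.Char (isAscii, isAsciiLower, isAsciiUpper, isDigit, ord, toUpper)
import Numeric (showHex)

-- Percent-encode a single path component. Unreserved characters pass
-- through; everything else, including '/' and '%', becomes %XX.
-- Non-ASCII input yields Nothing; a full encoder would UTF-8 encode it first.
encodeComponentSketch :: String -> Maybe String
encodeComponentSketch s
  | all isAscii s = Just (concatMap enc s)
  | otherwise = Nothing
  where
    enc c
      | unreserved c = [c]
      | otherwise = '%' : hex2 (ord c)
    unreserved c = isAsciiUpper c || isAsciiLower c || isDigit c || c `elem` "-._~"
    hex2 n = map toUpper (pad (showHex n ""))
    pad h = replicate (2 - length h) '0' ++ h

-- {base}/{encoded-pkg}/-/{encoded-filename}; the package segment is taken
-- already encoded to keep the sketch short.
fileUrl :: String -> String -> String -> Maybe String
fileUrl base encodedPkg file =
  fmap (\f -> base ++ "/" ++ encodedPkg ++ "/-/" ++ f) (encodeComponentSketch file)
```

For instance, a requested filename of `a%2Fb.tgz` is sent as `a%252Fb.tgz`, so the upstream sees a literal percent sign rather than a path separator.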
publishRequest :: NpmClientConfig -> PackageName -> ByteString -> Either UrlFormationError Request
Build the publish PUT /{pkg} request: the body is the npm publish
document (a packument carrying the version manifest and the base64 tarball under
_attachments), already serialised by the caller. Carries the bearer token and a
Content-Type: application/json header.
Fails with a UrlFormationError only when the URL cannot be formed; a genuine
write fault (a non-2xx, non-409 status) is the PublishError that
publishArtifact reports.
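The status handling can be sketched as a total classification. `PublishOutcome`, `PublishFailure` and `classifyPublish` are hypothetical names; the module's own error type is PublishError:

```haskell
data PublishOutcome = Published | AlreadyPresent
  deriving (Show, Eq)

newtype PublishFailure = UnexpectedStatus Int
  deriving (Show, Eq)

-- 2xx is success; 409 means the immutable version is already there, which
-- is success-equivalent for a redelivered job; anything else is a fault.
classifyPublish :: Int -> Either PublishFailure PublishOutcome
classifyPublish status
  | status >= 200 && status < 300 = Right Published
  | status == 409 = Right AlreadyPresent
  | otherwise = Left (UnexpectedStatus status)
```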
Publish-document assembly
npmPublishDocument
Arguments
| :: PackageName | The package being published. |
| -> Version | The version being published. |
| -> Text | The tarball's filename — the |
| -> Maybe Text | The |
| -> Maybe Text | The |
| -> ByteString | The verified tarball bytes. |
| -> ByteString |
Assemble the npm publish document for one version from its verified tarball
bytes — the serialised body publishRequest (hence
publishArtifact) PUTs to /{pkg}.
The document is the npm PUT /{pkg} shape: the package name and a single-version
versions map carrying the version manifest (name, version, and a dist with
the integrity digests), dist-tags.latest pointed at that version, and the tarball
itself base64-encoded under _attachments with its byte length. A managed npm
registry (CodeArtifact, Artifact Registry, Verdaccio) recomputes the served
dist.tarball location from the attachment, so the location is not carried.
The integrity digests written into dist are the caller's — the worker passes
the serve-time-admitted digests it has already verified the bytes against — so the
published manifest's integrity matches exactly the bytes attached. The tarball
length is taken from the actual byte count, never a caller-declared size, so the
attachment can never disagree with its own bytes.
This is the inverse of the read-side decode in Ecluse.Registry.Npm.Wire, which
deliberately does not model _attachments: it is constructed only here, for the
write.
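The following sketches the _attachments entry, assuming the tarball arrives as a list of bytes. `base64`, `Attachment` and `mkAttachment` are hypothetical, and the rest of the document (name, versions, dist-tags) is omitted. The encoder is a plain RFC 4648 base64 so the block needs only base:

```haskell
import Data.Bits (shiftL, shiftR, (.&.), (.|.))
import Data.Word (Word8)

-- RFC 4648 base64 alphabet.
alphabet :: String
alphabet = ['A' .. 'Z'] ++ ['a' .. 'z'] ++ ['0' .. '9'] ++ "+/"

-- Encode bytes as padded base64: three input bytes per four output characters.
base64 :: [Word8] -> String
base64 (a : b : c : rest) = sextets a b c 4 ++ base64 rest
base64 [a, b] = sextets a b 0 3 ++ "="
base64 [a] = sextets a 0 0 2 ++ "=="
base64 [] = ""

-- Pack three bytes into 24 bits and emit the first n of its four 6-bit digits.
sextets :: Word8 -> Word8 -> Word8 -> Int -> String
sextets a b c n = take n [alphabet !! ((w `shiftR` s) .&. 63) | s <- [18, 12, 6, 0]]
  where
    w = (fromIntegral a `shiftL` 16) .|. (fromIntegral b `shiftL` 8) .|. fromIntegral c :: Int

data Attachment = Attachment
  { attachmentData :: String -- base64 of the tarball
  , attachmentLength :: Int  -- byte count of the tarball itself
  } deriving (Show, Eq)

-- The length is computed from the bytes, never taken from a declared size.
mkAttachment :: [Word8] -> Attachment
mkAttachment bytes = Attachment {attachmentData = base64 bytes, attachmentLength = length bytes}
```

For example, `base64 (map (fromIntegral . fromEnum) "hello")` evaluates to `"aGVsbG8="`, and the matching attachment length is 5.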
Lower-level fetch (form- and validator-aware)
fetchMetadataForm :: NpmClientConfig -> MetadataForm -> Validators -> PackageName -> IO RegistryResponse
Fetch a package's metadata in the requested MetadataForm, relaying any
conditional-GET Validators. The bounded-read fetch used by the handle's
fetchMetadata; the request pipeline calls this directly when it
needs the full packument or wants to revalidate against an ETag.
The body is read chunk-by-chunk through boundedRead against
the config's npmLimits, not buffered whole: a hostile or compromised upstream
returning a body larger than maxBodyBytes is aborted
fail-closed rather than exhausting memory (security.md invariant 4). A body
within budget is returned whole (the metadata path projects the entire document);
artifacts are the separate streaming concern, not bounded here. The request's
Accept-Encoding: gzip still applies — http-client decompresses transparently
under withResponse exactly as under httpLbs, so the cap bounds the
decompressed bytes the proxy actually retains.
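The following sketches a bounded read over a chunked body. It assumes the same brRead-style convention (an empty chunk ends the body) and uses Strings in place of ByteStrings. Unlike the real path, it returns Left on a breach instead of throwing. `boundedReadSketch`, `LimitErrorSketch` and `listSource` are hypothetical:

```haskell
import Data.IORef (newIORef, readIORef, writeIORef)

data LimitErrorSketch = BodyTooLarge
  { observedBytes :: Int
  , capBytes :: Int
  } deriving (Show, Eq)

-- Accumulate chunks while the running total stays within the cap. Reading
-- stops at the first chunk that crosses it, so no oversized body is kept.
boundedReadSketch :: Int -> IO String -> IO (Either LimitErrorSketch String)
boundedReadSketch cap next = go 0 []
  where
    go total acc = do
      chunk <- next
      if null chunk
        then pure (Right (concat (reverse acc))) -- end of body, within budget
        else do
          let total' = total + length chunk
          if total' > cap
            then pure (Left (BodyTooLarge total' cap)) -- fail closed
            else go total' (chunk : acc)

-- A test double for a response body: yields the chunks, then "".
listSource :: [String] -> IO (IO String)
listSource chunks = do
  ref <- newIORef chunks
  pure $ do
    remaining <- readIORef ref
    case remaining of
      [] -> pure ""
      (c : cs) -> writeIORef ref cs >> pure c
```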
A body-size breach surfaces as a typed ResponseBoundExceeded exception carrying
the LimitError, so the request pipeline's tryAny degrades the
contribution to nothing — the fail-closed parse-failure path — rather than the
projection layer ever seeing a truncated body. A request-building failure (an
unformable URL) likewise surfaces as a typed UrlFormationError exception rather
than a silent success: a misconfigured base URL is a programming/config fault on
the read path, not a per-response condition the projection layer reports. (The write
path instead returns an unformable URL as a PublishFault value,
where the worker must choose retry vs. drop.)
Response-bound breach
newtype ResponseBoundExceeded
Raised when an upstream metadata body breaches a Limits
ceiling: the body-size guard here, or — surfaced through the same type by the serve
pipeline — the version-count or nesting-depth guard.
Carries the LimitError (which ceiling, the observed value, and the
cap), so the breach is diagnosable rather than collapsing into an opaque failure:
the serve path logs it at the breach point before degrading the contribution to
nothing. It is thrown fail-closed (never a truncated or partial body), so it surfaces
to the fetch caller exactly as a parse failure would — the request pipeline's tryAny
treats it as a degraded (missing) contribution.
Constructors
| ResponseBoundExceeded LimitError |
Instances
| Exception ResponseBoundExceeded | Defined in Ecluse.Registry.Npm |
| Show ResponseBoundExceeded | Defined in Ecluse.Registry.Npm. Methods: showsPrec :: Int -> ResponseBoundExceeded -> ShowS; show :: ResponseBoundExceeded -> String; showList :: [ResponseBoundExceeded] -> ShowS |
| Eq ResponseBoundExceeded | Defined in Ecluse.Registry.Npm. Methods: (==) :: ResponseBoundExceeded -> ResponseBoundExceeded -> Bool; (/=) :: ResponseBoundExceeded -> ResponseBoundExceeded -> Bool |
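The following sketches the typed-exception-then-degrade flow, with hypothetical names (`BoundExceeded`, `LimitErrorSketch`, `fetchOrDegrade`). It uses `try` from Control.Exception over SomeException. Unlike the tryAny the text refers to, this also catches asynchronous exceptions, so a production version should rethrow those:

```haskell
import Control.Exception (Exception, fromException, throwIO, try)

data LimitErrorSketch = BodyTooLarge {observed :: Int, cap :: Int}
  deriving (Show, Eq)

-- Carries which ceiling was breached, so the failure stays diagnosable.
newtype BoundExceeded = BoundExceeded LimitErrorSketch
  deriving (Show, Eq)

instance Exception BoundExceeded

-- Run a fetch; on a bound breach, log its diagnostics first. Any failure
-- degrades to a missing contribution (Nothing), never a partial body.
fetchOrDegrade :: (String -> IO ()) -> IO a -> IO (Maybe a)
fetchOrDegrade logLine fetch = do
  result <- try fetch
  case result of
    Right value -> pure (Just value)
    Left err -> do
      case fromException err of
        Just (BoundExceeded limitErr) -> logLine ("response bound exceeded: " ++ show limitErr)
        Nothing -> pure ()
      pure Nothing
```

A breach such as `throwIO (BoundExceeded (BodyTooLarge 20 10))` is logged and then yields `Nothing`, exactly like any other failed fetch.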