ecluse
Safe Haskell: None
Language: GHC2021

Ecluse.Registry.Npm

Description

The npm data plane: the effectful Ecluse.Registry fields over http-client.

This module is the network half of the npm protocol boundary. Where Ecluse.Registry.Npm.Wire and Ecluse.Registry.Npm.Project are the pure decode and projection, this is the side-effecting fetch and publish: newNpmClient assembles an Ecluse.Registry.RegistryClient whose effectful fields talk to a registry over plain HTTP, and whose parse* fields are the pure projection re-exported through the handle.

It speaks the npm registry protocol directly with http-client, never amazonka: the control plane (the GetAuthorizationToken mint, the mirror queue) is amazonka's job behind separate handles, but the data plane — fetch metadata, stream a tarball, publish — is ordinary HTTPS+JSON, identical across every npm-speaking backend (npmjs.org, CodeArtifact, Artifact Registry, a self-hosted Verdaccio). Keeping the streaming path off amazonka's conduit/ResourceT machinery is exactly what makes bounded-memory artifact proxying tractable (see docs/architecture/web-layer.md → "Control plane vs. data plane").

Request shaping

Three details of the wire protocol are load-bearing and handled here (see docs/research/reverse-engineering/npm.md → "Transport & conventions"):

  • Content negotiation. Metadata comes in two forms selected by Accept: the abbreviated install view (application/vnd.npm.install-v1+json), which the proxy treats as primary, and the full packument (application/json), needed when a rule reasons over publish age (the abbreviated form drops the time map). MetadataForm selects between them; both request Accept-Encoding: gzip, since popular packuments are megabytes.
  • Scoped-name path encoding. A scoped name @scope/name is encoded on the wire as @scope%2Fname — the scope separator is percent-encoded but the leading @ is not. metadataRequest builds this from an already-parsed PackageName, never from raw client path segments.
  • Idempotent publish. A PUT /{pkg} that re-publishes an existing version returns 409 Conflict; because versions are immutable, that conflict is success-equivalent for a redelivered mirror job (the artifact is already present), so publishArtifact treats it as Right, not an error.
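
The idempotent-publish rule above can be sketched as a pure status classifier. This is a minimal illustration, not the module's code: PublishFailure and classifyPublishStatus are hypothetical names standing in for whatever publishArtifact does internally.

```haskell
-- Hypothetical stand-in for the module's publish error type; the real type is
-- not shown in this documentation.
newtype PublishFailure = UnexpectedStatus Int
  deriving (Show, Eq)

-- Classify the HTTP status returned by a publish PUT.
-- Any 2xx means the version was stored. A 409 means the version already
-- exists; because versions are immutable, a redelivered mirror job can treat
-- that as success. Everything else is a genuine write fault.
classifyPublishStatus :: Int -> Either PublishFailure ()
classifyPublishStatus status
  | status >= 200 && status < 300 = Right ()
  | status == 409                 = Right ()
  | otherwise                     = Left (UnexpectedStatus status)
```

Keeping the decision pure makes the redelivery behaviour easy to test without a registry.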

Streaming and buffering

artifactRequest marks its request non-decompressing (decompress returns False): a tarball is opaque binary that must reach the client byte-for-byte, so the .tgz is never gunzipped in flight (and its dist.integrity stays valid). The request is exposed so the web layer (see docs/architecture/web-layer.md → "Streaming and resource lifetime") can bracket it with withResponse/responseStream and relay the open body without buffering the whole artifact in memory. The handle's fetchArtifact field, by contrast, buffers (its RegistryResponse return is whole bytes) and is for the mirror worker, which must read the entire artifact to verify its integrity before publishing.

Authentication

The client accepts an injected bearer token and attaches it to every request; it never originates credential policy. Which token to send on which request is the request pipeline's authority model, decided upstream of this module: always strip the client's token before any public fetch, and use the minted mirror token only to write. Whether the client's own token is forwarded to the private upstream (the default passthrough) or Écluse's own read token is used instead is the mount's access strategy (see docs/architecture/access-model.md). A client with no token sends none.
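
The division of authority described above might look like the following sketch. Every name in it (Target, AccessStrategy, Tokens, tokenFor, bearerHeader) is hypothetical, and it assumes a public fetch is sent with no token at all; the real policy lives in the request pipeline, not in this module.

```haskell
-- All names here are hypothetical; they model the rules stated in the text.
data Target = PublicFetch | PrivateFetch | MirrorWrite
  deriving (Show, Eq)

data AccessStrategy
  = Passthrough          -- forward the client's own token (the default)
  | OwnReadToken String  -- present the proxy's own read token instead
  deriving (Show, Eq)

data Tokens = Tokens
  { clientToken :: Maybe String  -- token the client presented, if any
  , mirrorToken :: Maybe String  -- minted mirror token, used only to write
  }

-- The pipeline-side decision: which token (if any) to inject for a target.
tokenFor :: AccessStrategy -> Tokens -> Target -> Maybe String
tokenFor _ _ PublicFetch                   = Nothing  -- client token is always stripped
tokenFor Passthrough t PrivateFetch        = clientToken t
tokenFor (OwnReadToken tok) _ PrivateFetch = Just tok
tokenFor _ t MirrorWrite                   = mirrorToken t

-- The client-side step: attach whatever was injected; no token sends none.
bearerHeader :: Maybe String -> [(String, String)]
bearerHeader Nothing    = []
bearerHeader (Just tok) = [("Authorization", "Bearer " ++ tok)]
```

Only bearerHeader corresponds to what this module does; tokenFor is the part decided upstream of it.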

Construction

data NpmClientConfig

Everything newNpmClient needs to talk to one npm-speaking registry: the base URL, the shared HTTP Manager, and an optional injected bearer token.

The Manager is shared (it owns the connection pool), so it is taken rather than built here — the same one the composition root reuses across requests. The token is whatever the request pipeline decided this client should present; this module never chooses it (see the module header → Authentication).

Constructors

NpmClientConfig 

Fields

  • npmBaseUrl :: Text

    The registry base URL (e.g. the public registry, or a CodeArtifact npm endpoint). The package path is appended to it.

  • npmManager :: Manager

    The shared http-client Manager to issue requests through.

  • npmToken :: Maybe Secret

    An injected bearer token to attach, or Nothing for anonymous requests.

  • npmLimits :: Limits

    The response-bound budget enforced on a metadata fetch: fetchMetadataForm reads the body through boundedRead against maxBodyBytes, aborting fail-closed past the cap rather than buffering an unbounded body. The other Limits ceilings (version count, nesting depth) are enforced by the decode/projection layer, not here.

defaultNpmConfig :: Manager -> NpmClientConfig

An anonymous client config against the public registry (publicRegistryBaseUrl), using the given shared Manager and the secure-default response bounds (defaultLimits). Override npmBaseUrl/npmToken/npmLimits for a managed backend or a per-deployment budget.

publicRegistryBaseUrl :: Text

The canonical public npm registry base URL, https://registry.npmjs.org. The default target when no managed backend is configured.

newNpmClient :: NpmClientConfig -> IO RegistryClient

Assemble an Ecluse.Registry.RegistryClient for the npm protocol over the given configuration.

The effectful fields close over the config's Manager and token and speak npm over HTTP; the parse* fields are the pure projection from Ecluse.Registry.Npm.Project, re-exported through the handle. The handle's fetchMetadata requests the Abbreviated form unconditionally; the richer fetchMetadataForm (for the full packument and relayed validators) is exposed separately for the request pipeline.

Content negotiation

data MetadataForm

Which of npm's two metadata documents to request, selected by the Accept header (see metadataAccept).

Constructors

Abbreviated

The install-optimised abbreviated packument (application/vnd.npm.install-v1+json). Smaller and the proxy's primary view, but it drops the time map.

Full

The full packument (application/json). Larger, but the only form carrying the time map a publish-age rule needs.

Instances

Show MetadataForm

Defined in Ecluse.Registry.Npm

Eq MetadataForm

Defined in Ecluse.Registry.Npm

metadataAccept :: MetadataForm -> ByteString

The Accept header value selecting a MetadataForm.

>>> metadataAccept Abbreviated
"application/vnd.npm.install-v1+json"
>>> metadataAccept Full
"application/json"

Conditional-GET validators

data Validators

The conditional-GET validators to relay on a metadata fetch. Replaying an upstream's ETag as If-None-Match (or its Last-Modified as If-Modified-Since) lets the upstream answer 304 Not Modified with no body — the cheap freshness check the proxy uses on a cache revalidation. Both are forwarded only when present.

Constructors

Validators 

Fields

Instances

Show Validators

Defined in Ecluse.Registry.Npm

Eq Validators

Defined in Ecluse.Registry.Npm

noValidators :: Validators

No conditional-GET validators — an unconditional fetch.
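
A minimal sketch of how conditional-GET validators map onto request headers. The record and its field names (ValidatorsSketch, etagValue, lastModifiedValue) are assumptions made for illustration, since the fields of Validators are not listed here; only the header names come from HTTP itself.

```haskell
-- Hypothetical stand-in for Validators; the field names are assumptions.
data ValidatorsSketch = ValidatorsSketch
  { etagValue         :: Maybe String  -- an upstream ETag, replayed verbatim
  , lastModifiedValue :: Maybe String  -- an upstream Last-Modified, replayed verbatim
  } deriving (Show, Eq)

-- The unconditional case: nothing to relay.
noValidatorsSketch :: ValidatorsSketch
noValidatorsSketch = ValidatorsSketch Nothing Nothing

-- Each validator becomes a header only when it is present.
validatorHeaders :: ValidatorsSketch -> [(String, String)]
validatorHeaders v =
  [ ("If-None-Match", e)     | Just e <- [etagValue v] ] ++
  [ ("If-Modified-Since", m) | Just m <- [lastModifiedValue v] ]
```

With no validators the header list is empty, so the request degrades to a plain unconditional GET.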

Request building

metadataRequest :: NpmClientConfig -> MetadataForm -> Validators -> PackageName -> Either UrlFormationError Request

Build the metadata GET request for a package: the URL is {baseUrl}/{encoded-name} with the Accept header for the chosen MetadataForm, Accept-Encoding: gzip, an optional bearer token, and any relayed conditional-GET Validators.

The package path is derived from an already-parsed PackageName, then the scope separator is percent-encoded (@scope/name → @scope%2Fname). Fails with a UrlFormationError only when the URL cannot be formed (an empty base URL).
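
The path encoding can be sketched over a simplified package-name type. PackageNameSketch, packagePath and metadataUrl are hypothetical names, the String error stands in for UrlFormationError, and trimming a trailing slash from the base URL is an assumption of this sketch rather than documented behaviour.

```haskell
-- Hypothetical stand-in for an already-parsed PackageName.
data PackageNameSketch
  = Unscoped String
  | Scoped String String  -- scope (without the leading '@'), then name
  deriving (Show, Eq)

-- Encode the package as a single path segment: the scope separator is
-- percent-encoded, the leading '@' is not.
packagePath :: PackageNameSketch -> String
packagePath (Unscoped name)     = name
packagePath (Scoped scope name) = "@" ++ scope ++ "%2F" ++ name

-- Form the metadata URL, failing only when the base URL is empty.
metadataUrl :: String -> PackageNameSketch -> Either String String
metadataUrl base pkg
  | null base = Left "empty base URL"
  | otherwise = Right (dropTrailingSlash base ++ "/" ++ packagePath pkg)
  where
    dropTrailingSlash = reverse . dropWhile (== '/') . reverse
```

Because the input is a parsed name rather than raw client path segments, the only slash that can appear in the package segment is the one this function encodes.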

artifactRequest :: NpmClientConfig -> PackageName -> Version -> Either UrlFormationError Request

Build the artifact GET request for one version's tarball.

The request is marked non-decompressing (decompress returns False) so the .tgz bytes are streamed through verbatim — a tarball is opaque binary and must reach the client byte-for-byte for its dist.integrity to verify. The artifact URL is the registry-served tarball location, derived like metadataRequest but addressing the version's artifact path. Exposed so the web layer can bracket it for bounded-memory streaming (see the module header).

Fails with a UrlFormationError only when the URL cannot be formed.

artifactRequestByFile :: NpmClientConfig -> PackageName -> Text -> Either UrlFormationError Request

Build the artifact GET request addressing a tarball by its preserved on-the-wire filename, at {baseUrl}/{encoded-pkg}/-/{filename}.

The serve path fetches an artifact by the exact filename the client requested — the authoritative name for the bytes — rather than reconstructing it from (package, version) as artifactRequest does, so a registry whose tarball naming differs from the proxy's own convention still resolves. The filename is taken verbatim (the classifier has already passed it through the component-safety gate), and the package segment is the same scope-percent-encoded path artifactRequest uses. The request is marked non-decompressing for the same reason: a .tgz is opaque binary streamed byte-for-byte so its dist.integrity verifies. Exposed so the web layer can bracket it for bounded-memory streaming (see the module header).

Fails with a UrlFormationError only when the URL cannot be formed.

artifactRequestByUrl :: NpmClientConfig -> Text -> Either UrlFormationError Request

Build the artifact GET request addressing a tarball at its authoritative upstream location (the absolute url the projection preserved from the upstream's dist.tarball) rather than reconstructing it from (base, package, file).

The artifact location is server-chosen data, not a derivable fact: a registry may serve a version's tarball from a different host or a path the npm /-/ convention cannot rebuild (a separate CDN/files host, server-generated segments, a signed query string). Honouring the preserved location is what lets Écluse front those registries; the URL it fetches is the same one the served packument's dist.integrity is paired with, so the bytes still verify. The egress gate (tarballHostAllowed plus the resolved-IP recheck) decides whether that location may be fetched; this builder only forms the request once it is permitted. The NpmClientConfig's npmBaseUrl is unused here (the URL is absolute) but its Manager and token are not — the manager carries the trust context and the token the credential posture.

The request is marked non-decompressing for the same reason as artifactRequest: a .tgz is opaque binary streamed byte-for-byte. Fails with a UrlFormationError only when the url cannot be parsed into a request.

artifactFileUrl :: Text -> PackageName -> Text -> Either UrlFormationError Text

The artifact (tarball) URL addressing a preserved filename: {baseUrl}/{encoded-name}/-/{encoded-filename}. The filename is the exact on-the-wire name (not {base}-{version}.tgz rebuilt from the coordinate), so the bytes are fetched by the name the client requested; it is percent-encoded as a single component (encodeComponent) so a once-decoded escape in it cannot reach the upstream raw. Exposed so the serve path can record the public artifact location on a mirror job (the same URL its public fetch targets).

Fails with a UrlFormationError only when the URL cannot be formed.
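
To illustrate why the filename is encoded as a single component, here is a base-only sketch of component percent-encoding. encodeComponentSketch and fileUrlSketch are hypothetical; the real encodeComponent may use a different set of unescaped characters. This sketch assumes the RFC 3986 unreserved set and UTF-8 for non-ASCII characters.

```haskell
import Data.Bits (shiftR, (.&.), (.|.))
import Data.Char (intToDigit, isAlphaNum, isAscii, ord, toUpper)

-- UTF-8 encode one character as a list of byte values.
utf8Bytes :: Char -> [Int]
utf8Bytes c
  | n < 0x80    = [n]
  | n < 0x800   = [0xC0 .|. (n `shiftR` 6), cont n]
  | n < 0x10000 = [0xE0 .|. (n `shiftR` 12), cont (n `shiftR` 6), cont n]
  | otherwise   = [0xF0 .|. (n `shiftR` 18), cont (n `shiftR` 12), cont (n `shiftR` 6), cont n]
  where
    n = ord c
    cont x = 0x80 .|. (x .&. 0x3F)  -- a UTF-8 continuation byte

-- Percent-encode everything outside the RFC 3986 unreserved set, so '/', '%'
-- and '?' inside a filename can never act as URL syntax.
encodeComponentSketch :: String -> String
encodeComponentSketch = concatMap encodeChar
  where
    encodeChar c
      | isAscii c && (isAlphaNum c || c `elem` "-._~") = [c]
      | otherwise = concatMap hexByte (utf8Bytes c)
    hexByte b = ['%', hexDigit (b `shiftR` 4), hexDigit (b .&. 0x0F)]
    hexDigit = toUpper . intToDigit

-- {baseUrl}/{encoded-name}/-/{encoded-filename}; the package segment is
-- assumed to be already encoded by the caller.
fileUrlSketch :: String -> String -> String -> String
fileUrlSketch base encodedPkg filename =
  base ++ "/" ++ encodedPkg ++ "/-/" ++ encodeComponentSketch filename
```

An ordinary tarball name passes through unchanged, while a filename that still contains an escape such as %2F has its percent sign re-encoded to %25, so the upstream never sees it as a path separator.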

publishRequest :: NpmClientConfig -> PackageName -> ByteString -> Either UrlFormationError Request

Build the publish PUT /{pkg} request: the body is the npm publish document (a packument carrying the version manifest and the base64 tarball under _attachments), already serialised by the caller. Carries the bearer token and a Content-Type: application/json header.

Fails with a UrlFormationError only when the URL cannot be formed; a genuine write fault (a non-2xx, non-409 status) is the PublishError that publishArtifact reports.

Publish-document assembly

npmPublishDocument

Arguments

  :: PackageName    The package being published.
  -> Version        The version being published.
  -> Text           The tarball's filename: the _attachments key and tarball file segment.
  -> Maybe Text     The dist.integrity SRI string, if known (e.g. "sha512-…").
  -> Maybe Text     The dist.shasum (SHA-1, hex), if known.
  -> ByteString     The verified tarball bytes.
  -> ByteString

Assemble the npm publish document for one version from its verified tarball bytes — the serialised body publishRequest (hence publishArtifact) PUTs to /{pkg}.

The document is the npm PUT /{pkg} shape: the package name and a single-version versions map carrying the version manifest (name, version, and a dist with the integrity digests), dist-tags.latest pointed at that version, and the tarball itself base64-encoded under _attachments with its byte length. A managed or self-hosted npm registry (CodeArtifact, Artifact Registry, Verdaccio) recomputes the served dist.tarball location from the attachment, so the location is not carried.

The integrity digests written into dist are the caller's — the worker passes the serve-time-admitted digests it has already verified the bytes against — so the published manifest's integrity matches exactly the bytes attached. The tarball length is taken from the actual byte count, never a caller-declared size, so the attachment can never disagree with its own bytes.

This is the inverse of the read-side decode in Ecluse.Registry.Npm.Wire, which deliberately does not model _attachments: it is constructed only here, for the write.

Lower-level fetch (form- and validator-aware)

fetchMetadataForm :: NpmClientConfig -> MetadataForm -> Validators -> PackageName -> IO RegistryResponse

Fetch a package's metadata in the requested MetadataForm, relaying any conditional-GET Validators. The bounded-read fetch used by the handle's fetchMetadata; the request pipeline calls this directly when it needs the full packument or wants to revalidate against an ETag.

The body is read chunk-by-chunk through boundedRead against the config's npmLimits, not buffered whole: a hostile or compromised upstream returning a body larger than maxBodyBytes is aborted fail-closed rather than exhausting memory (security.md invariant 4). A body within budget is returned whole (the metadata path projects the entire document); artifacts are the separate streaming concern, not bounded here. The request's Accept-Encoding: gzip still applies — http-client decompresses transparently under withResponse exactly as under httpLbs, so the cap bounds the decompressed bytes the proxy actually retains.

A body-size breach surfaces as a typed ResponseBoundExceeded exception carrying the LimitError, so the request pipeline's tryAny degrades the contribution to nothing — the fail-closed parse-failure path — rather than the projection layer ever seeing a truncated body. A request-building failure (an unformable URL) likewise surfaces as a typed UrlFormationError exception rather than a silent success: a misconfigured base URL is a programming/config fault on the read path, not a per-response condition the projection layer reports. (The write path instead returns an unformable URL as a PublishFault value, where the worker must choose retry vs. drop.)
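
The chunk-by-chunk bounded read can be sketched with base alone. This is not the module's boundedRead: boundedReadSketch, LimitExceeded and chunkSource are hypothetical, chunks are modelled as String rather than ByteString, and the breach is returned as a Left instead of being thrown. The reader convention (an empty chunk marks the end of the body) mirrors http-client's brRead.

```haskell
import Data.IORef (newIORef, readIORef, writeIORef)

-- Hypothetical breach report: the observed size and the cap it exceeded.
data LimitExceeded = LimitExceeded { observed :: Int, cap :: Int }
  deriving (Show, Eq)

-- Read chunks until the reader yields an empty chunk, aborting as soon as the
-- running total passes the cap. A breached body is never returned, not even
-- partially, so the caller cannot see a truncated document.
boundedReadSketch :: Int -> IO String -> IO (Either LimitExceeded String)
boundedReadSketch maxBytes readChunk = go 0 []
  where
    go total acc = do
      chunk <- readChunk
      if null chunk
        then pure (Right (concat (reverse acc)))
        else do
          let total' = total + length chunk
          if total' > maxBytes
            then pure (Left (LimitExceeded total' maxBytes))
            else go total' (chunk : acc)

-- Test helper: turn a list of chunks into a stateful reader action.
chunkSource :: [String] -> IO (IO String)
chunkSource chunks = do
  ref <- newIORef chunks
  pure $ do
    remaining <- readIORef ref
    case remaining of
      []       -> pure ""
      (c : cs) -> writeIORef ref cs >> pure c
```

The check happens before a chunk is retained, so at most one chunk beyond the cap is ever held in memory.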

Response-bound breach

newtype ResponseBoundExceeded

Raised when an upstream metadata body breaches a Limits ceiling: the body-size guard here, or — surfaced through the same type by the serve pipeline — the version-count or nesting-depth guard.

Carries the LimitError (which ceiling, the observed value, and the cap), so the breach is diagnosable rather than collapsing into an opaque failure: the serve path logs it at the breach point before degrading the contribution to nothing. It is thrown fail-closed (never a truncated or partial body), so it surfaces to the fetch caller exactly as a parse failure would — the request pipeline's tryAny treats it as a degraded (missing) contribution.
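
A sketch of the fail-closed pattern described above: a typed exception carrying the limit detail, and a caller that degrades any failure to a missing contribution. The names LimitErrorSketch, BoundExceededSketch and contribution are hypothetical. The sketch uses base's try over SomeException, which also catches asynchronous exceptions; a tryAny-style helper would not, so treat this as an approximation of the pipeline's behaviour.

```haskell
import Control.Exception (Exception, SomeException, try)

-- Hypothetical stand-in for LimitError: which ceiling, what was seen, the cap.
data LimitErrorSketch = LimitErrorSketch
  { limitName     :: String
  , observedValue :: Int
  , limitCap      :: Int
  } deriving (Show, Eq)

-- Hypothetical stand-in for ResponseBoundExceeded.
newtype BoundExceededSketch = BoundExceededSketch LimitErrorSketch
  deriving Show

instance Exception BoundExceededSketch

-- Run a fetch and degrade any synchronous failure to Nothing, exactly as a
-- parse failure would be treated. A real pipeline would log the breach first.
contribution :: IO a -> IO (Maybe a)
contribution action = either ignore (pure . Just) =<< try action
  where
    ignore :: SomeException -> IO (Maybe b)
    ignore _ = pure Nothing
```

Because the exception carries the ceiling, the observed value and the cap, the breach point can log a diagnosable message before the contribution is dropped.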