| Safe Haskell | None |
|---|---|
| Language | GHC2021 |
Ecluse.Server.Route
Description
The shared serve-action vocabulary of the front door, and the agnostic default router.
A Route is one classified request — everything the proxy is willing to serve,
named independently of any ecosystem's URL grammar. The actions are common
across registries (fetch a packument, stream a tarball, answer a liveness probe,
deny a search); only the URL→action mapping is ecosystem-specific. That mapping
is a Classifier, injected at the composition root, so this module stays free of
any one ecosystem's path conventions while the dispatcher routes through whatever
classifier its mount carries.
The model is deny by default, mirroring the rules engine (Ecluse.Rules):
the agnostic default denyAll classifies every path as Unsupported (a 404 at
the edge), so a deployment that wires no ecosystem router serves nothing rather
than guessing. An ecosystem adapter supplies a Classifier that recognises its
own paths and falls back to Unsupported for the rest.
Route is a small sum type, so the whole routing table is unit-testable with no
server: feed a Classifier some segments and assert the resulting Route.
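Here is one way a server-free test could look. This sketch defines a Route that mirrors the constructors listed under Routes, along with a toy classifier. PackageName, Version and Filename are stand-in Text newtypes, and `safe` is a local stand-in for isSafeComponent. The path grammar of the hypothetical `toyClassifier` (`-/ping`, `-/v1/search`, `<name>`, `<name>/-/<name>-<version>.tgz`) is an npm-like example, not the adapter Ecluse ships.

```haskell
{-# LANGUAGE OverloadedStrings #-}
import Data.Char (isControl)
import Data.Text (Text)
import qualified Data.Text as T

-- Stand-in field types (hypothetical; the real ones are not shown here).
newtype PackageName = PackageName Text deriving (Eq, Show)
newtype Version = Version Text deriving (Eq, Show)
newtype Filename = Filename Text deriving (Eq, Show)

-- The five serve actions listed under Routes.
data Route
  = Packument PackageName
  | Tarball PackageName Version Filename
  | Ping
  | Search
  | Unsupported
  deriving (Eq, Show)

type Classifier = [Text] -> Route

-- Local stand-in for isSafeComponent, following its documented rule.
safe :: Text -> Bool
safe c =
  not (T.null c) && c /= "." && c /= ".."
    && not (T.any (\ch -> ch == '/' || ch == '\\' || isControl ch) c)

-- Toy npm-like grammar (hypothetical, unscoped names only). Anything it
-- does not recognise, including an unsafe component, is Unsupported.
toyClassifier :: Classifier
toyClassifier segs = case segs of
  ["-", "ping"] -> Ping
  ["-", "v1", "search"] -> Search
  [name] | safe name -> Packument (PackageName name)
  [name, "-", file]
    | safe name
    , safe file
    , Just v <- T.stripPrefix (name <> "-") file >>= T.stripSuffix ".tgz"
    , not (T.null v) ->
        Tarball (PackageName name) (Version v) (Filename file)
  _ -> Unsupported
```

A Classifier is an ordinary function, so each row of the routing table reduces to a plain equality check on its result, such as `toyClassifier ["-", "ping"] == Ping`.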
Synopsis
- data Route
- newtype Filename = Filename Text
- type Classifier = [Text] -> Route
- denyAll :: Classifier
- isSafeComponent :: Text -> Bool
- encodeComponent :: Text -> Text
Routes
data Route
A classified request. Everything the front door is willing to serve is one
of these; an unrecognised path is Unsupported (deny by default).
The constructors are the proxy's actions, shared across ecosystems — the
artifact a Tarball streams and the metadata a Packument merges are the same
serve behaviour whether the upstream is npm, PyPI, or another registry. Only the
mapping from a request path to one of these (a Classifier) is
ecosystem-specific.
Constructors
| Constructor | Meaning |
|---|---|
| Packument PackageName | A package-metadata request: the packument. |
| Tarball PackageName Version Filename | An artifact request, as a parsed coordinate: the package, the version, and the artifact's on-the-wire file name. |
| Ping | A registry liveness probe, answered locally. |
| Search | Package search (unsupported). |
| Unsupported | Anything unrecognised. Renders as a 404. |
newtype Filename = Filename Text
An artifact's on-the-wire file name, the agnostic artifact-name type a
Tarball route carries.
It is held as a distinct type, not a bare Text, because it is authoritative
for fetching the bytes: the proxy fetches an artifact at the upstream path built
from this exact name, never one reconstructed from (package, version), so that a
registry whose artifact naming differs from the proxy's own convention still
resolves. The name is preserved verbatim as received. The classifier that produces
it has already applied the component-safety gate (isSafeComponent), so the value
cannot change the upstream URL's shape. Like every accepted component, it is still
percent-encoded with encodeComponent when the upstream URL is built.
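The following minimal sketch shows why the name is kept verbatim. It assumes a hypothetical upstream layout `/<package>/-/<file>`, and the helper names `artifactPath` and `reconstructedPath` are also hypothetical. Consider a registry that publishes wheel-style names. The verbatim path resolves, but a path rebuilt from (package, version) guesses a `.tgz` name that the registry never published.

```haskell
{-# LANGUAGE OverloadedStrings #-}
import Data.Text (Text)

newtype Filename = Filename Text deriving (Eq, Show)

-- Hypothetical layout "/<package>/-/<file>": the file segment is the
-- Filename exactly as received. Assumes the name already passed
-- isSafeComponent; the real build step also percent-encodes each
-- component (encodeComponent), which this sketch leaves out.
artifactPath :: Text -> Filename -> Text
artifactPath pkg (Filename file) = "/" <> pkg <> "/-/" <> file

-- The rejected alternative, for contrast: rebuilding the name from
-- (package, version) hard-codes one naming convention.
reconstructedPath :: Text -> Text -> Text
reconstructedPath pkg ver = "/" <> pkg <> "/-/" <> pkg <> "-" <> ver <> ".tgz"
```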
Classification
type Classifier = [Text] -> Route
The mapping from an ecosystem-native request path to a Route.
A classifier sees the path segments after the mount prefix has been stripped and percent-decoding applied, and returns the serve action. Each ecosystem adapter contributes its own classifier, which recognises that ecosystem's path grammar and denies everything else. This keeps ecosystem-specific rules out of the agnostic dispatcher, while every mount still routes through its own ecosystem's grammar. Dispatch chooses the classifier per matched mount (see Ecluse.Server), so the same Classifier type serves either a single-ecosystem deployment or a selection keyed by mount.
denyAll :: Classifier
The agnostic default classifier: every path is Unsupported.
This is the deny-by-default base a deployment runs with until a composition root wires an ecosystem's classifier in, so an unwired server serves nothing rather than guessing a grammar. It deliberately knows no path conventions of its own.
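This minimal sketch covers denyAll and a mount-keyed selection. It assumes a trimmed three-constructor Route and treats a mount as a single leading path segment. The names `npmLike`, `Mounts`, `exampleMounts` and `classifyAt` are hypothetical and do not come from Ecluse.Server. A mount with no registered classifier falls back to denyAll, so an unwired mount serves nothing.

```haskell
{-# LANGUAGE OverloadedStrings #-}
import qualified Data.Map.Strict as Map
import Data.Text (Text)

-- Trimmed, hypothetical Route: only the constructors this sketch needs.
data Route = Packument Text | Ping | Unsupported
  deriving (Eq, Show)

type Classifier = [Text] -> Route

-- Deny by default: the path is ignored entirely.
denyAll :: Classifier
denyAll = const Unsupported

-- Hypothetical ecosystem classifier: recognise two shapes, deny the rest.
npmLike :: Classifier
npmLike ["-", "ping"] = Ping
npmLike [name] = Packument name
npmLike _ = Unsupported

-- Hypothetical mount table: the first path segment names the mount.
type Mounts = Map.Map Text Classifier

exampleMounts :: Mounts
exampleMounts = Map.fromList [("npm", npmLike)]

-- Strip the mount segment and hand the rest to that mount's classifier.
-- An unknown mount, or an empty path, falls back to denyAll.
classifyAt :: Mounts -> [Text] -> Route
classifyAt mounts (mount : rest) = Map.findWithDefault denyAll mount mounts rest
classifyAt _ [] = denyAll []
```

Because denyAll is `const Unsupported`, it cannot encode any path convention, which matches the intent stated above.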
Component safety
isSafeComponent :: Text -> Bool
Whether a single decoded path component is safe to interpolate into an upstream URL. This is the deny-by-default gate that a classifier applies to every component it accepts (a scope, base name, or tarball filename).
The path is percent-decoded before it reaches us, so a single segment can carry a
'/', a '\\', a control character, or be "."/".."; any of these
enables path traversal or request smuggling once the name reaches the upstream
URL. A component is UNSAFE iff it is empty, is exactly "." or "..", or
contains a '/', a '\\', or any isControl character. Everything else
is accepted: this is a security boundary, not an ecosystem-policy validator,
so ordinary names with interior dots (lodash.merge, is.odd), hyphens,
underscores, digits, or uppercase all pass.
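The rule above is small enough to write out directly. This sketch follows the stated rule and is not Ecluse's source. It assumes Data.Text as the component type.

```haskell
{-# LANGUAGE OverloadedStrings #-}
import Data.Char (isControl)
import Data.Text (Text)
import qualified Data.Text as T

-- UNSAFE iff empty, exactly "." or "..", or containing '/', '\\',
-- or any control character; everything else is accepted.
isSafeComponent :: Text -> Bool
isSafeComponent c =
  not (T.null c)
    && c /= "."
    && c /= ".."
    && not (T.any unsafeChar c)
  where
    unsafeChar ch = ch == '/' || ch == '\\' || isControl ch
```

Names like `lodash.merge` and even `...` pass, because only the exact components `.` and `..` are rejected. This reflects the fact that the gate is a structural boundary and not a naming policy.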
It lives in the agnostic layer because the threat — interpolating a hostile segment into an upstream URL — is ecosystem-independent; both an ecosystem's path classifier and the defence-in-depth check in Ecluse.Security share this one rule.
This gate is structural: it stops a component that would change the upstream
URL's shape (a traversal, an embedded separator, a control character). It does
not stop a component that carries other URL-reserved bytes — a '%',
'?', '#', ';', or a space — which an accepted name can still hold
(notably a once-decoded segment carrying a literal %2e%2e%2f). Those are
neutralised not by widening this denylist but by percent-encoding every accepted
component with encodeComponent when the upstream URL is built, so the safety of
an interpolated component rests on encode-on-build, not on this gate alone.
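This sketch shows the gate-then-encode composition described above. Both checks are written locally from their descriptions rather than copied from Ecluse, and `upstreamComponent` is a hypothetical helper. The once-decoded `%2e%2e%2f` passes the gate, but encoding turns each `%` into `%25`, so no live escape reaches the upstream.

```haskell
{-# LANGUAGE OverloadedStrings #-}
import qualified Data.ByteString as BS
import Data.Char (isControl)
import Data.Text (Text)
import qualified Data.Text as T
import qualified Data.Text.Encoding as TE
import Data.Word (Word8)

-- Structural gate, per the documented rule.
isSafeComponent :: Text -> Bool
isSafeComponent c =
  not (T.null c) && c /= "." && c /= ".."
    && not (T.any (\ch -> ch == '/' || ch == '\\' || isControl ch) c)

-- Keep RFC 3986 unreserved bytes, percent-encode every other UTF-8 byte.
encodeComponent :: Text -> Text
encodeComponent = T.concat . map enc . BS.unpack . TE.encodeUtf8
  where
    enc :: Word8 -> Text
    enc w
      | unreserved w = T.singleton (toEnum (fromIntegral w))
      | otherwise = T.pack ['%', hex (w `div` 16), hex (w `mod` 16)]
    hex :: Word8 -> Char
    hex d = "0123456789ABCDEF" !! fromIntegral d
    unreserved :: Word8 -> Bool
    unreserved w =
      (w >= 0x41 && w <= 0x5A) || (w >= 0x61 && w <= 0x7A)
        || (w >= 0x30 && w <= 0x39) || w `elem` [0x2D, 0x2E, 0x5F, 0x7E]

-- Hypothetical build step: reject structurally unsafe components,
-- then neutralise any remaining reserved bytes by encoding.
upstreamComponent :: Text -> Maybe Text
upstreamComponent c
  | isSafeComponent c = Just (encodeComponent c)
  | otherwise = Nothing
```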
encodeComponent :: Text -> Text
Percent-encode a single decoded path component for safe interpolation
into an upstream URL — the encode-on-build partner of isSafeComponent.
A component is the content between a URL's structural delimiters (a scope, base
name, or filename), never the delimiters themselves, so this encodes
conservatively: it keeps only the RFC 3986 unreserved set
(A-Z, a-z, 0-9, and '-', '.', '_', '~') verbatim and
percent-encodes every other byte of the component's UTF-8 encoding as
%XX (upper-case hex). A caller composing a path therefore writes the structural
'/', scope %2F, '@' sigil, and the like itself, around encoded
components — so a '%', '/', '?', '#', ';', space, or control
byte inside a component cannot alter the URL's shape, inject a query or fragment,
or — the once-decoded %2e%2e%2f case — survive as a live escape a
decode-and-normalise upstream could resolve to traversal.
Encoding is per-byte over the UTF-8 form, so a multi-byte character is encoded as
one %XX per byte ('é' → %C3%A9). Encoding is deliberately not idempotent on an
already-percent-encoded escape: a literal '%' is always re-encoded to %25. That is
the point. The component is decoded content, so any '%' in it is a literal to be
escaped, not a structural escape to preserve.
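This sketch implements encodeComponent as described. It encodes the component to UTF-8, keeps bytes from the RFC 3986 unreserved set, and emits every other byte as upper-case %XX. The code is written from the description above rather than taken from Ecluse's source, and it uses the text and bytestring packages.

```haskell
{-# LANGUAGE OverloadedStrings #-}
import qualified Data.ByteString as BS
import Data.Text (Text)
import qualified Data.Text as T
import qualified Data.Text.Encoding as TE
import Data.Word (Word8)

-- Percent-encode one decoded path component, byte by byte over UTF-8.
encodeComponent :: Text -> Text
encodeComponent = T.concat . map encodeByte . BS.unpack . TE.encodeUtf8
  where
    encodeByte :: Word8 -> Text
    encodeByte w
      | isUnreserved w = T.singleton (toEnum (fromIntegral w))
      | otherwise = T.pack ['%', hexDigit (w `div` 16), hexDigit (w `mod` 16)]
    -- Upper-case hex digit for a nibble (0..15).
    hexDigit :: Word8 -> Char
    hexDigit d = "0123456789ABCDEF" !! fromIntegral d

-- RFC 3986 unreserved: A-Z, a-z, 0-9, '-', '.', '_', '~'.
isUnreserved :: Word8 -> Bool
isUnreserved w =
  (w >= 0x41 && w <= 0x5A)
    || (w >= 0x61 && w <= 0x7A)
    || (w >= 0x30 && w <= 0x39)
    || w `elem` [0x2D, 0x2E, 0x5F, 0x7E]
```

With this sketch, `encodeComponent "é"` yields `%C3%A9` and `encodeComponent "@scope/pkg"` yields `%40scope%2Fpkg`. Applying it twice to `"%"` gives `%2525`, which is the non-idempotence described above.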