| Safe Haskell | None |
|---|---|
| Language | GHC2021 |
Ecluse.Security
Description
Outbound-request and response-bound guards for the proxy's data plane.
Écluse builds outbound HTTP requests from two untrusted sources, client-supplied
package identifiers (the request path) and upstream-supplied artifact
locations (a packument's dist.tarball), and then parses whatever an upstream
returns. This module is the pure guard layer that keeps those steps from being
steered or exhausted by hostile input. It defends three boundaries:
- Where the proxy fetches. isAllowedUpstreamHost restricts outbound fetches to the configured upstream hosts, and isBlockedTarget rejects internal address ranges (cloud instance metadata, loopback, RFC1918) that the proxy's network position can otherwise reach. Together they are the SSRF gate: a target must be both on the allowlist and not an internal address.
- How an upstream URL is derived. upstreamUrlFor builds an artifact/metadata URL from a configured base URL and an already-parsed PackageName, never from raw client path segments, re-checking each name component with the router's own safety rule so traversal, encoded slashes, or an absolute URL cannot change the target.
- How much an upstream may cost. A Limits budget plus boundedRead (abort a streamed body past maxBodyBytes) and checkVersionCount / checkNestingDepth (reject an oversized or deeply-nested parsed document) bound algorithmic-complexity DoS from a hostile or compromised upstream. Every limit fails closed: exceeding one yields Left, never a truncated or partial result.
The functions are pure and total; the streamed-body guard (boundedRead) is
polymorphic over the producing monad so the streaming data plane can run it in
IO while tests drive it purely. They are primitives: the fetch and serve
layers compose them at the boundary (see docs/architecture/registry-model.md
→ "Registry Abstraction", docs/architecture/web-layer.md, and
docs/architecture/hosting.md → "URL rewriting"). Path-component safety is
shared with the router's Ecluse.Server.Route (isSafeComponent); the threat
model these guards answer is recorded there too.
Synopsis
- data LoweredHostSet
- lowerCaseHosts :: Set Text -> LoweredHostSet
- isAllowedUpstreamHost :: LoweredHostSet -> Text -> Bool
- isBlockedTarget :: LoweredHostSet -> Text -> Bool
- isBlockedIP :: IP -> Bool
- hostOptedIn :: LoweredHostSet -> Text -> Bool
- hostAddress :: Text -> Text
- splitHostPort :: Text -> Maybe (Text, Text)
- data TarballHostPolicy
- data Origin
- tarballHostAllowed :: Origin -> TarballHostPolicy -> LoweredHostSet -> LoweredHostSet -> Text -> Text -> Bool
- upstreamUrlFor :: Text -> PackageName -> Either UrlError Text
- data UrlError
- data Limits = Limits {}
- defaultLimits :: Limits
- data LimitError
- boundedRead :: Monad m => Limits -> m ByteString -> m (Either LimitError ByteString)
- checkVersionCount :: Limits -> PackageInfo -> Either LimitError PackageInfo
- checkNestingDepth :: Limits -> Value -> Either LimitError Value
Outbound host allowlist
data LoweredHostSet Source #
A set of host strings normalised to lower case, the form the host guards
(isAllowedUpstreamHost and isBlockedTarget) compare against.
The type is opaque, and lowerCaseHosts is its only constructor: a value of
this type therefore carries the proof that every host in it is already
lower-cased, so the guards lower only the incoming host and the case-insensitive
match cannot be bypassed by an un-normalised configuration set.
Instances
| Show LoweredHostSet Source # | |
Defined in Ecluse.Security Methods showsPrec :: Int -> LoweredHostSet -> ShowS # show :: LoweredHostSet -> String # showList :: [LoweredHostSet] -> ShowS # | |
| Eq LoweredHostSet Source # | |
Defined in Ecluse.Security Methods (==) :: LoweredHostSet -> LoweredHostSet -> Bool # (/=) :: LoweredHostSet -> LoweredHostSet -> Bool # | |
lowerCaseHosts :: Set Text -> LoweredHostSet Source #
Normalise a set of configured host strings to the canonical key form the host
guards take, yielding a LoweredHostSet.
A plain DNS name is folded to lower case (hostnames are case-insensitive), so the
guards match an incoming host against the configuration regardless of how either
was spelled. An entry that parses as an IP literal is additionally rendered to
the single canonical literal the resolved-address recheck produces (see
canonicalHostKey), so equivalent spellings of one address — compressed versus
expanded IPv6, differing case — collapse to one key. An operator who opts in
0:0:0:0:0:0:0:1 therefore matches a resolved ::1 rather than missing it on a
textual difference.
isAllowedUpstreamHost :: LoweredHostSet -> Text -> Bool Source #
Whether host is one of the configured upstream hosts.
The first guard on every outbound fetch: the proxy talks to its configured
private/public upstreams and mirror target, and nothing else — so a target
host derived from a packument's dist.tarball (or anywhere else) is fetched only
if it appears in allowed. The match is exact on the bare host (no port, no
scheme — extract it with hostAddress first) and case-insensitive, since
DNS hostnames are; an empty host is never allowed. This is the allowlist half
of the SSRF gate; pair it with isBlockedTarget for the internal-range half.
The allowlist is a LoweredHostSet, so it is already normalised and only the
incoming host is folded here — through the same canonicalHostKey the set was
built with, so an IP-literal entry matches regardless of how either side spells the
address.
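The opaque-set idea and the allowlist match can be sketched as follows. This is a minimal illustration, not the module's implementation: HostSet, lowerHosts, and isAllowedHost are hypothetical stand-ins for LoweredHostSet, lowerCaseHosts, and isAllowedUpstreamHost, and the IP-literal canonicalisation that canonicalHostKey performs is left out.

```haskell
import Data.Set (Set)
import qualified Data.Set as Set
import Data.Text (Text)
import qualified Data.Text as T

-- Hypothetical stand-in for LoweredHostSet. In a real module the constructor
-- would not be exported, so 'lowerHosts' is the only way to build a value.
newtype HostSet = HostSet (Set Text)

-- Fold every configured host to lower case once, at construction time.
-- (Assumption: IP-literal canonicalisation is omitted from this sketch.)
lowerHosts :: Set Text -> HostSet
lowerHosts = HostSet . Set.map T.toLower

-- Exact, case-insensitive match on the bare host. Only the incoming host is
-- folded here; an empty host never matches.
isAllowedHost :: HostSet -> Text -> Bool
isAllowedHost (HostSet allowed) host =
  not (T.null host) && Set.member (T.toLower host) allowed
```

Because the set can only be built through the normalising function, the guard never needs to re-fold the configuration side of the comparison.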
Internal-range block
isBlockedTarget :: LoweredHostSet -> Text -> Bool Source #
Whether host is an internal address the proxy must not fetch, unless it
is explicitly opted in.
A proxy sits in a privileged network position, so an attacker who can steer a
fetch (see the module header) aims it at addresses only the proxy can reach: the
cloud instance-metadata endpoint (169.254.169.254), loopback, or the private
network (RFC1918). This blocks, by parsing host as a literal IP and testing it
against:
- link-local 169.254.0.0/16 (which contains the 169.254.169.254 metadata address) and IPv6 fe80::/10;
- loopback 127.0.0.0/8 and IPv6 ::1;
- unspecified / this-host 0.0.0.0/8 and IPv6 ::. 0.0.0.0 is not a no-op target: on Linux a connect to it reaches a loopback-bound service, so it is a loopback-equivalent that must be blocked alongside 127.0.0.0/8;
- RFC1918 private 10.0.0.0/8, 172.16.0.0/12, and 192.168.0.0/16;
- CGNAT shared 100.64.0.0/10 (RFC 6598), carrier-grade NAT space some cloud fabrics route internally;
- IPv6 unique-local fc00::/7 (RFC 4193), the private-network IPv6 analogue, which contains the AWS IMDSv6 metadata endpoint fd00:ec2::254.
A host in allowedInternal is never blocked (matched case-insensitively, as
DNS and the host allowlist are) — the deliberate opt-in for a private upstream that
genuinely lives on an internal address. As a LoweredHostSet it is already
lower-cased, so only the incoming host is folded for the comparison. A host
that is not an IP literal (a DNS name) is not blocked here: name-based targets
are constrained by the isAllowedUpstreamHost allowlist instead, and
post-resolution IP filtering belongs to the resolving fetch layer, not this pure
check. Both guards apply — an allowlisted host that resolves to an internal literal
is still caught when its address is tested here.
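The IPv4 half of the range test above can be sketched with plain bit masks. This is an illustration under stated assumptions: parseIPv4, inCidr, blockedRanges, and isBlockedLiteral are hypothetical names, the IPv6 ranges and the allowedInternal opt-in are omitted, and the real module works on an IP type rather than a raw word.

```haskell
import Data.Bits (shiftL, (.&.), (.|.))
import Data.Char (isDigit)
import Data.Word (Word32)

-- Split a string on a separator character, keeping empty pieces.
splitOn :: Char -> String -> [String]
splitOn sep str = case break (== sep) str of
  (chunk, [])       -> [chunk]
  (chunk, _ : rest) -> chunk : splitOn sep rest

-- Parse a dotted-quad IPv4 literal into a 32-bit word; anything else is Nothing.
parseIPv4 :: String -> Maybe Word32
parseIPv4 s = case splitOn '.' s of
  [a, b, c, d] -> do
    os <- mapM octet [a, b, c, d]
    pure (foldl (\acc o -> (acc `shiftL` 8) .|. o) 0 os)
  _ -> Nothing
  where
    octet t
      | not (null t), length t <= 3, all isDigit t, n <= 255 = Just (fromIntegral n)
      | otherwise = Nothing
      where n = read t :: Int

-- Does addr fall inside base/len?
inCidr :: Word32 -> Int -> Word32 -> Bool
inCidr base len addr = (addr .&. mask) == (base .&. mask)
  where mask = if len == 0 then 0 else (0xFFFFFFFF :: Word32) `shiftL` (32 - len)

-- The IPv4 ranges listed above, as (network, prefix length).
blockedRanges :: [(Word32, Int)]
blockedRanges =
  [ (0xA9FE0000, 16)  -- 169.254.0.0/16 link-local (contains 169.254.169.254)
  , (0x7F000000, 8)   -- 127.0.0.0/8 loopback
  , (0x00000000, 8)   -- 0.0.0.0/8 unspecified / this-host
  , (0x0A000000, 8)   -- 10.0.0.0/8 RFC1918
  , (0xAC100000, 12)  -- 172.16.0.0/12 RFC1918
  , (0xC0A80000, 16)  -- 192.168.0.0/16 RFC1918
  , (0x64400000, 10)  -- 100.64.0.0/10 CGNAT
  ]

-- A literal in a blocked range is refused; a DNS name is not blocked here.
isBlockedLiteral :: String -> Bool
isBlockedLiteral host = case parseIPv4 host of
  Just addr -> any (\(base, len) -> inCidr base len addr) blockedRanges
  Nothing   -> False
```

Note that the sketch, like the check it illustrates, says nothing about what a DNS name resolves to; that belongs to the resolving fetch layer.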
isBlockedIP :: IP -> Bool Source #
Whether an IP falls in a blocked internal range.
The single source of record for the internal-range decision, shared by the
literal block here (isBlockedTarget) and the resolved-address recheck in
Ecluse.Security.Egress so both gate against identical ranges. An
IPv4-mapped IPv6 address (::ffff:a.b.c.d) is first decoded to its embedded IPv4
and tested against the IPv4 ranges: a mapped internal literal (e.g.
::ffff:169.254.169.254) is a recognised SSRF smuggling form, so it must be
caught by the IPv4 block rather than slip through as an unrelated IPv6 address.
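The mapped-address decoding can be sketched as below. Addr and unmap are hypothetical: the sketch models an IPv6 address as two 64-bit halves purely to stay self-contained, whereas the real function takes an IP value.

```haskell
import Data.Bits (shiftR, (.&.))
import Data.Word (Word32, Word64)

-- Hypothetical stand-in for an IP type: IPv4 as one word, IPv6 as
-- (high 64 bits, low 64 bits).
data Addr = V4 Word32 | V6 Word64 Word64
  deriving (Eq, Show)

-- Decode ::ffff:a.b.c.d to its embedded IPv4 address so it is tested against
-- the IPv4 ranges; every other address is returned unchanged.
unmap :: Addr -> Addr
unmap (V6 0 lo)
  | lo `shiftR` 32 == 0xFFFF = V4 (fromIntegral (lo .&. 0xFFFFFFFF))
unmap addr = addr
```

A range check would then run on unmap's result, so ::ffff:169.254.169.254 and 169.254.169.254 reach the same decision.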
hostOptedIn :: LoweredHostSet -> Text -> Bool Source #
Whether host is opted in to the internal-range block — the deliberate
exemption for a private upstream that genuinely lives on an internal address.
The opt-in half of isBlockedTarget, shared with the resolved-address recheck in
Ecluse.Security.Egress so the literal block and the connection-time block honour
the exemption identically. The match folds case (as DNS and the host allowlist do)
and, for an IP-literal, collapses equivalent spellings to one canonical key (see
canonicalHostKey): the allowedInternal set was built with that same key, so an
opt-in matches a resolved or literal address whichever representation either uses.
hostAddress :: Text -> Text Source #
Extract the bare host from a URI or host[:port] authority.
A convenience for the SSRF gate: an outbound target is usually a full URL or an
authority, but isAllowedUpstreamHost and isBlockedTarget compare the bare
host. This strips a scheme:// prefix, any userinfo@, any :port suffix,
and any /path/?query/#fragment tail, lower-casing the result. It is a
pragmatic extractor for comparison, not a full RFC 3986 parser; a value with
no recognisable host yields the empty string, which both guards treat as
not-allowed. IPv6 literals in brackets ([::1]:443) are returned without the
brackets — the bracket-aware host[:port] split is splitHostPort, shared with
the SQS endpoint parser so the two cannot drift on an authority edge case; a
malformed authority (an opening bracket with no close) yields the empty string,
the same fail-safe the guards apply to it.
splitHostPort :: Text -> Maybe (Text, Text) Source #
Split a host[:port] authority into its bare host and the raw ":port"
remainder (empty when no port is present), bracket-aware so an IPv6 literal's
inner colons are never mistaken for the port separator.
The single canonical authority split feeding both the data-plane host extractor
(hostAddress) and the SQS endpoint parser (parseEndpointUrl),
so the two implementations that the [::1]:port edge cases once tripped up cannot
drift again. A […] IPv6 literal is split on its closing bracket: the host is returned
without the brackets and the remainder is whatever follows (a ":port" or empty),
so an inner :: is never read as the port separator; a bare authority is split on
its first ':'. An opening bracket with no close is a malformed authority and
yields Nothing, which hostAddress folds to the empty (not-allowed) host and the
endpoint parser surfaces as a malformed-URL boot error.
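A bracket-aware split along these lines might look like the following sketch. splitAuthority is a hypothetical name, and the sketch does not validate the port remainder; it only shows the bracket handling described here.

```haskell
import Data.Text (Text)
import qualified Data.Text as T

-- Split host[:port] into (bare host, raw ":port" remainder or empty).
splitAuthority :: Text -> Maybe (Text, Text)
splitAuthority authority =
  case T.stripPrefix (T.pack "[") authority of
    Just inner ->
      -- Bracketed IPv6 literal: split on the closing bracket, so the colons
      -- inside the literal are never taken for the port separator.
      let (host, rest) = T.breakOn (T.pack "]") inner
      in if T.null rest
           then Nothing                     -- '[' with no closing ']'
           else Just (host, T.drop 1 rest)  -- drop the ']' itself
    Nothing ->
      -- Bare authority: split on the first ':'.
      Just (T.breakOn (T.pack ":") authority)
```

Under these assumptions "[::1]:443" splits into the host "::1" and the remainder ":443", while "[::1" yields Nothing.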
Tarball-host policy
data TarballHostPolicy Source #
Whether a tarball may be fetched from a host that differs from the upstream that served the packument.
An upstream's dist.tarball is server-chosen data (see
docs/architecture/security.md → "Why dist.tarball is honoured"), so a
compromised or hostile upstream can name any host as the artifact location.
This policy bounds the third axis of that risk — where the bytes are fetched —
that the host allowlist and the resolved-IP block leave open: even an
allowlisted-but-different host is a wider fetch surface than the packument's own
source, and the safe reading of the allowlist is "same source unless told
otherwise".
Constructors
| SameHostAsPackument | The secure default: a tarball is fetched only from the same host
that served the packument; a |
| AnyAllowlistedHost | The opt-in: a tarball may be fetched from any allowlisted host (for a registry that legitimately serves artifacts from a separate CDN/files host). This widens the fetch surface to the whole allowlist; it never escapes it or the internal-range block. |
Instances
| Show TarballHostPolicy Source # | |
Defined in Ecluse.Security Methods showsPrec :: Int -> TarballHostPolicy -> ShowS # show :: TarballHostPolicy -> String # showList :: [TarballHostPolicy] -> ShowS # | |
| Eq TarballHostPolicy Source # | |
Defined in Ecluse.Security Methods (==) :: TarballHostPolicy -> TarballHostPolicy -> Bool # (/=) :: TarballHostPolicy -> TarballHostPolicy -> Bool # | |
data Origin Source #
The trust of the origin a dist.tarball is being served from, mirroring the
connection-layer trust split (see Ecluse.Security.Egress): the operator-configured
private upstream is TrustedOrigin, and the public upstream — together with every
artifact location an attacker could influence — is UntrustedOrigin.
The distinction governs the internal-range block alone. The trusted private
origin is deliberately exempt from it (a private registry may legitimately live on
an internal address, and only an untrusted target can be steered there), exactly as
the trusted origin's connections use the unguarded newTrustedTlsManager
while untrusted ones carry the resolved-IP recheck of
newGuardedTlsManager (security.md invariant 3). It never
relaxes the host allowlist or the same-host clause — those gate both origins
identically — so a trusted origin's dist.tarball is still constrained to its own
allowlisted host.
Constructors
| TrustedOrigin | The operator-configured private upstream: exempt from the internal-range block. |
| UntrustedOrigin | The public upstream, and any attacker-influenceable target: subject to the internal-range block (and the resolved-IP recheck at connect time). |
tarballHostAllowed Source #
Arguments
| :: Origin | |
| -> TarballHostPolicy | |
| -> LoweredHostSet | The host allowlist (the same one every outbound fetch is gated by). |
| -> LoweredHostSet | The hosts deliberately opted in to the internal-range block (untrusted origin). |
| -> Text | The bare host that served the packument. |
| -> Text | The bare host of the candidate |
| -> Bool |
Whether a dist.tarball host may be fetched, given the origin's trust, the
policy, the host that served the packument, and the configured guards.
This is the policy half of the dist.tarball defence; it never replaces the host
allowlist or the internal-range block but composes on top of them, so the
answer is the conjunction of three independent checks and over-blocking is the
fail-safe:
- the tarballHost must be on the host allowlist (allowed), as every outbound target is; a dist.tarball host off the allowlist is refused regardless of policy;
- it must not be an internal address (subject to the per-host allowedInternal opt-in), as every untrusted outbound target is; a TrustedOrigin, however, is exempt from this clause (its connections likewise carry no resolved-IP recheck; see Origin and security.md invariant 3); and
- under SameHostAsPackument (the secure default) it must additionally equal the packumentHost, the host that served the metadata, so a tarball on a different host is refused even when that host is allowlisted. Under AnyAllowlistedHost that last clause is relaxed, leaving only the allowlist and (origin-aware) internal-range checks.
The allowlist and same-host clauses gate both origins identically; only the
internal-range clause is origin-aware, so a TrustedOrigin is never let past its own
allowlisted host or onto a different host than its metadata under the default.
Hosts are compared by their canonical key (case-folded, and for an IP-literal the
single canonical literal — see canonicalHostKey), as the host guards are. An
empty tarballHost is never allowed (the allowlist already refuses it). The
packumentHost is the bare host the metadata was fetched from (extract it with
hostAddress); only its equality to tarballHost matters, so it need not itself
be re-validated here — it was already gated when the packument was fetched.
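The conjunction of the three clauses can be sketched as below. This is an assumption-laden illustration: tarballOk is a hypothetical name, the two data types are local copies declared only to keep the block self-contained, the allowlist and internal-range checks are passed in as predicates, and hosts are plain strings assumed to be already canonicalised.

```haskell
-- Local stand-ins mirroring the constructors documented above.
data Origin = TrustedOrigin | UntrustedOrigin
  deriving (Eq, Show)

data TarballHostPolicy = SameHostAsPackument | AnyAllowlistedHost
  deriving (Eq, Show)

tarballOk
  :: (String -> Bool)   -- is the host on the allowlist?
  -> (String -> Bool)   -- is the host a blocked internal address (after opt-ins)?
  -> Origin
  -> TarballHostPolicy
  -> String             -- bare host that served the packument
  -> String             -- bare host of the candidate tarball
  -> Bool
tarballOk onAllowlist isInternal origin policy packumentHost tarballHost =
  onAllowlist tarballHost && rangeOk && hostOk
  where
    -- Only the internal-range clause is origin-aware.
    rangeOk = origin == TrustedOrigin || not (isInternal tarballHost)
    -- The same-host clause is relaxed only by the explicit opt-in policy.
    hostOk  = policy == AnyAllowlistedHost || tarballHost == packumentHost
```

Each clause can only turn an acceptance into a refusal, which is what makes over-blocking the fail-safe direction.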
Identifier → URL safety
upstreamUrlFor :: Text -> PackageName -> Either UrlError Text Source #
Build an upstream URL for a package from a configured base URL and an
already-parsed PackageName.
This is the only sanctioned way to derive an upstream URL for a package: the
target is {baseUrl}/{path}, where path is built from the package's structural
components and baseUrl is configuration, never a client-supplied path. The
client never chooses the host or the path prefix — only which (validated) package
— so ../ traversal, an encoded slash, an absolute URL, or a CRLF in the original
request cannot steer the fetch elsewhere (see the module header).
The path is built with two complementary defences. First, although a PackageName
is normally produced by the router's already-safe parse, its smart constructor does
no validation, so this re-checks every structural component (scope and base
name) with the router's own isSafeComponent — a name carrying
a '/', '\\', control character, or a "."/".." component is refused
with UnsafeComponent rather than interpolated. Second, each accepted component is
then percent-encoded (encodeComponent) around the
structural '@' sigil and %2F scope separator this builder writes — so a
'%', '?', '#', or other reserved byte the denylist accepts (notably a
once-decoded %2e%2e%2f) cannot reach the upstream URL raw. A scoped
@scope/name therefore yields exactly one %2F (the separator written here, not
an encoding of a component), with no double-encoding. An empty baseUrl is refused
with EmptyBaseUrl. A single trailing slash on baseUrl is tolerated so the join
never doubles it.
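The two defences (re-check, then percent-encode around the structural separators) can be sketched as follows. Everything here is hypothetical: safeComponent only approximates the router's isSafeComponent from the description above, the package name is modelled as an optional scope plus a base name, and non-ASCII components are refused purely to keep the encoder short.

```haskell
import Data.Char (isAlphaNum, isAscii, isControl, ord, toUpper)
import Numeric (showHex)

data UrlError = UnsafeComponent String | EmptyBaseUrl
  deriving (Eq, Show)

-- Approximation of the component rule: no separators, no control characters,
-- no "." or ".." (and, as a simplification of this sketch, ASCII only).
safeComponent :: String -> Bool
safeComponent c =
  not (null c) && c `notElem` [".", ".."] && all isAscii c && not (any bad c)
  where bad ch = ch == '/' || ch == '\\' || isControl ch

-- Percent-encode everything outside the RFC 3986 unreserved set.
encodeComponent :: String -> String
encodeComponent = concatMap enc
  where
    enc ch
      | isUnreserved ch = [ch]
      | otherwise       = '%' : pad (map toUpper (showHex (ord ch) ""))
    pad h = if length h < 2 then '0' : h else h
    isUnreserved ch = (isAscii ch && isAlphaNum ch) || ch `elem` "-._~"

-- Build {baseUrl}/{path} from configuration plus validated components.
upstreamUrl :: String -> Maybe String -> String -> Either UrlError String
upstreamUrl baseUrl scope name
  | null baseUrl = Left EmptyBaseUrl
  | otherwise = do
      mapM_ check (maybe [] pure scope ++ [name])
      let path = case scope of
            Just s  -> "@" ++ encodeComponent s ++ "%2F" ++ encodeComponent name
            Nothing -> encodeComponent name
      Right (stripSlash baseUrl ++ "/" ++ path)
  where
    check c = if safeComponent c then Right () else Left (UnsafeComponent c)
    -- Tolerate a single trailing slash so the join never doubles it.
    stripSlash u = if last u == '/' then init u else u
```

The only %2F in the result is the separator the builder writes itself; a '%' arriving inside a component is encoded to %25, so an already-encoded sequence cannot be decoded into a path separator further upstream.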
data UrlError Source #
Why building an upstream URL from an identifier was refused.
Constructors
| UnsafeComponent Text | A name component (scope or base name) is unsafe to interpolate — see
|
| EmptyBaseUrl | The configured base URL is empty, so no URL can be formed. |
Response bounds
data Limits Source #
Resource budget for a single upstream response. Every field is a hard
ceiling enforced fail-closed: exceeding one aborts with a LimitError rather
than returning a truncated or partially-parsed result. These bound the
algorithmic-complexity DoS a hostile or compromised upstream can inflict by
returning a huge or pathological document.
Constructors
| Limits | |
Fields
| |
defaultLimits :: Limits Source #
Sane defaults for Limits. Generous enough for real registry documents and
tight enough to fail closed on pathological input: a 16 MiB metadata body, 100k
versions, and 64 levels of JSON nesting. Override per deployment as needed.
data LimitError Source #
Which Limits ceiling a response exceeded.
Constructors
| BodyTooLarge Int | The body exceeded |
| TooManyVersions Int Int | The packument carried more than |
| TooDeeplyNested Int | JSON nesting exceeded |
Instances
| Show LimitError Source # | |
Defined in Ecluse.Security Methods showsPrec :: Int -> LimitError -> ShowS # show :: LimitError -> String # showList :: [LimitError] -> ShowS # | |
| Eq LimitError Source # | |
Defined in Ecluse.Security | |
boundedRead :: Monad m => Limits -> m ByteString -> m (Either LimitError ByteString) Source #
Read a streamed body chunk-by-chunk, aborting as soon as the accumulated
size would exceed maxBodyBytes. Polymorphic over the producing monad so the
streaming fetch can run it in IO while tests drive it purely.
readChunk is a chunk producer following the http-client BodyReader contract:
each call yields the next chunk, and an empty ByteString signals end of
input. boundedRead pulls chunks until EOF and returns the concatenated body, or
stops at the first chunk that pushes the running total past maxBodyBytes and
returns Left (BodyTooLarge …): fail-closed, never a truncated body. A
zero or negative maxBodyBytes rejects any non-empty body. The bound is checked
before a chunk is retained, so memory never exceeds the limit plus one chunk.
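The chunk loop can be sketched as below. boundedReadSketch is a hypothetical name; it takes the byte cap directly rather than a Limits record, and the Int carried by BodyTooLarge is assumed here to be the cap, which the source does not confirm.

```haskell
import Data.ByteString (ByteString)
import qualified Data.ByteString as BS

-- Local stand-in; the payload is assumed to be the configured cap.
data LimitError = BodyTooLarge Int
  deriving (Eq, Show)

-- Pull chunks until an empty one signals EOF, refusing as soon as the running
-- total would pass the cap. Works in any monad that can produce chunks.
boundedReadSketch
  :: Monad m => Int -> m ByteString -> m (Either LimitError ByteString)
boundedReadSketch maxBytes readChunk = go 0 []
  where
    go total acc = do
      chunk <- readChunk
      let total' = total + BS.length chunk
      if BS.null chunk
        then pure (Right (BS.concat (reverse acc)))   -- EOF: return the body
        else if total' > maxBytes
          then pure (Left (BodyTooLarge maxBytes))    -- checked before retaining
          else go total' (chunk : acc)
```

Because the only constraint is Monad m, the same loop can run against a socket in IO or against a canned list of chunks in a pure state monad.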
checkVersionCount :: Limits -> PackageInfo -> Either LimitError PackageInfo Source #
Reject a parsed packument carrying more than maxVersionCount versions,
returning it unchanged when within budget.
Applied after a document is projected to PackageInfo but before
per-version rule evaluation, so the cost of evaluating rules over every version is
bounded by configuration rather than by what an upstream returns. Counts the
infoVersions map; on breach returns Left (TooManyVersions count cap), otherwise
the document unchanged so it threads through a parse pipeline.
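A minimal sketch of the count check, assuming the versions are held in a Map. checkCount is a hypothetical name that takes the cap and the map directly instead of Limits and PackageInfo.

```haskell
import Data.Map.Strict (Map)
import qualified Data.Map.Strict as Map

-- Local stand-in: observed count first, then the configured cap.
data LimitError = TooManyVersions Int Int
  deriving (Eq, Show)

-- Return the map unchanged when within budget so the check can sit in an
-- Either pipeline between parsing and rule evaluation.
checkCount :: Int -> Map String v -> Either LimitError (Map String v)
checkCount cap versions
  | count > cap = Left (TooManyVersions count cap)
  | otherwise   = Right versions
  where count = Map.size versions
```

Returning the input on success lets the guard be chained with >>= ahead of any per-version work.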
checkNestingDepth :: Limits -> Value -> Either LimitError Value Source #
Reject a decoded JSON document nested deeper than maxNestingDepth,
returning it unchanged when within budget.
Run on the already-decoded Value — after the parser has produced it, before
the document is projected to domain types — so a pathologically nested payload is
refused before any deep domain traversal. It is therefore not the defence
against an unbounded structure: the structure is already bounded-by-body-size by
the time it reaches here, since the maxBodyBytes cap on the streamed read precedes
the decode (a body the parser never finishes reading never produces a Value). This
guard bounds the traversal cost of a within-size-but-deeply-nested document — the
stack/CPU a recursive walk of it would spend — which the body cap alone does not
bound (a small body can still nest deeply). Depth counts container nesting: a scalar
is depth 1, and each enclosing Object/Array adds one. An empty container
counts as a leaf (depth 1), since it forces no descent. Traversal short-circuits at
the first sub-tree to breach the ceiling, so a deeply-nested branch costs no more than
the ceiling to reject.
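The depth rule and its short-circuit can be sketched on a small stand-in JSON type. Json, fitsDepth, and checkDepth are hypothetical (the real function operates on a decoded Value), and the Int carried by TooDeeplyNested is assumed to be the cap.

```haskell
-- Minimal stand-in for a decoded JSON value.
data Json
  = JNull | JBool Bool | JNum Double | JStr String
  | JArr [Json] | JObj [(String, Json)]
  deriving (Eq, Show)

-- Local stand-in; the payload is assumed to be the configured cap.
data LimitError = TooDeeplyNested Int
  deriving (Eq, Show)

-- Does the value fit within the given depth budget? A scalar or an empty
-- container needs a budget of 1; each non-empty container spends one level.
fitsDepth :: Int -> Json -> Bool
fitsDepth budget v
  | budget < 1 = False
  | otherwise = case v of
      JArr xs  | not (null xs)  -> all (fitsDepth (budget - 1)) xs
      JObj kvs | not (null kvs) -> all (fitsDepth (budget - 1) . snd) kvs
      _                         -> True

checkDepth :: Int -> Json -> Either LimitError Json
checkDepth cap v
  | fitsDepth cap v = Right v
  | otherwise       = Left (TooDeeplyNested cap)
```

The recursion carries the remaining budget downward, so it never descends further than the cap, and all stops at the first child that breaches it.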