ecluse
Safe Haskell: None
Language: GHC2021

Ecluse.Security

Description

Outbound-request and response-bound guards for the proxy's data plane.

Écluse builds outbound HTTP requests from two untrusted sources: client-supplied package identifiers (the request path) and upstream-supplied artifact locations (a packument's dist.tarball). It then parses whatever an upstream returns. This module is the pure guard layer that keeps those steps from being steered or exhausted by hostile input. It defends three boundaries:

  • Where the proxy fetches. isAllowedUpstreamHost restricts outbound fetches to the configured upstream hosts, and isBlockedTarget rejects internal address ranges (cloud instance metadata, loopback, RFC1918) that the proxy's network position can otherwise reach. Together they are the SSRF gate: a target must be both on the allowlist and not an internal address.
  • How an upstream URL is derived. upstreamUrlFor builds an artifact/metadata URL from a configured base URL and an already-parsed PackageName, never from raw client path segments, re-checking each name component with the router's own safety rule so traversal, encoded slashes, or an absolute URL cannot change the target.
  • How much an upstream may cost. A Limits budget plus boundedRead (abort a streamed body past maxBodyBytes) and checkVersionCount / checkNestingDepth (reject an oversized or deeply-nested parsed document) bound algorithmic-complexity DoS from a hostile or compromised upstream. Every limit fails closed: exceeding one yields Left, never a truncated or partial result.

The functions are pure and total; the streamed-body guard (boundedRead) is polymorphic over the producing monad so the streaming data plane can run it in IO while tests drive it purely. They are primitives: the fetch and serve layers compose them at the boundary (see docs/architecture/registry-model.md → "Registry Abstraction", docs/architecture/web-layer.md, and docs/architecture/hosting.md → "URL rewriting"). Path-component safety is shared with the router's Ecluse.Server.Route (isSafeComponent); the threat model these guards answer is recorded there too.


Outbound host allowlist

data LoweredHostSet

A set of host strings normalised to the canonical key form that the host guards (isAllowedUpstreamHost and isBlockedTarget) compare against: lower case, with IP literals rendered in one canonical spelling.

The type is opaque and lowerCaseHosts is its only constructor, so a value of this type is evidence that every host in it is already normalised. The guards therefore fold only the incoming host, and an un-normalised configuration set cannot bypass the case-insensitive match.

Instances

Show LoweredHostSet

Eq LoweredHostSet

lowerCaseHosts :: Set Text -> LoweredHostSet

Normalise a set of configured host strings to the canonical key form the host guards take, yielding a LoweredHostSet.

A plain DNS name is folded to lower case (hostnames are case-insensitive), so the guards match an incoming host against the configuration regardless of how either was spelled. An entry that parses as an IP literal is additionally rendered to the single canonical literal the resolved-address recheck produces (see canonicalHostKey), so equivalent spellings of one address (compressed versus expanded IPv6, differing case) collapse to one key. An opt-in entry of 0:0:0:0:0:0:0:1 therefore matches a resolved ::1 rather than missing it on a textual difference.

isAllowedUpstreamHost :: LoweredHostSet -> Text -> Bool

Whether host is one of the configured upstream hosts.

The first guard on every outbound fetch: the proxy talks to its configured private/public upstreams and mirror target, and nothing else — so a target host derived from a packument's dist.tarball (or anywhere else) is fetched only if it appears in allowed. The match is exact on the bare host (no port, no scheme — extract it with hostAddress first) and case-insensitive, since DNS hostnames are; an empty host is never allowed. This is the allowlist half of the SSRF gate; pair it with isBlockedTarget for the internal-range half.

The allowlist is a LoweredHostSet, so it is already normalised and only the incoming host is folded here — through the same canonicalHostKey the set was built with, so an IP-literal entry matches regardless of how either side spells the address.
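A minimal sketch of the opaque-set pattern and the allowlist check, assuming the set wraps a Data.Set of Text. The canonicalKey helper is hypothetical and does case folding only; it omits the IP-literal canonicalisation that canonicalHostKey performs. None of this is the module's actual code.

```haskell
{-# LANGUAGE OverloadedStrings #-}

import Data.Set (Set)
import qualified Data.Set as Set
import Data.Text (Text)
import qualified Data.Text as T

-- Stand-in for the opaque type. A real module would export the type but
-- not this data constructor, so lowerCaseHosts is the only way to build one.
newtype LoweredHostSet = LoweredHostSet (Set Text)
  deriving (Show, Eq)

-- Hypothetical canonical key: case folding only (no IP-literal rendering).
canonicalKey :: Text -> Text
canonicalKey = T.toLower

-- Smart constructor: normalise every configured entry once, up front.
lowerCaseHosts :: Set Text -> LoweredHostSet
lowerCaseHosts = LoweredHostSet . Set.map canonicalKey

-- Exact match on the bare host, folding only the incoming side; an empty
-- host is refused even if the configuration happens to contain "".
isAllowedUpstreamHost :: LoweredHostSet -> Text -> Bool
isAllowedUpstreamHost (LoweredHostSet allowed) host =
  not (T.null host) && Set.member (canonicalKey host) allowed
```

Because normalisation happens once in lowerCaseHosts, each check folds only the incoming host: a configured Registry.NPMJS.org still matches a request for registry.npmjs.org, while registry.npmjs.org:443 does not, since the guard expects a bare host.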

Internal-range block

isBlockedTarget :: LoweredHostSet -> Text -> Bool

Whether host is an internal address the proxy must not fetch, unless it is explicitly opted in.

A proxy sits in a privileged network position, so an attacker who can steer a fetch (see the module header) aims it at addresses only the proxy can reach: the cloud instance-metadata endpoint (169.254.169.254), loopback, or the private network (RFC1918). This check parses host as a literal IP and blocks it if the address falls in any of these ranges:

  • link-local 169.254.0.0/16 (which contains the 169.254.169.254 metadata address) and IPv6 fe80::/10;
  • loopback 127.0.0.0/8 and IPv6 ::1;
  • unspecified / this-host 0.0.0.0/8 and IPv6 :: (a connect to 0.0.0.0 is not a no-op: on Linux it reaches a loopback-bound service, so the unspecified address is a loopback equivalent that must be blocked alongside 127.0.0.0/8);
  • RFC1918 private 10.0.0.0/8, 172.16.0.0/12, and 192.168.0.0/16;
  • CGNAT shared 100.64.0.0/10 (RFC 6598) — carrier-grade NAT space some cloud fabrics route internally;
  • IPv6 unique-local fc00::/7 (RFC 4193) — the private-network IPv6 analogue, which contains the AWS IMDSv6 metadata endpoint fd00:ec2::254.

A host in allowedInternal is never blocked (matched case-insensitively, as DNS and the host allowlist are) — the deliberate opt-in for a private upstream that genuinely lives on an internal address. As a LoweredHostSet it is already lower-cased, so only the incoming host is folded for the comparison. A host that is not an IP literal (a DNS name) is not blocked here: name-based targets are constrained by the isAllowedUpstreamHost allowlist instead, and post-resolution IP filtering belongs to the resolving fetch layer, not this pure check. Both guards apply — an allowlisted host that resolves to an internal literal is still caught when its address is tested here.
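For IPv4 literals, the range test can be sketched as a list of CIDR blocks checked with a prefix mask. This is an illustration under stated assumptions: parseIPv4, blockedV4, inCidr and isBlockedTargetV4 are hypothetical names, IPv6 is omitted, and the opt-in set is a plain lower-cased Set Text standing in for LoweredHostSet.

```haskell
{-# LANGUAGE OverloadedStrings #-}

import Data.Bits (shiftL, (.&.), (.|.))
import Data.Set (Set)
import qualified Data.Set as Set
import Data.Text (Text)
import qualified Data.Text as T
import qualified Data.Text.Read as TR
import Data.Word (Word32)

-- Hypothetical dotted-quad parser (IPv4 only; a real guard parses IPv6 too).
parseIPv4 :: Text -> Maybe Word32
parseIPv4 t = case T.splitOn "." t of
  parts@[_, _, _, _] -> foldl (\acc o -> (acc `shiftL` 8) .|. o) 0 <$> traverse octet parts
  _ -> Nothing
  where
    octet p = case TR.decimal p of
      Right (n, rest) | T.null rest, n <= (255 :: Integer) -> Just (fromIntegral n)
      _ -> Nothing

-- The IPv4 ranges from the list above, as (network, prefix length).
blockedV4 :: [(Word32, Int)]
blockedV4 =
  [ (quad 169 254 0 0, 16) -- link-local, contains 169.254.169.254
  , (quad 127 0 0 0, 8) -- loopback
  , (quad 0 0 0 0, 8) -- unspecified / this-host
  , (quad 10 0 0 0, 8) -- RFC1918
  , (quad 172 16 0 0, 12) -- RFC1918
  , (quad 192 168 0 0, 16) -- RFC1918
  , (quad 100 64 0 0, 10) -- CGNAT (RFC 6598)
  ]
  where
    quad a b c d = (a `shiftL` 24) .|. (b `shiftL` 16) .|. (c `shiftL` 8) .|. d

-- An address is inside a block when its top `len` bits equal the network's.
inCidr :: Word32 -> (Word32, Int) -> Bool
inCidr addr (net, len) = addr .&. mask == net .&. mask
  where
    mask = if len == 0 then 0 else maxBound `shiftL` (32 - len)

-- Opted-in hosts (assumed already lower-cased) are never blocked; a DNS name
-- is left to the allowlist; an IPv4 literal is blocked if any range holds it.
isBlockedTargetV4 :: Set Text -> Text -> Bool
isBlockedTargetV4 optedIn host
  | T.toLower host `Set.member` optedIn = False
  | otherwise = maybe False (\addr -> any (inCidr addr) blockedV4) (parseIPv4 host)
```

Note that 172.32.0.1 falls outside 172.16.0.0/12: the /12 mask keeps only the top 12 bits, so the second octet must lie in 16 to 31 for the address to match.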

isBlockedIP :: IP -> Bool

Whether an IP falls in a blocked internal range.

The single source of record for the internal-range decision, shared by the literal block here (isBlockedTarget) and the resolved-address recheck in Ecluse.Security.Egress so both gate against identical ranges. An IPv4-mapped IPv6 address (::ffff:a.b.c.d) is first decoded to its embedded IPv4 and tested against the IPv4 ranges: a mapped internal literal (e.g. ::ffff:169.254.169.254) is a recognised SSRF smuggling form, so it must be caught by the IPv4 block rather than slip through as an unrelated IPv6 address.
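The mapped-address decoding can be sketched with a hypothetical IP type that stores an IPv6 address as eight 16-bit groups (the module's real IP type and its parser are not shown on this page). The mapped form is decoded first and re-tested against the IPv4 ranges, so ::ffff:169.254.169.254 is caught by the link-local block; the remaining guards are the IPv6 ranges listed under isBlockedTarget.

```haskell
import Data.Bits (shiftL, (.&.), (.|.))
import Data.Word (Word16, Word32)

-- Hypothetical address type: IPv4 as a Word32, IPv6 as its eight 16-bit
-- groups, most significant first.
data IP = IPv4 Word32 | IPv6 [Word16]
  deriving (Show, Eq)

-- ::ffff:a.b.c.d carries an IPv4 address in its low 32 bits.
mappedIPv4 :: [Word16] -> Maybe Word32
mappedIPv4 [0, 0, 0, 0, 0, 0xffff, hi, lo] =
  Just ((fromIntegral hi `shiftL` 16) .|. fromIntegral lo)
mappedIPv4 _ = Nothing

inCidr4 :: Word32 -> Word32 -> Int -> Bool
inCidr4 addr net len = addr .&. mask == net .&. mask
  where
    mask = if len == 0 then 0 else maxBound `shiftL` (32 - len)

isBlockedIP :: IP -> Bool
isBlockedIP (IPv4 a) =
  any
    (\(net, len) -> inCidr4 a net len)
    [ (0xA9FE0000, 16) -- 169.254.0.0/16 link-local (metadata)
    , (0x7F000000, 8) -- 127.0.0.0/8 loopback
    , (0x00000000, 8) -- 0.0.0.0/8 this-host
    , (0x0A000000, 8) -- 10.0.0.0/8
    , (0xAC100000, 12) -- 172.16.0.0/12
    , (0xC0A80000, 16) -- 192.168.0.0/16
    , (0x64400000, 10) -- 100.64.0.0/10 CGNAT
    ]
isBlockedIP (IPv6 gs)
  | Just a <- mappedIPv4 gs = isBlockedIP (IPv4 a) -- decode, then test as IPv4
  | gs == [0, 0, 0, 0, 0, 0, 0, 1] = True -- ::1 loopback
  | gs == replicate 8 0 = True -- :: unspecified
  | (g : _) <- gs, g .&. 0xffc0 == 0xfe80 = True -- fe80::/10 link-local
  | (g : _) <- gs, g .&. 0xfe00 == 0xfc00 = True -- fc00::/7 unique-local
  | otherwise = False
```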

hostOptedIn :: LoweredHostSet -> Text -> Bool

Whether host is opted in to the internal-range block — the deliberate exemption for a private upstream that genuinely lives on an internal address.

The opt-in half of isBlockedTarget, shared with the resolved-address recheck in Ecluse.Security.Egress so the literal block and the connection-time block honour the exemption identically. The match folds case (as DNS and the host allowlist do) and, for an IP-literal, collapses equivalent spellings to one canonical key (see canonicalHostKey): the allowedInternal set was built with that same key, so an opt-in matches a resolved or literal address whichever representation either uses.

hostAddress :: Text -> Text

Extract the bare host from a URI or host[:port] authority.

A convenience for the SSRF gate: an outbound target is usually a full URL or an authority, but isAllowedUpstreamHost and isBlockedTarget compare the bare host. This strips a scheme:// prefix, any userinfo@, any :port suffix, and any /path/?query/#fragment tail, lower-casing the result. It is a pragmatic extractor for comparison, not a full RFC 3986 parser; a value with no recognisable host yields the empty string, which both guards treat as not-allowed. IPv6 literals in brackets ([::1]:443) are returned without the brackets — the bracket-aware host[:port] split is splitHostPort, shared with the SQS endpoint parser so the two cannot drift on an authority edge case; a malformed authority (an opening bracket with no close) yields the empty string, the same fail-safe the guards apply to it.

splitHostPort :: Text -> Maybe (Text, Text)

Split a host[:port] authority into its bare host and the raw ":port" remainder (empty when no port is present), bracket-aware so an IPv6 literal's inner colons are never mistaken for the port separator.

The single canonical authority split behind both the data-plane host extractor (hostAddress) and the SQS endpoint parser (parseEndpointUrl), so the two cannot drift apart again on the [::1]:port edge cases that previously tripped up separate implementations. A […] IPv6 literal is split on its closing bracket: the host is returned without the brackets and the remainder is whatever follows (a ":port" or empty), so an inner :: is never read as the port separator. A bare authority is split on its first :. An opening bracket with no close is a malformed authority and yields Nothing, which hostAddress folds to the empty (not-allowed) host and the endpoint parser surfaces as a malformed-URL boot error.
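The bracket-aware split, and the hostAddress extractor built on it, might look roughly like this. Both are sketches under assumptions: the scheme, userinfo and tail handling below is one plausible reading of the prose above, not the module's source.

```haskell
{-# LANGUAGE OverloadedStrings #-}

import Data.Text (Text)
import qualified Data.Text as T

-- Bracket-aware split of "host[:port]" into (host, ":port" or "").
splitHostPort :: Text -> Maybe (Text, Text)
splitHostPort authority = case T.uncons authority of
  Just ('[', rest) ->
    let (inner, close) = T.breakOn "]" rest
     in if T.null close
          then Nothing -- "[" with no "]": malformed
          else Just (inner, T.drop 1 close) -- host without brackets, then the rest
  _ -> Just (T.breakOn ":" authority) -- bare authority: split on the first ':'

-- Pragmatic extractor, not an RFC 3986 parser: drop the scheme, the
-- path/query/fragment tail, any userinfo and any port, then lower-case.
hostAddress :: Text -> Text
hostAddress url = maybe "" (T.toLower . fst) (splitHostPort hostPort)
  where
    afterScheme = case T.breakOn "://" url of
      (scheme, rest)
        | not (T.null rest), not (T.any (== '/') scheme) -> T.drop 3 rest
      _ -> url
    authority = T.takeWhile (`notElem` ("/?#" :: String)) afterScheme
    -- userinfo ends at the last '@'; with no '@' this keeps the whole authority
    hostPort = snd (T.breakOnEnd "@" authority)
```

Routing every failure to the empty string keeps hostAddress total while leaving the refusal to the guards, which already treat an empty host as not allowed.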

Tarball-host policy

data TarballHostPolicy

Whether a tarball may be fetched from a host that differs from the upstream that served the packument.

An upstream's dist.tarball is server-chosen data (see docs/architecture/security.md → "Why dist.tarball is honoured"), so a compromised or hostile upstream can name any host as the artifact location. This policy bounds the third axis of that risk — where the bytes are fetched — that the host allowlist and the resolved-IP block leave open: even an allowlisted-but-different host is a wider fetch surface than the packument's own source, and the safe reading of the allowlist is "same source unless told otherwise".

Constructors

SameHostAsPackument

The secure default: a tarball is fetched only from the same host that served the packument; a dist.tarball on any other host is refused, even one otherwise on the allowlist.

AnyAllowlistedHost

The opt-in: a tarball may be fetched from any allowlisted host (for a registry that legitimately serves artifacts from a separate CDN/files host). This widens the fetch surface to the whole allowlist; it never escapes it or the internal-range block.

data Origin

The trust of the origin a dist.tarball is being served from, mirroring the connection-layer trust split (see Ecluse.Security.Egress): the operator-configured private upstream is TrustedOrigin, and the public upstream — together with every artifact location an attacker could influence — is UntrustedOrigin.

The distinction governs the internal-range block alone. The trusted private origin is deliberately exempt from it (a private registry may legitimately live on an internal address, and only an untrusted target can be steered there), exactly as the trusted origin's connections use the unguarded newTrustedTlsManager while untrusted ones carry the resolved-IP recheck of newGuardedTlsManager (security.md invariant 3). It never relaxes the host allowlist or the same-host clause — those gate both origins identically — so a trusted origin's dist.tarball is still constrained to its own allowlisted host.

Constructors

TrustedOrigin

The operator-configured private upstream: exempt from the internal-range block.

UntrustedOrigin

The public upstream, and any attacker-influenceable target: subject to the internal-range block (and the resolved-IP recheck at connect time).

Instances

Show Origin

Eq Origin

Methods

(==) :: Origin -> Origin -> Bool

(/=) :: Origin -> Origin -> Bool

tarballHostAllowed

Arguments

:: Origin 
-> TarballHostPolicy 
-> LoweredHostSet

The host allowlist (the same one every outbound fetch is gated by).

-> LoweredHostSet

The hosts deliberately opted in to the internal-range block (untrusted origin).

-> Text

The bare host that served the packument.

-> Text

The bare host of the candidate dist.tarball.

-> Bool 

Whether a dist.tarball host may be fetched, given the origin's trust, the policy, the host that served the packument, and the configured guards.

This is the policy half of the dist.tarball defence; it never replaces the host allowlist or the internal-range block but composes on top of them, so the answer is the conjunction of three independent checks and over-blocking is the fail-safe:

  • the tarballHost must be on the host allowlist (allowed), as every outbound target is — a dist.tarball host off the allowlist is refused regardless of policy;
  • it must not be an internal address (subject to the per-host allowedInternal opt-in), as every untrusted outbound target is — but a TrustedOrigin is exempt from this clause (its connections likewise carry no resolved-IP recheck; see Origin and security.md invariant 3); and
  • under SameHostAsPackument (the secure default) it must additionally equal the packumentHost — the host that served the metadata — so a tarball on a different host is refused even when that host is allowlisted. Under AnyAllowlistedHost that last clause is relaxed, leaving only the allowlist and (origin-aware) internal-range checks.

The allowlist and same-host clauses gate both origins identically; only the internal-range clause is origin-aware, so a TrustedOrigin is never let past its own allowlisted host or onto a different host than its metadata under the default.

Hosts are compared by their canonical key (case-folded, and for an IP-literal the single canonical literal — see canonicalHostKey), as the host guards are. An empty tarballHost is never allowed (the allowlist already refuses it). The packumentHost is the bare host the metadata was fetched from (extract it with hostAddress); only its equality to tarballHost matters, so it need not itself be re-validated here — it was already gated when the packument was fetched.
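The three-way conjunction can be sketched as follows. To keep the example self-contained, the allowlist and internal-range guards are passed in as predicates (a hypothetical simplification of the two LoweredHostSet arguments, with the per-host opt-in assumed folded into the internal-range predicate), and hosts are compared by case folding only.

```haskell
{-# LANGUAGE OverloadedStrings #-}

import Data.Text (Text)
import qualified Data.Text as T

data TarballHostPolicy = SameHostAsPackument | AnyAllowlistedHost
  deriving (Show, Eq)

data Origin = TrustedOrigin | UntrustedOrigin
  deriving (Show, Eq)

-- Hypothetical simplification: guards arrive as predicates, and host keys
-- are compared by case folding only.
tarballHostAllowedSketch ::
  Origin ->
  TarballHostPolicy ->
  (Text -> Bool) -> -- is the host on the allowlist?
  (Text -> Bool) -> -- is the host a blocked internal address (after opt-ins)?
  Text -> -- bare host that served the packument
  Text -> -- bare host of the candidate dist.tarball
  Bool
tarballHostAllowedSketch origin policy onAllowlist isBlocked packumentHost tarballHost =
  -- clause 1 (both origins): the allowlist
  onAllowlist tarballHost
    -- clause 2 (origin-aware): the internal-range block
    && (origin == TrustedOrigin || not (isBlocked tarballHost))
    -- clause 3 (both origins): same host unless the policy relaxes it
    && (policy == AnyAllowlistedHost || key tarballHost == key packumentHost)
  where
    key = T.toLower
```

Only the middle clause mentions the origin, which is why a TrustedOrigin can reach an internal address on its own host but can never switch hosts under SameHostAsPackument.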

Identifier → URL safety

upstreamUrlFor :: Text -> PackageName -> Either UrlError Text

Build an upstream URL for a package from a configured base URL and an already-parsed PackageName.

This is the only sanctioned way to derive an upstream URL for a package: the target is {baseUrl}/{path}, where path is built from the package's structural components and baseUrl is configuration, never a client-supplied path. The client never chooses the host or the path prefix — only which (validated) package — so ../ traversal, an encoded slash, an absolute URL, or a CRLF in the original request cannot steer the fetch elsewhere (see the module header).

The path is built with two complementary defences. First, although a PackageName is normally produced by the router's already-safe parse, its smart constructor does no validation, so this re-checks every structural component (scope and base name) with the router's own isSafeComponent — a name carrying a '/', '\\', control character, or a "."/".." component is refused with UnsafeComponent rather than interpolated. Second, each accepted component is then percent-encoded (encodeComponent) around the structural '@' sigil and %2F scope separator this builder writes — so a '%', '?', '#', or other reserved byte the denylist accepts (notably a once-decoded %2e%2e%2f) cannot reach the upstream URL raw. A scoped @scope/name therefore yields exactly one %2F (the separator written here, not an encoding of a component), with no double-encoding. An empty baseUrl is refused with EmptyBaseUrl. A single trailing slash on baseUrl is tolerated so the join never doubles it.
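A sketch of the two defences in order (denylist, then percent-encoding), assuming a PackageName that stores the scope without its '@' sigil. The PackageName record, the isSafeComponent approximation, and encodeComponent's use of the RFC 3986 unreserved set are assumptions, not the module's definitions.

```haskell
{-# LANGUAGE OverloadedStrings #-}

import qualified Data.ByteString as BS
import Data.Char (chr, isAsciiLower, isAsciiUpper, isControl, isDigit)
import Data.Maybe (fromMaybe)
import Data.Text (Text)
import qualified Data.Text as T
import qualified Data.Text.Encoding as TE
import Data.Word (Word8)
import Text.Printf (printf)

-- Hypothetical stand-ins for the documented types.
data PackageName = PackageName
  { pkgScope :: Maybe Text -- scope without the '@' sigil
  , pkgBase :: Text
  }

data UrlError = UnsafeComponent Text | EmptyBaseUrl
  deriving (Show, Eq)

-- Assumed approximation of the router's isSafeComponent denylist.
isSafeComponent :: Text -> Bool
isSafeComponent c =
  not (T.null c) && c /= "." && c /= ".."
    && not (T.any (\ch -> ch == '/' || ch == '\\' || isControl ch) c)

-- Percent-encode every UTF-8 byte outside the RFC 3986 unreserved set.
encodeComponent :: Text -> Text
encodeComponent = T.pack . concatMap enc . BS.unpack . TE.encodeUtf8
  where
    enc :: Word8 -> String
    enc w
      | unreserved ch = [ch]
      | otherwise = printf "%%%02X" w
      where
        ch = chr (fromIntegral w)
    unreserved ch =
      isAsciiLower ch || isAsciiUpper ch || isDigit ch || ch `elem` ("-._~" :: String)

upstreamUrlFor :: Text -> PackageName -> Either UrlError Text
upstreamUrlFor baseUrl (PackageName scope base)
  | T.null baseUrl = Left EmptyBaseUrl
  | (bad : _) <- filter (not . isSafeComponent) (maybe [] pure scope ++ [base]) =
      Left (UnsafeComponent bad)
  | otherwise = Right (root <> "/" <> path)
  where
    root = fromMaybe baseUrl (T.stripSuffix "/" baseUrl) -- tolerate one trailing '/'
    -- The '@' sigil and the %2F separator are written here, outside the encoding.
    path = case scope of
      Just s -> "@" <> encodeComponent s <> "%2F" <> encodeComponent base
      Nothing -> encodeComponent base
```

The once-decoded %2e%2e%2f case shows why the encoding step matters: it passes the denylist, but its '%' bytes are escaped to %25, so the upstream never receives a raw traversal sequence.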

data UrlError

Why building an upstream URL from an identifier was refused.

Constructors

UnsafeComponent Text

A name component (scope or base name) is unsafe to interpolate — see isSafeComponent. Carries the offending component.

EmptyBaseUrl

The configured base URL is empty, so no URL can be formed.

Instances

Show UrlError

Eq UrlError

Response bounds

data Limits

Resource budget for a single upstream response. Every field is a hard ceiling enforced fail-closed: exceeding one aborts with a LimitError rather than returning a truncated or partially-parsed result. These bound the algorithmic-complexity DoS a hostile or compromised upstream can inflict by returning a huge or pathological document.

Constructors

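Limits

Fields

maxBodyBytes: ceiling on a streamed response body, in bytes (enforced by boundedRead).

maxVersionCount: ceiling on the number of versions in a parsed packument (enforced by checkVersionCount).

maxNestingDepth: ceiling on JSON container nesting depth (enforced by checkNestingDepth).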

Instances

Show Limits

Eq Limits

Methods

(==) :: Limits -> Limits -> Bool

(/=) :: Limits -> Limits -> Bool

defaultLimits :: Limits

Sane defaults for Limits. Generous enough for real registry documents and tight enough to fail closed on pathological input: a 16 MiB metadata body, 100k versions, and 64 levels of JSON nesting. Override per deployment as needed.
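The defaults can be written out as a record to make the arithmetic concrete. The record shape below is a hypothetical reconstruction from the field names used elsewhere on this page, and the Int field types are an assumption.

```haskell
-- Hypothetical record shape: field names follow the prose; Int is assumed.
data Limits = Limits
  { maxBodyBytes :: Int -- bytes in a streamed body
  , maxVersionCount :: Int -- versions in one packument
  , maxNestingDepth :: Int -- JSON container nesting levels
  }
  deriving (Show, Eq)

defaultLimits :: Limits
defaultLimits =
  Limits
    { maxBodyBytes = 16 * 1024 * 1024 -- 16 MiB = 16,777,216 bytes
    , maxVersionCount = 100000 -- 100k versions
    , maxNestingDepth = 64
    }

-- A per-deployment override changes one ceiling and keeps the rest.
strictLimits :: Limits
strictLimits = defaultLimits {maxBodyBytes = 4 * 1024 * 1024}
```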

data LimitError

Which Limits ceiling a response exceeded.

Constructors

BodyTooLarge Int

The body exceeded maxBodyBytes; carries the configured ceiling.

TooManyVersions Int Int

The packument carried more than maxVersionCount versions; carries the count seen and the ceiling.

TooDeeplyNested Int

JSON nesting exceeded maxNestingDepth; carries the ceiling.

Instances

Show LimitError

Eq LimitError

boundedRead :: Monad m => Limits -> m ByteString -> m (Either LimitError ByteString)

Read a streamed body chunk-by-chunk, aborting as soon as the accumulated size would exceed maxBodyBytes. Polymorphic over the producing monad so the streaming fetch can run it in IO while tests drive it purely.

readChunk is a chunk producer following the http-client BodyReader contract: each call yields the next chunk, and an empty ByteString signals end of input. boundedRead pulls chunks until EOF and returns the concatenated body, or stops at the first chunk that pushes the running total past maxBodyBytes and returns Left (BodyTooLarge …). It fails closed and never returns a truncated body. A zero or negative maxBodyBytes rejects any non-empty body. The bound is checked before a chunk is retained, so memory never exceeds the limit plus one chunk.
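A sketch of the accumulation loop, with a pure driver showing how a test can run it in the State monad (from the transformers package) instead of IO. The cut-down Limits and LimitError types carry only what this guard needs and are stand-ins, not the module's definitions; runPure is a hypothetical helper.

```haskell
import Control.Monad.Trans.State.Strict (evalState, state)
import Data.ByteString (ByteString)
import qualified Data.ByteString as BS

-- Hypothetical minimal stand-ins carrying only what this guard needs.
newtype Limits = Limits {maxBodyBytes :: Int}

data LimitError = BodyTooLarge Int
  deriving (Show, Eq)

boundedRead :: Monad m => Limits -> m ByteString -> m (Either LimitError ByteString)
boundedRead limits readChunk = go 0 []
  where
    cap = maxBodyBytes limits
    go total acc = do
      chunk <- readChunk
      if BS.null chunk
        then pure (Right (BS.concat (reverse acc))) -- EOF: the whole body
        else do
          let total' = total + BS.length chunk
          if total' > cap
            then pure (Left (BodyTooLarge cap)) -- fail closed, no partial body
            else go total' (chunk : acc) -- retain only within budget

-- Pure driver: serve a fixed list of chunks, then EOF on every later call.
runPure :: Limits -> [ByteString] -> Either LimitError ByteString
runPure limits = evalState (boundedRead limits next)
  where
    next = state $ \chunks -> case chunks of
      (c : rest) -> (c, rest)
      [] -> (BS.empty, [])
```

Keeping the chunks in reverse and concatenating once at EOF avoids re-copying the accumulated body on every chunk.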

checkVersionCount :: Limits -> PackageInfo -> Either LimitError PackageInfo

Reject a parsed packument carrying more than maxVersionCount versions, returning it unchanged when within budget.

Applied after a document is projected to PackageInfo but before per-version rule evaluation, so the cost of evaluating rules over every version is bounded by configuration rather than by what an upstream returns. Counts the infoVersions map; on breach returns Left (TooManyVersions count cap), otherwise the document unchanged so it threads through a parse pipeline.
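A minimal sketch, assuming infoVersions is a Data.Map keyed by version string; the per-version payload is elided as (), and the cut-down Limits, LimitError and PackageInfo types are stand-ins. Map.size is O(1) for Data.Map, so the check itself costs nothing proportional to the document.

```haskell
import Data.Map.Strict (Map)
import qualified Data.Map.Strict as Map

-- Hypothetical stand-ins; the version payload is elided as ().
newtype Limits = Limits {maxVersionCount :: Int}

data LimitError = TooManyVersions Int Int -- count seen, ceiling
  deriving (Show, Eq)

newtype PackageInfo = PackageInfo {infoVersions :: Map String ()}
  deriving (Show, Eq)

checkVersionCount :: Limits -> PackageInfo -> Either LimitError PackageInfo
checkVersionCount limits info
  | count > cap = Left (TooManyVersions count cap)
  | otherwise = Right info -- unchanged, so it threads through a pipeline
  where
    count = Map.size (infoVersions info)
    cap = maxVersionCount limits
```

Because it returns the document on success, it chains with other Either-returning steps, for example decode raw >>= checkVersionCount limits, where decode is a hypothetical projection step.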

checkNestingDepth :: Limits -> Value -> Either LimitError Value

Reject a decoded JSON document nested deeper than maxNestingDepth, returning it unchanged when within budget.

Run on the already-decoded Value, after the parser has produced it and before the document is projected to domain types, so a pathologically nested payload is refused before any deep domain traversal. It is not the defence against an unbounded structure: by the time a document reaches this check its size is already bounded, because the maxBodyBytes cap on the streamed read precedes the decode (a body the parser never finishes reading never produces a Value). What this guard bounds is the traversal cost of a within-size but deeply nested document, that is, the stack and CPU a recursive walk of it would spend, which the body cap alone does not bound (a small body can still nest deeply). Depth counts container nesting: a scalar is depth 1, and each enclosing Object or Array adds one. An empty container counts as a leaf (depth 1), since it forces no descent. Traversal short-circuits at the first sub-tree to breach the ceiling, so rejecting a deeply nested branch costs no more than descending to the ceiling.
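A sketch of the depth rule over a hypothetical JSON tree type standing in for aeson's Value, with cut-down Limits and LimitError stand-ins. The walk carries the current depth downward and stops as soon as it passes the cap, so the recursion never goes deeper than the ceiling plus one, even on an infinitely nested value.

```haskell
import qualified Data.Map.Strict as Map

-- Hypothetical JSON tree standing in for aeson's Value.
data Value
  = Object (Map.Map String Value)
  | Array [Value]
  | Scalar String
  deriving (Show, Eq)

newtype Limits = Limits {maxNestingDepth :: Int}

data LimitError = TooDeeplyNested Int -- the ceiling
  deriving (Show, Eq)

checkNestingDepth :: Limits -> Value -> Either LimitError Value
checkNestingDepth limits root
  | exceeds 1 root = Left (TooDeeplyNested cap)
  | otherwise = Right root
  where
    cap = maxNestingDepth limits
    -- exceeds d v: does v, sitting at depth d, reach deeper than the cap?
    -- Scalars and empty containers stop at d; each child sits at d + 1.
    exceeds d v
      | d > cap = True -- stop descending as soon as the cap is passed
      | otherwise = case v of
          Object fields -> any (exceeds (d + 1)) (Map.elems fields)
          Array items -> any (exceeds (d + 1)) items
          Scalar _ -> False
```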