ecluse
Safe Haskell: None
Language: GHC2021

Ecluse.Package.Merge

Description

Merging several upstream packuments into the one document Écluse serves.

A packument is the set of available versions of a package, and that set is spread across upstreams: a trusted private upstream holds what has been vetted, while a gated public upstream holds the full history — including versions not yet mirrored. Serving only the private document would hide those, so Écluse serves their union rather than short-circuiting on a private hit. This module is the pure, ecosystem-agnostic fold that reasons over that union on the PackageInfo domain model — it lives above the registry handle, written once and reused by every ecosystem, and never imports a registry adapter.

Decision surface, not served surface. This module reasons over the typed PackageInfo but does not emit a finished, re-serialisable PackageInfo. The document Écluse serves is the raw upstream JSON (Value), edited in place by the serve layer, so that every unmodeled wire key survives. The typed model is lossy, so re-encoding it would drop those keys. This module therefore emits a MergePlan — exactly which versions survive, which input each survivor came from, the reconciled dist-tags/time, and the detected divergences — that the serve layer replays onto the raw Values. See docs/architecture/registry-model.md → "Decision surface vs served surface".

The trust split is the caller's, expressed as a Provenance tag on each input and applied before the merge: TrustedSource (private) versions are admitted as-is; GatedSource (public) versions are the already-rule-filtered set. This module does not run rules — it reasons over exactly what it is handed (see docs/architecture/rules-engine.md → "Applying verdicts to a packument").

Two things make the merge more than a map union, and both are supply-chain signals, not silent reconciliations:

  • Collision. When the same version key comes from both a TrustedSource and a GatedSource, the trusted copy wins (it is the authority) — recorded in the plan as the survivor's winning SourceId.
  • Divergence. When the colliding copies contradict on a shared integrity algorithm (an algorithm both expose carries disagreeing digests), that is exactly the tampering Écluse exists to catch. Copies that merely expose different algorithm sets without contradicting on a shared one (one mirror also carrying a legacy digest the other omits) describe the same bytes and are not a divergence. The trusted copy still wins the merge, but a real contradiction is reported in the MergePlan; whether to additionally drop the version (fail-closed) is a policy decision left to the caller, so this module stays pure.

The merge is a lawful Monoid. The fold is realised over a Merge accumulator with a lawful Semigroup / Monoid: mempty is the empty merge (the degenerate identity at zero inputs) and (<>) is the trusted-wins union with order-independent divergence detection. mergePackuments assigns each input a SourceId by list position, foldMaps the contributions into the accumulator, and projects to a MergePlan. See the Semigroup instance for the exact law domain (associative + identity, intentionally not commutative).

See docs/architecture/registry-model.md → "Packument merge across upstreams".

Provenance

data Provenance

The trust provenance of an upstream's contribution to the merge. The split is decided by the caller — by which upstream a document came from — and applied before merging, never derived here.

The constructors are named *Source rather than the bare Trusted/Gated because Ecluse.Package already exports a Trust constructor named Trusted; a bare name would collide for the many callers that import Ecluse.Package openly.

The Ord instance is the trust order itself: TrustedSource compares less than GatedSource, so that "smallest wins" gives trusted precedence; the merge's resolution leans on this directly (see mergePackuments).
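The trust order can be illustrated with a small stand-alone sketch. The Provenance type below is a local stand-in that mirrors the documented constructors, and winner is a hypothetical helper, not a function of this module; it only shows how a derived Ord turns "smallest wins" into trusted precedence.

```haskell
import Data.Foldable (minimumBy)
import Data.List.NonEmpty (NonEmpty (..))
import Data.Ord (comparing)

-- Stand-in for the module's Provenance. Constructor order matters:
-- the derived Ord makes TrustedSource < GatedSource.
data Provenance = TrustedSource | GatedSource
  deriving (Eq, Ord, Show)

-- Hypothetical helper (not part of Ecluse.Package.Merge): among the copies
-- of one version key, the copy with the smallest provenance wins.
winner :: NonEmpty (Provenance, a) -> (Provenance, a)
winner = minimumBy (comparing fst)
```

Because the precedence comes from the Ord instance rather than from list position, the trusted copy is selected wherever it appears among the candidates.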

Constructors

TrustedSource

A private-upstream document. Its versions are already vetted, so they enter the union unfiltered and win any collision.

GatedSource

A public-upstream document. Its versions are the set that already survived the rules engine; the merge unions them but never re-filters.

Merging

type SourceId = Int

A stable identifier for one input to a single mergePackuments call: the 0-based index of that (Provenance, PackageInfo) in the input list.

The serve layer needs to take a surviving version's object from the raw Value of whichever source won it, so the plan must name that source. Provenance alone is not enough: it identifies a source only while there is exactly one input per provenance (the npm topology today — one trusted, one gated). The input index stays unambiguous even when several inputs share a provenance (e.g. an aggregating private upstream plus a first-party source, both TrustedSource), which keeps the plan robust for the multi-source case without a new type. The caller pairs each SourceId back to the raw Value it passed at that position.
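As a rough sketch of that pairing, a caller might look a surviving version's source up by position. Everything below is illustrative: rawFor is a hypothetical helper, String stands in for both the Text version key and the raw Value, and the survivors map plays the role of mpSurvivors.

```haskell
import qualified Data.Map.Strict as Map

-- Mirrors the documented alias: the 0-based input position.
type SourceId = Int

-- Hypothetical serve-layer lookup (not part of Ecluse.Package.Merge).
-- raws:      the raw documents, in the same order as the merge inputs
-- survivors: version key -> SourceId of the input that won it
rawFor :: [raw] -> Map.Map String SourceId -> String -> Maybe raw
rawFor raws survivors version = do
  sid <- Map.lookup version survivors -- Nothing if the version did not survive
  lookup sid (zip [0 ..] raws)        -- pair the index back to its raw document
```

The lookup relies only on list position, so it stays unambiguous when two inputs share a provenance.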

data MergePlan

The outcome of reasoning over a set of upstream packuments: a plan the serve layer replays onto the raw upstream Values to assemble the lossless served body. It carries exactly the decisions the merge owns — never a finished, re-serialisable document (see this module's header, "Decision surface, not served surface").

Constructors

MergePlan 

Fields

  • mpName :: PackageName

    The package identity, carried from the contributions. Every contribution that reaches the merge has had its self-reported name validated against the requested one upstream of here (a disagreeing origin is dropped before the merge), so all inputs carry the same identity and it is never a substituted or manufactured value — only one an upstream genuinely reported.

  • mpSurvivors :: Map Text SourceId

    Each surviving version key mapped to the SourceId of the input that won it, so the serve layer takes that version's object from the right source's raw Value. Trusted wins a collision; absent versions are not keys here.

  • mpDistTags :: Map Text Version

    dist-tags reconciled over the surviving union — latest resolved by the shared selector, every other surviving-target tag carried, absent-target tags dropped.

  • mpTime :: Map Text UTCTime

    The time union restricted to surviving versions; publish times for versions that did not survive are dropped.

  • mpDivergences :: Set Divergence

    Every distinct same-version integrity conflict found. A Set because divergence is a property of the set of distinct integrity fingerprints contributed for a version key, not of any pairwise fold step: the winner's fingerprint is recorded against each distinct fingerprint that contradicts it on a shared algorithm, which is order-independent and deduplicating by construction. Empty when no two copies of a shared version contradict on a shared algorithm, including when they merely expose different algorithm sets without disagreeing on one they share.

Instances

Show MergePlan

Defined in Ecluse.Package.Merge

Eq MergePlan

Defined in Ecluse.Package.Merge

data Divergence

A detected integrity conflict: a version key present in more than one source whose copies contradict on a shared algorithm — an algorithm both expose carries disagreeing digests. The trusted copy wins the merge; this record preserves both fingerprints so the caller can log, meter, and decide policy (serve-with-private-winning vs fail-closed). It is the merge's supply-chain signal — surfaced, never silently reconciled.

Ord is derived purely to let MergePlan carry divergences as a Set: the ordering is structural (over the version key and the two fingerprints) and has no meaning beyond deduplication and a stable presentation.

Constructors

Divergence 

Fields

data IntegrityFingerprint

An order-independent fingerprint of a version's artifact integrity: the sorted multiset of (algorithm, digest) pairs across all of the version's artifacts. The comparison ignores artifact ordering and non-integrity fields (filename, URL, size) that legitimately vary between mirrors of the same bytes.

Two copies diverge when they contradict on a shared algorithm: an algorithm present in both carries disagreeing digests. An asymmetric pair — one copy exposing an algorithm the other omits — does not diverge on that account; only a shared algorithm whose digests disagree does. So a mirror serving a modern digest alongside a legacy one agrees with a mirror serving only the modern digest, as long as that shared digest matches.

Opaque so the comparison used for divergence detection cannot be sidestepped; read the pairs back with integrityHashes when logging or metering a Divergence. Ord is derived (structurally, over the sorted pairs) only so a Divergence may live in a Set; it carries no domain meaning beyond that, and in particular is not the divergence test (which is the shared-algorithm contradiction above, never structural inequality of the whole set).
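The shared-algorithm rule can be sketched as follows. This is a simplified model, not the module's implementation: HashAlg is a local stand-in with assumed constructors, each copy is reduced to at most one digest per algorithm, and contradicts is a hypothetical name.

```haskell
import qualified Data.Map.Strict as Map

-- Stand-in for the module's HashAlg; these constructors are assumptions.
data HashAlg = Sha1 | Sha512
  deriving (Eq, Ord, Show)

-- Simplified fingerprint: one digest per algorithm.
type Pairs = [(HashAlg, String)]

-- Two copies contradict when some algorithm present in BOTH carries
-- different digests. Algorithms present in only one copy are ignored.
contradicts :: Pairs -> Pairs -> Bool
contradicts a b =
  or (Map.elems (Map.intersectionWith (/=) (Map.fromList a) (Map.fromList b)))
```

Under this model a copy carrying both a legacy and a modern digest agrees with a copy carrying only the modern one, provided the modern digests match, whereas plain structural inequality of the two pair lists would wrongly flag them.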

integrityHashes :: IntegrityFingerprint -> [(HashAlg, Text)]

The (algorithm, digest) pairs of a fingerprint, sorted, for an audit trail.

mergePackuments :: [(Provenance, PackageInfo)] -> Maybe MergePlan

Reason over several upstream packuments, by Provenance, and emit the MergePlan the serve layer replays onto the raw Values. Pure and total.

The merge is a fold with the degenerate identity at one input: a single packument yields a plan whose survivors are all of its versions (all won by source 0), with its tags and times reconciled and no divergences, so 0/1-upstream deployments need no special case. It is realised as a foldMap of each input's contribute into the lawful Merge Monoid, projected by planFrom. The model:

  • Union by version key, with TrustedSource winning a collision over GatedSource (the private upstream is the authority). The winning input's SourceId is recorded for the survivor. A collision whose copies contradict on a shared integrity algorithm is recorded as a Divergence; the winner is still kept.
  • dist-tags reconciled over the union. latest is resolved by selectLatest (keep-unless-denied, stable-preferring, and unparseable-safe) from the precedence-winning source's tagged latest and the surviving versions; any other tag pointing at a version absent from the union is dropped. Collisions on the same tag are resolved by provenance (trusted wins), consistent with the version fold, so the plan does not depend on caller input order.
  • time restricted to the union, with per-version collisions also resolved by provenance — publish times for versions that did not survive are dropped.
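The pruning half of the last two bullets can be sketched with plain container operations. This is a simplified illustration only: pruneTags and restrictTime are hypothetical names, String stands in for Text and Version, and the resolution of latest through selectLatest is deliberately left out.

```haskell
import qualified Data.Map.Strict as Map
import qualified Data.Set as Set

-- Drop every tag whose target version is not among the survivors.
pruneTags :: Set.Set String -> Map.Map String String -> Map.Map String String
pruneTags survivors = Map.filter (`Set.member` survivors)

-- Keep publish times only for surviving version keys.
restrictTime :: Set.Set String -> Map.Map String t -> Map.Map String t
restrictTime survivors times = Map.restrictKeys times survivors
```

Both functions take the surviving key set as an argument, so they would run after the version union has been resolved.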

The plan's identity (mpName) is carried from the contributions; callers fetch one package across its upstreams and each contribution's name has been validated against the requested one before reaching here, so all inputs share that one identity and it is never a substituted value. An empty input list yields Nothing — there is nothing to serve.

The merge accumulator

The merge is realised as a fold into a lawful Monoid. contribute turns one (Provenance, PackageInfo) input into a Merge; (<>) combines two merges (trusted-wins union, with order-independent divergence kept unresolved until the projection); mempty is the empty merge (the degenerate identity). planFrom projects a folded Merge to a MergePlan. mergePackuments is exactly planFrom . foldMap (uncurry contribute). The Merge type is opaque — build it only through contribute and mempty — so a SourceId always names a real input position. See the Semigroup instance for the law domain (associative + identity, intentionally not commutative, and why).

data Merge

The monoidal accumulator the merge folds into. It holds, unresolved, every candidate offered for every version key, plus the ranked dist-tags and time contributions; resolution to a single winner per key, and the divergence set, happens once in planFrom. Keeping candidates unresolved is what makes (<>) associative: a pairwise winner-vs-loser decision taken during the fold is not associative once three or more copies of a key collide, because divergence is a property of the whole set of distinct fingerprints, not of any one step.

Each accumulator also carries the count of inputs it represents, so that (<>) can re-index the right operand's SourceIds by the left operand's input count. This positional re-indexing is what makes a SourceId name an input's list position after a foldMap of single-input contributions, and it is the sole reason the instance is non-commutative (see the Semigroup instance).

Instances

Monoid Merge

Defined in Ecluse.Package.Merge

Methods

mempty :: Merge

mappend :: Merge -> Merge -> Merge

mconcat :: [Merge] -> Merge

Semigroup Merge

The merge's Semigroup has a deliberately narrow law domain, and the narrowing is load-bearing, not an accident:

  • Associative: (a <> b) <> c == a <> (b <> c). The SourceId re-indexing offsets compose additively, and every per-key combiner (set union for candidates, "keep the smaller rank" for tags/time, "left name wins" for the identity) is itself associative, so the whole is.
  • Identity: mempty (the empty merge) is both a left and a right unit.
  • Intentionally NOT commutative: a <> b /= b <> a in general. (<>) re-indexes the right operand's SourceIds by the left operand's input count, because a SourceId must name the input's position in the caller's list (the index the serve layer pairs back to a raw Value). Swapping the operands swaps those positions, so the SourceId labels differ.

The order-independence guarantee, stated precisely (and the reason commutativity is the wrong law): precedence is resolved by provenance, so the surviving key set and the winning provenance per key are invariant under any permutation of the inputs, and the value-level reconciliations (the survivor a key resolves to, the divergence fingerprint-pairs, the dist-tags/time targets) are invariant under any permutation that keeps each collision cross-provenance — which the npm topology (exactly one trusted, one gated upstream) always does, so every observable decision is order-independent there. The sole residual order-dependence is the positional tiebreak between two inputs of the same provenance: provenance cannot break that tie, so the lower SourceId (earlier input) wins it, and which copy is the divergence winner then tracks order. That positional tiebreak is exactly why SourceId exists and why the instance is non-commutative.
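A toy accumulator makes the law domain concrete. The Toy type below is an assumption made for illustration and is far simpler than the real Merge: it tracks only the input count and, per version key, the set of SourceIds offering that key.

```haskell
import qualified Data.Map.Strict as Map
import qualified Data.Set as Set

type SourceId = Int

-- Toy stand-in for Merge: input count plus unresolved candidates per key.
data Toy = Toy Int (Map.Map String (Set.Set SourceId))
  deriving (Eq, Show)

instance Semigroup Toy where
  -- Shift the right operand's ids by the left operand's input count,
  -- then union the candidate sets key by key.
  Toy n a <> Toy m b =
    Toy (n + m) (Map.unionWith Set.union a (Map.map (Set.map (+ n)) b))

instance Monoid Toy where
  mempty = Toy 0 Map.empty

-- One input's contribution, at local SourceId 0.
single :: [String] -> Toy
single versions = Toy 1 (Map.fromList [(v, Set.singleton 0) | v <- versions])
```

With these definitions, mconcat [single ["1.0.0"], single ["2.0.0"]] labels 1.0.0 with source 0 and 2.0.0 with source 1, and swapping the two inputs swaps the labels. The offsets add up the same way under either bracketing, which is why the toy is associative but not commutative.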

Defined in Ecluse.Package.Merge

Methods

(<>) :: Merge -> Merge -> Merge

sconcat :: NonEmpty Merge -> Merge

stimes :: Integral b => b -> Merge -> Merge

Show Merge

Defined in Ecluse.Package.Merge

Methods

showsPrec :: Int -> Merge -> ShowS

show :: Merge -> String

showList :: [Merge] -> ShowS

Eq Merge

Defined in Ecluse.Package.Merge

Methods

(==) :: Merge -> Merge -> Bool

(/=) :: Merge -> Merge -> Bool

contribute :: Provenance -> PackageInfo -> Merge

One input's contribution to the accumulator, at local SourceId 0: every version becomes a candidate, every dist-tags target and time instant a ranked value at this input's provenance, and the package name is offered as the identity. foldMap (uncurry contribute) over the inputs then re-indexes each to its list position via the Semigroup offset, so the absolute SourceId of a single-input contribution is its index in the foldMap.

planFrom :: Merge -> Maybe MergePlan

Project the resolved MergePlan from a folded Merge. Resolves each version key to its precedence winner, derives the divergence Set from the shared-algorithm contradictions among each key's distinct fingerprints, and reconciles dist-tags/time over the survivors. Returns Nothing only for the empty merge (mempty), which has no name and so nothing to serve — equivalently, the empty input list.
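The per-key resolution step might look roughly like the sketch below. It assumes, purely for illustration, that each version key holds a set of (Provenance, SourceId) candidates; resolve is a hypothetical name, and the real accumulator's internal representation is not shown in this documentation.

```haskell
import qualified Data.Map.Strict as Map
import qualified Data.Set as Set

type SourceId = Int

-- Stand-in for the module's Provenance; TrustedSource orders first.
data Provenance = TrustedSource | GatedSource
  deriving (Eq, Ord, Show)

-- Pick each key's precedence winner: the smallest (Provenance, SourceId)
-- pair, so trusted beats gated and the earlier input breaks a
-- same-provenance tie. A key with no candidates is dropped.
resolve :: Map.Map String (Set.Set (Provenance, SourceId)) -> Map.Map String SourceId
resolve = Map.mapMaybe (fmap snd . Set.lookupMin)
```

Resolving once over the whole candidate set, rather than pairwise during the fold, matches the reason the accumulator keeps candidates unresolved until projection.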