| Safe Haskell | None |
|---|---|
| Language | GHC2021 |
Ecluse.Package.Merge
Description
Merging several upstream packuments into the one document Écluse serves.
A packument is the set of available versions of a package, and that set is
spread across upstreams: a trusted private upstream holds what has been vetted,
while a gated public upstream holds the full history — including versions not yet
mirrored. Serving only the private document would hide those, so Écluse serves
their union rather than short-circuiting on a private hit. This module is the
pure, ecosystem-agnostic fold that reasons over that union on the
PackageInfo domain model — it lives above the registry handle,
written once and reused by every ecosystem, and never imports a registry adapter.
Decision surface, not served surface. This module reasons over the typed
PackageInfo but does not emit a finished, re-serialisable PackageInfo.
The document Écluse serves is the raw upstream JSON (Value), edited in place by
the serve layer, so that every unmodeled wire key survives. The typed model
is lossy, so re-encoding it would drop those keys. This module therefore emits a
MergePlan — exactly which versions survive, which input each survivor came from,
the reconciled dist-tags/time, and the detected divergences — that the serve
layer replays onto the raw Values. See docs/architecture/registry-model.md
→ "Decision surface vs served surface".
The trust split is the caller's, expressed as a Provenance tag on each
input and applied before the merge: TrustedSource (private) versions are
admitted as-is; GatedSource (public) versions are the already-rule-filtered set.
This module does not run rules — it reasons over exactly what it is handed (see
docs/architecture/rules-engine.md → "Applying verdicts to a packument").
Two things make the merge more than a map union, and both are supply-chain signals, not silent reconciliations:
- Collision. When the same version key comes from both a TrustedSource and a GatedSource, the trusted copy wins (it is the authority); this is recorded in the plan as the survivor's winning SourceId.
- Divergence. When the colliding copies __contradict on a shared integrity algorithm__ (an algorithm both expose carries disagreeing digests), that is exactly the tampering Écluse exists to catch. Copies that merely expose different algorithm sets without contradicting on a shared one (one mirror also carrying a legacy digest the other omits) describe the same bytes and are not a divergence. The trusted copy still wins the merge, but a real contradiction is reported in the MergePlan; whether to additionally drop the version (fail-closed) is a policy decision left to the caller, so this module stays pure.
The merge is a lawful Monoid. The fold is realised over a Merge
accumulator with a lawful Semigroup / Monoid: mempty is the empty merge
(the degenerate identity at zero inputs) and (<>) is the trusted-wins union with
order-independent divergence detection. mergePackuments assigns each input a
SourceId by list position, foldMaps the contributions into the accumulator,
and projects to a MergePlan. See the Semigroup instance for the exact law
domain (associative + identity, intentionally not commutative).
See docs/architecture/registry-model.md → "Packument merge across upstreams".
Synopsis
- data Provenance
- type SourceId = Int
- data MergePlan = MergePlan {}
- data Divergence = Divergence {}
- data IntegrityFingerprint
- integrityHashes :: IntegrityFingerprint -> [(HashAlg, Text)]
- mergePackuments :: [(Provenance, PackageInfo)] -> Maybe MergePlan
- data Merge
- contribute :: Provenance -> PackageInfo -> Merge
- planFrom :: Merge -> Maybe MergePlan
Provenance
data Provenance Source #
The trust provenance of an upstream's contribution to the merge. The split is decided by the caller — by which upstream a document came from — and applied before merging, never derived here.
The constructors are named *Source rather than the bare Trusted/Gated
because Ecluse.Package already exports a Trust constructor
named Trusted; a bare name would collide for the many callers that import
Ecluse.Package openly.
The Ord instance is the trust order itself — TrustedSource compares __less
than__ GatedSource so that "smallest wins" gives trusted precedence; the merge's
resolution leans on this directly (see mergePackuments).
Constructors
| TrustedSource | A private-upstream document. Its versions are already vetted, so they enter the union unfiltered and win any collision. |
| GatedSource | A public-upstream document. Its versions are the set that already survived the rules engine; the merge unions them but never re-filters. |
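The "smallest wins" reading of the Ord instance can be sketched as follows. This is a minimal illustration rather than the module's code: the local Provenance type only mirrors the documented constructors, and `winner`, its candidate tuple shape, and the tie-break by lower source index are all assumptions made for the example.

```haskell
import Data.List (minimumBy)
import Data.Ord (comparing)

-- Local stand-in mirroring the documented constructors. Declaring
-- TrustedSource first makes the derived Ord rank it below GatedSource.
data Provenance = TrustedSource | GatedSource
  deriving (Show, Eq, Ord)

type SourceId = Int

-- Hypothetical helper: pick the candidate with the smallest provenance.
-- Breaking ties by the lower source index is an assumption of this sketch,
-- not documented behaviour. Returns Nothing when there are no candidates.
winner :: [(Provenance, SourceId, a)] -> Maybe (Provenance, SourceId, a)
winner [] = Nothing
winner cs = Just (minimumBy (comparing rank) cs)
  where
    rank (p, sid, _) = (p, sid)
```

Because the choice is driven by the provenance ordering and not by list position, a trusted candidate wins whether it appears before or after a gated one.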
Instances
| Show Provenance Source # | |
Defined in Ecluse.Package.Merge Methods showsPrec :: Int -> Provenance -> ShowS # show :: Provenance -> String # showList :: [Provenance] -> ShowS # | |
| Eq Provenance Source # | |
Defined in Ecluse.Package.Merge | |
| Ord Provenance Source # | |
Defined in Ecluse.Package.Merge Methods compare :: Provenance -> Provenance -> Ordering # (<) :: Provenance -> Provenance -> Bool # (<=) :: Provenance -> Provenance -> Bool # (>) :: Provenance -> Provenance -> Bool # (>=) :: Provenance -> Provenance -> Bool # max :: Provenance -> Provenance -> Provenance # min :: Provenance -> Provenance -> Provenance # | |
Merging
type SourceId = Int Source #
A stable identifier for one input to a single mergePackuments call: the
0-based index of that (Provenance, PackageInfo) in the input list.
The serve layer needs to take a surviving version's object from the raw
Value of whichever source won it, so the plan must name that source. Provenance
alone is not enough: it identifies a source only while there is exactly one
input per provenance (the npm topology today — one trusted, one gated). The
input index stays unambiguous even when several inputs share a provenance (e.g. an
aggregating private upstream plus a first-party source, both TrustedSource),
which keeps the plan robust for the multi-source case without a new type. The
caller pairs each SourceId back to the raw Value it passed at that position.
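Pairing a SourceId back to the raw document passed at that position might look like the sketch below. The helper name `rawFor` is hypothetical, and the type variable `raw` stands in for the raw upstream Value; nothing here is taken from the module itself.

```haskell
type SourceId = Int

-- Hypothetical helper: recover the raw document that was passed at a given
-- input position. Out-of-range or negative ids yield Nothing instead of
-- failing, so the lookup stays total.
rawFor :: [raw] -> SourceId -> Maybe raw
rawFor raws sid
  | sid < 0 = Nothing
  | otherwise = case drop sid raws of
      (r : _) -> Just r
      [] -> Nothing
```

The caller keeps the raw documents in the same order as the inputs it handed to the merge, so the index alone is enough to find the winning source even when two inputs share a provenance.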
data MergePlan Source #
The outcome of reasoning over a set of upstream packuments: a plan the
serve layer replays onto the raw upstream Values to assemble the lossless
served body. It carries exactly the decisions the merge owns — never a finished,
re-serialisable document (see this module's header, "Decision surface, not served
surface").
Constructors
| MergePlan | |
Fields
| |
data Divergence Source #
A detected integrity conflict: a version key present in more than one source whose copies contradict on a shared algorithm — an algorithm both expose carries disagreeing digests. The trusted copy wins the merge; this record preserves both fingerprints so the caller can log, meter, and decide policy (serve-with-private-winning vs fail-closed). It is the merge's supply-chain signal — surfaced, never silently reconciled.
Ord is derived purely to let MergePlan carry divergences as a Set: the
ordering is structural (over the version key and the two fingerprints) and has no
meaning beyond deduplication and a stable presentation.
Constructors
| Divergence | |
Fields
| |
Instances
| Show Divergence Source # | |
Defined in Ecluse.Package.Merge Methods showsPrec :: Int -> Divergence -> ShowS # show :: Divergence -> String # showList :: [Divergence] -> ShowS # | |
| Eq Divergence Source # | |
Defined in Ecluse.Package.Merge | |
| Ord Divergence Source # | |
Defined in Ecluse.Package.Merge Methods compare :: Divergence -> Divergence -> Ordering # (<) :: Divergence -> Divergence -> Bool # (<=) :: Divergence -> Divergence -> Bool # (>) :: Divergence -> Divergence -> Bool # (>=) :: Divergence -> Divergence -> Bool # max :: Divergence -> Divergence -> Divergence # min :: Divergence -> Divergence -> Divergence # | |
data IntegrityFingerprint Source #
An order-independent fingerprint of a version's artifact integrity: the
sorted multiset of (algorithm, digest) pairs across all of the version's
artifacts. The comparison ignores artifact ordering and non-integrity fields
(filename, URL, size) that legitimately vary between mirrors of the same bytes.
Two copies diverge when they contradict on a shared algorithm: an algorithm present in both carries disagreeing digests. An asymmetric pair — one copy exposing an algorithm the other omits — does not diverge on that account; only a shared algorithm whose digests disagree does. So a mirror serving a modern digest alongside a legacy one agrees with a mirror serving only the modern digest, as long as that shared digest matches.
Opaque so the comparison used for divergence detection cannot be sidestepped; read
the pairs back with integrityHashes when logging or metering a Divergence. Ord
is derived (structurally, over the sorted pairs) only so a Divergence may live in a
Set; it carries no domain meaning beyond that, and in particular is not the
divergence test (which is the shared-algorithm contradiction above, never structural
inequality of the whole set).
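The shared-algorithm contradiction test described above can be sketched like this. It is an illustration under simplifying assumptions: algorithms and digests are plain strings (the module uses HashAlg and Text), each copy is assumed to carry at most one digest per algorithm, and the name `contradicts` is hypothetical.

```haskell
import qualified Data.Map.Strict as Map

-- Stand-ins for the module's HashAlg and Text.
type Alg = String
type Digest = String

-- Hypothetical test: True when some algorithm present in both copies carries
-- disagreeing digests. Algorithms present in only one copy are ignored,
-- because intersectionWith keeps shared keys only.
-- Assumes at most one digest per algorithm in each copy.
contradicts :: [(Alg, Digest)] -> [(Alg, Digest)] -> Bool
contradicts xs ys =
  or (Map.elems (Map.intersectionWith (/=) (Map.fromList xs) (Map.fromList ys)))
```

Under this sketch, a copy exposing both a modern and a legacy digest agrees with a copy exposing only the modern one, provided the shared digest matches; two copies with no algorithm in common never contradict.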
Instances
| Show IntegrityFingerprint Source # | |
Defined in Ecluse.Package.Merge Methods showsPrec :: Int -> IntegrityFingerprint -> ShowS # show :: IntegrityFingerprint -> String # showList :: [IntegrityFingerprint] -> ShowS # | |
| Eq IntegrityFingerprint Source # | |
Defined in Ecluse.Package.Merge Methods (==) :: IntegrityFingerprint -> IntegrityFingerprint -> Bool # (/=) :: IntegrityFingerprint -> IntegrityFingerprint -> Bool # | |
| Ord IntegrityFingerprint Source # | |
Defined in Ecluse.Package.Merge Methods compare :: IntegrityFingerprint -> IntegrityFingerprint -> Ordering # (<) :: IntegrityFingerprint -> IntegrityFingerprint -> Bool # (<=) :: IntegrityFingerprint -> IntegrityFingerprint -> Bool # (>) :: IntegrityFingerprint -> IntegrityFingerprint -> Bool # (>=) :: IntegrityFingerprint -> IntegrityFingerprint -> Bool # max :: IntegrityFingerprint -> IntegrityFingerprint -> IntegrityFingerprint # min :: IntegrityFingerprint -> IntegrityFingerprint -> IntegrityFingerprint # | |
integrityHashes :: IntegrityFingerprint -> [(HashAlg, Text)] Source #
The (algorithm, digest) pairs of a fingerprint, sorted, for an audit trail.
mergePackuments :: [(Provenance, PackageInfo)] -> Maybe MergePlan Source #
Reason over several upstream packuments, by Provenance, and emit the
MergePlan the serve layer replays onto the raw Values. Pure and total.
The merge is a fold that degenerates to the identity at one input: a single
packument yields a plan whose survivors are all of its versions (all won by source
0), with its tags and times reconciled and no divergences, so 0/1-upstream
deployments need no special case. It is realised as a foldMap of each input's
contribute into the lawful Merge Monoid, projected by planFrom. The model:
- Union by version key, with TrustedSource winning a collision over GatedSource (the private upstream is the authority). The winning input's SourceId is recorded for the survivor. A collision whose copies contradict on a shared integrity algorithm is recorded as a Divergence; the winner is still kept.
- dist-tags reconciled over the union. latest is resolved by selectLatest (keep-unless-denied, stable-preferring, and unparseable-safe) from the precedence-winning source's tagged latest and the surviving versions; any other tag pointing at a version absent from the union is dropped. Collisions on the same tag are resolved by provenance (trusted wins), consistent with the version fold, so the plan does not depend on caller input order.
- time restricted to the union, with per-version collisions also resolved by provenance; publish times for versions that did not survive are dropped.
The plan's identity (mpName) is carried from the contributions; callers fetch one
package across its upstreams and each contribution's name has been validated against
the requested one before reaching here, so all inputs share that one identity and it
is never a substituted value. An empty input list yields Nothing — there is nothing
to serve.
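The restriction of dist-tags and time to the surviving versions can be sketched with ordinary map operations. This is a simplified illustration: versions and tag names are plain strings, the helper names `pruneTags` and `pruneTimes` are hypothetical, and the separate resolution of latest by selectLatest is deliberately not modelled.

```haskell
import qualified Data.Map.Strict as Map
import qualified Data.Set as Set

type Version = String

-- Hypothetical helper: drop any dist-tag whose target version is not among
-- the survivors. (The special handling of `latest` is not modelled here.)
pruneTags :: Set.Set Version -> Map.Map String Version -> Map.Map String Version
pruneTags survivors = Map.filter (`Set.member` survivors)

-- Hypothetical helper: keep publish times only for surviving versions.
pruneTimes :: Set.Set Version -> Map.Map Version t -> Map.Map Version t
pruneTimes survivors times = Map.restrictKeys times survivors
```

Both helpers take the survivor set as their only source of truth, so a tag or timestamp can never outlive the version it refers to.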
The merge accumulator
The merge is realised as a fold into a lawful Monoid. contribute turns one
(Provenance, PackageInfo) input into a Merge; (<>) combines two merges
(trusted-wins union, with order-independent divergence kept unresolved until the
projection); mempty is the empty merge (the degenerate identity). planFrom
projects a folded Merge to a MergePlan. mergePackuments is exactly
planFrom . foldMap (uncurry contribute). The Merge type is opaque:
build it only through contribute and mempty, so that a SourceId always names a
real input position. See the Semigroup instance for the law domain (associative
+ identity, intentionally not commutative, and why).
data Merge Source #
The monoidal accumulator the merge folds into. It holds, unresolved, every
candidate offered for every version key, plus the ranked dist-tags and time
contributions; resolution to a single winner per key, and the divergence set,
happens once in planFrom. Keeping candidates unresolved is what makes (<>)
associative: a pairwise winner-vs-loser decision taken during the fold is not
associative once three or more copies of a key collide, because divergence is a
property of the whole set of distinct fingerprints, not of any one step.
Each accumulator also carries the count of inputs it represents, so that (<>)
can __re-index the right operand's SourceIds by the left operand's input
count__. This positional re-indexing is what makes a SourceId name an input's
list position after a foldMap of single-input contributions — and it is the sole
reason the instance is non-commutative (see the Semigroup instance).
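The positional re-indexing can be illustrated with a toy accumulator. This is a sketch, not the Merge type: `Acc` and `single` are hypothetical names, versions are plain strings, and only the input count and the unresolved candidate ids per version key are modelled.

```haskell
import qualified Data.Map.Strict as Map

type SourceId = Int
type Version = String

-- Toy accumulator: how many inputs it represents, plus every candidate
-- source id offered for each version key, kept unresolved.
data Acc = Acc Int (Map.Map Version [SourceId])
  deriving (Show, Eq)

instance Semigroup Acc where
  Acc n l <> Acc m r =
    -- Shift the right operand's ids by the left operand's input count,
    -- then union the candidate lists key by key.
    Acc (n + m) (Map.unionWith (++) l (Map.map (map (+ n)) r))

instance Monoid Acc where
  mempty = Acc 0 Map.empty

-- One input's contribution, at local SourceId 0.
single :: [Version] -> Acc
single vs = Acc 1 (Map.fromList [(v, [0]) | v <- vs])
```

Folding single-input contributions with foldMap gives each input the id of its list position. Swapping two operands swaps their ids, which is why an instance built this way is associative with an identity but not commutative.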
Instances
| Monoid Merge Source # | |
| Semigroup Merge Source # | The merge's
The order-independence guarantee, stated precisely (and the reason commutativity is
the wrong law): precedence is resolved by provenance, so the surviving key set
and the winning provenance per key are invariant under any permutation of the
inputs, and the value-level reconciliations (the survivor a key resolves to, the
divergence fingerprint-pairs, the |
| Show Merge Source # | |
| Eq Merge Source # | |
contribute :: Provenance -> PackageInfo -> Merge Source #
One input's contribution to the accumulator, at local SourceId 0: every
version becomes a candidate, every dist-tags target and time instant a ranked
value at this input's provenance, and the package name is offered as the identity.
foldMap (uncurry contribute) over the inputs then re-indexes each to its list position via
the Semigroup offset, so the absolute SourceId of a single-input contribution
is its index in the foldMap.
planFrom :: Merge -> Maybe MergePlan Source #
Project the resolved MergePlan from a folded Merge. Resolves each version
key to its precedence winner, derives the divergence Set from the shared-algorithm
contradictions among each key's distinct fingerprints, and reconciles
dist-tags/time over the survivors. Returns
Nothing only for the empty merge (mempty), which has no name and so nothing to
serve — equivalently, the empty input list.