Updated 2026-04-17

Compositions: Setup, Scale, and Review

Step-by-step build of a composition, how match keys and coercion work, scale thresholds, the fan-out halt gate, intermediate reuse, and schema-drift audit.

Building a Composition

The editor lives at Configure → Compositions → New Composition. It's a four-step stepper, not a freeform node graph — finance analysts have consistently told us that graphs feel like something they need an engineer to run.

Step 1 — Name and Output Object

Pick a human-readable name ("Territory × Bookings") and an output object type — a short machine identifier (e.g. territory_bookings) that downstream field mapping will see. Think of the output object type as the table name the composed dataset will be known by.

Step 2 — Pick the Sources

Each source is a pair:

  • Connection. Either an existing SOURCE endpoint on a sync profile you've already built, or a credential for a source-capable connector that isn't wired into any profile yet. The latter are "virtual endpoints" — the composition editor surfaces them so you don't have to build a throwaway profile just to use Adaptive Planning as a source.
  • Alias. A short, machine-friendly name — territory, bookings, ns, stripe. This is the prefix that will show up on every column in the output (territory.region, bookings.amount). Pick something you won't regret reading a hundred times in downstream field mappings.

Sources are ordinal. Source #1 is the "left" side for JOIN semantics and takes first precedence for UNION dedup. You can add more than two sources for N-way joins; the composer reduces them left-to-right, using the same join type at every step.
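The left-to-right reduction and alias prefixing described above can be sketched as follows. This is a minimal in-memory illustration of the semantics, not the VersionForge implementation; the function and type names are ours.

```typescript
type Row = Record<string, unknown>;

// Prefix every column of a source's rows with its alias,
// e.g. region -> territory.region.
function prefixColumns(alias: string, rows: Row[]): Row[] {
  return rows.map((row) =>
    Object.fromEntries(
      Object.entries(row).map(([k, v]) => [`${alias}.${k}`, v])
    )
  );
}

// Reduce sources left-to-right, applying the same (here: inner) join
// at every step. Source #1 is the initial "left" side.
function innerJoinAll(
  sources: { alias: string; rows: Row[] }[],
  matches: (left: Row, right: Row) => boolean
): Row[] {
  const [first, ...rest] = sources;
  let acc = prefixColumns(first.alias, first.rows);
  for (const src of rest) {
    const right = prefixColumns(src.alias, src.rows);
    const out: Row[] = [];
    for (const l of acc) {
      for (const r of right) {
        if (matches(l, r)) out.push({ ...l, ...r });
      }
    }
    acc = out;
  }
  return acc;
}
```

Note how every output column carries its alias prefix, which is exactly what downstream field mappings will see.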

Step 3 — Combine Mode

Toggle between JOIN and UNION. For JOIN, pick a match type:

  • INNER. Only rows that match on both sides.
  • LEFT. Keep every row from the left side; unmatched left rows get null values for the right-side columns.
  • RIGHT. Mirror of LEFT.
  • FULL. Keep everything, matched or not.

For UNION, pick a schema reconciliation mode and a dedup mode:

  • PERMISSIVE (default). Sources with different field sets are merged by taking the union of columns; missing fields get null. You'll see a warning in the issues list for every missing field.
  • STRICT. Every source must have the exact same field set, or the run fails fast. Reserve this for compliance flows.
  • Dedup. NONE (UNION ALL), CONTENT_HASH (drop identical rows; effectively free because every RawRecord already carries a canonical hash), PRIMARY_KEY (dedup by the shared key defined in Step 4; last-write wins by sourceUpdatedAt).
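The PERMISSIVE merge and CONTENT_HASH dedup behave roughly like this sketch. It is illustrative only: the real pipeline reuses the canonical hash already stored on each RawRecord, whereas here JSON over sorted keys stands in for it.

```typescript
type Row = Record<string, unknown>;

// PERMISSIVE reconciliation: output columns are the union of all source
// columns; a row missing a column gets null for it.
function permissiveUnion(sources: Row[][]): Row[] {
  const allFields = new Set<string>();
  for (const rows of sources)
    for (const row of rows)
      for (const k of Object.keys(row)) allFields.add(k);

  const out: Row[] = [];
  for (const rows of sources) {
    for (const row of rows) {
      const filled: Row = {};
      for (const f of allFields) filled[f] = f in row ? row[f] : null;
      out.push(filled);
    }
  }
  return out;
}

// CONTENT_HASH dedup: drop rows whose canonical serialization is identical.
function dedupByContent(rows: Row[]): Row[] {
  const seen = new Set<string>();
  return rows.filter((row) => {
    const hash = JSON.stringify(
      Object.entries(row).sort(([a], [b]) => a.localeCompare(b))
    );
    if (seen.has(hash)) return false;
    seen.add(hash);
    return true;
  });
}
```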

Step 4 — Match Keys (JOIN only)

For each key pair, pick the field on each side that should be compared. Values are trimmed and lower-cased by default, so "T-001" matches "t-001" and the string "1" matches the number 1 without you having to do anything. Three coercion modes per key:

  • string (default). Cast + trim + lower-case. Handles 80% of cross-connector type mismatches.
  • numeric. Coerce to number; reject non-numeric values.
  • exact. Preserve type distinction. "1" will not match 1.

Composite keys are two or more key pairs sharing a group ID in the underlying data model. In the editor, "Add match key" creates another pair — all pairs combined define the equality predicate.
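The three coercion modes and composite-key grouping can be sketched like this. Names and signatures are illustrative assumptions, not the product's internal API.

```typescript
type CoerceMode = "string" | "numeric" | "exact";

// Normalize one side of a match-key pair. Under "string" (the default),
// " T-001 " and "t-001" normalize identically, and the string "1" and the
// number 1 both normalize to "1".
function normalizeKey(value: unknown, mode: CoerceMode): unknown {
  switch (mode) {
    case "string":
      return String(value).trim().toLowerCase();
    case "numeric": {
      const n = Number(value);
      if (Number.isNaN(n)) throw new Error(`non-numeric key value: ${value}`);
      return n;
    }
    case "exact":
      return value; // type distinction preserved: "1" will not match 1
  }
}

// A composite key is the tuple of all normalized pairs in the group;
// the equality predicate compares the whole tuple.
function compositeKey(
  row: Record<string, unknown>,
  fields: { field: string; mode: CoerceMode }[]
): string {
  return JSON.stringify(
    fields.map(({ field, mode }) => normalizeKey(row[field], mode))
  );
}
```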

Save and Preview

After saving, Run preview executes the composition against live sources and returns the projected output row count + unmatched counts per side. This is your sanity check — if the projected row count is off by 10× from what you expected, you've probably picked the wrong match key.

Previews are rate-limited to ten per tenant per five minutes. They run with previewMode: true, which means they don't seed the intermediate cache, don't emit schema-drift audit events, and don't trigger the fan-out halt flag.

Attaching to a Sync Profile

Go to Configure → Field Mapping, pick or create a profile, and toggle the source step from "Connection" to "Composition". The dropdown lists every saved composition for your tenant. Field mappings on composed profiles reference source fields by their alias-prefixed names (territory.region) — the editor shows a hint so you remember.

A profile with a composition only needs a TARGET endpoint. You can't mix a composition with a single SOURCE endpoint on the same profile — it's one or the other.

Scale and Performance

| Build-side row count | Algorithm | Notes |
| --- | --- | --- |
| ≤ 250,000 | Monolithic in-memory hash join | Build side is the smaller of the two inputs |
| 250,000 – 2,000,000 | Partitioned (Grace-style) hash join | 16 buckets + 1 null bucket, joined bucket-by-bucket |
| > 2,000,000 | Hard abort with JOIN_MEMORY_EXCEEDED | Not supported in-process; push the join down or split the data |

The partitioner picks a bucket per row via a stable string hash of the normalized key tuple, so the same key always routes to the same bucket. Null-key rows go to a dedicated bucket so nullPolicy: 'match' still pairs them correctly.
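A bucket assignment with these properties can be sketched as below. The hash function choice (FNV-1a here) and constants are our illustration; the only properties that matter are the ones the text names: stability per key, 16 regular buckets, and a dedicated null bucket.

```typescript
const NUM_BUCKETS = 16;
const NULL_BUCKET = NUM_BUCKETS; // a 17th bucket reserved for null-key rows

// Stable 32-bit FNV-1a string hash: the same normalized key tuple always
// produces the same hash, so both sides of the join route it identically.
function fnv1a(s: string): number {
  let h = 0x811c9dc5;
  for (let i = 0; i < s.length; i++) {
    h ^= s.charCodeAt(i);
    h = Math.imul(h, 0x01000193) >>> 0;
  }
  return h;
}

function bucketFor(normalizedKey: string | null): number {
  if (normalizedKey === null) return NULL_BUCKET; // keeps nullPolicy pairing local
  return fnv1a(normalizedKey) % NUM_BUCKETS;
}
```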

The hard fanoutRowCap (default 1,000,000) is a separate safety net. If any step of the join is about to produce more composed rows than the cap, the run aborts with JOIN_FANOUT_EXCEEDED — that's almost always a sign of a bad join key or a missed cardinality assumption.

Intermediate-Snapshot Reuse

When a connector declares the content-hash-stable or incremental-extract capability, each source's records are persisted after extraction as an ephemeral SOURCE_INTERMEDIATE snapshot tagged with a SHA-256 payload fingerprint. Subsequent composition runs within a 15-minute window reuse the cached rows and skip the connector call entirely.

Concretely: a second preview click right after the first one completes instantly, and repeat sync runs during a short batch window avoid hammering the same API.

Two guardrails keep this safe:

  • Capability gate. Connectors that can't guarantee stable content hashes (currently: CSV uploads) never reuse.
  • 7-day absolute ceiling. No snapshot older than seven days is reused under any condition — stale reuse is how you ship last month's data to production.

Ephemeral snapshots are swept by purgeEphemeralSnapshots(24) — a retention helper meant to run from a daily cron. Ask your admin if it's wired up.
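Putting the reuse rules together, the decision looks roughly like this. Field and function names are assumptions for illustration; the windows (15 minutes, 7 days) and the capability gate come from the description above.

```typescript
interface Snapshot {
  fingerprint: string; // SHA-256 payload fingerprint
  createdAt: Date;
}

const REUSE_WINDOW_MS = 15 * 60 * 1000;               // 15-minute reuse window
const ABSOLUTE_CEILING_MS = 7 * 24 * 60 * 60 * 1000;  // 7-day hard ceiling

function canReuse(
  snapshot: Snapshot,
  connectorCapabilities: string[],
  now: Date
): boolean {
  // Capability gate: connectors without stable hashes (e.g. CSV uploads)
  // never reuse.
  const stable =
    connectorCapabilities.includes("content-hash-stable") ||
    connectorCapabilities.includes("incremental-extract");
  if (!stable) return false;

  const age = now.getTime() - snapshot.createdAt.getTime();
  if (age > ABSOLUTE_CEILING_MS) return false; // never ship week-old data
  return age <= REUSE_WINDOW_MS;
}
```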

Fan-Out Review: the Soft HITL Gate

Some joins legitimately explode row counts — a one-to-many booking-to-invoice relationship, for example. Others do it by accident when the chosen key isn't actually unique. The composition carries a fan-out halt threshold (default 5.0) — if outputRecords / max(inputPerAlias) exceeds this, the run halts into a new status, PENDING_FANOUT_REVIEW, even when no material diffs are detected.

The halt is soft — the run can be approved just like any other pending review. But the pause forces a human to confirm that a 10,000-row source legitimately became a 100,000-row composed dataset before those rows head downstream.

Set fanoutHaltThreshold to 0 to disable the halt entirely and fall back to the hard fanoutRowCap. Raise it if your composition genuinely fans out more than 5× every run.
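The soft gate reduces to the ratio check described above. A minimal sketch, with the function name ours:

```typescript
// Halt when outputRecords / max(inputPerAlias) exceeds the threshold.
// A threshold of 0 disables the soft halt, leaving only the hard fanoutRowCap.
function shouldHaltForFanout(
  outputRecords: number,
  inputPerAlias: Record<string, number>,
  fanoutHaltThreshold: number // default 5.0
): boolean {
  if (fanoutHaltThreshold === 0) return false;
  const maxInput = Math.max(...Object.values(inputPerAlias));
  return outputRecords / maxInput > fanoutHaltThreshold;
}
```

With the default threshold, a 10,000-row source fanning out to 100,000 composed rows (a 10× ratio) halts into PENDING_FANOUT_REVIEW.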

Contributing Sources on Diff Records

Every DiffRecord persisted from a composed run carries contributingSources: string[] — the composition aliases whose fields produced the change. On the review page this tells you, per row, whether a movement came from the territory side, the bookings side, or both. Legacy non-composed runs keep the empty array — no existing UI breaks.
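Because composed field names carry their alias prefix, deriving contributing sources from a row's changed fields is a matter of splitting off the prefix. A sketch (our function name, not the product's):

```typescript
// "territory.region" changed  ->  "territory" contributed.
function contributingSources(changedFields: string[]): string[] {
  const aliases = new Set<string>();
  for (const f of changedFields) {
    const dot = f.indexOf(".");
    if (dot > 0) aliases.add(f.slice(0, dot));
  }
  return [...aliases].sort();
}
```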

Schema Drift Audit (PERMISSIVE Union)

When a PERMISSIVE union run produces a field set that differs from the one seen on the prior run, the composer persists CompositionSchemaEvent rows — one per added or removed field — and updates the composition's lastKnownFields. The first run just seeds the baseline without emitting events (otherwise you'd get a flood on day one).
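The drift computation is a set difference between the current field set and `lastKnownFields`, with the first run seeding silently. A sketch under those assumptions (type and function names ours):

```typescript
type SchemaEvent = { field: string; change: "ADDED" | "REMOVED" };

// One event per added or removed field; a null baseline means "first run",
// which seeds lastKnownFields without emitting anything.
function diffFieldSets(
  lastKnownFields: string[] | null,
  currentFields: string[]
): SchemaEvent[] {
  if (lastKnownFields === null) return []; // seed baseline, no event flood
  const prev = new Set(lastKnownFields);
  const curr = new Set(currentFields);
  const events: SchemaEvent[] = [];
  for (const f of curr) if (!prev.has(f)) events.push({ field: f, change: "ADDED" });
  for (const f of prev) if (!curr.has(f)) events.push({ field: f, change: "REMOVED" });
  return events;
}
```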

This gives you a paper trail for "why did a new taxId column suddenly appear in our invoice union?" without you having to run git blame on the composition config. The events are tenant-scoped and surface alongside your audit log.

Common Pitfalls

  • Key type mismatches. If your join returns zero matches, open Run preview and check the unmatched counts. The vast majority of zero-match joins are a silent string-vs-number or whitespace mismatch that the default string coercion would have caught; the usual culprit is a key that was switched to exact. Switch it back to string, or add explicit numeric coercion.
  • Alias-prefixed field mappings. On a composed profile, source field names are alias.field, not field. The Field Mapping page reminds you but the autocomplete can't help until the first run has produced a composed snapshot.
  • Compositions on existing profiles. V1 only allows compositions on newly created profiles. Swapping an existing single-source profile to a composition would invalidate its field mappings because source field names carry prefixes now. Clone the profile instead.
  • Adaptive Planning as a source. Adaptive is dual-role (source and target) and works as a composition source via virtual endpoints. Pigment is currently target-only — it won't appear in the source picker for compositions.


Built by Vantage Advisory

VersionForge is built by the team at Vantage Advisory Group — consultants who have spent years implementing Workday, NetSuite, Stripe, Salesforce, Adaptive, and Pigment integrations for finance, RevOps, and workforce-planning teams. We built the product we kept wishing existed.

See It Running on Your Own Data in 30 Minutes

Book a walkthrough with the founding team. Bring your messiest data pipeline — GL close, MRR reconciliation, or headcount plan. We'll show you how VersionForge handles it.