Compositions: Setup, Scale, and Review
Step-by-step build of a composition, how match keys and coercion work, scale thresholds, the fan-out halt gate, intermediate reuse, and schema-drift audit.
Building a Composition
The editor lives at Configure → Compositions → New Composition. It's a four-step stepper, not a freeform node graph — finance analysts have consistently told us that graphs feel like something they need an engineer to run.
Step 1 — Name and Output Object
Pick a human-readable name ("Territory × Bookings") and an output object type — a short machine identifier (e.g. territory_bookings) that downstream field mapping will see. Think of the output object type as the table name the composed dataset will be known by.
Step 2 — Pick the Sources
Each source is a pair:
- Connection. Either an existing SOURCE endpoint on a sync profile you've already built, or a credential for a source-capable connector that isn't wired into any profile yet. The latter are "virtual endpoints" — the composition editor surfaces them so you don't have to build a throwaway profile just to use Adaptive Planning as a source.
- Alias. A short, machine-friendly name: `territory`, `bookings`, `ns`, `stripe`. This is the prefix that will show up on every column in the output (`territory.region`, `bookings.amount`). Pick something you won't regret reading a hundred times in downstream field mappings.
Sources are ordinal. Source #1 is the "left" side for JOIN semantics and the first precedent for UNION dedup. You can add more than two sources for N-way joins — the composer reduces them left-to-right, using the same join type at every step.
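The left-to-right reduction can be made concrete with a minimal TypeScript sketch. Everything here is illustrative, not the product's actual internals: `joinPair` and `composeJoin` are hypothetical names, the join is INNER-only for brevity, and real output columns carry alias prefixes that this flat row shape omits.

```typescript
type Row = Record<string, unknown>;

// Illustrative pairwise hash join (INNER semantics only, for brevity).
function joinPair(left: Row[], right: Row[], leftKey: string, rightKey: string): Row[] {
  // Index the right side by its key field.
  const index = new Map<unknown, Row[]>();
  for (const r of right) {
    const bucket = index.get(r[rightKey]) ?? [];
    bucket.push(r);
    index.set(r[rightKey], bucket);
  }
  // Probe with each left row; emit one merged row per match.
  const out: Row[] = [];
  for (const l of left) {
    for (const r of index.get(l[leftKey]) ?? []) out.push({ ...l, ...r });
  }
  return out;
}

// N-way composition: reduce the ordered source list left-to-right,
// applying the same join at every step (source #1 is the "left" side).
function composeJoin(sources: { rows: Row[]; key: string }[]): Row[] {
  return sources.reduce(
    (acc, src, i) => (i === 0 ? src.rows : joinPair(acc, src.rows, sources[0].key, src.key)),
    [] as Row[],
  );
}
```

A three-source composition would simply fold once more: `join(join(s1, s2), s3)`.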
Step 3 — Combine Mode
Toggle between JOIN and UNION. For JOIN, pick a match type:
- INNER. Only rows that match on both sides.
- LEFT. Keep every row from the left side; unmatched left rows get null values for the right-side columns.
- RIGHT. Mirror of LEFT.
- FULL. Keep everything, matched or not.
For UNION, pick a schema reconciliation mode and a dedup mode:
- PERMISSIVE (default). Sources with different field sets are merged by taking the union of columns; missing fields get `null`. You'll see a warning in the issues list for every missing field.
- STRICT. Every source must have the exact same field set, or the run fails fast. Reserve this for compliance flows.
- Dedup. `NONE` (UNION ALL), `CONTENT_HASH` (drop identical rows; effectively free because every `RawRecord` already carries a canonical hash), `PRIMARY_KEY` (dedup by the shared key defined in Step 4; last-write wins by `sourceUpdatedAt`).
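A rough sketch of how PERMISSIVE column merging and two of the dedup modes behave. The function names are hypothetical, and the SHA-256 over canonical JSON stands in for the product's `RawRecord` hash, whose exact construction isn't specified here.

```typescript
import { createHash } from "node:crypto";

type Row = Record<string, unknown>;

// PERMISSIVE reconciliation: output schema is the union of all source
// columns; any row missing a column gets null for it.
function permissiveUnion(sources: Row[][]): Row[] {
  const columns = new Set<string>();
  for (const rows of sources)
    for (const row of rows) for (const c of Object.keys(row)) columns.add(c);
  return sources.flat().map((row) => {
    const out: Row = {};
    for (const c of columns) out[c] = c in row ? row[c] : null;
    return out;
  });
}

// CONTENT_HASH dedup: drop rows whose canonical serialization hashes
// identically to one already seen.
function dedupByContentHash(rows: Row[]): Row[] {
  const seen = new Set<string>();
  return rows.filter((row) => {
    const canonical = JSON.stringify(row, Object.keys(row).sort());
    const h = createHash("sha256").update(canonical).digest("hex");
    if (seen.has(h)) return false;
    seen.add(h);
    return true;
  });
}

// PRIMARY_KEY dedup: keep the latest row per key (last-write wins by
// sourceUpdatedAt, compared here as ISO-8601 strings).
function dedupByPrimaryKey(rows: Row[], key: string): Row[] {
  const latest = new Map<unknown, Row>();
  for (const row of rows) {
    const prev = latest.get(row[key]);
    if (!prev || (row.sourceUpdatedAt as string) >= (prev.sourceUpdatedAt as string)) {
      latest.set(row[key], row);
    }
  }
  return [...latest.values()];
}
```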
Step 4 — Match Keys (JOIN only)
For each key pair, pick the field on each side that should be compared. By default, values are cast to string, trimmed, and lower-cased, so "T-001" matches "t-001" and the string "1" matches the number 1 without you having to do anything. Three coercion modes per key:
- `string` (default). Cast + trim + lower-case. Handles 80% of cross-connector type mismatches.
- `numeric`. Coerce to number; reject non-numeric values.
- `exact`. Preserve type distinction. `"1"` will not match `1`.
Composite keys are two or more key pairs sharing a group ID in the underlying data model. In the editor, "Add match key" creates another pair — all pairs combined define the equality predicate.
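The coercion modes and the composite-key equality predicate can be sketched like this. The helper names are hypothetical; the real normalization pipeline may differ in details such as rejection handling.

```typescript
type CoercionMode = "string" | "numeric" | "exact";

// Normalize one key value under a coercion mode. Returning null in
// numeric mode means "reject": the value wasn't a number.
function normalizeKey(value: unknown, mode: CoercionMode): unknown {
  switch (mode) {
    case "string":
      return String(value).trim().toLowerCase();
    case "numeric": {
      const n = Number(value);
      return Number.isNaN(n) ? null : n;
    }
    case "exact":
      return value; // no cast: "1" stays distinct from 1
  }
}

// A composite key is the tuple of every pair's normalized value; two
// rows match only when the whole tuple is equal.
function compositeKeyTuple(
  row: Record<string, unknown>,
  pairs: { field: string; mode: CoercionMode }[],
): string {
  return JSON.stringify(pairs.map((p) => normalizeKey(row[p.field], p.mode)));
}
```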
Save and Preview
After saving, Run preview executes the composition against live sources and returns the projected output row count + unmatched counts per side. This is your sanity check — if the projected row count is off by 10× from what you expected, you've probably picked the wrong match key.
Previews are rate-limited to ten per tenant per five minutes. They run with previewMode: true, which means they don't seed the intermediate cache, don't emit schema-drift audit events, and don't trigger the fan-out halt flag.
Attaching to a Sync Profile
Go to Configure → Field Mapping, pick or create a profile, and toggle the source step from "Connection" to "Composition". The dropdown lists every saved composition for your tenant. Field mappings on composed profiles reference source fields by their alias-prefixed names (territory.region) — the editor shows a hint so you remember.
A profile with a composition only needs a TARGET endpoint. You can't mix a composition with a single SOURCE endpoint on the same profile — it's one or the other.
Scale and Performance
| Build-side row count | Algorithm | Notes |
| --- | --- | --- |
| ≤ 250,000 | Monolithic in-memory hash join | Build side is the smaller of the two inputs |
| 250,000 – 2,000,000 | Partitioned (Grace-style) hash join | 16 buckets + 1 null bucket, joined bucket-by-bucket |
| > 2,000,000 | Hard-aborts with JOIN_MEMORY_EXCEEDED | Not supported in-process; push the join down or split the data |
The partitioner picks a bucket per row via a stable string hash of the normalized key tuple, so the same key always routes to the same bucket. Null-key rows go to a dedicated bucket so nullPolicy: 'match' still pairs them correctly.
The hard fanoutRowCap (default 1,000,000) is a separate safety net. If any step of the join is about to produce more composed rows than the cap, the run aborts with JOIN_FANOUT_EXCEEDED — that's almost always a sign of a bad join key or a missed cardinality assumption.
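Bucket routing and the row cap reduce to a few lines. This sketch assumes FNV-1a as the stable string hash; the document doesn't specify which hash the partitioner actually uses, and the function names are illustrative.

```typescript
const BUCKETS = 16;

// FNV-1a: a stable string hash, so the same normalized key tuple always
// routes to the same bucket across runs and processes.
function fnv1a(s: string): number {
  let h = 0x811c9dc5;
  for (let i = 0; i < s.length; i++) {
    h ^= s.charCodeAt(i);
    h = Math.imul(h, 0x01000193) >>> 0;
  }
  return h;
}

// Buckets 0..15 for keyed rows; the dedicated null bucket is index 16,
// so nullPolicy: 'match' can still pair null-key rows.
function bucketFor(keyTuple: string | null): number {
  return keyTuple === null ? BUCKETS : fnv1a(keyTuple) % BUCKETS;
}

// Hard fan-out guard: abort before materializing rows past the cap.
function checkFanoutCap(producedSoFar: number, cap = 1_000_000): void {
  if (producedSoFar > cap) throw new Error("JOIN_FANOUT_EXCEEDED");
}
```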
Intermediate-Snapshot Reuse
When a connector declares the content-hash-stable or incremental-extract capability, each source's records are persisted after extraction as an ephemeral SOURCE_INTERMEDIATE snapshot tagged with a SHA-256 payload fingerprint. Subsequent composition runs within a 15-minute window reuse the cached rows and skip the connector call entirely.
Concretely: a second preview click right after the first one completes instantly, and repeat sync runs during a short batch window avoid hammering the same API.
Two guardrails keep this safe:
- Capability gate. Connectors that can't guarantee stable content hashes (currently: CSV uploads) never reuse.
- 7-day absolute ceiling. No snapshot older than seven days is reused under any condition — stale reuse is how you ship last month's data to production.
Ephemeral snapshots are swept by purgeEphemeralSnapshots(24) — a retention helper meant to run from a daily cron. Ask your admin if it's wired up.
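The reuse decision comes down to three checks: capability, window, ceiling. A sketch with hypothetical names, assuming epoch-millisecond timestamps:

```typescript
const REUSE_WINDOW_MS = 15 * 60 * 1000;               // 15-minute reuse window
const ABSOLUTE_CEILING_MS = 7 * 24 * 60 * 60 * 1000;  // 7-day hard ceiling

interface Snapshot {
  fingerprint: string; // SHA-256 payload fingerprint
  createdAt: number;   // epoch millis
}

// Decide whether a cached SOURCE_INTERMEDIATE snapshot may be reused.
function canReuse(
  snapshot: Snapshot | undefined,
  connectorHasStableHashes: boolean,
  now = Date.now(),
): boolean {
  if (!snapshot) return false;
  if (!connectorHasStableHashes) return false; // capability gate (e.g. CSV uploads)
  const age = now - snapshot.createdAt;
  if (age > ABSOLUTE_CEILING_MS) return false; // never ship stale data
  return age <= REUSE_WINDOW_MS;
}
```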
Fan-Out Review: the Soft HITL Gate
Some joins legitimately explode row counts — a one-to-many booking-to-invoice relationship, for example. Others do it by accident when the chosen key isn't actually unique. The composition carries a fan-out halt threshold (default 5.0) — if outputRecords / max(inputPerAlias) exceeds this, the run halts into a new status, PENDING_FANOUT_REVIEW, even when no material diffs are detected.
The halt is soft — the run can be approved just like any other pending review. But the pause forces a human to confirm that a 10,000-row source legitimately became a 100,000-row composed dataset before those rows head downstream.
Set fanoutHaltThreshold to 0 to disable the halt entirely and fall back to the hard fanoutRowCap. Raise it if your composition genuinely fans out more than 5× every run.
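The halt decision is a single ratio check. A sketch, with hypothetical names built from the fields the text describes:

```typescript
type RunStatus = "OK" | "PENDING_FANOUT_REVIEW";

// Soft halt: compare composed output rows to the largest single input.
// A threshold of 0 disables the check (the hard fanoutRowCap still applies).
function fanoutStatus(
  outputRecords: number,
  inputPerAlias: Record<string, number>,
  fanoutHaltThreshold = 5.0,
): RunStatus {
  if (fanoutHaltThreshold <= 0) return "OK";
  const maxInput = Math.max(...Object.values(inputPerAlias));
  return outputRecords / maxInput > fanoutHaltThreshold
    ? "PENDING_FANOUT_REVIEW"
    : "OK";
}
```

With the default threshold, a 10,000-row source expanding to 100,000 composed rows (10x) halts for review, while a 4x expansion passes straight through.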
Contributing Sources on Diff Records
Every DiffRecord persisted from a composed run carries contributingSources: string[] — the composition aliases whose fields produced the change. On the review page this tells you, per row, whether a movement came from the territory side, the bookings side, or both. Legacy non-composed runs keep the empty array — no existing UI breaks.
Schema Drift Audit (PERMISSIVE Union)
When a PERMISSIVE union run produces a field set that differs from the one seen on the prior run, the composer persists CompositionSchemaEvent rows — one per added or removed field — and updates the composition's lastKnownFields. The first run just seeds the baseline without emitting events (otherwise you'd get a flood on day one).
This gives you a paper trail for "why did a new taxId column suddenly appear in our invoice union?" without you having to run git blame on the composition config. The events are tenant-scoped and surface alongside your audit log.
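The drift detection amounts to a set diff against `lastKnownFields`. A sketch with hypothetical names; the persisted `CompositionSchemaEvent` row surely carries more columns than this:

```typescript
interface SchemaEvent {
  field: string;
  change: "added" | "removed";
}

// Compare this run's field set against the stored baseline. A null
// baseline means the first run: seed silently, emit nothing.
function diffSchema(
  lastKnownFields: string[] | null,
  currentFields: string[],
): SchemaEvent[] {
  if (lastKnownFields === null) return []; // first run seeds the baseline
  const prev = new Set(lastKnownFields);
  const curr = new Set(currentFields);
  const events: SchemaEvent[] = [];
  for (const f of currentFields)
    if (!prev.has(f)) events.push({ field: f, change: "added" });
  for (const f of lastKnownFields)
    if (!curr.has(f)) events.push({ field: f, change: "removed" });
  return events; // one event per added or removed field
}
```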
Common Pitfalls
- Key type mismatches. If your join returns zero matches, open Run preview and check the unmatched counts. 99% of zero-match joins are a silent string-vs-number or whitespace mismatch that the default `string` coercion would have caught; you probably switched to `exact`. Switch back or add a coerce step.
- Alias-prefixed field mappings. On a composed profile, source field names are `alias.field`, not `field`. The Field Mapping page reminds you, but the autocomplete can't help until the first run has produced a composed snapshot.
- Compositions on existing profiles. V1 only allows compositions on newly created profiles. Swapping an existing single-source profile to a composition would invalidate its field mappings, because source field names now carry prefixes. Clone the profile instead.
- Adaptive Planning as a source. Adaptive is dual-role (source and target) and works as a composition source via virtual endpoints. Pigment is currently target-only — it won't appear in the source picker for compositions.
See Also
- Dataset Compositions: JOIN and UNION Across Sources — the conceptual introduction.
- Field Mapping & Transforms — how to author mappings against alias-prefixed source fields.
- Safety Gate Overview — how the pending-review queue handles composed-run diffs.