Updated 2026-04-17

Compositions: Setup, Scale, and Review

Step-by-step build of a composition, how match keys and coercion work, scale thresholds, the fan-out halt gate, intermediate reuse, and schema-drift audit.

Building a Composition

The editor lives at Configure → Compositions → New Composition. It's a four-step stepper, not a freeform node graph — finance analysts have consistently told us that graphs feel like something they need an engineer to run.

Step 1 — Name and Output Object

Pick a human-readable name ("Territory × Bookings") and an output object type — a short machine identifier (e.g. territory_bookings) that downstream field mapping will see. Think of the output object type as the table name the composed dataset will be known by.

Step 2 — Pick the Sources

Each source is a pair:

  • Connection. Either an existing SOURCE endpoint on a sync profile you've already built, or a credential for a source-capable connector that isn't wired into any profile yet. The latter are "virtual endpoints" — the composition editor surfaces them so you don't have to build a throwaway profile just to use Adaptive Planning as a source.
  • Alias. A short, machine-friendly name — territory, bookings, ns, stripe. This is the prefix that will show up on every column in the output (territory.region, bookings.amount). Pick something you won't regret reading a hundred times in downstream field mappings.

Sources are ordinal. Source #1 is the "left" side for JOIN semantics and takes first precedence for UNION dedup. You can add more than two sources for N-way joins; the composer reduces them left-to-right, using the same join type at every step.
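The left-to-right reduction and alias prefixing described above can be sketched as follows. This is a minimal in-memory illustration of the semantics, not the VersionForge implementation; the function and type names are ours.

```typescript
type Row = Record<string, unknown>;

// Prefix every column of a source's rows with its alias,
// e.g. region -> territory.region.
function prefixColumns(alias: string, rows: Row[]): Row[] {
  return rows.map((row) =>
    Object.fromEntries(
      Object.entries(row).map(([k, v]) => [`${alias}.${k}`, v])
    )
  );
}

// Reduce sources left-to-right, applying the same (here: inner) join
// at every step. Source #1 is the initial "left" side.
function innerJoinAll(
  sources: { alias: string; rows: Row[] }[],
  matches: (left: Row, right: Row) => boolean
): Row[] {
  const [first, ...rest] = sources;
  let acc = prefixColumns(first.alias, first.rows);
  for (const src of rest) {
    const right = prefixColumns(src.alias, src.rows);
    const out: Row[] = [];
    for (const l of acc) {
      for (const r of right) {
        if (matches(l, r)) out.push({ ...l, ...r });
      }
    }
    acc = out;
  }
  return acc;
}
```

Note how every output column carries its alias prefix, which is exactly what downstream field mappings will see.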

Step 3 — Combine Mode

Toggle between JOIN and UNION. For JOIN, pick a match type:

  • INNER. Only rows that match on both sides.
  • LEFT. Keep every row from the left side; unmatched left rows get null values for the right-side columns.
  • RIGHT. Mirror of LEFT.
  • FULL. Keep everything, matched or not.

For UNION, pick a schema reconciliation mode and a dedup mode:

  • PERMISSIVE (default). Sources with different field sets are merged by taking the union of columns; missing fields get null. You'll see a warning in the issues list for every missing field.
  • STRICT. Every source must have the exact same field set, or the run fails fast. Reserve this for compliance flows.
  • Dedup. NONE (UNION ALL), CONTENT_HASH (drop identical rows; effectively free because every RawRecord already carries a canonical hash), PRIMARY_KEY (dedup by the shared key defined in Step 4; last-write wins by sourceUpdatedAt).
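The PERMISSIVE merge and CONTENT_HASH dedup behave roughly like this sketch. It is illustrative only: the real pipeline reuses the canonical hash already stored on each RawRecord, whereas here JSON over sorted keys stands in for it.

```typescript
type Row = Record<string, unknown>;

// PERMISSIVE reconciliation: output columns are the union of all source
// columns; a row missing a column gets null for it.
function permissiveUnion(sources: Row[][]): Row[] {
  const allFields = new Set<string>();
  for (const rows of sources)
    for (const row of rows)
      for (const k of Object.keys(row)) allFields.add(k);

  const out: Row[] = [];
  for (const rows of sources) {
    for (const row of rows) {
      const filled: Row = {};
      for (const f of allFields) filled[f] = f in row ? row[f] : null;
      out.push(filled);
    }
  }
  return out;
}

// CONTENT_HASH dedup: drop rows whose canonical serialization is identical.
function dedupByContent(rows: Row[]): Row[] {
  const seen = new Set<string>();
  return rows.filter((row) => {
    const hash = JSON.stringify(
      Object.entries(row).sort(([a], [b]) => a.localeCompare(b))
    );
    if (seen.has(hash)) return false;
    seen.add(hash);
    return true;
  });
}
```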

Step 4 — Match Keys (JOIN only)

For each key pair, pick the field on each side that should be compared. Values are trimmed and lower-cased by default, so "T-001" matches "t-001" and the string "1" matches the number 1 without you having to do anything. Three coercion modes per key:

  • string (default). Cast + trim + lower-case. Handles 80% of cross-connector type mismatches.
  • numeric. Coerce to number; reject non-numeric values.
  • exact. Preserve type distinction. "1" will not match 1.

Composite keys are two or more key pairs sharing a group ID in the underlying data model. In the editor, "Add match key" creates another pair — all pairs combined define the equality predicate.
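The three coercion modes and composite-key grouping can be sketched like this. Names and signatures are illustrative assumptions, not the product's internal API.

```typescript
type CoerceMode = "string" | "numeric" | "exact";

// Normalize one side of a match-key pair. Under "string" (the default),
// " T-001 " and "t-001" normalize identically, and the string "1" and the
// number 1 both normalize to "1".
function normalizeKey(value: unknown, mode: CoerceMode): unknown {
  switch (mode) {
    case "string":
      return String(value).trim().toLowerCase();
    case "numeric": {
      const n = Number(value);
      if (Number.isNaN(n)) throw new Error(`non-numeric key value: ${value}`);
      return n;
    }
    case "exact":
      return value; // type distinction preserved: "1" will not match 1
  }
}

// A composite key is the tuple of all normalized pairs in the group;
// the equality predicate compares the whole tuple.
function compositeKey(
  row: Record<string, unknown>,
  fields: { field: string; mode: CoerceMode }[]
): string {
  return JSON.stringify(
    fields.map(({ field, mode }) => normalizeKey(row[field], mode))
  );
}
```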

Save and Preview

After saving, Run preview executes the composition against live sources and returns the projected output row count + unmatched counts per side. This is your sanity check — if the projected row count is off by 10× from what you expected, you've probably picked the wrong match key.

Previews are rate-limited to ten per tenant per five minutes. They run with previewMode: true, which means they don't seed the intermediate cache, don't emit schema-drift audit events, and don't trigger the fan-out halt flag.

Attaching to a Sync Profile

Go to Configure → Field Mapping, pick or create a profile, and toggle the source step from "Connection" to "Composition". The dropdown lists every saved composition for your tenant. Field mappings on composed profiles reference source fields by their alias-prefixed names (territory.region) — the editor shows a hint so you remember.

A profile with a composition only needs a TARGET endpoint. You can't mix a composition with a single SOURCE endpoint on the same profile — it's one or the other.

Scale and Performance

| Build-side row count | Algorithm | Notes |
| --- | --- | --- |
| ≤ 250,000 | Monolithic in-memory hash join | Build side is the smaller of the two inputs |
| 250,000 – 2,000,000 | Partitioned (Grace-style) hash join | 16 buckets + 1 null bucket, joined bucket-by-bucket |
| > 2,000,000 | Hard abort with JOIN_MEMORY_EXCEEDED | Not supported in-process; push the join down or split the data |

The partitioner picks a bucket per row via a stable string hash of the normalized key tuple, so the same key always routes to the same bucket. Null-key rows go to a dedicated bucket so nullPolicy: 'match' still pairs them correctly.
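A bucket assignment with these properties can be sketched as below. The hash function choice (FNV-1a here) and constants are our illustration; the only properties that matter are the ones the text names: stability per key, 16 regular buckets, and a dedicated null bucket.

```typescript
const NUM_BUCKETS = 16;
const NULL_BUCKET = NUM_BUCKETS; // a 17th bucket reserved for null-key rows

// Stable 32-bit FNV-1a string hash: the same normalized key tuple always
// produces the same hash, so both sides of the join route it identically.
function fnv1a(s: string): number {
  let h = 0x811c9dc5;
  for (let i = 0; i < s.length; i++) {
    h ^= s.charCodeAt(i);
    h = Math.imul(h, 0x01000193) >>> 0;
  }
  return h;
}

function bucketFor(normalizedKey: string | null): number {
  if (normalizedKey === null) return NULL_BUCKET; // keeps nullPolicy pairing local
  return fnv1a(normalizedKey) % NUM_BUCKETS;
}
```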

The hard fanoutRowCap (default 1,000,000) is a separate safety net. If any step of the join is about to produce more composed rows than the cap, the run aborts with JOIN_FANOUT_EXCEEDED — that's almost always a sign of a bad join key or a missed cardinality assumption.

Intermediate-Snapshot Reuse

When a connector declares the content-hash-stable or incremental-extract capability, each source's records are persisted after extraction as an ephemeral SOURCE_INTERMEDIATE snapshot tagged with a SHA-256 payload fingerprint. Subsequent composition runs within a 15-minute window reuse the cached rows and skip the connector call entirely.

Concretely: a second preview click right after the first one completes instantly, and repeat sync runs during a short batch window avoid hammering the same API.

Two guardrails keep this safe:

  • Capability gate. Connectors that can't guarantee stable content hashes (currently: CSV uploads) never reuse.
  • 7-day absolute ceiling. No snapshot older than seven days is reused under any condition — stale reuse is how you ship last month's data to production.

Ephemeral snapshots are swept by purgeEphemeralSnapshots(24) — a retention helper meant to run from a daily cron. Ask your admin if it's wired up.
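Putting the reuse rules together, the decision looks roughly like this. Field and function names are assumptions for illustration; the windows (15 minutes, 7 days) and the capability gate come from the description above.

```typescript
interface Snapshot {
  fingerprint: string; // SHA-256 payload fingerprint
  createdAt: Date;
}

const REUSE_WINDOW_MS = 15 * 60 * 1000;               // 15-minute reuse window
const ABSOLUTE_CEILING_MS = 7 * 24 * 60 * 60 * 1000;  // 7-day hard ceiling

function canReuse(
  snapshot: Snapshot,
  connectorCapabilities: string[],
  now: Date
): boolean {
  // Capability gate: connectors without stable hashes (e.g. CSV uploads)
  // never reuse.
  const stable =
    connectorCapabilities.includes("content-hash-stable") ||
    connectorCapabilities.includes("incremental-extract");
  if (!stable) return false;

  const age = now.getTime() - snapshot.createdAt.getTime();
  if (age > ABSOLUTE_CEILING_MS) return false; // never ship week-old data
  return age <= REUSE_WINDOW_MS;
}
```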

Fan-Out Review: the Soft HITL Gate

Some joins legitimately explode row counts — a one-to-many booking-to-invoice relationship, for example. Others do it by accident when the chosen key isn't actually unique. The composition carries a fan-out halt threshold (default 5.0) — if outputRecords / max(inputPerAlias) exceeds this, the run halts into a new status, PENDING_FANOUT_REVIEW, even when no material diffs are detected.

The halt is soft — the run can be approved just like any other pending review. But the pause forces a human to confirm that a 10,000-row source legitimately became a 100,000-row composed dataset before those rows head downstream.

Set fanoutHaltThreshold to 0 to disable the halt entirely and fall back to the hard fanoutRowCap. Raise it if your composition genuinely fans out more than 5× every run.
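The soft gate reduces to the ratio check described above. A minimal sketch, with the function name ours:

```typescript
// Halt when outputRecords / max(inputPerAlias) exceeds the threshold.
// A threshold of 0 disables the soft halt, leaving only the hard fanoutRowCap.
function shouldHaltForFanout(
  outputRecords: number,
  inputPerAlias: Record<string, number>,
  fanoutHaltThreshold: number // default 5.0
): boolean {
  if (fanoutHaltThreshold === 0) return false;
  const maxInput = Math.max(...Object.values(inputPerAlias));
  return outputRecords / maxInput > fanoutHaltThreshold;
}
```

With the default threshold, a 10,000-row source fanning out to 100,000 composed rows (a 10× ratio) halts into PENDING_FANOUT_REVIEW.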

Contributing Sources on Diff Records

Every DiffRecord persisted from a composed run carries contributingSources: string[] — the composition aliases whose fields produced the change. On the review page this tells you, per row, whether a movement came from the territory side, the bookings side, or both. Legacy non-composed runs keep the empty array — no existing UI breaks.
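Because composed field names carry their alias prefix, deriving contributing sources from a row's changed fields is a matter of splitting off the prefix. A sketch (our function name, not the product's):

```typescript
// "territory.region" changed  ->  "territory" contributed.
function contributingSources(changedFields: string[]): string[] {
  const aliases = new Set<string>();
  for (const f of changedFields) {
    const dot = f.indexOf(".");
    if (dot > 0) aliases.add(f.slice(0, dot));
  }
  return [...aliases].sort();
}
```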

Schema Drift Audit (PERMISSIVE Union)

When a PERMISSIVE union run produces a field set that differs from the one seen on the prior run, the composer persists CompositionSchemaEvent rows — one per added or removed field — and updates the composition's lastKnownFields. The first run just seeds the baseline without emitting events (otherwise you'd get a flood on day one).
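The drift computation is a set difference between the current field set and `lastKnownFields`, with the first run seeding silently. A sketch under those assumptions (type and function names ours):

```typescript
type SchemaEvent = { field: string; change: "ADDED" | "REMOVED" };

// One event per added or removed field; a null baseline means "first run",
// which seeds lastKnownFields without emitting anything.
function diffFieldSets(
  lastKnownFields: string[] | null,
  currentFields: string[]
): SchemaEvent[] {
  if (lastKnownFields === null) return []; // seed baseline, no event flood
  const prev = new Set(lastKnownFields);
  const curr = new Set(currentFields);
  const events: SchemaEvent[] = [];
  for (const f of curr) if (!prev.has(f)) events.push({ field: f, change: "ADDED" });
  for (const f of prev) if (!curr.has(f)) events.push({ field: f, change: "REMOVED" });
  return events;
}
```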

This gives you a paper trail for "why did a new taxId column suddenly appear in our invoice union?" without you having to run git blame on the composition config. The events are tenant-scoped and surface alongside your audit log.

Common Pitfalls

  • Key type mismatches. If your join returns zero matches, open Run preview and check the unmatched counts. The vast majority of zero-match joins are a silent string-vs-number or whitespace mismatch that the default string coercion would have caught; the usual culprit is a key that was switched to exact. Switch it back to string, or add explicit numeric coercion.
  • Alias-prefixed field mappings. On a composed profile, source field names are alias.field, not field. The Field Mapping page reminds you but the autocomplete can't help until the first run has produced a composed snapshot.
  • Compositions on existing profiles. V1 only allows compositions on newly created profiles. Swapping an existing single-source profile to a composition would invalidate its field mappings because source field names carry prefixes now. Clone the profile instead.
  • Adaptive Planning as a source. Adaptive is dual-role (source and target) and works as a composition source via virtual endpoints. Pigment is currently target-only — it won't appear in the source picker for compositions.


Built by Vantage Advisory

VersionForge is built by the team at Vantage Advisory Group — consultants who have spent years implementing Workday, NetSuite, Stripe, Salesforce, Adaptive, and Pigment integrations for finance, RevOps, and workforce-planning teams. We built the product we kept wishing existed.

See It Running on Your Own Data in 30 Minutes

Book a walkthrough with the founding team. Bring your messiest data pipeline — GL close, MRR reconciliation, or headcount plan. We'll show you how VersionForge handles it.