Change Detection
How VersionForge's canonical row hashing detects adds, updates, and deletes at the row level.
The change detection engine is the core of VersionForge's sync intelligence. Instead of replacing entire datasets on every sync, it identifies exactly which rows changed -- and only those rows flow through the rest of the pipeline.
Why Row-Level Diffing Matters
Most data sync tools use a full-replace strategy: pull everything from the source, delete everything in the target, and reload. This approach has serious problems for FP&A teams:
- No visibility into what changed -- you cannot review individual changes before they land
- Unnecessary load on target systems -- rewriting 500,000 rows when only 12 changed wastes API quota and time
- No audit trail -- if something goes wrong, you cannot identify which specific change caused it
VersionForge's row-level diff solves all three. You see every individual change, only changed rows are synced, and every change is logged with before/after values.
Canonical Row Hashing
At the heart of the change detection engine is canonical hashing. Here is how it works:
- Each row is serialized into a deterministic string representation. Fields are sorted alphabetically by key, values are normalized (trimmed whitespace, consistent date formats, consistent number precision), and the result is hashed using SHA-256.
- The hash is canonical -- the same row data always produces the same hash, regardless of field order in the source system or minor formatting differences.
- When a new snapshot arrives, VersionForge computes the hash for every row and compares it against the hashes from the previous snapshot.
Row data: { account: "6100", amount: 14500.00, period: "2026-03" }
Canonical: account=6100|amount=14500.00|period=2026-03
Hash: a3f2c8... (SHA-256)
Canonical hashing means that cosmetic differences -- like trailing spaces, inconsistent date separators, or reordered JSON keys -- do not trigger false-positive changes.
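The serialization and hashing steps described above can be sketched in Python. This is a minimal illustration, not VersionForge's actual implementation: the function names (`normalize`, `canonical_string`, `row_hash`) and the specific normalization rules (two-decimal number precision, ISO date formatting) are assumptions chosen to reproduce the example row shown earlier.

```python
import hashlib
from datetime import date, datetime
from decimal import Decimal


def normalize(value):
    """Turn one field value into a stable string (assumed rules)."""
    if value is None:
        return ""
    if isinstance(value, str):
        return value.strip()  # cosmetic whitespace must not affect the hash
    if isinstance(value, bool):  # checked before int: bool is a subclass of int
        return "true" if value else "false"
    if isinstance(value, (int, float, Decimal)):
        # Fixed two-decimal precision is an assumption for this sketch.
        return f"{Decimal(str(value)):.2f}"
    if isinstance(value, (date, datetime)):
        return value.isoformat()  # one consistent date format
    return str(value)


def canonical_string(row):
    """Serialize a row with keys sorted alphabetically: key=value|key=value."""
    return "|".join(f"{key}={normalize(row[key])}" for key in sorted(row))


def row_hash(row):
    """SHA-256 hex digest of the canonical string."""
    return hashlib.sha256(canonical_string(row).encode("utf-8")).hexdigest()


# Same data, different key order and stray whitespace: the hashes match.
a = {"account": "6100", "amount": 14500.00, "period": "2026-03"}
b = {"period": " 2026-03 ", "amount": 14500, "account": "6100"}
print(canonical_string(a))
print(row_hash(a) == row_hash(b))
```

A production serializer would also need to escape the `|` and `=` delimiters inside values so that two different rows cannot produce the same canonical string; the sketch omits that for brevity.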
Three Change Types
The change detection engine classifies every difference into one of three types:
| Type | Meaning | How Detected |
|---|---|---|
| ADD | A row exists in the new snapshot but not in the previous one | Row key present in new, absent in old |
| UPDATE | A row exists in both snapshots but its hash changed | Same row key, different hash |
| DELETE | A row exists in the previous snapshot but not the new one | Row key present in old, absent in new |
Each change carries the full before and after state of the row, not just the hash. This means the Safety Gate can show field-level detail -- exactly which column changed and what the old and new values are.
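As a rough sketch of the classification rules in the table, the following Python compares two snapshots keyed by row key and emits a change record with full before and after state. The `Change` class, `diff_snapshots`, `changed_fields`, and the stand-in `simple_hash` are hypothetical names for illustration; the real engine's data structures are not documented here.

```python
import hashlib
from dataclasses import dataclass
from typing import Optional


@dataclass
class Change:
    kind: str                # "ADD", "UPDATE", or "DELETE"
    key: tuple               # row key shared across snapshots
    before: Optional[dict]   # None for ADD
    after: Optional[dict]    # None for DELETE


def simple_hash(row):
    """Stand-in for a canonical row hash (sorted items, SHA-256)."""
    return hashlib.sha256(repr(sorted(row.items())).encode("utf-8")).hexdigest()


def diff_snapshots(old, new, hash_fn=simple_hash):
    """Classify differences between two {row_key: row} snapshots."""
    changes = []
    for key, row in new.items():
        if key not in old:
            changes.append(Change("ADD", key, None, row))
        elif hash_fn(old[key]) != hash_fn(row):
            changes.append(Change("UPDATE", key, old[key], row))
    for key, row in old.items():
        if key not in new:
            changes.append(Change("DELETE", key, row, None))
    return changes


def changed_fields(change):
    """Field-level detail: {field: (old_value, new_value)} for differing fields."""
    before, after = change.before or {}, change.after or {}
    return {
        name: (before.get(name), after.get(name))
        for name in sorted(before.keys() | after.keys())
        if before.get(name) != after.get(name)
    }


old = {
    ("6100", "2026-03"): {"account": "6100", "period": "2026-03", "amount": 14500.0},
    ("6200", "2026-03"): {"account": "6200", "period": "2026-03", "amount": 900.0},
}
new = {
    ("6100", "2026-03"): {"account": "6100", "period": "2026-03", "amount": 15000.0},
    ("6300", "2026-03"): {"account": "6300", "period": "2026-03", "amount": 120.0},
}
for change in diff_snapshots(old, new):
    print(change.kind, change.key, changed_fields(change))
```

Because each change keeps both row states, the field-level view can be derived on demand rather than stored separately.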
Row Keys
The change detection engine needs a way to identify "the same row" across snapshots. Each connector defines a row key -- a combination of one or more fields that uniquely identify a row. For example:
- Workday employees: employee_id
- NetSuite GL: account_id + period + subsidiary
- Stripe invoices: invoice_id
If your row key is not truly unique, the change detection engine cannot reliably match rows across snapshots and may report spurious changes or miss real ones. During connector setup, VersionForge validates key uniqueness and warns you if duplicates are detected.
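A uniqueness check of this kind can be approximated in a few lines of Python. This is an illustrative sketch only: `row_key` and `find_duplicate_keys` are hypothetical helpers, and how VersionForge performs its own validation is not specified here.

```python
from collections import Counter


def row_key(row, key_fields):
    """Build a composite key tuple from the configured key fields."""
    return tuple(row[field] for field in key_fields)


def find_duplicate_keys(rows, key_fields):
    """Return {key: count} for every key that appears more than once."""
    counts = Counter(row_key(row, key_fields) for row in rows)
    return {key: n for key, n in counts.items() if n > 1}


# Composite key modeled on the NetSuite GL example above.
gl_key = ["account_id", "period", "subsidiary"]
rows = [
    {"account_id": "6100", "period": "2026-03", "subsidiary": "US", "amount": 14500.0},
    {"account_id": "6100", "period": "2026-03", "subsidiary": "UK", "amount": 8200.0},
    {"account_id": "6100", "period": "2026-03", "subsidiary": "US", "amount": 300.0},
]
print(find_duplicate_keys(rows, gl_key))
```

An empty result means the chosen fields identify each row uniquely in that snapshot; any entry points to a key that needs an additional field.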
How Staging Feeds Change Detection
The change detection engine operates on staged snapshots, not live source data. Each sync produces a new snapshot in the staging area. The diff compares snapshot N against snapshot N-1. This design means:
- Source systems are only queried once per sync (during Extract)
- Diffs are fast because they operate on local data
- You can re-run a diff without re-extracting from the source
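To make the snapshot-based design concrete, here is a small in-memory sketch in Python. The `StagingArea` class and its methods are hypothetical and do not reflect VersionForge's storage layer; for brevity it compares rows by direct equality rather than by canonical hash.

```python
class StagingArea:
    """Toy staging area: keeps every snapshot so diffs never touch the source."""

    def __init__(self):
        self._snapshots = []  # list index is the snapshot number

    def stage(self, rows_by_key):
        """Store a copy of the extracted rows and return the snapshot number."""
        self._snapshots.append(dict(rows_by_key))
        return len(self._snapshots) - 1

    def diff(self, n):
        """Compare snapshot n against snapshot n-1 using only staged data."""
        if not 1 <= n < len(self._snapshots):
            raise ValueError("snapshot n and its predecessor must both exist")
        old, new = self._snapshots[n - 1], self._snapshots[n]
        return {
            "ADD": sorted(new.keys() - old.keys()),
            "UPDATE": sorted(k for k in new.keys() & old.keys() if new[k] != old[k]),
            "DELETE": sorted(old.keys() - new.keys()),
        }


staging = StagingArea()
staging.stage({"E1": {"name": "Ana"}, "E2": {"name": "Raj"}})
latest = staging.stage({"E1": {"name": "Ana M."}, "E3": {"name": "Li"}})

# The diff can be repeated at any time; no new extract is required.
print(staging.diff(latest))
print(staging.diff(latest) == staging.diff(latest))
```

Because `stage` copies the extracted rows, later changes in the source do not alter a stored snapshot, which is what makes repeated diffs return the same result.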
For more detail on snapshot storage, see Staging & Snapshots.