Sync Pipeline Overview
How data moves through VersionForge's six-stage sync pipeline from extraction to load.
Every sync in VersionForge follows the same six-stage pipeline. Understanding these stages helps you predict timing, debug failures, and configure transforms with confidence.
The Six Stages
Data flows through the pipeline in strict order. If any stage fails, the pipeline halts and no partial data reaches your target system.
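The strict ordering and halt-on-failure behavior can be sketched in a few lines of Python. This is an illustrative model, not VersionForge's actual code; the stage names, payload shape, and result dictionary are assumptions:

```python
def run_pipeline(stages, payload):
    """Run stages in order; stop at the first failure so nothing partial loads."""
    completed = []
    for name, fn in stages:
        try:
            payload = fn(payload)
        except Exception as exc:
            # Halt immediately: later stages (including Load) never run.
            return {"status": "failed", "failed_stage": name,
                    "error": str(exc), "completed": completed}
        completed.append(name)
    return {"status": "ok", "completed": completed, "result": payload}

def broken_diff(snapshot):
    # Simulates a mid-pipeline failure.
    raise RuntimeError("baseline snapshot missing")

stages = [
    ("extract", lambda p: p + ["extracted"]),
    ("stage",   lambda p: p + ["staged"]),
    ("diff",    broken_diff),
    ("load",    lambda p: p + ["loaded"]),
]
result = run_pipeline(stages, [])
```

Because the runner returns before Load ever executes, the target system is never touched by a failed run.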
Extract
VersionForge connects to your source system (Workday, NetSuite, Stripe, etc.) and pulls the relevant dataset. Each connector handles authentication, pagination, and rate limiting automatically. The extract produces a raw snapshot of source data.
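Connector internals vary by source system, but cursor-based pagination typically follows the pattern below. A minimal sketch, assuming a hypothetical `fetch_page` that stands in for a real API call:

```python
def fetch_page(cursor):
    # Stand-in for a real paginated API call: returns (rows, next_cursor),
    # with next_cursor set to None on the final page.
    pages = {0: (["row-a", "row-b"], 1), 1: (["row-c"], None)}
    return pages[cursor]

def extract_all(start_cursor=0):
    """Follow the cursor until the source reports no more pages."""
    rows, cursor = [], start_cursor
    while cursor is not None:
        page, cursor = fetch_page(cursor)
        rows.extend(page)
    return rows
```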
Stage
The raw extract is written to the staging area as a chunked snapshot. Large datasets are split into manageable chunks for efficient storage and comparison. VersionForge retains previous snapshots so the change detection engine has a baseline to compare against.
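Chunking itself is simple; a sketch of splitting a snapshot into fixed-size chunks (the chunk size here is arbitrary, not VersionForge's actual default):

```python
def chunk_snapshot(rows, chunk_size):
    """Split a snapshot into fixed-size chunks; the last chunk may be short."""
    return [rows[i:i + chunk_size] for i in range(0, len(rows), chunk_size)]

chunks = chunk_snapshot(list(range(5)), 2)
```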
Diff
The change detection engine compares the new snapshot against the previous one using canonical row hashing. Every row is hashed deterministically, and the engine classifies each difference as an ADD, UPDATE, or DELETE. Only changed rows move forward -- unchanged data is ignored entirely.
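A minimal sketch of deterministic row hashing and ADD/UPDATE/DELETE classification, assuming rows are keyed by an `id` field. The canonicalization details (JSON with sorted keys, SHA-256) are illustrative, not VersionForge's exact scheme:

```python
import hashlib
import json

def row_hash(row):
    # Canonical form: sorted keys and fixed separators, so the same row
    # always produces the same hash regardless of field order.
    canonical = json.dumps(row, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

def classify_changes(prev_rows, curr_rows, key="id"):
    prev = {r[key]: row_hash(r) for r in prev_rows}
    curr = {r[key]: row_hash(r) for r in curr_rows}
    return {
        "ADD":    sorted(k for k in curr if k not in prev),
        "DELETE": sorted(k for k in prev if k not in curr),
        "UPDATE": sorted(k for k in curr if k in prev and curr[k] != prev[k]),
    }

changes = classify_changes(
    [{"id": 1, "name": "Ada"}, {"id": 2, "name": "Bo"}],
    [{"id": 1, "name": "Ada L."}, {"id": 3, "name": "Cy"}],
)
```

Rows whose hashes match in both snapshots are dropped at this point, which is why only changed data moves to Transform.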
Transform
Changed rows pass through your configured field mappings and transform rules. This is where source fields are renamed, concatenated, split, or converted to match your target system's schema. Lookups and default values are applied here.
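The mapping mechanics can be sketched as a per-row function. The rule shapes below (a string for a rename, a callable for a computed field, a separate defaults dict) are assumptions for illustration, not VersionForge's configuration format:

```python
def transform_row(row, mappings, defaults):
    out = {}
    for target, source in mappings.items():
        if callable(source):
            out[target] = source(row)       # computed field, e.g. a concat
        else:
            out[target] = row.get(source)   # simple rename
    for field, value in defaults.items():
        if out.get(field) is None:          # defaults fill only missing values
            out[field] = value
    return out

result = transform_row(
    {"first": "Ada", "last": "Lovelace"},
    {"full_name": lambda r: f"{r['first']} {r['last']}", "surname": "last"},
    {"region": "EMEA"},
)
```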
Review (Safety Gate)
Transformed changes land in the Safety Gate review queue. Every add, update, and delete is presented with field-level detail for human review. Nothing syncs to your target until changes are explicitly approved -- either manually or through auto-approve rules.
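Conceptually, an auto-approve rule is a predicate over a pending change. A sketch, with hypothetical change and rule shapes:

```python
def is_auto_approved(change, rules):
    # Approved if any rule matches; everything else waits for manual review.
    return any(rule(change) for rule in rules)

rules = [
    # Approve routine contact-info updates, nothing else.
    lambda c: c["op"] == "UPDATE" and set(c["fields"]) <= {"phone", "email"},
]

routine = {"op": "UPDATE", "fields": ["email"]}
risky   = {"op": "DELETE", "fields": ["id"]}
```

The conservative default falls out of the `any(...)`: a change no rule matches is never synced without a human.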
Load
Approved changes are pushed to your target system (Adaptive Planning, Pigment, etc.) using the target connector's native API. VersionForge handles batching, retries, and confirmation. Each loaded batch is logged for audit purposes.
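Batching with retries might look like the sketch below. `push_batch` stands in for a target connector call, and the batch size, retry count, and fixed backoff are illustrative defaults, not VersionForge's actual settings:

```python
import time

def load_in_batches(rows, push_batch, batch_size=100, retries=3, backoff=0.0):
    loaded = 0
    for i in range(0, len(rows), batch_size):
        batch = rows[i:i + batch_size]
        for attempt in range(retries):
            try:
                push_batch(batch)
                loaded += len(batch)
                break
            except ConnectionError:
                if attempt == retries - 1:
                    raise               # give up; the run is recorded as failed
                time.sleep(backoff)     # simple fixed backoff between retries
    return loaded

calls = {"count": 0}
def flaky_push(batch):
    # Fails once, then succeeds, to exercise the retry path.
    calls["count"] += 1
    if calls["count"] == 1:
        raise ConnectionError("transient network error")

loaded = load_in_batches(["a", "b", "c"], flaky_push, batch_size=2)
```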
What Triggers a Sync
You can start a sync in three ways:
- Manual trigger -- Click Run Sync on the pipeline dashboard. Useful for ad-hoc refreshes and testing.
- Scheduled (cron) -- Configure a recurring schedule (e.g., every weekday at 6:00 AM UTC). See Scheduling & Triggers for setup details.
- API-triggered -- Send a `POST` request to `/api/v1/sync/trigger` to start a sync programmatically. This integrates with CI/CD pipelines, webhooks, and external orchestrators.
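Using only the Python standard library, a trigger call might be assembled like this. The base URL, token, and `pipeline_id` body field are placeholders I've assumed for illustration; only the `/api/v1/sync/trigger` path comes from this page:

```python
import json
import urllib.request

def build_trigger_request(base_url, token, pipeline_id):
    # pipeline_id in the JSON body is a hypothetical parameter name.
    body = json.dumps({"pipeline_id": pipeline_id}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/api/v1/sync/trigger",
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )

req = build_trigger_request("https://versionforge.example.com", "YOUR_TOKEN", "daily-sync")
# Send it with: urllib.request.urlopen(req)  (not executed here)
```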
Typical Sync Timing
| Dataset Size | Extract | Diff | Transform + Review | Load | Total |
|---|---|---|---|---|---|
| 1,000 rows | ~5s | under 1s | Depends on review | ~3s | ~10s + review |
| 50,000 rows | ~30s | ~5s | Depends on review | ~45s | ~90s + review |
| 500,000 rows | ~5min | ~30s | Depends on review | ~8min | ~15min + review |
The Review stage is the only stage that depends on human action. If you configure auto-approve rules for routine changes, many syncs complete end-to-end without manual intervention.
Pipeline Visibility
Every sync run produces a pipeline log visible on the dashboard. You can see the status of each stage, row counts at every step, and detailed error messages if a stage fails. Pipeline logs are retained for 90 days by default.
If a sync fails mid-pipeline, no data reaches your target system. VersionForge treats partial loads as a data integrity risk and rolls back automatically.