Sync Engine
The QuperSync sync engine orchestrates data movement from Amazon Redshift to PostgreSQL in three phases: Extract, Transform, and Load. The engine is designed to be idempotent — re-running a sync after a failure produces the same result without duplicating data.
Three-Phase Architecture
Phase 1 — Extract
Data is extracted from Redshift using the redshift-connector Python driver (a pure-Python DB-API 2.0 connector). The extraction query is configurable per sync job:
- Full extraction: `SELECT * FROM source_table` fetches all rows. Used for initial loads or tables where incremental sync is not supported.
- Incremental extraction: adds a WHERE clause filtering by a watermark column (typically `updated_at` or `created_at`): `WHERE updated_at > last_sync_time`. Only new or modified rows are fetched.
Results are fetched in batches using `cursor.fetchmany(batch_size)` rather than loading all rows into memory at once. This bounds memory usage regardless of table size.
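The batching logic relies only on the standard DB-API cursor interface, so it can be sketched independently of the Redshift driver. The function name below is illustrative, not part of QuperSync's actual API:

```python
def fetch_in_batches(cursor, query, params=(), batch_size=1000):
    """Yield rows from any DB-API cursor in bounded-memory batches."""
    cursor.execute(query, params)
    while True:
        batch = cursor.fetchmany(batch_size)
        if not batch:
            break
        yield batch
```

Because `fetchmany` is part of DB-API 2.0, the same generator works unchanged against a redshift-connector cursor or any other compliant driver.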
Phase 2 — Transform
Each row dict from Redshift is mapped to a domain Entity object. The transformation layer handles:
- Type mapping: Redshift types are converted to Python native types (`VARCHAR` → `str`, `NUMERIC` → `float`, `TIMESTAMP` → `datetime`, etc.)
- NULL coercion: NULL values from Redshift are preserved as Python `None` and propagated to PostgreSQL as SQL NULL
- Business rules: Entity constructors validate required fields and apply any business-specific transformations (e.g., currency normalization, date truncation)
Phase 3 — Load
Entities are written to PostgreSQL using an upsert pattern:
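A minimal sketch of the pattern, assuming a hypothetical `target_table` keyed on `id` (the table, columns, and `load_batch` helper are illustrative; the named `:param` placeholder style is also illustrative, and psycopg for example uses `%(name)s` placeholders instead):

```python
# ON CONFLICT upserts: if the primary key already exists, update in place.
# `excluded` refers to the row that would have been inserted.
UPSERT_SQL = """
INSERT INTO target_table (id, name, updated_at)
VALUES (:id, :name, :updated_at)
ON CONFLICT (id) DO UPDATE SET
    name = excluded.name,
    updated_at = excluded.updated_at
"""

def load_batch(pg_cursor, rows):
    # Re-running the same batch is safe: existing ids are updated,
    # not inserted again.
    pg_cursor.executemany(UPSERT_SQL, rows)
```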
This is idempotent: running the same load twice produces one row per key in the target table, not two. If a row already exists, it is updated with the latest values.
Performance Configuration
| Parameter | Default | Description |
|---|---|---|
| batch_size | 1,000 rows | Number of rows fetched from Redshift and inserted into PostgreSQL per batch |
| commit_per_batch | true | Each batch is committed as its own transaction. Failed batches roll back without affecting previously committed batches. |
| max_retries | 3 | Number of retry attempts for a failed batch before marking the sync job as failed |
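The defaults can be expressed as a simple mapping (the `SYNC_DEFAULTS` name is illustrative; in a real deployment these values would come from the sync job's configuration source):

```python
SYNC_DEFAULTS = {
    "batch_size": 1000,        # rows per fetch/insert batch
    "commit_per_batch": True,  # commit each batch as its own transaction
    "max_retries": 3,          # retry attempts before marking the job failed
}
```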
Error Handling
Errors during a batch do not abort the entire sync. The engine:
- Rolls back the failed batch transaction
- Logs the error with the batch offset
- Retries up to `max_retries` times with exponential backoff
- If all retries fail, records the failure in the `qupersync_state` table and advances to the next batch
- The watermark is only updated after a fully successful sync, so a partial failure will cause the failed rows to be re-processed on the next scheduled run
Idempotency Guarantee
Because the load phase uses `ON CONFLICT DO UPDATE`, re-running a sync after a partial failure is always safe. Previously synced rows are updated in place (a no-op if unchanged), and only the failed rows are newly inserted. There is no risk of duplicate data in the target PostgreSQL tables.