Data Synchronization

How files, profiles, and databases are synchronized across federated instances.

File Synchronization

Attachment Fetching

When receiving an action with file attachments:

Algorithm: Sync Attachments

Input: attachment_ids from remote action
Output: Result<()>

For each attachment_id:
1. Check if already exists locally
   - If exists: Skip to next (already synced)

2. Construct remote URL:
   https://cl-o.{issuer_id_tag}/api/file/{attachment_id}

3. Download file from remote instance

4. Verify content integrity:
   - Compute SHA256 hash of downloaded data
   - Compare hash with attachment_id
   - If mismatch: Return FileIntegrityCheckFailed error

5. Store file data in blob adapter

6. Extract and store metadata:
   - Read X-File-Metadata header (if present)
   - Parse as JSON
   - Store in metadata adapter

7. Continue to next attachment

This ensures:
- Content-addressed files (hash = ID)
- No duplicate downloads
- Cryptographic integrity verification
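The steps above can be sketched in Rust roughly as follows. This is a minimal illustration under stated assumptions, not the actual implementation: `LocalStore`, `Download`, `SyncError::DownloadFailed`, and the injected `fetch` and `hash` callbacks are hypothetical names. The hash is passed in as a callback because the standard library has no SHA256; a real build would plug in a SHA256 routine (for instance from the `sha2` crate) and whatever ID encoding the system uses.

```rust
use std::collections::HashMap;

/// `FileIntegrityCheckFailed` is named in the algorithm above;
/// `DownloadFailed` is an assumed catch-all for transport errors.
#[derive(Debug, PartialEq)]
pub enum SyncError {
    FileIntegrityCheckFailed { attachment_id: String },
    DownloadFailed(String),
}

/// Hypothetical in-memory stand-in for the blob and metadata adapters.
#[derive(Default)]
pub struct LocalStore {
    pub blobs: HashMap<String, Vec<u8>>,
    pub metadata: HashMap<String, String>,
}

/// Downloaded bytes plus the raw X-File-Metadata header value, if present.
pub struct Download {
    pub data: Vec<u8>,
    pub metadata_header: Option<String>,
}

/// Step 2: build the remote URL for one attachment.
pub fn attachment_url(issuer_id_tag: &str, attachment_id: &str) -> String {
    format!("https://cl-o.{}/api/file/{}", issuer_id_tag, attachment_id)
}

/// Sync every attachment, skipping those already stored locally.
/// `fetch` performs the download and `hash` maps bytes to a content ID;
/// both are injected so the sketch stays free of network and crypto crates.
pub fn sync_attachments<F, H>(
    store: &mut LocalStore,
    issuer_id_tag: &str,
    attachment_ids: &[String],
    mut fetch: F,
    hash: H,
) -> Result<(), SyncError>
where
    F: FnMut(&str) -> Result<Download, SyncError>,
    H: Fn(&[u8]) -> String,
{
    for id in attachment_ids {
        // Step 1: already synced, move on to the next attachment.
        if store.blobs.contains_key(id) {
            continue;
        }
        // Steps 2-3: construct the URL and download the file.
        let url = attachment_url(issuer_id_tag, id);
        let download = fetch(url.as_str())?;
        // Step 4: the attachment ID must equal the hash of the content.
        if hash(download.data.as_slice()) != *id {
            return Err(SyncError::FileIntegrityCheckFailed {
                attachment_id: id.clone(),
            });
        }
        // Steps 5-6: store the blob, then the metadata if the header was sent.
        store.blobs.insert(id.clone(), download.data);
        if let Some(meta) = download.metadata_header {
            store.metadata.insert(id.clone(), meta);
        }
    }
    Ok(())
}
```

Because the integrity check runs before anything is written, a mismatching download never reaches the blob store. The sketch stores the metadata header as a raw string; parsing it as JSON is left out.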

Lazy Loading

Files are fetched on-demand rather than proactively:

User views post with image attachment
  ↓
Check if image exists locally
  ↓
If not, fetch from remote instance
  ↓
Verify content hash
  ↓
Store locally
  ↓
Serve to user
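The on-demand flow above amounts to a fetch-on-miss lookup. A minimal sketch, assuming a hypothetical `LazyFileCache` type and a `fetch_verified` callback that downloads the file, checks its content hash, and returns `None` on any failure:

```rust
use std::collections::HashMap;

/// Hypothetical local file store that fills itself on first access.
pub struct LazyFileCache {
    local: HashMap<String, Vec<u8>>,
}

impl LazyFileCache {
    pub fn new() -> Self {
        Self { local: HashMap::new() }
    }

    /// Serve a file, contacting the remote instance only on a local miss.
    pub fn serve<F>(&mut self, file_id: &str, fetch_verified: F) -> Option<&[u8]>
    where
        F: FnOnce(&str) -> Option<Vec<u8>>,
    {
        if !self.local.contains_key(file_id) {
            // Not stored locally: fetch, verify, then store.
            let data = fetch_verified(file_id)?;
            self.local.insert(file_id.to_string(), data);
        }
        // Serve from local storage in both the hit and the miss case.
        self.local.get(file_id).map(|bytes| bytes.as_slice())
    }
}
```

The second request for the same file never reaches the callback, so a file is downloaded at most once per instance.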

Profile Synchronization

Remote Profile Caching

Cache remote profiles locally for performance:

Algorithm: Sync Profile with Caching

Input: id_tag (remote user identifier)
Output: Result<Profile>

1. Check local cache for profile:
   - If cached AND cache_age < 24 hours: Return cached profile
   - If not cached OR cache_age >= 24 hours: Continue to step 2

2. Fetch profile from remote instance:
   - GET https://cl-o.{id_tag}/api/me

3. Update local cache:
   - Store profile with current timestamp

4. Return profile

Benefits:
- Reduces network requests (24h TTL)
- Improves performance for repeated access
- Up to 24 hours of staleness is acceptable for user profiles
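The caching algorithm can be sketched as a small TTL cache. This is an illustration under assumptions: the `Profile` fields, the `ProfileCache` type, and the injected `fetch` callback are hypothetical, and the current time is passed in as a parameter so that expiry can be exercised without waiting.

```rust
use std::collections::HashMap;
use std::time::{Duration, Instant};

/// 24 hour TTL, as stated in the algorithm above.
const PROFILE_TTL: Duration = Duration::from_secs(24 * 60 * 60);

/// Assumed minimal profile shape; the real profile has more fields.
#[derive(Clone, Debug, PartialEq)]
pub struct Profile {
    pub id_tag: String,
    pub name: String,
}

/// Hypothetical cache mapping id_tag to (profile, time of last fetch).
pub struct ProfileCache {
    entries: HashMap<String, (Profile, Instant)>,
}

impl ProfileCache {
    pub fn new() -> Self {
        Self { entries: HashMap::new() }
    }

    /// Return a fresh cached profile, or pull it from the remote instance.
    pub fn get_or_fetch<F, E>(
        &mut self,
        id_tag: &str,
        now: Instant,
        fetch: F,
    ) -> Result<Profile, E>
    where
        F: FnOnce(&str) -> Result<Profile, E>,
    {
        // Step 1: serve from cache while the entry is younger than the TTL.
        if let Some((profile, fetched_at)) = self.entries.get(id_tag) {
            if now.duration_since(*fetched_at) < PROFILE_TTL {
                return Ok(profile.clone());
            }
        }
        // Step 2: missing or stale, so fetch from the remote instance.
        let url = format!("https://cl-o.{}/api/me", id_tag);
        let profile = fetch(url.as_str())?;
        // Step 3: store the profile with the current timestamp.
        self.entries
            .insert(id_tag.to_string(), (profile.clone(), now));
        // Step 4: return it.
        Ok(profile)
    }
}
```

If the fetch fails, the sketch returns the error and leaves any stale entry in place; whether to serve the stale profile instead is a policy choice the text above does not specify.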

Profile Updates

Instances do not push profile updates; other instances pull them when needed:

Need to display Alice's profile
  ↓
Check cache (last updated < 24h ago?)
  ↓
If fresh: use cache
If stale: fetch from Alice's instance
  ↓
Update cache
  ↓
Display profile

Database Federation

Read-Only Replication

An instance can subscribe to updates from a remote database:

FederatedDatabase Structure:

  • origin_instance: Source instance domain (e.g., alice.example.com)
  • local_replica: Whether to maintain local copy for fast access
  • sync_mode: Synchronization mode (see below)

SyncMode Enum:

  • ReadOnly: Subscribe to updates from remote, no local edits
  • ReadWrite: Bidirectional synchronization
  • Periodic(Duration): Full sync at a fixed interval given by the Duration (fallback for network issues)
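The two definitions above map naturally onto a Rust struct and enum. The field and variant names below come from the text; the concrete field types and the `full_sync_interval` helper are assumptions added for illustration.

```rust
use std::time::Duration;

/// Synchronization mode, mirroring the SyncMode enum described above.
#[derive(Clone, Debug, PartialEq)]
pub enum SyncMode {
    /// Subscribe to updates from the remote, no local edits.
    ReadOnly,
    /// Bidirectional synchronization.
    ReadWrite,
    /// Full sync at a fixed interval.
    Periodic(Duration),
}

/// A database replicated from another instance.
#[derive(Clone, Debug, PartialEq)]
pub struct FederatedDatabase {
    /// Source instance domain, e.g. "alice.example.com".
    pub origin_instance: String,
    /// Whether to maintain a local copy for fast access.
    pub local_replica: bool,
    pub sync_mode: SyncMode,
}

impl FederatedDatabase {
    /// Hypothetical helper: the full-sync interval, if the mode is periodic.
    pub fn full_sync_interval(&self) -> Option<Duration> {
        match &self.sync_mode {
            SyncMode::Periodic(interval) => Some(*interval),
            _ => None,
        }
    }
}
```

Carrying the interval inside the `Periodic` variant keeps it impossible to configure a periodic sync without one.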

Sync Protocol

Database updates are distributed through the action/inbox mechanism:

DatabaseSyncAction Structure:

  • db_file_id: SHA256 identifier of database file
  • updates: Binary update payload (Yrs CRDT or redb operations)
  • state_vector: Current state hash for conflict detection
  • timestamp: Unix timestamp of update creation

Database Update Distribution Algorithm:

For each subscriber instance:

  1. Create DatabaseSyncAction with:
    • Database file ID
    • Binary updates (from CRDT or redb)
    • Computed state vector
    • Current timestamp

  2. POST to subscriber’s inbox:
    • Endpoint: https://cl-o.{subscriber_id_tag}/api/inbox
    • Send DatabaseSyncAction as JSON

  3. Subscriber’s ActionVerifierTask processes:
    • Extracts binary updates
    • Applies to local replica
    • Merges with any local changes

This pattern allows:

  • Real-time database synchronization
  • Conflict resolution via CRDTs
  • Federation of collaborative databases
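The sending side of the distribution algorithm (steps 1 and 2) might look like the sketch below. The struct fields follow the DatabaseSyncAction structure above, but their Rust types are assumptions, and `post` is a hypothetical transport callback that would serialize the action to JSON and issue the HTTP POST. Applying the update on the subscriber (step 3) is not shown.

```rust
/// Mirrors the DatabaseSyncAction structure; field types are assumed.
#[derive(Clone, Debug, PartialEq)]
pub struct DatabaseSyncAction {
    /// SHA256 identifier of the database file.
    pub db_file_id: String,
    /// Binary update payload (Yrs CRDT or redb operations).
    pub updates: Vec<u8>,
    /// State value used for conflict detection.
    pub state_vector: Vec<u8>,
    /// Unix timestamp of update creation.
    pub timestamp: u64,
}

/// Inbox endpoint of one subscriber instance.
pub fn inbox_url(subscriber_id_tag: &str) -> String {
    format!("https://cl-o.{}/api/inbox", subscriber_id_tag)
}

/// Send one action to every subscriber's inbox.
/// Returns the id_tags of subscribers whose delivery failed, so the
/// caller can decide how to retry (an assumed policy, not from the text).
pub fn distribute_update<P>(
    action: &DatabaseSyncAction,
    subscribers: &[&str],
    mut post: P,
) -> Vec<String>
where
    P: FnMut(&str, &DatabaseSyncAction) -> Result<(), String>,
{
    let mut failed = Vec::new();
    for &subscriber in subscribers {
        let url = inbox_url(subscriber);
        // One failed delivery does not stop delivery to the others.
        if post(url.as_str(), action).is_err() {
            failed.push(subscriber.to_string());
        }
    }
    failed
}
```

Collecting failures instead of aborting on the first error keeps one unreachable subscriber from blocking the rest.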

See Also