Data Synchronization
How files, profiles, and databases are synchronized across federated instances.
File Synchronization
Attachment Fetching
When receiving an action with file attachments:
Algorithm: Sync Attachments
Input: attachment_ids from remote action
Output: Result<()>
For each attachment_id:
1. Check if already exists locally
- If exists: Skip to next (already synced)
2. Construct remote URL:
https://cl-o.{issuer_id_tag}/api/file/{attachment_id}
3. Download file from remote instance
4. Verify content integrity:
- Compute SHA256 hash of downloaded data
- Compare hash with attachment_id
- If mismatch: Return FileIntegrityCheckFailed error
5. Store file data in blob adapter
6. Extract and store metadata:
- Read X-File-Metadata header (if present)
- Parse as JSON
- Store in metadata adapter
7. Continue to next attachment
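The steps above can be sketched in Python as follows. This is an illustrative sketch, not the project's implementation: the `download` callable, the dict-based `blobs` and `metadata` stores (standing in for the blob and metadata adapters), and the assumption that `attachment_id` is the lowercase hex SHA-256 digest of the file are all hypothetical. The real ID encoding may differ.

```python
import hashlib
import json
from typing import Callable, Dict, Iterable, Optional, Tuple


class FileIntegrityCheckFailed(Exception):
    """Raised when downloaded bytes do not hash to the expected attachment ID."""


# Hypothetical downloader: URL -> (body bytes, X-File-Metadata header or None).
Downloader = Callable[[str], Tuple[bytes, Optional[str]]]


def sync_attachments(
    issuer_id_tag: str,
    attachment_ids: Iterable[str],
    download: Downloader,
    blobs: Dict[str, bytes],      # stand-in for the blob adapter
    metadata: Dict[str, dict],    # stand-in for the metadata adapter
) -> None:
    for attachment_id in attachment_ids:
        # Step 1: skip files that already exist locally.
        if attachment_id in blobs:
            continue
        # Step 2: construct the remote URL.
        url = f"https://cl-o.{issuer_id_tag}/api/file/{attachment_id}"
        # Step 3: download from the remote instance.
        data, meta_header = download(url)
        # Step 4: verify integrity (assumes the ID is a hex SHA-256 digest).
        digest = hashlib.sha256(data).hexdigest()
        if digest != attachment_id:
            raise FileIntegrityCheckFailed(
                f"expected {attachment_id}, got {digest}"
            )
        # Step 5: store the file data.
        blobs[attachment_id] = data
        # Step 6: parse and store metadata if the header was present.
        if meta_header is not None:
            metadata[attachment_id] = json.loads(meta_header)
```

Passing the downloader in as a parameter keeps the sketch independent of any particular HTTP client and makes the integrity check easy to exercise with fake data.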
This ensures:
- Content-addressed files (hash = ID)
- No duplicate downloads
- Cryptographic integrity verification
Lazy Loading
Files are fetched on-demand rather than proactively:
User views post with image attachment
↓
Check if image exists locally
↓
If not, fetch from remote instance
↓
Verify content hash
↓
Store locally
↓
Serve to user
Profile Synchronization
Remote Profile Caching
Cache remote profiles locally for performance:
Algorithm: Sync Profile with Caching
Input: id_tag (remote user identifier)
Output: Result<Profile>
1. Check local cache for profile:
- If cached AND cache_age < 24 hours: Return cached profile
- If not cached OR cache_age >= 24 hours: Continue to step 2
2. Fetch profile from remote instance:
- GET https://cl-o.{id_tag}/api/me
3. Update local cache:
- Store profile with current timestamp
4. Return profile
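A minimal Python sketch of this caching algorithm is shown below. The `fetch` callable, the dict-based cache, the `CachedProfile` wrapper, and the injectable `now` clock are assumptions made for illustration; only the 24-hour TTL and the `/api/me` URL come from the algorithm above.

```python
import time
from dataclasses import dataclass
from typing import Callable, Dict

CACHE_TTL_SECONDS = 24 * 60 * 60  # 24-hour TTL from the algorithm above


@dataclass
class CachedProfile:
    profile: dict
    fetched_at: float  # Unix timestamp of the last successful fetch


def sync_profile(
    id_tag: str,
    cache: Dict[str, CachedProfile],
    fetch: Callable[[str], dict],          # hypothetical HTTP GET returning parsed JSON
    now: Callable[[], float] = time.time,  # injectable clock, useful for tests
) -> dict:
    current = now()
    entry = cache.get(id_tag)
    # Step 1: return the cached profile while it is younger than the TTL.
    if entry is not None and current - entry.fetched_at < CACHE_TTL_SECONDS:
        return entry.profile
    # Step 2: cache miss or stale entry, so fetch from the remote instance.
    profile = fetch(f"https://cl-o.{id_tag}/api/me")
    # Step 3: store the profile with the current timestamp.
    cache[id_tag] = CachedProfile(profile=profile, fetched_at=current)
    # Step 4: return the fresh profile.
    return profile
```

A missing cache entry and a stale one take the same path here, which keeps the pull-based model simple: the remote instance is contacted only when the local copy cannot be used.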
Benefits:
- Reduces network requests (24h TTL)
- Improves performance for repeated access
- Staleness acceptable for user profiles
Profile Updates
Profiles don’t push updates; instances pull when needed:
Need to display Alice's profile
↓
Check cache (last updated < 24h?)
↓
If fresh: use cache
If stale: fetch from Alice's instance
↓
Update cache
↓
Display profile
Database Federation
Read-Only Replication
Subscribe to remote database updates:
FederatedDatabase Structure:
- origin_instance: Source instance domain (e.g., alice.example.com)
- local_replica: Whether to maintain local copy for fast access
- sync_mode: Synchronization mode (see below)
SyncMode Enum:
- ReadOnly: Subscribe to updates from remote, no local edits
- ReadWrite: Bidirectional synchronization
- Periodic(Duration): Full sync every N seconds (fallback for network issues)
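For illustration, the structure and enum above could be modelled in Python roughly as follows. The field and variant names are taken from the descriptions above; the dataclass layout, the `interval` field name, and the `full_sync_interval` helper are hypothetical and only show how a variant carrying a duration might be handled.

```python
from dataclasses import dataclass
from datetime import timedelta
from typing import Optional, Union


@dataclass(frozen=True)
class ReadOnly:
    """Subscribe to updates from remote, no local edits."""


@dataclass(frozen=True)
class ReadWrite:
    """Bidirectional synchronization."""


@dataclass(frozen=True)
class Periodic:
    """Full sync at a fixed interval (fallback for network issues)."""
    interval: timedelta


# One of the three variants described above.
SyncMode = Union[ReadOnly, ReadWrite, Periodic]


@dataclass
class FederatedDatabase:
    origin_instance: str  # e.g. "alice.example.com"
    local_replica: bool   # whether to maintain a local copy
    sync_mode: SyncMode


def full_sync_interval(db: FederatedDatabase) -> Optional[timedelta]:
    """Hypothetical helper: the full-sync interval, or None for other modes."""
    if isinstance(db.sync_mode, Periodic):
        return db.sync_mode.interval
    return None
```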
Sync Protocol
Using action/inbox mechanism:
DatabaseSyncAction Structure:
- db_file_id: SHA256 identifier of database file
- updates: Binary update payload (Yrs CRDT or redb operations)
- state_vector: Current state hash for conflict detection
- timestamp: Unix timestamp of update creation
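Because the action is sent as JSON while `updates` is binary, the payload needs some text-safe encoding. The sketch below assumes base64 for the binary field, a string-valued `state_vector`, and JSON keys that mirror the field names. None of these wire-format details are specified above, so treat them as placeholders.

```python
import base64
import json
from dataclasses import dataclass


@dataclass
class DatabaseSyncAction:
    db_file_id: str    # SHA256 identifier of the database file
    updates: bytes     # binary update payload
    state_vector: str  # state hash, assumed here to be a string
    timestamp: int     # Unix timestamp of update creation

    def to_json(self) -> str:
        # Assumption: binary updates are base64-encoded for JSON transport.
        return json.dumps({
            "db_file_id": self.db_file_id,
            "updates": base64.b64encode(self.updates).decode("ascii"),
            "state_vector": self.state_vector,
            "timestamp": self.timestamp,
        })

    @classmethod
    def from_json(cls, text: str) -> "DatabaseSyncAction":
        raw = json.loads(text)
        return cls(
            db_file_id=raw["db_file_id"],
            updates=base64.b64decode(raw["updates"]),
            state_vector=raw["state_vector"],
            timestamp=raw["timestamp"],
        )
```

A round trip through `to_json` and `from_json` returns an equal object, which is the property the receiving side relies on when it extracts the binary updates.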
Database Update Distribution Algorithm:
For each subscriber instance:
1. Create DatabaseSyncAction with:
- Database file ID
- Binary updates (from CRDT or redb)
- Computed state vector
- Current timestamp
2. POST to subscriber’s inbox:
- Endpoint: https://cl-o.{subscriber_id_tag}/api/inbox
- Send DatabaseSyncAction as JSON
3. Subscriber’s ActionVerifierTask processes:
- Extracts binary updates
- Applies to local replica
- Merges with any local changes
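The sender side of this loop might look like the Python sketch below. The `post` callable and the choice to record failures and continue with the remaining subscribers are assumptions for illustration; the source does not describe retry or error handling.

```python
from typing import Callable, Dict, Iterable


def inbox_url(subscriber_id_tag: str) -> str:
    """Build the inbox endpoint for a subscriber instance."""
    return f"https://cl-o.{subscriber_id_tag}/api/inbox"


def distribute_update(
    subscribers: Iterable[str],
    action_json: str,                  # serialized DatabaseSyncAction
    post: Callable[[str, str], None],  # hypothetical HTTP POST: (url, body)
) -> Dict[str, Exception]:
    """POST the action to every subscriber and return failures by id_tag."""
    failures: Dict[str, Exception] = {}
    for id_tag in subscribers:
        try:
            post(inbox_url(id_tag), action_json)
        except Exception as exc:
            # Assumption: one unreachable subscriber should not block the rest.
            failures[id_tag] = exc
    return failures
```

Returning the failures lets the caller decide how to recover, for example by relying on a Periodic full sync for subscribers that missed an update.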
This pattern allows:
- Real-time database synchronization
- Conflict resolution via CRDTs
- Federation of collaborative databases
See Also
- ProxyToken Authentication - Cross-instance authentication
- Key Verification - Signature verification
- Relationships - Connection management