redb Implementation
The CRDT adapter stores collaborative document updates persistently using redb, enabling conflict-free replicated data types to survive server restarts while maintaining real-time synchronization capabilities.
Architecture Overview
The CRDT adapter bridges between the Yrs CRDT engine (in-memory) and persistent storage, storing binary update streams that can be replayed to reconstruct document state.
Client 1 ─┐
          ├─► Yrs CRDT ─► Binary Updates ─► redb Storage
Client 2 ─┘      │                              │
                 │◄──────── Load on startup ────┘
                 └─► Broadcast ─► Subscribed Clients

Key Insight
CRDT systems work by accumulating operation updates rather than storing full document state. Each update is a binary-encoded operation (insert, delete, format, etc.) that can be:
- Applied to reconstruct current document state
- Sent to new subscribers for synchronization
- Replayed in any order (commutative property)
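For concreteness, here is a minimal sketch of how one such binary update might be produced with the yrs crate. The document field name and the edit are illustrative, and exact method signatures differ slightly between yrs versions:

```rust
use yrs::{Doc, Text, Transact};

// Illustrative only: make one local edit and capture it as a binary update.
fn produce_update() -> Vec<u8> {
    let doc = Doc::new();
    let text = doc.get_or_insert_text("content");
    let mut txn = doc.transact_mut();
    text.insert(&mut txn, 0, "hello");
    // Encode just the changes made in this transaction; this is the kind of
    // blob the adapter persists and broadcasts.
    txn.encode_update_v1()
}
```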
Storage Layout
The adapter uses three redb tables per database:
1. Updates Table (crdt_updates)
Stores binary CRDT update blobs indexed by document and sequence number.
Schema: (doc_id:seq) → update_bytes
Key Format: "{doc_id}:{sequence}"
Value: Binary CRDT update blob (from Yrs)
Example Keys:
"doc_abc123:0" First update
"doc_abc123:1" Second update
"doc_abc123:2" Third updateProperties:
- Sequential numbering ensures order preservation
- Each update is a self-contained binary blob
- Updates are append-only (immutable)
- Prefix scan retrieves all updates for a document
2. Metadata Table (crdt_metadata)
Stores document metadata as JSON.
Schema: doc_id → metadata_json
{
"created_at": 1738483200,
"updated_at": 1738486800,
"owner": "alice.example.com",
"permissions": {...},
"title": "Collaborative Document"
}

Purpose:
- Track document ownership and permissions
- Store human-readable metadata
- Enable document discovery and listing
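A small sketch of how this metadata might be modeled with serde; the field names mirror the example JSON above, and the permissions field is left as an open-ended JSON value since its structure is not specified here:

```rust
use serde::{Deserialize, Serialize};

// Hypothetical shape mirroring the metadata JSON shown above.
#[derive(Serialize, Deserialize)]
struct DocMetadata {
    created_at: u64, // unix seconds
    updated_at: u64,
    owner: String,
    permissions: serde_json::Value, // structure not specified on this page
    title: String,
}

fn to_json(meta: &DocMetadata) -> serde_json::Result<String> {
    // The resulting string is what gets stored under doc_id in crdt_metadata.
    serde_json::to_string(meta)
}
```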
3. Stats Table (crdt_stats)
Tracks update counts and storage metrics (currently defined but not actively used in core operations).
Schema: doc_id → stats_json
Potential metrics:
- Total update count
- Total bytes stored
- Last compaction timestamp
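A minimal sketch of how the three tables might be declared and created with redb; the constant and function names are illustrative, and only the table-name strings come from the layout above:

```rust
use redb::{Database, TableDefinition};

// String keys with binary values for updates; string keys with JSON strings otherwise.
const UPDATES: TableDefinition<&str, &[u8]> = TableDefinition::new("crdt_updates");
const METADATA: TableDefinition<&str, &str> = TableDefinition::new("crdt_metadata");
const STATS: TableDefinition<&str, &str> = TableDefinition::new("crdt_stats");

fn open_database(path: &str) -> Result<Database, redb::Error> {
    let db = Database::create(path)?; // creates the file if it does not exist
    // Open each table once inside a write transaction so they exist on disk.
    let txn = db.begin_write()?;
    {
        txn.open_table(UPDATES)?;
        txn.open_table(METADATA)?;
        txn.open_table(STATS)?;
    }
    txn.commit()?;
    Ok(db)
}
```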
Multi-Tenancy Storage Modes
The adapter supports two storage strategies configured at initialization:
Per-Tenant Files Mode (per_tenant_files=true)
Each tenant gets a dedicated redb file:
storage/
├── tn_1.db (Tenant 1 documents)
├── tn_2.db (Tenant 2 documents)
└── tn_3.db (Tenant 3 documents)

Advantages:
- ✅ Complete isolation between tenants
- ✅ Independent backups per tenant
- ✅ Easier to delete/archive specific tenants
- ✅ Better fault isolation
Trade-offs:
- ⚠️ More file handles required
- ⚠️ Slightly higher disk overhead
Use case: Multi-tenant SaaS deployments where tenant isolation is critical
Single File Mode (per_tenant_files=false)
All tenants share one database:
storage/
└── crdt.db (All tenants)

Advantages:
- ✅ Fewer file handles
- ✅ Simpler operational management
- ✅ Easier bulk operations
Trade-offs:
- ⚠️ No physical isolation between tenants
- ⚠️ Tenant deletion requires filtering
Use case: Single-user deployments or trusted environments
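A sketch of how the adapter might resolve the database file for a tenant under either mode; the function and parameter names are illustrative, and only per_tenant_files and the two directory layouts above come from this page:

```rust
use std::path::PathBuf;

// Hypothetical helper: pick the redb file for a tenant based on storage mode.
fn db_path(storage_dir: &str, tenant_id: &str, per_tenant_files: bool) -> PathBuf {
    let mut path = PathBuf::from(storage_dir);
    if per_tenant_files {
        // Per-tenant mode: one dedicated file per tenant, e.g. storage/tn_1.db
        path.push(format!("{tenant_id}.db"));
    } else {
        // Single-file mode: every tenant shares storage/crdt.db
        path.push("crdt.db");
    }
    path
}
```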
In-Memory Document Instances
The adapter caches document instances in memory to optimize performance and enable real-time subscriptions.
DocumentInstance Structure
struct DocumentInstance {
broadcaster: tokio::sync::broadcast::Sender<CrdtChangeEvent>,
last_accessed: AtomicU64, // For LRU eviction
update_count: AtomicU64, // Sequence counter
}

Each instance provides:
- Broadcast channel: Real-time notifications to subscribed clients
- Sequence counter: Monotonically increasing update numbers
- LRU tracking: Timestamp for idle document eviction
Caching Strategy
Cache population:
- Document accessed → Check cache (DashMap)
- If missing → Create instance with broadcast channel
- Store in cache with initial timestamp
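A sketch of that get-or-create path, reusing the DocumentInstance struct above and assuming a DashMap<String, Arc<DocumentInstance>> cache; the function name and the broadcast_capacity/now_secs parameters are illustrative:

```rust
use std::sync::Arc;
use std::sync::atomic::AtomicU64;
use dashmap::DashMap;
use tokio::sync::broadcast;

// Hypothetical cache-population path for a document instance.
fn get_or_create(
    cache: &DashMap<String, Arc<DocumentInstance>>,
    doc_id: &str,
    broadcast_capacity: usize,
    now_secs: u64,
) -> Arc<DocumentInstance> {
    let entry = cache.entry(doc_id.to_string()).or_insert_with(|| {
        // Cache miss: create a broadcast channel and a fresh instance.
        let (tx, _rx) = broadcast::channel(broadcast_capacity);
        Arc::new(DocumentInstance {
            broadcaster: tx,
            last_accessed: AtomicU64::new(now_secs),
            // For an existing document this counter would be restored from redb.
            update_count: AtomicU64::new(0),
        })
    });
    Arc::clone(entry.value())
}
```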
LRU eviction (configurable):
- max_instances: Maximum cached documents (default: 100)
- idle_timeout_secs: Evict after N seconds idle (default: 300)
- auto_evict: Enable/disable automatic eviction (default: true)
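And a sketch of the corresponding idle-eviction sweep, assuming last_accessed holds a unix timestamp in seconds; the function name is illustrative:

```rust
use std::sync::Arc;
use std::sync::atomic::Ordering;
use dashmap::DashMap;

// Hypothetical eviction sweep: drop instances idle for longer than the timeout.
fn evict_idle(
    cache: &DashMap<String, Arc<DocumentInstance>>,
    now_secs: u64,
    idle_timeout_secs: u64,
) {
    cache.retain(|_doc_id, instance| {
        let last = instance.last_accessed.load(Ordering::Relaxed);
        now_secs.saturating_sub(last) < idle_timeout_secs
    });
}
```

Because the cache holds Arc handles, eviction only removes the map entry; if a subscriber task still holds its own clone of the Arc, the instance and its broadcast channel stay alive until that reference is dropped.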
Benefits:
- Avoid reopening redb transactions repeatedly
- Enable efficient pub/sub without polling
- Reduce memory usage for inactive documents
Core Operations
Storing Updates
When a client modifies a CRDT document:
1. Client sends binary update → Server
2. Adapter fetches/creates DocumentInstance
3. Sequence number assigned (atomic increment)
4. Update stored: updates["{doc_id}:{seq}"] = update_bytes
5. Broadcast update to all subscribers
6. Return success

Atomicity: Each update is stored in a redb write transaction, ensuring crash consistency.
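A condensed sketch of steps 3–5, assuming the table definitions and DocumentInstance shown earlier; the store_update name and the event parameter are illustrative, and the key format matches make_update_key (shown next):

```rust
use std::sync::atomic::Ordering;

// Hypothetical write path: assign a sequence number, persist in one redb
// write transaction, then notify subscribers.
fn store_update(
    db: &redb::Database,
    instance: &DocumentInstance,
    doc_id: &str,
    update_bytes: &[u8],
    event: CrdtChangeEvent,
) -> Result<u64, redb::Error> {
    let seq = instance.update_count.fetch_add(1, Ordering::SeqCst); // step 3
    let key = format!("{doc_id}:{seq}");                            // same as make_update_key
    let txn = db.begin_write()?;
    {
        let mut updates = txn.open_table(UPDATES)?;
        updates.insert(key.as_str(), update_bytes)?;                // step 4
    }
    txn.commit()?; // durable once the transaction commits (crash consistency)
    // Step 5: best-effort broadcast; an error only means nobody is subscribed.
    let _ = instance.broadcaster.send(event);
    Ok(seq)
}
```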
Key generation:
fn make_update_key(doc_id: &str, seq: u64) -> String {
format!("{}:{}", doc_id, seq)
}

Loading Updates
When a client opens a document:
1. Prefix scan: updates.range("{doc_id}:"..)
2. Collect all matching keys (while prefix matches)
3. Read binary blobs in sequence order
4. Return Vec<CrdtUpdate>
5. Client applies updates to Yrs document → reconstructs state

Performance: Prefix scans are efficient in redb’s B-tree structure.
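A sketch of the server-side load (steps 1–4), assuming the UPDATES table definition from the earlier sketch; the function name is illustrative:

```rust
// Hypothetical load path: prefix-scan every stored update for one document.
fn load_updates(db: &redb::Database, doc_id: &str) -> Result<Vec<Vec<u8>>, redb::Error> {
    let prefix = format!("{doc_id}:");
    let txn = db.begin_read()?;
    let table = txn.open_table(UPDATES)?;
    let mut blobs = Vec::new();
    // Start scanning at the prefix and stop once keys no longer match it.
    for item in table.range(prefix.as_str()..)? {
        let (key, value) = item?;
        if !key.value().starts_with(&prefix) {
            break;
        }
        // Note: unpadded decimal sequence numbers scan in lexicographic order
        // ("…:10" before "…:2"); since CRDT updates are order-independent,
        // this is harmless for state reconstruction.
        blobs.push(value.value().to_vec());
    }
    Ok(blobs)
}
```

And a sketch of step 5 on the client side, using the yrs crate; apply_update's exact return type differs between yrs versions:

```rust
use yrs::{Doc, Transact, Update};
use yrs::updates::decoder::Decode;

// Rebuild the in-memory document by replaying every stored update.
fn rebuild(updates: &[Vec<u8>]) -> Doc {
    let doc = Doc::new();
    {
        let mut txn = doc.transact_mut();
        for bytes in updates {
            let update = Update::decode_v1(bytes).expect("valid CRDT update blob");
            txn.apply_update(update); // commutative: order does not matter
        }
    }
    doc
}
```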
Subscriptions
Clients can subscribe to document changes:
Without snapshot (new updates only):
let mut rx = instance.broadcaster.subscribe();
while let Ok(event) = rx.recv().await {
// Send update to client
}

With snapshot (full sync):
// 1. Send all existing updates (from redb)
for update in get_updates(doc_id).await? {
yield update;
}
// 2. Then stream new updates (from broadcaster)
let mut rx = instance.broadcaster.subscribe();
while let Ok(event) = rx.recv().await {
yield event;
}

Snapshot mode enables new clients to:
- Receive complete document history
- Reconstruct current state
- Continue receiving live updates
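One practical detail of tokio's broadcast channel: a slow subscriber that falls more than broadcast_capacity messages behind receives a Lagged error instead of the missed updates. A hedged sketch of a subscriber loop that handles this (the re-sync step is left as a comment):

```rust
use tokio::sync::broadcast::error::RecvError;

// Hypothetical subscriber loop tolerating lag and shutdown.
async fn follow(instance: &DocumentInstance) {
    let mut rx = instance.broadcaster.subscribe();
    loop {
        match rx.recv().await {
            Ok(_event) => {
                // Forward the update to the connected client.
            }
            Err(RecvError::Lagged(missed)) => {
                // `missed` events were dropped from the buffer; a robust client
                // would re-sync from redb (snapshot mode) before continuing.
                eprintln!("subscriber lagged by {missed} updates");
            }
            Err(RecvError::Closed) => break, // instance evicted or document deleted
        }
    }
}
```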
Deleting Documents
1. Begin write transaction
2. Prefix scan to find all updates: "{doc_id}:"
3. Delete each update key
4. Delete metadata entry
5. Commit transaction
6. Remove from instance cache

Note: There is currently no compaction strategy to reduce the update count (see Update Compaction below).
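A condensed sketch of steps 1–6, assuming the table definitions and instance cache from the earlier sketches; the function name is illustrative:

```rust
use std::sync::Arc;
use dashmap::DashMap;

// Hypothetical delete path: remove every update and the metadata entry in one
// write transaction, then drop the cached instance.
fn delete_document(
    db: &redb::Database,
    cache: &DashMap<String, Arc<DocumentInstance>>,
    doc_id: &str,
) -> Result<(), redb::Error> {
    let prefix = format!("{doc_id}:");
    let txn = db.begin_write()?;
    {
        let mut updates = txn.open_table(UPDATES)?;
        // Collect matching keys first, then delete (we can't remove while iterating).
        let keys: Vec<String> = updates
            .range(prefix.as_str()..)?
            .filter_map(|item| item.ok())
            .map(|(key, _value)| key.value().to_string())
            .take_while(|key| key.starts_with(&prefix))
            .collect();
        for key in keys {
            updates.remove(key.as_str())?;
        }
        let mut metadata = txn.open_table(METADATA)?;
        metadata.remove(doc_id)?;
    }
    txn.commit()?;
    cache.remove(doc_id);
    Ok(())
}
```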
Configuration
struct AdapterConfig {
max_instances: usize, // Default: 100
idle_timeout_secs: u64, // Default: 300 (5 minutes)
broadcast_capacity: usize, // Default: 1000 messages
auto_evict: bool, // Default: true
}

Tuning guidance:
- High traffic: Increase max_instances and broadcast_capacity
- Memory constrained: Reduce max_instances, lower idle_timeout_secs
- Long-running docs: Disable auto_evict or increase the idle timeout
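As a concrete illustration, a Default impl matching the values above plus one memory-constrained profile; the profile values are illustrative, not recommendations from the adapter itself:

```rust
impl Default for AdapterConfig {
    fn default() -> Self {
        Self {
            max_instances: 100,
            idle_timeout_secs: 300,
            broadcast_capacity: 1000,
            auto_evict: true,
        }
    }
}

// Illustrative memory-constrained profile: fewer cached docs, faster eviction.
fn memory_constrained() -> AdapterConfig {
    AdapterConfig {
        max_instances: 25,
        idle_timeout_secs: 60,
        ..AdapterConfig::default()
    }
}
```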
Update Compaction (Future Enhancement)
Currently, updates accumulate indefinitely. Potential future optimization:
Compaction strategy:
- When update count exceeds threshold (e.g., 1000)
- Load all updates and apply to Yrs document
- Encode current state as single snapshot update
- Replace all updates with snapshot
- Reset sequence counter
Benefits:
- Faster document loading
- Reduced storage usage
- Shorter sync times for new clients
Trade-off: Loses granular edit history
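If this were implemented with yrs, the fold-and-snapshot step might look like the sketch below; encode_state_as_update_v1 produces a single update equivalent to the whole document state, and exact signatures vary by yrs version:

```rust
use yrs::{Doc, ReadTxn, StateVector, Transact, Update};
use yrs::updates::decoder::Decode;

// Hypothetical compaction: replay all updates, then emit one snapshot update.
fn compact(updates: &[Vec<u8>]) -> Vec<u8> {
    let doc = Doc::new();
    {
        let mut txn = doc.transact_mut();
        for bytes in updates {
            txn.apply_update(Update::decode_v1(bytes).expect("valid update"));
        }
    }
    // Diffing against an empty state vector yields the full current state as
    // one blob; it would replace the per-edit updates (edit history is lost).
    doc.transact().encode_state_as_update_v1(&StateVector::default())
}
```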
Comparison with RTDB Storage
| Aspect | CRDT (crdt-adapter-redb) | RTDB (rtdb-adapter-redb) |
|---|---|---|
| Data model | Binary operation stream | Structured JSON documents |
| Storage | Sequential update blobs | Hierarchical key-value |
| Queries | None (replays all updates) | Filters, sorting, pagination |
| Concurrency | Automatic conflict resolution | Last-write-wins |
| Use case | Collaborative text editing | Structured data with queries |
| Sync protocol | Full update stream | Partial updates via queries |
Performance Characteristics
Write performance:
- Insert update: O(log N) (B-tree insert)
- Broadcast: O(M) where M = subscriber count
- Typical: <1ms for small updates
Read performance:
- Load document: O(K log N) where K = update count
- Subscription: O(1) after initial load
- Typical: 50-200ms for 1000 updates
Memory usage:
- Per instance: ~1KB overhead
- Broadcast buffer: broadcast_capacity × avg_update_size
- Total: ~100KB for 100 cached docs
See Also
- CRDT Overview - Introduction to CRDTs and Yrs
- RTDB Implementation - Comparison with query-based storage
- System Architecture - Overall architecture context