redb Implementation
The CRDT adapter stores collaborative document updates persistently using redb, enabling conflict-free replicated data types to survive server restarts while maintaining real-time synchronization capabilities.
Architecture Overview
The CRDT adapter bridges between the Yrs CRDT engine (in-memory) and persistent storage, storing binary update streams that can be replayed to reconstruct document state.
```
Client 1 ─┐
          ├─► Yrs CRDT ─► Binary Updates ─► redb Storage
Client 2 ─┘      │                              │
                 │◄──────── Load on startup ────┘
                 └─► Broadcast ─► Subscribed Clients
```
Key Insight
CRDT systems work by accumulating operation updates rather than storing full document state. Each update is a binary-encoded operation (insert, delete, format, etc.) that can be:
- Applied to reconstruct current document state
- Sent to new subscribers for synchronization
- Replayed in any order (commutative property)
Storage Layout
The adapter uses a single redb table per database:
Updates Table (crdt_updates_v2)
Stores binary CRDT update blobs using structured binary keys for efficient range scanning.
Schema: binary_key → update_bytes
Key Format (34 bytes total):
```
[version: 1 byte]    Protocol version (currently 1)
[doc_id: 24 bytes]   Fixed-length document ID (zero-padded)
[type: 1 byte]       Record type (0=update, 1=state_vector, 2=metadata)
[seq: 8 bytes BE]    Sequence number in big-endian (for proper sorting)
```
Value: Binary CRDT update blob (from Yrs)

Properties:
- Binary keys enable efficient range scans per document
- Big-endian sequence numbers ensure correct sort order in B-tree
- Fixed-length doc_id allows prefix-based document scanning
- Updates are append-only (immutable)
- Record type field reserves space for future state vector/metadata records
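The layout above can also be inverted, which is handy for debugging stored keys. A minimal sketch of a decoder, assuming the documented layout; the `decode_update_key` helper is illustrative, not part of the adapter's documented API:

```rust
/// Decode a 34-byte binary key back into (doc_id, record_type, seq).
/// Hypothetical helper mirroring the documented layout:
/// [version:1][doc_id:24][type:1][seq:8_BE]
fn decode_update_key(key: &[u8; 34]) -> Option<(String, u8, u64)> {
    if key[0] != 1 {
        return None; // unknown protocol version
    }
    // doc_id is zero-padded to 24 bytes; trim at the first zero byte
    let doc_end = key[1..25].iter().position(|&b| b == 0).map_or(25, |p| 1 + p);
    let doc_id = String::from_utf8(key[1..doc_end].to_vec()).ok()?;
    let record_type = key[25];
    let mut seq_bytes = [0u8; 8];
    seq_bytes.copy_from_slice(&key[26..34]);
    Some((doc_id, record_type, u64::from_be_bytes(seq_bytes)))
}
```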
The adapter uses only this single table. Document metadata (ownership, permissions) is managed by the MetaAdapter, not the CRDT storage layer. Statistics (update count, byte size) are computed dynamically from the stored updates via the CrdtAdapter::stats() default implementation.
Multi-Tenancy Storage Modes
The adapter supports two storage strategies configured at initialization:
Per-Tenant Files Mode (per_tenant_files=true)
Each tenant gets a dedicated redb file:
```
storage/
├── tn_{tn_id}.db   (e.g., tn_42.db for tenant 42)
├── tn_{tn_id}.db   (one file per tenant)
└── ...
```
Advantages:
- ✅ Complete isolation between tenants
- ✅ Independent backups per tenant
- ✅ Easier to delete/archive specific tenants
- ✅ Better fault isolation
Trade-offs:
- ⚠️ More file handles required
- ⚠️ Slightly higher disk overhead
Use case: Multi-tenant SaaS deployments where tenant isolation is critical
Single File Mode (per_tenant_files=false)
All tenants share one database:
```
storage/
└── crdt.db   (all tenants)
```
Advantages:
- ✅ Fewer file handles
- ✅ Simpler operational management
- ✅ Easier bulk operations
Trade-offs:
- ⚠️ No physical isolation between tenants
- ⚠️ Tenant deletion requires filtering
Use case: Single-user deployments or trusted environments
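The two modes differ only in how a tenant maps onto a file path. A sketch of the selection logic, assuming the file names shown above; the function name and signature are illustrative, not the adapter's actual API:

```rust
use std::path::PathBuf;

/// Illustrative path selection for the two storage modes.
fn db_path(storage_dir: &str, per_tenant_files: bool, tn_id: u64) -> PathBuf {
    let file = if per_tenant_files {
        format!("tn_{}.db", tn_id) // one redb file per tenant
    } else {
        "crdt.db".to_string() // all tenants share one database
    };
    PathBuf::from(storage_dir).join(file)
}
```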
In-Memory Document Instances
The adapter caches document instances in memory to optimize performance and enable real-time subscriptions.
DocumentInstance Structure
```rust
struct DocumentInstance {
    broadcaster: broadcast::Sender<CrdtChangeEvent>,
    last_accessed: AtomicU64, // Timestamp for LRU eviction
    update_count: AtomicU64,  // Sequence counter (initialized from DB max seq)
}
```
Each instance provides:
- Broadcast channel: Real-time notifications to subscribed clients
- Sequence counter: Monotonically increasing update numbers
- LRU tracking: Timestamp for idle document eviction
Caching Strategy
Cache population:
- Document accessed → Check cache (DashMap)
- If missing → Create instance with broadcast channel
- Store in cache with initial timestamp
LRU eviction (configurable):
- max_instances: Maximum cached documents (default: 100)
- idle_timeout_secs: Evict after N seconds idle (default: 300s)
- auto_evict: Enable/disable automatic eviction (default: true)
Benefits:
- Avoid reopening redb transactions repeatedly
- Enable efficient pub/sub without polling
- Reduce memory usage for inactive documents
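The idle-eviction check can be sketched against the last_accessed timestamp described above. A simplified model; the adapter's actual eviction loop and memory ordering may differ:

```rust
use std::sync::atomic::{AtomicU64, Ordering};

/// Simplified idle-eviction state for a cached document instance.
struct InstanceClock {
    last_accessed: AtomicU64, // seconds since some epoch
}

impl InstanceClock {
    /// Record an access, resetting the idle clock.
    fn touch(&self, now_secs: u64) {
        self.last_accessed.store(now_secs, Ordering::Relaxed);
    }

    /// True when the instance has been idle longer than the timeout.
    fn is_evictable(&self, now_secs: u64, idle_timeout_secs: u64) -> bool {
        now_secs.saturating_sub(self.last_accessed.load(Ordering::Relaxed)) > idle_timeout_secs
    }
}
```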
Core Operations
Storing Updates
When a client modifies a CRDT document:
1. Client sends binary update → Server
2. Adapter fetches/creates DocumentInstance
3. Sequence number assigned (atomic increment)
4. Update stored with binary key: updates[encode_update_key(doc_id, seq)] = update_bytes
5. Broadcast update to all subscribers
6. Return success

Atomicity: Each update is stored in a redb write transaction, ensuring crash consistency.
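Step 3's atomic increment can be modeled with the instance's update_count counter. A sketch only; the adapter's exact ordering semantics may differ:

```rust
use std::sync::atomic::{AtomicU64, Ordering};

/// Assign the next sequence number for a document's update stream.
/// The counter starts at the max sequence found in redb, so numbering
/// continues monotonically across server restarts.
fn next_seq(update_count: &AtomicU64) -> u64 {
    // fetch_add returns the previous value; adding 1 yields this update's seq
    update_count.fetch_add(1, Ordering::SeqCst) + 1
}
```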
Key generation:
```rust
fn encode_update_key(doc_id: &str, seq: u64) -> [u8; 34] {
    // [version:1][doc_id:24][type:1][seq:8_BE]
    let mut key = [0u8; 34];
    key[0] = 1; // version
    let doc_bytes = doc_id.as_bytes();
    let n = doc_bytes.len().min(24);
    key[1..1 + n].copy_from_slice(&doc_bytes[..n]);
    key[25] = 0; // record_type::UPDATE
    key[26..].copy_from_slice(&seq.to_be_bytes());
    key
}
```
Loading Updates
When a client opens a document:
1. Range scan: updates.range(make_doc_range(doc_id))
(scans from [version, doc_id, UPDATE, 0] to [version, doc_id, UPDATE, u64::MAX])
2. Read binary blobs in sequence order (big-endian keys ensure correct order)
3. Return Vec<CrdtUpdate>
4. Client applies updates to Yrs document → reconstructs state

Performance: Prefix scans are efficient in redb's B-tree structure.
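The range bounds in step 1 reuse the key layout with the sequence field set to its extremes. A sketch of `make_doc_range` under the documented 34-byte layout; the helper's exact signature is an assumption:

```rust
/// Build inclusive range bounds covering every update record of one document.
/// Keys follow the documented layout [version:1][doc_id:24][type:1][seq:8_BE].
fn make_doc_range(doc_id: &str) -> ([u8; 34], [u8; 34]) {
    let mut start = [0u8; 34];
    start[0] = 1; // protocol version
    let bytes = doc_id.as_bytes();
    let n = bytes.len().min(24);
    start[1..1 + n].copy_from_slice(&bytes[..n]); // zero-padded doc id
    start[25] = 0; // record_type::UPDATE
    // seq is already 0 in `start`; set it to u64::MAX in `end`
    let mut end = start;
    end[26..].copy_from_slice(&u64::MAX.to_be_bytes());
    (start, end)
}
```

Because the sequence bytes are big-endian, lexicographic key order in the B-tree matches numeric sequence order, so the scan yields updates in the order they were stored.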
Subscriptions
Clients can subscribe to document changes:
Without snapshot (new updates only):
```rust
let mut rx = instance.broadcaster.subscribe();
while let Ok(event) = rx.recv().await {
    // Send update to client
}
```
With snapshot (full sync):
```rust
// 1. Send all existing updates (from redb)
for update in get_updates(doc_id).await? {
    yield update;
}
// 2. Then stream new updates (from broadcaster)
let mut rx = instance.broadcaster.subscribe();
while let Ok(event) = rx.recv().await {
    yield event;
}
```
Snapshot mode enables new clients to:
- Receive complete document history
- Reconstruct current state
- Continue receiving live updates
Deleting Documents
1. Begin write transaction
2. Range scan to find all updates: make_doc_range(doc_id)
3. Collect keys, then delete each one
4. Commit transaction
5. Remove from instance cache

Note: Compaction (merging updates) is performed automatically by the WebSocket layer when the last client disconnects from a document (see CRDT Overview).
Configuration
```rust
struct AdapterConfig {
    max_instances: usize,      // Default: 100
    idle_timeout_secs: u64,    // Default: 300 (5 minutes)
    broadcast_capacity: usize, // Default: 1000 messages
    auto_evict: bool,          // Default: true
}
```
Tuning guidance:
- High traffic: Increase max_instances and broadcast_capacity
- Memory constrained: Reduce max_instances, lower idle_timeout_secs
- Long-running docs: Disable auto_evict or increase timeout
Update Compaction
The WebSocket layer automatically merges updates when the last client disconnects from a document:
- Wait 2-second grace period (in case a new client connects)
- Load all updates and apply to a temporary Y.Doc
- Encode current state as a single update
- If the merged update is smaller: replace all updates with the single merged one
- Sequence counter resets for the optimized document
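The replace-or-keep decision in the steps above reduces to a size comparison. A simplified sketch of that logic; the real merge uses Yrs to encode the document state as a single update:

```rust
/// Decide whether a merged update should replace the stored update stream.
/// Compaction only pays off when the single merged blob is smaller than
/// the sum of the individual update blobs it would replace.
fn should_compact(stored: &[Vec<u8>], merged: &[u8]) -> bool {
    let total: usize = stored.iter().map(|u| u.len()).sum();
    merged.len() < total
}
```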
Benefits:
- Faster document loading for subsequent connections
- Reduced storage usage
- Shorter sync times for new clients
Trade-off: Loses granular edit history
Comparison with RTDB Storage
| Aspect | CRDT (crdt-adapter-redb) | RTDB (rtdb-adapter-redb) |
|---|---|---|
| Data model | Binary operation stream | Structured JSON documents |
| Storage | Sequential update blobs | Hierarchical key-value |
| Queries | None (replays all updates) | Filters, sorting, pagination |
| Concurrency | Automatic conflict resolution | Last-write-wins |
| Use case | Collaborative text editing | Structured data with queries |
| Sync protocol | Full update stream | Partial updates via queries |
Performance Characteristics
Write performance:
- Insert update: O(log N) (B-tree insert)
- Broadcast: O(M) where M = subscriber count
- Typical: <1ms for small updates
Read performance:
- Load document: O(log N + K) where K = update count (one B-tree descent, then a sequential range scan)
- Subscription: O(1) after initial load
- Typical: 50-200ms for 1000 updates
Memory usage:
- Per instance: ~1KB overhead
- Broadcast buffer: broadcast_capacity × avg_update_size
- Total: ~100KB for 100 cached docs
See Also
- CRDT Overview - Introduction to CRDTs and Yrs
- RTDB Implementation - Comparison with query-based storage
- System Architecture - Overall architecture context