redb Implementation

The CRDT adapter stores collaborative document updates persistently using redb, enabling conflict-free replicated data types to survive server restarts while maintaining real-time synchronization capabilities.

Architecture Overview

The CRDT adapter bridges between the Yrs CRDT engine (in-memory) and persistent storage, storing binary update streams that can be replayed to reconstruct document state.

Client 1 ─┐
          ├─► Yrs CRDT ─► Binary Updates ─► redb Storage
Client 2 ─┘      │                              │
                 │◄──────── Load on startup ────┘
                 └─► Broadcast ─► Subscribed Clients

Key Insight

CRDT systems work by accumulating operation updates rather than storing full document state. Each update is a binary-encoded operation (insert, delete, format, etc.) that can be:

  • Applied to reconstruct current document state
  • Sent to new subscribers for synchronization
  • Replayed in any order (commutative property)
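The order-independence property can be illustrated with a toy grow-only set, a much simpler CRDT than the ones Yrs implements, but one with the same commutative merge behavior: applying the same operations in any order converges to the same state.

```rust
use std::collections::BTreeSet;

// Toy grow-only set CRDT: "apply" is just insertion, which commutes.
// This is only an illustration of the property, not Yrs's encoding.
fn apply(state: &mut BTreeSet<&'static str>, op: &'static str) {
    state.insert(op);
}

fn main() {
    let ops = ["insert:a", "insert:b", "insert:c"];
    let mut forward = BTreeSet::new();
    let mut reverse = BTreeSet::new();
    for op in ops {
        apply(&mut forward, op);
    }
    for op in ops.into_iter().rev() {
        apply(&mut reverse, op);
    }
    // Same state regardless of replay order.
    assert_eq!(forward, reverse);
}
```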

Storage Layout

The adapter uses a single redb table per database:

Updates Table (crdt_updates_v2)

Stores binary CRDT update blobs using structured binary keys for efficient range scanning.

Schema: binary_key → update_bytes

Key Format (34 bytes total):
  [version: 1 byte]       Protocol version (currently 1)
  [doc_id: 24 bytes]      Fixed-length document ID (zero-padded)
  [type: 1 byte]          Record type (0=update, 1=state_vector, 2=metadata)
  [seq: 8 bytes BE]       Sequence number in big-endian (for proper sorting)

Value: Binary CRDT update blob (from Yrs)

Properties:

  • Binary keys enable efficient range scans per document
  • Big-endian sequence numbers ensure correct sort order in B-tree
  • Fixed-length doc_id allows prefix-based document scanning
  • Updates are append-only (immutable)
  • Record type field reserves space for future state vector/metadata records

The adapter uses only this single table. Document metadata (ownership, permissions) is managed by the MetaAdapter, not the CRDT storage layer. Statistics (update count, byte size) are computed dynamically from the stored updates via the CrdtAdapter::stats() default implementation.
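The dynamic statistics computation can be sketched as a prefix scan over the key space. A `BTreeMap` stands in for the redb table here, and the `CrdtStats` field names are illustrative; the real adapter would run the same scan inside a redb read transaction.

```rust
use std::collections::BTreeMap;

// 34-byte key from the schema above: [version:1][doc_id:24][type:1][seq:8_BE].
fn encode_update_key(doc_id: &str, seq: u64) -> [u8; 34] {
    let mut key = [0u8; 34];
    key[0] = 1; // protocol version
    let b = doc_id.as_bytes();
    let len = b.len().min(24);
    key[1..1 + len].copy_from_slice(&b[..len]);
    key[25] = 0; // record type: update
    key[26..].copy_from_slice(&seq.to_be_bytes());
    key
}

// Stats derived on demand from stored updates (field names illustrative).
#[derive(Debug, PartialEq)]
struct CrdtStats {
    update_count: u64,
    total_bytes: u64,
}

// Scan every record sharing the [version][doc_id] prefix and accumulate.
fn compute_stats(table: &BTreeMap<Vec<u8>, Vec<u8>>, doc_id: &str) -> CrdtStats {
    let prefix = &encode_update_key(doc_id, 0)[..25];
    let mut stats = CrdtStats { update_count: 0, total_bytes: 0 };
    for (_key, value) in table
        .range(prefix.to_vec()..)
        .take_while(|(k, _)| k.starts_with(prefix))
    {
        stats.update_count += 1;
        stats.total_bytes += value.len() as u64;
    }
    stats
}

fn main() {
    let mut table = BTreeMap::new();
    table.insert(encode_update_key("doc-a", 0).to_vec(), vec![1, 2, 3]);
    table.insert(encode_update_key("doc-a", 1).to_vec(), vec![4, 5]);
    table.insert(encode_update_key("doc-b", 0).to_vec(), vec![9]);
    let stats = compute_stats(&table, "doc-a");
    assert_eq!(stats, CrdtStats { update_count: 2, total_bytes: 5 });
}
```

Because stats are derived rather than stored, they can never drift out of sync with the update stream.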

Multi-Tenancy Storage Modes

The adapter supports two storage strategies configured at initialization:

Per-Tenant Files Mode (per_tenant_files=true)

Each tenant gets a dedicated redb file:

storage/
├── tn_42.db       (tenant 42; file name pattern tn_{tn_id}.db)
├── tn_43.db       (tenant 43)
└── ...

Advantages:

  • ✅ Complete isolation between tenants
  • ✅ Independent backups per tenant
  • ✅ Easier to delete/archive specific tenants
  • ✅ Better fault isolation

Trade-offs:

  • ⚠️ More file handles required
  • ⚠️ Slightly higher disk overhead

Use case: Multi-tenant SaaS deployments where tenant isolation is critical

Single File Mode (per_tenant_files=false)

All tenants share one database:

storage/
└── crdt.db      (All tenants)

Advantages:

  • ✅ Fewer file handles
  • ✅ Simpler operational management
  • ✅ Easier bulk operations

Trade-offs:

  • ⚠️ No physical isolation between tenants
  • ⚠️ Tenant deletion requires filtering

Use case: Single-user deployments or trusted environments

In-Memory Document Instances

The adapter caches document instances in memory to optimize performance and enable real-time subscriptions.

DocumentInstance Structure

struct DocumentInstance {
    broadcaster: broadcast::Sender<CrdtChangeEvent>,
    last_accessed: AtomicU64,   // Timestamp for LRU eviction
    update_count: AtomicU64,    // Sequence counter (initialized from DB max seq)
}

Each instance provides:

  • Broadcast channel: Real-time notifications to subscribed clients
  • Sequence counter: Monotonically increasing update numbers
  • LRU tracking: Timestamp for idle document eviction

Caching Strategy

Cache population:

  1. Document accessed → Check cache (DashMap)
  2. If missing → Create instance with broadcast channel
  3. Store in cache with initial timestamp

LRU eviction (configurable):

  • max_instances: Maximum cached documents (default: 100)
  • idle_timeout_secs: Evict after N seconds idle (default: 300s)
  • auto_evict: Enable/disable automatic eviction (default: true)
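The eviction policy above can be sketched as a two-phase pass: drop idle instances first, then enforce the size cap by removing the least recently used entries. The `CachedDoc` struct and second-granularity timestamps are simplifications of the `DocumentInstance`/`AtomicU64` machinery described earlier.

```rust
use std::collections::HashMap;

// Simplified stand-in for a cached DocumentInstance.
struct CachedDoc {
    last_accessed: u64, // seconds since some epoch
}

// Phase 1: evict anything idle for idle_timeout_secs or longer.
// Phase 2: if still over max_instances, evict least recently used.
fn evict(
    cache: &mut HashMap<String, CachedDoc>,
    now: u64,
    idle_timeout_secs: u64,
    max_instances: usize,
) {
    cache.retain(|_, doc| now.saturating_sub(doc.last_accessed) < idle_timeout_secs);
    while cache.len() > max_instances {
        let lru = cache
            .iter()
            .min_by_key(|(_, d)| d.last_accessed)
            .map(|(k, _)| k.clone());
        match lru {
            Some(key) => {
                cache.remove(&key);
            }
            None => break,
        }
    }
}

fn main() {
    let mut cache = HashMap::new();
    cache.insert("a".to_string(), CachedDoc { last_accessed: 100 });
    cache.insert("b".to_string(), CachedDoc { last_accessed: 500 });
    cache.insert("c".to_string(), CachedDoc { last_accessed: 590 });
    // At now=600 with a 300s timeout, "a" is idle; with max_instances=1,
    // "b" is then evicted as least recently used, leaving only "c".
    evict(&mut cache, 600, 300, 1);
    assert!(cache.contains_key("c"));
    assert_eq!(cache.len(), 1);
}
```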

Benefits:

  • Avoid reopening redb transactions repeatedly
  • Enable efficient pub/sub without polling
  • Reduce memory usage for inactive documents

Core Operations

Storing Updates

When a client modifies a CRDT document:

1. Client sends binary update → Server
2. Adapter fetches/creates DocumentInstance
3. Sequence number assigned (atomic increment)
4. Update stored with binary key: updates[encode_update_key(doc_id, seq)] = update_bytes
5. Broadcast update to all subscribers
6. Return success
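Step 3 can be sketched with the `update_count` counter from the `DocumentInstance` struct shown earlier: an atomic fetch-and-add hands each concurrent writer a unique, monotonically increasing sequence number without locking.

```rust
use std::sync::atomic::{AtomicU64, Ordering};

// Trimmed-down instance holding only the sequence counter. In the real
// adapter it is initialized so the first fetch_add hands out the next
// sequence number not yet stored in redb.
struct DocumentInstance {
    update_count: AtomicU64,
}

impl DocumentInstance {
    // fetch_add returns the previous value, so each caller receives a
    // distinct sequence number even under concurrent writes.
    fn next_seq(&self) -> u64 {
        self.update_count.fetch_add(1, Ordering::SeqCst)
    }
}

fn main() {
    let inst = DocumentInstance { update_count: AtomicU64::new(0) };
    assert_eq!(inst.next_seq(), 0);
    assert_eq!(inst.next_seq(), 1);
    assert_eq!(inst.next_seq(), 2);
}
```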

Atomicity: Each update is stored in a redb write transaction, ensuring crash consistency.

Key generation:

fn encode_update_key(doc_id: &str, seq: u64) -> [u8; 34] {
    // Layout: [version:1][doc_id:24][type:1][seq:8_BE]
    let mut key = [0u8; 34];
    key[0] = 1; // protocol version
    // Copy the document ID into the fixed 24-byte slot; shorter IDs
    // stay zero-padded, longer IDs are truncated to 24 bytes.
    let doc_bytes = doc_id.as_bytes();
    let len = doc_bytes.len().min(24);
    key[1..1 + len].copy_from_slice(&doc_bytes[..len]);
    key[25] = 0; // record_type::UPDATE
    key[26..].copy_from_slice(&seq.to_be_bytes()); // big-endian for sort order
    key
}
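The inverse operation, useful when iterating range-scan results, can be sketched as follows. `decode_update_key` is a hypothetical helper, not part of the adapter's documented API; it splits a key back into its components and strips the zero padding from the fixed-width doc_id.

```rust
// Hypothetical inverse of encode_update_key. Returns
// (version, doc_id, record_type, seq).
fn decode_update_key(key: &[u8; 34]) -> (u8, String, u8, u64) {
    let version = key[0];
    // doc_id occupies bytes 1..25; find the last non-zero byte to trim
    // the zero padding (an all-zero slot decodes as an empty ID).
    let doc_end = key[1..25]
        .iter()
        .rposition(|&b| b != 0)
        .map_or(1, |i| i + 2);
    let doc_id = String::from_utf8_lossy(&key[1..doc_end]).into_owned();
    let record_type = key[25];
    let seq = u64::from_be_bytes(key[26..34].try_into().unwrap());
    (version, doc_id, record_type, seq)
}

fn main() {
    // Build a key by hand: version 1, doc "doc-a", type 0 (update), seq 7.
    let mut key = [0u8; 34];
    key[0] = 1;
    key[1..6].copy_from_slice(b"doc-a");
    key[26..].copy_from_slice(&7u64.to_be_bytes());
    let (version, doc_id, record_type, seq) = decode_update_key(&key);
    assert_eq!((version, doc_id.as_str(), record_type, seq), (1, "doc-a", 0, 7));
}
```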

Loading Updates

When a client opens a document:

1. Range scan: updates.range(make_doc_range(doc_id))
   (scans from [version, doc_id, UPDATE, 0] to [version, doc_id, UPDATE, u64::MAX])
2. Read binary blobs in sequence order (big-endian keys ensure correct order)
3. Return Vec<CrdtUpdate>
4. Client applies updates to Yrs document → reconstructs state

Performance: Prefix scans are efficient in redb’s B-tree structure.
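The range bounds from step 1 can be sketched as below. The exact signature of `make_doc_range` in the adapter may differ (redb's range API takes borrowed key types); the point is that start and end share the same 26-byte `[version][doc_id][type]` prefix and differ only in the sequence suffix.

```rust
// Bounds for scanning one document's updates: seq 0 through u64::MAX
// under the same [version][doc_id][UPDATE] prefix.
fn make_doc_range(doc_id: &str) -> ([u8; 34], [u8; 34]) {
    let mut start = [0u8; 34];
    start[0] = 1; // protocol version
    let b = doc_id.as_bytes();
    let len = b.len().min(24);
    start[1..1 + len].copy_from_slice(&b[..len]);
    start[25] = 0; // record type: update
    // start already ends in seq = 0; end uses seq = u64::MAX.
    let mut end = start;
    end[26..].copy_from_slice(&u64::MAX.to_be_bytes());
    (start, end)
}

fn main() {
    let (start, end) = make_doc_range("doc-a");
    // Identical prefix; only the 8-byte big-endian sequence differs.
    assert_eq!(start[..26], end[..26]);
    assert_eq!(&start[26..], &0u64.to_be_bytes());
    assert_eq!(&end[26..], &u64::MAX.to_be_bytes());
}
```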

Subscriptions

Clients can subscribe to document changes:

Without snapshot (new updates only):

let mut rx = instance.broadcaster.subscribe();
while let Ok(event) = rx.recv().await {
    // Send update to client
}

With snapshot (full sync):

// Illustrative sketch (e.g., inside an async_stream::stream! block,
// since plain Rust has no `yield`):

// 1. Send all existing updates (from redb)
for update in get_updates(doc_id).await? {
    yield update;
}

// 2. Then stream new updates (from broadcaster)
let mut rx = instance.broadcaster.subscribe();
while let Ok(event) = rx.recv().await {
    yield event;
}

Snapshot mode enables new clients to:

  1. Receive complete document history
  2. Reconstruct current state
  3. Continue receiving live updates

Deleting Documents

1. Begin write transaction
2. Range scan to find all updates: make_doc_range(doc_id)
3. Collect keys, then delete each one
4. Commit transaction
5. Remove from instance cache
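The collect-then-delete pattern in steps 2-3 can be sketched as below, again with a `BTreeMap` standing in for the redb table (the real adapter performs this inside a single write transaction, which is why the whole deletion commits or rolls back atomically). Keys are collected first because the map cannot be mutated while iterating a range over it.

```rust
use std::collections::BTreeMap;

// Build a 34-byte key [version:1][doc_id:24][type:1][seq:8_BE] for testing.
fn key(doc_id: &str, seq: u64) -> Vec<u8> {
    let mut k = vec![0u8; 34];
    k[0] = 1;
    let b = doc_id.as_bytes();
    let len = b.len().min(24);
    k[1..1 + len].copy_from_slice(&b[..len]);
    k[26..].copy_from_slice(&seq.to_be_bytes());
    k
}

// Delete every record for a document; returns how many were removed.
fn delete_document(table: &mut BTreeMap<Vec<u8>, Vec<u8>>, doc_id: &str) -> usize {
    // 25-byte prefix [version][doc_id] covers all the document's records.
    let prefix = key(doc_id, 0)[..25].to_vec();
    // Step 2-3: collect matching keys first, then remove each one.
    let keys: Vec<Vec<u8>> = table
        .range(prefix.clone()..)
        .take_while(|(k, _)| k.starts_with(&prefix))
        .map(|(k, _)| k.clone())
        .collect();
    for k in &keys {
        table.remove(k);
    }
    keys.len()
}

fn main() {
    let mut table = BTreeMap::new();
    table.insert(key("a", 0), vec![1]);
    table.insert(key("a", 1), vec![2]);
    table.insert(key("b", 0), vec![3]);
    assert_eq!(delete_document(&mut table, "a"), 2);
    assert_eq!(table.len(), 1); // only document "b" remains
}
```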

Note: Compaction (merging updates) is performed automatically by the WebSocket layer when the last client disconnects from a document (see CRDT Overview).

Configuration

struct AdapterConfig {
    max_instances: usize,        // Default: 100
    idle_timeout_secs: u64,      // Default: 300 (5 minutes)
    broadcast_capacity: usize,   // Default: 1000 messages
    auto_evict: bool,            // Default: true
}

Tuning guidance:

  • High traffic: Increase max_instances and broadcast_capacity
  • Memory constrained: Reduce max_instances, lower idle_timeout_secs
  • Long-running docs: Disable auto_evict or increase timeout

Update Compaction

The WebSocket layer automatically merges updates when the last client disconnects from a document:

  1. Wait for a 2-second grace period (in case a new client connects)
  2. Load all updates and apply to a temporary Y.Doc
  3. Encode current state as a single update
  4. If the merged update is smaller: replace all updates with the single merged one
  5. Sequence counter resets for the optimized document

Benefits:

  • Faster document loading for subsequent connections
  • Reduced storage usage
  • Shorter sync times for new clients

Trade-off: Loses granular edit history
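The size check in step 4 can be sketched as follows. The `merge` closure stands in for the real work of loading the updates into a temporary Y.Doc and re-encoding its state via Yrs; the sketch only shows the replace-if-smaller decision.

```rust
// Replace the update list with a single merged update only when the merged
// encoding is actually smaller than the sum of the originals.
fn compact(updates: Vec<Vec<u8>>, merge: impl Fn(&[Vec<u8>]) -> Vec<u8>) -> Vec<Vec<u8>> {
    let merged = merge(&updates);
    let total: usize = updates.iter().map(|u| u.len()).sum();
    if merged.len() < total {
        vec![merged] // sequence numbering restarts for the compacted doc
    } else {
        updates // merging would not save space; keep the originals
    }
}

fn main() {
    // Stand-in merge: pretend the whole state collapses to 2 bytes.
    let compacted = compact(vec![vec![1, 2, 3], vec![4, 5]], |_| vec![0, 1]);
    assert_eq!(compacted, vec![vec![0, 1]]);

    // If merging would grow the data, the originals are kept.
    let kept = compact(vec![vec![1]], |_| vec![0, 1, 2]);
    assert_eq!(kept, vec![vec![1]]);
}
```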

Comparison with RTDB Storage

Aspect          CRDT (crdt-adapter-redb)        RTDB (rtdb-adapter-redb)
──────────────────────────────────────────────────────────────────────────
Data model      Binary operation stream         Structured JSON documents
Storage         Sequential update blobs         Hierarchical key-value
Queries         None (replays all updates)      Filters, sorting, pagination
Concurrency     Automatic conflict resolution   Last-write-wins
Use case        Collaborative text editing      Structured data with queries
Sync protocol   Full update stream              Partial updates via queries

Performance Characteristics

Write performance:

  • Insert update: O(log N) (B-tree insert)
  • Broadcast: O(M) where M = subscriber count
  • Typical: <1ms for small updates

Read performance:

  • Load document: O(K log N) where K = update count
  • Subscription: O(1) after initial load
  • Typical: 50-200ms for 1000 updates

Memory usage:

  • Per instance: ~1KB overhead
  • Broadcast buffer: broadcast_capacity × avg_update_size
  • Total: ~100KB for 100 cached docs

See Also