redb Implementation

The CRDT adapter persists collaborative document updates in redb, so documents built on conflict-free replicated data types survive server restarts while still synchronizing in real time.

Architecture Overview

The CRDT adapter bridges between the Yrs CRDT engine (in-memory) and persistent storage, storing binary update streams that can be replayed to reconstruct document state.

Client 1 ─┐
          ├─► Yrs CRDT ─► Binary Updates ─► redb Storage
Client 2 ─┘      │                              │
                 │◄──────── Load on startup ────┘
                 └─► Broadcast ─► Subscribed Clients

Key Insight

CRDT systems work by accumulating operation updates rather than storing full document state. Each update is a binary-encoded operation (insert, delete, format, etc.) that can be:

  • Applied to reconstruct current document state
  • Sent to new subscribers for synchronization
  • Replayed in any order (commutative property)

Storage Layout

The adapter uses three redb tables per database:

1. Updates Table (crdt_updates)

Stores binary CRDT update blobs indexed by document and sequence number.

Schema: (doc_id:seq) → update_bytes

Key Format:    "{doc_id}:{sequence}"
Value:         Binary CRDT update blob (from Yrs)

Example Keys:
  "doc_abc123:0"      First update
  "doc_abc123:1"      Second update
  "doc_abc123:2"      Third update

Properties:

  • Sequential numbering ensures order preservation
  • Each update is a self-contained binary blob
  • Updates are append-only (immutable)
  • Prefix scan retrieves all updates for a document
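
A minimal sketch of declaring and appending to this table with redb (the table name comes from this section; the function name and error handling are illustrative):

use redb::{Database, TableDefinition};

// Key: "{doc_id}:{sequence}", value: opaque update blob produced by Yrs.
const UPDATES: TableDefinition<&str, &[u8]> = TableDefinition::new("crdt_updates");

fn append_update(db: &Database, doc_id: &str, seq: u64, update: &[u8]) -> Result<(), redb::Error> {
    let txn = db.begin_write()?;
    {
        let mut table = txn.open_table(UPDATES)?;
        // Append-only: every update gets a fresh key; existing keys are never rewritten.
        table.insert(format!("{doc_id}:{seq}").as_str(), update)?;
    }
    txn.commit()?;
    Ok(())
}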

2. Metadata Table (crdt_metadata)

Stores document metadata as JSON.

Schema: doc_id → metadata_json

{
  "created_at": 1738483200,
  "updated_at": 1738486800,
  "owner": "alice.example.com",
  "permissions": {...},
  "title": "Collaborative Document"
}

Purpose:

  • Track document ownership and permissions
  • Store human-readable metadata
  • Enable document discovery and listing
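
The metadata JSON above maps naturally onto a serde struct; a hypothetical sketch (field names follow the example, permissions left loosely typed):

use serde::{Deserialize, Serialize};

#[derive(Serialize, Deserialize)]
struct DocMetadata {
    created_at: u64,                 // Unix seconds
    updated_at: u64,
    owner: String,                   // e.g. "alice.example.com"
    permissions: serde_json::Value,  // shape not fixed by this section
    title: String,
}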

3. Stats Table (crdt_stats)

Tracks update counts and storage metrics (currently defined but not actively used in core operations).

Schema: doc_id → stats_json

Potential metrics:

  • Total update count
  • Total bytes stored
  • Last compaction timestamp

Multi-Tenancy Storage Modes

The adapter supports two storage strategies configured at initialization:

Per-Tenant Files Mode (per_tenant_files=true)

Each tenant gets a dedicated redb file:

storage/
├── tn_1.db      (Tenant 1 documents)
├── tn_2.db      (Tenant 2 documents)
└── tn_3.db      (Tenant 3 documents)

Advantages:

  • ✅ Complete isolation between tenants
  • ✅ Independent backups per tenant
  • ✅ Easier to delete/archive specific tenants
  • ✅ Better fault isolation

Trade-offs:

  • ⚠️ More file handles required
  • ⚠️ Slightly higher disk overhead

Use case: Multi-tenant SaaS deployments where tenant isolation is critical

Single File Mode (per_tenant_files=false)

All tenants share one database:

storage/
└── crdt.db      (All tenants)

Advantages:

  • ✅ Fewer file handles
  • ✅ Simpler operational management
  • ✅ Easier bulk operations

Trade-offs:

  • ⚠️ No physical isolation between tenants
  • ⚠️ Tenant deletion requires filtering

Use case: Single-user deployments or trusted environments
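
One way the two modes could be expressed when resolving a tenant's database path (illustrative only; the adapter's actual naming may differ):

use std::path::{Path, PathBuf};

// Map a tenant onto its redb file according to the configured storage mode.
fn database_path(storage_dir: &Path, per_tenant_files: bool, tenant_id: &str) -> PathBuf {
    if per_tenant_files {
        storage_dir.join(format!("{tenant_id}.db"))   // e.g. storage/tn_1.db
    } else {
        storage_dir.join("crdt.db")                   // one shared file for all tenants
    }
}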

In-Memory Document Instances

The adapter caches document instances in memory to optimize performance and enable real-time subscriptions.

DocumentInstance Structure

struct DocumentInstance {
    broadcaster: tokio::sync::broadcast::Sender<CrdtChangeEvent>,
    last_accessed: AtomicU64,  // For LRU eviction
    update_count: AtomicU64,   // Sequence counter
}

Each instance provides:

  • Broadcast channel: Real-time notifications to subscribed clients
  • Sequence counter: Monotonically increasing update numbers
  • LRU tracking: Timestamp for idle document eviction
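
Constructing an instance is little more than allocating the broadcast channel and zeroing the counters; a sketch based on the struct above (broadcast_capacity comes from the adapter configuration):

use std::sync::atomic::AtomicU64;
use tokio::sync::broadcast;

impl DocumentInstance {
    fn new(broadcast_capacity: usize, now: u64) -> Self {
        // The initial receiver is dropped; subscribers call broadcaster.subscribe() later.
        let (broadcaster, _rx) = broadcast::channel(broadcast_capacity);
        Self {
            broadcaster,
            last_accessed: AtomicU64::new(now),
            update_count: AtomicU64::new(0),
        }
    }
}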

Caching Strategy

Cache population:

  1. Document accessed → Check cache (DashMap)
  2. If missing → Create instance with broadcast channel
  3. Store in cache with initial timestamp

LRU eviction (configurable):

  • max_instances: Maximum cached documents (default: 100)
  • idle_timeout_secs: Evict after N seconds idle (default: 300s)
  • auto_evict: Enable/disable automatic eviction (default: true)

Benefits:

  • Avoid reopening redb transactions repeatedly
  • Enable efficient pub/sub without polling
  • Reduce memory usage for inactive documents
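
A sketch of the get-or-create path with DashMap (types from the snippets above; the helper name is hypothetical):

use std::sync::Arc;
use std::sync::atomic::Ordering;
use dashmap::DashMap;

fn get_or_create(
    cache: &DashMap<String, Arc<DocumentInstance>>,
    doc_id: &str,
    broadcast_capacity: usize,
    now: u64,
) -> Arc<DocumentInstance> {
    let instance = cache
        .entry(doc_id.to_owned())
        .or_insert_with(|| Arc::new(DocumentInstance::new(broadcast_capacity, now)))
        .value()
        .clone();
    // Refresh the LRU timestamp on every access so idle eviction sees recent activity.
    instance.last_accessed.store(now, Ordering::Relaxed);
    instance
}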

Core Operations

Storing Updates

When a client modifies a CRDT document:

1. Client sends binary update → Server
2. Adapter fetches/creates DocumentInstance
3. Sequence number assigned (atomic increment)
4. Update stored: updates["{doc_id}:{seq}"] = update_bytes
5. Broadcast update to all subscribers
6. Return success

Atomicity: Each update is stored in a redb write transaction, ensuring crash consistency.

Key generation:

fn make_update_key(doc_id: &str, seq: u64) -> String {
    format!("{}:{}", doc_id, seq)
}
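
Putting the numbered steps together (append_update is the sketch from the Updates Table section; the CrdtChangeEvent fields shown are assumptions):

use std::sync::atomic::Ordering;
use redb::Database;

// Steps 2-6 from the list above, for a single incoming update.
fn store_update(
    db: &Database,
    instance: &DocumentInstance,
    doc_id: &str,
    update: Vec<u8>,
) -> Result<u64, redb::Error> {
    // Atomically claim the next sequence number (step 3).
    let seq = instance.update_count.fetch_add(1, Ordering::SeqCst);

    // Persist inside a single write transaction (step 4).
    append_update(db, doc_id, seq, &update)?;

    // Fan out to live subscribers (step 5); an Err here only means nobody is listening.
    let _ = instance.broadcaster.send(CrdtChangeEvent { doc_id: doc_id.to_owned(), seq, update });

    Ok(seq)
}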

Loading Updates

When a client opens a document:

1. Prefix scan: updates.range("{doc_id}:"..)
2. Collect all matching keys (while prefix matches)
3. Read binary blobs in sequence order
4. Return Vec<CrdtUpdate>
5. Client applies updates to Yrs document → reconstructs state

Performance: Prefix scans are efficient in redb’s B-tree structure.
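
A sketch of that prefix scan with redb's range API (using the UPDATES table definition sketched earlier; iteration stops as soon as keys leave the "{doc_id}:" prefix):

use redb::{Database, ReadableTable};

fn load_updates(db: &Database, doc_id: &str) -> Result<Vec<Vec<u8>>, redb::Error> {
    let prefix = format!("{doc_id}:");
    let txn = db.begin_read()?;
    let table = txn.open_table(UPDATES)?;

    let mut out = Vec::new();
    // Keys are stored in sorted order, so start at the prefix and stop once it no longer matches.
    for entry in table.range(prefix.as_str()..)? {
        let (key, value) = entry?;
        if !key.value().starts_with(&prefix) {
            break;
        }
        out.push(value.value().to_vec());
    }
    Ok(out)
}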

Subscriptions

Clients can subscribe to document changes:

Without snapshot (new updates only):

let mut rx = instance.broadcaster.subscribe();
while let Ok(event) = rx.recv().await {
    // Send update to client
}

With snapshot (full sync):

// Pseudocode (e.g. inside an async_stream::stream! block, where yield is available):
// 1. Send all existing updates (from redb)
for update in get_updates(doc_id).await? {
    yield update;
}

// 2. Then stream new updates (from broadcaster)
let mut rx = instance.broadcaster.subscribe();
while let Ok(event) = rx.recv().await {
    yield event;
}

Snapshot mode enables new clients to:

  1. Receive complete document history
  2. Reconstruct current state
  3. Continue receiving live updates

Deleting Documents

1. Begin write transaction
2. Prefix scan to find all updates: "{doc_id}:"
3. Delete each update key
4. Delete metadata entry
5. Commit transaction
6. Remove from instance cache
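
A sketch of the deletion transaction (UPDATES as declared earlier; METADATA assumed to be declared the same way for crdt_metadata):

use redb::{Database, ReadableTable, TableDefinition};

const METADATA: TableDefinition<&str, &[u8]> = TableDefinition::new("crdt_metadata");

fn delete_document(db: &Database, doc_id: &str) -> Result<(), redb::Error> {
    let prefix = format!("{doc_id}:");
    let txn = db.begin_write()?;
    {
        let mut updates = txn.open_table(UPDATES)?;

        // Collect matching keys first; removing while iterating would keep the table borrowed.
        let mut keys = Vec::new();
        for entry in updates.range(prefix.as_str()..)? {
            let (key, _) = entry?;
            let key = key.value().to_owned();
            if !key.starts_with(&prefix) {
                break;
            }
            keys.push(key);
        }
        for key in &keys {
            updates.remove(key.as_str())?;
        }

        // Drop the document's metadata entry as well.
        txn.open_table(METADATA)?.remove(doc_id)?;
    }
    txn.commit()?;
    Ok(())
}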

Note: There is currently no compaction strategy to reduce the accumulated update count (see Update Compaction below).

Configuration

struct AdapterConfig {
    max_instances: usize,        // Default: 100
    idle_timeout_secs: u64,      // Default: 300 (5 minutes)
    broadcast_capacity: usize,   // Default: 1000 messages
    auto_evict: bool,            // Default: true
}

Tuning guidance:

  • High traffic: Increase max_instances and broadcast_capacity
  • Memory constrained: Reduce max_instances, lower idle_timeout_secs
  • Long-running docs: Disable auto_evict or increase timeout
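
For example, a high-traffic deployment might override the defaults like this (values illustrative):

let config = AdapterConfig {
    max_instances: 500,          // keep more hot documents cached
    idle_timeout_secs: 600,      // tolerate longer idle periods before eviction
    broadcast_capacity: 4096,    // absorb bursts without lagging subscribers
    auto_evict: true,
};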

Update Compaction (Future Enhancement)

Currently, updates accumulate indefinitely. Potential future optimization:

Compaction strategy:

  1. When update count exceeds threshold (e.g., 1000)
  2. Load all updates and apply to Yrs document
  3. Encode current state as single snapshot update
  4. Replace all updates with snapshot
  5. Reset sequence counter

Benefits:

  • Faster document loading
  • Reduced storage usage
  • Shorter sync times for new clients

Trade-off: Loses granular edit history
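
A hypothetical compaction pass could be built on Yrs itself (assuming a recent yrs release, where apply_update returns a Result; reading the existing blobs and rewriting them as the single snapshot is elided):

use yrs::{Doc, ReadTxn, StateVector, Transact, Update};
use yrs::updates::decoder::Decode;

// Fold the accumulated update stream into one equivalent snapshot blob.
fn compact(updates: Vec<Vec<u8>>) -> Result<Vec<u8>, Box<dyn std::error::Error>> {
    let doc = Doc::new();
    {
        let mut txn = doc.transact_mut();
        for bytes in &updates {
            // Replay every stored operation; CRDT updates commute, so order is not critical.
            txn.apply_update(Update::decode_v1(bytes)?)?;
        }
    }
    // Encode the merged state as a single update relative to the empty state vector.
    Ok(doc.transact().encode_state_as_update_v1(&StateVector::default()))
}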

Comparison with RTDB Storage

Aspect          CRDT (crdt-adapter-redb)         RTDB (rtdb-adapter-redb)
──────────────────────────────────────────────────────────────────────────
Data model      Binary operation stream          Structured JSON documents
Storage         Sequential update blobs          Hierarchical key-value
Queries         None (replays all updates)       Filters, sorting, pagination
Concurrency     Automatic conflict resolution    Last-write-wins
Use case        Collaborative text editing       Structured data with queries
Sync protocol   Full update stream               Partial updates via queries

Performance Characteristics

Write performance:

  • Insert update: O(log N) (B-tree insert)
  • Broadcast: O(M) where M = subscriber count
  • Typical: <1ms for small updates

Read performance:

  • Load document: O(K log N) where K = update count
  • Subscription: O(1) after initial load
  • Typical: 50-200ms for 1000 updates

Memory usage:

  • Per instance: ~1KB overhead
  • Broadcast buffer: broadcast_capacity × avg_update_size
  • Total: ~100KB for 100 cached docs

See Also