redb Implementation

The CRDT adapter persists collaborative document updates in redb, so documents built on conflict-free replicated data types survive server restarts while still synchronizing in real time.

Architecture Overview

The CRDT adapter bridges between the Yrs CRDT engine (in-memory) and persistent storage, storing binary update streams that can be replayed to reconstruct document state.

Client 1 ─┐
          ├─► Yrs CRDT ─► Binary Updates ─► redb Storage
Client 2 ─┘      │                              │
                 │◄──────── Load on startup ────┘
                 └─► Broadcast ─► Subscribed Clients

Key Insight

CRDT systems work by accumulating operation updates rather than storing full document state. Each update is a binary-encoded operation (insert, delete, format, etc.) that can be:

  • Applied to reconstruct current document state
  • Sent to new subscribers for synchronization
  • Replayed in any order (commutative property)

Storage Layout

The adapter uses three redb tables per database:

1. Updates Table (crdt_updates)

Stores binary CRDT update blobs indexed by document and sequence number.

Schema: (doc_id:seq) → update_bytes

Key Format:    "{doc_id}:{sequence}"
Value:         Binary CRDT update blob (from Yrs)

Example Keys:
  "doc_abc123:0"      First update
  "doc_abc123:1"      Second update
  "doc_abc123:2"      Third update

Properties:

  • Sequential numbering ensures order preservation
  • Each update is a self-contained binary blob
  • Updates are append-only (immutable)
  • Prefix scan retrieves all updates for a document
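
A minimal sketch of declaring and appending to this table with redb (the table name comes from this section; the function name and error handling are illustrative):

use redb::{Database, TableDefinition};

// Key: "{doc_id}:{sequence}", value: opaque update blob produced by Yrs.
const UPDATES: TableDefinition<&str, &[u8]> = TableDefinition::new("crdt_updates");

fn append_update(db: &Database, doc_id: &str, seq: u64, update: &[u8]) -> Result<(), redb::Error> {
    let txn = db.begin_write()?;
    {
        let mut table = txn.open_table(UPDATES)?;
        // Append-only: every update gets a fresh key; existing keys are never rewritten.
        table.insert(format!("{doc_id}:{seq}").as_str(), update)?;
    }
    txn.commit()?;
    Ok(())
}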

2. Metadata Table (crdt_metadata)

Stores document metadata as JSON.

Schema: doc_id → metadata_json

{
  "created_at": 1738483200,
  "updated_at": 1738486800,
  "owner": "alice.example.com",
  "permissions": {...},
  "title": "Collaborative Document"
}

Purpose:

  • Track document ownership and permissions
  • Store human-readable metadata
  • Enable document discovery and listing
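
The metadata JSON above maps naturally onto a serde struct; a hypothetical sketch (field names follow the example, permissions left loosely typed):

use serde::{Deserialize, Serialize};

#[derive(Serialize, Deserialize)]
struct DocMetadata {
    created_at: u64,                 // Unix seconds
    updated_at: u64,
    owner: String,                   // e.g. "alice.example.com"
    permissions: serde_json::Value,  // shape not fixed by this section
    title: String,
}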

3. Stats Table (crdt_stats)

Tracks update counts and storage metrics (currently defined but not actively used in core operations).

Schema: doc_id → stats_json

Potential metrics:

  • Total update count
  • Total bytes stored
  • Last compaction timestamp

Multi-Tenancy Storage Modes

The adapter supports two storage strategies configured at initialization:

Per-Tenant Files Mode (per_tenant_files=true)

Each tenant gets a dedicated redb file:

storage/
├── tn_1.db      (Tenant 1 documents)
├── tn_2.db      (Tenant 2 documents)
└── tn_3.db      (Tenant 3 documents)

Advantages:

  • ✅ Complete isolation between tenants
  • ✅ Independent backups per tenant
  • ✅ Easier to delete/archive specific tenants
  • ✅ Better fault isolation

Trade-offs:

  • ⚠️ More file handles required
  • ⚠️ Slightly higher disk overhead

Use case: Multi-tenant SaaS deployments where tenant isolation is critical

Single File Mode (per_tenant_files=false)

All tenants share one database:

storage/
└── crdt.db      (All tenants)

Advantages:

  • ✅ Fewer file handles
  • ✅ Simpler operational management
  • ✅ Easier bulk operations

Trade-offs:

  • ⚠️ No physical isolation between tenants
  • ⚠️ Tenant deletion requires filtering

Use case: Single-user deployments or trusted environments
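
One way the two modes could be expressed when resolving a tenant's database path (illustrative only; the adapter's actual naming may differ):

use std::path::{Path, PathBuf};

// Map a tenant onto its redb file according to the configured storage mode.
fn database_path(storage_dir: &Path, per_tenant_files: bool, tenant_id: &str) -> PathBuf {
    if per_tenant_files {
        storage_dir.join(format!("{tenant_id}.db"))   // e.g. storage/tn_1.db
    } else {
        storage_dir.join("crdt.db")                   // one shared file for all tenants
    }
}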

In-Memory Document Instances

The adapter caches document instances in memory to optimize performance and enable real-time subscriptions.

DocumentInstance Structure

struct DocumentInstance {
    broadcaster: tokio::sync::broadcast::Sender<CrdtChangeEvent>,
    last_accessed: AtomicU64,  // For LRU eviction
    update_count: AtomicU64,   // Sequence counter
}

Each instance provides:

  • Broadcast channel: Real-time notifications to subscribed clients
  • Sequence counter: Monotonically increasing update numbers
  • LRU tracking: Timestamp for idle document eviction
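
Constructing an instance is little more than allocating the broadcast channel and zeroing the counters; a sketch based on the struct above (broadcast_capacity comes from the adapter configuration):

use std::sync::atomic::AtomicU64;
use tokio::sync::broadcast;

impl DocumentInstance {
    fn new(broadcast_capacity: usize, now: u64) -> Self {
        // The initial receiver is dropped; subscribers call broadcaster.subscribe() later.
        let (broadcaster, _rx) = broadcast::channel(broadcast_capacity);
        Self {
            broadcaster,
            last_accessed: AtomicU64::new(now),
            update_count: AtomicU64::new(0),
        }
    }
}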

Caching Strategy

Cache population:

  1. Document accessed → Check cache (DashMap)
  2. If missing → Create instance with broadcast channel
  3. Store in cache with initial timestamp

LRU eviction (configurable):

  • max_instances: Maximum cached documents (default: 100)
  • idle_timeout_secs: Evict after N seconds idle (default: 300s)
  • auto_evict: Enable/disable automatic eviction (default: true)

Benefits:

  • Avoid reopening redb transactions repeatedly
  • Enable efficient pub/sub without polling
  • Reduce memory usage for inactive documents
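
A sketch of the get-or-create path with DashMap (types from the snippets above; the helper name is hypothetical):

use std::sync::Arc;
use std::sync::atomic::Ordering;
use dashmap::DashMap;

fn get_or_create(
    cache: &DashMap<String, Arc<DocumentInstance>>,
    doc_id: &str,
    broadcast_capacity: usize,
    now: u64,
) -> Arc<DocumentInstance> {
    let instance = cache
        .entry(doc_id.to_owned())
        .or_insert_with(|| Arc::new(DocumentInstance::new(broadcast_capacity, now)))
        .value()
        .clone();
    // Refresh the LRU timestamp on every access so idle eviction sees recent activity.
    instance.last_accessed.store(now, Ordering::Relaxed);
    instance
}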

Core Operations

Storing Updates

When a client modifies a CRDT document:

1. Client sends binary update → Server
2. Adapter fetches/creates DocumentInstance
3. Sequence number assigned (atomic increment)
4. Update stored: updates["{doc_id}:{seq}"] = update_bytes
5. Broadcast update to all subscribers
6. Return success

Atomicity: Each update is stored in a redb write transaction, ensuring crash consistency.

Key generation:

fn make_update_key(doc_id: &str, seq: u64) -> String {
    format!("{}:{}", doc_id, seq)
}
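
Putting the numbered steps together (append_update is the sketch from the Updates Table section; the CrdtChangeEvent fields shown are assumptions):

use std::sync::atomic::Ordering;
use redb::Database;

// Steps 2-6 from the list above, for a single incoming update.
fn store_update(
    db: &Database,
    instance: &DocumentInstance,
    doc_id: &str,
    update: Vec<u8>,
) -> Result<u64, redb::Error> {
    // Atomically claim the next sequence number (step 3).
    let seq = instance.update_count.fetch_add(1, Ordering::SeqCst);

    // Persist inside a single write transaction (step 4).
    append_update(db, doc_id, seq, &update)?;

    // Fan out to live subscribers (step 5); an Err here only means nobody is listening.
    let _ = instance.broadcaster.send(CrdtChangeEvent { doc_id: doc_id.to_owned(), seq, update });

    Ok(seq)
}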

Loading Updates

When a client opens a document:

1. Prefix scan: updates.range("{doc_id}:"..)
2. Collect all matching keys (while prefix matches)
3. Read binary blobs in sequence order
4. Return Vec<CrdtUpdate>
5. Client applies updates to Yrs document → reconstructs state

Performance: Prefix scans are efficient in redb’s B-tree structure.
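
A sketch of that prefix scan with redb's range API (using the UPDATES table definition sketched earlier; iteration stops as soon as keys leave the "{doc_id}:" prefix):

use redb::{Database, ReadableTable};

fn load_updates(db: &Database, doc_id: &str) -> Result<Vec<Vec<u8>>, redb::Error> {
    let prefix = format!("{doc_id}:");
    let txn = db.begin_read()?;
    let table = txn.open_table(UPDATES)?;

    let mut out = Vec::new();
    // Keys are stored in sorted order, so start at the prefix and stop once it no longer matches.
    for entry in table.range(prefix.as_str()..)? {
        let (key, value) = entry?;
        if !key.value().starts_with(&prefix) {
            break;
        }
        out.push(value.value().to_vec());
    }
    Ok(out)
}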

Subscriptions

Clients can subscribe to document changes:

Without snapshot (new updates only):

let mut rx = instance.broadcaster.subscribe();
while let Ok(event) = rx.recv().await {
    // Send update to client
}

With snapshot (full sync):

// Pseudocode (e.g. inside an async_stream::stream! block, where yield is available):
// 1. Send all existing updates (from redb)
for update in get_updates(doc_id).await? {
    yield update;
}

// 2. Then stream new updates (from broadcaster)
let mut rx = instance.broadcaster.subscribe();
while let Ok(event) = rx.recv().await {
    yield event;
}

Snapshot mode enables new clients to:

  1. Receive complete document history
  2. Reconstruct current state
  3. Continue receiving live updates

Deleting Documents

1. Begin write transaction
2. Prefix scan to find all updates: "{doc_id}:"
3. Delete each update key
4. Delete metadata entry
5. Commit transaction
6. Remove from instance cache
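
A sketch of the deletion transaction (UPDATES as declared earlier; METADATA assumed to be declared the same way for crdt_metadata):

use redb::{Database, ReadableTable, TableDefinition};

const METADATA: TableDefinition<&str, &[u8]> = TableDefinition::new("crdt_metadata");

fn delete_document(db: &Database, doc_id: &str) -> Result<(), redb::Error> {
    let prefix = format!("{doc_id}:");
    let txn = db.begin_write()?;
    {
        let mut updates = txn.open_table(UPDATES)?;

        // Collect matching keys first; removing while iterating would keep the table borrowed.
        let mut keys = Vec::new();
        for entry in updates.range(prefix.as_str()..)? {
            let (key, _) = entry?;
            let key = key.value().to_owned();
            if !key.starts_with(&prefix) {
                break;
            }
            keys.push(key);
        }
        for key in &keys {
            updates.remove(key.as_str())?;
        }

        // Drop the document's metadata entry as well.
        txn.open_table(METADATA)?.remove(doc_id)?;
    }
    txn.commit()?;
    Ok(())
}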

Note: There is currently no compaction strategy to reduce the accumulated update count (see Update Compaction below).

Configuration

struct AdapterConfig {
    max_instances: usize,        // Default: 100
    idle_timeout_secs: u64,      // Default: 300 (5 minutes)
    broadcast_capacity: usize,   // Default: 1000 messages
    auto_evict: bool,            // Default: true
}

Tuning guidance:

  • High traffic: Increase max_instances and broadcast_capacity
  • Memory constrained: Reduce max_instances, lower idle_timeout_secs
  • Long-running docs: Disable auto_evict or increase timeout
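
For example, a high-traffic deployment might override the defaults like this (values illustrative):

let config = AdapterConfig {
    max_instances: 500,          // keep more hot documents cached
    idle_timeout_secs: 600,      // tolerate longer idle periods before eviction
    broadcast_capacity: 4096,    // absorb bursts without lagging subscribers
    auto_evict: true,
};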

Update Compaction (Future Enhancement)

Currently, updates accumulate indefinitely. Potential future optimization:

Compaction strategy:

  1. When update count exceeds threshold (e.g., 1000)
  2. Load all updates and apply to Yrs document
  3. Encode current state as single snapshot update
  4. Replace all updates with snapshot
  5. Reset sequence counter

Benefits:

  • Faster document loading
  • Reduced storage usage
  • Shorter sync times for new clients

Trade-off: Loses granular edit history
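
A hypothetical compaction pass could be built on Yrs itself (assuming a recent yrs release, where apply_update returns a Result; reading the existing blobs and rewriting them as the single snapshot is elided):

use yrs::{Doc, ReadTxn, StateVector, Transact, Update};
use yrs::updates::decoder::Decode;

// Fold the accumulated update stream into one equivalent snapshot blob.
fn compact(updates: Vec<Vec<u8>>) -> Result<Vec<u8>, Box<dyn std::error::Error>> {
    let doc = Doc::new();
    {
        let mut txn = doc.transact_mut();
        for bytes in &updates {
            // Replay every stored operation; CRDT updates commute, so order is not critical.
            txn.apply_update(Update::decode_v1(bytes)?)?;
        }
    }
    // Encode the merged state as a single update relative to the empty state vector.
    Ok(doc.transact().encode_state_as_update_v1(&StateVector::default()))
}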

Comparison with RTDB Storage

Aspect          CRDT (crdt-adapter-redb)         RTDB (rtdb-adapter-redb)
──────────────────────────────────────────────────────────────────────────
Data model      Binary operation stream          Structured JSON documents
Storage         Sequential update blobs          Hierarchical key-value
Queries         None (replays all updates)       Filters, sorting, pagination
Concurrency     Automatic conflict resolution    Last-write-wins
Use case        Collaborative text editing       Structured data with queries
Sync protocol   Full update stream               Partial updates via queries

Performance Characteristics

Write performance:

  • Insert update: O(log N) (B-tree insert)
  • Broadcast: O(M) where M = subscriber count
  • Typical: <1ms for small updates

Read performance:

  • Load document: O(K log N) where K = update count
  • Subscription: O(1) after initial load
  • Typical: 50-200ms for 1000 updates

Memory usage:

  • Per instance: ~1KB overhead
  • Broadcast buffer: broadcast_capacity × avg_update_size
  • Total: ~100KB for 100 cached docs

See Also