Internals

How Yjs works under the hood—understanding the mechanics behind CRDT synchronization.

The CRDT Model

Simplified Overview

This page provides a practical understanding of Yjs internals. For definitive details, consult the Yjs documentation.

Yjs is primarily operation-based: it stores and transmits operations (inserts and deletes). It can also encode the full document state for snapshots and initial sync. This combination provides:

  • Efficient sync - Only missing operations are transmitted, as determined by comparing state vectors
  • Full state snapshots - New clients can receive the complete state without replaying individual operations
  • Compact updates - Ongoing changes are sent as small binary deltas

Items and the Item List

Internally, each shared type stores its data as a doubly linked list of “items”:

[item1] <-> [item2] <-> [item3] <-> [item4]

Each item contains:

  • ID - Unique (clientId, clock) pair
  • Content - The actual data
  • Origin - ID of the item directly to the left when this item was inserted
  • Right Origin - ID of the item directly to the right when this item was inserted
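
The item list described above can be modelled in a few lines. This is a simplified sketch, not Yjs's internal code: the `Item` shape, the `deleted` flag, and the helpers `insertBetween` and `render` are hypothetical names chosen for illustration.

```typescript
// Hypothetical model of an item; field names are illustrative, not Yjs internals.
interface ItemId {
  clientId: number
  clock: number
}

interface Item {
  id: ItemId
  content: string
  origin: ItemId | null      // left neighbour at insertion time
  rightOrigin: ItemId | null // right neighbour at insertion time
  left: Item | null          // current neighbours in the list
  right: Item | null
  deleted: boolean           // assumed tombstone flag
}

// Link a new item between two neighbours and record them as its origins.
function insertBetween (left: Item | null, right: Item | null, id: ItemId, content: string): Item {
  const item: Item = {
    id,
    content,
    origin: left ? left.id : null,
    rightOrigin: right ? right.id : null,
    left,
    right,
    deleted: false
  }
  if (left) left.right = item
  if (right) right.left = item
  return item
}

// Walk the list from its head and concatenate the content that is not deleted.
function render (head: Item | null): string {
  let out = ''
  for (let cur = head; cur !== null; cur = cur.right) {
    if (!cur.deleted) out += cur.content
  }
  return out
}

const itemH = insertBetween(null, null, { clientId: 1, clock: 0 }, 'H')
const itemBang = insertBetween(itemH, null, { clientId: 1, clock: 1 }, '!')
const itemI = insertBetween(itemH, itemBang, { clientId: 2, clock: 0 }, 'i')
```

Here `render(itemH)` walks the three linked items in order, and `itemI` remembers both neighbours it was inserted between, which is the information a CRDT needs to place the item consistently on other replicas.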

Client IDs and Clocks

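Each client (in practice, each Y.Doc instance) has a randomly generated client ID and a logical clock that advances with every operation it creates. Item IDs are (clientId, clock) pairs, so clients can mint identifiers without coordinating; global uniqueness relies on client IDs not colliding.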
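
A minimal sketch of coordination-free ID allocation follows. The `ClientClock` class and the client IDs 101 and 202 are hypothetical. Advancing the clock by the length of the inserted content (one clock value per element) reflects how Yjs is generally understood to assign clocks, but treat it as an assumption rather than a specification.

```typescript
type Id = { clientId: number, clock: number }

// Hypothetical helper: hands out (clientId, clock) pairs for one client.
class ClientClock {
  private clock = 0
  constructor (readonly clientId: number) {}

  // Reserve `length` consecutive clock values and return the first ID.
  next (length = 1): Id {
    const id = { clientId: this.clientId, clock: this.clock }
    this.clock += length
    return id
  }
}

const alice = new ClientClock(101)
const bob = new ClientClock(202)

const a0 = alice.next(3) // e.g. inserting "abc" reserves clocks 0, 1, 2
const a1 = alice.next()  // next free clock for alice is 3
const b0 = bob.next()    // clock 0 again, but under a different clientId
```

Two clients can produce the same clock value, but never the same pair, as long as their client IDs differ.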

Vector Clocks

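Yjs calls its vector clocks state vectors. A state vector records, for each client, how much of that client's history a document has already seen: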

{
  clientA: 15,  // Has A's operations with clocks 0-14; 15 is the next expected
  clientB: 8,   // Has B's operations with clocks 0-7; 8 is the next expected
}

When syncing, a peer sends its state vector, and the other side replies with only the operations that the vector does not cover.
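
The exchange can be sketched with two in-memory documents, using `Y.encodeStateVector`, `Y.encodeStateAsUpdate`, and `Y.applyUpdate` from the public Yjs API. The document names and the shared type name `'note'` are placeholders, and a real application would send the byte arrays over a network instead of passing them directly.

```typescript
import * as Y from 'yjs'

const docA = new Y.Doc()
const docB = new Y.Doc()

// Each replica edits independently while disconnected.
docA.getText('note').insert(0, 'hello')
docB.getText('note').insert(0, 'world')

// Step 1: each side describes what it already has.
const svA = Y.encodeStateVector(docA)
const svB = Y.encodeStateVector(docB)

// Step 2: each side encodes only what the other side is missing.
const diffForB = Y.encodeStateAsUpdate(docA, svB)
const diffForA = Y.encodeStateAsUpdate(docB, svA)

Y.applyUpdate(docB, diffForB)
Y.applyUpdate(docA, diffForA)

// Both replicas now hold the same text.
const inSync = docA.getText('note').toString() === docB.getText('note').toString()

// Once B is up to date, a further diff from A carries no new operations.
const emptyDiff = Y.encodeStateAsUpdate(docA, Y.encodeStateVector(docB))
```

After the exchange, `emptyDiff` is smaller than `diffForB` because there is nothing left to send.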

Conflict Resolution

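Y.Map: Last-writer-wins per key. When two clients set the same key concurrently, every replica keeps the same single value; the tie is broken deterministically using client IDs, not by comparing clocks or wall-clock time.

Y.Array: Concurrent insertions at the same position are both preserved. Their relative order is decided deterministically, with client IDs as the tie-breaker.

Y.Text: Character-level merging. Concurrent insertions both appear, ordered by position and then by client ID.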
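
The convergence behaviour can be observed with two replicas that edit concurrently and then exchange state. This is a small sketch using public Yjs calls; the replica names, keys, and values are made up for the example.

```typescript
import * as Y from 'yjs'

const replicaA = new Y.Doc()
const replicaB = new Y.Doc()

// Concurrent edits made while the replicas are disconnected.
replicaA.getMap('settings').set('theme', 'dark')
replicaB.getMap('settings').set('theme', 'light')
replicaA.getArray('todo').insert(0, ['from-a'])
replicaB.getArray('todo').insert(0, ['from-b'])

// Exchange full state in both directions.
const fromA = Y.encodeStateAsUpdate(replicaA)
const fromB = Y.encodeStateAsUpdate(replicaB)
Y.applyUpdate(replicaB, fromA)
Y.applyUpdate(replicaA, fromB)

// Map: one value survives, and both replicas agree on which one.
const themeA = replicaA.getMap('settings').get('theme')
const themeB = replicaB.getMap('settings').get('theme')

// Array: both insertions survive, in the same order on both replicas.
const todoA = replicaA.getArray('todo').toArray()
const todoB = replicaB.getArray('todo').toArray()
```

Which theme wins, and which todo entry comes first, depends on the randomly assigned client IDs, so it can differ between runs. What holds on every run is that both replicas reach the same result.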

The Update Format

Changes are encoded as compact binary:

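import * as Y from 'yjs'

const yDoc = new Y.Doc()
const stored: Uint8Array[] = []

yDoc.on('update', (update: Uint8Array) => {
  stored.push(update) // Send over network or store for persistence
})
yDoc.getText('body').insert(0, 'hi') // Triggers the handler above

const remoteDoc = new Y.Doc()
const merged = Y.mergeUpdates(stored) // Compact multiple updates into one
Y.applyUpdate(remoteDoc, merged)      // Apply received update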

Garbage Collection

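Deleted items become tombstones, which are needed to resolve concurrent operations that still reference them. With garbage collection enabled (the default), Yjs discards the content of deleted items and keeps only compact metadata about them. It does not track whether every client has seen a deletion, so the tombstone metadata itself is retained.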

GC Implications

Heavy editing accumulates tombstone metadata even with GC enabled. Very long-lived, heavily edited documents may grow larger than expected.
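
A rough way to see the effect is to compare encoded sizes with garbage collection on and off, using the `gc` option of `Y.Doc`. The helper name `sizeAfterInsertAndDelete` and the 10,000-character payload are arbitrary choices for this sketch, and exact byte counts will vary with the Yjs version.

```typescript
import * as Y from 'yjs'

// Insert a large string, delete all of it, and return the encoded document size.
function sizeAfterInsertAndDelete (gc: boolean): number {
  const doc = new Y.Doc({ gc })
  const text = doc.getText('body')
  text.insert(0, 'x'.repeat(10000))
  text.delete(0, text.length)
  return Y.encodeStateAsUpdate(doc).byteLength
}

const sizeWithGc = sizeAfterInsertAndDelete(true)
const sizeWithoutGc = sizeAfterInsertAndDelete(false)
```

With GC enabled the deleted content is dropped and only metadata about the deleted range remains, so `sizeWithGc` is much smaller than `sizeWithoutGc`. Neither is zero: the visible text is empty in both cases, but the tombstone still occupies space.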

Subdocuments

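Large documents can be split into subdocuments: separate Y.Doc instances embedded in a parent document and loaded on demand:

import * as Y from 'yjs'

const mainDoc = new Y.Doc()
// The guid identifies the subdocument across clients
const subDoc = new Y.Doc({ guid: 'chapter-1' })
mainDoc.getMap('subdocs').set('chapter1', subDoc)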

Performance Characteristics

The following are approximate complexities for typical use cases. Actual performance varies by implementation details, document structure, and operation history:

Operation               Approximate Complexity
Map get/set             O(1) average
Array push              O(1)
Array insert at index   O(n)
Text insert             O(log n) typical*
Sync (diff)             O(changes)

* Text insertion complexity depends on the document’s internal structure and edit history.

Space overhead: For typical documents, expect 2-10x the raw data size due to CRDT metadata and tombstones. Very long-lived, heavily-edited documents may accumulate more overhead. Documents with minimal edits will be closer to the lower bound.

See Also