Content-Addressing & Merkle Trees

Cloudillo implements a Merkle tree structure using content-addressed identifiers throughout its architecture. Every action, file, and data blob is identified by the cryptographic hash of its content, creating an immutable, verifiable chain of trust.

What is Content-Addressing?

Content-addressing means identifying data by what it is (its content) rather than where it is (its location). Instead of using arbitrary IDs or URLs, Cloudillo computes a cryptographic hash of the content itself and uses that hash as the identifier.

Benefits

✅ Immutable: Content cannot change without changing its identifier
✅ Tamper-Evident: Any modification is immediately detectable
✅ Deduplicatable: Identical content produces identical identifiers
✅ Verifiable: Anyone can recompute and verify hashes independently
✅ Cacheable: Content-addressed data can be cached indefinitely
✅ Trustless: No need to trust storage providers; verify the hash instead

Hash Function

Cloudillo uses SHA-256 for all content-addressing:

  • Algorithm: SHA-256 (256-bit Secure Hash Algorithm)
  • Encoding: Base64url without padding (URL-safe)
  • Output: 43-character base64-encoded string
  • Collision Resistance: roughly 128 bits (birthday bound on a 256-bit digest)

compute_hash(prefix, data):
    hash = SHA256(data)
    encoded = base64url_encode(hash)  // URL-safe, no padding
    return "{prefix}1~{encoded}"

// Example (the hash part after the prefix is 43 characters; values shown are truncated):
compute_hash("b", blob_bytes) → "b1~abc123def456..."
compute_hash("f", descriptor)  → "f1~Qo2E3G8TJZ..."
compute_hash("a", jwt_token)   → "a1~8kR3mN9pQ2vL..."
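The pseudocode above maps almost directly onto Python's standard library. The following is a minimal sketch, not Cloudillo's actual implementation; the function name mirrors the pseudocode and the sample input is made up for illustration.

```python
import base64
import hashlib


def compute_hash(prefix: str, data: bytes) -> str:
    """Return a version-1 content identifier such as 'b1~<hash>'."""
    digest = hashlib.sha256(data).digest()          # 32 raw bytes
    encoded = base64.urlsafe_b64encode(digest)      # 44 characters, ends with '='
    encoded = encoded.rstrip(b"=").decode("ascii")  # 43 characters, padding removed
    return f"{prefix}1~{encoded}"


blob_id = compute_hash("b", b"example blob bytes")  # illustrative input
print(blob_id)
print(len(blob_id))  # 3-character prefix + 43-character hash = 46
```

The same input always yields the same identifier, which is what makes deduplication and independent verification possible.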

Merkle Tree Structure

Cloudillo’s content-addressing creates a variable-depth Merkle tree where actions can reference other actions recursively. The example below shows a six-level hierarchy for a POST action with image attachments:

┌─────────────────────────────────────────────────────────┐
│ Level 6: Action ID (a1~8kR3mN9pQ2vL6xW...)              │
│   ↑ SHA-256 hash of ↓                                   │
├─────────────────────────────────────────────────────────┤
│ Level 5: Action Token (JWT)                             │
│   Header: {"alg":"ES384","typ":"JWT"}                   │
│   Payload: {                                            │
│     "iss": "alice.example.com",                         │
│     "t": "POST:IMG",                                    │
│     "c": "Amazing photo!",                              │
│     "a": ["f1~Qo2E3G8TJZ..."],                          │
│     "iat": 1738483100                                   │
│   }                                                     │
│   Signature: <ES384 signature>                          │
│   ↓ references ↓                                        │
├─────────────────────────────────────────────────────────┤
│ Level 4: File ID (f1~Qo2E3G8TJZ2HTGhVlrtTDBp...)        │
│   ↑ SHA-256 hash of ↓                                   │
├─────────────────────────────────────────────────────────┤
│ Level 3: File Descriptor (d1~...)                       │
│   "d1~tn:b1~abc:f=AVIF:s=4096:r=150x150,                │
│        sd:b1~def:f=AVIF:s=32768:r=640x480,              │
│        md:b1~ghi:f=AVIF:s=262144:r=1920x1080"           │
│   ↓ references ↓                                        │
├─────────────────────────────────────────────────────────┤
│ Level 2: Variant IDs                                    │
│   b1~abc123... (tn)                                     │
│   b1~def456... (sd)                                     │
│   b1~ghi789... (md)                                     │
│   ↑ SHA-256 hash of ↓                                   │
├─────────────────────────────────────────────────────────┤
│ Level 1: Blob Data (raw bytes)                          │
│   <AVIF encoded image data - tn variant>                │
│   <AVIF encoded image data - sd variant>                │
│   <AVIF encoded image data - md variant>                │
└─────────────────────────────────────────────────────────┘

Level 1: Blob Data

Raw bytes of actual content (images, videos, documents).

  • No identifier yet, just the binary data
  • This is what gets hashed to create variant IDs

Level 2: Variant IDs (b1~…)

Content-addressed blob identifiers computed as SHA256(blob_bytes).

Properties:

  • Identifies a single image/video variant
  • Changing even one byte changes the entire ID
  • Multiple actions can reference the same variant_id (deduplication)

Level 3: File Descriptor (d1~…)

Encoded string listing all available variants of a file.

Format:

d1~{variant}:{variant_id}:f={format}:s={size}:r={width}x{height},...

Example:

d1~tn:b1~abc123:f=AVIF:s=4096:r=150x150,sd:b1~def456:f=AVIF:s=32768:r=640x480

This descriptor says:

  • Thumbnail variant: b1~abc123, AVIF format, 4 KB (4096 bytes), 150×150 px
  • Standard variant: b1~def456, AVIF format, 32 KB (32768 bytes), 640×480 px
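Because base64url hashes never contain `:` or `,`, a descriptor in this format can be split with plain string operations. The sketch below is one way a parser might look; the `Variant` class and `parse_descriptor` name are illustrative, and it assumes every entry carries the `f`, `s`, and `r` fields shown in the format line.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Variant:
    name: str      # variant label, e.g. "tn" or "sd"
    blob_id: str   # content-addressed blob identifier, e.g. "b1~abc123"
    format: str    # value of the f= field
    size: int      # value of the s= field, in bytes
    width: int     # first half of the r= field
    height: int    # second half of the r= field


def parse_descriptor(descriptor: str) -> list[Variant]:
    if not descriptor.startswith("d1~"):
        raise ValueError("not a version-1 descriptor")
    variants = []
    for entry in descriptor[3:].split(","):
        name, blob_id, *fields = entry.strip().split(":")
        props = dict(field.split("=", 1) for field in fields)
        width, height = props["r"].split("x")
        variants.append(
            Variant(name, blob_id, props["f"], int(props["s"]), int(width), int(height))
        )
    return variants


example = "d1~tn:b1~abc123:f=AVIF:s=4096:r=150x150,sd:b1~def456:f=AVIF:s=32768:r=640x480"
for variant in parse_descriptor(example):
    print(variant)
```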

Level 4: File ID (f1~…)

Content-addressed file identifier computed as SHA256(descriptor_string).

Properties:

  • Identifies the complete file with all its variants
  • Changing any variant changes the descriptor, which changes the file_id
  • Used in action token attachments
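Levels 1 through 4 can be tied together in a few lines. This is a sketch under two assumptions that the text does not spell out: the file ID hashes the UTF-8 bytes of the full descriptor string (including the `d1~` prefix), and the variants are listed in the order given. The helper names and sample bytes are hypothetical.

```python
import base64
import hashlib


def content_id(prefix: str, data: bytes) -> str:
    digest = hashlib.sha256(data).digest()
    return f"{prefix}1~" + base64.urlsafe_b64encode(digest).rstrip(b"=").decode("ascii")


def build_file_id(variants):
    """variants: list of (name, blob_bytes, format, width, height) tuples."""
    entries = []
    for name, blob, fmt, width, height in variants:
        blob_id = content_id("b", blob)  # Level 2: variant ID from raw bytes
        entries.append(f"{name}:{blob_id}:f={fmt}:s={len(blob)}:r={width}x{height}")
    descriptor = "d1~" + ",".join(entries)  # Level 3: descriptor string
    # Assumption: the hash input is the UTF-8 encoding of the whole descriptor.
    file_id = content_id("f", descriptor.encode("utf-8"))  # Level 4: file ID
    return descriptor, file_id


original = [("tn", b"thumbnail-bytes", "AVIF", 150, 150),
            ("sd", b"standard-bytes", "AVIF", 640, 480)]
tampered = [("tn", b"thumbnail-bytez", "AVIF", 150, 150),
            ("sd", b"standard-bytes", "AVIF", 640, 480)]

_, original_file_id = build_file_id(original)
_, tampered_file_id = build_file_id(tampered)
print(original_file_id == tampered_file_id)  # False: one changed byte yields a new file_id
```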

Level 5: Action Token (JWT)

Cryptographically signed JSON Web Token representing a user action.

Structure:

{
  "header": {
    "alg": "ES384",
    "typ": "JWT"
  },
  "payload": {
    "iss": "alice.example.com",
    "k": "20250101",
    "t": "POST:IMG",
    "c": "Check out this amazing photo!",
    "a": ["f1~Qo2E3G8TJZ..."],
    "iat": 1738483100
  },
  "signature": "<ES384 signature with alice's private key>"
}

Encoding: {base64url(header)}.{base64url(payload)}.{base64url(signature)}

Level 6: Action ID (a1~…)

Content-addressed action identifier computed as SHA256(complete_jwt_token).

Properties:

  • Identifies the complete action immutably
  • Changing any field (even whitespace) changes the action_id
  • Used as parent references in replies/reactions
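The sketch below shows how an action ID could be derived from a compact token. It assumes the hash is taken over the ASCII bytes of the complete `header.payload.signature` string. The signature here is a placeholder of the right length (ES384 signatures are 96 bytes in JWS), not a real signature, since signing needs the author's private key and an ECDSA library.

```python
import base64
import hashlib
import json


def b64url(data: bytes) -> str:
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def encode_part(obj: dict) -> str:
    # Compact JSON, then base64url without padding, as JWTs require.
    return b64url(json.dumps(obj, separators=(",", ":")).encode("utf-8"))


def action_id(compact_jwt: str) -> str:
    # Assumption: the hash input is the complete compact token, byte for byte.
    return "a1~" + b64url(hashlib.sha256(compact_jwt.encode("ascii")).digest())


header = {"alg": "ES384", "typ": "JWT"}
payload = {
    "iss": "alice.example.com",
    "k": "20250101",
    "t": "POST:IMG",
    "c": "Check out this amazing photo!",
    "a": ["f1~Qo2E3G8TJZ..."],  # truncated in the source, kept as-is
    "iat": 1738483100,
}

placeholder_signature = b64url(bytes(96))  # NOT a valid signature, illustration only
token = ".".join([encode_part(header), encode_part(payload), placeholder_signature])
print(action_id(token))
```

Because the ID covers the signature as well as the payload, two tokens with identical payloads but different signatures would have different action IDs.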

Hash Versioning Scheme

All identifiers use a versioned prefix format for future-proofing:

{prefix}{version}~{base64_encoded_hash}

Current Prefixes (Version 1)

| Prefix | Resource Type | Hash Input | Example |
|--------|---------------|------------|---------|
| a1~ | Action | Entire JWT token | a1~8kR3mN9pQ2vL6xW... |
| f1~ | File | File descriptor string | f1~Qo2E3G8TJZ2HTGh... |
| d1~ | Descriptor | (not a hash, the encoded format itself) | d1~tn:b1~abc:f=AVIF:... |
| b1~ | Blob | Blob bytes (raw data) | b1~abc123def456ghi... |

Version Scheme

  • Version 1: SHA-256 with base64url encoding (no padding)
  • Future versions: Can upgrade to SHA-3, BLAKE3, etc.
  • Backward compatibility: Old content remains valid forever
  • Algorithm agility: Migrate to new algorithms without breaking existing references

Example upgrade path:

a1~...  (SHA-256)
a2~...  (SHA-3)
a3~...  (BLAKE3)
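A verifier can use the version digit to pick the hash algorithm before recomputing anything. The following sketch assumes a simple registry keyed by version number; only version 1 is defined by the text, and the commented-out entry is purely hypothetical. Descriptors (`d1~`) are not hashes, so they are deliberately rejected here.

```python
import hashlib
import re

# Version 1 is the only scheme defined so far. The commented entry shows where a
# later version could be registered; its number and algorithm are hypothetical.
HASHERS = {
    1: hashlib.sha256,
    # 2: hashlib.sha3_256,
}

# Prefix letter, version digits, '~', then base64url characters.
HASH_ID = re.compile(r"([abf])(\d+)~([A-Za-z0-9_-]+)")


def parse_hash_id(identifier: str):
    match = HASH_ID.fullmatch(identifier)
    if match is None:
        raise ValueError(f"not a hash identifier: {identifier!r}")
    prefix, version, encoded = match.group(1), int(match.group(2)), match.group(3)
    if version not in HASHERS:
        raise ValueError(f"unsupported hash version: {version}")
    return prefix, version, encoded


print(parse_hash_id("a1~" + "A" * 43))
```

Rejecting unknown versions outright, rather than guessing an algorithm, keeps old verifiers from silently accepting identifiers they cannot check.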

Merkle Tree Properties

Cloudillo’s content-addressing creates a Merkle tree with these properties:

Immutability

Once created, content cannot be modified without changing its identifier.

Example:

Original Post:
  content: "Hello World"
  action_id: a1~abc123...

Modified Post:
  content: "Hello Universe"
  action_id: a1~xyz789...  ← DIFFERENT ID!

Any attempt to modify content results in a completely new action with a different ID. The original remains unchanged.

Tamper-Evidence

Any modification anywhere in the tree is immediately detectable.

Example:

Post (a1~abc...)
  └─ Attachment: f1~Qo2...
      └─ Variants: b1~tn..., b1~sd..., b1~md...

If someone modifies the thumbnail image:
  ✗ New variant_id (b1~xyz...)
  ✗ Descriptor changes
  ✗ New file_id (f1~uvw...)
  ✗ Post attachment no longer matches
  ✗ Verification fails!

Deduplication

Identical content produces identical identifiers, enabling automatic deduplication.

Example:

Alice posts photo.jpg → file_id: f1~abc123...
Bob posts photo.jpg (same file) → file_id: f1~abc123... (same!)

Storage: Only one copy of the blob bytes needed
Bandwidth: Can skip downloading if already cached

Verifiability

Anyone can independently verify the entire chain.

Process:

  1. Download action token
  2. Verify JWT signature (proves author identity)
  3. Recompute action_id = SHA256(JWT)
  4. Compare with claimed action_id
  5. For each attachment:
    • Download file descriptor
    • Download all variant blobs
    • Verify each variant_id = SHA256(blob)
    • Recompute file_id = SHA256(descriptor)
    • Compare with attachment reference
  6. ✓ Complete verification

No trust required: Pure mathematics ensures integrity.
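Step 5 of the process can be sketched with the standard library alone. The example assumes the descriptor and blobs have already been downloaded, and that the file ID hashes the UTF-8 bytes of the descriptor string; `verify_attachment` is a hypothetical helper name. Signature verification (step 2) is left out because it needs an ECDSA library.

```python
import base64
import hashlib


def content_id(prefix: str, data: bytes) -> str:
    digest = hashlib.sha256(data).digest()
    return f"{prefix}1~" + base64.urlsafe_b64encode(digest).rstrip(b"=").decode("ascii")


def verify_attachment(file_id: str, descriptor: str, blobs: dict) -> bool:
    """blobs maps each variant_id to the bytes that were downloaded for it."""
    if not descriptor.startswith("d1~"):
        return False
    # The descriptor must hash to the file_id referenced by the action.
    if content_id("f", descriptor.encode("utf-8")) != file_id:
        return False
    for entry in descriptor[3:].split(","):
        fields = entry.strip().split(":")
        variant_id = fields[1]
        declared_size = int(dict(f.split("=", 1) for f in fields[2:])["s"])
        blob = blobs.get(variant_id)
        # Each blob must be present, have the declared size, and hash to its variant_id.
        if blob is None or len(blob) != declared_size:
            return False
        if content_id("b", blob) != variant_id:
            return False
    return True


thumb = b"thumbnail-bytes"  # illustrative blob content
thumb_id = content_id("b", thumb)
descriptor = f"d1~tn:{thumb_id}:f=AVIF:s={len(thumb)}:r=150x150"
file_id = content_id("f", descriptor.encode("utf-8"))

print(verify_attachment(file_id, descriptor, {thumb_id: thumb}))               # True
print(verify_attachment(file_id, descriptor, {thumb_id: b"tampered-bytes!"}))  # False
```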

Chain of Trust

Parent references create an immutable chain.

Example:

Post (a1~abc...)
  ↑ parent_id
Comment (a1~def...)
  ↑ parent_id
Reply (a1~ghi...)
  • Reply references Comment by content hash
  • Comment references Post by content hash
  • Cannot modify Post without breaking Comment reference
  • Cannot modify Comment without breaking Reply reference
  • Entire thread is cryptographically bound together

Proof of Authenticity

Cloudillo provides two complementary layers of proof:

Layer 1: Author Identity (Cryptographic Signatures)

Action tokens are signed with ES384 (ECDSA with P-384 curve and SHA-384 hash):

Action Token = Header.Payload.Signature

Signature = ECDSA_sign(SHA384(Header.Payload), alice_private_key)

Verification:

  1. Fetch Alice’s public key from https://cl-o.alice.example.com/api/me/keys
  2. Verify signature using public key
  3. ✓ Proves Alice created this action

Layer 2: Content Integrity (Content Hashes)

All identifiers are SHA-256 hashes:

action_id = SHA256(entire JWT token)
file_id = SHA256(descriptor string)
variant_id = SHA256(blob bytes)

Verification:

  1. Download content
  2. Recompute hash
  3. Compare with claimed identifier
  4. ✓ Proves content hasn’t been tampered with

Combined Proof

Author + Content = Complete Authenticity

For a post with image attachment:

  1. ✓ Signature proves Alice authored the post
  2. ✓ Action hash proves post content is intact
  3. ✓ File hash proves descriptor is intact
  4. ✓ Variant hashes prove image bytes are intact
  5. ✓ Complete chain of authenticity established

No trusted intermediaries needed: the proof is purely cryptographic.

Verification Example

Scenario: Verify a LIKE action on a POST with image attachment.

verify_like_action(like_id):
    // 1. Verify LIKE action
    like_token = fetch_action_token(like_id)
    verify_signature(like_token, bob_public_key)
    verify like_id == SHA256(like_token)
    // ✓ Bob authored this LIKE, content is intact

    // 2. Verify parent POST action
    post_id = like_token.parent_id
    post_token = fetch_action_token(post_id)
    verify_signature(post_token, alice_public_key)
    verify post_id == SHA256(post_token)
    // ✓ Alice authored this POST, content is intact

    // 3. Verify file attachment
    file_id = post_token.attachments[0]
    descriptor = fetch_descriptor(file_id)
    verify file_id == SHA256(descriptor)
    // ✓ File descriptor is intact

    // 4. Verify each variant blob
    variants = parse_descriptor(descriptor)
    for each variant:
        blob_data = fetch_blob(variant.blob_id)
        verify variant.blob_id == SHA256(blob_data)
        verify blob_data.size == variant.size
    // ✓ All image variants are intact

    // RESULT: Complete chain verified!

What this proves:

  • Bob signed the LIKE (authentication)
  • Alice signed the POST (authentication)
  • No content was tampered with at any level (integrity)
  • The LIKE references this specific POST (linkage)
  • The POST references this specific image file (linkage)
  • All image bytes are authentic (end-to-end verification)

DAG Structure

Cloudillo’s action system forms a Directed Acyclic Graph (DAG) with these properties:

Multiple Roots

Unlike a traditional tree with one root, Cloudillo has multiple independent threads:

Post 1 (a1~abc...)                Post 2 (a1~xyz...)
│                                 │
├─ Comment 1.1 (a1~def...)        └─ Comment 2.1 (a1~uvw...)
│  │                                 │
│  ├─ Reply 1.1.1 (a1~ghi...)        └─ Like 2.1 (a1~rst...)
│  │                                    (parent: a1~uvw...)
│  └─ Reply 1.1.2 (a1~jkl...)
│
├─ Comment 1.2 (a1~mno...)
│
└─ Like 1 (a1~pqr...)
   (parent: a1~abc...)

Each top-level post is a root node. Comments and reactions form child nodes.

Shared Attachments

Multiple actions can reference the same file:

Post 1 (a1~abc...)
  └─ Attachment: f1~photo123

Post 2 (a1~xyz...)
  └─ Attachment: f1~photo123  ← SAME FILE!

Repost (a1~uvw...)
  └─ Attachment: f1~photo123  ← SAME FILE!

Benefits:

  • Storage efficiency: Only one copy of blob data needed
  • Bandwidth efficiency: Download once, use everywhere
  • Consistency: Everyone sees exactly the same image
  • Verification: Single verification proves authenticity for all uses

Acyclic Property

The graph has no cycles (no circular references):

✓ Valid:
  Post → Comment → Reply (linear chain)
  Post → Comment1, Post → Comment2 (branching)

✗ Invalid:
  Post → Comment → Reply → Post (cycle!)

Cycles are prevented because:

  • Parent references use content hashes
  • Cannot reference an action that doesn’t exist yet
  • Cannot create action hash without knowing full content
  • A cycle would require an action to contain the hash of content that in turn depends on its own hash, which is computationally infeasible with SHA-256

Efficient Traversal

Forward traversal (find children):

-- Find all comments on a post
SELECT * FROM actions
WHERE parent_id = 'a1~abc123...'
AND type LIKE 'CMNT%';

Backward traversal (find parents):

find_root(action_id):
    action = fetch_action(action_id)
    if action.parent_id exists:
        return find_root(action.parent_id)  // Recursive
    else:
        return action  // Found root

Root ID optimization: To avoid repeated traversal, the root_id is computed once and stored in the database (see Actions: Root ID Handling).
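The recursive lookup can also be written iteratively, which avoids deep call stacks on long threads. This sketch assumes actions are available in a dictionary keyed by action ID with an optional `parent_id` entry; the short IDs are illustrative. The cycle guard is purely defensive, for stores that contain corrupt rows.

```python
def find_root(action_id: str, actions: dict) -> str:
    """Follow parent_id links upward and return the ID of the root action."""
    seen = set()
    current = action_id
    while True:
        if current in seen:
            # Should not occur with valid content hashes; guards against bad data.
            raise ValueError(f"cycle detected at {current}")
        seen.add(current)
        parent = actions[current].get("parent_id")
        if parent is None:
            return current  # no parent: this is the root
        current = parent


actions = {
    "a1~post": {},
    "a1~comment": {"parent_id": "a1~post"},
    "a1~reply": {"parent_id": "a1~comment"},
}
print(find_root("a1~reply", actions))  # a1~post
```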

Performance Implications

Caching Strategy

Content-addressed data is immutable, enabling aggressive caching:

GET /api/files/f1~Qo2E3G8TJZ...

Response Headers:
  Cache-Control: public, max-age=31536000, immutable
  Content-Type: image/avif

Benefits:

  • Browsers can cache for the full max-age of one year without revalidating
  • CDNs can cache for the same lifetime
  • No cache invalidation needed
  • Reduces bandwidth for federated instances
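A server might choose its cache policy from the shape of the requested identifier. The helper below is a hypothetical sketch: the regular expression, the function name, and the `no-cache` fallback for anything that is not a version-1 content hash are all assumptions, while the immutable header value is the one shown above.

```python
import re

# Version-1 content hashes: prefix a/b/f, '1~', then 43 base64url characters.
CONTENT_ID = re.compile(r"[abf]1~[A-Za-z0-9_-]{43}")


def cache_headers(resource_id: str) -> dict[str, str]:
    if CONTENT_ID.fullmatch(resource_id):
        # The bytes behind a content hash can never change, so cache aggressively.
        return {"Cache-Control": "public, max-age=31536000, immutable"}
    # Assumed fallback for mutable or unrecognized resources.
    return {"Cache-Control": "no-cache"}


print(cache_headers("f1~" + "A" * 43))
print(cache_headers("profile"))
```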

Storage Deduplication

Identical content is stored only once:

Alice uploads photo.jpg → b1~abc123 (1MB stored)
Bob uploads photo.jpg → b1~abc123 (0MB stored, reuse!)
Carol uploads photo.jpg → b1~abc123 (0MB stored, reuse!)

Total storage: 1MB instead of 3MB

Automatic deduplication at multiple levels:

  • Variant level (same image blob)
  • File level (same set of variants)
  • Action level: not deduplicated, because signatures and timestamps make every action unique
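Variant-level deduplication falls out of using the hash as the storage key. The in-memory `BlobStore` below is a hypothetical illustration of the idea, not Cloudillo's storage layer.

```python
import base64
import hashlib


class BlobStore:
    """Toy content-addressed store: identical bytes are kept only once."""

    def __init__(self):
        self._blobs = {}

    def put(self, data: bytes):
        digest = hashlib.sha256(data).digest()
        blob_id = "b1~" + base64.urlsafe_b64encode(digest).rstrip(b"=").decode("ascii")
        is_new = blob_id not in self._blobs
        if is_new:
            self._blobs[blob_id] = data  # first upload: store the bytes
        return blob_id, is_new           # later uploads: reuse the existing entry

    def stored_bytes(self) -> int:
        return sum(len(blob) for blob in self._blobs.values())


store = BlobStore()
photo = b"\x00" * 1024  # stand-in for photo.jpg
results = [store.put(photo) for _ in ("alice", "bob", "carol")]
print([is_new for _, is_new in results])  # [True, False, False]
print(store.stored_bytes())               # 1024
```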

Security Considerations

Collision Resistance

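SHA-256 produces a 256-bit digest:

  • Collision attack: about 2^128 hash evaluations (birthday bound), which is 128-bit collision resistance
  • Chance that two specific inputs collide: ~2^-256 (effectively zero)
  • Preimage attack: computationally infeasible (about 2^256 evaluations)
  • Second preimage attack: computationally infeasible (about 2^256 evaluations)

In practice: No SHA-256 collision has been publicly found, and a brute-force search of about 2^128 hash evaluations is far beyond foreseeable computing capability.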
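For a sense of scale, the standard birthday-bound approximation estimates the chance of any collision among n random b-bit hashes as p ≈ n(n−1) / 2^(b+1), valid while p is small. The function below is a sketch of that arithmetic; the item count of 10^18 is an arbitrary illustration.

```python
def collision_probability(num_items: int, bits: int = 256) -> float:
    """Birthday-bound approximation, accurate only while the result is small."""
    return num_items * (num_items - 1) / 2 ** (bits + 1)


# A quintillion stored objects still leaves a vanishingly small collision chance.
print(collision_probability(10**18))

# The approximation only approaches 1/2 at around 2^128 objects, which is why
# a 256-bit hash is usually described as having 128-bit collision resistance.
print(collision_probability(2**128))
```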

Tamper Detection

Any modification anywhere in the tree is immediately detectable:

Attack scenario: Attacker tries to modify an image in Alice’s post

1. Attacker modifies image blob
   → New variant_id (hash mismatch!)
2. Attacker updates descriptor with new variant_id
   → New file_id (hash mismatch!)
3. Attacker updates post attachment with new file_id
   → Breaks JWT signature (Alice didn't sign this!)
4. Attacker creates new JWT with new attachment
   → New action_id (different post!)

Result: Attacker cannot tamper without detection. They can only create NEW actions, not modify existing ones.

Trust Model

Cloudillo’s Merkle tree creates a verification model in which content integrity needs no trusted party; only author identity depends on DNS and key distribution:

| What | Verification Method | Trust Required |
|------|---------------------|----------------|
| Author identity | JWT signature (ES384) | DNS + public key infrastructure |
| Content integrity | SHA-256 hash | None (pure mathematics) |
| Parent references | Content hashes | None (pure mathematics) |
| Attachment integrity | SHA-256 hash chain | None (pure mathematics) |

Storage providers don’t need to be trusted: Even if a storage provider is malicious, they cannot:

  • Modify content without breaking hashes ✗
  • Forge signatures without private keys ✗
  • Create false parent references ✗
  • Tamper with attachments without detection ✗

Users verify everything cryptographically; the storage provider never has to be trusted.

Attack Resistance

Known attacks and mitigations:

| Attack | Mitigation |
|--------|------------|
| Modify action content | Hash mismatch detected |
| Forge author signature | Signature verification fails |
| Swap file attachment | Hash mismatch detected |
| Modify parent reference | Breaks cryptographic chain |
| Replay old actions | Timestamp validation, deduplication |
| Storage provider tampering | Hash verification fails |

See Also

  • [Actions & Action Tokens](/architecture/actions-federation/actions) - How action tokens are created and verified
  • File Storage & Processing - How file content-addressing works
  • Identity System - Cryptographic signing keys for actions
  • [Access Control](/architecture/data-layer/access-control/access) - How access tokens work alongside content-addressing
  • System Architecture - Overall system design