Content-Addressing & Merkle Trees

Cloudillo implements a Merkle tree structure using content-addressed identifiers throughout its architecture. Every action, file, and data blob is identified by the cryptographic hash of its content, creating an immutable, verifiable chain of trust.

What is Content-Addressing?

Content-addressing means identifying data by what it is (its content) rather than where it is (its location). Instead of using arbitrary IDs or URLs, Cloudillo computes a cryptographic hash of the content itself and uses that hash as the identifier.

Benefits

  • ✅ Immutable: Content cannot change without changing its identifier
  • ✅ Tamper-Evident: Any modification is immediately detectable
  • ✅ Deduplicatable: Identical content produces identical identifiers
  • ✅ Verifiable: Anyone can recompute and verify hashes independently
  • ✅ Cacheable: Content-addressed data can be cached forever
  • ✅ Trustless: No need to trust storage providers; verify the hash instead

Hash Function

Cloudillo uses SHA-256 for all content-addressing:

  • Algorithm: SHA-256 (256-bit Secure Hash Algorithm)
  • Encoding: Base64url without padding (URL-safe)
  • Output: 43-character base64-encoded string
  • Collision Resistance: No practical collision attack on SHA-256 is publicly known

compute_hash(prefix, data):
    hash = SHA256(data)
    encoded = base64url_encode(hash)  // URL-safe, no padding
    return "{prefix}1~{encoded}"

// Example (the encoded hash after "~" is 43 characters):
compute_hash("b", blob_bytes) → "b1~abc123def456..."
compute_hash("f", descriptor)  → "f1~Qo2E3G8TJZ..."
compute_hash("a", jwt_token)   → "a1~8kR3mN9pQ2vL..."
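The pseudocode above maps closely onto Python's standard library. The following is a minimal sketch rather than Cloudillo's actual implementation; the function name mirrors the pseudocode and the sample input is made up.

```python
import base64
import hashlib


def compute_hash(prefix: str, data: bytes) -> str:
    """Return a version-1 identifier: {prefix}1~{base64url(SHA-256(data))}."""
    digest = hashlib.sha256(data).digest()  # 32 raw bytes
    # URL-safe alphabet; strip the single trailing "=" padding character.
    encoded = base64.urlsafe_b64encode(digest).decode("ascii").rstrip("=")
    return f"{prefix}1~{encoded}"


blob_id = compute_hash("b", b"example blob bytes")  # sample input, not real data
print(blob_id, len(blob_id))  # 3-character prefix + 43-character hash = 46
```

A 32-byte digest always encodes to 43 base64url characters once padding is removed, which is where the fixed identifier length comes from.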

Merkle Tree Structure

Cloudillo’s content-addressing creates a variable-depth Merkle tree where actions can reference other actions recursively. The example below shows a six-level hierarchy for a POST action with image attachments:

┌─────────────────────────────────────────────────────────┐
│ Level 6: Action ID (a1~8kR3mN9pQ2vL6xW...)              │
│   ↑ SHA-256 hash of ↓                                   │
├─────────────────────────────────────────────────────────┤
│ Level 5: Action Token (JWT)                             │
│   Header: {"alg":"ES384","typ":"JWT"}                   │
│   Payload: {                                            │
│     "iss": "alice.example.com",                         │
│     "t": "POST:IMG",                                    │
│     "c": "Amazing photo!",                              │
│     "a": ["f1~Qo2E3G8TJZ..."],                          │
│     "iat": 1738483100                                   │
│   }                                                     │
│   Signature: <ES384 signature>                          │
│   ↓ references ↓                                        │
├─────────────────────────────────────────────────────────┤
│ Level 4: File ID (f1~Qo2E3G8TJZ2HTGhVlrtTDBp...)        │
│   ↑ SHA-256 hash of ↓                                   │
├─────────────────────────────────────────────────────────┤
│ Level 3: File Descriptor (d2,...)                       │
│   "d2,vis.tn:b1~abc:f=avif:s=4096:r=150x150;            │
│        vis.sd:b1~def:f=avif:s=32768:r=640x480;          │
│        vis.md:b1~ghi:f=avif:s=262144:r=1920x1080"       │
│   ↓ references ↓                                        │
├─────────────────────────────────────────────────────────┤
│ Level 2: Variant IDs                                    │
│   b1~abc123... (vis.tn)                                 │
│   b1~def456... (vis.sd)                                 │
│   b1~ghi789... (vis.md)                                 │
│   ↑ SHA-256 hash of ↓                                   │
├─────────────────────────────────────────────────────────┤
│ Level 1: Blob Data (raw bytes)                          │
│   <AVIF encoded image data - tn variant>                │
│   <AVIF encoded image data - sd variant>                │
│   <AVIF encoded image data - md variant>                │
└─────────────────────────────────────────────────────────┘

Level 1: Blob Data

Raw bytes of actual content (images, videos, documents).

Level 2: Variant IDs (b1~…)

Content-addressed blob identifiers computed as SHA256(blob_bytes).

Level 3: File Descriptor (d2,…)

Encoded string listing all available variants of a file.

Format:

d2,{class}.{variant}:{variant_id}:f={format}:s={size}:r={width}x{height};...

Example:

d2,vis.tn:b1~abc123:f=avif:s=4096:r=150x150;vis.sd:b1~def456:f=avif:s=32768:r=640x480
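To make the descriptor format concrete, here is a small parsing sketch. It assumes only the fields shown in the format line above (`f`, `s`, `r`); the `Variant` class and `parse_descriptor` name are illustrative and not taken from Cloudillo's code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Variant:
    label: str    # "{class}.{variant}", e.g. "vis.tn"
    blob_id: str  # content-addressed variant ID, e.g. "b1~abc123"
    format: str
    size: int     # bytes
    width: int
    height: int


def parse_descriptor(descriptor: str) -> list[Variant]:
    version, sep, body = descriptor.partition(",")
    if version != "d2" or not sep:
        raise ValueError(f"unsupported descriptor: {descriptor!r}")
    variants = []
    for entry in body.split(";"):
        # First two fields are positional; the rest are key=value pairs.
        label, blob_id, *attrs = entry.split(":")
        fields = dict(attr.split("=", 1) for attr in attrs)
        width, height = fields["r"].split("x")
        variants.append(
            Variant(label, blob_id, fields["f"], int(fields["s"]), int(width), int(height))
        )
    return variants


variants = parse_descriptor(
    "d2,vis.tn:b1~abc123:f=avif:s=4096:r=150x150;vis.sd:b1~def456:f=avif:s=32768:r=640x480"
)
print(variants[1].label, variants[1].size, variants[1].width)  # vis.sd 32768 640
```

Splitting on `:` and `;` is safe here because the base64url alphabet contains neither character.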

Level 4: File ID (f1~…)

Content-addressed file identifier computed as SHA256(descriptor_string).

Level 5: Action Token (JWT)

Cryptographically signed JSON Web Token representing a user action.

Structure:

{
  "header": {
    "alg": "ES384",
    "typ": "JWT"
  },
  "payload": {
    "iss": "alice.example.com",
    "k": "20250101",
    "t": "POST:IMG",
    "c": "Check out this amazing photo!",
    "a": ["f1~Qo2E3G8TJZ..."],
    "iat": 1738483100
  },
  "signature": "<ES384 signature with alice's private key>"
}

Level 6: Action ID (a1~…)

Content-addressed action identifier computed as SHA256(complete_jwt_token).
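As an illustration, the sketch below hashes a compact JWT string (`header.payload.signature`) into an action ID and decodes its payload. It assumes the hash input is the ASCII bytes of the compact serialization, which this document does not state explicitly; the signature segment in the demo is a placeholder, not a real ES384 signature.

```python
import base64
import hashlib
import json


def b64url_encode(data: bytes) -> str:
    return base64.urlsafe_b64encode(data).decode("ascii").rstrip("=")


def b64url_decode(text: str) -> bytes:
    # Restore the padding that compact JWTs strip.
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))


def action_id(token: str) -> str:
    """Hash the complete compact token, signature included."""
    digest = hashlib.sha256(token.encode("ascii")).digest()
    return "a1~" + b64url_encode(digest)


def decode_payload(token: str) -> dict:
    """Decode the payload segment WITHOUT verifying the signature."""
    _header, payload, _signature = token.split(".")
    return json.loads(b64url_decode(payload))


# Demo token with a placeholder signature segment.
header = {"alg": "ES384", "typ": "JWT"}
payload = {"iss": "alice.example.com", "t": "POST:IMG", "c": "Amazing photo!", "iat": 1738483100}
token = ".".join([
    b64url_encode(json.dumps(header).encode()),
    b64url_encode(json.dumps(payload).encode()),
    b64url_encode(b"placeholder-signature"),
])
print(action_id(token))
print(decode_payload(token)["t"])  # POST:IMG
```

Because the signature is part of the hashed string, two tokens with identical payloads but different signatures would receive different action IDs.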

Hash Versioning Scheme

All identifiers use a versioned prefix format for future-proofing:

{prefix}{version}~{base64_encoded_hash}

Current Prefixes

| Prefix | Resource Type | Hash Input | Example |
|--------|---------------|------------|---------|
| a1~ | Action | Entire JWT token | a1~8kR3mN9pQ2vL6xW... |
| f1~ | File | File descriptor string | f1~Qo2E3G8TJZ2HTGh... |
| d2, | Descriptor | (not a hash, the encoded format itself) | d2,vis.tn:b1~abc:f=avif:... |
| b1~ | Blob | Blob bytes (raw data) | b1~abc123def456ghi... |

Version Scheme

  • Version 1: SHA-256 with base64url encoding (no padding)
  • Future versions: Can upgrade to SHA-3, BLAKE3, etc.
  • Backward compatibility: Old content remains valid forever
  • Algorithm agility: Migrate to new algorithms without breaking existing references

Example upgrade path:

a1~...  (SHA-256)
a2~...  (SHA-3)
a3~...  (BLAKE3)
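One way a verifier could use the version number is to look up the hash algorithm before recomputing. The sketch below assumes a single-letter prefix and a decimal version; only version 1 is registered because it is the only one this document defines. The names `parse_id`, `matches`, and `HASHERS` are hypothetical.

```python
import base64
import hashlib
import re

# One lowercase letter, a decimal version, "~", then a base64url hash.
ID_PATTERN = re.compile(r"^([a-z])(\d+)~([A-Za-z0-9_-]+)$")

# Only version 1 is defined here; later versions would be registered alongside it.
HASHERS = {1: hashlib.sha256}


def parse_id(identifier: str) -> tuple[str, int, str]:
    match = ID_PATTERN.match(identifier)
    if match is None:
        raise ValueError(f"not a content-addressed identifier: {identifier!r}")
    prefix, version, encoded = match.groups()
    return prefix, int(version), encoded


def matches(identifier: str, data: bytes) -> bool:
    """Recompute the hash with the algorithm selected by the version number."""
    _prefix, version, encoded = parse_id(identifier)
    hasher = HASHERS.get(version)
    if hasher is None:
        raise ValueError(f"unsupported hash version: {version}")
    digest = hasher(data).digest()
    return base64.urlsafe_b64encode(digest).decode("ascii").rstrip("=") == encoded


print(parse_id("a1~8kR3mN9pQ2vL"))  # ('a', 1, '8kR3mN9pQ2vL')
```

Old identifiers keep verifying after a new version is added, since each identifier names the algorithm that produced it.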

Merkle Tree Properties

Cloudillo’s content-addressing creates a Merkle tree with these properties:

Immutability

Once created, content cannot be modified without changing its identifier.

Example:

Original Post:
  content: "Hello World"
  action_id: a1~abc123...

Modified Post:
  content: "Hello Universe"
  action_id: a1~xyz789...  ← DIFFERENT ID!

Any attempt to modify content results in a completely new action with a different ID. The original remains unchanged.

Tamper-Evidence

Any modification anywhere in the tree is immediately detectable.

Example:

Post (a1~abc...)
  └─ Attachment: f1~Qo2...
      └─ Variants: b1~tn..., b1~sd..., b1~md...

If someone modifies the thumbnail image:
  ✗ New variant_id (b1~xyz...)
  ✗ Descriptor changes
  ✗ New file_id (f1~uvw...)
  ✗ Post attachment no longer matches
  ✗ Verification fails!
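The propagation described above can be reproduced in a few lines. This is a sketch with stand-in byte strings instead of real AVIF data, and the helper names are made up for illustration.

```python
import base64
import hashlib


def content_id(prefix: str, data: bytes) -> str:
    digest = hashlib.sha256(data).digest()
    return f"{prefix}1~" + base64.urlsafe_b64encode(digest).decode("ascii").rstrip("=")


def file_id_for(thumbnail: bytes, standard: bytes) -> str:
    """Build a two-variant descriptor and hash it into a file ID."""
    descriptor = (
        f"d2,vis.tn:{content_id('b', thumbnail)}:f=avif:s={len(thumbnail)}:r=150x150;"
        f"vis.sd:{content_id('b', standard)}:f=avif:s={len(standard)}:r=640x480"
    )
    return content_id("f", descriptor.encode("utf-8"))


original = file_id_for(b"thumbnail bytes", b"standard bytes")   # stand-in blobs
tampered = file_id_for(b"thumbnail bytes!", b"standard bytes")  # one byte appended
print(original != tampered)  # True
```

A one-byte change in the thumbnail alters its variant ID, which alters the descriptor string, which alters the file ID that the post references.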

Deduplication

Identical content produces identical identifiers, enabling automatic deduplication.

Example:

Alice posts photo.jpg → file_id: f1~abc123...
Bob posts photo.jpg (same file) → file_id: f1~abc123... (same!)

Storage: Only one copy of the blob bytes needed
Bandwidth: Can skip downloading if already cached
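At the blob level, deduplication falls out of using the hash as the storage key. The `BlobStore` class below is a hypothetical in-memory stand-in, not Cloudillo's storage layer.

```python
import base64
import hashlib


class BlobStore:
    """In-memory content-addressed store; a stand-in for real storage."""

    def __init__(self) -> None:
        self._blobs: dict[str, bytes] = {}

    def put(self, data: bytes) -> str:
        digest = hashlib.sha256(data).digest()
        blob_id = "b1~" + base64.urlsafe_b64encode(digest).decode("ascii").rstrip("=")
        # setdefault keeps the first copy; an identical upload stores nothing new.
        self._blobs.setdefault(blob_id, data)
        return blob_id

    def __len__(self) -> int:
        return len(self._blobs)


store = BlobStore()
alice_id = store.put(b"photo bytes")
bob_id = store.put(b"photo bytes")  # same content uploaded again
print(alice_id == bob_id, len(store))  # True 1
```

Matching file IDs additionally require that both uploads produce byte-identical variants, so that the descriptor strings are the same.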

Verifiability

Anyone can independently verify the entire chain.

Process:

  1. Download action token
  2. Verify JWT signature (proves author identity)
  3. Recompute action_id = SHA256(JWT)
  4. Compare with claimed action_id
  5. For each attachment:
    • Download file descriptor
    • Download all variant blobs
    • Verify each variant_id = SHA256(blob)
    • Recompute file_id = SHA256(descriptor)
    • Compare with attachment reference
  6. ✓ Complete verification

No trust required: Pure mathematics ensures integrity.
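The hash-checking part of this process (step 5) can be sketched as follows. Signature verification (step 2) is left out because it needs an ECDSA library; the blobs are supplied as an in-memory dictionary rather than downloaded, and all function names are illustrative.

```python
import base64
import hashlib


def content_id(prefix: str, data: bytes) -> str:
    digest = hashlib.sha256(data).digest()
    return f"{prefix}1~" + base64.urlsafe_b64encode(digest).decode("ascii").rstrip("=")


def verify_attachment(file_id: str, descriptor: str, blobs: dict[str, bytes]) -> bool:
    """Check the file ID, then every variant blob listed in the descriptor."""
    if content_id("f", descriptor.encode("utf-8")) != file_id:
        return False
    _version, _, body = descriptor.partition(",")
    for entry in body.split(";"):
        variant_id = entry.split(":")[1]  # second field is the blob ID
        blob = blobs.get(variant_id)
        if blob is None or content_id("b", blob) != variant_id:
            return False
    return True


# Demo with a single stand-in variant.
thumb = b"thumbnail bytes"
thumb_id = content_id("b", thumb)
descriptor = f"d2,vis.tn:{thumb_id}:f=avif:s={len(thumb)}:r=150x150"
file_id = content_id("f", descriptor.encode("utf-8"))

print(verify_attachment(file_id, descriptor, {thumb_id: thumb}))           # True
print(verify_attachment(file_id, descriptor, {thumb_id: b"other bytes"}))  # False
```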

Chain of Trust

Parent references create an immutable chain.

Example:

Post (a1~abc...)
  ↑ parent_id
Comment (a1~def...)
  ↑ parent_id
Reply (a1~ghi...)

  • Reply references Comment by content hash
  • Comment references Post by content hash
  • Cannot modify Post without breaking Comment reference
  • Cannot modify Comment without breaking Reply reference
  • Entire thread is cryptographically bound together
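The binding between a reply and its ancestors can be demonstrated with a toy store. In this sketch the "tokens" are plain JSON records rather than signed JWTs, and the `parent_id` field name is an assumption made for illustration.

```python
import base64
import hashlib
import json


def action_id(token: bytes) -> str:
    digest = hashlib.sha256(token).digest()
    return "a1~" + base64.urlsafe_b64encode(digest).decode("ascii").rstrip("=")


def publish(store: dict[str, bytes], content: str, parent_id=None) -> str:
    # Unsigned JSON stand-in for an action token.
    token = json.dumps({"c": content, "parent_id": parent_id}).encode("utf-8")
    new_id = action_id(token)
    store[new_id] = token
    return new_id


def walk_to_root(store: dict[str, bytes], start_id: str) -> list[str]:
    """Follow parent references upward, re-hashing each token on the way."""
    path = []
    current = start_id
    while current is not None:
        token = store[current]
        if action_id(token) != current:
            raise ValueError(f"hash mismatch at {current}")
        path.append(current)
        current = json.loads(token)["parent_id"]
    return path


store = {}
post = publish(store, "Hello World")
comment = publish(store, "Nice post", parent_id=post)
reply = publish(store, "Agreed", parent_id=comment)
print(walk_to_root(store, reply) == [reply, comment, post])  # True

store[post] = store[post].replace(b"World", b"Universe")  # tamper with the stored post
# walk_to_root(store, reply) now raises ValueError at the post
```

The comment's token embeds the post's original ID, so an altered post can no longer be presented under that ID without the mismatch being detected.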

Proof of Authenticity

Cloudillo provides two complementary layers of proof:

Layer 1: Author Identity (Cryptographic Signatures)

Action tokens are signed with ES384 (ECDSA with P-384 curve and SHA-384 hash):

Action Token = Header.Payload.Signature

Signature = ECDSA_sign(SHA384(Header.Payload), alice_private_key)

Verification:

  1. Fetch alice’s public key from https://cl-o.alice.example.com/api/me/keys
  2. Verify signature using public key
  3. ✓ Proves Alice created this action

Layer 2: Content Integrity (Content Hashes)

All identifiers are SHA-256 hashes:

action_id = SHA256(entire JWT token)
file_id = SHA256(descriptor string)
variant_id = SHA256(blob bytes)

Verification:

  1. Download content
  2. Recompute hash
  3. Compare with claimed identifier
  4. ✓ Proves content hasn’t been tampered with

Verification Example

Scenario: Verify a LIKE action on a POST with image attachment.

verify_like_action(like_id):
    // 1. Verify LIKE action
    like_token = fetch_action_token(like_id)
    verify_signature(like_token, bob_public_key)    // key published by the token's issuer
    verify like_id == SHA256(like_token)
    // ✓ Bob authored this LIKE, content is intact

    // 2. Verify parent POST action
    post_id = like_token.parent_id
    post_token = fetch_action_token(post_id)
    verify_signature(post_token, alice_public_key)  // key published by the token's issuer
    verify post_id == SHA256(post_token)
    // ✓ Alice authored this POST, content is intact

    // 3. Verify file attachment
    file_id = post_token.attachments[0]
    descriptor = fetch_descriptor(file_id)
    verify file_id == SHA256(descriptor)
    // ✓ File descriptor is intact

    // 4. Verify each variant blob
    variants = parse_descriptor(descriptor)
    for each variant in variants:
        blob_data = fetch_blob(variant.blob_id)
        verify variant.blob_id == SHA256(blob_data)
        verify blob_data.size == variant.size
    // ✓ All image variants are intact

    // RESULT: Complete chain verified!

DAG Structure

Cloudillo’s action system forms a Directed Acyclic Graph (DAG) with these properties:

Multiple Roots

Unlike a traditional tree with one root, Cloudillo has multiple independent threads:

Post 1 (a1~abc...)                Post 2 (a1~xyz...)
│                                 │
├─ Comment 1.1 (a1~def...)        └─ Comment 2.1 (a1~uvw...)
│  │                                 │
│  ├─ Reply 1.1.1 (a1~ghi...)        └─ Like 2.1 (a1~rst...)
│  │                                    (parent: a1~uvw...)
│  └─ Reply 1.1.2 (a1~jkl...)
│
├─ Comment 1.2 (a1~mno...)
│
└─ Like 1 (a1~pqr...)
   (parent: a1~abc...)

Each top-level post is a root node. Comments and reactions form child nodes.
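Grouping actions into threads amounts to following parent references until none remains. The sketch below encodes the example graph as a child-to-parent map with shortened IDs; the function names are illustrative.

```python
from collections import defaultdict

# child action ID -> parent action ID (None marks a top-level post); IDs shortened
parents = {
    "a1~abc": None,
    "a1~def": "a1~abc",
    "a1~ghi": "a1~def",
    "a1~jkl": "a1~def",
    "a1~mno": "a1~abc",
    "a1~pqr": "a1~abc",
    "a1~xyz": None,
    "a1~uvw": "a1~xyz",
    "a1~rst": "a1~uvw",
}


def find_root(action_id: str, parents: dict) -> str:
    # Climb parent references until reaching an action with no parent.
    while parents[action_id] is not None:
        action_id = parents[action_id]
    return action_id


def group_by_root(parents: dict) -> dict[str, list[str]]:
    threads = defaultdict(list)
    for action_id in parents:
        threads[find_root(action_id, parents)].append(action_id)
    return dict(threads)


threads = group_by_root(parents)
print(sorted(threads))         # ['a1~abc', 'a1~xyz']
print(len(threads["a1~abc"]))  # 6
```

The climb always terminates because a parent's ID must exist before a child can reference it, which rules out cycles.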

Performance Implications

Caching Strategy

Content-addressed data is immutable, enabling aggressive caching:

GET /api/files/f1~Qo2E3G8TJZ...

Response Headers:
  Cache-Control: public, max-age=31536000, immutable
  Content-Type: image/avif

Benefits:

  • Browsers can reuse the response for a year without revalidating (max-age=31536000, immutable)
  • CDNs can keep the object for as long as they retain it
  • No cache invalidation needed
  • Reduces bandwidth for federated instances
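A server could choose cache headers based on whether the requested ID is content-addressed. This is a hedged sketch: the ID pattern and the `no-cache` fallback for other resources are assumptions, not documented Cloudillo behavior.

```python
import re

# Matches a1~..., b1~..., f1~... style identifiers (assumed pattern).
CONTENT_ADDRESSED = re.compile(r"^[abf]\d+~[A-Za-z0-9_-]+$")
ONE_YEAR = 365 * 24 * 60 * 60  # 31536000 seconds


def cache_headers(resource_id: str) -> dict[str, str]:
    if CONTENT_ADDRESSED.match(resource_id):
        # The bytes behind a content hash never change, so caches may keep them.
        return {"Cache-Control": f"public, max-age={ONE_YEAR}, immutable"}
    # Anything else may change, so ask caches to revalidate (assumed fallback).
    return {"Cache-Control": "no-cache"}


print(cache_headers("f1~Qo2E3G8TJZ"))
# {'Cache-Control': 'public, max-age=31536000, immutable'}
```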

Security Considerations

Trust Model

Cloudillo’s Merkle tree creates a trustless verification model:

| What | Verification Method | Trust Required |
|------|---------------------|----------------|
| Author identity | JWT signature (ES384) | DNS + Public key infrastructure |
| Content integrity | SHA-256 hash | None (pure mathematics) |
| Parent references | Content hashes | None (pure mathematics) |
| Attachment integrity | SHA-256 hash chain | None (pure mathematics) |

Storage providers don’t need to be trusted: Even if a storage provider is malicious, they cannot:

  • Modify content without breaking hashes ✗
  • Forge signatures without private keys ✗
  • Create false parent references ✗
  • Tamper with attachments without detection ✗

Users verify everything cryptographically; no trust in the storage provider is required.

Attack Resistance

Known attacks and mitigations:

| Attack | Mitigation |
|--------|------------|
| Modify action content | Hash mismatch detected |
| Forge author signature | Signature verification fails |
| Swap file attachment | Hash mismatch detected |
| Modify parent reference | Breaks cryptographic chain |
| Replay old actions | Timestamp validation, deduplication |
| Storage provider tampering | Hash verification fails |

See Also