Content-Addressing & Merkle Trees
Cloudillo implements a Merkle tree structure using content-addressed identifiers throughout its architecture. Every action, file, and data blob is identified by the cryptographic hash of its content, creating an immutable, verifiable chain of trust.
What is Content-Addressing?
Content-addressing means identifying data by what it is (its content) rather than where it is (its location). Instead of using arbitrary IDs or URLs, Cloudillo computes a cryptographic hash of the content itself and uses that hash as the identifier.
Benefits
- ✓ Immutable: Content cannot change without changing its identifier
- ✓ Tamper-Evident: Any modification is immediately detectable
- ✓ Deduplicatable: Identical content produces identical identifiers
- ✓ Verifiable: Anyone can recompute and verify hashes independently
- ✓ Cacheable: Content-addressed data can be cached forever
- ✓ Trustless: No need to trust storage providers; verify the hash instead
Hash Function
Cloudillo uses SHA-256 for all content-addressing:
- Algorithm: SHA-256 (256-bit Secure Hash Algorithm)
- Encoding: Base64url without padding (URL-safe)
- Output: 43-character base64-encoded string
- Collision Resistance: Cryptographically secure
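The parameters listed above can be sketched in Python using only the standard library. This is a minimal illustration of the SHA-256 plus unpadded base64url scheme, not Cloudillo's actual implementation; the function name mirrors the pseudocode that follows, and the sample input is made up.

```python
import base64
import hashlib


def compute_hash(prefix: str, data: bytes) -> str:
    """Return a version-1 identifier: {prefix}1~{base64url(SHA-256(data))}."""
    digest = hashlib.sha256(data).digest()  # 32 raw bytes
    # URL-safe alphabet, with the trailing "=" padding stripped
    encoded = base64.urlsafe_b64encode(digest).rstrip(b"=").decode("ascii")
    return f"{prefix}1~{encoded}"


# Illustrative input only; real blobs would be image or document bytes.
blob_id = compute_hash("b", b"hello")
print(blob_id, len(blob_id))
```

A 32-byte digest always encodes to 43 base64url characters once padding is removed, so the full identifier is 46 characters including the three-character prefix.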
compute_hash(prefix, data):
    hash = SHA256(data)
    encoded = base64url_encode(hash)  // URL-safe, no padding
    return "{prefix}1~{encoded}"
// Example:
compute_hash("b", blob_bytes) → "b1~abc123def456..." (43 chars)
compute_hash("f", descriptor) → "f1~Qo2E3G8TJZ..." (43 chars)
compute_hash("a", jwt_token) → "a1~8kR3mN9pQ2vL..." (43 chars)
Merkle Tree Structure
Cloudillo’s content-addressing creates a variable-depth Merkle tree where actions can reference other actions recursively. The example below shows a six-level hierarchy for a POST action with image attachments:
┌─────────────────────────────────────────────────────────┐
│ Level 6: Action ID (a1~8kR3mN9pQ2vL6xW...)              │
│                   ↓ SHA-256 hash of ↓                   │
├─────────────────────────────────────────────────────────┤
│ Level 5: Action Token (JWT)                             │
│   Header: {"alg":"ES384","typ":"JWT"}                   │
│   Payload: {                                            │
│     "iss": "alice.example.com",                         │
│     "t": "POST:IMG",                                    │
│     "c": "Amazing photo!",                              │
│     "a": ["f1~Qo2E3G8TJZ..."],                          │
│     "iat": 1738483100                                   │
│   }                                                     │
│   Signature: <ES384 signature>                          │
│                     ↓ references ↓                      │
├─────────────────────────────────────────────────────────┤
│ Level 4: File ID (f1~Qo2E3G8TJZ2HTGhVlrtTDBp...)        │
│                   ↓ SHA-256 hash of ↓                   │
├─────────────────────────────────────────────────────────┤
│ Level 3: File Descriptor (d1~...)                       │
│   "d1~tn:b1~abc:f=AVIF:s=4096:r=150x150,                │
│    sd:b1~def:f=AVIF:s=32768:r=640x480,                  │
│    md:b1~ghi:f=AVIF:s=262144:r=1920x1080"               │
│                     ↓ references ↓                      │
├─────────────────────────────────────────────────────────┤
│ Level 2: Variant IDs                                    │
│   b1~abc123... (tn)                                     │
│   b1~def456... (sd)                                     │
│   b1~ghi789... (md)                                     │
│                   ↓ SHA-256 hash of ↓                   │
├─────────────────────────────────────────────────────────┤
│ Level 1: Blob Data (raw bytes)                          │
│   <AVIF encoded image data - tn variant>                │
│   <AVIF encoded image data - sd variant>                │
│   <AVIF encoded image data - md variant>                │
└─────────────────────────────────────────────────────────┘
Level 1: Blob Data
Raw bytes of actual content (images, videos, documents).
- No identifier yet; just the binary data
- This is what gets hashed to create variant IDs
Level 2: Variant IDs (b1~…)
Content-addressed blob identifiers computed as SHA256(blob_bytes).
Properties:
- Identifies a single image/video variant
- Changing even one byte changes the entire ID
- Multiple actions can reference the same variant_id (deduplication)
Level 3: File Descriptor (d1~…)
Encoded string listing all available variants of a file.
Format:
d1~{variant}:{variant_id}:f={format}:s={size}:r={width}x{height},...
Example:
d1~tn:b1~abc123:f=AVIF:s=4096:r=150x150,sd:b1~def456:f=AVIF:s=32768:r=640x480
This descriptor says:
- Thumbnail variant: b1~abc123, AVIF format, 4 KB, 150×150 px
- Standard variant: b1~def456, AVIF format, 32 KB, 640×480 px
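To make the descriptor format concrete, here is a small parsing sketch. It assumes entries are separated by commas and fields by colons exactly as in the format line above; the `Variant` class and `parse_descriptor` function are names chosen for this illustration, not part of a documented Cloudillo API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Variant:
    name: str      # variant label, e.g. "tn" or "sd"
    blob_id: str   # content-addressed variant ID (b1~...)
    format: str    # value of the f= field
    size: int      # value of the s= field, in bytes
    width: int     # from the r= field
    height: int    # from the r= field


def parse_descriptor(descriptor: str) -> list[Variant]:
    """Split a d1~ descriptor string into its variant entries."""
    if not descriptor.startswith("d1~"):
        raise ValueError("unsupported descriptor version")
    variants = []
    for entry in descriptor[len("d1~"):].split(","):
        name, blob_id, *fields = entry.strip().split(":")
        attrs = dict(field.split("=", 1) for field in fields)
        width, height = attrs["r"].split("x")
        variants.append(
            Variant(name, blob_id, attrs["f"], int(attrs["s"]), int(width), int(height))
        )
    return variants


variants = parse_descriptor(
    "d1~tn:b1~abc123:f=AVIF:s=4096:r=150x150,sd:b1~def456:f=AVIF:s=32768:r=640x480"
)
for v in variants:
    print(v.name, v.blob_id, v.format, v.size, f"{v.width}x{v.height}")
```

Splitting on ":" is safe here because base64url identifiers never contain a colon or a comma.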
Level 4: File ID (f1~…)
Content-addressed file identifier computed as SHA256(descriptor_string).
Properties:
- Identifies the complete file with all its variants
- Changing any variant changes the descriptor, which changes the file_id
- Used in action token attachments
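The propagation described above (a changed variant changes the descriptor, which changes the file_id) can be demonstrated with a short sketch. The helper names and sample bytes are hypothetical, and hashing the descriptor as UTF-8 bytes is an assumption of this example.

```python
import base64
import hashlib


def content_id(prefix: str, data: bytes) -> str:
    digest = hashlib.sha256(data).digest()
    return f"{prefix}1~" + base64.urlsafe_b64encode(digest).rstrip(b"=").decode("ascii")


def build_descriptor(variants: list[tuple[str, bytes, str, int, int]]) -> str:
    """variants: (name, blob_bytes, format, width, height) tuples."""
    entries = [
        f"{name}:{content_id('b', blob)}:f={fmt}:s={len(blob)}:r={w}x{h}"
        for name, blob, fmt, w, h in variants
    ]
    return "d1~" + ",".join(entries)


def file_id(descriptor: str) -> str:
    # Assumption: the descriptor string is hashed as UTF-8 bytes.
    return content_id("f", descriptor.encode("utf-8"))


# Made-up byte strings stand in for real AVIF data.
original = build_descriptor([("tn", b"thumb-bytes", "AVIF", 150, 150),
                             ("sd", b"standard-bytes", "AVIF", 640, 480)])
tampered = build_descriptor([("tn", b"thumb-bytez", "AVIF", 150, 150),
                             ("sd", b"standard-bytes", "AVIF", 640, 480)])

print(file_id(original))
print(file_id(tampered))  # differs, although only one byte of one variant changed
```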
Level 5: Action Token (JWT)
Cryptographically signed JSON Web Token representing a user action.
Structure:
{
  "header": {
    "alg": "ES384",
    "typ": "JWT"
  },
  "payload": {
    "iss": "alice.example.com",
    "k": "20250101",
    "t": "POST:IMG",
    "c": "Check out this amazing photo!",
    "a": ["f1~Qo2E3G8TJZ..."],
    "iat": 1738483100
  },
  "signature": "<ES384 signature with alice's private key>"
}
Encoding: {base64url(header)}.{base64url(payload)}.{base64url(signature)}
Level 6: Action ID (a1~…)
Content-addressed action identifier computed as SHA256(complete_jwt_token).
Properties:
- Identifies the complete action immutably
- Changing any field (even whitespace) changes the action_id
- Used as parent references in replies/reactions
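As a rough illustration of how the action ID depends on every byte of the token, the sketch below serializes a token in the standard compact JWT form and hashes it. No real signing is performed: the signature is a zero-filled placeholder of the length an ES384 signature has (96 bytes), and the function names are chosen for this example only.

```python
import base64
import hashlib
import json


def b64url(data: bytes) -> str:
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def encode_token(header: dict, payload: dict, signature: bytes) -> str:
    # Compact JWT serialization: three base64url segments joined by dots.
    segments = [
        b64url(json.dumps(header, separators=(",", ":")).encode("utf-8")),
        b64url(json.dumps(payload, separators=(",", ":")).encode("utf-8")),
        b64url(signature),
    ]
    return ".".join(segments)


def action_id(token: str) -> str:
    return "a1~" + b64url(hashlib.sha256(token.encode("ascii")).digest())


header = {"alg": "ES384", "typ": "JWT"}
payload = {"iss": "alice.example.com", "k": "20250101", "t": "POST:IMG",
           "c": "Check out this amazing photo!", "a": ["f1~Qo2E3G8TJZ..."],
           "iat": 1738483100}
placeholder_sig = bytes(96)  # stand-in only, NOT a valid signature

token = encode_token(header, payload, placeholder_sig)
edited = encode_token(header, {**payload, "c": "Check out this photo!"}, placeholder_sig)

print(action_id(token))
print(action_id(edited))  # a different ID: the edit produced a different action
```

Because the hash covers the serialized token rather than the parsed JSON, even a change in key order or whitespace during serialization would yield a different action_id.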
Hash Versioning Scheme
All identifiers use a versioned prefix format for future-proofing:
{prefix}{version}~{base64_encoded_hash}
Current Prefixes (Version 1)
| Prefix | Resource Type | Hash Input | Example |
|---|---|---|---|
| a1~ | Action | Entire JWT token | a1~8kR3mN9pQ2vL6xW... |
| f1~ | File | File descriptor string | f1~Qo2E3G8TJZ2HTGh... |
| d1~ | Descriptor | (not a hash, the encoded format itself) | d1~tn:b1~abc:f=AVIF:... |
| b1~ | Blob | Blob bytes (raw data) | b1~abc123def456ghi... |
Version Scheme
- Version 1: SHA-256 with base64url encoding (no padding)
- Future versions: Can upgrade to SHA-3, BLAKE3, etc.
- Backward compatibility: Old content remains valid forever
- Algorithm agility: Migrate to new algorithms without breaking existing references
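One way a verifier could dispatch on the version digit is sketched below. Only version 1 (SHA-256) comes from this document; the version 2 entry is purely hypothetical and is included to show how a later algorithm could be registered without affecting existing identifiers. The sketch also assumes a single-character prefix, and it applies only to hashed identifiers (a, f, b), not to d1~ descriptors.

```python
import base64
import hashlib

HASHERS = {
    1: hashlib.sha256,    # version 1, as documented above
    2: hashlib.sha3_256,  # hypothetical, not a defined Cloudillo version
}


def parse_id(identifier: str) -> tuple[str, int, str]:
    """Split '{prefix}{version}~{encoded}' into its three parts."""
    head, sep, encoded = identifier.partition("~")
    if not sep or len(head) < 2 or not head[1:].isdigit():
        raise ValueError(f"malformed identifier: {identifier!r}")
    return head[0], int(head[1:]), encoded


def verify_id(identifier: str, data: bytes) -> bool:
    """Recompute the hash with the algorithm named by the version digit."""
    _prefix, version, encoded = parse_id(identifier)
    hasher = HASHERS.get(version)
    if hasher is None:
        raise ValueError(f"unknown hash version: {version}")
    digest = hasher(data).digest()
    return base64.urlsafe_b64encode(digest).rstrip(b"=").decode("ascii") == encoded


sample = b"example blob"
sample_id = "b1~" + base64.urlsafe_b64encode(
    hashlib.sha256(sample).digest()
).rstrip(b"=").decode("ascii")
print(parse_id(sample_id)[:2], verify_id(sample_id, sample))
```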
Example upgrade path:
a1~... (SHA-256)
a2~... (SHA-3)
a3~... (BLAKE3)
Merkle Tree Properties
Cloudillo’s content-addressing creates a Merkle tree with these properties:
Immutability
Once created, content cannot be modified without changing its identifier.
Example:
Original Post:
    content: "Hello World"
    action_id: a1~abc123...
Modified Post:
    content: "Hello Universe"
    action_id: a1~xyz789... ← DIFFERENT ID!
Any attempt to modify content results in a completely new action with a different ID. The original remains unchanged.
Tamper-Evidence
Any modification anywhere in the tree is immediately detectable.
Example:
Post (a1~abc...)
└─ Attachment: f1~Qo2...
   └─ Variants: b1~tn..., b1~sd..., b1~md...
If someone modifies the thumbnail image:
    → New variant_id (b1~xyz...)
    → Descriptor changes
    → New file_id (f1~uvw...)
    → Post attachment no longer matches
    → Verification fails!
Deduplication
Identical content produces identical identifiers, enabling automatic deduplication.
Example:
Alice posts photo.jpg → file_id: f1~abc123...
Bob posts photo.jpg (same file) → file_id: f1~abc123... (same!)
Storage: Only one copy of the blob bytes needed
Bandwidth: Can skip downloading if already cached
Verifiability
Anyone can independently verify the entire chain.
Process:
- Download action token
- Verify JWT signature (proves author identity)
- Recompute action_id = SHA256(JWT)
- Compare with claimed action_id
- For each attachment:
  - Download file descriptor
  - Download all variant blobs
  - Verify each variant_id = SHA256(blob)
  - Recompute file_id = SHA256(descriptor)
  - Compare with attachment reference
- ✓ Complete verification
No trust required: Pure mathematics ensures integrity.
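The attachment part of this process can be sketched as a single function. It is a simplified illustration: downloads are replaced by values passed in directly, the function and parameter names are hypothetical, and signature verification is left out because it needs an ECDSA library.

```python
import base64
import hashlib


def content_id(prefix: str, data: bytes) -> str:
    digest = hashlib.sha256(data).digest()
    return f"{prefix}1~" + base64.urlsafe_b64encode(digest).rstrip(b"=").decode("ascii")


def verify_attachment(file_id: str, descriptor: str, blobs: dict[str, bytes]) -> bool:
    """Check file_id against the descriptor, then every variant against its blob.

    `blobs` maps variant IDs (b1~...) to the bytes that were downloaded for them.
    """
    # Assumption: the descriptor string is hashed as UTF-8 bytes.
    if content_id("f", descriptor.encode("utf-8")) != file_id:
        return False
    for entry in descriptor[len("d1~"):].split(","):
        _name, variant_id, *_fields = entry.split(":")
        blob = blobs.get(variant_id)
        if blob is None or content_id("b", blob) != variant_id:
            return False
    return True


# Made-up bytes stand in for a real thumbnail.
thumb = b"thumbnail bytes"
thumb_id = content_id("b", thumb)
descriptor = f"d1~tn:{thumb_id}:f=AVIF:s={len(thumb)}:r=150x150"
fid = content_id("f", descriptor.encode("utf-8"))

ok = verify_attachment(fid, descriptor, {thumb_id: thumb})
bad = verify_attachment(fid, descriptor, {thumb_id: b"tampered bytes"})
print(ok, bad)
```

A storage provider that returns altered bytes for a variant causes the second call to fail, since the recomputed hash no longer matches the ID recorded in the descriptor.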
Chain of Trust
Parent references create an immutable chain.
Example:
Post (a1~abc...)
    ↑ parent_id
Comment (a1~def...)
    ↑ parent_id
Reply (a1~ghi...)
- Reply references Comment by content hash
- Comment references Post by content hash
- Cannot modify Post without breaking Comment reference
- Cannot modify Comment without breaking Reply reference
- Entire thread is cryptographically bound together
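The binding between a reply and its ancestors can be illustrated by walking the parent references and re-hashing each action on the way. In this sketch plain JSON stands in for the signed JWT, and the `parent_id` key and the in-memory `store` are simplifications made for the example.

```python
import base64
import hashlib
import json
from typing import Optional


def action_id(token: bytes) -> str:
    digest = hashlib.sha256(token).digest()
    return "a1~" + base64.urlsafe_b64encode(digest).rstrip(b"=").decode("ascii")


def make_action(content: str, parent_id: Optional[str] = None) -> tuple[str, bytes]:
    # Simplification: unsigned JSON instead of a real action token.
    token = json.dumps({"c": content, "parent_id": parent_id}, sort_keys=True).encode("utf-8")
    return action_id(token), token


def verify_thread(leaf_id: str, store: dict[str, bytes]) -> list[str]:
    """Walk parent references from a leaf to its root, re-hashing every action."""
    path = []
    current = leaf_id
    while current is not None:
        token = store[current]
        if action_id(token) != current:
            raise ValueError(f"hash mismatch at {current}")
        path.append(current)
        current = json.loads(token)["parent_id"]
    return path


post_id, post = make_action("Hello World")
comment_id, comment = make_action("Nice post", post_id)
reply_id, reply = make_action("Agreed", comment_id)
store = {post_id: post, comment_id: comment, reply_id: reply}

path = verify_thread(reply_id, store)
print(path == [reply_id, comment_id, post_id])
```

If the stored post were altered, its recomputed hash would no longer equal the ID that the comment references, and the walk would raise an error at that point.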
Proof of Authenticity
Cloudillo provides two complementary layers of proof:
Layer 1: Author Identity (Cryptographic Signatures)
Action tokens are signed with ES384 (ECDSA with P-384 curve and SHA-384 hash):
Action Token = Header.Payload.Signature
Signature = ECDSA_sign(SHA384(Header.Payload), alice_private_key)
Verification:
- Fetch Alice’s public key from https://cl-o.alice.example.com/api/me/keys
- Verify signature using public key
- ✓ Proves Alice created this action
Layer 2: Content Integrity (Content Hashes)
All identifiers are SHA-256 hashes:
action_id = SHA256(entire JWT token)
file_id = SHA256(descriptor string)
variant_id = SHA256(blob bytes)
Verification:
- Download content
- Recompute hash
- Compare with claimed identifier
- ✓ Proves content hasn’t been tampered with
Combined Proof
Author + Content = Complete Authenticity
For a post with image attachment:
- ✓ Signature proves Alice authored the post
- ✓ Action hash proves post content is intact
- ✓ File hash proves descriptor is intact
- ✓ Variant hashes prove image bytes are intact
- ✓ Complete chain of authenticity established
No trusted intermediaries needed: the proof is purely cryptographic.
Verification Example
Scenario: Verify a LIKE action on a POST with image attachment.
verify_like_action(like_id):
    // 1. Verify LIKE action
    like_token = fetch_action_token(like_id)
    verify_signature(like_token, bob_public_key)
    verify like_id == SHA256(like_token)
    ✓ Bob authored this LIKE, content is intact
    // 2. Verify parent POST action
    post_id = like_token.parent_id
    post_token = fetch_action_token(post_id)
    verify_signature(post_token, alice_public_key)
    verify post_id == SHA256(post_token)
    ✓ Alice authored this POST, content is intact
    // 3. Verify file attachment
    file_id = post_token.attachments[0]
    descriptor = fetch_descriptor(file_id)
    verify file_id == SHA256(descriptor)
    ✓ File descriptor is intact
    // 4. Verify each variant blob
    variants = parse_descriptor(descriptor)
    for each variant:
        blob_data = fetch_blob(variant.blob_id)
        verify variant.blob_id == SHA256(blob_data)
        verify blob_data.size == variant.size
    ✓ All image variants are intact
    RESULT: Complete chain verified!
What this proves:
- Bob signed the LIKE (authentication)
- Alice signed the POST (authentication)
- No content was tampered with at any level (integrity)
- The LIKE references this specific POST (linkage)
- The POST references this specific image file (linkage)
- All image bytes are authentic (end-to-end verification)
DAG Structure
Cloudillo’s action system forms a Directed Acyclic Graph (DAG) with these properties:
Multiple Roots
Unlike a traditional tree with one root, Cloudillo has multiple independent threads:
Post 1 (a1~abc...)                    Post 2 (a1~xyz...)
│                                     │
├─ Comment 1.1 (a1~def...)            └─ Comment 2.1 (a1~uvw...)
│  │                                     │
│  ├─ Reply 1.1.1 (a1~ghi...)            └─ Like 2.1 (a1~rst...)
│  │                                        (parent: a1~uvw...)
│  └─ Reply 1.1.2 (a1~jkl...)
│
├─ Comment 1.2 (a1~mno...)
│
└─ Like 1 (a1~pqr...)
   (parent: a1~abc...)
Each top-level post is a root node. Comments and reactions form child nodes.
Shared Attachments
Multiple actions can reference the same file:
Post 1 (a1~abc...)
└─ Attachment: f1~photo123
Post 2 (a1~xyz...)
└─ Attachment: f1~photo123 ← SAME FILE!
Repost (a1~uvw...)
└─ Attachment: f1~photo123 ← SAME FILE!
Benefits:
- Storage efficiency: Only one copy of blob data needed
- Bandwidth efficiency: Download once, use everywhere
- Consistency: Everyone sees exactly the same image
- Verification: Single verification proves authenticity for all uses
Acyclic Property
The graph has no cycles (no circular references):
✓ Valid:
    Post → Comment → Reply (linear chain)
    Post → Comment1, Post → Comment2 (branching)
✗ Invalid:
    Post → Comment → Reply → Post (cycle!)
Cycles are prevented because:
- Parent references use content hashes
- Cannot reference an action that doesn’t exist yet
- Cannot create action hash without knowing full content
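- A circular reference would require an action to embed a hash that depends on its own content, which is computationally infeasible with a secure hash function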
Efficient Traversal
Forward traversal (find children):
-- Find all comments on a post
SELECT * FROM actions
WHERE parent_id = 'a1~abc123...'
  AND type LIKE 'CMNT%';
Backward traversal (find parents):
find_root(action_id):
    action = fetch_action(action_id)
    if action.parent_id exists:
        return find_root(action.parent_id)  // Recursive
    else:
        return action  // Found root
Root ID optimization: To avoid repeated traversal, the root_id is computed once and stored in the database (see Actions: Root ID Handling).
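Both traversal directions can be tried against an in-memory SQLite database. The table layout, the short placeholder IDs, and the convention that a root action stores a NULL root_id are assumptions made for this sketch; the actual schema is described in the Actions documentation.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE actions ("
    "action_id TEXT PRIMARY KEY, parent_id TEXT, root_id TEXT, type TEXT)"
)


def insert_action(conn, action_id, parent_id, type_):
    # Resolve the root once, at insert time, from the parent's stored root_id.
    root_id = None
    if parent_id is not None:
        parent = conn.execute(
            "SELECT root_id FROM actions WHERE action_id = ?", (parent_id,)
        ).fetchone()
        if parent is None:
            raise ValueError("parent must exist before its child")
        root_id = parent[0] or parent_id  # the parent is itself the root if it has none
    conn.execute(
        "INSERT INTO actions VALUES (?, ?, ?, ?)", (action_id, parent_id, root_id, type_)
    )


def children(conn, parent_id, type_prefix):
    # Forward traversal: direct children of one action, filtered by type prefix.
    cur = conn.execute(
        "SELECT action_id FROM actions WHERE parent_id = ? AND type LIKE ? ORDER BY action_id",
        (parent_id, type_prefix + "%"),
    )
    return [row[0] for row in cur]


# Placeholder IDs; real ones would be full a1~ hashes.
insert_action(conn, "a1~post", None, "POST:IMG")
insert_action(conn, "a1~cmnt", "a1~post", "CMNT")
insert_action(conn, "a1~reply", "a1~cmnt", "CMNT")

comments = children(conn, "a1~post", "CMNT")
reply_root = conn.execute(
    "SELECT root_id FROM actions WHERE action_id = ?", ("a1~reply",)
).fetchone()[0]
print(comments, reply_root)
```

With the root stored on every row, finding the thread a reply belongs to is a single lookup rather than a recursive walk.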
Performance Implications
Caching Strategy
Content-addressed data is immutable, enabling aggressive caching:
GET /api/files/f1~Qo2E3G8TJZ...
Response Headers:
    Cache-Control: public, max-age=31536000, immutable
    Content-Type: image/avif
Benefits:
- Browsers can reuse a cached copy for the full max-age (one year) without revalidation
- CDNs can cache the response for the same period
- No cache invalidation needed
- Reduces bandwidth for federated instances
Storage Deduplication
Identical content is stored only once:
Alice uploads photo.jpg → b1~abc123 (1MB stored)
Bob uploads photo.jpg → b1~abc123 (0MB stored, reuse!)
Carol uploads photo.jpg → b1~abc123 (0MB stored, reuse!)
Total storage: 1MB instead of 3MB
Automatic deduplication at multiple levels:
- Variant level (same image blob)
- File level (same set of variants)
- Action level (impossible to duplicate due to signatures and timestamps)
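Variant-level deduplication falls out naturally when the store is keyed by content hash. The class below is a toy in-memory sketch with hypothetical names, not Cloudillo's storage layer; it only shows why a second upload of the same bytes costs no extra space.

```python
import base64
import hashlib


class BlobStore:
    """In-memory content-addressed store: identical bytes are kept once."""

    def __init__(self) -> None:
        self._blobs: dict[str, bytes] = {}

    def put(self, data: bytes) -> str:
        digest = hashlib.sha256(data).digest()
        blob_id = "b1~" + base64.urlsafe_b64encode(digest).rstrip(b"=").decode("ascii")
        self._blobs.setdefault(blob_id, data)  # no-op if the ID is already present
        return blob_id

    def get(self, blob_id: str) -> bytes:
        return self._blobs[blob_id]

    def stored_bytes(self) -> int:
        return sum(len(blob) for blob in self._blobs.values())


store = BlobStore()
photo = b"\x00" * 1024  # made-up stand-in for photo.jpg

# Three uploads of the same bytes all map to one ID and one stored copy.
ids = {store.put(photo) for _uploader in ("alice", "bob", "carol")}
print(len(ids), store.stored_bytes())
```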
Security Considerations
Collision Resistance
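SHA-256 produces a 256-bit digest:
- Collision resistance: about 128 bits; a generic birthday attack needs roughly 2^128 hash evaluations
- Chance that two given, distinct inputs collide: ~2^-256 (effectively zero)
- Preimage attack: computationally infeasible
- Second preimage attack: computationally infeasible
In practice: No SHA-256 collision has ever been published, and 2^128 hash evaluations is far beyond any foreseeable computing capacity.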
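To put numbers on this, the standard birthday-bound approximation says that among n distinct inputs hashed to a b-bit digest, the chance that any two collide is roughly n(n-1)/2^(b+1) while that value is small. The sketch below evaluates it for SHA-256; the object count used is an arbitrary illustration.

```python
import math


def collision_probability(n: int, bits: int = 256) -> float:
    """Birthday-bound approximation: p ~= n(n-1) / 2^(bits+1), valid while p is small."""
    return n * (n - 1) / 2 ** (bits + 1)


def inputs_for_half_chance(bits: int = 256) -> float:
    """Approximate number of inputs needed for a 50% collision chance."""
    return math.sqrt(2 * math.log(2)) * 2.0 ** (bits / 2)


# Arbitrary example: a billion billion (10^18) stored objects.
p = collision_probability(10**18)
print(f"{p:.3e}")
print(f"{inputs_for_half_chance():.3e}")
```

Even at 10^18 objects the estimated probability stays below 10^-40, and reaching an even chance of a single collision would take on the order of 2^128 inputs.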
Tamper Detection
Any modification anywhere in the tree is immediately detectable:
Attack scenario: Attacker tries to modify an image in Alice’s post
1. Attacker modifies image blob
    → New variant_id (hash mismatch!)
2. Attacker updates descriptor with new variant_id
    → New file_id (hash mismatch!)
3. Attacker updates post attachment with new file_id
    → Breaks JWT signature (Alice didn't sign this!)
4. Attacker creates new JWT with new attachment
    → New action_id (different post!)
Result: Attacker cannot tamper without detection. They can only create NEW actions, not modify existing ones.
Trust Model
Cloudillo’s Merkle tree creates a trustless verification model:
| What | Verification Method | Trust Required |
|---|---|---|
| Author identity | JWT signature (ES384) | DNS + Public key infrastructure |
| Content integrity | SHA-256 hash | None (pure mathematics) |
| Parent references | Content hashes | None (pure mathematics) |
| Attachment integrity | SHA-256 hash chain | None (pure mathematics) |
Storage providers don’t need to be trusted: Even if a storage provider is malicious, they cannot:
- Modify content without breaking hashes ✗
- Forge signatures without private keys ✗
- Create false parent references ✗
- Tamper with attachments without detection ✗
Users verify everything cryptographically; no trust in the provider is required.
Attack Resistance
Known attacks and mitigations:
| Attack | Mitigation |
|---|---|
| Modify action content | Hash mismatch detected |
| Forge author signature | Signature verification fails |
| Swap file attachment | Hash mismatch detected |
| Modify parent reference | Breaks cryptographic chain |
| Replay old actions | Timestamp validation, deduplication |
| Storage provider tampering | Hash verification fails |
See Also
- [Actions & Action Tokens](/architecture/actions-federation/actions) - How action tokens are created and verified
- File Storage & Processing - How file content-addressing works
- Identity System - Cryptographic signing keys for actions
- [Access Control](/architecture/data-layer/access-control/access) - How access tokens work alongside content-addressing
- System Architecture - Overall system design