Content-Addressing & Merkle Trees
Cloudillo implements a Merkle tree structure using content-addressed identifiers throughout its architecture. Every action, file, and data blob is identified by the cryptographic hash of its content, creating an immutable, verifiable chain of trust.
What is Content-Addressing?
Content-addressing means identifying data by what it is (its content) rather than where it is (its location). Instead of using arbitrary IDs or URLs, Cloudillo computes a cryptographic hash of the content itself and uses that hash as the identifier.
Benefits
- Immutable: Content cannot change without changing its identifier
- Tamper-Evident: Any modification is immediately detectable
- Deduplicatable: Identical content produces identical identifiers
- Verifiable: Anyone can recompute and verify hashes independently
- Cacheable: Content-addressed data can be cached forever
- Trustless: No need to trust storage providers; verify the hash instead
Hash Function
Cloudillo uses SHA-256 for all content-addressing:
- Algorithm: SHA-256 (256-bit Secure Hash Algorithm)
- Encoding: Base64url without padding (URL-safe)
- Output: 43-character base64url-encoded string (32 hash bytes, padding removed)
- Collision Resistance: No practical collision attacks against SHA-256 are known
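The parameters listed above can be sketched in Python using only the standard library. This is a minimal illustration, not Cloudillo's actual implementation; the function name mirrors the pseudocode in this section, and the `version` argument is an assumption added for illustration.

```python
import base64
import hashlib


def compute_hash(prefix: str, data: bytes, version: int = 1) -> str:
    """Return a content-addressed identifier such as 'b1~<43 chars>'."""
    if version != 1:
        raise ValueError("only version 1 (SHA-256) is sketched here")
    digest = hashlib.sha256(data).digest()  # 32 raw bytes
    # URL-safe base64; 32 bytes always end in a single '=' pad, which is removed
    encoded = base64.urlsafe_b64encode(digest).decode("ascii").rstrip("=")
    return f"{prefix}{version}~{encoded}"


blob_id = compute_hash("b", b"hello world")
print(blob_id, len(blob_id))
```

The full identifier is 46 characters long: the 3-character prefix followed by the 43-character encoded hash. Hashing different bytes, for example `b"hello universe"`, yields a different identifier.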
compute_hash(prefix, data):
    hash = SHA256(data)
    encoded = base64url_encode(hash)  // URL-safe, no padding
    return "{prefix}1~{encoded}"

// Example (each encoded hash is 43 chars, following the 3-char prefix):
compute_hash("b", blob_bytes) → "b1~abc123def456..."
compute_hash("f", descriptor) → "f1~Qo2E3G8TJZ..."
compute_hash("a", jwt_token) → "a1~8kR3mN9pQ2vL..."
Merkle Tree Structure
Cloudillo’s content-addressing creates a variable-depth Merkle tree where actions can reference other actions recursively. The example below shows a six-level hierarchy for a POST action with image attachments:
┌─────────────────────────────────────────────────────────┐
│ Level 6: Action ID (a1~8kR3mN9pQ2vL6xW...)              │
│   ↓ SHA-256 hash of ↓                                   │
├─────────────────────────────────────────────────────────┤
│ Level 5: Action Token (JWT)                             │
│   Header: {"alg":"ES384","typ":"JWT"}                   │
│   Payload: {                                            │
│     "iss": "alice.example.com",                         │
│     "t": "POST:IMG",                                    │
│     "c": "Amazing photo!",                              │
│     "a": ["f1~Qo2E3G8TJZ..."],                          │
│     "iat": 1738483100                                   │
│   }                                                     │
│   Signature: <ES384 signature>                          │
│   ↓ references ↓                                        │
├─────────────────────────────────────────────────────────┤
│ Level 4: File ID (f1~Qo2E3G8TJZ2HTGhVlrtTDBp...)        │
│   ↓ SHA-256 hash of ↓                                   │
├─────────────────────────────────────────────────────────┤
│ Level 3: File Descriptor (d2,...)                       │
│   "d2,vis.tn:b1~abc:f=avif:s=4096:r=150x150;            │
│    vis.sd:b1~def:f=avif:s=32768:r=640x480;              │
│    vis.md:b1~ghi:f=avif:s=262144:r=1920x1080"           │
│   ↓ references ↓                                        │
├─────────────────────────────────────────────────────────┤
│ Level 2: Variant IDs                                    │
│   b1~abc123... (vis.tn)                                 │
│   b1~def456... (vis.sd)                                 │
│   b1~ghi789... (vis.md)                                 │
│   ↓ SHA-256 hash of ↓                                   │
├─────────────────────────────────────────────────────────┤
│ Level 1: Blob Data (raw bytes)                          │
│   <AVIF encoded image data - tn variant>                │
│   <AVIF encoded image data - sd variant>                │
│   <AVIF encoded image data - md variant>                │
└─────────────────────────────────────────────────────────┘
Level 1: Blob Data
Raw bytes of actual content (images, videos, documents).
Level 2: Variant IDs (b1~…)
Content-addressed blob identifiers computed as SHA256(blob_bytes).
Level 3: File Descriptor (d2,…)
Encoded string listing all available variants of a file.
Format:
d2,{class}.{variant}:{variant_id}:f={format}:s={size}:r={width}x{height};...
Example:
d2,vis.tn:b1~abc123:f=avif:s=4096:r=150x150;vis.sd:b1~def456:f=avif:s=32768:r=640x480
Level 4: File ID (f1~…)
Content-addressed file identifier computed as SHA256(descriptor_string).
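To make the descriptor format concrete, here is a minimal Python sketch that parses a `d2,` descriptor and derives a file ID from it. The `Variant` class and both function names are hypothetical, and the sketch assumes the descriptor is hashed as its UTF-8 bytes, that `;` separates variants, and that `:` separates fields, as the format line above suggests.

```python
import base64
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class Variant:
    name: str      # "{class}.{variant}", e.g. "vis.tn"
    blob_id: str   # e.g. "b1~abc123"
    fmt: str       # e.g. "avif"
    size: int      # size in bytes
    width: int
    height: int


def parse_descriptor(descriptor: str) -> list[Variant]:
    """Split a 'd2,' descriptor into its variants (hypothetical helper)."""
    if not descriptor.startswith("d2,"):
        raise ValueError("unsupported descriptor version")
    variants = []
    for entry in descriptor[3:].split(";"):
        if not entry.strip():
            continue  # tolerate a trailing ';'
        name, blob_id, *fields = entry.strip().split(":")
        attrs = dict(field.split("=", 1) for field in fields)
        width, height = attrs["r"].split("x")
        variants.append(
            Variant(name, blob_id, attrs["f"], int(attrs["s"]), int(width), int(height))
        )
    return variants


def file_id(descriptor: str) -> str:
    # Assumption: the descriptor string is hashed as UTF-8 bytes.
    digest = hashlib.sha256(descriptor.encode("utf-8")).digest()
    return "f1~" + base64.urlsafe_b64encode(digest).decode("ascii").rstrip("=")


descriptor = (
    "d2,vis.tn:b1~abc123:f=avif:s=4096:r=150x150;"
    "vis.sd:b1~def456:f=avif:s=32768:r=640x480"
)
for v in parse_descriptor(descriptor):
    print(v.name, v.blob_id, v.fmt, v.size, f"{v.width}x{v.height}")
print(file_id(descriptor))
```

Because the file ID covers the whole descriptor string, changing any variant ID, size, or resolution in the descriptor produces a different file ID.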
Level 5: Action Token (JWT)
Cryptographically signed JSON Web Token representing a user action.
Structure:
{
  "header": {
    "alg": "ES384",
    "typ": "JWT"
  },
  "payload": {
    "iss": "alice.example.com",
    "k": "20250101",
    "t": "POST:IMG",
    "c": "Check out this amazing photo!",
    "a": ["f1~Qo2E3G8TJZ..."],
    "iat": 1738483100
  },
  "signature": "<ES384 signature with alice's private key>"
}
Level 6: Action ID (a1~…)
Content-addressed action identifier computed as SHA256(complete_jwt_token).
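As an illustration, the sketch below builds a compact JWT string from the example header and payload, hashes the complete token to obtain an action ID, and decodes the payload. It is a sketch under stated assumptions: the helper names are hypothetical, the signature is 96 placeholder zero bytes rather than a real ES384 signature, and the payload is read without any signature check.

```python
import base64
import hashlib
import json


def b64url(data: bytes) -> str:
    return base64.urlsafe_b64encode(data).decode("ascii").rstrip("=")


def b64url_decode(text: str) -> bytes:
    # Restore the padding that the compact encoding strips
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))


def action_id(token: str) -> str:
    """Hash the complete compact token, signature included."""
    return "a1~" + b64url(hashlib.sha256(token.encode("ascii")).digest())


def read_payload(token: str) -> dict:
    """Decode the payload WITHOUT verifying the signature."""
    _header, payload, _signature = token.split(".")
    return json.loads(b64url_decode(payload))


header = {"alg": "ES384", "typ": "JWT"}
payload = {
    "iss": "alice.example.com",
    "k": "20250101",
    "t": "POST:IMG",
    "c": "Check out this amazing photo!",
    "a": ["f1~Qo2E3G8TJZ..."],
    "iat": 1738483100,
}
# Placeholder signature: a real token carries an ES384 signature here.
token = ".".join(
    [
        b64url(json.dumps(header).encode("utf-8")),
        b64url(json.dumps(payload).encode("utf-8")),
        b64url(b"\x00" * 96),
    ]
)
print(action_id(token))
print(read_payload(token)["t"])
```

Since the hash input is the entire token, the action ID commits to the header, the payload, and the signature together.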
Hash Versioning Scheme
All identifiers use a versioned prefix format for future-proofing:
{prefix}{version}~{base64_encoded_hash}
Current Prefixes
| Prefix | Resource Type | Hash Input | Example |
|---|---|---|---|
| a1~ | Action | Entire JWT token | a1~8kR3mN9pQ2vL6xW... |
| f1~ | File | File descriptor string | f1~Qo2E3G8TJZ2HTGh... |
| d2, | Descriptor | (not a hash, the encoded format itself) | d2,vis.tn:b1~abc:f=avif:... |
| b1~ | Blob | Blob bytes (raw data) | b1~abc123def456ghi... |
Version Scheme
- Version 1: SHA-256 with base64url encoding (no padding)
- Future versions: Can upgrade to SHA-3, BLAKE3, etc.
- Backward compatibility: Old content remains valid forever
- Algorithm agility: Migrate to new algorithms without breaking existing references
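One way to picture algorithm agility is a verifier that reads the version out of the identifier and picks the hash function accordingly. The sketch below is an assumption-laden illustration: `parse_id`, `matches`, and the `HASHERS` table are hypothetical names, the regular expression assumes a single lowercase letter prefix, and only version 1 (SHA-256) is registered because that is the only version this section defines.

```python
import base64
import hashlib
import re

# Version -> hash constructor. Later versions would be registered here.
HASHERS = {1: hashlib.sha256}
ID_PATTERN = re.compile(r"^([a-z])(\d+)~([A-Za-z0-9_-]+)$")


def parse_id(identifier: str) -> tuple[str, int, str]:
    """Split '{prefix}{version}~{hash}' into its three parts."""
    match = ID_PATTERN.match(identifier)
    if match is None:
        raise ValueError(f"not a content-addressed identifier: {identifier!r}")
    prefix, version, encoded = match.groups()
    return prefix, int(version), encoded


def matches(identifier: str, data: bytes) -> bool:
    """Recompute the hash with the algorithm the identifier's version names."""
    _prefix, version, encoded = parse_id(identifier)
    hasher = HASHERS.get(version)
    if hasher is None:
        raise ValueError(f"unknown hash version {version}")
    digest = hasher(data).digest()
    return base64.urlsafe_b64encode(digest).decode("ascii").rstrip("=") == encoded


digest = hashlib.sha256(b"example").digest()
example_id = "b1~" + base64.urlsafe_b64encode(digest).decode("ascii").rstrip("=")
print(parse_id(example_id)[:2])
print(matches(example_id, b"example"))
```

Old identifiers keep verifying after a new version is added, because each identifier names the algorithm that produced it.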
Example upgrade path:
a1~... (SHA-256)
a2~... (SHA-3)
a3~... (BLAKE3)
Merkle Tree Properties
Cloudillo’s content-addressing creates a Merkle tree with these properties:
Immutability
Once created, content cannot be modified without changing its identifier.
Example:
Original Post:
    content: "Hello World"
    action_id: a1~abc123...
Modified Post:
    content: "Hello Universe"
    action_id: a1~xyz789... ← DIFFERENT ID!
Any attempt to modify content results in a completely new action with a different ID. The original remains unchanged.
Tamper-Evidence
Any modification anywhere in the tree is immediately detectable.
Example:
Post (a1~abc...)
└─ Attachment: f1~Qo2...
   └─ Variants: b1~tn..., b1~sd..., b1~md...
If someone modifies the thumbnail image:
→ New variant_id (b1~xyz...)
→ Descriptor changes
→ New file_id (f1~uvw...)
→ Post attachment no longer matches
→ Verification fails!
Deduplication
Identical content produces identical identifiers, enabling automatic deduplication.
Example:
Alice posts photo.jpg → file_id: f1~abc123...
Bob posts photo.jpg (same file) → file_id: f1~abc123... (same!)
Storage: Only one copy of the blob bytes needed
Bandwidth: Can skip downloading if already cached
Verifiability
Anyone can independently verify the entire chain.
Process:
- Download action token
- Verify JWT signature (proves author identity)
- Recompute action_id = SHA256(JWT)
- Compare with claimed action_id
- For each attachment:
  - Download file descriptor
  - Download all variant blobs
  - Verify each variant_id = SHA256(blob)
  - Recompute file_id = SHA256(descriptor)
  - Compare with attachment reference
- Verification is complete once every comparison matches
No trust required: Pure mathematics ensures integrity.
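The attachment part of this process can be sketched in Python as follows. This is a minimal illustration rather than Cloudillo's verifier: `content_id` and `verify_file` are hypothetical names, an in-memory dict stands in for blob storage, the blobs are placeholder bytes, and the JWT signature step is left out because it needs an ECDSA library.

```python
import base64
import hashlib


def content_id(prefix: str, data: bytes) -> str:
    digest = hashlib.sha256(data).digest()
    return f"{prefix}1~" + base64.urlsafe_b64encode(digest).decode("ascii").rstrip("=")


def verify_file(file_id: str, descriptor: str, blobs: dict[str, bytes]) -> bool:
    """Check descriptor -> file_id, then every listed blob -> variant_id."""
    if content_id("f", descriptor.encode("utf-8")) != file_id:
        return False
    for entry in descriptor.split(",", 1)[1].split(";"):
        fields = entry.split(":")
        variant_id = fields[1]
        declared_size = int(dict(f.split("=", 1) for f in fields[2:])["s"])
        blob = blobs.get(variant_id)
        if blob is None or len(blob) != declared_size:
            return False
        if content_id("b", blob) != variant_id:
            return False
    return True


# Build a consistent two-variant file from placeholder bytes.
thumb, standard = b"thumbnail bytes", b"standard bytes"
tn_id, sd_id = content_id("b", thumb), content_id("b", standard)
descriptor = (
    f"d2,vis.tn:{tn_id}:f=avif:s={len(thumb)}:r=150x150;"
    f"vis.sd:{sd_id}:f=avif:s={len(standard)}:r=640x480"
)
fid = content_id("f", descriptor.encode("utf-8"))

print(verify_file(fid, descriptor, {tn_id: thumb, sd_id: standard}))        # True
print(verify_file(fid, descriptor, {tn_id: b"tampered", sd_id: standard}))  # False
```

The second call fails because the substituted thumbnail bytes no longer hash to the variant ID recorded in the descriptor.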
Chain of Trust
Parent references create an immutable chain.
Example:
Post (a1~abc...)
  ↑ parent_id
Comment (a1~def...)
  ↑ parent_id
Reply (a1~ghi...)
- Reply references Comment by content hash
- Comment references Post by content hash
- Cannot modify Post without breaking Comment reference
- Cannot modify Comment without breaking Reply reference
- Entire thread is cryptographically bound together
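The binding described above can be demonstrated with a small Python sketch. It makes several simplifying assumptions: plain JSON bytes stand in for signed JWTs, the parent reference is stored under a hypothetical `parent_id` key, and a dict plays the role of the action store.

```python
import base64
import hashlib
import json
from typing import Optional


def action_id(token: bytes) -> str:
    digest = hashlib.sha256(token).digest()
    return "a1~" + base64.urlsafe_b64encode(digest).decode("ascii").rstrip("=")


def make_action(content: str, parent_id: Optional[str]) -> tuple[str, bytes]:
    # Stand-in for a signed JWT: plain JSON bytes, no signature.
    token = json.dumps({"c": content, "parent_id": parent_id}).encode("utf-8")
    return action_id(token), token


def verify_thread(leaf_id: str, store: dict[str, bytes]) -> bool:
    """Walk parent references from a leaf to its root, rechecking each hash."""
    current: Optional[str] = leaf_id
    while current is not None:
        token = store.get(current)
        if token is None or action_id(token) != current:
            return False
        current = json.loads(token)["parent_id"]
    return True


post_id, post = make_action("Hello World", None)
comment_id, comment = make_action("Nice post", post_id)
reply_id, reply = make_action("Agreed", comment_id)
store = {post_id: post, comment_id: comment, reply_id: reply}
print(verify_thread(reply_id, store))  # True

# Swap the stored post for edited bytes while keeping the old ID.
store[post_id] = json.dumps({"c": "Hello Universe", "parent_id": None}).encode("utf-8")
print(verify_thread(reply_id, store))  # False
```

The edited post hashes to a different ID than the one the comment references, so the walk from the reply fails when it reaches the post.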
Proof of Authenticity
Cloudillo provides two complementary layers of proof:
Layer 1: Author Identity (Cryptographic Signatures)
Action tokens are signed with ES384 (ECDSA with P-384 curve and SHA-384 hash):
Action Token = Header.Payload.Signature
Signature = ECDSA_sign(SHA384(Header.Payload), alice_private_key)
Verification:
- Fetch Alice’s public key from https://cl-o.alice.example.com/api/me/keys
- Verify signature using public key
- A valid signature proves Alice created this action
Layer 2: Content Integrity (Content Hashes)
All identifiers are SHA-256 hashes:
action_id = SHA256(entire JWT token)
file_id = SHA256(descriptor string)
variant_id = SHA256(blob bytes)
Verification:
- Download content
- Recompute hash
- Compare with claimed identifier
- A match proves content hasn’t been tampered with
Verification Example
Scenario: Verify a LIKE action on a POST with image attachment.
verify_like_action(like_id):
    // 1. Verify LIKE action
    like_token = fetch_action_token(like_id)
    verify_signature(like_token, bob_public_key)
    verify like_id == SHA256(like_token)
    // ✓ Bob authored this LIKE, content is intact

    // 2. Verify parent POST action
    post_id = like_token.parent_id
    post_token = fetch_action_token(post_id)
    verify_signature(post_token, alice_public_key)
    verify post_id == SHA256(post_token)
    // ✓ Alice authored this POST, content is intact

    // 3. Verify file attachment
    file_id = post_token.attachments[0]
    descriptor = fetch_descriptor(file_id)
    verify file_id == SHA256(descriptor)
    // ✓ File descriptor is intact

    // 4. Verify each variant blob
    variants = parse_descriptor(descriptor)
    for each variant:
        blob_data = fetch_blob(variant.blob_id)
        verify variant.blob_id == SHA256(blob_data)
        verify blob_data.size == variant.size
    // ✓ All image variants are intact

    // RESULT: Complete chain verified!
DAG Structure
Cloudillo’s action system forms a Directed Acyclic Graph (DAG) with these properties:
Multiple Roots
Unlike a traditional tree with one root, Cloudillo has multiple independent threads:
Post 1 (a1~abc...)                     Post 2 (a1~xyz...)
│                                      │
├─ Comment 1.1 (a1~def...)             └─ Comment 2.1 (a1~uvw...)
│  │                                      │
│  ├─ Reply 1.1.1 (a1~ghi...)             └─ Like 2.1 (a1~rst...)
│  │                                         (parent: a1~uvw...)
│  └─ Reply 1.1.2 (a1~jkl...)
│
├─ Comment 1.2 (a1~mno...)
│
└─ Like 1 (a1~pqr...)
   (parent: a1~abc...)
Each top-level post is a root node. Comments and reactions form child nodes.
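The forest shown in the diagram can be rebuilt from parent references alone. The following Python sketch is illustrative only: it uses the diagram's abbreviated IDs as opaque placeholder strings, represents each action as a hypothetical `(action_id, parent_id)` pair, and treats a missing parent as the mark of a root.

```python
from collections import defaultdict

# (action_id, parent_id) pairs taken from the diagram; None marks a root.
actions = [
    ("a1~abc...", None),         # Post 1
    ("a1~def...", "a1~abc..."),  # Comment 1.1
    ("a1~ghi...", "a1~def..."),  # Reply 1.1.1
    ("a1~jkl...", "a1~def..."),  # Reply 1.1.2
    ("a1~mno...", "a1~abc..."),  # Comment 1.2
    ("a1~pqr...", "a1~abc..."),  # Like 1
    ("a1~xyz...", None),         # Post 2
    ("a1~uvw...", "a1~xyz..."),  # Comment 2.1
    ("a1~rst...", "a1~uvw..."),  # Like 2.1
]

children = defaultdict(list)
roots = []
for action_id, parent_id in actions:
    if parent_id is None:
        roots.append(action_id)
    else:
        children[parent_id].append(action_id)


def thread_size(root: str) -> int:
    """Count a node plus all of its descendants."""
    return 1 + sum(thread_size(child) for child in children[root])


for root in roots:
    print(root, thread_size(root))
```

With the data above, the thread rooted at Post 1 contains six actions and the thread rooted at Post 2 contains three.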
Performance Implications
Caching Strategy
Content-addressed data is immutable, enabling aggressive caching:
GET /api/files/f1~Qo2E3G8TJZ...
Response Headers:
    Cache-Control: public, max-age=31536000, immutable
    Content-Type: image/avif
Benefits:
- Browsers can cache a response for the full max-age (one year) without revalidating
- CDNs can cache it for just as long
- No cache invalidation needed
- Reduces bandwidth for federated instances
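A server could choose these headers by checking whether the requested ID is content-addressed. The Python sketch below is a hypothetical helper, not Cloudillo's code: the function name, the regular expression for recognizing IDs, and the `no-cache` fallback for other resources are all assumptions made for illustration.

```python
import re

# Matches identifiers such as 'f1~<hash>' for the a, b, and f prefixes.
CONTENT_ID = re.compile(r"^[abf]\d+~[A-Za-z0-9_-]+$")
ONE_YEAR_SECONDS = 365 * 24 * 60 * 60  # 31536000


def cache_headers(resource_id: str) -> dict[str, str]:
    """Pick a Cache-Control value for a resource (hypothetical helper)."""
    if CONTENT_ID.match(resource_id):
        # The bytes behind a content-addressed ID can never change.
        return {"Cache-Control": f"public, max-age={ONE_YEAR_SECONDS}, immutable"}
    # Assumed fallback for mutable resources: always revalidate.
    return {"Cache-Control": "no-cache"}


print(cache_headers("f1~" + "A" * 43))
print(cache_headers("profile"))
```

The long lifetime is safe only because a changed file would be served under a different ID, so a cached copy can never become stale.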
Security Considerations
Trust Model
Cloudillo’s Merkle tree creates a trustless verification model:
| What | Verification Method | Trust Required |
|---|---|---|
| Author identity | JWT signature (ES384) | DNS + Public key infrastructure |
| Content integrity | SHA-256 hash | None (pure mathematics) |
| Parent references | Content hashes | None (pure mathematics) |
| Attachment integrity | SHA-256 hash chain | None (pure mathematics) |
Storage providers don’t need to be trusted: Even if a storage provider is malicious, they cannot:
- Modify content without breaking hashes
- Forge signatures without private keys
- Create false parent references
- Tamper with attachments without detection
Users verify everything cryptographically, so no trust in the storage provider is required.
Attack Resistance
Known attacks and mitigations:
| Attack | Mitigation |
|---|---|
| Modify action content | Hash mismatch detected |
| Forge author signature | Signature verification fails |
| Swap file attachment | Hash mismatch detected |
| Modify parent reference | Breaks cryptographic chain |
| Replay old actions | Timestamp validation, deduplication |
| Storage provider tampering | Hash verification fails |
See Also
- Actions & Action Tokens - How action tokens are created and verified
- File Storage & Processing - How file content-addressing works
- Identity System - Cryptographic signing keys for actions
- Access Control - How access tokens work alongside content-addressing
- System Architecture - Overall system design