Blob Storage
Cloudillo’s blob storage uses content-addressed storage for immutable binary data (files, images, videos). Content addressing provides data integrity and deduplication, and automatic variant generation for images enables efficient delivery across different use cases.
Why Content-Addressed Storage?
Traditional file storage asks “where is this file?” Content-addressed storage asks “what is this file?” This simple shift enables powerful features:
Real-world analogy: Imagine a library where books are organized by their content fingerprint rather than shelf location. Two copies of the same book have the same fingerprint—you only need to store one. If someone claims to have “the original,” you can verify it instantly by checking the fingerprint.
Benefits you’ll notice:
- Upload once, access anywhere: The same image uploaded by different users is stored only once
- Verification without trust: Anyone can confirm a file hasn’t been modified
- Efficient caching: Files can be cached forever—they never change
- Automatic deduplication: Per-file storage cost drops as more users share identical content
Benefits for developers:
- Simple URLs: File ID = file content = permanent reference
- No cache invalidation: Content-addressed files are immutable
- Built-in integrity: Hash verification catches corruption instantly
Content-Addressed Storage
Concept
Files are identified by the SHA256 hash of their content, making identifiers:
- Immutable: Content cannot change without changing the ID
- Verifiable: Recipients can verify integrity
- Deduplicable: Identical content gets the same ID
- Tamper-evident: Any modification is immediately detectable
File Identifier Format
Cloudillo uses multiple identifier types in its content-addressing system:
{prefix}{version}~{base64url_hash}
Components:
- {prefix}: Resource type indicator (a, f, b, d)
- {version}: Hash algorithm version (currently 1 = SHA-256)
- ~: Separator
- {base64url_hash}: Base64url-encoded hash (43 characters, no padding)
Identifier Types
| Prefix | Resource Type | Hash Input | Example |
|---|---|---|---|
| b1~ | Blob | Blob bytes (raw image/video data) | b1~abc123def456... |
| f1~ | File | File descriptor string | f1~QoEYeG8TJZ2HTGh... |
| d2, | Descriptor | (not a hash, the encoded format itself) | d2,vis.tn:b1~abc:f=avif:... |
| a1~ | Action | Complete JWT token | a1~8kR3mN9pQ2vL... |
Important: d2, is not a content-addressed identifier—it’s the actual encoded descriptor string. The file ID (f1~) is the hash of this descriptor.
Examples
Blob ID: b1~QoEYeG8TJZ2HTGhVlrtTDBpvBGOp6gfGhq4QmD6Z46w
File ID: f1~m8Z35EIa3prvb3bhjsVjdg9SG98xd0bkoWomOHQAwCM
Descriptor: d2,vis.tn:b1~xRAVuQtgBx_kLqZnoOSd5XqCK_aQolhq1XeXk73Zn8U:f=avif:s=1960:r=90x128
Action ID: a1~8kR3mN9pQ2vL6xWpYzT4BjN5FqGxCmK9RsH2VwLnD8P
All file and blob IDs use SHA-256 content-addressing. See Content-Addressing & Merkle Trees for hash computation details.
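As a rough illustration of the identifier format, the following Python sketch builds and splits version-1 IDs. It assumes the hash input is the raw byte string with no additional framing; `make_id` and `split_id` are hypothetical helper names, not Cloudillo APIs, and the authoritative hash computation is the one described in Content-Addressing & Merkle Trees.

```python
import base64
import hashlib


def make_id(prefix: str, data: bytes) -> str:
    """Build a version-1 ID: prefix + "1~" + unpadded base64url SHA-256 (assumed input: raw bytes)."""
    digest = hashlib.sha256(data).digest()
    encoded = base64.urlsafe_b64encode(digest).rstrip(b"=").decode("ascii")
    return f"{prefix}1~{encoded}"


def split_id(identifier: str) -> tuple[str, int, str]:
    """Split "{prefix}{version}~{hash}" into (prefix, version, hash)."""
    head, sep, encoded = identifier.partition("~")
    if not sep or len(head) < 2 or not head[1:].isdigit():
        # Descriptors ("d2,...") end up here: they are not content-addressed IDs.
        raise ValueError(f"not a content-addressed ID: {identifier!r}")
    return head[0], int(head[1:]), encoded


blob_id = make_id("b", b"example blob bytes")
prefix, version, encoded = split_id(blob_id)
print(prefix, version, len(encoded))  # prints: b 1 43
```

A 32-byte SHA-256 digest always encodes to 43 base64url characters once the single padding character is removed, which is where the fixed hash length in the format comes from.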
File Types
Cloudillo supports four file types, each handled by different adapters based on mutability and use case:
| Type | Adapter | Mutability | Description |
|---|---|---|---|
| BLOB | BlobAdapter | Immutable | Binary content (images, videos, documents) |
| CRDT | CrdtAdapter | Mutable | Collaborative documents (Yjs-based real-time editing) |
| RTDB | RtdbAdapter | Mutable | Real-time database files for app state |
| FLDR | MetaAdapter | Mutable | Folder/directory metadata |
Why Different File Types?
Different collaboration scenarios require different storage strategies:
- BLOB: When you upload a photo or video, it never changes—if you want a different version, you upload a new file. This immutability enables powerful caching and deduplication across the network.
- CRDT: When editing a document with others in real-time (like Google Docs), changes from all participants must merge seamlessly. CRDTs (Conflict-free Replicated Data Types) make this possible.
- RTDB: Apps need to store changing state (todo lists, game scores, form data). RTDB provides real-time synchronization with WebSocket subscriptions.
- FLDR: Organizing files into folders requires mutable metadata without changing the files themselves.
File Type Selection
Files are created via different API endpoints based on their type:
| Endpoint | File Type | Use Case |
|---|---|---|
| /api/files/image/* | BLOB | Image uploads with auto-variant generation |
| /api/files/raw/* | BLOB | Raw file uploads (no processing) |
| /api/files/crdt/* | CRDT | Collaborative document creation |
| /api/files/rtdb/* | RTDB | Real-time database file creation |
Per-File Access Control
Each file has independent access control:
| Field | Values | Description |
|---|---|---|
| Visibility | P/V/F/C/null | Who can discover this file |
| Access Level | R/W | Read-only vs read-write access |
See Access Control for detailed permission handling.
File Variants
Concept
A single uploaded image automatically generates multiple variants optimized for different use cases:
- tn (thumbnail): Tiny preview (~128x96px)
- sd (standard definition): Social media size (~640x480px)
- md (medium definition): Web display (~1280x720px)
- hd (high definition): Full screen (~1920x1080px)
- xd (extra definition): Original/4K+ (~3840x2160px+)
File Descriptor Encoding
A file descriptor encodes all available variants in a compact format.
File Descriptor Format Specification
Format
d2,{class}.{variant}:{blob_id}:f={format}:s={size}:r={width}x{height}[:{optional}];...
Components
- d2,: Descriptor prefix (version 2)
- {class}: Media class
  - vis: Visual (images: jpeg, png, webp, avif)
  - vid: Video (mp4/h264)
  - aud: Audio (opus, mp3)
  - doc: Documents (pdf)
  - raw: Original unprocessed file
- {variant}: Quality tier: pf, tn, sd, md, hd, xd, or orig
- {blob_id}: Content-addressed ID of the blob (b1~...)
- f={format}: Format: avif, webp, jpeg, png, mp4, opus, pdf
- s={size}: File size in bytes (integer, no separators)
- r={width}x{height}: Resolution in pixels (width × height)
- ;: Semicolon separator between variants (no spaces)
Optional Fields
For video, audio, and document files:
- dur={seconds}: Duration in seconds (floating point, video/audio only)
- br={kbps}: Bitrate in kbps (integer, video/audio only)
- pg={count}: Page count (integer, documents only)
Example
d2,vis.tn:b1~abc123:f=avif:s=4096:r=150x150;vis.sd:b1~def456:f=avif:s=32768:r=640x480
This descriptor encodes two variants:
- Thumbnail: AVIF format, 4096 bytes, 150×150 pixels, blob ID b1~abc123
- Standard: AVIF format, 32768 bytes, 640×480 pixels, blob ID b1~def456
Video Example
d2,vis.tn:b1~abc:f=avif:s=4096:r=150x84;vid.sd:b1~def:f=mp4:s=5242880:r=720x404:dur=120.5:br=350;vid.hd:b1~ghi:f=mp4:s=20971520:r=1920x1080:dur=120.5:br=1400
This descriptor includes:
- Thumbnail: AVIF image preview
- SD Video: 720×404 MP4, 120.5 seconds, 350 kbps
- HD Video: 1080p MP4, 120.5 seconds, 1400 kbps
Parsing Rules
- Check prefix: Verify descriptor starts with d2,
- Split by semicolon (;): Get individual variant entries
- For each variant, split by colon (:) to get components:
  - Component [0] = class.variant (vis.tn, vis.sd, vid.hd)
  - Component [1] = blob_id (b1~...)
  - Components [2..] = key=value pairs
- Parse key=value pairs:
  - f={format} → Format string
  - s={size} → Parse as u64 (bytes)
  - r={width}x{height} → Split by x, parse as u32 × u32
  - dur={seconds} → Parse as f64 (optional)
  - br={kbps} → Parse as u32 (optional)
  - pg={count} → Parse as u32 (optional)
Parsing logic: split by semicolons for variants, then by colons for fields, then parse key=value pairs.
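The parsing rules above can be sketched in Python as follows. This is a minimal illustration rather than the project's parser: the `Variant` dataclass and `parse_descriptor` function are hypothetical names, unknown keys are silently ignored, and malformed entries are not handled beyond the prefix check.

```python
from dataclasses import dataclass, field


@dataclass
class Variant:
    media_class: str   # "vis", "vid", "aud", "doc", "raw"
    tier: str          # "tn", "sd", "md", "hd", "xd", ...
    blob_id: str       # "b1~..."
    format: str = ""
    size: int = 0      # bytes
    width: int = 0
    height: int = 0
    extra: dict = field(default_factory=dict)  # dur, br, pg when present


def parse_descriptor(descriptor: str) -> list[Variant]:
    if not descriptor.startswith("d2,"):
        raise ValueError("descriptor must start with 'd2,'")
    variants = []
    for entry in descriptor[3:].split(";"):        # one entry per variant
        parts = entry.split(":")
        media_class, tier = parts[0].split(".", 1)  # e.g. "vis.tn"
        variant = Variant(media_class, tier, blob_id=parts[1])
        for pair in parts[2:]:                      # key=value fields
            key, _, value = pair.partition("=")
            if key == "f":
                variant.format = value
            elif key == "s":
                variant.size = int(value)
            elif key == "r":
                width, height = value.split("x")
                variant.width, variant.height = int(width), int(height)
            elif key == "dur":
                variant.extra["dur"] = float(value)
            elif key in ("br", "pg"):
                variant.extra[key] = int(value)
        variants.append(variant)
    return variants


parsed = parse_descriptor(
    "d2,vis.tn:b1~abc123:f=avif:s=4096:r=150x150;"
    "vis.sd:b1~def456:f=avif:s=32768:r=640x480"
)
print([(v.tier, v.width, v.height) for v in parsed])
# prints: [('tn', 150, 150), ('sd', 640, 480)]
```

Splitting on `;` and `:` is safe here because base64url blob IDs never contain either character.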
Variant Size Classes - Exact Specifications
Cloudillo generates image variants at specific size targets to optimize bandwidth and storage:
| Class | Name | Target Resolution | Max Dimension | Use Case |
|---|---|---|---|---|
| tn | Thumbnail | ~150×150px | 200px | List views, previews, avatars |
| sd | Standard Definition | ~640×480px | 800px | Mobile devices, low bandwidth |
| md | Medium Definition | ~1920×1080px | 2000px | Desktop viewing, full screen |
| hd | High Definition | ~3840×2160px | 4000px | 4K displays, high quality |
| xd | Extra Definition | Original size | No limit | Archival, original quality |
Generation Rules
Variants are generated based on the maximum dimension (the larger of width and height):
- max_dim ≥ 3840px: tn, sd, md, hd, xd (all variants)
- max_dim ≥ 1920px: tn, sd, md, hd
- max_dim ≥ 1280px: tn, sd, md
- max_dim < 1280px: tn, sd
Properties:
- Each variant maintains the original aspect ratio
- Uses Lanczos3 filter for high-quality downscaling
- Maximum dimension constraint prevents oversizing
- Smaller originals don’t get upscaled
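The aspect-ratio and no-upscaling properties can be illustrated with a small Python sketch that computes target dimensions for a given maximum dimension. `fit_within` is a hypothetical helper, and rounding to the nearest pixel is an assumption; the actual resizer may round differently. The Lanczos3 resampling itself is not shown.

```python
def fit_within(width: int, height: int, max_dim: int) -> tuple[int, int]:
    """Scale (width, height) so the longer side is at most max_dim, keeping aspect ratio."""
    longest = max(width, height)
    if longest <= max_dim:
        return width, height  # smaller originals are never upscaled

    def scale(value: int) -> int:
        # Integer arithmetic with round-to-nearest (assumed rounding mode).
        return max(1, (value * max_dim + longest // 2) // longest)

    return scale(width), scale(height)


print(fit_within(3024, 4032, 800))  # prints: (600, 800)
print(fit_within(3024, 4032, 200))  # prints: (150, 200)
print(fit_within(640, 480, 800))    # prints: (640, 480)
```

The 200px and 800px limits used here are the tn and sd maximum dimensions from the table above.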
Variant Selection
Clients request a specific variant:
GET /api/files/f1~Qo2E3G8TJZ...?variant=hd
Response: Returns the HD variant if available; otherwise falls back to smaller variants.
Automatic Fallback
If the requested variant doesn’t exist, the server returns the best available:
- Try the requested variant (e.g., hd)
- Fall back to the next smaller variant (e.g., md)
- Continue until a variant is found
- If no smaller variant exists, return the smallest available one
Fallback order: xd → hd → md → sd → tn
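A minimal Python sketch of this fallback, assuming the set of available variant names is already known. `select_variant` is a hypothetical name, and the final step reflects one reading of the last rule: when nothing at or below the requested size exists, the smallest available variant is returned.

```python
from typing import Optional

FALLBACK_ORDER = ["xd", "hd", "md", "sd", "tn"]  # largest to smallest


def select_variant(available: set, requested: str) -> Optional[str]:
    start = FALLBACK_ORDER.index(requested)  # raises ValueError for unknown names
    # Try the requested variant, then each smaller one in turn.
    for name in FALLBACK_ORDER[start:]:
        if name in available:
            return name
    # Nothing at or below the requested size: take the smallest larger variant.
    for name in reversed(FALLBACK_ORDER[:start]):
        if name in available:
            return name
    return None


print(select_variant({"tn", "sd", "md"}, "hd"))  # prints: md
print(select_variant({"md", "hd"}, "tn"))        # prints: md
```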
Content-Addressing Flow
File storage uses a three-level content-addressing hierarchy:
Level 1: Blob Storage
Upload image → Save as blob → Compute SHA256 of blob bytes → Store blob with ID: b1~{hash}
blob_data = read_file("thumbnail.avif")
blob_id = compute_hash("b", blob_data)
// Result: "b1~abc123..." (thumbnail blob ID)
Example: b1~abc123... identifies the thumbnail AVIF blob
See Content-Addressing & Merkle Trees for hash computation details.
Level 2: Variant Collection
Generate all variants (tn, sd, md, hd) → Each variant gets its own blob ID (b1~...) → Collect all variant metadata → Create descriptor string encoding all variants
variants = [
{ class: "vis.tn", blob_id: "b1~abc123", format: "avif", size: 4096, width: 150, height: 150 },
{ class: "vis.sd", blob_id: "b1~def456", format: "avif", size: 32768, width: 640, height: 480 },
{ class: "vis.md", blob_id: "b1~ghi789", format: "avif", size: 262144, width: 1920, height: 1080 },
]
descriptor = build_descriptor(variants)
// Result: "d2,vis.tn:b1~abc123:f=avif:s=4096:r=150x150;vis.sd:b1~def456:f=avif:s=32768:r=640x480;vis.md:b1~ghi789:f=avif:s=262144:r=1920x1080"
Level 3: File Descriptor
Build descriptor → Compute SHA256 of descriptor string → Final file ID: f1~{hash} → This file ID goes into action attachments
descriptor = "d2,vis.tn:b1~abc:f=avif:s=4096:r=150x150;vis.sd:b1~def:f=avif:s=32768:r=640x480"
file_id = compute_hash("f", descriptor.as_bytes())
// Result: "f1~Qo2E3G8TJZ..." (file ID)
Example Complete Flow
1. User uploads photo.jpg (3MB, 3024x4032px)
2. System generates variants:
vis.tn: 150x200px → 4KB → b1~abc123
vis.sd: 600x800px → 32KB → b1~def456
vis.md: 1440x1920px → 256KB → b1~ghi789
vis.hd: 2880x3840px → 1MB → b1~jkl012
3. System builds descriptor:
"d2,vis.tn:b1~abc123:f=avif:s=4096:r=150x200;
vis.sd:b1~def456:f=avif:s=32768:r=600x800;
vis.md:b1~ghi789:f=avif:s=262144:r=1440x1920;
vis.hd:b1~jkl012:f=avif:s=1048576:r=2880x3840"
4. System hashes descriptor:
file_id = f1~Qo2E3G8TJZ2... = SHA256(descriptor)
5. Action references file:
POST action attachments = ["f1~Qo2E3G8TJZ2..."]
6. Anyone can verify:
- Download all variants
- Verify each blob_id = SHA256(blob)
- Rebuild descriptor
- Verify file_id = SHA256(descriptor)
- Cryptographic proof established ✓
Integration with Action Merkle Tree
File attachments create an extended merkle tree:
Action (a1~8kR...)
├─ Signed by user (ES384)
├─ Content-addressed (SHA256 of JWT)
└─ Attachments: [f1~Qo2...]
   └─ File (f1~Qo2...)
      ├─ Content-addressed (SHA256 of descriptor)
      └─ Descriptor: "d2,vis.tn:b1~abc...;vis.sd:b1~def..."
         ├─ Blob vis.tn (b1~abc...)
         │  └─ Content-addressed (SHA256 of blob)
         ├─ Blob vis.sd (b1~def...)
         │  └─ Content-addressed (SHA256 of blob)
         └─ Blob vis.md (b1~ghi...)
            └─ Content-addressed (SHA256 of blob)
Benefits:
- Entire tree is cryptographically verifiable
- Cannot modify image without changing all parent hashes
- Deduplication: same image = same file_id
- Federation: remote instances can verify integrity
See Content-Addressing & Merkle Trees for how file content-addressing integrates with the action system.
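The verification steps from the complete flow (check each blob ID, rebuild the descriptor, check the file ID) can be sketched in Python as shown here. The helper names `content_id`, `build_descriptor`, and `verify_file` are hypothetical, the hash input is assumed to be the raw bytes with no extra framing, and only the required descriptor fields are encoded.

```python
import base64
import hashlib


def content_id(prefix: str, data: bytes) -> str:
    """Version-1 ID: unpadded base64url SHA-256 of the raw bytes (assumed input)."""
    digest = hashlib.sha256(data).digest()
    return f"{prefix}1~" + base64.urlsafe_b64encode(digest).rstrip(b"=").decode("ascii")


def build_descriptor(variants: list) -> str:
    """Encode variants as "d2,{class}:{blob_id}:f=..:s=..:r=WxH;..." (required fields only)."""
    entries = [
        f"{v['class']}:{v['blob_id']}:f={v['format']}:s={v['size']}:r={v['width']}x{v['height']}"
        for v in variants
    ]
    return "d2," + ";".join(entries)


def verify_file(file_id: str, variants: list, blobs: dict) -> bool:
    # Step 1: every blob must hash to the ID the descriptor claims for it.
    for v in variants:
        if content_id("b", blobs[v["blob_id"]]) != v["blob_id"]:
            return False
    # Step 2: the rebuilt descriptor must hash to the file ID.
    descriptor = build_descriptor(variants)
    return content_id("f", descriptor.encode("utf-8")) == file_id


thumb = b"stand-in thumbnail bytes"
variants = [{"class": "vis.tn", "blob_id": content_id("b", thumb), "format": "avif",
             "size": len(thumb), "width": 150, "height": 150}]
blobs = {variants[0]["blob_id"]: thumb}
file_id = content_id("f", build_descriptor(variants).encode("utf-8"))

print(verify_file(file_id, variants, blobs))   # prints: True
blobs[variants[0]["blob_id"]] = b"tampered"
print(verify_file(file_id, variants, blobs))   # prints: False
```

Because the file ID depends on the descriptor and the descriptor embeds every blob ID, changing any blob invalidates the chain from that blob up to the action.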
Image Processing Pipeline
Upload Flow
When a client uploads an image:
- Client Request
POST /api/files/image/profile-picture.jpg
Authorization: Bearer <access_token>
Content-Type: image/jpeg
Content-Length: 2458624
<binary image data>
- Dimension Extraction
Extract image dimensions to determine which variants to generate:
img = load_image_from_memory(data)
(width, height) = img.dimensions()
max_dim = max(width, height)
if max_dim >= 3840:
    variants = ["tn", "sd", "md", "hd", "xd"]
elif max_dim >= 1920:
    variants = ["tn", "sd", "md", "hd"]
elif max_dim >= 1280:
    variants = ["tn", "sd", "md"]
else:
    variants = ["tn", "sd"]
- FileIdGeneratorTask
Create a task to generate the content-addressed ID:
task = FileIdGeneratorTask(
tn_id,
temp_file_path="/tmp/upload-abc123",
original_filename="profile-picture.jpg"
)
file_id_task_id = scheduler.schedule(task)  # Referenced as a dependency by the resize tasks
- ImageResizerTask (Multiple)
For each variant, create a resize task:
for variant in variants:
    task = ImageResizerTask(
        tn_id,
        source_file_id=original_id,
        variant=variant,
        target_dimensions=get_variant_dimensions(variant),
        format="avif",  # Primary format
        quality=85,
        dependencies=[file_id_task_id]  # Wait for ID generation
    )
    scheduler.schedule(task)
- Hash Computation
FileIdGeneratorTask computes SHA256 hash:
file_id = compute_content_hash("f", file_contents)
# See merkle-tree.md for hash computation details
- Blob Storage
Store the original in BlobAdapter:
blob_adapter.create_blob_stream(tn_id, file_id, file_stream)
- Variant Generation
Each ImageResizerTask runs in the worker pool (CPU-intensive):
# Execute in worker pool
img = load_image(source_path)
# Resize with Lanczos3 filter (high quality)
resized = img.resize(target_width, target_height, filter=Lanczos3)
# Encode to AVIF
buffer = encode_avif(resized, quality)
# Store variant
variant_id = compute_file_id(buffer)
blob_adapter.create_blob(tn_id, variant_id, buffer)
- Metadata Storage
Store file metadata with all variants:
file_metadata = FileMetadata(
tn_id,
file_id=descriptor_id,
original_filename="profile-picture.jpg",
mime_type="image/jpeg",
size=original_size,
variants=[
Variant(name="tn", blob_id="b1~QoE...46w", format="avif",
size=4096, width=200, height=200),
Variant(name="sd", blob_id="b1~xyz...789", format="webp",
size=32768, width=640, height=480),
# ... more variants
],
created_at=current_timestamp()
)
meta_adapter.create_file_metadata(tn_id, file_metadata)
- Response
Return descriptor ID to client:
{
"file_id": "f1~QoE...46w",
"descriptor": "d2,vis.tn:b1~QoE...46w:f=avif:s=4096:r=128x96;vis.sd:b1~xyz...789:f=avif:s=8192:r=640x480",
"variants": [
{"name": "vis.tn", "format": "avif", "size": 4096, "dimensions": "128x96"},
{"name": "vis.sd", "format": "avif", "size": 8192, "dimensions": "640x480"}
],
"processing": true
}
Complete Upload Flow Diagram
Client uploads image
↓
POST /api/files/image/filename.jpg
↓
Save to temp file
↓
Extract dimensions
↓
Determine variants to generate
↓
Create FileIdGeneratorTask
├─ Compute SHA256 hash
├─ Move to permanent storage (BlobAdapter)
└─ Generate file_id
↓
Create ImageResizerTask (for each variant)
├─ Depends on FileIdGeneratorTask
├─ Load source image
├─ Resize with Lanczos3
├─ Encode to AVIF/WebP/JPEG
├─ Compute variant ID (SHA256)
└─ Store in BlobAdapter
↓
Create file descriptor
├─ Collect all variant IDs
├─ Encode as descriptor
└─ Store metadata in MetaAdapter
↓
Return descriptor ID to client
Download Flow
Client Request
GET /api/files/f1~...?variant=hd
Authorization: Bearer <access_token>
Server Processing
- Parse Descriptor
variants = parse_file_descriptor(file_id)
# Returns list of VariantInfo
- Select Best Variant
selected = select_best_variant(
variants,
requested_variant, # "hd"
)
# Falls back if exact match not available:
# hd/avif → hd/webp → md/avif → md/webp → sd/avif → ...
- Stream from BlobAdapter
stream = blob_adapter.read_blob_stream(tn_id, selected.file_id)
# Set response headers
response.headers["Content-Type"] = f"image/{selected.format}"
response.headers["X-Cloudillo-Variant"] = selected.blob_id
response.headers["X-Cloudillo-Descriptor"] = descriptor
response.headers["Content-Length"] = selected.size
# Stream response
return stream_response(stream)
Response
HTTP/1.1 200 OK
Content-Type: image/avif
Content-Length: 16384
X-Cloudillo-Variant: b1~m8Z35EIa3prvb3bhjsVjdg9SG98xd0bkoWomOHQAwCM
X-Cloudillo-Descriptor: d2,vis.tn:b1~xRAVuQtgBx_kLqZnoOSd5XqCK_aQolhq1XeXk73Zn8U:f=avif:s=1960:r=90x128;vis.sd:b1~m8Z35EIa3prvb3bhjsVjdg9SG98xd0bkoWomOHQAwCM:f=avif:s=8137:r=256x364;vis.orig:b1~5gU72rRGiaogZuYhJy853pBd6PsqjPOjS__Kim9-qE0:f=avif:s=15012:r=256x364
Cache-Control: public, max-age=31536000, immutable
<binary image data>
Note: Content-addressed files are immutable, so they can be cached forever.
Metadata Structure
FileMetadata
Stored in MetaAdapter:
FileMetadata {
tn_id: TnId
file_id: String # Descriptor ID
original_filename: String
mime_type: String
size: u64 # Original size
width: Optional[u32]
height: Optional[u32]
variants: List[VariantInfo]
created_at: i64
owner: String # Identity tag
permissions: FilePermissions
}
VariantInfo {
name: String # "tn", "sd", "md", "hd", "xd"
file_id: String # Content-addressed ID
format: String # "avif", "webp", "jpeg", "png"
size: u64 # Bytes
width: u32
height: u32
}
FilePermissions {
public_read: bool
shared_with: List[String] # Identity tags
}
File Presets
Concept
Presets define how files should be processed:
FilePreset:
Image # Auto-generate variants
Video # Future: transcode, thumbnails
Document # Future: preview generation
Database # RTDB database files
Raw # No processing, store as-is
Upload with Preset
POST /api/files/{preset}/{filename}
Examples:
POST /api/files/image/avatar.jpg // Generate image variants
POST /api/files/raw/document.pdf // Store as-is
Storage Organization
BlobAdapter Layout
{data_dir}/
├── blobs/
│ ├── {tn_id}/
│ │ ├── f1~QoE...46w // Original file
│ │ ├── f1~xyz...789 // Variant 1
│ │ ├── f1~abc...123 // Variant 2
│ │ └── ...
│ └── {other_tn_id}/
│ └── ...
MetaAdapter (SQLite)
CREATE TABLE files (
id INTEGER PRIMARY KEY,
tn_id INTEGER NOT NULL,
file_id TEXT NOT NULL,
original_filename TEXT,
mime_type TEXT,
size INTEGER,
width INTEGER,
height INTEGER,
variants TEXT, -- JSON array
created_at INTEGER,
owner TEXT,
permissions TEXT, -- JSON object
UNIQUE(tn_id, file_id)
);
CREATE INDEX idx_files_owner ON files(owner);
CREATE INDEX idx_files_created ON files(created_at);
Performance Considerations
Worker Pool Usage
Image processing is CPU-intensive, so it runs in a worker pool:
# Priority levels
Priority.High → User-facing operations (thumbnail)
Priority.Medium → Background tasks (other image variants)
Priority.Low → Longer operations (video upload)
Parallel Processing
Multiple variants can be generated in parallel:
# Create all resize tasks at once
task_ids = []
for variant in ["tn", "sd", "md", "hd"]:
    task_id = scheduler.schedule(ImageResizerTask(
        variant=variant,
        # ...
    ))
    task_ids.append(task_id)
# Wait for all to complete
scheduler.wait_all(task_ids)
Caching Strategy
Content-addressed files are immutable:
Cache-Control: public, max-age=31536000, immutable
- Browsers cache forever
- CDN can cache forever
- No cache invalidation needed
See Also
- System Architecture - Task system and worker pool
- Actions - File attachments in action tokens
- Access Control - File permission checking