Blob Storage
Cloudillo’s blob storage uses content-addressed storage for immutable binary data (files, images, videos) and automatically generates size variants for images. Content addressing provides data integrity and deduplication; variants allow efficient delivery across different use cases.
Content-Addressed Storage
File Identifier Format
Cloudillo uses multiple identifier types in its content-addressing system:
{prefix}{version}~{base64url_hash}
Components:
- {prefix}: Resource type indicator (a, f, b, d)
- {version}: Hash algorithm version (currently 1 = SHA-256)
- ~: Separator
- {base64url_hash}: Base64url-encoded hash (43 characters, no padding)
Identifier Types
| Prefix | Resource Type | Hash Input | Example |
|---|---|---|---|
| b1~ | Blob | Blob bytes (raw image/video data) | b1~abc123def456... |
| f1~ | File | File descriptor string | f1~QoEYeG8TJZ2HTGh... |
| d2, | Descriptor | (not a hash, the encoded format itself) | d2,vis.tn:b1~abc:f=avif:... |
| a1~ | Action | Complete JWT token | a1~8kR3mN9pQ2vL... |
Important: d2, is not a content-addressed identifier—it’s the actual encoded descriptor string. The file ID (f1~) is the hash of this descriptor.
Examples
Blob ID: b1~QoEYeG8TJZ2HTGhVlrtTDBpvBGOp6gfGhq4QmD6Z46w
File ID: f1~m8Z35EIa3prvb3bhjsVjdg9SG98xd0bkoWomOHQAwCM
Descriptor: d2,vis.tn:b1~xRAVuQtgBx_kLqZnoOSd5XqCK_aQolhq1XeXk73Zn8U:f=avif:s=1960:r=90x128
Action ID: a1~8kR3mN9pQ2vL6xWpYzT4BjN5FqGxCmK9RsH2VwLnD8P
All file and blob IDs use SHA-256 content-addressing. See Content-Addressing & Merkle Trees for hash computation details.
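As an illustration of the identifier format above, the following is a minimal sketch of a parser for the hash-based IDs (`a`, `f`, `b` prefixes). The names `ContentId` and `parse_content_id` are hypothetical and not part of Cloudillo's API. The `d2,` descriptor is deliberately rejected because it is not a hash-based identifier.

```python
import re
from typing import NamedTuple

# One-letter prefix, numeric version, "~", then 43 base64url characters
# (a 32-byte SHA-256 digest encoded without padding).
_ID_PATTERN = re.compile(r"([afb])([0-9]+)~([A-Za-z0-9_-]{43})")


class ContentId(NamedTuple):
    prefix: str   # "a" (action), "f" (file) or "b" (blob)
    version: int  # hash algorithm version; 1 means SHA-256
    digest: str   # base64url text of the hash, no padding


def parse_content_id(text: str) -> ContentId:
    """Split a hash-based ID into its parts, or raise ValueError."""
    match = _ID_PATTERN.fullmatch(text)
    if match is None:
        raise ValueError(f"not a content-addressed ID: {text!r}")
    return ContentId(match.group(1), int(match.group(2)), match.group(3))


if __name__ == "__main__":
    print(parse_content_id("b1~QoEYeG8TJZ2HTGhVlrtTDBpvBGOp6gfGhq4QmD6Z46w"))
```

A strict length check like this catches truncated IDs early, before any storage lookup is attempted.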
File Types
Cloudillo supports four file types, each handled by different adapters based on mutability and use case:
| Type | Adapter | Mutability | Description |
|---|---|---|---|
| BLOB | BlobAdapter | Immutable | Binary content (images, videos, documents) |
| CRDT | CrdtAdapter | Mutable | Collaborative documents (Yjs-based real-time editing) |
| RTDB | RtdbAdapter | Mutable | Real-time database files for app state |
| FLDR | MetaAdapter | Mutable | Folder/directory metadata |
File Type Selection
Files are created via different API endpoints based on their type:
| Endpoint | File Type | Use Case |
|---|---|---|
| POST /api/files/{preset}/{file_name} | BLOB | File uploads with preset-based variant generation |
| POST /api/files | CRDT/RTDB/FLDR | Metadata-only creation for mutable file types |
Available presets for BLOB uploads: default, profile-picture, cover, high_quality, mobile, archive, podcast, video, orig-only, thumbnail-only, apkg
Per-File Access Control
Each file has independent access control:
| Field | Values | Description |
|---|---|---|
| Visibility | P/V/F/C/null | Who can discover this file |
| Access Level | R/W | Read-only vs read-write access |
See Access Control for detailed permission handling.
File Variants
Concept
A single uploaded image automatically generates multiple variants optimized for different use cases:
- pf (profile): Profile picture icon (~80px)
- tn (thumbnail): Small preview (~256px)
- sd (standard definition): Mobile/low bandwidth (~720px)
- md (medium definition): Desktop viewing (~1280px)
- hd (high definition): High quality display (~1920px)
- xd (extra definition): 4K/maximum quality (~3840px)
File Descriptor Encoding
A file descriptor encodes all available variants in a compact format.
File Descriptor Format Specification
Format
d2,{class}.{variant}:{blob_id}:f={format}:s={size}:r={width}x{height}[:{optional}];...
Components
- d2, - Descriptor prefix (version 2)
- {class} - Media class:
  - vis - Visual (images: jpeg, png, webp, avif)
  - vid - Video (mp4/h264)
  - aud - Audio (opus, mp3)
  - doc - Documents (pdf)
  - raw - Original unprocessed file
- {variant} - Quality tier: pf, tn, sd, md, hd, xd, or orig
- {blob_id} - Content-addressed ID of the blob (b1~...)
- f={format} - Format: avif, webp, jpeg, png, mp4, opus, pdf
- s={size} - File size in bytes (integer, no separators)
- r={width}x{height} - Resolution in pixels (width × height)
- ; - Semicolon separator between variants (no spaces)
Optional Fields
For video, audio, and document files:
- dur={seconds} - Duration in seconds (floating point, video/audio only)
- br={kbps} - Bitrate in kbps (integer, video/audio only)
- pg={count} - Page count (integer, documents only)
Example
d2,vis.tn:b1~abc123:f=avif:s=4096:r=150x150;vis.sd:b1~def456:f=avif:s=32768:r=640x480
This descriptor encodes two variants:
- Thumbnail: AVIF format, 4096 bytes, 150×150 pixels, blob ID b1~abc123
- Standard: AVIF format, 32768 bytes, 640×480 pixels, blob ID b1~def456
Video Example
d2,vis.tn:b1~abc:f=avif:s=4096:r=150x84;vid.sd:b1~def:f=mp4:s=5242880:r=720x404:dur=120.5:br=350;vid.hd:b1~ghi:f=mp4:s=20971520:r=1920x1080:dur=120.5:br=1400
This descriptor includes:
- Thumbnail: AVIF image preview
- SD Video: 720×404 MP4, 120.5 seconds, 350 kbps
- HD Video: 1080p MP4, 120.5 seconds, 1400 kbps
Parsing Rules
- Check prefix: Verify descriptor starts with d2,
- Split by semicolon (;): Get individual variant entries
- For each variant, split by colon (:) to get components:
  - Component [0] = class.variant (vis.tn, vis.sd, vid.hd)
  - Component [1] = blob_id (b1~...)
  - Components [2..] = key=value pairs
- Parse key=value pairs:
  - f={format} → Format string
  - s={size} → Parse as u64 (bytes)
  - r={width}x{height} → Split by x, parse as u32 × u32
  - dur={seconds} → Parse as f64 (optional)
  - br={kbps} → Parse as u32 (optional)
  - pg={count} → Parse as u32 (optional)
Parsing logic: split by semicolons for variants, then by colons for fields, then parse key=value pairs.
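The parsing rules above can be sketched as a short function. This is an illustrative Python version, not Cloudillo's implementation; the `Variant` dataclass and `parse_descriptor` name are hypothetical, and silently ignoring unknown keys is an assumption made here for forward compatibility.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Variant:
    media_class: str                  # "vis", "vid", "aud", "doc" or "raw"
    variant: str                      # "tn", "sd", "hd", "orig", ...
    blob_id: str                      # "b1~..."
    format: Optional[str] = None
    size: Optional[int] = None        # bytes
    width: Optional[int] = None
    height: Optional[int] = None
    duration: Optional[float] = None  # seconds (video/audio only)
    bitrate: Optional[int] = None     # kbps (video/audio only)
    pages: Optional[int] = None       # documents only


def parse_descriptor(descriptor: str) -> List[Variant]:
    """Parse a d2 descriptor string into a list of variants."""
    if not descriptor.startswith("d2,"):
        raise ValueError("descriptor must start with 'd2,'")
    variants = []
    for entry in descriptor[3:].split(";"):
        parts = entry.split(":")
        if len(parts) < 2 or "." not in parts[0]:
            raise ValueError(f"malformed variant entry: {entry!r}")
        media_class, _, name = parts[0].partition(".")
        item = Variant(media_class, name, parts[1])
        for field in parts[2:]:
            key, _, value = field.partition("=")
            if key == "f":
                item.format = value
            elif key == "s":
                item.size = int(value)
            elif key == "r":
                w, _, h = value.partition("x")
                item.width, item.height = int(w), int(h)
            elif key == "dur":
                item.duration = float(value)
            elif key == "br":
                item.bitrate = int(value)
            elif key == "pg":
                item.pages = int(value)
            # Unknown keys are ignored (assumption, see lead-in).
        variants.append(item)
    return variants
```

Because blob IDs use `~` rather than `:` as their internal separator, splitting each entry on colons is unambiguous.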
Variant Size Classes - Exact Specifications
Cloudillo generates image variants at specific size targets to optimize bandwidth and storage:
| Quality | Code | Max Dimension | Use Case |
|---|---|---|---|
| Profile | pf | 80px | Profile picture icons |
| Thumbnail | tn | 256px | List views, previews, avatars |
| Standard | sd | 720px | Mobile devices, low bandwidth |
| Medium | md | 1280px | Desktop viewing |
| High | hd | 1920px | High quality display |
| Extra | xd | 3840px | 4K displays, maximum quality |
| Original | orig | - | Unprocessed source file |
Generation Rules
Which variants are generated depends on the preset configuration. The default preset generates: tn, sd, md, hd. The high_quality preset adds xd. Variants larger than the original image are automatically skipped (smaller originals are never upscaled).
Properties:
- Each variant maintains the original aspect ratio
- Uses Lanczos3 filter for high-quality downscaling
- Maximum dimension constraint prevents oversizing
- Smaller originals don’t get upscaled
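To make the sizing rules concrete, here is a minimal sketch that computes target dimensions for each variant. The function names, the rounding choice, and the `tolerance` parameter are assumptions for illustration; only the maximum dimensions come from the table above.

```python
from typing import Dict, Iterable, Tuple

# Maximum dimension (longest side, in pixels) per variant, from the table above.
MAX_DIMENSION = {"pf": 80, "tn": 256, "sd": 720, "md": 1280, "hd": 1920, "xd": 3840}


def target_dimensions(width: int, height: int, max_dim: int) -> Tuple[int, int]:
    """Scale (width, height) so the longest side is at most max_dim.

    The aspect ratio is preserved and smaller originals are never upscaled.
    """
    longest = max(width, height)
    if longest <= max_dim:
        return width, height

    def scale(value: int) -> int:
        # Integer arithmetic with round-half-up; never collapse to 0 pixels.
        return max(1, (value * max_dim + longest // 2) // longest)

    return scale(width), scale(height)


def plan_variants(width: int, height: int, names: Iterable[str],
                  tolerance: float = 1.0) -> Dict[str, Tuple[int, int]]:
    """Return target sizes for the variants that should be generated.

    A variant is skipped when its maximum dimension exceeds the original's
    longest side times `tolerance` (hypothetical knob, strict by default).
    """
    longest = max(width, height)
    plan = {}
    for name in names:
        max_dim = MAX_DIMENSION[name]
        if max_dim <= longest * tolerance:
            plan[name] = target_dimensions(width, height, max_dim)
    return plan
```

For a 3024×4032 original and the default preset (tn, sd, md, hd), this yields 192×256, 540×720, 960×1280 and 1440×1920.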
Variant Selection
Clients request a specific variant:
GET /api/files/f1~Qo2E3G8TJZ...?variant=hd
Response: Returns HD variant if available, otherwise falls back to smaller variants.
Automatic Fallback
If the requested variant doesn’t exist, the server returns the best available:
- Try requested variant (e.g., hd)
- Fall back to next smaller (e.g., md)
- Continue until variant found
- Return smallest if none larger
Fallback order: xd → hd → md → sd → tn
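The fallback steps can be sketched as follows. This is one possible reading, not the server's actual code: the last step is interpreted here as "if nothing at or below the requested tier exists, return the smallest variant that is available". The function name `select_variant` is hypothetical.

```python
from typing import Iterable, Optional

# Fallback order from largest to smallest, as listed above.
TIERS = ["xd", "hd", "md", "sd", "tn"]


def select_variant(available: Iterable[str], requested: str) -> Optional[str]:
    """Pick the best available tier for a request, or None if none exist."""
    if requested not in TIERS:
        raise ValueError(f"unknown variant tier: {requested!r}")
    have = set(available)
    start = TIERS.index(requested)
    # Try the requested tier first, then each smaller tier in order.
    for tier in TIERS[start:]:
        if tier in have:
            return tier
    # Nothing at or below the requested tier: return the smallest one
    # that does exist (assumed interpretation, see lead-in).
    for tier in reversed(TIERS[:start]):
        if tier in have:
            return tier
    return None
```

For example, requesting `hd` when only `tn`, `sd` and `md` exist selects `md`.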
Content-Addressing Flow
File storage uses a three-level content-addressing hierarchy:
Level 1: Blob Storage
Upload image → Save as blob → Compute SHA256 of blob bytes → Store blob with ID: b1~{hash}
blob_data = read_file("thumbnail.avif")
blob_id = compute_hash("b", blob_data)
// Result: "b1~abc123..." (thumbnail blob ID)
Example: b1~abc123... identifies the thumbnail AVIF blob
See Content-Addressing & Merkle Trees for hash computation details.
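For orientation, a `compute_hash`-style helper might look like the sketch below. It assumes the ID is the plain SHA-256 of the input bytes, base64url-encoded without padding, with version `1` hard-coded. Whether the real computation mixes in anything else is defined in Content-Addressing & Merkle Trees, so treat this as an approximation. The name `compute_id` is hypothetical.

```python
import base64
import hashlib


def compute_id(prefix: str, data: bytes) -> str:
    """Build an ID of the form {prefix}1~{base64url(sha256(data))}.

    Assumption: the hash covers only the raw input bytes.
    """
    digest = hashlib.sha256(data).digest()
    # 32 digest bytes encode to 44 base64 characters including one "=" pad;
    # stripping the padding leaves the 43 characters described earlier.
    encoded = base64.urlsafe_b64encode(digest).rstrip(b"=").decode("ascii")
    return f"{prefix}1~{encoded}"


if __name__ == "__main__":
    print(compute_id("b", b"example blob bytes"))
```

Identical bytes always produce the same ID, which is what makes deduplication possible: storing the same blob twice resolves to one entry.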
Level 2: Variant Collection
Generate all variants (tn, sd, md, hd) → Each variant gets its own blob ID (b1~...) → Collect all variant metadata → Create descriptor string encoding all variants
variants = [
{ class: "vis.tn", blob_id: "b1~abc123", format: "avif", size: 4096, width: 150, height: 150 },
{ class: "vis.sd", blob_id: "b1~def456", format: "avif", size: 32768, width: 640, height: 480 },
{ class: "vis.md", blob_id: "b1~ghi789", format: "avif", size: 262144, width: 1920, height: 1080 },
]
descriptor = build_descriptor(variants)
// Result: "d2,vis.tn:b1~abc123:f=avif:s=4096:r=150x150;vis.sd:b1~def456:f=avif:s=32768:r=640x480;vis.md:b1~ghi789:f=avif:s=262144:r=1920x1080"
Level 3: File Descriptor
Build descriptor → Compute SHA256 of descriptor string → Final file ID: f1~{hash} → This file ID goes into action attachments
descriptor = "d2,vis.tn:b1~abc:f=avif:s=4096:r=150x150;vis.sd:b1~def:f=avif:s=32768:r=640x480"
file_id = compute_hash("f", descriptor.as_bytes())
// Result: "f1~Qo2E3G8TJZ..." (file ID)
Example Complete Flow
1. User uploads photo.jpg (3MB, 3024x4032px)
2. System generates variants:
vis.tn: 150x200px → 4KB → b1~abc123
vis.sd: 600x800px → 32KB → b1~def456
vis.md: 1440x1920px → 256KB → b1~ghi789
vis.hd: 2880x3840px → 1MB → b1~jkl012
3. System builds descriptor:
"d2,vis.tn:b1~abc123:f=avif:s=4096:r=150x200;
vis.sd:b1~def456:f=avif:s=32768:r=600x800;
vis.md:b1~ghi789:f=avif:s=262144:r=1440x1920;
vis.hd:b1~jkl012:f=avif:s=1048576:r=2880x3840"
4. System hashes descriptor:
file_id = f1~Qo2E3G8TJZ2... = SHA256(descriptor)
5. Action references file:
POST action attachments = ["f1~Qo2E3G8TJZ2..."]
6. Anyone can verify:
- Download all variants
- Verify each blob_id = SHA256(blob)
- Rebuild descriptor
- Verify file_id = SHA256(descriptor)
   - Cryptographic proof established ✓
File attachments integrate into Cloudillo’s merkle tree structure. See Content-Addressing & Merkle Trees for how files fit into the verification chain.
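The verification in step 6 can be sketched in a few lines. This is an illustrative version under stated assumptions: IDs are plain SHA-256 over the raw bytes (base64url, no padding), the descriptor carries only the required fields, and variants are plain dictionaries. The helper names are hypothetical.

```python
import base64
import hashlib
from typing import Dict, List


def compute_id(prefix: str, data: bytes) -> str:
    """Assumed ID scheme: {prefix}1~ + base64url(SHA-256(data)), unpadded."""
    digest = hashlib.sha256(data).digest()
    return f"{prefix}1~" + base64.urlsafe_b64encode(digest).rstrip(b"=").decode("ascii")


def build_descriptor(variants: List[dict]) -> str:
    """Encode variants as a d2 descriptor (required fields only)."""
    entries = [
        f"{v['class']}:{v['blob_id']}:f={v['format']}:s={v['size']}"
        f":r={v['width']}x{v['height']}"
        for v in variants
    ]
    return "d2," + ";".join(entries)


def verify_file(file_id: str, variants: List[dict], blobs: Dict[str, bytes]) -> bool:
    """Check every blob against its ID, then the descriptor against file_id."""
    for v in variants:
        data = blobs.get(v["blob_id"])
        if data is None:
            return False  # a referenced blob was not downloaded
        if compute_id("b", data) != v["blob_id"]:
            return False  # blob bytes do not match their ID
    descriptor = build_descriptor(variants)
    return compute_id("f", descriptor.encode("utf-8")) == file_id
```

Changing a single byte in any blob changes its blob ID, which changes the descriptor string and therefore the file ID, so a mismatch at any level is detected.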
Image Processing Pipeline
Upload Flow
When a client uploads an image:
- Client Request
POST /api/files/default/profile-picture.jpg
Authorization: Bearer <access_token>
Content-Type: image/jpeg
Content-Length: 2458624
<binary image data>
- Dimension Extraction
Extract image dimensions and determine which variants to generate based on the preset:
img = load_image_from_memory(data)
(width, height) = img.dimensions()
max_dim = max(width, height)
# Variants come from the preset configuration
# Default preset: ["vis.tn", "vis.sd", "vis.md", "vis.hd"]
# Variants more than 10% larger than the original are skipped
variants = preset.image_variants.filter(|v| v.max_dim <= max_dim * 1.10)
The intermediate steps (task scheduling, hash computation, blob storage, variant generation, and metadata storage) are shown in the Complete Upload Flow Diagram below.
- Response
Return descriptor ID to client:
{
"file_id": "f1~QoE...46w",
"descriptor": "d2,vis.tn:b1~QoE...46w:f=avif:s=4096:r=128x96;vis.sd:b1~xyz...789:f=avif:s=8192:r=640x480",
"variants": [
{"name": "vis.tn", "format": "avif", "size": 4096, "dimensions": "128x96"},
{"name": "vis.sd", "format": "avif", "size": 8192, "dimensions": "640x480"}
],
"processing": true
}
Complete Upload Flow Diagram
Client uploads image
↓
POST /api/files/{preset}/filename.jpg
↓
Save to temp file
↓
Extract dimensions
↓
Determine variants to generate
↓
Create FileIdGeneratorTask
├─ Compute SHA256 hash
├─ Move to permanent storage (BlobAdapter)
└─ Generate file_id
↓
Create ImageResizerTask (for each variant)
├─ Depends on FileIdGeneratorTask
├─ Load source image
├─ Resize with Lanczos3
├─ Encode to AVIF/WebP/JPEG
├─ Compute variant ID (SHA256)
└─ Store in BlobAdapter
↓
Create file descriptor
├─ Collect all variant IDs
├─ Encode as descriptor
└─ Store metadata in MetaAdapter
↓
Return descriptor ID to client
Download Flow
Client Request
GET /api/files/f1~...?variant=hd
Authorization: Bearer <access_token>
Server Processing
- Parse Descriptor
variants = parse_file_descriptor(file_id)
# Returns list of VariantInfo
- Select Best Variant
selected = select_best_variant(
variants,
requested_variant, # "hd"
)
# Falls back if exact match not available:
# hd/avif → hd/webp → md/avif → md/webp → sd/avif → ...
- Stream from BlobAdapter
stream = blob_adapter.read_blob_stream(tn_id, selected.file_id)
# Set response headers
response.headers["Content-Type"] = f"image/{selected.format}"
response.headers["X-Cloudillo-Variant"] = selected.blob_id
response.headers["X-Cloudillo-Descriptor"] = descriptor
response.headers["Content-Length"] = selected.size
# Stream response
return stream_response(stream)
Response
HTTP/1.1 200 OK
Content-Type: image/avif
Content-Length: 8137
X-Cloudillo-Variant: b1~m8Z35EIa3prvb3bhjsVjdg9SG98xd0bkoWomOHQAwCM
X-Cloudillo-Descriptor: d2,vis.tn:b1~xRAVuQtgBx_kLqZnoOSd5XqCK_aQolhq1XeXk73Zn8U:f=avif:s=1960:r=90x128;vis.sd:b1~m8Z35EIa3prvb3bhjsVjdg9SG98xd0bkoWomOHQAwCM:f=avif:s=8137:r=256x364;vis.orig:b1~5gU72rRGiaogZuYhJy853pBd6PsqjPOjS__Kim9-qE0:f=avif:s=15012:r=256x364
Cache-Control: public, max-age=31536000, immutable
<binary image data>
Note: Content-addressed files are immutable, so they can be cached indefinitely.
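The server-side selection and header steps of the download flow might be sketched as below. This is a simplified illustration, not the actual handler: variants are plain dictionaries with hypothetical keys (`name`, `format`, `size`, `blob_id`), the format preference list is an assumption based on the fallback comment in the pseudocode, and streaming is omitted.

```python
from typing import Dict, List, Optional, Sequence

# Tier order from largest to smallest.
TIERS = ["xd", "hd", "md", "sd", "tn"]


def select_best_variant(variants: List[dict], requested: str,
                        formats: Sequence[str] = ("avif", "webp")) -> Optional[dict]:
    """Walk tiers downward from the request, trying each format per tier.

    Search order for requested="hd":
    hd/avif, hd/webp, md/avif, md/webp, sd/avif, ...
    """
    start = TIERS.index(requested)
    for tier in TIERS[start:]:
        for fmt in formats:
            for v in variants:
                if v["name"] == tier and v["format"] == fmt:
                    return v
    return None


def response_headers(selected: dict, descriptor: str) -> Dict[str, str]:
    """Build the response headers shown in the example response above."""
    return {
        "Content-Type": f"image/{selected['format']}",
        "Content-Length": str(selected["size"]),
        "X-Cloudillo-Variant": selected["blob_id"],
        "X-Cloudillo-Descriptor": descriptor,
        # Safe because the body is addressed by its own hash.
        "Cache-Control": "public, max-age=31536000, immutable",
    }
```

With only `tn` and `sd` variants stored, a request for `hd` resolves to `sd`, and `Content-Length` reflects the size of the variant actually served.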
Metadata Structure
FileMetadata
Stored in MetaAdapter:
FileMetadata {
tn_id: TnId
file_id: String # Descriptor ID
original_filename: String
mime_type: String
size: u64 # Original size
width: Optional[u32]
height: Optional[u32]
variants: List[VariantInfo]
created_at: i64
owner: String # Identity tag
permissions: FilePermissions
}
VariantInfo {
name: String # "tn", "sd", "md", "hd", "xd"
file_id: String # Content-addressed ID
format: String # "avif", "webp", "jpeg", "png"
size: u64 # Bytes
width: u32
height: u32
}
FilePermissions {
public_read: bool
shared_with: List[String] # Identity tags
}
File Presets
Concept
Presets define how files should be processed:
FilePreset:
Image # Auto-generate variants
Video # Future: transcode, thumbnails
Document # Future: preview generation
Database # RTDB database files
Raw # No processing, store as-is
Upload with Preset
POST /api/files/{preset}/{filename}
Examples:
POST /api/files/default/avatar.jpg // Generate default image variants
POST /api/files/archive/document.pdf // Store with minimal processing
Storage Organization
BlobAdapter Layout
{data_dir}/
├── blobs/
│ ├── {tn_id}/
│ │ ├── b1~QoE...46w // Original file
│ │ ├── b1~xyz...789 // Variant 1
│ │ ├── b1~abc...123 // Variant 2
│ │ └── ...
│ └── {other_tn_id}/
│ └── ...
MetaAdapter (SQLite)
CREATE TABLE files (
id INTEGER PRIMARY KEY,
tn_id INTEGER NOT NULL,
file_id TEXT NOT NULL,
original_filename TEXT,
mime_type TEXT,
size INTEGER,
width INTEGER,
height INTEGER,
variants TEXT, -- JSON array
created_at INTEGER,
owner TEXT,
permissions TEXT, -- JSON object
UNIQUE(tn_id, file_id)
);
CREATE INDEX idx_files_owner ON files(owner);
CREATE INDEX idx_files_created ON files(created_at);
See Also
- System Architecture - Task system and worker pool
- Actions - File attachments in action tokens
- Access Control - File permission checking