Blob Storage
Cloudillo’s blob storage uses content-addressed storage for immutable binary data (files, images, videos). Content addressing ensures data integrity and deduplication, while intelligent variant generation for images enables efficient delivery across different use cases.
Content-Addressed Storage
Concept
Files are identified by the SHA-256 hash of their content, making identifiers:
- Immutable: Content cannot change without changing the ID
- Verifiable: Recipients can verify integrity
- Deduplicatable: Identical content gets the same ID
- Tamper-proof: Any modification is immediately detectable
File Identifier Format
Cloudillo uses multiple identifier types in its content-addressing system:
{prefix}{version}~{base64url_hash}

Components:
- {prefix}: Resource type indicator (a, f, b, d)
- {version}: Hash algorithm version (currently 1 = SHA-256)
- ~: Separator
- {base64url_hash}: Base64url-encoded hash (43 characters, no padding)
Identifier Types
| Prefix | Resource Type | Hash Input | Example |
|---|---|---|---|
| b1~ | Blob | Blob bytes (raw image/video data) | b1~abc123def456... |
| f1~ | File | File descriptor string | f1~QoEYeG8TJZ2HTGh... |
| d1~ | Descriptor | (not a hash, the encoded format itself) | d1~tn:b1~abc:f=AVIF:... |
| a1~ | Action | Complete JWT token | a1~8kR3mN9pQ2vL... |
Important: d1~ is not a content-addressed identifier; it is the actual encoded descriptor string. The file ID (f1~) is the hash of this descriptor.
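The identifier grammar above can be checked with a short helper. This is a sketch; the `is_content_id` name is illustrative and not part of Cloudillo's API:

```python
import re

# Hash-based identifiers: prefix (a, b, or f), version digit 1, "~",
# then 43 base64url characters (a 256-bit hash, unpadded).
CONTENT_ID_RE = re.compile(r"^[abf]1~[A-Za-z0-9_-]{43}$")

def is_content_id(s: str) -> bool:
    """Check whether s looks like a content-addressed ID (a1~/b1~/f1~).

    d1~ descriptors are deliberately excluded: they are encoded data,
    not hashes, so they do not match this grammar.
    """
    return bool(CONTENT_ID_RE.match(s))
```

Note that `d1~` strings fail this check by design, matching the distinction drawn above.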
Examples
Blob ID: b1~QoEYeG8TJZ2HTGhVlrtTDBpvBGOp6gfGhq4QmD6Z46w
File ID: f1~m8Z35EIa3prvb3bhjsVjdg9SG98xd0bkoWomOHQAwCM
Descriptor: d1~tn:b1~xRAVuQtgBx_kLqZnoOSd5XqCK_aQolhq1XeXk73Zn8U:f=avif:s=1960:r=90x128
Action ID: a1~8kR3mN9pQ2vL6xWpYzT4BjN5FqGxCmK9RsH2VwLnD8P

All file and blob IDs use SHA-256 content-addressing. See Content-Addressing & Merkle Trees for hash computation details.
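The ID construction can be sketched in Python. This assumes SHA-256 plus unpadded base64url as described above; `compute_hash` mirrors the name used in this document's pseudocode but is not the actual implementation:

```python
import base64
import hashlib

def compute_hash(prefix: str, data: bytes) -> str:
    """Content-address data as {prefix}1~{base64url(SHA-256)}, no padding."""
    digest = hashlib.sha256(data).digest()                            # 32 bytes
    encoded = base64.urlsafe_b64encode(digest).decode().rstrip("=")   # 43 chars
    return f"{prefix}1~{encoded}"

# Blob ID: hash of the raw bytes.  File ID: hash of the descriptor string.
blob_id = compute_hash("b", b"raw image bytes")
file_id = compute_hash("f", b"d1~tn:" + blob_id.encode() + b":f=AVIF:s=15:r=1x1")
```

Every resulting identifier is 46 characters: a 3-character prefix plus the 43-character hash.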
File Variants
Concept
A single uploaded image automatically generates multiple variants optimized for different use cases:
- tn (thumbnail): Tiny preview (~128x96px)
- sd (standard definition): Social media size (~640x480px)
- md (medium definition): Web display (~1280x720px)
- hd (high definition): Full screen (~1920x1080px)
- xd (extra definition): Original/4K+ (~3840x2160px+)
File Descriptor Encoding
A file descriptor encodes all available variants in a compact format.
File Descriptor Format Specification
Format
d1~{variant}:{blob_id}:f={format}:s={size}:r={width}x{height},{next_variant},...

Components
- d1~ - Descriptor prefix with version (currently version 1)
- {variant} - Size class: tn, sd, md, hd, or xd
- {blob_id} - Content-addressed ID of the blob (b1~...)
- f={format} - Image format: AVIF, WebP, JPEG, or PNG
- s={size} - File size in bytes (integer, no separators)
- r={width}x{height} - Resolution in pixels (width × height)
- , - Comma separator between variants (no spaces)
Example
d1~tn:b1~abc123:f=AVIF:s=4096:r=150x150,sd:b1~def456:f=AVIF:s=32768:r=640x480

This descriptor encodes two variants:
- Thumbnail: AVIF format, 4096 bytes, 150×150 pixels, blob ID b1~abc123
- Standard: AVIF format, 32768 bytes, 640×480 pixels, blob ID b1~def456
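Building such a descriptor from variant metadata might look like the sketch below; the dict field names follow the examples in this document, and `build_descriptor` mirrors the pseudocode name used later:

```python
def build_descriptor(variants: list) -> str:
    """Encode variant metadata into a d1~ descriptor string (no spaces)."""
    parts = [
        f"{v['class']}:{v['blob_id']}"
        f":f={v['format']}:s={v['size']}:r={v['width']}x{v['height']}"
        for v in variants
    ]
    return "d1~" + ",".join(parts)

variants = [
    {"class": "tn", "blob_id": "b1~abc123", "format": "AVIF",
     "size": 4096, "width": 150, "height": 150},
    {"class": "sd", "blob_id": "b1~def456", "format": "AVIF",
     "size": 32768, "width": 640, "height": 480},
]
descriptor = build_descriptor(variants)
# "d1~tn:b1~abc123:f=AVIF:s=4096:r=150x150,sd:b1~def456:f=AVIF:s=32768:r=640x480"
```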
Parsing Rules
- Check prefix: Verify the descriptor starts with d1~
- Split by comma (,): Get individual variant entries
- For each variant, split by colon (:) to get components:
  - Component [0] = variant class (tn, sd, md, hd, xd)
  - Component [1] = blob_id (b1~...)
  - Components [2..] = key=value pairs
- Parse key=value pairs:
  - f={format} → Image format string
  - s={size} → Parse as u64 (bytes)
  - r={width}x{height} → Split by x, parse as u32 × u32
Parsing logic: split by commas for variants, then by colons for fields, then parse key=value pairs.
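The parsing rules above can be sketched as a small function; this is illustrative, not the actual server implementation:

```python
def parse_descriptor(descriptor: str) -> list:
    """Parse a d1~ descriptor into a list of per-variant dicts."""
    if not descriptor.startswith("d1~"):
        raise ValueError("not a d1~ descriptor")
    variants = []
    for entry in descriptor[3:].split(","):        # one entry per variant
        fields = entry.split(":")
        # fields[0] = size class, fields[1] = blob ID (b1~... contains no colon)
        info = {"class": fields[0], "blob_id": fields[1]}
        for kv in fields[2:]:                       # remaining key=value pairs
            key, value = kv.split("=", 1)
            if key == "f":
                info["format"] = value
            elif key == "s":
                info["size"] = int(value)
            elif key == "r":
                w, h = value.split("x")
                info["width"], info["height"] = int(w), int(h)
        variants.append(info)
    return variants
```

Splitting by `:` is safe because blob IDs use `~`, not `:`, as their internal separator.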
Variant Size Classes - Exact Specifications
Cloudillo generates image variants at specific size targets to optimize bandwidth and storage:
| Class | Name | Target Resolution | Max Dimension | Use Case |
|---|---|---|---|---|
| tn | Thumbnail | ~150×150px | 200px | List views, previews, avatars |
| sd | Standard Definition | ~640×480px | 800px | Mobile devices, low bandwidth |
| md | Medium Definition | ~1920×1080px | 2000px | Desktop viewing, full screen |
| hd | High Definition | ~3840×2160px | 4000px | 4K displays, high quality |
| xd | Extra Definition | Original size | No limit | Archival, original quality |
Generation Rules
Variants are generated based on the maximum dimension (the larger of width and height):
- max_dim β₯ 3840px: tn, sd, md, hd, xd (all variants)
- max_dim β₯ 1920px: tn, sd, md, hd
- max_dim β₯ 1280px: tn, sd, md
- max_dim < 1280px: tn, sd
Properties:
- Each variant maintains the original aspect ratio
- Uses Lanczos3 filter for high-quality downscaling
- Maximum dimension constraint prevents oversizing
- Smaller originals don’t get upscaled
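Under these rules, computing a variant's target size reduces to capping the larger side at the class's max dimension while preserving aspect ratio. A minimal sketch, assuming the max-dimension values from the table above; the `fit_dimensions` helper name is hypothetical:

```python
def fit_dimensions(width: int, height: int, max_dim: int):
    """Scale (width, height) so the larger side is at most max_dim.

    Aspect ratio is preserved; smaller originals are never upscaled.
    """
    largest = max(width, height)
    if largest <= max_dim:
        return width, height                 # no upscaling
    scale = max_dim / largest
    return max(1, round(width * scale)), max(1, round(height * scale))

# A 3024x4032 portrait photo capped at the tn class's 200px max dimension:
fit_dimensions(3024, 4032, 200)              # -> (150, 200)
```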
Variant Selection
Clients request a specific variant:
GET /api/files/f1~Qo2E3G8TJZ...?variant=hd

Response: Returns the HD variant if available, otherwise falls back to smaller variants.
Automatic Fallback
If the requested variant doesn’t exist, the server returns the best available:
- Try the requested variant (e.g., hd)
- Fall back to the next smaller (e.g., md)
- Continue until a variant is found
- Return the smallest available if no match is found
Fallback order: xd → hd → md → sd → tn
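The fallback logic can be sketched as follows, assuming the available variants are keyed by size class; `select_variant` is illustrative, not the server's actual function:

```python
# Largest to smallest, matching the documented fallback order.
FALLBACK_ORDER = ["xd", "hd", "md", "sd", "tn"]

def select_variant(available: dict, requested: str):
    """Return the requested variant, else the next smaller one available."""
    start = FALLBACK_ORDER.index(requested)
    for cls in FALLBACK_ORDER[start:]:        # requested class and smaller
        if cls in available:
            return available[cls]
    for cls in reversed(FALLBACK_ORDER):      # nothing smaller: smallest available
        if cls in available:
            return available[cls]
    return None
```

Usage: if only `sd` and `tn` exist, a request for `hd` yields the `sd` variant.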
Content-Addressing Flow
File storage uses a three-level content-addressing hierarchy:
Level 1: Blob Storage
Upload image → Save as blob → Compute SHA-256 of blob bytes → Store blob with ID: b1~{hash}
blob_data = read_file("thumbnail.avif")
blob_id = compute_hash("b", blob_data)
// Result: "b1~abc123..." (thumbnail blob ID)

Example: b1~abc123... identifies the thumbnail AVIF blob
See Content-Addressing & Merkle Trees for hash computation details.
Level 2: Variant Collection
Generate all variants (tn, sd, md, hd) → Each variant gets its own blob ID (b1~...) → Collect all variant metadata → Create a descriptor string encoding all variants
variants = [
{ class: "tn", blob_id: "b1~abc123", format: "AVIF", size: 4096, width: 150, height: 150 },
{ class: "sd", blob_id: "b1~def456", format: "AVIF", size: 32768, width: 640, height: 480 },
{ class: "md", blob_id: "b1~ghi789", format: "AVIF", size: 262144, width: 1920, height: 1080 },
]
descriptor = build_descriptor(variants)
// Result: "d1~tn:b1~abc123:f=AVIF:s=4096:r=150x150,sd:b1~def456:f=AVIF:s=32768:r=640x480,md:b1~ghi789:f=AVIF:s=262144:r=1920x1080"

Level 3: File Descriptor
Build descriptor → Compute SHA-256 of the descriptor string → Final file ID: f1~{hash} → This file ID goes into action attachments
descriptor = "d1~tn:b1~abc:f=AVIF:s=4096:r=150x150,sd:b1~def:f=AVIF:s=32768:r=640x480"
file_id = compute_hash("f", descriptor.as_bytes())
// Result: "f1~Qo2E3G8TJZ..." (file ID)

Example Complete Flow
1. User uploads photo.jpg (3MB, 3024x4032px)
2. System generates variants:
tn: 150x200px → 4KB → b1~abc123
sd: 600x800px → 32KB → b1~def456
md: 1440x1920px → 256KB → b1~ghi789
hd: 2880x3840px → 1MB → b1~jkl012
3. System builds descriptor:
"d1~tn:b1~abc123:f=AVIF:s=4096:r=150x200,
sd:b1~def456:f=AVIF:s=32768:r=600x800,
md:b1~ghi789:f=AVIF:s=262144:r=1440x1920,
hd:b1~jkl012:f=AVIF:s=1048576:r=2880x3840"
4. System hashes descriptor:
file_id = f1~Qo2E3G8TJZ2... = SHA256(descriptor)
5. Action references file:
POST action attachments = ["f1~Qo2E3G8TJZ2..."]
6. Anyone can verify:
- Download all variants
- Verify each blob_id = SHA256(blob)
- Rebuild descriptor
- Verify file_id = SHA256(descriptor)
- Cryptographic proof established ✓

Integration with Action Merkle Tree
File attachments create an extended merkle tree:
Action (a1~8kR...)
├─ Signed by user (ES384)
├─ Content-addressed (SHA-256 of JWT)
└─ Attachments: [f1~Qo2...]
   └─ File (f1~Qo2...)
      ├─ Content-addressed (SHA-256 of descriptor)
      ├─ Descriptor: "d1~tn:b1~abc...,sd:b1~def..."
      ├─ Blob tn (b1~abc...)
      │  └─ Content-addressed (SHA-256 of blob)
      ├─ Blob sd (b1~def...)
      │  └─ Content-addressed (SHA-256 of blob)
      └─ Blob md (b1~ghi...)
         └─ Content-addressed (SHA-256 of blob)

Benefits:
- Entire tree is cryptographically verifiable
- Cannot modify image without changing all parent hashes
- Deduplication: same image = same file_id
- Federation: remote instances can verify integrity
See Content-Addressing & Merkle Trees for how file content-addressing integrates with the action system.
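The bottom-up verification described above can be sketched end to end using only the hashing rules in this document; the helper names are illustrative:

```python
import base64
import hashlib

def content_id(prefix: str, data: bytes) -> str:
    """{prefix}1~ + unpadded base64url SHA-256, as specified above."""
    digest = base64.urlsafe_b64encode(hashlib.sha256(data).digest())
    return f"{prefix}1~{digest.decode().rstrip('=')}"

def verify_file(file_id: str, descriptor: str, blobs: dict) -> bool:
    """Verify file_id -> descriptor -> blobs, leaf hashes upward.

    blobs maps blob IDs to their raw bytes (e.g. downloaded variants).
    """
    # The file ID must be the hash of the descriptor string.
    if content_id("f", descriptor.encode()) != file_id:
        return False
    # Each listed blob ID must be the hash of the corresponding bytes.
    for entry in descriptor[3:].split(","):
        blob_id = entry.split(":")[1]
        if content_id("b", blobs[blob_id]) != blob_id:
            return False
    return True
```

Any modification of a blob breaks its own hash; any change to the variant set breaks the descriptor hash and hence the file ID, which is exactly the merkle property the action tree relies on.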
Image Processing Pipeline
Upload Flow
When a client uploads an image:
- Client Request
POST /api/files/image/profile-picture.jpg
Authorization: Bearer <access_token>
Content-Type: image/jpeg
Content-Length: 2458624
<binary image data>

- Dimension Extraction
Extract image dimensions to determine which variants to generate:
img = load_image_from_memory(data)
(width, height) = img.dimensions()
max_dim = max(width, height)
if max_dim >= 3840:
variants = ["tn", "sd", "md", "hd", "xd"]
else if max_dim >= 1920:
variants = ["tn", "sd", "md", "hd"]
else if max_dim >= 1280:
variants = ["tn", "sd", "md"]
else:
variants = ["tn", "sd"]

- FileIdGeneratorTask
Create a task to generate the content-addressed ID:
task = FileIdGeneratorTask(
tn_id,
temp_file_path="/tmp/upload-abc123",
original_filename="profile-picture.jpg"
)
task_id = scheduler.schedule(task)

- ImageResizerTask (Multiple)
For each variant, create a resize task:
for variant in variants:
task = ImageResizerTask(
tn_id,
source_file_id=original_id,
variant=variant,
target_dimensions=get_variant_dimensions(variant),
format="avif", # Primary format
quality=85,
dependencies=[file_id_task_id] # Wait for ID generation
)
scheduler.schedule(task)

- Hash Computation
FileIdGeneratorTask computes SHA256 hash:
file_id = compute_content_hash("f", file_contents)
# See merkle-tree.md for hash computation details

- Blob Storage
Store original in BlobAdapter:
blob_adapter.create_blob_stream(tn_id, file_id, file_stream)

- Variant Generation
Each ImageResizerTask runs in worker pool (CPU-intensive):
# Execute in worker pool
img = load_image(source_path)
# Resize with Lanczos3 filter (high quality)
resized = img.resize(target_width, target_height, filter=Lanczos3)
# Encode to AVIF
buffer = encode_avif(resized, quality)
# Store variant
variant_id = compute_file_id(buffer)
blob_adapter.create_blob(tn_id, variant_id, buffer)

- Metadata Storage
Store file metadata with all variants:
file_metadata = FileMetadata(
tn_id,
file_id=descriptor_id,
original_filename="profile-picture.jpg",
mime_type="image/jpeg",
size=original_size,
variants=[
Variant(name="tn", blob_id="b1~QoE...46w", format="avif",
size=4096, width=200, height=200),
Variant(name="sd", blob_id="b1~xyz...789", format="webp",
size=32768, width=640, height=480),
# ... more variants
],
created_at=current_timestamp()
)
meta_adapter.create_file_metadata(tn_id, file_metadata)

- Response
Return descriptor ID to client:
{
"file_id": "d1~tn:QoE...46w:f=avif:s=4096:r=128x96,sd:xyz...789:...",
"variants": [
{"name": "tn", "format": "avif", "size": 4096, "dimensions": "128x96"},
{"name": "sd", "format": "webp", "size": 8192, "dimensions": "640x480"}
],
"processing": true
}

Complete Upload Flow Diagram
Client uploads image
        ↓
POST /api/files/image/filename.jpg
        ↓
Save to temp file
        ↓
Extract dimensions
        ↓
Determine variants to generate
        ↓
Create FileIdGeneratorTask
  ├─ Compute SHA-256 hash
  ├─ Move to permanent storage (BlobAdapter)
  └─ Generate file_id
        ↓
Create ImageResizerTask (for each variant)
  ├─ Depends on FileIdGeneratorTask
  ├─ Load source image
  ├─ Resize with Lanczos3
  ├─ Encode to AVIF/WebP/JPEG
  ├─ Compute variant ID (SHA-256)
  └─ Store in BlobAdapter
        ↓
Create file descriptor
  ├─ Collect all variant IDs
  ├─ Encode as descriptor
  └─ Store metadata in MetaAdapter
        ↓
Return descriptor ID to client

Download Flow
Client Request
GET /api/files/d1~...?variant=hd
Authorization: Bearer <access_token>

Server Processing
- Parse Descriptor
variants = parse_file_descriptor(file_id)
# Returns list of VariantInfo

- Select Best Variant
selected = select_best_variant(
variants,
requested_variant, # "hd"
)
# Falls back if exact match not available:
# hd/avif → hd/webp → md/avif → md/webp → sd/avif → ...

- Stream from BlobAdapter
stream = blob_adapter.read_blob_stream(tn_id, selected.file_id)
# Set response headers
response.headers["Content-Type"] = f"image/{selected.format}"
response.headers["X-Cloudillo-Variant"] = selected.blob_id
response.headers["X-Cloudillo-Descriptor"] = descriptor
response.headers["Content-Length"] = selected.size
# Stream response
return stream_response(stream)

Response
HTTP/1.1 200 OK
Content-Type: image/avif
Content-Length: 16384
X-Cloudillo-Variant: b1~m8Z35EIa3prvb3bhjsVjdg9SG98xd0bkoWomOHQAwCM
X-Cloudillo-Descriptor: d1~tn:b1~xRAVuQtgBx_kLqZnoOSd5XqCK_aQolhq1XeXk73Zn8U:f=avif:s=1960:r=90x128,sd:b1~m8Z35EIa3prvb3bhjsVjdg9SG98xd0bkoWomOHQAwCM:f=avif:s=8137:r=256x364,orig:b1~5gU72rRGiaogZuYhJy853pBd6PsqjPOjS__Kim9-qE0:f=avif:s=15012:r=256x364
Cache-Control: public, max-age=31536000, immutable
<binary image data>

Note: Content-addressed files are immutable, so they can be cached forever.
Metadata Structure
FileMetadata
Stored in MetaAdapter:
FileMetadata {
tn_id: TnId
file_id: String # Descriptor ID
original_filename: String
mime_type: String
size: u64 # Original size
width: Optional[u32]
height: Optional[u32]
variants: List[VariantInfo]
created_at: i64
owner: String # Identity tag
permissions: FilePermissions
}
VariantInfo {
name: String # "tn", "sd", "md", "hd", "xd"
file_id: String # Content-addressed ID
format: String # "avif", "webp", "jpeg", "png"
size: u64 # Bytes
width: u32
height: u32
}
FilePermissions {
public_read: bool
shared_with: List[String] # Identity tags
}

File Presets
Concept
Presets define how files should be processed:
FilePreset:
Image # Auto-generate variants
Video # Future: transcode, thumbnails
Document # Future: preview generation
Database # RTDB database files
Raw      # No processing, store as-is

Upload with Preset
POST /api/files/{preset}/{filename}
Examples:
POST /api/files/image/avatar.jpg // Generate image variants
POST /api/files/raw/document.pdf     // Store as-is

Storage Organization
BlobAdapter Layout
{data_dir}/
└── blobs/
    ├── {tn_id}/
    │   ├── f1~QoE...46w    // Original file
    │   ├── f1~xyz...789    // Variant 1
    │   ├── f1~abc...123    // Variant 2
    │   └── ...
    └── {other_tn_id}/
        └── ...

MetaAdapter (SQLite)
CREATE TABLE files (
id INTEGER PRIMARY KEY,
tn_id INTEGER NOT NULL,
file_id TEXT NOT NULL,
original_filename TEXT,
mime_type TEXT,
size INTEGER,
width INTEGER,
height INTEGER,
variants TEXT, -- JSON array
created_at INTEGER,
owner TEXT,
permissions TEXT, -- JSON object
UNIQUE(tn_id, file_id)
);
CREATE INDEX idx_files_owner ON files(owner);
CREATE INDEX idx_files_created ON files(created_at);

Performance Considerations
Worker Pool Usage
Image processing is CPU-intensive, so it runs in the worker pool:
# Priority levels
Priority.High    → User-facing operations (thumbnail)
Priority.Medium  → Background tasks (other image variants)
Priority.Low     → Longer operations (video upload)

Parallel Processing
Multiple variants can be generated in parallel:
# Create all resize tasks at once
task_ids = []
for variant in ["tn", "sd", "md", "hd"]:
task_id = scheduler.schedule(ImageResizerTask(
variant=variant,
# ...
))
task_ids.append(task_id)
# Wait for all to complete
scheduler.wait_all(task_ids)

Caching Strategy
Content-addressed files are immutable:
Cache-Control: public, max-age=31536000, immutable

- Browsers cache forever
- CDN can cache forever
- No cache invalidation needed
See Also
- System Architecture - Task system and worker pool
- [Actions](/architecture/actions-federation/actions) - File attachments in action tokens
- [Access Control](/architecture/data-layer/access-control/access) - File permission checking