Blob Storage

Cloudillo’s blob storage uses content-addressed storage for immutable binary data (files, images, videos). Content addressing ensures data integrity and deduplication, while automatic variant generation for images enables efficient delivery across different use cases.

Content-Addressed Storage

Concept

Files are identified by the SHA256 hash of their content, making identifiers:

Immutable: Content cannot change without changing the ID
Verifiable: Recipients can verify integrity
Deduplicable: Identical content gets the same ID
Tamper-proof: Any modification is immediately detectable

File Identifier Format

Cloudillo uses multiple identifier types in its content-addressing system:

{prefix}{version}~{base64url_hash}

Components:

{prefix}: Resource type indicator (a, f, b, d)
{version}: Hash algorithm version (currently 1 = SHA-256)
~: Separator
{base64url_hash}: Base64url-encoded hash (43 characters, no padding)
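As an illustration of the format above, the following Python sketch splits an identifier into its prefix, version, and decoded hash. The function name `parse_content_id` is hypothetical. It only accepts the hashed prefixes (a, b, f) with version 1, because d1~ strings are encoded descriptors rather than hashes.

```python
import base64
import re

# {prefix}{version}~{43 base64url characters}; d1~ is deliberately excluded.
_ID_RE = re.compile(r"([abf])(\d+)~([A-Za-z0-9_-]{43})")


def parse_content_id(content_id: str) -> tuple[str, int, bytes]:
    """Hypothetical helper: return (prefix, version, raw SHA-256 digest)."""
    match = _ID_RE.fullmatch(content_id)
    if match is None:
        raise ValueError(f"not a content-addressed ID: {content_id!r}")
    prefix, version, encoded = match.group(1), int(match.group(2)), match.group(3)
    if version != 1:
        raise ValueError(f"unsupported hash version: {version}")
    # 43 unpadded base64url characters carry 32 bytes; restore the one "=" pad.
    digest = base64.urlsafe_b64decode(encoded + "=")
    return prefix, version, digest
```

For the blob ID in the examples below, this returns the prefix `b`, version `1`, and a 32-byte digest.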

Identifier Types

Prefix	Resource Type	Hash Input	Example
`b1~`	Blob	Blob bytes (raw image/video data)	`b1~abc123def456...`
`f1~`	File	File descriptor string	`f1~QoEYeG8TJZ2HTGh...`
`d1~`	Descriptor	(not a hash, the encoded format itself)	`d1~tn:b1~abc:f=AVIF:...`
`a1~`	Action	Complete JWT token	`a1~8kR3mN9pQ2vL...`

Important: d1~ is not a content-addressed identifier—it’s the actual encoded descriptor string. The file ID (f1~) is the hash of this descriptor.

Examples

Blob ID:       b1~QoEYeG8TJZ2HTGhVlrtTDBpvBGOp6gfGhq4QmD6Z46w
File ID:       f1~m8Z35EIa3prvb3bhjsVjdg9SG98xd0bkoWomOHQAwCM
Descriptor:    d1~tn:b1~xRAVuQtgBx_kLqZnoOSd5XqCK_aQolhq1XeXk73Zn8U:f=avif:s=1960:r=90x128
Action ID:     a1~8kR3mN9pQ2vL6xWpYzT4BjN5FqGxCmK9RsH2VwLnD8P

All file and blob IDs use SHA-256 content-addressing. See Content-Addressing & Merkle Trees for hash computation details.

File Variants

Concept

A single uploaded image automatically generates multiple variants optimized for different use cases:

tn (thumbnail): Tiny preview (~128x96px)
sd (standard definition): Social media size (~640x480px)
md (medium definition): Web display (~1280x720px)
hd (high definition): Full screen (~1920x1080px)
xd (extra definition): Original/4K+ (~3840x2160px+)

File Descriptor Encoding

A file descriptor encodes all available variants in a compact format.

File Descriptor Format Specification

Format

d1~{variant}:{blob_id}:f={format}:s={size}:r={width}x{height},{next_variant},...

Components

d1~ - Descriptor prefix with version (currently version 1)
{variant} - Size class: tn, sd, md, hd, or xd
{blob_id} - Content-addressed ID of the blob (b1~...)
f={format} - Image format: AVIF, WebP, JPEG, or PNG
s={size} - File size in bytes (integer, no separators)
r={width}x{height} - Resolution in pixels (width × height)
, - Comma separator between variants (no spaces)

Example

d1~tn:b1~abc123:f=AVIF:s=4096:r=150x150,sd:b1~def456:f=AVIF:s=32768:r=640x480

This descriptor encodes two variants:

Thumbnail: AVIF format, 4096 bytes, 150×150 pixels, blob ID b1~abc123
Standard: AVIF format, 32768 bytes, 640×480 pixels, blob ID b1~def456

Parsing Rules

1. Check the prefix: verify that the descriptor starts with d1~, then strip it.
2. Split by comma (,) to get the individual variant entries.
3. Split each variant entry by colon (:) to get its components:
   - Component [0] = variant class (tn, sd, md, hd, xd)
   - Component [1] = blob_id (b1~...)
   - Components [2..] = key=value pairs
4. Parse the key=value pairs:
   - f={format} → image format string
   - s={size} → parse as u64 (bytes)
   - r={width}x{height} → split by x, parse as u32 × u32

Parsing logic: split by commas for variants, then by colons for fields, then parse key=value pairs.
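A minimal Python sketch of these parsing rules follows. `parse_descriptor` and the `ParsedVariant` dataclass are hypothetical names, not Cloudillo's API. The sketch does not validate the variant class, so an entry such as the `orig` class seen in the download example parses without error. A missing `f`, `s`, or `r` key raises `KeyError`.

```python
from dataclasses import dataclass


@dataclass
class ParsedVariant:
    variant: str   # size class, e.g. "tn" or "sd"
    blob_id: str   # b1~...
    format: str    # e.g. "AVIF" or "avif"
    size: int      # bytes
    width: int
    height: int


def parse_descriptor(descriptor: str) -> list[ParsedVariant]:
    prefix = "d1~"
    if not descriptor.startswith(prefix):
        raise ValueError("descriptor must start with d1~")
    result = []
    for entry in descriptor[len(prefix):].split(","):
        parts = entry.split(":")
        if len(parts) < 2:
            raise ValueError(f"malformed variant entry: {entry!r}")
        fields = {}
        for part in parts[2:]:
            key, sep, value = part.partition("=")
            if not sep:
                raise ValueError(f"expected key=value, got {part!r}")
            fields[key] = value
        width, height = (int(n) for n in fields["r"].split("x"))
        result.append(ParsedVariant(parts[0], parts[1], fields["f"],
                                    int(fields["s"]), width, height))
    return result
```

Applied to the example descriptor above, this yields two entries: `tn` (150×150, 4096 bytes) and `sd` (640×480, 32768 bytes).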

Variant Size Classes - Exact Specifications

Cloudillo generates image variants at specific size targets to optimize bandwidth and storage:

Class	Name	Target Resolution	Max Dimension	Use Case
`tn`	Thumbnail	~150×150px	200px	List views, previews, avatars
`sd`	Standard Definition	~640×480px	800px	Mobile devices, low bandwidth
`md`	Medium Definition	~1920×1080px	2000px	Desktop viewing, full screen
`hd`	High Definition	~3840×2160px	4000px	4K displays, high quality
`xd`	Extra Definition	Original size	No limit	Archival, original quality

Generation Rules

Which variants are generated depends on the maximum dimension (the larger of width and height):

max_dim ≥ 3840px: tn, sd, md, hd, xd (all variants)
max_dim ≥ 1920px: tn, sd, md, hd
max_dim ≥ 1280px: tn, sd, md
max_dim < 1280px: tn, sd

Properties:

Each variant maintains the original aspect ratio
Uses Lanczos3 filter for high-quality downscaling
Maximum dimension constraint prevents oversizing
Smaller originals don’t get upscaled
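The aspect-ratio and no-upscaling properties reduce to one scaling step. This Python sketch (the name `fit_within` is hypothetical) shrinks the longer side to a given bound and leaves smaller images untouched; rounding to the nearest pixel is an assumption. The Lanczos3 resampling itself is out of scope here.

```python
def fit_within(width: int, height: int, max_dim: int) -> tuple[int, int]:
    """Scale (width, height) so the longer side is at most max_dim.

    Preserves the aspect ratio and never upscales.
    """
    longest = max(width, height)
    if longest <= max_dim:
        return width, height  # already small enough: keep original size
    return (max(1, round(width * max_dim / longest)),
            max(1, round(height * max_dim / longest)))
```

With a bound of 1920, a 3024×4032 photo becomes 1440×1920, matching the md variant in the complete flow example below; a 256×364 image bounded at 800 stays 256×364.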

Variant Selection

Clients request a specific variant:

GET /api/files/f1~Qo2E3G8TJZ...?variant=hd

Response: Returns HD variant if available, otherwise falls back to smaller variants.

Automatic Fallback

If the requested variant doesn’t exist, the server returns the best available:

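1. Try the requested variant (e.g., hd).
2. Fall back to the next smaller variant (e.g., md).
3. Continue downward until a variant is found.
4. If no variant exists at or below the requested size, return the smallest available variant.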

Fallback order: xd → hd → md → sd → tn
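The fallback can be sketched in Python as follows. `select_variant` is a hypothetical helper. It reads the final rule as: if nothing exists at or below the requested class, return the smallest larger variant that does exist.

```python
# Size classes ordered from smallest to largest.
SIZE_ORDER = ["tn", "sd", "md", "hd", "xd"]


def select_variant(available: set[str], requested: str) -> str:
    """Pick the requested class, else the next smaller one, else the smallest larger one."""
    start = SIZE_ORDER.index(requested)
    # Walk downward from the request, e.g. hd -> md -> sd -> tn.
    for cls in reversed(SIZE_ORDER[: start + 1]):
        if cls in available:
            return cls
    # Nothing at or below the request: take the smallest larger class.
    for cls in SIZE_ORDER[start + 1:]:
        if cls in available:
            return cls
    raise ValueError("no known variant classes available")
```

For a small image with only tn, sd, and md, a request for hd returns md.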

Content-Addressing Flow

File storage uses a three-level content-addressing hierarchy:

Level 1: Blob Storage

Upload image → Save as blob → Compute SHA256 of blob bytes → Store blob with ID: b1~{hash}

blob_data = read_file("thumbnail.avif")
blob_id = compute_hash("b", blob_data)
// Result: "b1~abc123..." (thumbnail blob ID)

Example: b1~abc123... identifies the thumbnail AVIF blob

See Content-Addressing & Merkle Trees for hash computation details.
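A minimal sketch of `compute_hash`, assuming the hash is plain SHA-256 over the raw bytes and the digest is base64url-encoded without padding, as described under File Identifier Format. The linked Merkle tree page remains the authoritative description.

```python
import base64
import hashlib


def compute_hash(prefix: str, data: bytes) -> str:
    """Return "{prefix}1~{base64url(SHA-256(data))}" with "=" padding stripped."""
    digest = hashlib.sha256(data).digest()
    encoded = base64.urlsafe_b64encode(digest).decode("ascii").rstrip("=")
    return f"{prefix}1~{encoded}"
```

For example, `compute_hash("b", b"")` yields `b1~47DEQpj8HBSa-_TImW-5JCeuQeRkm5NMpJWZG3hSuFU`, the SHA-256 of empty input, and every result is 46 characters long (3-character prefix plus 43 hash characters).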

Level 2: Variant Collection

Generate all variants (tn, sd, md, hd) → Each variant gets its own blob ID (b1~...) → Collect all variant metadata → Create descriptor string encoding all variants

variants = [
    { class: "tn", blob_id: "b1~abc123", format: "AVIF", size: 4096, width: 150, height: 150 },
    { class: "sd", blob_id: "b1~def456", format: "AVIF", size: 32768, width: 640, height: 480 },
    { class: "md", blob_id: "b1~ghi789", format: "AVIF", size: 262144, width: 1920, height: 1080 },
]

descriptor = build_descriptor(variants)
// Result: "d1~tn:b1~abc123:f=AVIF:s=4096:r=150x150,sd:b1~def456:f=AVIF:s=32768:r=640x480,md:b1~ghi789:f=AVIF:s=262144:r=1920x1080"
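A possible `build_descriptor` in Python, using dictionaries shaped like the `variants` list above. Because the file ID is the hash of this exact string, the output must be deterministic; sorting entries from tn to xd is an assumption of this sketch, not documented behavior.

```python
# Assumed canonical order of size classes, smallest first.
SIZE_ORDER = ["tn", "sd", "md", "hd", "xd"]


def build_descriptor(variants: list[dict]) -> str:
    """Encode variant metadata as d1~{class}:{blob_id}:f=..:s=..:r=WxH,..."""
    ordered = sorted(variants, key=lambda v: SIZE_ORDER.index(v["class"]))
    entries = [
        f"{v['class']}:{v['blob_id']}:f={v['format']}:s={v['size']}"
        f":r={v['width']}x{v['height']}"
        for v in ordered
    ]
    return "d1~" + ",".join(entries)  # commas only, no spaces
```

Given the three variants above in any order, this produces the same `d1~tn:...,sd:...,md:...` string shown in the result comment.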

Level 3: File Descriptor

Build descriptor → Compute SHA256 of descriptor string → Final file ID: f1~{hash} → This file ID goes into action attachments

descriptor = "d1~tn:b1~abc:f=AVIF:s=4096:r=150x150,sd:b1~def:f=AVIF:s=32768:r=640x480"
file_id = compute_hash("f", descriptor.as_bytes())
// Result: "f1~Qo2E3G8TJZ..." (file ID)

Example Complete Flow

1. User uploads photo.jpg (3MB, 3024x4032px)

2. System generates variants:
   tn:  150x200px → 4KB   → b1~abc123
   sd:  600x800px → 32KB  → b1~def456
   md:  1440x1920px → 256KB → b1~ghi789
   hd:  2880x3840px → 1MB → b1~jkl012

3. System builds descriptor (wrapped here for readability; the real string has no spaces or line breaks):
   "d1~tn:b1~abc123:f=AVIF:s=4096:r=150x200,
       sd:b1~def456:f=AVIF:s=32768:r=600x800,
       md:b1~ghi789:f=AVIF:s=262144:r=1440x1920,
       hd:b1~jkl012:f=AVIF:s=1048576:r=2880x3840"

4. System hashes descriptor:
   file_id = f1~Qo2E3G8TJZ2... = SHA256(descriptor)

5. Action references file:
   POST action attachments = ["f1~Qo2E3G8TJZ2..."]

6. Anyone can verify:
   - Download all variants
   - Verify each blob_id = SHA256(blob)
   - Rebuild descriptor
   - Verify file_id = SHA256(descriptor)
   - Cryptographic proof established ✓
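Step 6 can be sketched as a single Python function. `verify_file` is hypothetical; it checks the supplied descriptor string directly rather than rebuilding it from metadata, and `blobs` is assumed to map each blob ID to the downloaded bytes. The hash helper repeats the SHA-256/base64url assumption from Level 1.

```python
import base64
import hashlib


def compute_hash(prefix: str, data: bytes) -> str:
    digest = hashlib.sha256(data).digest()
    return f"{prefix}1~" + base64.urlsafe_b64encode(digest).decode("ascii").rstrip("=")


def verify_file(file_id: str, descriptor: str, blobs: dict[str, bytes]) -> bool:
    """Check file_id against the descriptor and every blob against its blob ID."""
    if not descriptor.startswith("d1~"):
        return False
    # Level 3: the file ID must be the hash of the exact descriptor string.
    if compute_hash("f", descriptor.encode("utf-8")) != file_id:
        return False
    # Levels 1-2: each listed blob must be present and hash to its blob ID.
    for entry in descriptor[len("d1~"):].split(","):
        parts = entry.split(":")
        if len(parts) < 2:
            return False
        data = blobs.get(parts[1])
        if data is None or compute_hash("b", data) != parts[1]:
            return False
    return True
```

Changing any blob byte, or any character of the descriptor, makes the function return `False`.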

Integration with Action Merkle Tree

File attachments extend the action's Merkle tree:

Action (a1~8kR...)
  ├─ Signed by user (ES384)
  ├─ Content-addressed (SHA256 of JWT)
  └─ Attachments: [f1~Qo2...]
       └─ File (f1~Qo2...)
            ├─ Content-addressed (SHA256 of descriptor)
            └─ Descriptor: "d1~tn:b1~abc...,sd:b1~def..."
                 ├─ Blob tn (b1~abc...)
                 │   └─ Content-addressed (SHA256 of blob)
                 ├─ Blob sd (b1~def...)
                 │   └─ Content-addressed (SHA256 of blob)
                 └─ Blob md (b1~ghi...)
                     └─ Content-addressed (SHA256 of blob)

Benefits:

Entire tree is cryptographically verifiable
Cannot modify image without changing all parent hashes
Deduplication: same image = same file_id
Federation: remote instances can verify integrity

See Content-Addressing & Merkle Trees for how file content-addressing integrates with the action system.

Image Processing Pipeline

Upload Flow

When a client uploads an image:

Client Request

POST /api/files/image/profile-picture.jpg
Authorization: Bearer <access_token>
Content-Type: image/jpeg
Content-Length: 2458624

<binary image data>

Dimension Extraction

Extract image dimensions to determine which variants to generate:

img = load_image_from_memory(data)
(width, height) = img.dimensions()
max_dim = max(width, height)

if max_dim >= 3840:
    variants = ["tn", "sd", "md", "hd", "xd"]
elif max_dim >= 1920:
    variants = ["tn", "sd", "md", "hd"]
elif max_dim >= 1280:
    variants = ["tn", "sd", "md"]
else:
    variants = ["tn", "sd"]

FileIdGeneratorTask

Create a task to generate the content-addressed ID:

task = FileIdGeneratorTask(
    tn_id,
    temp_file_path="/tmp/upload-abc123",
    original_filename="profile-picture.jpg"
)

task_id = scheduler.schedule(task)

ImageResizerTask (Multiple)

For each variant, create a resize task:

for variant in variants:
    resize_task = ImageResizerTask(
        tn_id,
        source_file_id=original_id,  # ID produced by the FileIdGeneratorTask
        variant=variant,
        target_dimensions=get_variant_dimensions(variant),
        format="avif",  # Primary format
        quality=85,
        dependencies=[task_id]  # Wait for the FileIdGeneratorTask scheduled above
    )

    scheduler.schedule(resize_task)

Hash Computation

FileIdGeneratorTask computes SHA256 hash:

file_id = compute_content_hash("f", file_contents)
# See merkle-tree.md for hash computation details

Blob Storage

Store original in BlobAdapter:

blob_adapter.create_blob_stream(tn_id, file_id, file_stream)

Variant Generation

Each ImageResizerTask runs in worker pool (CPU-intensive):

# Execute in worker pool
img = load_image(source_path)

# Resize with Lanczos3 filter (high quality)
resized = img.resize(target_width, target_height, filter=Lanczos3)

# Encode to AVIF
buffer = encode_avif(resized, quality)

# Store variant
variant_id = compute_hash("b", buffer)  # Blob ID: b1~{hash}, as in Level 1 above
blob_adapter.create_blob(tn_id, variant_id, buffer)

Metadata Storage

Store file metadata with all variants:

file_metadata = FileMetadata(
    tn_id,
    file_id=descriptor_id,
    original_filename="profile-picture.jpg",
    mime_type="image/jpeg",
    size=original_size,
    variants=[
        Variant(name="tn", blob_id="b1~QoE...46w", format="avif",
                size=4096, width=200, height=200),
        Variant(name="sd", blob_id="b1~xyz...789", format="webp",
                size=32768, width=640, height=480),
        # ... more variants
    ],
    created_at=current_timestamp()
)

meta_adapter.create_file_metadata(tn_id, file_metadata)

Response

Return descriptor ID to client:

{
  "file_id": "d1~tn:b1~QoE...46w:f=avif:s=4096:r=128x96,sd:b1~xyz...789:...",
  "variants": [
    {"name": "tn", "format": "avif", "size": 4096, "dimensions": "128x96"},
    {"name": "sd", "format": "webp", "size": 8192, "dimensions": "640x480"}
  ],
  "processing": true
}

Complete Upload Flow Diagram

Client uploads image
  ↓
POST /api/files/image/filename.jpg
  ↓
Save to temp file
  ↓
Extract dimensions
  ↓
Determine variants to generate
  ↓
Create FileIdGeneratorTask
  ├─ Compute SHA256 hash
  ├─ Move to permanent storage (BlobAdapter)
  └─ Generate file_id
  ↓
Create ImageResizerTask (for each variant)
  ├─ Depends on FileIdGeneratorTask
  ├─ Load source image
  ├─ Resize with Lanczos3
  ├─ Encode to AVIF/WebP/JPEG
  ├─ Compute variant ID (SHA256)
  └─ Store in BlobAdapter
  ↓
Create file descriptor
  ├─ Collect all variant IDs
  ├─ Encode as descriptor
  └─ Store metadata in MetaAdapter
  ↓
Return descriptor ID to client

Download Flow

Client Request

GET /api/files/d1~...?variant=hd
Authorization: Bearer <access_token>

Server Processing

Parse Descriptor

variants = parse_file_descriptor(file_id)
# Returns list of VariantInfo

Select Best Variant

selected = select_best_variant(
    variants,
    requested_variant,   # "hd"
)

# Falls back if exact match not available:
# hd/avif → hd/webp → md/avif → md/webp → sd/avif → ...

Stream from BlobAdapter

stream = blob_adapter.read_blob_stream(tn_id, selected.file_id)

# Set response headers
response.headers["Content-Type"] = f"image/{selected.format.lower()}"
response.headers["X-Cloudillo-Variant"] = selected.file_id  # blob ID (b1~...) of the served variant
response.headers["X-Cloudillo-Descriptor"] = file_id        # the requested d1~ descriptor
response.headers["Content-Length"] = str(selected.size)
response.headers["Cache-Control"] = "public, max-age=31536000, immutable"

# Stream response
return stream_response(stream)

Response

HTTP/1.1 200 OK
Content-Type: image/avif
Content-Length: 16384
X-Cloudillo-Variant: b1~m8Z35EIa3prvb3bhjsVjdg9SG98xd0bkoWomOHQAwCM
X-Cloudillo-Descriptor: d1~tn:b1~xRAVuQtgBx_kLqZnoOSd5XqCK_aQolhq1XeXk73Zn8U:f=avif:s=1960:r=90x128,sd:b1~m8Z35EIa3prvb3bhjsVjdg9SG98xd0bkoWomOHQAwCM:f=avif:s=8137:r=256x364,orig:b1~5gU72rRGiaogZuYhJy853pBd6PsqjPOjS__Kim9-qE0:f=avif:s=15012:r=256x364
Cache-Control: public, max-age=31536000, immutable

<binary image data>

Note: Content-addressed files are immutable, so they can be cached indefinitely.

Metadata Structure

FileMetadata

Stored in MetaAdapter:

FileMetadata {
    tn_id: TnId
    file_id: String           # Descriptor ID
    original_filename: String
    mime_type: String
    size: u64                 # Original size
    width: Optional[u32]
    height: Optional[u32]
    variants: List[VariantInfo]
    created_at: i64
    owner: String             # Identity tag
    permissions: FilePermissions
}

VariantInfo {
    name: String              # "tn", "sd", "md", "hd", "xd"
    file_id: String           # Content-addressed ID
    format: String            # "avif", "webp", "jpeg", "png"
    size: u64                 # Bytes
    width: u32
    height: u32
}

FilePermissions {
    public_read: bool
    shared_with: List[String]  # Identity tags
}

File Presets

Concept

Presets define how files should be processed:

FilePreset:
    Image      # Auto-generate variants
    Video      # Future: transcode, thumbnails
    Document   # Future: preview generation
    Database   # RTDB database files
    Raw        # No processing, store as-is

Upload with Preset

POST /api/files/{preset}/{filename}

Examples:
POST /api/files/image/avatar.jpg      // Generate image variants
POST /api/files/raw/document.pdf      // Store as-is

Storage Organization

BlobAdapter Layout

{data_dir}/
├── blobs/
│   ├── {tn_id}/
│   │   ├── f1~QoE...46w           // Original file
│   │   ├── f1~xyz...789           // Variant 1
│   │   ├── f1~abc...123           // Variant 2
│   │   └── ...
│   └── {other_tn_id}/
│       └── ...

MetaAdapter (SQLite)

CREATE TABLE files (
    id INTEGER PRIMARY KEY,
    tn_id INTEGER NOT NULL,
    file_id TEXT NOT NULL,
    original_filename TEXT,
    mime_type TEXT,
    size INTEGER,
    width INTEGER,
    height INTEGER,
    variants TEXT,  -- JSON array
    created_at INTEGER,
    owner TEXT,
    permissions TEXT,  -- JSON object
    UNIQUE(tn_id, file_id)
);

CREATE INDEX idx_files_owner ON files(owner);
CREATE INDEX idx_files_created ON files(created_at);
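To show how the `variants` and `permissions` JSON columns and the `UNIQUE(tn_id, file_id)` constraint fit together, here is a Python sketch using the standard `sqlite3` module. The helper names are hypothetical, and treating a repeated upload as a no-op via `INSERT OR IGNORE` is an assumption about deduplication, not confirmed MetaAdapter behavior.

```python
import json
import sqlite3

SCHEMA = """
CREATE TABLE files (
    id INTEGER PRIMARY KEY,
    tn_id INTEGER NOT NULL,
    file_id TEXT NOT NULL,
    original_filename TEXT,
    mime_type TEXT,
    size INTEGER,
    width INTEGER,
    height INTEGER,
    variants TEXT,     -- JSON array
    created_at INTEGER,
    owner TEXT,
    permissions TEXT,  -- JSON object
    UNIQUE(tn_id, file_id)
);
"""


def insert_file(conn: sqlite3.Connection, tn_id: int, file_id: str,
                variants: list[dict], owner: str, permissions: dict,
                created_at: int) -> bool:
    """Insert a row; return False if (tn_id, file_id) already exists."""
    cur = conn.execute(
        "INSERT OR IGNORE INTO files "
        "(tn_id, file_id, variants, owner, permissions, created_at) "
        "VALUES (?, ?, ?, ?, ?, ?)",
        (tn_id, file_id, json.dumps(variants), owner,
         json.dumps(permissions), created_at),
    )
    conn.commit()
    return cur.rowcount == 1  # 0 when the UNIQUE constraint ignored the row


def get_variants(conn: sqlite3.Connection, tn_id: int, file_id: str):
    """Return the decoded variants list, or None if the file is unknown."""
    row = conn.execute(
        "SELECT variants FROM files WHERE tn_id = ? AND file_id = ?",
        (tn_id, file_id),
    ).fetchone()
    return json.loads(row[0]) if row else None
```

Because uniqueness is scoped to `tn_id`, the same file ID can be stored once per tenant.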

Performance Considerations

Worker Pool Usage

Image processing is CPU-intensive, so it runs in the worker pool:

# Priority levels
Priority.High   → User-facing operations (thumbnail)
Priority.Medium → Background tasks (other image variants)
Priority.Low    → Longer operations (video upload)

Parallel Processing

Multiple variants can be generated in parallel:

# Create all resize tasks at once
task_ids = []

for variant in ["tn", "sd", "md", "hd"]:
    task_id = scheduler.schedule(ImageResizerTask(
        variant=variant,
        # ...
    ))

    task_ids.append(task_id)

# Wait for all to complete
scheduler.wait_all(task_ids)

Caching Strategy

Content-addressed files are immutable:

Cache-Control: public, max-age=31536000, immutable

Browsers can cache these files indefinitely
CDNs can cache them indefinitely
No cache invalidation is needed

See Also