Blob Storage

Cloudillo’s blob storage uses content-addressed storage for immutable binary data (files, images, videos). Content addressing provides data integrity and deduplication, and automatic variant generation for images enables efficient delivery across different use cases.

Content-Addressed Storage

File Identifier Format

Cloudillo uses multiple identifier types in its content-addressing system:

{prefix}{version}~{base64url_hash}

Components:

  • {prefix}: Resource type indicator (a, f, b, d)
  • {version}: Hash algorithm version (currently 1 = SHA-256)
  • ~: Separator
  • {base64url_hash}: Base64url-encoded hash (43 characters, no padding)
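
The components above can be assembled in a few lines. The following is a minimal sketch, assuming the hash is taken over the raw input bytes and encoded with URL-safe Base64 without padding; `compute_id` is a hypothetical helper name, not a Cloudillo API.

```python
import base64
import hashlib


def compute_id(prefix: str, data: bytes, version: int = 1) -> str:
    """Return '{prefix}{version}~{base64url_hash}' for the given bytes."""
    if version != 1:
        raise ValueError("only version 1 (SHA-256) is sketched here")
    digest = hashlib.sha256(data).digest()  # 32 bytes
    # URL-safe Base64 with the '=' padding stripped: 32 bytes -> 43 characters
    encoded = base64.urlsafe_b64encode(digest).rstrip(b"=").decode("ascii")
    return f"{prefix}{version}~{encoded}"


blob_id = compute_id("b", b"example blob bytes")
print(blob_id, len(blob_id.split("~", 1)[1]))
```

Because the ID depends only on the bytes, identical content always maps to the same ID, which is what makes deduplication possible.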

Identifier Types

| Prefix | Resource Type | Hash Input | Example |
|--------|---------------|------------|---------|
| b1~ | Blob | Blob bytes (raw image/video data) | b1~abc123def456... |
| f1~ | File | File descriptor string | f1~QoEYeG8TJZ2HTGh... |
| d2, | Descriptor | (not a hash, the encoded format itself) | d2,vis.tn:b1~abc:f=avif:... |
| a1~ | Action | Complete JWT token | a1~8kR3mN9pQ2vL... |

Important: d2, is not a content-addressed identifier—it’s the actual encoded descriptor string. The file ID (f1~) is the hash of this descriptor.

Examples

Blob ID:       b1~QoEYeG8TJZ2HTGhVlrtTDBpvBGOp6gfGhq4QmD6Z46w
File ID:       f1~m8Z35EIa3prvb3bhjsVjdg9SG98xd0bkoWomOHQAwCM
Descriptor:    d2,vis.tn:b1~xRAVuQtgBx_kLqZnoOSd5XqCK_aQolhq1XeXk73Zn8U:f=avif:s=1960:r=90x128
Action ID:     a1~8kR3mN9pQ2vL6xWpYzT4BjN5FqGxCmK9RsH2VwLnD8P

All file and blob IDs use SHA-256 content-addressing. See Content-Addressing & Merkle Trees for hash computation details.
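
Going the other way, a client may want to check that a string is a well-formed content-addressed ID before using it. This is a sketch only: `parse_id`, `ParsedId`, and the regular expression are hypothetical and are derived from the format described above (prefix letter, numeric version, `~`, 43 base64url characters). Descriptors (`d2,`) are deliberately rejected because they are not hashes.

```python
import re
from typing import NamedTuple

# Prefix letter, numeric version, '~', then 43 base64url characters.
ID_PATTERN = re.compile(r"([abf])(\d+)~([A-Za-z0-9_-]{43})")

KINDS = {"a": "action", "b": "blob", "f": "file"}


class ParsedId(NamedTuple):
    kind: str
    version: int
    hash_b64: str


def parse_id(identifier: str) -> ParsedId:
    """Split a content-addressed ID into its parts, or raise ValueError."""
    match = ID_PATTERN.fullmatch(identifier)
    if match is None:
        raise ValueError(f"not a content-addressed ID: {identifier!r}")
    prefix, version, hash_b64 = match.groups()
    return ParsedId(KINDS[prefix], int(version), hash_b64)


parsed = parse_id("b1~QoEYeG8TJZ2HTGhVlrtTDBpvBGOp6gfGhq4QmD6Z46w")
```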

File Types

Cloudillo supports four file types, each handled by different adapters based on mutability and use case:

| Type | Adapter | Mutability | Description |
|------|---------|------------|-------------|
| BLOB | BlobAdapter | Immutable | Binary content (images, videos, documents) |
| CRDT | CrdtAdapter | Mutable | Collaborative documents (Yjs-based real-time editing) |
| RTDB | RtdbAdapter | Mutable | Real-time database files for app state |
| FLDR | MetaAdapter | Mutable | Folder/directory metadata |

File Type Selection

Files are created via different API endpoints based on their type:

| Endpoint | File Type | Use Case |
|----------|-----------|----------|
| POST /api/files/{preset}/{file_name} | BLOB | File uploads with preset-based variant generation |
| POST /api/files | CRDT/RTDB/FLDR | Metadata-only creation for mutable file types |

Available presets for BLOB uploads: default, profile-picture, cover, high_quality, mobile, archive, podcast, video, orig-only, thumbnail-only, apkg

Per-File Access Control

Each file has independent access control:

| Field | Values | Description |
|-------|--------|-------------|
| Visibility | P/V/F/C/null | Who can discover this file |
| Access Level | R/W | Read-only vs read-write access |

See Access Control for detailed permission handling.

File Variants

Concept

A single uploaded image automatically generates multiple variants optimized for different use cases:

  • pf (profile): Profile picture icon (~80px)
  • tn (thumbnail): Small preview (~256px)
  • sd (standard definition): Mobile/low bandwidth (~720px)
  • md (medium definition): Desktop viewing (~1280px)
  • hd (high definition): High quality display (~1920px)
  • xd (extra definition): 4K/maximum quality (~3840px)

File Descriptor Encoding

A file descriptor encodes all available variants in a compact format.

File Descriptor Format Specification

Format

d2,{class}.{variant}:{blob_id}:f={format}:s={size}:r={width}x{height}[:{optional}];...

Components

  • d2, - Descriptor prefix (version 2)
  • {class} - Media class:
    • vis - Visual (images: jpeg, png, webp, avif)
    • vid - Video (mp4/h264)
    • aud - Audio (opus, mp3)
    • doc - Documents (pdf)
    • raw - Original unprocessed file
  • {variant} - Quality tier: pf, tn, sd, md, hd, xd, or orig
  • {blob_id} - Content-addressed ID of the blob (b1~...)
  • f={format} - Format: avif, webp, jpeg, png, mp4, opus, pdf
  • s={size} - File size in bytes (integer, no separators)
  • r={width}x{height} - Resolution in pixels (width × height)
  • ; - Semicolon separator between variants (no spaces)

Optional Fields

For video, audio, and document files:

  • dur={seconds} - Duration in seconds (floating point, video/audio only)
  • br={kbps} - Bitrate in kbps (integer, video/audio only)
  • pg={count} - Page count (integer, documents only)

Example

d2,vis.tn:b1~abc123:f=avif:s=4096:r=150x150;vis.sd:b1~def456:f=avif:s=32768:r=640x480

This descriptor encodes two variants:

  • Thumbnail: AVIF format, 4096 bytes, 150×150 pixels, blob ID b1~abc123
  • Standard: AVIF format, 32768 bytes, 640×480 pixels, blob ID b1~def456

Video Example

d2,vis.tn:b1~abc:f=avif:s=4096:r=150x84;vid.sd:b1~def:f=mp4:s=5242880:r=720x404:dur=120.5:br=350;vid.hd:b1~ghi:f=mp4:s=20971520:r=1920x1080:dur=120.5:br=1400

This descriptor includes:

  • Thumbnail: AVIF image preview
  • SD Video: 720×404 MP4, 120.5 seconds, 350 kbps
  • HD Video: 1080p MP4, 120.5 seconds, 1400 kbps

Parsing Rules

  1. Check prefix: Verify descriptor starts with d2,
  2. Split by semicolon (;): Get individual variant entries
  3. For each variant, split by colon (:) to get components:
    • Component [0] = class.variant (vis.tn, vis.sd, vid.hd)
    • Component [1] = blob_id (b1~...)
    • Components [2..] = key=value pairs
  4. Parse key=value pairs:
    • f={format} → Format string
    • s={size} → Parse as u64 (bytes)
    • r={width}x{height} → Split by x, parse as u32 × u32
    • dur={seconds} → Parse as f64 (optional)
    • br={kbps} → Parse as u32 (optional)
    • pg={count} → Parse as u32 (optional)

Parsing logic: split by semicolons for variants, then by colons for fields, then parse key=value pairs.
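
The parsing rules translate fairly directly into code. The sketch below is one possible implementation, not the server's parser: the `Variant` dataclass and `parse_descriptor` are hypothetical names, and silently ignoring unknown keys is an assumption made here for forward compatibility.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Variant:
    media_class: str               # "vis", "vid", "aud", "doc", "raw"
    variant: str                   # "tn", "sd", ..., "orig"
    blob_id: str
    format: Optional[str] = None
    size: Optional[int] = None     # bytes
    width: Optional[int] = None
    height: Optional[int] = None
    duration: Optional[float] = None  # seconds
    bitrate: Optional[int] = None     # kbps
    pages: Optional[int] = None


def parse_descriptor(descriptor: str) -> List[Variant]:
    # Rule 1: check the prefix.
    if not descriptor.startswith("d2,"):
        raise ValueError("descriptor must start with 'd2,'")
    variants = []
    # Rule 2: one entry per semicolon-separated variant.
    for entry in descriptor[3:].split(";"):
        # Rule 3: colon-separated components.
        parts = entry.split(":")
        if len(parts) < 2:
            raise ValueError(f"malformed variant entry: {entry!r}")
        media_class, _, name = parts[0].partition(".")
        item = Variant(media_class, name, parts[1])
        # Rule 4: key=value pairs.
        for pair in parts[2:]:
            key, _, value = pair.partition("=")
            if key == "f":
                item.format = value
            elif key == "s":
                item.size = int(value)
            elif key == "r":
                w, _, h = value.partition("x")
                item.width, item.height = int(w), int(h)
            elif key == "dur":
                item.duration = float(value)
            elif key == "br":
                item.bitrate = int(value)
            elif key == "pg":
                item.pages = int(value)
            # Unknown keys are ignored in this sketch (assumption).
        variants.append(item)
    return variants


parsed = parse_descriptor(
    "d2,vis.tn:b1~abc123:f=avif:s=4096:r=150x150;"
    "vis.sd:b1~def456:f=avif:s=32768:r=640x480"
)
```

Splitting on `:` and `;` is safe because base64url blob IDs never contain either character.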

Variant Size Classes - Exact Specifications

Cloudillo generates image variants at specific size targets to optimize bandwidth and storage:

| Quality | Code | Max Dimension | Use Case |
|---------|------|---------------|----------|
| Profile | pf | 80px | Profile picture icons |
| Thumbnail | tn | 256px | List views, previews, avatars |
| Standard | sd | 720px | Mobile devices, low bandwidth |
| Medium | md | 1280px | Desktop viewing |
| High | hd | 1920px | High quality display |
| Extra | xd | 3840px | 4K displays, maximum quality |
| Original | orig | - | Unprocessed source file |

Generation Rules

Which variants are generated depends on the preset configuration. The default preset generates: tn, sd, md, hd. The high_quality preset adds xd. Variants larger than the original image are automatically skipped (smaller originals are never upscaled).

Properties:

  • Each variant maintains the original aspect ratio
  • Uses Lanczos3 filter for high-quality downscaling
  • Maximum dimension constraint prevents oversizing
  • Smaller originals don’t get upscaled
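
The aspect-ratio and no-upscaling properties amount to scaling the longer side down to the variant's maximum dimension. Here is a minimal sketch of that arithmetic; `fit_within` is a hypothetical helper, and rounding to the nearest pixel is an assumption (the actual resizer may round differently).

```python
from typing import Tuple


def fit_within(width: int, height: int, max_dim: int) -> Tuple[int, int]:
    """Scale (width, height) so the longer side is at most max_dim."""
    longest = max(width, height)
    if longest <= max_dim:
        return width, height  # smaller originals are never upscaled
    # Multiply before dividing so exact ratios stay exact.
    new_w = max(1, round(width * max_dim / longest))
    new_h = max(1, round(height * max_dim / longest))
    return new_w, new_h


# A 3024x4032 portrait photo fitted to the sd limit of 720px.
sd_size = fit_within(3024, 4032, 720)
```

For the 3024×4032 photo this gives 540×720 for sd, and the same call with 1280 or 1920 gives the md and hd sizes.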

Variant Selection

Clients request a specific variant:

GET /api/files/f1~Qo2E3G8TJZ...?variant=hd

Response: Returns HD variant if available, otherwise falls back to smaller variants.

Automatic Fallback

If the requested variant doesn’t exist, the server returns the best available:

  1. Try requested variant (e.g., hd)
  2. Fall back to next smaller (e.g., md)
  3. Continue until variant found
  4. If no variant at or below the requested size exists, return the smallest available variant

Fallback order: xd → hd → md → sd → tn
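
The fallback steps can be sketched as a short selection function. This is illustrative only: `select_variant` is a hypothetical name, and the handling of step 4 (when nothing at or below the requested tier exists, take the smallest tier that does) is one reading of the list above rather than confirmed server behavior. Tiers outside the fallback order, such as pf or orig, are not handled here.

```python
from typing import Iterable, Optional

# Largest to smallest, matching the documented fallback order.
ORDER = ["xd", "hd", "md", "sd", "tn"]


def select_variant(requested: str, available: Iterable[str]) -> Optional[str]:
    have = set(available)
    start = ORDER.index(requested)  # raises ValueError for unknown tiers
    # Steps 1-3: the requested tier, then each smaller tier in turn.
    for name in ORDER[start:]:
        if name in have:
            return name
    # Step 4 (interpretation): nothing at or below the request exists,
    # so walk upward and return the smallest tier that is available.
    for name in reversed(ORDER[:start]):
        if name in have:
            return name
    return None


choice = select_variant("hd", ["tn", "sd", "md"])
```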

Content-Addressing Flow

File storage uses a three-level content-addressing hierarchy:

Level 1: Blob Storage

Upload image → Save as blob → Compute SHA256 of blob bytes → Store blob with ID: b1~{hash}

blob_data = read_file("thumbnail.avif")
blob_id = compute_hash("b", blob_data)
// Result: "b1~abc123..." (thumbnail blob ID)

Example: b1~abc123... identifies the thumbnail AVIF blob

See Content-Addressing & Merkle Trees for hash computation details.

Level 2: Variant Collection

Generate all variants (tn, sd, md, hd) → Each variant gets its own blob ID (b1~...) → Collect all variant metadata → Create descriptor string encoding all variants

variants = [
    { class: "vis.tn", blob_id: "b1~abc123", format: "avif", size: 4096, width: 150, height: 150 },
    { class: "vis.sd", blob_id: "b1~def456", format: "avif", size: 32768, width: 640, height: 480 },
    { class: "vis.md", blob_id: "b1~ghi789", format: "avif", size: 262144, width: 1280, height: 960 },
]

descriptor = build_descriptor(variants)
// Result: "d2,vis.tn:b1~abc123:f=avif:s=4096:r=150x150;vis.sd:b1~def456:f=avif:s=32768:r=640x480;vis.md:b1~ghi789:f=avif:s=262144:r=1280x960"
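
For illustration, `build_descriptor` could look like the sketch below. The field order follows the format specification (class.variant, blob ID, then f, s, r, then optional fields); the dictionary keys and the order of the optional fields are assumptions. Because the file ID is a hash of this string, any real implementation must produce it deterministically.

```python
from typing import Any, Dict, List


def build_descriptor(variants: List[Dict[str, Any]]) -> str:
    entries = []
    for v in variants:
        fields = [
            v["class"],                        # e.g. "vis.tn"
            v["blob_id"],
            f"f={v['format']}",
            f"s={v['size']}",
            f"r={v['width']}x{v['height']}",
        ]
        # Optional fields are appended only when present (assumed order).
        for key in ("dur", "br", "pg"):
            if key in v:
                fields.append(f"{key}={v[key]}")
        entries.append(":".join(fields))
    # No spaces anywhere: entries are joined with a bare semicolon.
    return "d2," + ";".join(entries)


example = build_descriptor([
    {"class": "vis.tn", "blob_id": "b1~abc123", "format": "avif",
     "size": 4096, "width": 150, "height": 150},
    {"class": "vis.sd", "blob_id": "b1~def456", "format": "avif",
     "size": 32768, "width": 640, "height": 480},
])
```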

Level 3: File Descriptor

Build descriptor → Compute SHA256 of descriptor string → Final file ID: f1~{hash} → This file ID goes into action attachments

descriptor = "d2,vis.tn:b1~abc:f=avif:s=4096:r=150x150;vis.sd:b1~def:f=avif:s=32768:r=640x480"
file_id = compute_hash("f", descriptor.as_bytes())
// Result: "f1~Qo2E3G8TJZ..." (file ID)

Example Complete Flow

1. User uploads photo.jpg (3MB, 3024x4032px)

2. System generates variants:
   vis.tn:  150x200px → 4KB   → b1~abc123
   vis.sd:  540x720px → 32KB  → b1~def456
   vis.md:  960x1280px → 256KB → b1~ghi789
   vis.hd:  1440x1920px → 1MB → b1~jkl012

3. System builds descriptor:
   "d2,vis.tn:b1~abc123:f=avif:s=4096:r=150x200;
       vis.sd:b1~def456:f=avif:s=32768:r=540x720;
       vis.md:b1~ghi789:f=avif:s=262144:r=960x1280;
       vis.hd:b1~jkl012:f=avif:s=1048576:r=1440x1920"

4. System hashes descriptor:
   file_id = f1~Qo2E3G8TJZ2... = SHA256(descriptor)

5. Action references file:
   POST action attachments = ["f1~Qo2E3G8TJZ2..."]

6. Anyone can verify:
   - Download all variants
   - Verify each blob_id = SHA256(blob)
   - Rebuild descriptor
   - Verify file_id = SHA256(descriptor)
   - Cryptographic proof established ✓

File attachments integrate into Cloudillo’s merkle tree structure. See Content-Addressing & Merkle Trees for how files fit into the verification chain.
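
The verification in step 6 can be sketched as follows. This is a simplified illustration under stated assumptions: IDs are SHA-256 digests in unpadded base64url as described earlier, `content_id` and `verify_file` are hypothetical helpers, and the descriptor is checked as received rather than rebuilt from the variant metadata.

```python
import base64
import hashlib
from typing import Dict


def content_id(prefix: str, data: bytes) -> str:
    digest = hashlib.sha256(data).digest()
    encoded = base64.urlsafe_b64encode(digest).rstrip(b"=").decode("ascii")
    return f"{prefix}1~{encoded}"


def verify_file(file_id: str, descriptor: str, blobs: Dict[str, bytes]) -> bool:
    """Check the file ID against the descriptor and each blob against its ID."""
    # The file ID must be the hash of the descriptor string.
    if content_id("f", descriptor.encode("utf-8")) != file_id:
        return False
    # Every blob referenced by the descriptor must hash to its own ID.
    for entry in descriptor[len("d2,"):].split(";"):
        blob_id = entry.split(":")[1]
        data = blobs.get(blob_id)
        if data is None or content_id("b", data) != blob_id:
            return False
    return True


# Placeholder bytes stand in for real image data.
tn, sd = b"thumbnail bytes", b"standard bytes"
tn_id, sd_id = content_id("b", tn), content_id("b", sd)
descriptor = (f"d2,vis.tn:{tn_id}:f=avif:s={len(tn)}:r=150x200;"
              f"vis.sd:{sd_id}:f=avif:s={len(sd)}:r=540x720")
file_id = content_id("f", descriptor.encode("utf-8"))
ok = verify_file(file_id, descriptor, {tn_id: tn, sd_id: sd})
```

Changing a single byte of any blob or of the descriptor makes the check fail, which is the property the verification chain relies on.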

Image Processing Pipeline

Upload Flow

When a client uploads an image:

  1. Client Request

POST /api/files/default/profile-picture.jpg
Authorization: Bearer <access_token>
Content-Type: image/jpeg
Content-Length: 2458624

<binary image data>

  2. Dimension Extraction

Extract image dimensions and determine which variants to generate based on the preset:

img = load_image_from_memory(data)
(width, height) = img.dimensions()
max_dim = max(width, height)

# Variants come from the preset configuration
# Default preset: ["vis.tn", "vis.sd", "vis.md", "vis.hd"]
# Variants whose target size exceeds the original by more than 10% are skipped
variants = preset.image_variants.filter(|v| v.max_dim <= max_dim * 1.10)

The intermediate steps (task scheduling, hash computation, blob storage, variant generation, and metadata storage) are shown in the Complete Upload Flow Diagram below.
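
As a concrete version of the filter in the pseudocode, the sketch below keeps a variant only when its maximum dimension is within 10% of the original's longer side. The names `variants_to_generate` and `MAX_DIM` are hypothetical; the size limits are taken from the variant size table and the default preset list from the generation rules.

```python
from typing import Dict, List, Sequence

# Maximum dimension per variant, from the variant size table.
MAX_DIM: Dict[str, int] = {"tn": 256, "sd": 720, "md": 1280, "hd": 1920, "xd": 3840}


def variants_to_generate(
    width: int,
    height: int,
    preset: Sequence[str] = ("tn", "sd", "md", "hd"),  # default preset
    tolerance: float = 1.10,
) -> List[str]:
    """Return the preset's variants that fit the original image."""
    longest = max(width, height)
    return [name for name in preset if MAX_DIM[name] <= longest * tolerance]


# A 1200x800 original keeps md (1280 <= 1320) but drops hd.
selected = variants_to_generate(1200, 800)
```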

  3. Response

Return the file ID and descriptor to the client:

{
  "file_id": "f1~QoE...46w",
  "descriptor": "d2,vis.tn:b1~QoE...46w:f=avif:s=4096:r=128x96;vis.sd:b1~xyz...789:f=avif:s=8192:r=640x480",
  "variants": [
    {"name": "vis.tn", "format": "avif", "size": 4096, "dimensions": "128x96"},
    {"name": "vis.sd", "format": "avif", "size": 8192, "dimensions": "640x480"}
  ],
  "processing": true
}

Complete Upload Flow Diagram

Client uploads image
  ↓
POST /api/files/{preset}/filename.jpg
  ↓
Save to temp file
  ↓
Extract dimensions
  ↓
Determine variants to generate
  ↓
Create FileIdGeneratorTask
  ├─ Compute SHA256 hash
  ├─ Move to permanent storage (BlobAdapter)
  └─ Generate file_id
  ↓
Create ImageResizerTask (for each variant)
  ├─ Depends on FileIdGeneratorTask
  ├─ Load source image
  ├─ Resize with Lanczos3
  ├─ Encode to AVIF/WebP/JPEG
  ├─ Compute variant ID (SHA256)
  └─ Store in BlobAdapter
  ↓
Create file descriptor
  ├─ Collect all variant IDs
  ├─ Encode as descriptor
  └─ Store metadata in MetaAdapter
  ↓
Return descriptor ID to client

Download Flow

Client Request

GET /api/files/f1~...?variant=hd
Authorization: Bearer <access_token>

Server Processing

  1. Parse Descriptor

variants = parse_file_descriptor(file_id)
# Returns list of VariantInfo

  2. Select Best Variant

selected = select_best_variant(
    variants,
    requested_variant,   # "hd"
)

# Falls back if exact match not available:
# hd/avif → hd/webp → md/avif → md/webp → sd/avif → ...

  3. Stream from BlobAdapter

stream = blob_adapter.read_blob_stream(tn_id, selected.file_id)

# Set response headers
response.headers["Content-Type"] = f"image/{selected.format}"
response.headers["X-Cloudillo-Variant"] = selected.file_id  # blob ID of the selected variant
response.headers["X-Cloudillo-Descriptor"] = descriptor
response.headers["Content-Length"] = selected.size

# Stream response
return stream_response(stream)

Response

HTTP/1.1 200 OK
Content-Type: image/avif
Content-Length: 8137
X-Cloudillo-Variant: b1~m8Z35EIa3prvb3bhjsVjdg9SG98xd0bkoWomOHQAwCM
X-Cloudillo-Descriptor: d2,vis.tn:b1~xRAVuQtgBx_kLqZnoOSd5XqCK_aQolhq1XeXk73Zn8U:f=avif:s=1960:r=90x128;vis.sd:b1~m8Z35EIa3prvb3bhjsVjdg9SG98xd0bkoWomOHQAwCM:f=avif:s=8137:r=256x364;vis.orig:b1~5gU72rRGiaogZuYhJy853pBd6PsqjPOjS__Kim9-qE0:f=avif:s=15012:r=256x364
Cache-Control: public, max-age=31536000, immutable

<binary image data>

Note: Content-addressed files are immutable, so responses can be cached indefinitely.
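
The response headers shown above could be assembled by a helper along these lines. This is a sketch, not the server code: `response_headers` and the `MIME_TYPES` table are hypothetical, and the lookup table is an assumption used here in place of the image/{format} string from the pseudocode so that non-image formats also get a sensible type.

```python
from typing import Dict

# Formats with well-known MIME types; anything else falls back to a
# generic binary type in this sketch.
MIME_TYPES = {
    "avif": "image/avif",
    "webp": "image/webp",
    "jpeg": "image/jpeg",
    "png": "image/png",
    "mp4": "video/mp4",
    "pdf": "application/pdf",
}


def response_headers(fmt: str, size: int, blob_id: str, descriptor: str) -> Dict[str, str]:
    return {
        "Content-Type": MIME_TYPES.get(fmt, "application/octet-stream"),
        "Content-Length": str(size),
        "X-Cloudillo-Variant": blob_id,
        "X-Cloudillo-Descriptor": descriptor,
        # Safe only because the blob behind a content-addressed ID never changes.
        "Cache-Control": "public, max-age=31536000, immutable",
    }


headers = response_headers(
    "avif", 8137,
    "b1~m8Z35EIa3prvb3bhjsVjdg9SG98xd0bkoWomOHQAwCM",
    "d2,vis.sd:b1~m8Z35EIa3prvb3bhjsVjdg9SG98xd0bkoWomOHQAwCM:f=avif:s=8137:r=256x364",
)
```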

Metadata Structure

FileMetadata

Stored in MetaAdapter:

FileMetadata {
    tn_id: TnId
    file_id: String           # Descriptor ID
    original_filename: String
    mime_type: String
    size: u64                 # Original size
    width: Optional[u32]
    height: Optional[u32]
    variants: List[VariantInfo]
    created_at: i64
    owner: String             # Identity tag
    permissions: FilePermissions
}

VariantInfo {
    name: String              # "tn", "sd", "md", "hd", "xd"
    file_id: String           # Content-addressed ID
    format: String            # "avif", "webp", "jpeg", "png"
    size: u64                 # Bytes
    width: u32
    height: u32
}

FilePermissions {
    public_read: bool
    shared_with: List[String]  # Identity tags
}

File Presets

Concept

Presets define how files should be processed:

FilePreset:
    Image      # Auto-generate variants
    Video      # Future: transcode, thumbnails
    Document   # Future: preview generation
    Database   # RTDB database files
    Raw        # No processing, store as-is

Upload with Preset

POST /api/files/{preset}/{filename}

Examples:
POST /api/files/default/avatar.jpg       // Generate default image variants
POST /api/files/archive/document.pdf     // Store with minimal processing

Storage Organization

BlobAdapter Layout

{data_dir}/
├── blobs/
│   ├── {tn_id}/
│   │   ├── b1~QoE...46w           // Original file
│   │   ├── b1~xyz...789           // Variant 1
│   │   ├── b1~abc...123           // Variant 2
│   │   └── ...
│   └── {other_tn_id}/
│       └── ...

MetaAdapter (SQLite)

CREATE TABLE files (
    id INTEGER PRIMARY KEY,
    tn_id INTEGER NOT NULL,
    file_id TEXT NOT NULL,
    original_filename TEXT,
    mime_type TEXT,
    size INTEGER,
    width INTEGER,
    height INTEGER,
    variants TEXT,  -- JSON array
    created_at INTEGER,
    owner TEXT,
    permissions TEXT,  -- JSON object
    UNIQUE(tn_id, file_id)
);

CREATE INDEX idx_files_owner ON files(owner);
CREATE INDEX idx_files_created ON files(created_at);
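
To see how the schema behaves, the sketch below loads the `files` table into an in-memory SQLite database with Python's standard `sqlite3` module, stores one row with JSON-encoded `variants` and `permissions`, and shows the `UNIQUE(tn_id, file_id)` constraint rejecting a duplicate. All row values (IDs, filename, owner, timestamp) are made up for illustration, and the indexes are omitted for brevity.

```python
import json
import sqlite3

SCHEMA = """
CREATE TABLE files (
    id INTEGER PRIMARY KEY,
    tn_id INTEGER NOT NULL,
    file_id TEXT NOT NULL,
    original_filename TEXT,
    mime_type TEXT,
    size INTEGER,
    width INTEGER,
    height INTEGER,
    variants TEXT,
    created_at INTEGER,
    owner TEXT,
    permissions TEXT,
    UNIQUE(tn_id, file_id)
);
"""

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)

# Hypothetical sample row; variants and permissions are stored as JSON text.
variants = [{"name": "tn", "file_id": "b1~abc123", "format": "avif",
             "size": 4096, "width": 150, "height": 200}]
row = (1, "f1~example", "photo.jpg", "image/jpeg", 3145728, 3024, 4032,
       json.dumps(variants), 1700000000, "alice.example",
       json.dumps({"public_read": True, "shared_with": []}))
conn.execute(
    "INSERT INTO files (tn_id, file_id, original_filename, mime_type, size,"
    " width, height, variants, created_at, owner, permissions)"
    " VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?)",
    row,
)

# A second row with the same (tn_id, file_id) violates the UNIQUE constraint.
try:
    conn.execute("INSERT INTO files (tn_id, file_id) VALUES (1, 'f1~example')")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

stored = conn.execute(
    "SELECT variants FROM files WHERE tn_id = ? AND file_id = ?",
    (1, "f1~example"),
).fetchone()[0]
loaded = json.loads(stored)
```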

See Also