Blob Storage

Cloudillo’s blob storage is a content-addressed store for immutable binary data (files, images, videos). Content addressing provides data integrity and deduplication, while automatic variant generation for images enables efficient delivery across different use cases.

Content-Addressed Storage

Concept

Files are identified by the SHA-256 hash of their content, making identifiers:

  • Immutable: Content cannot change without changing the ID
  • Verifiable: Recipients can verify integrity
  • Deduplicable: Identical content gets the same ID
  • Tamper-proof: Any modification is immediately detectable

File Identifier Format

Cloudillo uses multiple identifier types in its content-addressing system:

{prefix}{version}~{base64url_hash}

Components:

  • {prefix}: Resource type indicator (a, f, b, d)
  • {version}: Hash algorithm version (currently 1 = SHA-256)
  • ~: Separator
  • {base64url_hash}: Base64url-encoded hash (43 characters, no padding)

Identifier Types

Prefix  Resource Type  Hash Input                                Example
b1~     Blob           Blob bytes (raw image/video data)         b1~abc123def456...
f1~     File           File descriptor string                    f1~QoEYeG8TJZ2HTGh...
d1~     Descriptor     (not a hash; the encoded format itself)   d1~tn:b1~abc:f=AVIF:...
a1~     Action         Complete JWT token                        a1~8kR3mN9pQ2vL...

Important: d1~ is not a content-addressed identifier; it is the actual encoded descriptor string. The file ID (f1~) is the hash of this descriptor.

Examples

Blob ID:       b1~QoEYeG8TJZ2HTGhVlrtTDBpvBGOp6gfGhq4QmD6Z46w
File ID:       f1~m8Z35EIa3prvb3bhjsVjdg9SG98xd0bkoWomOHQAwCM
Descriptor:    d1~tn:b1~xRAVuQtgBx_kLqZnoOSd5XqCK_aQolhq1XeXk73Zn8U:f=avif:s=1960:r=90x128
Action ID:     a1~8kR3mN9pQ2vL6xWpYzT4BjN5FqGxCmK9RsH2VwLnD8P

All file and blob IDs use SHA-256 content-addressing. See Content-Addressing & Merkle Trees for hash computation details.
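As a sketch of how the `compute_hash` helper referenced in the pseudocode on this page could work, assuming SHA-256 followed by unpadded base64url encoding as specified above (Python shown for illustration; the actual implementation may differ):

```python
import base64
import hashlib

def compute_hash(prefix: str, data: bytes) -> str:
    # SHA-256 digest, base64url-encoded without padding (43 characters)
    digest = hashlib.sha256(data).digest()
    encoded = base64.urlsafe_b64encode(digest).decode("ascii").rstrip("=")
    return f"{prefix}1~{encoded}"

blob_id = compute_hash("b", b"raw image bytes")
```

Identical input always yields the same ID, which is what makes deduplication work.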

File Variants

Concept

A single uploaded image automatically generates multiple variants optimized for different use cases:

  • tn (thumbnail): Tiny preview (~150x150px)
  • sd (standard definition): Mobile/low bandwidth (~640x480px)
  • md (medium definition): Desktop display (~1920x1080px)
  • hd (high definition): 4K displays (~3840x2160px)
  • xd (extra definition): Original resolution

File Descriptor Encoding

A file descriptor encodes all available variants in a compact format.

File Descriptor Format Specification

Format

d1~{variant}:{blob_id}:f={format}:s={size}:r={width}x{height},{next_variant},...

Components

  • d1~ - Descriptor prefix with version (currently version 1)
  • {variant} - Size class: tn, sd, md, hd, or xd
  • {blob_id} - Content-addressed ID of the blob (b1~...)
  • f={format} - Image format: AVIF, WebP, JPEG, or PNG
  • s={size} - File size in bytes (integer, no separators)
  • r={width}x{height} - Resolution in pixels (width × height)
  • , - Comma separator between variants (no spaces)

Example

d1~tn:b1~abc123:f=AVIF:s=4096:r=150x150,sd:b1~def456:f=AVIF:s=32768:r=640x480

This descriptor encodes two variants:

  • Thumbnail: AVIF format, 4096 bytes, 150×150 pixels, blob ID b1~abc123
  • Standard: AVIF format, 32768 bytes, 640×480 pixels, blob ID b1~def456

Parsing Rules

  1. Check prefix: Verify descriptor starts with d1~
  2. Split by comma (,): Get individual variant entries
  3. For each variant, split by colon (:) to get components:
    • Component [0] = variant class (tn, sd, md, hd, xd)
    • Component [1] = blob_id (b1~...)
    • Components [2..] = key=value pairs
  4. Parse key=value pairs:
    • f={format} → Image format string
    • s={size} → Parse as u64 (bytes)
    • r={width}x{height} → Split by x, parse as u32 × u32

Parsing logic: split by commas for variants, then by colons for fields, then parse key=value pairs.
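The parsing rules above can be sketched as a small function (illustrative Python; the field handling follows the format specification, but the function itself is hypothetical):

```python
def parse_descriptor(descriptor: str) -> list[dict]:
    # 1. Check prefix
    if not descriptor.startswith("d1~"):
        raise ValueError("not a d1~ descriptor")
    variants = []
    # 2. Split by comma to get individual variant entries
    for entry in descriptor[3:].split(","):
        # 3. Split by colon: variant class, blob_id, then key=value pairs
        parts = entry.split(":")
        info = {"class": parts[0], "blob_id": parts[1]}
        # 4. Parse key=value pairs
        for kv in parts[2:]:
            key, _, value = kv.partition("=")
            if key == "f":
                info["format"] = value
            elif key == "s":
                info["size"] = int(value)
            elif key == "r":
                w, _, h = value.partition("x")
                info["width"], info["height"] = int(w), int(h)
        variants.append(info)
    return variants

parsed = parse_descriptor(
    "d1~tn:b1~abc123:f=AVIF:s=4096:r=150x150,sd:b1~def456:f=AVIF:s=32768:r=640x480"
)
```

Note that splitting each entry on `:` is safe because blob IDs use `~`, not `:`, as their internal separator.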

Variant Size Classes - Exact Specifications

Cloudillo generates image variants at specific size targets to optimize bandwidth and storage:

Class  Name                 Target Resolution  Max Dimension  Use Case
tn     Thumbnail            ~150×150px         200px          List views, previews, avatars
sd     Standard Definition  ~640×480px         800px          Mobile devices, low bandwidth
md     Medium Definition    ~1920×1080px       2000px         Desktop viewing, full screen
hd     High Definition      ~3840×2160px       4000px         4K displays, high quality
xd     Extra Definition     Original size      No limit       Archival, original quality

Generation Rules

Variants are generated based on the maximum dimension (the larger of width and height):

  • max_dim ≥ 3840px: tn, sd, md, hd, xd (all variants)
  • max_dim ≥ 1920px: tn, sd, md, hd
  • max_dim ≥ 1280px: tn, sd, md
  • max_dim < 1280px: tn, sd

Properties:

  • Each variant maintains the original aspect ratio
  • Uses Lanczos3 filter for high-quality downscaling
  • Maximum dimension constraint prevents oversizing
  • Smaller originals don’t get upscaled
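The aspect-preserving, no-upscale behavior described above can be sketched like this (the rounding strategy is an assumption; the actual resizer may differ):

```python
def fit_within(width: int, height: int, max_dim: int) -> tuple[int, int]:
    # Preserve aspect ratio; cap the larger side at max_dim; never upscale
    scale = min(1.0, max_dim / max(width, height))
    return round(width * scale), round(height * scale)
```

For a 3024x4032 original, the tn target (max dimension 200) comes out as 150x200, matching the example later in this page.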

Variant Selection

Clients request a specific variant:

GET /api/files/f1~Qo2E3G8TJZ...?variant=hd

Response: Returns HD variant if available, otherwise falls back to smaller variants.

Automatic Fallback

If the requested variant doesn’t exist, the server returns the best available:

  1. Try requested variant (e.g., hd)
  2. Fall back to next smaller (e.g., md)
  3. Continue until variant found
  4. Return smallest if none larger

Fallback order: xd → hd → md → sd → tn
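The fallback walk can be sketched as follows (illustrative only; `select_variant` is a hypothetical helper and ignores format preferences):

```python
FALLBACK_ORDER = ["xd", "hd", "md", "sd", "tn"]  # largest to smallest

def select_variant(available: dict, requested: str) -> str:
    start = FALLBACK_ORDER.index(requested)
    # Steps 1-3: try the requested class, then each smaller one in turn
    for cls in FALLBACK_ORDER[start:]:
        if cls in available:
            return cls
    # Step 4: nothing at or below the requested size; take the next larger one
    for cls in reversed(FALLBACK_ORDER[:start]):
        if cls in available:
            return cls
    raise LookupError("no variants available")
```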

Content-Addressing Flow

File storage uses a three-level content-addressing hierarchy:

Level 1: Blob Storage

Upload image → Save as blob → Compute SHA256 of blob bytes → Store blob with ID: b1~{hash}

blob_data = read_file("thumbnail.avif")
blob_id = compute_hash("b", blob_data)
// Result: "b1~abc123..." (thumbnail blob ID)

Example: b1~abc123... identifies the thumbnail AVIF blob

See Content-Addressing & Merkle Trees for hash computation details.

Level 2: Variant Collection

Generate all variants (tn, sd, md, hd) → Each variant gets its own blob ID (b1~...) → Collect all variant metadata → Create descriptor string encoding all variants

variants = [
    { class: "tn", blob_id: "b1~abc123", format: "AVIF", size: 4096, width: 150, height: 150 },
    { class: "sd", blob_id: "b1~def456", format: "AVIF", size: 32768, width: 640, height: 480 },
    { class: "md", blob_id: "b1~ghi789", format: "AVIF", size: 262144, width: 1920, height: 1080 },
]

descriptor = build_descriptor(variants)
// Result: "d1~tn:b1~abc123:f=AVIF:s=4096:r=150x150,sd:b1~def456:f=AVIF:s=32768:r=640x480,md:b1~ghi789:f=AVIF:s=262144:r=1920x1080"
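A `build_descriptor` helper like the one used above could be implemented as (illustrative sketch, following the descriptor format specification):

```python
example_variants = [
    {"class": "tn", "blob_id": "b1~abc123", "format": "AVIF",
     "size": 4096, "width": 150, "height": 150},
    {"class": "sd", "blob_id": "b1~def456", "format": "AVIF",
     "size": 32768, "width": 640, "height": 480},
]

def build_descriptor(variants: list[dict]) -> str:
    # Encode each variant as class:blob_id:f=...:s=...:r=WxH, comma-joined
    parts = [
        f"{v['class']}:{v['blob_id']}:f={v['format']}:s={v['size']}:r={v['width']}x{v['height']}"
        for v in variants
    ]
    return "d1~" + ",".join(parts)

descriptor = build_descriptor(example_variants)
```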

Level 3: File Descriptor

Build descriptor → Compute SHA256 of descriptor string → Final file ID: f1~{hash} → This file ID goes into action attachments

descriptor = "d1~tn:b1~abc:f=AVIF:s=4096:r=150x150,sd:b1~def:f=AVIF:s=32768:r=640x480"
file_id = compute_hash("f", descriptor.as_bytes())
// Result: "f1~Qo2E3G8TJZ..." (file ID)

Example Complete Flow

1. User uploads photo.jpg (3MB, 3024x4032px)

2. System generates variants:
   tn:  150x200px → 4KB   → b1~abc123
   sd:  600x800px → 32KB  → b1~def456
   md:  1440x1920px → 256KB → b1~ghi789
   hd:  2880x3840px → 1MB → b1~jkl012
   xd:  3024x4032px → ~3MB → b1~mno345

3. System builds descriptor:
   "d1~tn:b1~abc123:f=AVIF:s=4096:r=150x200,
       sd:b1~def456:f=AVIF:s=32768:r=600x800,
       md:b1~ghi789:f=AVIF:s=262144:r=1440x1920,
       hd:b1~jkl012:f=AVIF:s=1048576:r=2880x3840,
       xd:b1~mno345:f=AVIF:s=3145728:r=3024x4032"

4. System hashes descriptor:
   file_id = f1~Qo2E3G8TJZ2... = SHA256(descriptor)

5. Action references file:
   POST action attachments = ["f1~Qo2E3G8TJZ2..."]

6. Anyone can verify:
   - Download all variants
   - Verify each blob_id = SHA256(blob)
   - Rebuild descriptor
   - Verify file_id = SHA256(descriptor)
   - Cryptographic proof established ✓
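The verification steps above can be sketched end to end (illustrative Python; helper names are hypothetical, but the hashing follows the scheme described on this page):

```python
import base64
import hashlib

def compute_hash(prefix: str, data: bytes) -> str:
    # SHA-256 digest, base64url-encoded without padding
    digest = hashlib.sha256(data).digest()
    return f"{prefix}1~" + base64.urlsafe_b64encode(digest).decode("ascii").rstrip("=")

def verify_file(file_id: str, descriptor: str, blobs: dict) -> bool:
    """blobs maps blob_id -> raw bytes of the downloaded variant."""
    # The file ID must be the hash of the descriptor string itself
    if compute_hash("f", descriptor.encode()) != file_id:
        return False
    # Every blob referenced by the descriptor must hash to its own ID
    for entry in descriptor[len("d1~"):].split(","):
        blob_id = entry.split(":")[1]
        if blob_id not in blobs or compute_hash("b", blobs[blob_id]) != blob_id:
            return False
    return True

# Tiny self-consistent example
tn_bytes = b"fake avif thumbnail"
tn_id = compute_hash("b", tn_bytes)
descriptor = f"d1~tn:{tn_id}:f=AVIF:s={len(tn_bytes)}:r=150x150"
file_id = compute_hash("f", descriptor.encode())
```

Any tampering with a blob or the descriptor breaks the chain and is detected immediately.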

Integration with Action Merkle Tree

File attachments create an extended merkle tree:

Action (a1~8kR...)
  ├─ Signed by user (ES384)
  ├─ Content-addressed (SHA256 of JWT)
  └─ Attachments: [f1~Qo2...]
       └─ File (f1~Qo2...)
            ├─ Content-addressed (SHA256 of descriptor)
            └─ Descriptor: "d1~tn:b1~abc...,sd:b1~def..."
                 ├─ Blob tn (b1~abc...)
                 │   └─ Content-addressed (SHA256 of blob)
                 ├─ Blob sd (b1~def...)
                 │   └─ Content-addressed (SHA256 of blob)
                 └─ Blob md (b1~ghi...)
                     └─ Content-addressed (SHA256 of blob)

Benefits:

  • Entire tree is cryptographically verifiable
  • Cannot modify image without changing all parent hashes
  • Deduplication: same image = same file_id
  • Federation: remote instances can verify integrity

See Content-Addressing & Merkle Trees for how file content-addressing integrates with the action system.

Image Processing Pipeline

Upload Flow

When a client uploads an image:

  1. Client Request
POST /api/files/image/profile-picture.jpg
Authorization: Bearer <access_token>
Content-Type: image/jpeg
Content-Length: 2458624

<binary image data>
  2. Dimension Extraction

Extract image dimensions to determine which variants to generate:

img = load_image_from_memory(data)
(width, height) = img.dimensions()
max_dim = max(width, height)

if max_dim >= 3840:
    variants = ["tn", "sd", "md", "hd", "xd"]
else if max_dim >= 1920:
    variants = ["tn", "sd", "md", "hd"]
else if max_dim >= 1280:
    variants = ["tn", "sd", "md"]
else:
    variants = ["tn", "sd"]
  3. FileIdGeneratorTask

Create a task to generate the content-addressed ID:

task = FileIdGeneratorTask(
    tn_id,
    temp_file_path="/tmp/upload-abc123",
    original_filename="profile-picture.jpg"
)

task_id = scheduler.schedule(task)
  4. ImageResizerTask (Multiple)

For each variant, create a resize task:

for variant in variants:
    task = ImageResizerTask(
        tn_id,
        source_file_id=original_id,
        variant=variant,
        target_dimensions=get_variant_dimensions(variant),
        format="avif",  # Primary format
        quality=85,
        dependencies=[file_id_task_id]  # Wait for ID generation
    )

    scheduler.schedule(task)
  5. Hash Computation

FileIdGeneratorTask computes SHA256 hash:

file_id = compute_content_hash("f", file_contents)
# See merkle-tree.md for hash computation details
  6. Blob Storage

Store original in BlobAdapter:

blob_adapter.create_blob_stream(tn_id, file_id, file_stream)
  7. Variant Generation

Each ImageResizerTask runs in worker pool (CPU-intensive):

# Execute in worker pool
img = load_image(source_path)

# Resize with Lanczos3 filter (high quality)
resized = img.resize(target_width, target_height, filter=Lanczos3)

# Encode to AVIF
buffer = encode_avif(resized, quality)

# Store variant
variant_id = compute_file_id(buffer)
blob_adapter.create_blob(tn_id, variant_id, buffer)
  8. Metadata Storage

Store file metadata with all variants:

file_metadata = FileMetadata(
    tn_id,
    file_id=descriptor_id,
    original_filename="profile-picture.jpg",
    mime_type="image/jpeg",
    size=original_size,
    variants=[
        Variant(name="tn", blob_id="b1~QoE...46w", format="avif",
                size=4096, width=200, height=200),
        Variant(name="sd", blob_id="b1~xyz...789", format="webp",
                size=32768, width=640, height=480),
        # ... more variants
    ],
    created_at=current_timestamp()
)

meta_adapter.create_file_metadata(tn_id, file_metadata)
  9. Response

Return descriptor ID to client:

{
  "file_id": "d1~tn:QoE...46w:f=avif:s=4096:r=128x96,sd:xyz...789:...",
  "variants": [
    {"name": "tn", "format": "avif", "size": 4096, "dimensions": "128x96"},
    {"name": "sd", "format": "webp", "size": 8192, "dimensions": "640x480"}
  ],
  "processing": true
}

Complete Upload Flow Diagram

Client uploads image
  ↓
POST /api/files/image/filename.jpg
  ↓
Save to temp file
  ↓
Extract dimensions
  ↓
Determine variants to generate
  ↓
Create FileIdGeneratorTask
  ├─ Compute SHA256 hash
  ├─ Move to permanent storage (BlobAdapter)
  └─ Generate file_id
  ↓
Create ImageResizerTask (for each variant)
  ├─ Depends on FileIdGeneratorTask
  ├─ Load source image
  ├─ Resize with Lanczos3
  ├─ Encode to AVIF/WebP/JPEG
  ├─ Compute variant ID (SHA256)
  └─ Store in BlobAdapter
  ↓
Create file descriptor
  ├─ Collect all variant IDs
  ├─ Encode as descriptor
  └─ Store metadata in MetaAdapter
  ↓
Return descriptor ID to client

Download Flow

Client Request

GET /api/files/d1~...?variant=hd
Authorization: Bearer <access_token>

Server Processing

  1. Parse Descriptor
variants = parse_file_descriptor(file_id)
# Returns list of VariantInfo
  2. Select Best Variant
selected = select_best_variant(
    variants,
    requested_variant,   # "hd"
)

# Falls back if exact match not available:
# hd/avif → hd/webp → md/avif → md/webp → sd/avif → ...
  3. Stream from BlobAdapter
stream = blob_adapter.read_blob_stream(tn_id, selected.file_id)

# Set response headers
response.headers["Content-Type"] = f"image/{selected.format}"
response.headers["X-Cloudillo-Variant"] = selected.blob_id
response.headers["X-Cloudillo-Descriptor"] = descriptor
response.headers["Content-Length"] = selected.size

# Stream response
return stream_response(stream)

Response

HTTP/1.1 200 OK
Content-Type: image/avif
Content-Length: 16384
X-Cloudillo-Variant: b1~m8Z35EIa3prvb3bhjsVjdg9SG98xd0bkoWomOHQAwCM
X-Cloudillo-Descriptor: d1~tn:b1~xRAVuQtgBx_kLqZnoOSd5XqCK_aQolhq1XeXk73Zn8U:f=avif:s=1960:r=90x128,sd:b1~m8Z35EIa3prvb3bhjsVjdg9SG98xd0bkoWomOHQAwCM:f=avif:s=8137:r=256x364,orig:b1~5gU72rRGiaogZuYhJy853pBd6PsqjPOjS__Kim9-qE0:f=avif:s=15012:r=256x364
Cache-Control: public, max-age=31536000, immutable

<binary image data>

Note: Content-addressed files are immutable, so they can be cached indefinitely.

Metadata Structure

FileMetadata

Stored in MetaAdapter:

FileMetadata {
    tn_id: TnId
    file_id: String           # Descriptor ID
    original_filename: String
    mime_type: String
    size: u64                 # Original size
    width: Optional[u32]
    height: Optional[u32]
    variants: List[VariantInfo]
    created_at: i64
    owner: String             # Identity tag
    permissions: FilePermissions
}

VariantInfo {
    name: String              # "tn", "sd", "md", "hd", "xd"
    file_id: String           # Content-addressed ID
    format: String            # "avif", "webp", "jpeg", "png"
    size: u64                 # Bytes
    width: u32
    height: u32
}

FilePermissions {
    public_read: bool
    shared_with: List[String]  # Identity tags
}

File Presets

Concept

Presets define how files should be processed:

FilePreset:
    Image      # Auto-generate variants
    Video      # Future: transcode, thumbnails
    Document   # Future: preview generation
    Database   # RTDB database files
    Raw        # No processing, store as-is

Upload with Preset

POST /api/files/{preset}/{filename}

Examples:
POST /api/files/image/avatar.jpg      // Generate image variants
POST /api/files/raw/document.pdf      // Store as-is

Storage Organization

BlobAdapter Layout

{data_dir}/
├── blobs/
│   ├── {tn_id}/
│   │   ├── b1~QoE...46w           // Original blob
│   │   ├── b1~xyz...789           // Variant 1
│   │   ├── b1~abc...123           // Variant 2
│   │   └── ...
│   └── {other_tn_id}/
│       └── ...
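Resolving a blob’s on-disk path under this layout might look like the following (hypothetical helper; the actual BlobAdapter may shard directories or name paths differently):

```python
from pathlib import Path

def blob_path(data_dir: str, tn_id: int, blob_id: str) -> Path:
    # One flat directory per tenant; files named by their content-addressed ID
    return Path(data_dir) / "blobs" / str(tn_id) / blob_id

path = blob_path("/var/cloudillo", 42, "b1~abc")
```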

MetaAdapter (SQLite)

CREATE TABLE files (
    id INTEGER PRIMARY KEY,
    tn_id INTEGER NOT NULL,
    file_id TEXT NOT NULL,
    original_filename TEXT,
    mime_type TEXT,
    size INTEGER,
    width INTEGER,
    height INTEGER,
    variants TEXT,  -- JSON array
    created_at INTEGER,
    owner TEXT,
    permissions TEXT,  -- JSON object
    UNIQUE(tn_id, file_id)
);

CREATE INDEX idx_files_owner ON files(owner);
CREATE INDEX idx_files_created ON files(created_at);
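For illustration, reading the JSON-encoded variants column back with Python’s sqlite3 module might look like this (simplified schema for the sketch; the placeholder file ID is hypothetical):

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE files (
        id INTEGER PRIMARY KEY,
        tn_id INTEGER NOT NULL,
        file_id TEXT NOT NULL,
        mime_type TEXT,
        size INTEGER,
        variants TEXT,  -- JSON array
        UNIQUE(tn_id, file_id)
    )
""")

variants = [{"name": "tn", "blob_id": "b1~abc123", "format": "avif",
             "size": 4096, "width": 150, "height": 150}]
conn.execute(
    "INSERT INTO files (tn_id, file_id, mime_type, size, variants) VALUES (?, ?, ?, ?, ?)",
    (1, "f1~QoE...", "image/jpeg", 3145728, json.dumps(variants)),
)

row = conn.execute(
    "SELECT variants FROM files WHERE tn_id = ? AND file_id = ?",
    (1, "f1~QoE..."),
).fetchone()
loaded = json.loads(row[0])
```

The UNIQUE(tn_id, file_id) constraint also gives deduplication at the metadata level: re-uploading identical content maps to the same row.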

Performance Considerations

Worker Pool Usage

Image processing is CPU-intensive, so it runs in the worker pool:

# Priority levels
Priority.High   → User-facing operations (thumbnail)
Priority.Medium → Background tasks (other image variants)
Priority.Low    → Longer operations (video upload)

Parallel Processing

Multiple variants can be generated in parallel:

# Create all resize tasks at once
task_ids = []

for variant in ["tn", "sd", "md", "hd"]:
    task_id = scheduler.schedule(ImageResizerTask(
        variant=variant,
        # ...
    ))

    task_ids.append(task_id)

# Wait for all to complete
scheduler.wait_all(task_ids)

Caching Strategy

Content-addressed files are immutable:

Cache-Control: public, max-age=31536000, immutable

  • Browsers cache forever
  • CDNs can cache forever
  • No cache invalidation needed

See Also

  • System Architecture - Task system and worker pool
  • [Actions](/architecture/actions-federation/actions) - File attachments in action tokens
  • [Access Control](/architecture/data-layer/access-control/access) - File permission checking