Blob Storage
Cloudillo’s blob storage holds immutable binary data (files, images, videos) under content-addressed IDs — every blob is named by the SHA-256 hash of its bytes. This gives automatic deduplication, integrity verification, and permanent cacheability. Uploaded media is additionally split into multiple size/quality variants grouped by a file descriptor.
Content-Addressed Storage
File Identifier Format
Cloudillo uses multiple identifier types in its content-addressing system. Hash-based identifiers share the pattern {prefix}{version}~{base64url_hash}.
Components:
- {prefix}: Resource type indicator (a, f, b, d)
- {version}: Hash algorithm version (currently 1 = SHA-256)
- ~: Separator
- {base64url_hash}: Base64url-encoded hash (43 characters, no padding)
Identifier Types
| Prefix | Resource Type | Hash Input | Example |
|---|---|---|---|
| b1~ | Blob | Blob bytes (raw image/video data) | b1~abc123def456... |
| f1~ | File | File descriptor string | f1~QoEYeG8TJZ2HTGh... |
| d2, | Descriptor | (not a hash, the encoded format itself) | d2,vis.tn:b1~abc:f=avif:... |
| a1~ | Action | Complete JWT token | a1~8kR3mN9pQ2vL... |
Important: d2, is not a content-addressed identifier—it’s the actual encoded descriptor string. The file ID (f1~) is the hash of this descriptor.
Examples
All file and blob IDs use SHA-256 content-addressing. See Content-Addressing & Merkle Trees for hash computation details.
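As a minimal sketch of how such an identifier can be derived, the following Python computes a SHA-256 digest and encodes it as unpadded base64url. The helper name `content_id` is hypothetical, and hashing the UTF-8 bytes of the descriptor string for file IDs is an assumption; the linked hash computation details remain authoritative.

```python
import base64
import hashlib


def content_id(prefix: str, data: bytes, version: int = 1) -> str:
    """Return '{prefix}{version}~{base64url(sha256(data))}' without '=' padding."""
    digest = hashlib.sha256(data).digest()  # always 32 bytes
    # 32 bytes encode to 44 base64 characters including one '=', so 43 remain.
    encoded = base64.urlsafe_b64encode(digest).rstrip(b"=").decode("ascii")
    return f"{prefix}{version}~{encoded}"


# Blob ID: hash of the raw blob bytes.
blob_id = content_id("b", b"raw image bytes")

# File ID: hash of the descriptor string (UTF-8 encoding assumed here).
file_id = content_id("f", "d2,vis.tn:b1~abc:f=avif".encode("utf-8"))

print(blob_id)
print(file_id)
```

Because the ID depends only on the bytes, uploading identical content twice yields the same ID, which is what makes deduplication automatic.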
File Types
Cloudillo supports four file types, each handled by different adapters based on mutability and use case:
| Type | Adapter | Mutability | Description |
|---|---|---|---|
| BLOB | BlobAdapter | Immutable | Binary content (images, videos, documents) |
| CRDT | CrdtAdapter | Mutable | Collaborative documents (Yjs-based real-time editing) |
| RTDB | RtdbAdapter | Mutable | Real-time database files for app state |
| FLDR | MetaAdapter | Mutable | Folder/directory metadata |
File Type Selection
Files are created via different API endpoints based on their type:
| Endpoint | File Type | Use Case |
|---|---|---|
| POST /api/files/{preset}/{file_name} | BLOB | File uploads with preset-based variant generation |
| POST /api/files | CRDT/RTDB/FLDR | Metadata-only creation for mutable file types |
Available presets for BLOB uploads: default, profile-picture, cover, high_quality, mobile, archive, podcast, video, orig-only, thumbnail-only, apkg
Per-File Access Control
Each file has independent access control:
| Field | Values | Description |
|---|---|---|
| Visibility | P/V/F/C/null | Who can discover this file |
| Access Level | R/W | Read-only vs read-write access |
See Access Control for detailed permission handling.
File Variants
Concept
From a single uploaded image, Cloudillo automatically generates multiple variants, each optimized for a different use case:
- pf (profile): Profile picture icon (~80px)
- tn (thumbnail): Small preview (~256px)
- sd (standard definition): Mobile/low bandwidth (~720px)
- md (medium definition): Desktop viewing (~1280px)
- hd (high definition): High quality display (~1920px)
- xd (extra definition): 4K/maximum quality (~3840px)
File Descriptor Encoding
A file descriptor encodes all available variants in a compact format.
File Descriptor Format Specification
Format
Components
- d2, - Descriptor prefix (version 2)
- {class} - Media class:
  - vis - Visual (images: jpeg, png, webp, avif)
  - vid - Video (mp4/h264)
  - aud - Audio (opus, mp3)
  - doc - Documents (pdf)
  - raw - Original unprocessed file
- {variant} - Quality tier: pf, tn, sd, md, hd, or xd
- {blob_id} - Content-addressed ID of the blob (b1~...)
- f={format} - Format: avif, webp, jpeg, png, mp4, opus, pdf
- s={size} - File size in bytes (integer, no separators)
- r={width}x{height} - Resolution in pixels (width × height)
- ; - Semicolon separator between variants (no spaces)
The original is encoded as the bare token orig (no class prefix), regardless of its media class — e.g. orig:b1~...:f=jpeg:.... A descriptor may also begin with an optional R={root_id}; field that links the file to its document-tree access-control root.
Optional Fields
For video, audio, and document files:
- dur={seconds} - Duration in seconds (floating point, video/audio only)
- br={kbps} - Bitrate in kbps (integer, video/audio only)
- pg={count} - Page count (integer, documents only)
Example
This descriptor encodes two variants:
- Thumbnail: AVIF format, 4096 bytes, 150×150 pixels, blob ID b1~abc123
- Standard: AVIF format, 32768 bytes, 640×480 pixels, blob ID b1~def456
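To make the encoding concrete, here is a small sketch that builds a descriptor from the two variants listed above. The `Variant` dataclass and `encode_descriptor` function are hypothetical names, and the field order (f, s, r) is an assumption taken from the order of the component list; optional fields such as dur, br, and pg are left out.

```python
from dataclasses import dataclass


@dataclass
class Variant:
    name: str      # "{class}.{variant}" such as "vis.tn", or the bare token "orig"
    blob_id: str   # content-addressed blob ID (b1~...)
    fmt: str       # value of the f= field
    size: int      # value of the s= field, in bytes
    width: int     # first half of the r= field
    height: int    # second half of the r= field


def encode_descriptor(variants: list[Variant]) -> str:
    """Join variant entries with ';' and prepend the 'd2,' prefix."""
    entries = [
        f"{v.name}:{v.blob_id}:f={v.fmt}:s={v.size}:r={v.width}x{v.height}"
        for v in variants
    ]
    return "d2," + ";".join(entries)


descriptor = encode_descriptor([
    Variant("vis.tn", "b1~abc123", "avif", 4096, 150, 150),
    Variant("vis.sd", "b1~def456", "avif", 32768, 640, 480),
])
print(descriptor)
# d2,vis.tn:b1~abc123:f=avif:s=4096:r=150x150;vis.sd:b1~def456:f=avif:s=32768:r=640x480
```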
Video Example
This descriptor includes:
- Thumbnail: AVIF image preview
- SD Video: 720p MP4, 120.5 seconds, 350 kbps
- HD Video: 1080p MP4, 120.5 seconds, 1400 kbps
Parsing Rules
- Check prefix: Verify descriptor starts with d2,
- Split by semicolon (;): Get individual variant entries
- For each variant, split by colon (:) to get components:
  - Component [0] = class.variant (vis.tn, vis.sd, vid.hd)
  - Component [1] = blob_id (b1~...)
  - Components [2..] = key=value pairs
- Parse key=value pairs:
  - f={format} → Format string
  - s={size} → Parse as u64 (bytes)
  - r={width}x{height} → Split by x, parse as u32 × u32
  - dur={seconds} → Parse as f64 (optional)
  - br={kbps} → Parse as u32 (optional)
  - pg={count} → Parse as u32 (optional)
Parsing logic: split by semicolons for variants, then by colons for fields, then parse key=value pairs.
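The parsing rules can be sketched in Python as follows. This is an illustration, not Cloudillo's own parser: the function name `parse_descriptor` and the shape of the returned dictionary are hypothetical, unknown keys are silently ignored by assumption, and Python integers stand in for the u32/u64 types named above.

```python
def parse_descriptor(descriptor: str) -> dict:
    """Parse a 'd2,' descriptor into an optional root ID and a list of variants."""
    if not descriptor.startswith("d2,"):
        raise ValueError("descriptor must start with 'd2,'")

    result = {"root_id": None, "variants": []}
    for entry in descriptor[len("d2,"):].split(";"):
        if not entry:
            continue
        if entry.startswith("R="):
            # Optional access-control root field (expected before the variants).
            result["root_id"] = entry[2:]
            continue

        parts = entry.split(":")
        if len(parts) < 2:
            raise ValueError(f"malformed variant entry: {entry!r}")
        # parts[0] is "class.variant" (or the bare token "orig"), parts[1] the blob ID.
        variant = {"name": parts[0], "blob_id": parts[1]}

        for pair in parts[2:]:
            key, sep, value = pair.partition("=")
            if not sep:
                raise ValueError(f"expected key=value, got {pair!r}")
            if key == "f":
                variant["format"] = value
            elif key == "s":
                variant["size"] = int(value)
            elif key == "r":
                width, height = value.split("x")
                variant["resolution"] = (int(width), int(height))
            elif key == "dur":
                variant["duration"] = float(value)
            elif key == "br":
                variant["bitrate"] = int(value)
            elif key == "pg":
                variant["pages"] = int(value)
        result["variants"].append(variant)
    return result


parsed = parse_descriptor(
    "d2,vis.tn:b1~abc123:f=avif:s=4096:r=150x150;"
    "vis.sd:b1~def456:f=avif:s=32768:r=640x480"
)
print(parsed["variants"][1]["resolution"])  # (640, 480)
```

Splitting on `;` and `:` is safe because the base64url alphabet used in blob IDs contains neither character.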
Variant Size Classes - Exact Specifications
Cloudillo generates image variants at specific size targets to optimize bandwidth and storage:
| Quality | Code | Max Dimension | Use Case |
|---|---|---|---|
| Profile | pf | 80px | Profile picture icons |
| Thumbnail | tn | 256px | List views, previews, avatars |
| Standard | sd | 720px | Mobile devices, low bandwidth |
| Medium | md | 1280px | Desktop viewing |
| High | hd | 1920px | High quality display |
| Extra | xd | 3840px | 4K displays, maximum quality |
| Original | orig | - | Unprocessed source file |
Generation Rules
Which variants are generated depends on the preset configuration. The default preset generates: tn, sd, md, hd. The high_quality preset adds xd. Variants larger than the original image are automatically skipped (smaller originals are never upscaled).
Properties:
- Each variant maintains the original aspect ratio
- Uses Lanczos3 filter for high-quality downscaling
- Maximum dimension constraint prevents oversizing
- Smaller originals don’t get upscaled
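The aspect-ratio and no-upscaling properties amount to a simple dimension calculation, sketched below. The function name `fit_within` and the rounding to the nearest pixel are assumptions; the Lanczos3 resampling itself is not shown.

```python
def fit_within(width: int, height: int, max_dim: int) -> tuple[int, int]:
    """Scale (width, height) so the longest side is at most max_dim.

    The aspect ratio is preserved and images already within the limit
    are returned unchanged (never upscaled).
    """
    longest = max(width, height)
    if longest <= max_dim:
        return width, height
    scale = max_dim / longest
    # Guard against a zero-pixel side for extremely elongated images.
    return max(1, round(width * scale)), max(1, round(height * scale))


print(fit_within(4000, 3000, 1280))  # (1280, 960)
print(fit_within(640, 480, 1280))    # (640, 480), smaller original left as-is
```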
Variant Selection
Clients request a specific variant:
Response: Returns HD variant if available, otherwise falls back to smaller variants.
Automatic Fallback
If the requested variant doesn’t exist, the server returns the best available:
- Try requested variant (e.g., hd)
- Fall back to next smaller (e.g., md)
- Continue until variant found
- Return smallest if none larger
Fallback order: xd → hd → md → sd → tn
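A possible reading of this fallback in code: walk the order from the requested tier downward, and if nothing at or below the requested size exists, return the smallest variant that is available. That second step is an interpretation of the last rule above, and `select_variant` is a hypothetical name.

```python
from typing import Optional

FALLBACK_ORDER = ["xd", "hd", "md", "sd", "tn"]  # largest to smallest


def select_variant(requested: str, available: set[str]) -> Optional[str]:
    """Pick the requested tier, else the next smaller one that exists."""
    if requested not in FALLBACK_ORDER:
        raise ValueError(f"unknown variant tier: {requested!r}")
    start = FALLBACK_ORDER.index(requested)

    # Requested tier first, then progressively smaller tiers.
    for tier in FALLBACK_ORDER[start:]:
        if tier in available:
            return tier

    # Nothing at or below the requested size: take the smallest larger tier
    # (assumed behaviour).
    for tier in reversed(FALLBACK_ORDER[:start]):
        if tier in available:
            return tier
    return None


print(select_variant("hd", {"tn", "sd", "md"}))  # md
print(select_variant("tn", {"sd", "hd"}))        # sd
```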
Content-Addressing Flow
File storage uses a three-level content-addressing hierarchy:
Level 1: Blob Storage
Upload image → Save as blob → Compute SHA256 of blob bytes → Store blob with ID: b1~{hash}
Example: b1~abc123... identifies the thumbnail AVIF blob
See Content-Addressing & Merkle Trees for hash computation details.
Level 2: Variant Collection
Generate all variants (tn, sd, md, hd) → Each variant gets its own blob ID (b1~...) → Collect all variant metadata → Create descriptor string encoding all variants
Level 3: File Descriptor
Build descriptor → Compute SHA256 of descriptor string → Final file ID: f1~{hash} → This file ID goes into action attachments
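The three levels chain together, so a change in any blob propagates up to the file ID. The sketch below illustrates that property under stated assumptions: `hash_id` and `build_file` are hypothetical helpers, the descriptor is hashed as UTF-8 bytes, and each entry carries only the f, s, and r fields.

```python
import base64
import hashlib


def hash_id(prefix: str, data: bytes) -> str:
    """Prefix plus unpadded base64url SHA-256 digest of data."""
    digest = hashlib.sha256(data).digest()
    return prefix + base64.urlsafe_b64encode(digest).rstrip(b"=").decode("ascii")


def build_file(variants: list[tuple[str, str, int, int, bytes]]) -> tuple[str, str]:
    """variants holds (name, format, width, height, blob bytes) tuples.

    Returns (descriptor, file_id).
    """
    entries = []
    for name, fmt, width, height, data in variants:
        blob_id = hash_id("b1~", data)  # Level 1: blob ID from blob bytes
        # Level 2: collect per-variant metadata into a descriptor entry
        entries.append(f"{name}:{blob_id}:f={fmt}:s={len(data)}:r={width}x{height}")
    descriptor = "d2," + ";".join(entries)
    file_id = hash_id("f1~", descriptor.encode("utf-8"))  # Level 3: file ID
    return descriptor, file_id


_, id_a = build_file([("vis.tn", "avif", 150, 150, b"thumbnail bytes")])
_, id_b = build_file([("vis.tn", "avif", 150, 150, b"thumbnail bytez")])
print(id_a != id_b)  # True: one changed byte yields a different file ID
```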
Example Complete Flow
File attachments integrate into Cloudillo’s merkle tree structure. See Content-Addressing & Merkle Trees for how files fit into the verification chain.
Image Processing Pipeline
Upload Flow
When a client uploads an image:
- Client Request
- Dimension Extraction & Variant Selection
The image dimensions are extracted and the preset’s image variant list (e.g. ["vis.tn", "vis.sd", "vis.md", "vis.hd"] for default) is walked from smallest to largest. Each variant’s bounding box is capped at the original’s longest side — the original is never upscaled. A variant is then skipped if its capped size is less than 10% larger than the last variant actually created, so a small original collapses to just the thumbnail plus one or two distinct sizes instead of several near-identical blobs.
The intermediate steps (task scheduling, hash computation, blob storage, variant generation, and metadata storage) are shown in the Complete Upload Flow Diagram below.
- Response
The upload responds immediately with a temporary local ID (@{f_id}) plus the synchronously-generated thumbnail blob ID and original dimensions. The remaining variants are still being generated asynchronously:
The final content-addressed file ID (f1~...) is only known once all variant tasks finish. The server then pushes a FILE_ID_GENERATED WebSocket event ( { tempId, fileId, rootId } ) so clients can swap the temporary @{f_id} for the permanent f1~ ID.
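The variant selection rule from the dimension extraction step (cap each bounding box at the original's longest side, then skip anything less than 10% larger than the last created variant) can be sketched as follows. The name `plan_variants` is hypothetical; always creating the first variant and keeping a variant that is exactly 10% larger are assumptions.

```python
# Maximum dimension per tier, from the variant size table.
MAX_DIM = {"pf": 80, "tn": 256, "sd": 720, "md": 1280, "hd": 1920, "xd": 3840}


def plan_variants(width: int, height: int, preset_variants: list[str]) -> list[tuple[str, int]]:
    """Return (variant name, capped max dimension) pairs that would be generated."""
    longest = max(width, height)
    planned: list[tuple[str, int]] = []
    last = 0
    # Walk the preset's list from smallest to largest tier.
    for name in sorted(preset_variants, key=lambda n: MAX_DIM[n.split(".")[-1]]):
        capped = min(MAX_DIM[name.split(".")[-1]], longest)  # never upscale
        # Skip if less than 10% larger than the last created variant
        # (integer form of capped < last * 1.1).
        if planned and capped * 10 < last * 11:
            continue
        planned.append((name, capped))
        last = capped
    return planned


default = ["vis.tn", "vis.sd", "vis.md", "vis.hd"]
print(plan_variants(4000, 3000, default))
# [('vis.tn', 256), ('vis.sd', 720), ('vis.md', 1280), ('vis.hd', 1920)]
print(plan_variants(800, 600, default))
# [('vis.tn', 256), ('vis.sd', 720), ('vis.md', 800)]
```

In the second call, hd would also be capped at 800 pixels, identical to the md variant just created, so it is skipped.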
Complete Upload Flow Diagram
Download Flow
Client Request
Server Processing
- Parse Descriptor
- Select Best Variant
- Stream from BlobAdapter
Response
Note: Content-addressed files are immutable, so they can be cached indefinitely.
Metadata Structure
File metadata lives in two tables in the MetaAdapter. The files row holds the logical file (type, preset, owner, visibility, folder/document-tree links); file_variants rows hold one entry per generated variant. A local internal f_id integer keys both tables while the file is still being processed; the content-addressed file_id (f1~...) is written only once all variant tasks finish (status transitions P → A).
The available flag matters for federation: a synced file lists all variants in its descriptor, but only the variants whose blobs have actually been fetched are marked available locally. See Access Control for how visibility and root_id drive permission checks.
File Presets
Presets control which variants are generated and whether the original is stored. The preset is part of the upload path: POST /api/files/{preset}/{file_name}.
Available presets: default, profile-picture, cover, high_quality, mobile, archive, podcast, video, orig-only, thumbnail-only, apkg. See File Processing for the full per-preset variant matrix.
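A client might assemble the upload path as in the sketch below. The function name `upload_path` is hypothetical, and percent-encoding the file name is an assumption about what the server expects; only the endpoint shape and the preset names come from this page.

```python
from urllib.parse import quote

PRESETS = {
    "default", "profile-picture", "cover", "high_quality", "mobile", "archive",
    "podcast", "video", "orig-only", "thumbnail-only", "apkg",
}


def upload_path(preset: str, file_name: str) -> str:
    """Build the path for POST /api/files/{preset}/{file_name}."""
    if preset not in PRESETS:
        raise ValueError(f"unknown preset: {preset!r}")
    # safe="" also encodes '/' so the file name stays a single path segment.
    return f"/api/files/{preset}/{quote(file_name, safe='')}"


print(upload_path("default", "holiday photo.jpg"))
# /api/files/default/holiday%20photo.jpg
```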
Storage Organization
BlobAdapter Layout
Blobs are stored on disk under a per-tenant directory, sharded into two levels by the first four characters of the hash (after the ~). The filename is the full blob ID:
Each variant (and the original, when stored) is an independent blob with its own ID. The file descriptor — not the filesystem — is what groups variants into a logical file. File metadata is stored separately in the MetaAdapter (see Metadata Structure above).
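The sharding scheme can be illustrated with a short path helper. Splitting the four characters into two directories of two characters each is an assumption (the text only states two levels drawn from the first four characters), as are the function name `blob_path`, the base directory, and the tenant directory name used in the example.

```python
from pathlib import PurePosixPath


def blob_path(base_dir: str, tenant_dir: str, blob_id: str) -> PurePosixPath:
    """Map a blob ID to its sharded location under a per-tenant directory."""
    _prefix, sep, digest = blob_id.partition("~")
    if not sep or len(digest) < 4:
        raise ValueError(f"not a valid blob ID: {blob_id!r}")
    # Two shard levels from the first four hash characters (2 + 2 assumed);
    # the filename is the full blob ID.
    return PurePosixPath(base_dir) / tenant_dir / digest[:2] / digest[2:4] / blob_id


print(blob_path("/var/lib/blobs", "tenant-1", "b1~abcdEFGH"))
# /var/lib/blobs/tenant-1/ab/cd/b1~abcdEFGH
```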
See Also
- System Architecture - Task system and worker pool
- Actions - File attachments in action tokens
- Access Control - File permission checking