File Processing Pipeline

Overview

Cloudillo processes uploaded files through an asynchronous pipeline that generates multiple variants optimized for different use cases. The system uses FFmpeg for multimedia processing and supports images, videos, audio, and PDFs.

Processing Architecture

File Upload
    ↓
Store original blob
    ↓
Create FileIdGeneratorTask
    ↓
Detect file type (MIME)
    ↓
Generate variants (async)
    ├─ Image: thumbnails, SD, MD, HD
    ├─ Video: transcoded variants, thumbnails
    ├─ Audio: normalized, compressed
    └─ PDF: text extraction, thumbnails
    ↓
Create file descriptor
    ↓
Content-address all variants
    ↓
Return file ID (f1~...)

Supported File Types

Images

Format  Extensions   Processing
JPEG    .jpg, .jpeg  Resize, AVIF/WebP conversion
PNG     .png         Resize, AVIF/WebP conversion
GIF     .gif         First frame extraction, resize
WebP    .webp        Resize only
AVIF    .avif        Resize only

Image Format Selection

Thumbnails use AVIF for best compression. Larger variants (SD, MD, HD) use WebP for faster encoding while maintaining good quality.

Video

Format  Extensions  Processing
MP4     .mp4        H.264 transcode, thumbnails
WebM    .webm       H.264 transcode, thumbnails
MOV     .mov        H.264 transcode, thumbnails
MKV     .mkv        H.264 transcode, thumbnails

Audio

Format  Extensions  Processing
MP3     .mp3        OPUS conversion
WAV     .wav        OPUS conversion
OGG     .ogg        OPUS conversion
FLAC    .flac       OPUS conversion
M4A     .m4a        OPUS conversion
OPUS    .opus       Normalization only

Documents

Format  Extensions  Processing
PDF     .pdf        Text extraction, page thumbnails
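
The type detection step can be pictured as a lookup from MIME type to variant class. The sketch below is an illustration, not Cloudillo's implementation: the name `classify` is hypothetical, and the MIME strings are the commonly used ones for the formats listed above rather than values taken from the Cloudillo source. The class codes (vis, vid, aud, doc) are the ones described in the Variant System section.

```python
from typing import Optional

# Common MIME types for the formats listed above, mapped to variant class codes.
# Assumption: these strings are typical registrations, not Cloudillo's own list.
MIME_TO_CLASS = {
    "image/jpeg": "vis", "image/png": "vis", "image/gif": "vis",
    "image/webp": "vis", "image/avif": "vis",
    "video/mp4": "vid", "video/webm": "vid",
    "video/quicktime": "vid", "video/x-matroska": "vid",
    "audio/mpeg": "aud", "audio/wav": "aud", "audio/ogg": "aud",
    "audio/flac": "aud", "audio/mp4": "aud",
    "application/pdf": "doc",
}


def classify(mime_type: str) -> Optional[str]:
    """Return the variant class for a MIME type, or None if unsupported."""
    # Drop parameters such as "; codecs=..." and normalise case.
    essence = mime_type.split(";", 1)[0].strip().lower()
    return MIME_TO_CLASS.get(essence)
```

A `None` result corresponds to the "Unsupported format" case, where the upload is rejected.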

Variant System

Cloudillo identifies each variant with a two-level code of the form <class>.<quality>:

vis.sd   →  class: visual (image), quality: standard definition
vid.hd   →  class: video, quality: high definition
aud.md   →  class: audio, quality: medium
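
Splitting such a code into its two parts takes only a few lines. The following is a minimal sketch, assuming the class and quality codes documented in this section are the complete set. The names `Variant` and `parse_variant` are hypothetical and are not part of Cloudillo's API.

```python
from typing import NamedTuple

# Class and quality codes as listed in this document (assumed complete).
CLASSES = {"vis", "vid", "aud", "doc", "raw"}
QUALITIES = {"pf", "tn", "sd", "md", "hd", "xd", "orig"}


class Variant(NamedTuple):
    cls: str
    quality: str


def parse_variant(code: str) -> Variant:
    """Split a '<class>.<quality>' code and validate both parts."""
    cls, sep, quality = code.partition(".")
    if not sep or cls not in CLASSES or quality not in QUALITIES:
        raise ValueError(f"not a valid variant code: {code!r}")
    return Variant(cls, quality)
```

For example, `parse_variant("vis.sd")` yields class `vis` and quality `sd`, while a code without a dot raises `ValueError`.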

Variant Classes

Class     Code  Description    Source Types
Visual    vis   Static images  JPEG, PNG, WebP, AVIF, GIF
Video     vid   Video content  MP4, WebM, MKV, AVI, MOV
Audio     aud   Audio tracks   MP3, WAV, OGG, FLAC, AAC, OPUS
Document  doc   Documents      PDF
Raw       raw   Original file  Any (unprocessed)

Quality Levels

Quality    Code  Size    Description
Profile    pf    80px    Profile pictures
Thumbnail  tn    128px   Small previews
Standard   sd    720px   Mobile/low bandwidth
Medium     md    1280px  Desktop viewing
High       hd    1920px  High quality
Extra      xd    3840px  4K/maximum quality
Original   orig  -       Unprocessed source file

Visual Variants (Images)

Variant  Max Size  Format    Use Case
vis.pf   80×80     AVIF      Profile pictures
vis.tn   128×128   AVIF      Thumbnails, listings
vis.sd   720px     WebP      Mobile, previews
vis.md   1280px    WebP      Desktop viewing
vis.hd   1920px    WebP      High quality display
vis.xd   3840px    WebP      4K displays
orig     -         Original  Source file
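
To make the size limits concrete, the sketch below computes target dimensions for a variant. It rests on two assumptions that this document does not state outright: Max Size bounds the longer edge, and images are never upscaled. The helper name `fit_within` is hypothetical.

```python
# Longest-edge limits in pixels, taken from the Visual Variants table.
MAX_EDGE = {"vis.pf": 80, "vis.tn": 128, "vis.sd": 720,
            "vis.md": 1280, "vis.hd": 1920, "vis.xd": 3840}


def fit_within(width: int, height: int, variant: str) -> tuple[int, int]:
    """Scale (width, height) down so the longer edge fits the variant limit.

    Assumption: aspect ratio is preserved and smaller images are left as-is.
    """
    limit = MAX_EDGE[variant]
    longest = max(width, height)
    if longest <= limit:
        return width, height
    scale = limit / longest
    # Never let an edge collapse to zero for extreme aspect ratios.
    return max(1, round(width * scale)), max(1, round(height * scale))
```

Under these assumptions a 4000×3000 source maps to 720×540 for vis.sd, which matches the r=720x540 value in the descriptor example later in this document.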

Video Variants

Variant  Max Resolution  Bitrate   Use Case
vid.sd   720px           1.5 Mbps  Mobile, low bandwidth
vid.md   1280px          3 Mbps    Desktop
vid.hd   1920px          5 Mbps    High quality
vid.xd   3840px          15 Mbps   4K playback

Video processing also extracts a vis.tn thumbnail from the first few seconds.

Audio Variants

Variant  Format  Bitrate   Use Case
aud.sd   OPUS    64 kbps   Low bandwidth
aud.md   OPUS    128 kbps  Normal playback
aud.hd   OPUS    256 kbps  High quality

Document Variants

Variant   Description
doc.orig  Original PDF
vis.tn    Thumbnail of first page

Variant Fallback

When a requested variant isn’t available, the system falls back to the next lower quality level in the same class until it finds one that exists:

Request: vis.hd
Fallback chain: vis.md → vis.sd → vis.tn
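
The fallback rule can be sketched as a walk down an ordered list of quality codes. This is an illustrative sketch, not the actual resolver: `fallback_chain` and `resolve` are hypothetical names, and leaving pf out of the chain is an assumption based on the example above, which stops at tn.

```python
from typing import Optional

# Quality codes from lowest to highest. Assumption: pf is not part of the chain.
QUALITY_ORDER = ["tn", "sd", "md", "hd", "xd"]


def fallback_chain(requested: str) -> list[str]:
    """List the lower-quality variants to try, best first."""
    cls, _, quality = requested.partition(".")
    idx = QUALITY_ORDER.index(quality)
    return [f"{cls}.{q}" for q in reversed(QUALITY_ORDER[:idx])]


def resolve(requested: str, available: set[str]) -> Optional[str]:
    """Return the requested variant or the best available fallback."""
    for candidate in [requested, *fallback_chain(requested)]:
        if candidate in available:
            return candidate
    return None
```

With only vis.tn and vis.sd stored, a request for vis.hd resolves to vis.sd.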

File Descriptor Format

File descriptors encode all variant information:

d2,vis.tn:b1~abc123:f=avif:s=4096:r=128x128;vis.sd:b1~def456:f=webp:s=32768:r=720x540;vid.hd:b1~xyz789:f=mp4:s=5242880:r=1920x1080:dur=120.5:br=5000

Component       Description
d2,             Descriptor version prefix
;               Variant separator
:               Field separator within a variant entry
vis.tn, vid.hd  Two-level variant code
b1~...          Blob ID (SHA-256 hash)
f=              Format (avif, webp, mp4, opus)
s=              Size in bytes
r=              Resolution (WxH)
dur=            Duration in seconds (video/audio)
br=             Bitrate in kbps (video/audio)
pg=             Page count (PDFs)
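
A descriptor in this format can be taken apart with plain string splitting. The sketch below is a minimal reading of the format as shown in the example, assuming that blob IDs and field values never contain `:` or `;`. The function name `parse_descriptor` and the `blob` dictionary key are hypothetical.

```python
def parse_descriptor(descriptor: str) -> dict[str, dict[str, str]]:
    """Parse a 'd2,' descriptor into {variant code: {field: value}}."""
    version, sep, body = descriptor.partition(",")
    if version != "d2" or not sep:
        raise ValueError("unsupported descriptor version")
    variants: dict[str, dict[str, str]] = {}
    for entry in body.split(";"):
        # First field is the variant code, second the blob ID,
        # the rest are key=value attributes (f, s, r, dur, br, pg).
        code, blob_id, *attrs = entry.split(":")
        fields = {"blob": blob_id}
        for attr in attrs:
            key, _, value = attr.partition("=")
            fields[key] = value
        variants[code] = fields
    return variants
```

Values are kept as strings here; a real consumer would likely convert s to an integer and dur to a float.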

Processing Presets

Presets define which variants to generate for different use cases:

Preset           Visual                          Video                   Audio                   Use Case
default          vis.tn, vis.sd, vis.md, vis.hd  vid.sd, vid.md, vid.hd  aud.sd, aud.md, aud.hd  General uploads
profile_picture  vis.pf, vis.tn, vis.sd          -                       -                       Profile images
cover            vis.tn, vis.sd, vis.md, vis.hd  -                       -                       Cover/banner images
high_quality     vis.tn → vis.xd                 vid.sd → vid.xd         aud.md, aud.hd          Maximum quality
mobile           vis.tn, vis.sd, vis.md          vid.sd, vid.md          aud.sd                  Optimized for mobile
archive          vis.tn only                     -                       -                       Minimal (keeps original)
podcast          -                               -                       aud.sd, aud.md, aud.hd  Audio extraction
video            vis.tn (thumbnail)              vid.sd → vid.hd         -                       Video-focused

All presets preserve the original file as orig unless configured otherwise.
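
Expressed as data, the presets are a mapping from preset name to variant codes. The sketch below transcribes the table. It reads ranges such as "vis.tn → vis.xd" as every quality from tn through xd, which is an interpretation, and the names `PRESETS` and `variants_for` are hypothetical.

```python
# Preset table transcribed from this document. Ranges written with an arrow
# are expanded to every quality in between (an assumed reading).
PRESETS = {
    "default": ["vis.tn", "vis.sd", "vis.md", "vis.hd",
                "vid.sd", "vid.md", "vid.hd",
                "aud.sd", "aud.md", "aud.hd"],
    "profile_picture": ["vis.pf", "vis.tn", "vis.sd"],
    "cover": ["vis.tn", "vis.sd", "vis.md", "vis.hd"],
    "high_quality": ["vis.tn", "vis.sd", "vis.md", "vis.hd", "vis.xd",
                     "vid.sd", "vid.md", "vid.hd", "vid.xd",
                     "aud.md", "aud.hd"],
    "mobile": ["vis.tn", "vis.sd", "vis.md", "vid.sd", "vid.md", "aud.sd"],
    "archive": ["vis.tn"],
    "podcast": ["aud.sd", "aud.md", "aud.hd"],
    "video": ["vis.tn", "vid.sd", "vid.md", "vid.hd"],
}


def variants_for(preset: str, cls: str) -> list[str]:
    """Return the variants of one class (vis, vid, aud) that a preset generates."""
    return [code for code in PRESETS[preset] if code.startswith(cls + ".")]
```

For instance, `variants_for("mobile", "vid")` gives the two video variants of the mobile preset, and `variants_for("podcast", "vis")` is empty.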

FFmpeg Integration

Video and audio processing uses FFmpeg:

Video Transcoding

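# Transcode to H.264/AAC, fitting the frame within 1280x720 and keeping the aspect ratio.
# force_divisible_by=2 keeps both dimensions even, which libx264 needs for 4:2:0 output.
ffmpeg -i input.mov \
  -c:v libx264 -preset medium -crf 23 \
  -c:a aac -b:a 128k \
  -vf "scale=1280:720:force_original_aspect_ratio=decrease:force_divisible_by=2" \
  output.mp4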

Audio Transcoding

# Re-encode the audio stream to Opus at 96 kbps.
ffmpeg -i input.mp3 \
  -c:a libopus -b:a 96k \
  output.opus
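
When the command is assembled programmatically, the bitrate can be looked up per variant. The sketch below builds an argument list equivalent to the command above, with the bitrate taken from the Audio Variants table. The name `opus_command` is hypothetical, and how Cloudillo actually invokes FFmpeg is not specified in this document.

```python
# Target bitrates from the Audio Variants table.
OPUS_BITRATE = {"aud.sd": "64k", "aud.md": "128k", "aud.hd": "256k"}


def opus_command(src: str, dst: str, variant: str) -> list[str]:
    """Build an ffmpeg argument list that encodes src to Opus for a variant."""
    return ["ffmpeg", "-i", src,
            "-c:a", "libopus", "-b:a", OPUS_BITRATE[variant],
            dst]
```

The resulting list can be passed to `subprocess.run(..., check=True)`. Passing a list rather than a shell string avoids quoting problems with unusual file names.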

Audio Extraction

From video files:

# Drop the video stream (-vn) and encode the audio track to Opus.
ffmpeg -i input.mp4 \
  -vn -c:a libopus -b:a 96k \
  output.opus

Thumbnail Generation

# Grab a single frame at the 1-second mark and fit it within 128x128 (the vis.tn size).
ffmpeg -i input.mp4 \
  -ss 00:00:01 -frames:v 1 \
  -vf "scale=128:128:force_original_aspect_ratio=decrease" \
  thumbnail.jpg

Content-Addressing

All variants are content-addressed. Hashing is applied at two levels:

  1. Blob level: Raw bytes → b1~{SHA256(bytes)}
  2. Descriptor level: Descriptor string → f1~{SHA256(descriptor)}

This enables:

  • Deduplication: Identical files share blobs
  • Verification: Hashes prove integrity
  • Caching: Immutable content can be cached forever
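
Both levels of addressing reduce to hashing bytes and adding a prefix. The sketch below illustrates this with Python's `hashlib`. The text encoding of the hash is not stated in this document, so unpadded URL-safe base64 is used as an assumption, and the function names are hypothetical.

```python
import base64
import hashlib


def _encode(digest: bytes) -> str:
    # Assumption: URL-safe base64 without padding. The real encoding may differ.
    return base64.urlsafe_b64encode(digest).rstrip(b"=").decode("ascii")


def blob_id(data: bytes) -> str:
    """Blob level: address raw bytes by their SHA-256 hash."""
    return "b1~" + _encode(hashlib.sha256(data).digest())


def file_id(descriptor: str) -> str:
    """Descriptor level: address the descriptor string by its SHA-256 hash."""
    return "f1~" + _encode(hashlib.sha256(descriptor.encode("utf-8")).digest())
```

Because the ID depends only on the content, uploading the same bytes twice yields the same blob ID, which is what makes deduplication and integrity checks possible.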

Task Scheduling

File processing uses the task scheduler:

FileUploadTask
    ↓
Creates FileIdGeneratorTask (depends on upload)
    ↓
FileIdGeneratorTask generates variants
    ↓
Action can reference file (depends on FileIdGeneratorTask)

Dependencies ensure actions only reference fully processed files.
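
The dependency chain above is a small directed graph, and a valid execution order is a topological sort of it. The sketch below uses Python's standard `graphlib` to show the idea. It does not model Cloudillo's scheduler itself, and the node name "Action" stands in for whatever action references the file.

```python
from graphlib import TopologicalSorter

# Each task maps to the set of tasks it depends on.
DEPENDENCIES = {
    "FileUploadTask": set(),
    "FileIdGeneratorTask": {"FileUploadTask"},
    "Action": {"FileIdGeneratorTask"},
}


def execution_order(dependencies: dict[str, set[str]]) -> list[str]:
    """Return an order in which every task runs after its dependencies."""
    return list(TopologicalSorter(dependencies).static_order())
```

For this linear chain the only valid order is upload, then variant generation, then the action.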

Federation Sync

When syncing files across instances:

Metadata-Only Sync

For efficiency, only file descriptors are synced initially:

  1. Receive action with file attachment
  2. Fetch file descriptor from origin
  3. Store descriptor locally
  4. Fetch variants on demand

Variant Fetching

Client requests file
    ↓
Check if variant exists locally
    ├─ Yes: Serve from local storage
    └─ No:  Fetch from origin server
            Store locally
            Serve to client
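
This flow is a read-through cache keyed by blob ID. The sketch below shows the pattern with an in-memory dictionary standing in for local storage. The class name `VariantCache` and the `fetch_from_origin` callback are hypothetical placeholders for the real storage and federation layers.

```python
from typing import Callable


class VariantCache:
    """Serve variants locally, fetching from the origin on first request."""

    def __init__(self, fetch_from_origin: Callable[[str], bytes]):
        self._fetch = fetch_from_origin
        self._local: dict[str, bytes] = {}  # stand-in for local blob storage

    def get(self, blob_id: str) -> bytes:
        if blob_id in self._local:
            # Variant exists locally: serve it directly.
            return self._local[blob_id]
        # Not stored yet: fetch from the origin, store, then serve.
        data = self._fetch(blob_id)
        self._local[blob_id] = data
        return data
```

Since blobs are content-addressed and therefore immutable, a stored copy never needs to be revalidated against the origin.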

Error Handling

Error               Action
Unsupported format  Reject upload with error
FFmpeg failure      Log, mark as failed, allow retry
Storage full        Queue for retry, alert admin
Timeout             Retry with extended timeout
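
The timeout policy can be sketched as a retry loop that lengthens the timeout on each attempt. The attempt count and growth factor below are illustrative assumptions, not documented Cloudillo settings, and `run_with_retries` is a hypothetical name.

```python
from typing import Callable, TypeVar

T = TypeVar("T")


def run_with_retries(job: Callable[[float], T], timeout: float,
                     attempts: int = 3, factor: float = 2.0) -> T:
    """Call job(timeout), extending the timeout after each TimeoutError."""
    for attempt in range(attempts):
        try:
            return job(timeout)
        except TimeoutError:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the failure to the caller
            timeout *= factor  # retry with an extended timeout
    raise ValueError("attempts must be at least 1")
```

A job that needs 40 seconds and starts with a 10-second timeout would succeed on the third attempt under these defaults.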

See Also