File Processing Pipeline

Overview

Cloudillo processes uploaded files through an asynchronous pipeline that generates multiple variants optimized for different use cases. The system uses FFmpeg for multimedia processing, resvg for SVG rasterization, and poppler-utils for PDF handling. Supported media types include images, SVG, videos, audio, PDFs, and raw files.

Processing Architecture

Upload (POST /api/files/{preset}/{file_name})
    ↓
Detect MIME type → Map to VariantClass
    ↓
Validate against preset's allowed_media_classes
    ↓
Route to type-specific handler:
    ├─ Image: Read into memory → thumbnail sync → schedule ImageResizerTask per variant
    ├─ SVG: Sanitize → store as vis.sd → rasterize thumbnail sync
    ├─ Video: Stream to temp → FFprobe → extract frame → thumbnail sync
    │         → schedule VideoTranscoderTask + optional AudioExtractorTask
    ├─ Audio: Stream to temp → FFprobe → schedule AudioExtractorTask per tier
    ├─ PDF: Read into memory → store original → schedule PdfProcessorTask
    └─ Raw: Stream to temp → store as-is (orig variant only)
    ↓
Schedule FileIdGeneratorTask (depends on all variant tasks)
    ↓
Create file descriptor → Content-address all variants
    ↓
Return file ID (f1~...)

The upload handler runs directly (not as a scheduled task). Thumbnails are generated synchronously so clients receive an immediate preview. Additional variants are generated asynchronously via the task scheduler.
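
The detection and validation steps at the top of the diagram can be sketched roughly as follows. This is an illustration only, not Cloudillo's implementation: the helper names `classify_mime` and `validate_upload`, the MIME-prefix rules, and the contents of `ALLOWED_CLASSES` are assumptions made for the example.

```python
# Hypothetical sketch: map a MIME type to a variant class code, then check it
# against the classes a preset allows. All names and rules are illustrative.

def classify_mime(mime: str) -> str:
    """Return a variant class code (vis, vid, aud, doc, raw) for a MIME type."""
    if mime.startswith("image/"):
        return "vis"  # includes image/svg+xml, which gets its own handler
    if mime.startswith("video/"):
        return "vid"
    if mime.startswith("audio/"):
        return "aud"
    if mime == "application/pdf":
        return "doc"
    return "raw"  # anything unrecognized


# Assumed preset contents, for illustration only.
ALLOWED_CLASSES = {
    "default": {"vis", "vid", "aud", "doc"},
    "orig-only": {"vis", "vid", "aud", "doc", "raw"},
}


def validate_upload(preset: str, mime: str) -> str:
    """Return the variant class, or raise if the preset does not allow it."""
    media_class = classify_mime(mime)
    if media_class not in ALLOWED_CLASSES[preset]:
        raise ValueError(f"unsupported media type: {mime}")
    return media_class
```

Under these assumptions, a zip archive uploaded with `orig-only` is accepted as `raw`, while the same upload with `default` is rejected.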

Supported File Types

Images

Format | Extensions | Processing
JPEG | .jpg, .jpeg | Resize, format conversion
PNG | .png | Resize, format conversion
GIF | .gif | First frame extraction, resize
WebP | .webp | Resize
AVIF | .avif | Resize
SVG | .svg | Sanitization, rasterized thumbnail

Image Format Configuration

The vis.pf (profile) variant always uses AVIF. For all other variants, the format is configurable: file.thumbnail_format (default: WebP) controls vis.tn, and file.image_format (default: WebP) controls vis.sd through vis.xd.
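
The selection rule can be written as a small function. This is a sketch under assumptions: the function name `variant_format` and its parameters are hypothetical, and the two keyword arguments stand in for the file.thumbnail_format and file.image_format settings.

```python
# Hypothetical sketch of the format rule described above.
# thumbnail_format / image_format stand in for the two settings.

CONFIGURABLE_IMAGE_VARIANTS = {"vis.sd", "vis.md", "vis.hd", "vis.xd"}


def variant_format(variant: str,
                   thumbnail_format: str = "webp",
                   image_format: str = "webp") -> str:
    """Return the output format for a visual variant."""
    if variant == "vis.pf":
        return "avif"  # the profile variant ignores both settings
    if variant == "vis.tn":
        return thumbnail_format
    if variant in CONFIGURABLE_IMAGE_VARIANTS:
        return image_format
    raise ValueError(f"not a generated visual variant: {variant}")
```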

SVG Security

SVG files are sanitized before storage: <script>, <foreignObject>, and animation elements are removed, on* event handlers are stripped, and javascript:/data:text/html/vbscript: URLs are blocked. The sanitized SVG is stored as vis.sd (vector format scales infinitely) and rasterized via resvg for the thumbnail variant.
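
To make the rules concrete, here is a deliberately small sketch of this kind of sanitizer using Python's standard library. It is not Cloudillo's sanitizer and should not be treated as a complete one: the function name `sanitize_svg` and the exact list of animation elements are assumptions, and `xml.etree.ElementTree` is not hardened against maliciously constructed XML.

```python
# Hypothetical sketch: remove blocked elements, strip on* attributes, and drop
# attributes whose value starts with a blocked URL scheme. Illustrative only.
import xml.etree.ElementTree as ET

SVG_NS = "http://www.w3.org/2000/svg"
ET.register_namespace("", SVG_NS)  # keep <svg> as the default namespace on output

# Assumed list; "animation elements" is not enumerated in the text above.
BLOCKED_TAGS = {"script", "foreignObject", "animate", "animateMotion",
                "animateTransform", "set"}
BLOCKED_SCHEMES = ("javascript:", "data:text/html", "vbscript:")


def local_name(name: str) -> str:
    """Strip a '{namespace}' prefix from a tag or attribute name."""
    return name.rsplit("}", 1)[-1]


def sanitize_svg(svg_text: str) -> str:
    root = ET.fromstring(svg_text)
    for element in list(root.iter()):
        # Remove blocked child elements together with their subtrees.
        for child in list(element):
            if local_name(child.tag) in BLOCKED_TAGS:
                element.remove(child)
        # Strip event handlers and attributes carrying blocked URL schemes.
        for attr in list(element.attrib):
            value = element.attrib[attr].strip().lower()
            if (local_name(attr).lower().startswith("on")
                    or value.startswith(BLOCKED_SCHEMES)):
                del element.attrib[attr]
    return ET.tostring(root, encoding="unicode")
```

A production sanitizer would also need to handle obfuscated URLs, CSS-based vectors, and entity-expansion attacks, none of which this sketch attempts.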

Video

Format | Extensions | Processing
MP4 | .mp4 | H.264 transcode, thumbnails
WebM | .webm | H.264 transcode, thumbnails
MOV | .mov | H.264 transcode, thumbnails
MKV | .mkv | H.264 transcode, thumbnails
AVI | .avi | H.264 transcode, thumbnails

Audio

Format | Extensions | Processing
MP3 | .mp3 | OPUS conversion
WAV | .wav | OPUS conversion
OGG | .ogg | OPUS conversion
FLAC | .flac | OPUS conversion
AAC | .aac | OPUS conversion
WebM Audio | .weba | OPUS conversion

Documents

Format | Extensions | Processing
PDF | .pdf | Page count extraction, first-page thumbnail

Raw Files

Any file type not listed above can be uploaded using presets that allow the Raw variant class (e.g., archive, orig-only). Raw files are stored as-is with no processing beyond content-addressing.

Variant System

Cloudillo uses a two-level variant system with format <class>.<quality>:

Variant Classes

Class | Code | Description | Source Types
Visual | vis | Static images | JPEG, PNG, WebP, AVIF, GIF, SVG
Video | vid | Video content | MP4, WebM, MKV, AVI, MOV
Audio | aud | Audio tracks | MP3, WAV, OGG, FLAC, AAC, OPUS
Document | doc | Documents | PDF
Raw | raw | Original file | Any (unprocessed)

Quality Levels

Quality | Code | Max Size | Video Bitrate | Audio Bitrate | Use Case
Profile | pf | 80px (always AVIF) | - | - | Profile pictures
Thumbnail | tn | 256px | - | - | Small previews
Standard | sd | 720px | 1.5 Mbps | 64 kbps | Mobile/low bandwidth
Medium | md | 1280px | 3 Mbps | 128 kbps | Desktop viewing
High | hd | 1920px | 5 Mbps | 256 kbps | High quality
Extra | xd | 3840px | 15 Mbps | - | 4K/maximum quality
Original | orig | Unprocessed | - | - | Source file

Variant Fallback

When a requested variant isn’t available, the system falls back to lower quality:

Request: vis.hd
Fallback chain: vis.md → vis.sd → vis.tn
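
The fallback lookup can be sketched as below. The function name `resolve_variant` is hypothetical, and the sketch assumes that only the tn through xd levels take part in the chain (pf and orig are left out because the example chain above does not mention them).

```python
# Hypothetical sketch of variant fallback: try the requested quality first,
# then each lower quality in turn within the same variant class.
from typing import Optional, Set

QUALITY_ORDER = ["tn", "sd", "md", "hd", "xd"]  # lowest to highest


def resolve_variant(requested: str, available: Set[str]) -> Optional[str]:
    """Return the best available variant at or below the requested quality."""
    variant_class, quality = requested.split(".")
    top = QUALITY_ORDER.index(quality)
    for candidate_quality in reversed(QUALITY_ORDER[: top + 1]):
        candidate = f"{variant_class}.{candidate_quality}"
        if candidate in available:
            return candidate
    return None  # nothing at or below the requested quality exists
```

For example, requesting vis.hd when only vis.tn and vis.sd exist resolves to vis.sd.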

File Descriptor Format

File descriptors encode all variant information:

d2,vis.tn:b1~abc123:f=webp:s=4096:r=256x192;vis.sd:b1~def456:f=webp:s=32768:r=720x540;vid.hd:b1~xyz789:f=mp4:s=5242880:r=1920x1080:dur=120.5:br=5000

Component | Description
d2, | Descriptor version prefix
; | Variant separator
vis.tn, vid.hd | Two-level variant code
b1~... | Blob ID (SHA-256 hash)
f= | Format (avif, webp, mp4, opus)
s= | Size in bytes
r= | Resolution (WxH)
dur= | Duration in seconds (video/audio)
br= | Bitrate in kbps (video/audio)
pg= | Page count (PDFs)
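
A descriptor in this format can be split apart with a few string operations. The following is a minimal sketch: the function name `parse_descriptor` and the shape of the returned dictionary are assumptions, and it assumes blob IDs and field values never contain `:` or `;`.

```python
# Hypothetical sketch: parse a d2 descriptor into {variant: {field: value}}.
from typing import Dict


def parse_descriptor(descriptor: str) -> Dict[str, Dict[str, str]]:
    version, _, body = descriptor.partition(",")
    if version != "d2":
        raise ValueError(f"unsupported descriptor version: {version!r}")
    variants: Dict[str, Dict[str, str]] = {}
    for entry in body.split(";"):
        # Each entry is <variant>:<blob id>:<key>=<value>:...
        variant, blob_id, *fields = entry.split(":")
        properties = dict(field.split("=", 1) for field in fields)
        variants[variant] = {"blob": blob_id, **properties}
    return variants
```

Applied to the example above, `parse_descriptor(...)["vid.hd"]` contains the blob ID `b1~xyz789` together with the `f`, `s`, `r`, `dur`, and `br` fields as strings.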

Processing Presets

Presets define which variants to generate for different use cases:

Preset | Visual | Video | Audio | Use Case
default | vis.tn, vis.sd, vis.md, vis.hd | vid.sd, vid.md, vid.hd | aud.md | General uploads
profile-picture | vis.pf, vis.tn, vis.sd, vis.md, vis.hd | - | - | Profile images
cover | vis.tn, vis.sd, vis.md, vis.hd | - | - | Cover/banner images
high_quality | vis.tn, vis.sd, vis.md, vis.hd, vis.xd | vid.sd, vid.md, vid.hd, vid.xd | aud.md, aud.hd | Maximum quality
mobile | vis.tn, vis.sd, vis.md | vid.sd, vid.md | aud.sd | Optimized for mobile
archive | vis.tn only | - | - | Minimal (keeps original)
podcast | vis.tn | vid.sd | aud.sd, aud.md, aud.hd | Audio-focused
video | vis.tn, vis.sd, vis.md, vis.hd | vid.sd, vid.md, vid.hd | - | Video-focused
orig-only | - | - | - | Store original only, no processing
thumbnail-only | - | - | - | Generate thumbnail only, discard original
apkg | vis.pf (icon extraction) | - | - | App packages (zip)

Presets that set store_original: true (default, high_quality, archive, podcast, video, orig-only, apkg) preserve the original file as orig. The profile-picture, cover, mobile, and thumbnail-only presets do not store the original.

The archive and orig-only presets also accept raw (unrecognized) file types. Other presets reject uploads with unsupported MIME types.

FFmpeg Integration

Video Transcoding

ffmpeg -i input.mov \
  -c:v libx264 -preset medium -crf 23 \
  -vf "scale=1280:720:force_original_aspect_ratio=decrease" \
  output.mp4

Audio Transcoding

ffmpeg -i input.mp3 \
  -c:a libopus -b:a 128k \
  output.opus

For video thumbnails, extraction seeks to 10% of the video's duration (minimum 3 s) and extracts a single frame, which is then resized through the image processing pipeline.
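
The seek position is simple arithmetic. The sketch below assumes that "min 3s" means the seek offset is never less than 3 seconds. The function name is hypothetical, and how clips shorter than 3 seconds are handled is not specified here, so the sketch does not address that case.

```python
# Hypothetical sketch: compute the thumbnail seek offset for a video.
# Assumes "min 3s" is a lower bound on the offset; short clips are not handled.

def thumbnail_seek_seconds(duration_seconds: float) -> float:
    """Return 10% of the duration, but never less than 3 seconds."""
    return max(3.0, duration_seconds * 0.10)
```

Under this reading, a 10-second clip is sampled at 3 s and a 200-second clip at 20 s.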

Content-Addressing

All variants are content-addressed:

  1. Blob level: Raw bytes → b1~{SHA256(bytes)}
  2. Descriptor level: Descriptor string → f1~{SHA256(descriptor)}

This enables deduplication (identical files share blobs), integrity verification, and permanent caching of immutable content.
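
The two levels can be illustrated with Python's `hashlib`. This is a sketch under one loud assumption: the text encoding of the hash inside an ID is not specified here, so hexadecimal is used purely for illustration and real IDs may look different. The function names are hypothetical.

```python
# Hypothetical sketch of two-level content addressing.
# ASSUMPTION: the hash is hex-encoded; the real ID encoding is not specified.
import hashlib


def blob_id(data: bytes) -> str:
    """Blob level: ID derived from the raw bytes of one variant."""
    return "b1~" + hashlib.sha256(data).hexdigest()


def file_id(descriptor: str) -> str:
    """Descriptor level: ID derived from the descriptor string."""
    return "f1~" + hashlib.sha256(descriptor.encode("utf-8")).hexdigest()
```

Because each ID depends only on content, two uploads with identical bytes produce the same blob ID, which is what makes deduplication and integrity checks possible.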

Task Scheduling

File processing uses the task scheduler for asynchronous variant generation:

Task Type | Description
image.resize | Resize image to target variant dimensions and format
video.transcode | Transcode video to target resolution and bitrate
audio.extract | Extract/transcode audio to OPUS at target bitrate
pdf.process | Extract page count (pdfinfo) and render first-page thumbnail (pdftoppm)
file.id_gen | Generate file descriptor after all variant tasks complete

Task dependencies ensure that file.id_gen runs only after every variant task for the file has completed, so actions only reference fully processed files.
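
The ordering guarantee can be modeled with a dependency graph. The sketch below uses Python's standard `graphlib` module (Python 3.9+) and is not Cloudillo's scheduler. The task keys, which combine a task type with a variant, are invented for the example.

```python
# Hypothetical sketch: order tasks so that file.id_gen runs after all
# variant tasks it depends on. Task keys are illustrative.
from graphlib import TopologicalSorter
from typing import Dict, List, Set


def run_order(dependencies: Dict[str, Set[str]]) -> List[str]:
    """Return a task order in which every task follows its dependencies."""
    return list(TopologicalSorter(dependencies).static_order())


# Each key maps to the set of tasks that must finish before it may start.
example_dependencies = {
    "image.resize:vis.sd": set(),
    "image.resize:vis.md": set(),
    "file.id_gen": {"image.resize:vis.sd", "image.resize:vis.md"},
}
example_order = run_order(example_dependencies)
```

In `example_order`, file.id_gen always comes last, whichever order the two resize tasks run in.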

Federation Sync

When syncing files across instances, only file descriptors are synced initially. Variants are fetched on demand from the origin server and cached locally.
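
The fetch-on-demand behavior amounts to a read-through cache keyed by blob ID. The following sketch is illustrative only: the class name `VariantCache` and the `fetch_from_origin` callable are hypothetical stand-ins for whatever transport and storage the real system uses.

```python
# Hypothetical sketch: a read-through cache for variants on a remote instance.
# fetch_from_origin stands in for a request to the origin server.
from typing import Callable, Dict


class VariantCache:
    def __init__(self, fetch_from_origin: Callable[[str], bytes]) -> None:
        self._fetch = fetch_from_origin
        self._blobs: Dict[str, bytes] = {}

    def get(self, blob_id: str) -> bytes:
        """Return the blob, fetching it from the origin only on first use."""
        if blob_id not in self._blobs:
            self._blobs[blob_id] = self._fetch(blob_id)
        return self._blobs[blob_id]
```

Since blobs are content-addressed and immutable, a cached entry never needs to be revalidated against the origin.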

Error Handling

Error | Action
Unknown format, preset allows Raw | Store as-is via raw handler
Unknown format, preset disallows Raw | Reject with “unsupported media type” error
FFmpeg failure | Log, mark task as failed, allow retry
Storage full | Queue for retry, alert admin
Timeout | Retry with extended timeout
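
The "retry with extended timeout" policy could look roughly like this. It is a sketch, not the scheduler's actual behavior: the function name, the initial timeout, the growth factor, and the attempt limit are all assumed values chosen for illustration.

```python
# Hypothetical sketch: retry a task, extending its timeout after each timeout.
# The defaults (30 s, factor 2, 3 attempts) are assumptions, not documented values.
from typing import Callable, TypeVar

T = TypeVar("T")


def run_with_retry(task: Callable[[float], T],
                   timeout: float = 30.0,
                   factor: float = 2.0,
                   max_attempts: int = 3) -> T:
    """Call task(timeout); on TimeoutError, retry with a longer timeout."""
    for _ in range(max_attempts):
        try:
            return task(timeout)
        except TimeoutError:
            timeout *= factor  # extend the timeout for the next attempt
    raise TimeoutError(f"task still timing out after {max_attempts} attempts")
```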

See Also