File Processing Pipeline

Overview

Cloudillo processes uploaded files through an asynchronous pipeline that generates multiple variants optimized for different use cases. The system uses FFmpeg for multimedia processing, resvg for SVG rasterization, and poppler-utils for PDF handling. Supported media types include images, SVG, videos, audio, PDFs, and raw files.

Processing Architecture

Upload (POST /api/files/{preset}/{file_name})
    ↓
Detect MIME type → Map to VariantClass
    ↓
Validate against preset's allowed_media_classes
    ↓
Route to type-specific handler:
    ├─ Image: Read into memory → thumbnail sync → schedule ImageResizerTask per variant
    ├─ SVG: Sanitize → store as vis.sd → rasterize thumbnail sync
    ├─ Video: Stream to temp → FFprobe → extract frame → thumbnail sync
    │         → schedule VideoTranscoderTask + optional AudioExtractorTask
    ├─ Audio: Stream to temp → FFprobe → schedule AudioExtractorTask per tier
    ├─ PDF: Read into memory → store original → schedule PdfProcessorTask
    └─ Raw: Stream to temp → store as-is (orig variant only)
    ↓
Schedule FileIdGeneratorTask (depends on all variant tasks)
    ↓
Create file descriptor → Content-address all variants
    ↓
Return file ID (f1~...)

The upload handler runs directly (not as a scheduled task). Thumbnails are generated synchronously so clients receive an immediate preview. Additional variants are generated asynchronously via the task scheduler.
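The detection and validation steps at the top of the diagram can be sketched as follows. This is a minimal illustration, not the actual implementation: the `classify` and `validate` functions, the MIME prefix rules, and the `UnsupportedMediaType` exception are hypothetical names chosen for the example. Only the class codes come from this document.

```python
from enum import Enum


class VariantClass(Enum):
    VISUAL = "vis"
    VIDEO = "vid"
    AUDIO = "aud"
    DOCUMENT = "doc"
    RAW = "raw"


class UnsupportedMediaType(Exception):
    """Raised when a preset does not allow the detected class (hypothetical)."""


def classify(mime: str) -> VariantClass:
    # Assumed mapping rules; the real MIME detection may be more precise.
    if mime.startswith("image/"):
        return VariantClass.VISUAL
    if mime.startswith("video/"):
        return VariantClass.VIDEO
    if mime.startswith("audio/"):
        return VariantClass.AUDIO
    if mime == "application/pdf":
        return VariantClass.DOCUMENT
    # Anything unrecognized is treated as a raw file.
    return VariantClass.RAW


def validate(mime: str, allowed_media_classes: set) -> VariantClass:
    # Reject the upload unless the preset allows the detected class.
    variant_class = classify(mime)
    if variant_class not in allowed_media_classes:
        raise UnsupportedMediaType(f"unsupported media type: {mime}")
    return variant_class
```

With these rules, a ZIP upload passes validation only for a preset whose allowed classes include `VariantClass.RAW`.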

Supported File Types

Images

Format	Extensions	Processing
JPEG	.jpg, .jpeg	Resize, format conversion
PNG	.png	Resize, format conversion
GIF	.gif	First frame extraction, resize
WebP	.webp	Resize
AVIF	.avif	Resize
SVG	.svg	Sanitization, rasterized thumbnail

Image Format Configuration

The vis.pf (profile) variant always uses AVIF. For all other variants, the format is configurable: file.thumbnail_format (default: WebP) controls vis.tn, and file.image_format (default: WebP) controls vis.sd through vis.xd.
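The selection rule above could be expressed roughly as below. The setting names and defaults are taken from the paragraph above. The `visual_format` function and the plain dictionary used for settings are assumptions made for illustration.

```python
# Defaults as described above; the settings container is a hypothetical dict.
DEFAULTS = {"file.thumbnail_format": "webp", "file.image_format": "webp"}


def visual_format(variant, settings=None):
    cfg = {**DEFAULTS, **(settings or {})}
    if variant == "vis.pf":
        return "avif"  # the profile variant is always AVIF
    if variant == "vis.tn":
        return cfg["file.thumbnail_format"]
    if variant in ("vis.sd", "vis.md", "vis.hd", "vis.xd"):
        return cfg["file.image_format"]
    raise ValueError(f"not a configurable visual variant: {variant}")
```

Note that overriding `file.image_format` leaves both `vis.pf` and `vis.tn` unaffected.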

SVG Security

SVG files are sanitized before storage: <script>, <foreignObject>, and animation elements are removed; on* event-handler attributes are stripped; and javascript:, data:text/html, and vbscript: URLs are blocked. Because a vector image scales to any size, the sanitized SVG is stored directly as vis.sd, and resvg rasterizes it for the thumbnail variant.
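To make the sanitization rules concrete, here is a minimal sketch using Python's standard `xml.etree.ElementTree`. It is not the sanitizer Cloudillo uses: the function name, the exact list of animation elements, and the choice to check every attribute value (not only links) are assumptions. `ElementTree` is also not hardened against maliciously constructed XML, so this should be read as an illustration of the rules only.

```python
import xml.etree.ElementTree as ET

# Assumed element list: <script>, <foreignObject>, and common animation elements.
BLOCKED_ELEMENTS = {"script", "foreignObject", "animate",
                    "animateMotion", "animateTransform", "set"}
BLOCKED_URL_PREFIXES = ("javascript:", "data:text/html", "vbscript:")


def local_name(name: str) -> str:
    # ElementTree spells namespaced names as "{namespace}local".
    return name.rsplit("}", 1)[-1]


def sanitize_svg(svg_text: str) -> str:
    # Keep the usual prefixes when serializing instead of ns0/ns1.
    ET.register_namespace("", "http://www.w3.org/2000/svg")
    ET.register_namespace("xlink", "http://www.w3.org/1999/xlink")
    root = ET.fromstring(svg_text)

    # Pass 1: drop blocked elements together with their subtrees.
    for parent in list(root.iter()):
        for child in list(parent):
            if local_name(child.tag) in BLOCKED_ELEMENTS:
                parent.remove(child)

    # Pass 2: drop on* handlers and attributes carrying blocked URL schemes.
    for element in root.iter():
        for attr in list(element.attrib):
            # Remove whitespace so "java\nscript:" is caught as well.
            value = "".join(element.attrib[attr].split()).lower()
            if (local_name(attr).lower().startswith("on")
                    or value.startswith(BLOCKED_URL_PREFIXES)):
                del element.attrib[attr]

    return ET.tostring(root, encoding="unicode")
```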

Video

Format	Extensions	Processing
MP4	.mp4	H.264 transcode, thumbnails
WebM	.webm	H.264 transcode, thumbnails
MOV	.mov	H.264 transcode, thumbnails
MKV	.mkv	H.264 transcode, thumbnails
AVI	.avi	H.264 transcode, thumbnails

Audio

Format	Extensions	Processing
MP3	.mp3	OPUS conversion
WAV	.wav	OPUS conversion
OGG	.ogg	OPUS conversion
FLAC	.flac	OPUS conversion
AAC	.aac	OPUS conversion
WebM Audio	.weba	OPUS conversion

Documents

Format	Extensions	Processing
PDF	.pdf	Page count extraction, first-page thumbnail

Raw Files

Any file type not listed above can be uploaded using presets that allow the Raw variant class (e.g., archive, orig-only). Raw files are stored as-is with no processing beyond content-addressing.

Variant System

Cloudillo uses a two-level variant system with format <class>.<quality>:

Variant Classes

Class	Code	Description	Source Types
Visual	`vis`	Static images	JPEG, PNG, WebP, AVIF, GIF, SVG
Video	`vid`	Video content	MP4, WebM, MKV, AVI, MOV
Audio	`aud`	Audio tracks	MP3, WAV, OGG, FLAC, AAC, OPUS
Document	`doc`	Documents	PDF
Raw	`raw`	Original file	Any (unprocessed)

Quality Levels

Quality	Code	Max Size	Video Bitrate	Audio Bitrate	Use Case
Profile	`pf`	80px (always AVIF)	-	-	Profile pictures
Thumbnail	`tn`	256px	-	-	Small previews
Standard	`sd`	720px	1.5 Mbps	64 kbps	Mobile/low bandwidth
Medium	`md`	1280px	3 Mbps	128 kbps	Desktop viewing
High	`hd`	1920px	5 Mbps	256 kbps	High quality
Extra	`xd`	3840px	15 Mbps	-	4K/maximum quality
Original	`orig`	Unprocessed	-	-	Source file

Variant Fallback

When a requested variant isn’t available, the system falls back to the next lower quality level in the same class until it finds one that exists:

Request: vis.hd
Fallback chain: vis.md → vis.sd → vis.tn
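The fallback walk shown above can be sketched as a small lookup. The `resolve_variant` function and the `QUALITY_ORDER` list are hypothetical. Whether `pf` and `orig` take part in fallback is not stated here, so this sketch leaves them out.

```python
# Ordered from lowest to highest quality; pf and orig are excluded (assumption).
QUALITY_ORDER = ["tn", "sd", "md", "hd", "xd"]


def resolve_variant(requested, available):
    """Return the requested variant, or the best lower one, or None."""
    if requested in available:
        return requested
    variant_class, quality = requested.split(".", 1)
    if quality not in QUALITY_ORDER:
        return None
    index = QUALITY_ORDER.index(quality)
    # Walk downwards: for hd this tries md, then sd, then tn.
    for lower in reversed(QUALITY_ORDER[:index]):
        candidate = f"{variant_class}.{lower}"
        if candidate in available:
            return candidate
    return None
```

For example, a request for `vis.hd` against a file that only has `vis.tn` and `vis.sd` resolves to `vis.sd`.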

File Descriptor Format

File descriptors encode all variant information:

d2,vis.tn:b1~abc123:f=webp:s=4096:r=256x192;vis.sd:b1~def456:f=webp:s=32768:r=720x540;vid.hd:b1~xyz789:f=mp4:s=5242880:r=1920x1080:dur=120.5:br=5000

Component	Description
`d2,`	Descriptor version prefix
`;`	Variant separator
`vis.tn`, `vid.hd`	Two-level variant code
`b1~...`	Blob ID (SHA-256 hash)
`f=`	Format (avif, webp, mp4, opus)
`s=`	Size in bytes
`r=`	Resolution (WxH)
`dur=`	Duration in seconds (video/audio)
`br=`	Bitrate in kbps (video/audio)
`pg=`	Page count (PDFs)
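Based on the components listed above, a descriptor can be split apart with plain string operations. The following is a sketch only: `parse_descriptor` and the shape of its return value are assumptions, and all property values are kept as strings.

```python
def parse_descriptor(descriptor: str) -> dict:
    """Parse a d2 descriptor into {variant_code: {property: value}}."""
    version, _, body = descriptor.partition(",")
    if version != "d2":
        raise ValueError(f"unsupported descriptor version: {version!r}")

    variants = {}
    for entry in body.split(";"):
        # Each entry is "<class>.<quality>:<blob id>:<key>=<value>:..."
        code, blob_id, *props = entry.split(":")
        info = {"blob_id": blob_id}
        for prop in props:
            key, _, value = prop.partition("=")
            info[key] = value
        variants[code] = info
    return variants
```

Applied to the example descriptor above, `parse_descriptor(...)["vid.hd"]["dur"]` yields the string `"120.5"`.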

Processing Presets

Presets define which variants to generate for different use cases:

Preset	Visual	Video	Audio	Use Case
`default`	vis.tn, vis.sd, vis.md, vis.hd	vid.sd, vid.md, vid.hd	aud.md	General uploads
`profile-picture`	vis.pf, vis.tn, vis.sd, vis.md, vis.hd	-	-	Profile images
`cover`	vis.tn, vis.sd, vis.md, vis.hd	-	-	Cover/banner images
`high_quality`	vis.tn, vis.sd, vis.md, vis.hd, vis.xd	vid.sd, vid.md, vid.hd, vid.xd	aud.md, aud.hd	Maximum quality
`mobile`	vis.tn, vis.sd, vis.md	vid.sd, vid.md	aud.sd	Optimized for mobile
`archive`	vis.tn only	-	-	Minimal (keeps original)
`podcast`	vis.tn	vid.sd	aud.sd, aud.md, aud.hd	Audio-focused
`video`	vis.tn, vis.sd, vis.md, vis.hd	vid.sd, vid.md, vid.hd	-	Video-focused
`orig-only`	-	-	-	Store original only, no processing
`thumbnail-only`	-	-	-	Generate thumbnail only, discard original
`apkg`	vis.pf (icon extraction)	-	-	App packages (zip)

Presets that set store_original: true (default, high_quality, archive, podcast, video, orig-only, apkg) preserve the original file as the orig variant. The profile-picture, cover, mobile, and thumbnail-only presets do not store the original.

The archive and orig-only presets also accept raw (unrecognized) file types. Other presets reject uploads with unsupported MIME types.

FFmpeg Integration

Video Transcoding

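# Fit within 1280x720 while preserving aspect ratio. force_divisible_by=2 keeps
# both dimensions even, which libx264 requires for 4:2:0 output.
ffmpeg -i input.mov \
  -c:v libx264 -preset medium -crf 23 \
  -vf "scale=1280:720:force_original_aspect_ratio=decrease:force_divisible_by=2" \
  output.mp4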

Audio Transcoding

# Encode the audio stream with libopus at 128 kbps (the aud.md bitrate).
ffmpeg -i input.mp3 \
  -c:a libopus -b:a 128k \
  output.opus

For video thumbnails, extraction seeks to 10% of the video's duration (min 3s) and grabs a single frame, which is then resized through the image processing pipeline.

Content-Addressing

All variants are content-addressed:

Blob level: Raw bytes → b1~{SHA256(bytes)}
Descriptor level: Descriptor string → f1~{SHA256(descriptor)}

This enables deduplication (identical files share blobs), integrity verification, and permanent caching of immutable content.
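The two levels can be illustrated with `hashlib`. The `b1~` and `f1~` prefixes and the use of SHA-256 come from this section. The hexadecimal encoding of the digest and the function names are assumptions, since the actual ID encoding is not specified here.

```python
import hashlib


def blob_id(data: bytes) -> str:
    # Blob level: hash of the raw variant bytes (hex encoding is an assumption).
    return "b1~" + hashlib.sha256(data).hexdigest()


def file_id(descriptor: str) -> str:
    # Descriptor level: hash of the descriptor string itself.
    return "f1~" + hashlib.sha256(descriptor.encode("utf-8")).hexdigest()
```

Because the ID depends only on the content, two uploads of identical bytes produce the same blob ID, which is what makes deduplication and integrity checks possible.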

Task Scheduling

File processing uses the task scheduler for asynchronous variant generation:

Task Type	Description
`image.resize`	Resize image to target variant dimensions and format
`video.transcode`	Transcode video to target resolution and bitrate
`audio.extract`	Extract/transcode audio to OPUS at target bitrate
`pdf.process`	Extract page count (pdfinfo) and render first-page thumbnail (pdftoppm)
`file.id_gen`	Generate file descriptor after all variant tasks complete

Task dependencies ensure that file.id_gen runs only after every variant task has completed, so actions only ever reference fully processed files.
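The dependency relationship can be modeled as a small graph in which `file.id_gen` has every variant task as a predecessor. The sketch below uses Python's standard `graphlib` to derive a valid execution order. It does not reflect how the Cloudillo scheduler is implemented, and the task labels in the usage example are hypothetical.

```python
from graphlib import TopologicalSorter


def execution_order(variant_tasks):
    """Return an order in which file.id_gen comes after all variant tasks."""
    # Map each task to the set of tasks it depends on.
    graph = {task: set() for task in variant_tasks}
    graph["file.id_gen"] = set(variant_tasks)
    return list(TopologicalSorter(graph).static_order())


order = execution_order(["image.resize:vis.sd", "image.resize:vis.md"])
# file.id_gen is always last, because it depends on every other task.
```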

Federation Sync

When syncing files across instances, only file descriptors are synced initially. Variants are fetched on demand from the origin server and cached locally.

Error Handling

Error	Action
Unknown format, preset allows Raw	Store as-is via raw handler
Unknown format, preset disallows Raw	Reject with “unsupported media type” error
FFmpeg failure	Log, mark task as failed, allow retry
Storage full	Queue for retry, alert admin
Timeout	Retry with extended timeout
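The "retry with extended timeout" row could look roughly like the following. All names and numbers here (the function, the base timeout, the growth factor, and the attempt limit) are hypothetical. The table above does not specify how much the timeout is extended or how many retries are allowed.

```python
def run_with_extended_timeout(task, base_timeout=60.0, factor=2.0, max_attempts=3):
    """Call task(timeout); on TimeoutError, retry with a longer timeout."""
    timeout = base_timeout
    for attempt in range(1, max_attempts + 1):
        try:
            return task(timeout)
        except TimeoutError:
            if attempt == max_attempts:
                raise  # give up and let the caller mark the task as failed
            timeout *= factor  # extend the timeout for the next attempt
```

Here `task` is any callable that accepts a timeout in seconds and raises `TimeoutError` when it runs out of time.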