System Architecture Overview

This document provides a technical overview of the Cloudillo system architecture, explaining the core patterns and components that enable its federated, privacy-focused design.

Workspace Structure

Cloudillo is organized as a Rust workspace with the following crates:

cloudillo-rs/
├── server/                      # Core library (cloudillo)
├── basic-server/                # Reference implementation
└── adapters/
    ├── auth-adapter-sqlite/     # Authentication & cryptography (SQLite)
    ├── meta-adapter-sqlite/     # Metadata storage (SQLite)
    ├── blob-adapter-fs/         # Binary blob storage (Filesystem)
    ├── rtdb-adapter-redb/       # Real-time database (Redb)
    └── crdt-adapter-redb/       # Collaborative editing CRDT (Redb)

Crate Responsibilities

server: Core business logic, HTTP handlers, federation, task system
basic-server: Executable reference implementation using SQLite, Redb, and filesystem
adapters: Five pluggable storage backends implementing the core adapter traits

Adapter Types

The five adapters separate concerns and enable flexible deployments:

AuthAdapter - Authentication, JWT tokens, certificate management, cryptographic operations
MetaAdapter - Tenant/profile data, action tokens, file metadata, tasks
BlobAdapter - Content-addressed immutable binary data (files, images, snapshots)
RtdbAdapter - Real-time hierarchical JSON database with queries and subscriptions
CrdtAdapter - Collaborative document editing with conflict-free merges

Core Architecture Patterns

The Five-Adapter Architecture

Cloudillo’s architecture is built on five fundamental adapters that separate concerns and enable flexible deployments:

1. AuthAdapter

Purpose: Authentication, authorization, and cryptographic operations

Responsibilities:

JWT/token validation and creation
TLS certificate management (ACME integration)
Profile signing key storage and rotation
VAPID keys for push notifications
Password hashing and WebAuthn credentials
Tenant management

Key Operations: Validate tokens, create access tokens, manage TLS certificates, handle profile keys, store passwords/credentials

Why separate? Authentication and cryptography require special security considerations and may need different storage backends (HSM, vault services, etc.).

2. MetaAdapter

Purpose: Structured metadata storage

Responsibilities:

Tenant and profile information
Action tokens (posts, comments, reactions, etc.)
File metadata with variants
Task persistence for the scheduler
Database metadata (for RTDB)

Key Operations: Create/query tenants, store/retrieve actions, manage file metadata, persist tasks, handle profiles

Why separate? Metadata can be stored in different databases (SQLite, PostgreSQL, MongoDB) based on scale and requirements.

3. BlobAdapter

Purpose: Immutable binary data storage

Responsibilities:

Content-addressed blob storage
File data (images, videos, documents)
Database snapshots (for RTDB)
Both buffered and streaming I/O

Key Operations: Write/read blobs (buffered or streaming), check blob existence and size

Why separate? Blob storage can use filesystem, S3, CDN, or specialized object storage based on scale and cost considerations.

4. RtdbAdapter

Purpose: Real-time structured database

Responsibilities:

Path-based hierarchical storage (Firebase-like)
Query with filtering, sorting, pagination
Real-time subscriptions via WebSocket
Transaction support for atomic writes
Secondary indexes for performance
Multi-tenant database isolation

Key Operations: Query/create/update/delete documents, transactions, real-time subscriptions

Why separate? Real-time database functionality can use different backends (redb, PostgreSQL, MongoDB) based on query requirements and scale.

Learn more: RTDB with redb

5. CrdtAdapter

Purpose: Collaborative document storage with CRDTs

Responsibilities:

Binary CRDT update storage (Yjs protocol)
Document metadata management
Real-time change subscriptions
Conflict-free merge guarantees
Awareness state (presence, cursors)
Multi-tenant document isolation

Key Operations: Create/read documents, append updates, retrieve update history, subscribe to changes

Why separate? CRDT storage can use different backends (redb, dedicated CRDT stores) and has different performance characteristics than traditional databases.

Learn more: CRDT Collaborative Editing

Benefits of the Adapter Pattern

✅ Flexible Deployment: Switch storage backends without changing core logic
✅ Separation of Concerns: Security, metadata, and binary data have different requirements
✅ Testing: Easy to create in-memory adapters for testing
✅ Scalability: Can distribute adapters across different services
✅ Cost Optimization: Use appropriate storage for each data type
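To make the testing and backend-swapping benefits concrete, here is a minimal sketch of an in-memory blob backend. The `BlobStore` trait, its method names, and `store_avatar` are hypothetical and deliberately synchronous; the actual Cloudillo adapter traits are not shown in this document and are likely async and richer.

```rust
use std::collections::HashMap;
use std::sync::Mutex;

/// Hypothetical, simplified blob-storage trait (an assumption for this sketch).
pub trait BlobStore: Send + Sync {
    fn write(&self, blob_id: &str, data: &[u8]) -> Result<(), String>;
    fn read(&self, blob_id: &str) -> Result<Option<Vec<u8>>, String>;
    fn size(&self, blob_id: &str) -> Result<Option<usize>, String>;
}

/// In-memory backend, convenient for unit tests.
#[derive(Default)]
pub struct MemoryBlobStore {
    blobs: Mutex<HashMap<String, Vec<u8>>>,
}

impl BlobStore for MemoryBlobStore {
    fn write(&self, blob_id: &str, data: &[u8]) -> Result<(), String> {
        let mut blobs = self.blobs.lock().map_err(|e| e.to_string())?;
        // Blobs are immutable: writing an ID that already exists is a no-op.
        blobs.entry(blob_id.to_string()).or_insert_with(|| data.to_vec());
        Ok(())
    }

    fn read(&self, blob_id: &str) -> Result<Option<Vec<u8>>, String> {
        let blobs = self.blobs.lock().map_err(|e| e.to_string())?;
        Ok(blobs.get(blob_id).cloned())
    }

    fn size(&self, blob_id: &str) -> Result<Option<usize>, String> {
        let blobs = self.blobs.lock().map_err(|e| e.to_string())?;
        Ok(blobs.get(blob_id).map(|b| b.len()))
    }
}

/// Core logic depends only on the trait, so the backend can be swapped freely.
pub fn store_avatar(store: &dyn BlobStore, blob_id: &str, bytes: &[u8]) -> Result<usize, String> {
    store.write(blob_id, bytes)?;
    store
        .size(blob_id)?
        .ok_or_else(|| "blob missing after write".to_string())
}
```

Calling `store_avatar(&MemoryBlobStore::default(), "b1~example", b"hello")` exercises the same code path a filesystem or S3 backend would, without touching disk.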

Content-Addressed Architecture

Cloudillo uses content-addressing throughout its architecture: resource identifiers are cryptographic hashes of their content. This creates a Merkle tree structure in which every identifier commits to the content beneath it, so integrity and immutability can be verified by recomputing hashes.

Hash-Based Identifiers

Resource IDs are SHA-256 hashes with versioned prefixes. The one exception is the descriptor (d1~), which is the encoded string itself rather than a hash:

Prefix	Resource Type	Hash Input	Example
`a1~`	Action	Entire JWT token (header + payload + signature)	`a1~8kR3mN9pQ2vL6xW...`
`f1~`	File	File descriptor string	`f1~Qo2E3G8TJZ2HTGh...`
`b1~`	Blob	Blob bytes (actual image/video data)	`b1~abc123def456ghi...`
`d1~`	Descriptor	(not a hash, the encoded string itself)	`d1~tn:b1~abc:f=AVIF:...`

Version Scheme

Format: {prefix}{version}~{base64_encoded_hash}

Version 1: SHA-256 with base64url encoding (no padding)
Future versions: Can upgrade to SHA-3, BLAKE3, etc. without breaking old content
Backward compatibility: Old content remains valid forever
Algorithm agility: Migrate to new algorithms without breaking existing references

Example upgrade path:

a1~...  (SHA-256)
a2~...  (SHA-3)
a3~...  (BLAKE3)
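The version scheme can be illustrated with a small parser that splits an identifier into its kind, version, and payload. This is a sketch only: `ContentId` and `parse_content_id` are hypothetical names, and the set of accepted prefix letters is taken from the table above.

```rust
/// Parsed form of an identifier such as `a1~8kR3mN9p` (hypothetical type).
#[derive(Debug, PartialEq)]
pub struct ContentId<'a> {
    /// 'a' action, 'f' file, 'b' blob, 'd' descriptor.
    pub kind: char,
    /// 1 = SHA-256 with base64url encoding (no padding).
    pub version: u32,
    /// Everything after the first '~'.
    pub payload: &'a str,
}

pub fn parse_content_id(id: &str) -> Option<ContentId<'_>> {
    // Split at the FIRST '~' only: descriptors contain further '~' characters.
    let (head, payload) = id.split_once('~')?;
    let mut chars = head.chars();
    let kind = chars.next()?;
    if !matches!(kind, 'a' | 'f' | 'b' | 'd') {
        return None;
    }
    let digits = chars.as_str();
    if digits.is_empty() || !digits.bytes().all(|b| b.is_ascii_digit()) {
        return None;
    }
    let version = digits.parse().ok()?;
    if payload.is_empty() {
        return None;
    }
    Some(ContentId { kind, version, payload })
}
```

Because the version travels with the identifier, a verifier can pick the matching hash algorithm per ID, which is what keeps old `a1~` references valid after newer versions are introduced.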

Six-Level Merkle Tree

Content-addressing creates a hierarchical Merkle tree:

Level 1: Blob Data (raw bytes)
   ↓ SHA-256 hash
Level 2: Blob ID (b1~hash)
   ↓ collected in descriptor
Level 3: File Descriptor (d1~variant:b1~hash:format:size:resolution,...)
   ↓ SHA-256 hash of descriptor
Level 4: File ID (f1~hash)
   ↓ referenced in action
Level 5: Action Token (JWT with content, parent, attachments)
   ↓ SHA-256 hash of entire JWT
Level 6: Action ID (a1~hash)
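The chain above can be sketched in a few functions. Several things here are assumptions made so the example runs with the standard library alone: `toy_digest` is a non-cryptographic FNV-1a stand-in for SHA-256, the descriptor is reduced to `d1~{variant}:{blob_id}` (the real descriptor carries format, size, and resolution), and the file ID is assumed to hash the full descriptor string. All function names are hypothetical.

```rust
/// Base64url without padding (RFC 4648, section 5), written inline to avoid crates.
pub fn base64url_nopad(bytes: &[u8]) -> String {
    const ALPHABET: &[u8; 64] =
        b"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-_";
    let mut out = String::with_capacity((bytes.len() * 4 + 2) / 3);
    for chunk in bytes.chunks(3) {
        let b = [chunk[0], *chunk.get(1).unwrap_or(&0), *chunk.get(2).unwrap_or(&0)];
        let n = (u32::from(b[0]) << 16) | (u32::from(b[1]) << 8) | u32::from(b[2]);
        // A chunk of k input bytes yields k + 1 output characters.
        for i in 0..=chunk.len() {
            let idx = (n >> (18 - 6 * i)) & 0x3f;
            out.push(ALPHABET[idx as usize] as char);
        }
    }
    out
}

/// NOT cryptographic: 64-bit FNV-1a, used only so this sketch needs no crates.
/// A real deployment would pass a SHA-256 function instead.
pub fn toy_digest(data: &[u8]) -> Vec<u8> {
    let mut h: u64 = 0xcbf2_9ce4_8422_2325;
    for b in data {
        h = (h ^ u64::from(*b)).wrapping_mul(0x0000_0100_0000_01b3);
    }
    h.to_be_bytes().to_vec()
}

/// Builds `{prefix}1~{base64url(digest(input))}`.
pub fn hashed_id(prefix: char, input: &[u8], digest: &dyn Fn(&[u8]) -> Vec<u8>) -> String {
    format!("{}1~{}", prefix, base64url_nopad(&digest(input)))
}

pub struct FileIds {
    pub blob_id: String,
    pub descriptor: String,
    pub file_id: String,
}

/// Levels 1-4: blob bytes -> blob ID -> descriptor -> file ID.
pub fn derive_file_ids(blob: &[u8], variant: &str, digest: &dyn Fn(&[u8]) -> Vec<u8>) -> FileIds {
    let blob_id = hashed_id('b', blob, digest);
    // Simplified descriptor; the real one lists format, size, resolution per variant.
    let descriptor = format!("d1~{}:{}", variant, blob_id);
    let file_id = hashed_id('f', descriptor.as_bytes(), digest);
    FileIds { blob_id, descriptor, file_id }
}

/// Levels 5-6: the action ID hashes the entire JWT string.
pub fn action_id(jwt: &str, digest: &dyn Fn(&[u8]) -> Vec<u8>) -> String {
    hashed_id('a', jwt.as_bytes(), digest)
}
```

Changing a single byte of the blob changes the blob ID, which changes the descriptor, which changes the file ID, and so on up to any action that references the file.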

Properties

✅ Immutable: Content cannot change without changing the ID
✅ Tamper-Evident: Any modification is immediately detectable
✅ Deduplicatable: Identical content produces identical IDs
✅ Verifiable: Anyone can recompute and verify hashes
✅ Cacheable: Content-addressed data can be cached forever
✅ Trustless: No need to trust storage providers; verify the hash

Integration with Adapters

Content-addressing is implemented across multiple adapters:

BlobAdapter:

Stores blobs indexed by blob_id (b1~...)
Blob IDs are SHA-256 hashes of the blob bytes
Enables deduplication and integrity verification

MetaAdapter:

Stores action tokens indexed by action_id (a1~...)
Action IDs are SHA-256 hashes of the JWT token
Stores file metadata with file_id (f1~...)
File IDs are SHA-256 hashes of the descriptor

AuthAdapter:

Signs action tokens with profile keys (ES384)
Signature + content hash = complete authenticity proof

Security Benefits

Content-addressing provides multiple security layers:

Integrity Verification: SHA-256 ensures data hasn’t been tampered with
Deduplication: Same content = same hash, prevents storage waste
Cryptographic Binding: Parent references create immutable chains
Federation Trust: Remote instances can verify data integrity
Cache Safety: Content-addressed data can be cached without trust

Performance Benefits

Caching: Immutable content can be cached indefinitely (for example max-age=31536000, i.e. one year)
Deduplication: Identical blobs stored only once across all tenants
Parallel Verification: Hash verification can be parallelized
CDN-Friendly: Content-addressed resources are well suited to CDN distribution

Learn more: Content-Addressing & Merkle Trees

Task-Based Asynchronous Processing

Complex operations in Cloudillo are modeled as persistent tasks that can execute asynchronously, survive restarts, and depend on other tasks.

Task System Components

Tasks

Tasks implement the Task<S> trait:

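pub trait Task<S>: Debug + Send + Sync {
    /// Name identifying the task type.
    fn kind() -> &'static str;
    /// Executes the task against the shared state `S`.
    async fn run(&self, state: &S) -> Result<()>;
    /// Execution priority; defaults to Medium.
    fn priority(&self) -> Priority { Priority::Medium }
    /// IDs of tasks that must complete before this one starts.
    fn dependencies(&self) -> Vec<TaskId> { vec![] }
}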

Built-in Task Types:

ActionCreatorTask: Creates and signs action tokens for federation
ActionVerifierTask: Validates incoming federated actions
FileIdGeneratorTask: Generates content-addressed file IDs
ImageResizerTask: Creates image variants (thumbnails, etc.)

Scheduler

The scheduler manages task lifecycle with dependency resolution:

Features:

Task registry with dynamic builders
Dependency resolution (DAG-based)
Scheduled execution (cron-like)
Persistence via MetaAdapter (survives restarts)
Notification system for task completion

Example Flow:

ActionCreatorTask (depends on FileIdGeneratorTask)
    ↓
    waits for file processing to complete
    ↓
FileIdGeneratorTask completes
    ↓
ActionCreatorTask auto-starts
    ↓
Creates signed JWT, stores in MetaAdapter
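The flow above amounts to a topological ordering of the task graph. The sketch below shows one way to resolve it (Kahn's algorithm); it is not the Cloudillo scheduler, and `TaskSpec` and `execution_order` are hypothetical names. Real tasks would run asynchronously and be persisted, which this example leaves out.

```rust
use std::collections::{HashMap, HashSet, VecDeque};

pub type TaskId = u64;

/// Hypothetical, minimal task description for the sketch.
pub struct TaskSpec {
    pub id: TaskId,
    pub kind: &'static str,
    pub dependencies: Vec<TaskId>,
}

/// Returns task kinds in an order that respects dependencies,
/// or an error for unknown dependencies or cycles.
pub fn execution_order(tasks: &[TaskSpec]) -> Result<Vec<&'static str>, String> {
    let known: HashSet<TaskId> = tasks.iter().map(|t| t.id).collect();
    let kinds: HashMap<TaskId, &'static str> = tasks.iter().map(|t| (t.id, t.kind)).collect();
    let mut unmet: HashMap<TaskId, usize> = HashMap::new();
    let mut dependents: HashMap<TaskId, Vec<TaskId>> = HashMap::new();

    for t in tasks {
        for dep in &t.dependencies {
            if !known.contains(dep) {
                return Err(format!("task {} depends on unknown task {}", t.id, dep));
            }
            dependents.entry(*dep).or_default().push(t.id);
        }
        unmet.insert(t.id, t.dependencies.len());
    }

    // Tasks without dependencies are ready immediately, in submission order.
    let mut ready: VecDeque<TaskId> = tasks
        .iter()
        .filter(|t| t.dependencies.is_empty())
        .map(|t| t.id)
        .collect();
    let mut order = Vec::new();

    while let Some(id) = ready.pop_front() {
        order.push(kinds[&id]); // stands in for actually running the task
        // Completion notification: start tasks whose last dependency just finished.
        for child in dependents.get(&id).into_iter().flatten() {
            let left = unmet.get_mut(child).expect("child was registered above");
            *left -= 1;
            if *left == 0 {
                ready.push_back(*child);
            }
        }
    }

    if order.len() == tasks.len() {
        Ok(order)
    } else {
        Err("dependency cycle detected".to_string())
    }
}
```

Submitting an `ActionCreatorTask` that depends on a `FileIdGeneratorTask` yields the file task first, regardless of submission order.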

Worker Pool

A priority-based thread pool for CPU-intensive and blocking operations:

Architecture:

Three priority tiers: High > Medium > Low
Each tier has configurable worker thread count
Uses flume MPMC channels for work distribution
Returns futures for async integration

Default Configuration (basic-server):

1 high-priority worker
2 medium-priority workers
1 low-priority worker

Use Cases:

Image processing (CPU-intensive)
Cryptographic operations
File compression
Blocking I/O
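A tiered pool of this shape can be approximated with the standard library. The sketch below is a simplification under stated assumptions: it uses `std::sync::mpsc` behind a mutex instead of flume, gives each tier its own dedicated queue and threads, and returns a `Receiver` where the real pool returns a future. `WorkerPool`, `spawn_tier`, and `submit` are hypothetical names.

```rust
use std::sync::mpsc::{channel, Receiver, Sender};
use std::sync::{Arc, Mutex};
use std::thread;

#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Priority {
    High,
    Medium,
    Low,
}

type Job = Box<dyn FnOnce() + Send + 'static>;

pub struct WorkerPool {
    high: Sender<Job>,
    medium: Sender<Job>,
    low: Sender<Job>,
}

/// Starts `workers` threads that all pull jobs from one shared queue.
fn spawn_tier(workers: usize) -> Sender<Job> {
    let (tx, rx) = channel::<Job>();
    let rx = Arc::new(Mutex::new(rx));
    for _ in 0..workers {
        let rx = Arc::clone(&rx);
        thread::spawn(move || loop {
            // The lock guard is dropped at the end of this statement,
            // so the job itself runs without holding the lock.
            let job = rx.lock().unwrap().recv();
            match job {
                Ok(job) => job(),
                Err(_) => break, // all senders dropped: shut down
            }
        });
    }
    tx
}

impl WorkerPool {
    /// Each tier needs at least one worker, otherwise `submit` to it panics.
    pub fn new(high: usize, medium: usize, low: usize) -> Self {
        Self {
            high: spawn_tier(high),
            medium: spawn_tier(medium),
            low: spawn_tier(low),
        }
    }

    /// Runs `f` on a worker of the given tier; the result arrives on the receiver.
    pub fn submit<T, F>(&self, priority: Priority, f: F) -> Receiver<T>
    where
        T: Send + 'static,
        F: FnOnce() -> T + Send + 'static,
    {
        let (result_tx, result_rx) = channel();
        let job: Job = Box::new(move || {
            // The caller may have dropped the receiver; ignore that case.
            let _ = result_tx.send(f());
        });
        let tier = match priority {
            Priority::High => &self.high,
            Priority::Medium => &self.medium,
            Priority::Low => &self.low,
        };
        tier.send(job).expect("worker tier has shut down");
        result_rx
    }
}
```

With the basic-server defaults this would be constructed as `WorkerPool::new(1, 2, 1)`, and CPU-heavy work such as image resizing would be submitted as a closure instead of running on the async runtime.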

Why Task-Based Processing?

✅ Resilience: Tasks survive server restarts
✅ Dependencies: Complex workflows with ordered execution
✅ Scheduling: Cron-like execution for periodic tasks
✅ Observability: Track task progress and failures
✅ Concurrency: Priority-based execution

Application State Management

AppState Structure

The core application state contains: scheduler, worker pool, HTTP client, TLS certificates, and all five adapters (auth, meta, blob, rtdb, crdt).

AppBuilder Pattern

Configuration uses a fluent builder API with mode, identity, domain, data directory, and adapter selections.

Configuration Options:

Server mode (Standalone, Proxy, StreamProxy)
Network binding (HTTPS/HTTP ports)
Domain configuration
Directory paths (dist, tmp, data)
Adapter injection
Worker pool sizing
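A fluent builder over these options might look roughly like the following. Every method and field name here is an assumption for illustration, adapter injection is omitted, and the defaults (Standalone mode, `./data`, 1/2/1 workers) are borrowed from values mentioned elsewhere in this document, not from the actual `AppBuilder`.

```rust
use std::path::PathBuf;

#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum ServerMode {
    Standalone,
    Proxy,
    StreamProxy,
}

/// Result of a successful build (hypothetical shape).
#[derive(Debug)]
pub struct AppConfig {
    pub mode: ServerMode,
    pub base_id_tag: String,
    pub data_dir: PathBuf,
    /// (high, medium, low) worker counts.
    pub workers: (usize, usize, usize),
}

pub struct AppBuilder {
    mode: ServerMode,
    base_id_tag: Option<String>,
    data_dir: PathBuf,
    workers: (usize, usize, usize),
}

impl AppBuilder {
    pub fn new() -> Self {
        Self {
            mode: ServerMode::Standalone,
            base_id_tag: None,
            data_dir: PathBuf::from("./data"),
            workers: (1, 2, 1),
        }
    }

    pub fn mode(mut self, mode: ServerMode) -> Self {
        self.mode = mode;
        self
    }

    pub fn identity(mut self, id_tag: impl Into<String>) -> Self {
        self.base_id_tag = Some(id_tag.into());
        self
    }

    pub fn data_dir(mut self, dir: impl Into<PathBuf>) -> Self {
        self.data_dir = dir.into();
        self
    }

    pub fn workers(mut self, high: usize, medium: usize, low: usize) -> Self {
        self.workers = (high, medium, low);
        self
    }

    /// Validates required settings before producing the configuration.
    pub fn build(self) -> Result<AppConfig, String> {
        let base_id_tag = self
            .base_id_tag
            .ok_or_else(|| "identity (base id tag) is required".to_string())?;
        Ok(AppConfig {
            mode: self.mode,
            base_id_tag,
            data_dir: self.data_dir,
            workers: self.workers,
        })
    }
}
```

Each setter consumes and returns the builder, so a configuration reads as one chained expression, and missing required values surface at `build()` instead of at first use.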

Module Organization

The server crate is organized by functional domain:

core/ - Infrastructure Layer

Core system components providing foundational services:

app.rs: Application state, builder, bootstrap logic
acme.rs: Let’s Encrypt/ACME certificate management
webserver.rs: Axum/Rustls HTTPS server with SNI
middleware.rs: Authentication middleware
extract.rs: Custom Axum extractors (TnId, IdTag, Auth)
scheduler.rs: Task scheduling with dependencies
worker.rs: Thread pool with priority levels
websocket.rs: WebSocket infrastructure
request.rs: HTTP client for federation
hasher.rs: Content-addressable storage (SHA256)
utils.rs: Random ID generation

auth/ - Authentication Module

handler.rs: Login endpoints, token generation, password management

action/ - Action/Activity Subsystem

Implements the federated activity system:

action.rs: Action creation/verification tasks
process.rs: JWT verification, token processing
handler.rs: Action CRUD endpoints, federation inbox

profile/ - Profile Management

handler.rs: Tenant profile endpoints

file/ - File Storage & Processing

file.rs: File descriptor encoding, variant selection
handler.rs: File upload/download endpoints
image.rs: Image resizing tasks
store.rs: Storage layer abstraction

rtdb/ - Real-Time Database

Note: In development, see RTDB Architecture for details

handler.rs: Database CRUD endpoints
websocket.rs: WebSocket connection handler
manager.rs: Database instance lifecycle

routes/ - HTTP Routing

Separates public and protected route groups
API endpoints (/api/*)
Static file serving for frontend
CORS configuration

Concurrency Model

Multi-threaded Tokio Runtime

Cloudillo uses Tokio’s multi-threaded async runtime for handling concurrent requests.

Concurrency Layers

Async Layer (Tokio): HTTP request handling, WebSocket connections, I/O
Worker Pool: CPU-intensive tasks, blocking operations
Scheduler: Background task execution with dependencies

Interaction Example:

HTTP Request → Tokio async handler
    ↓
Spawn ImageResizerTask on scheduler
    ↓
Scheduler dispatches to worker pool (CPU-intensive)
    ↓
Worker completes, updates MetaAdapter
    ↓
Scheduler notifies waiting tasks
    ↓
Response returned to client

Request Handling Flow

Middleware Pipeline

HTTP requests flow through these layers:

HTTPS/SNI Resolution: CertResolver selects TLS certificate by domain
Tracing/Logging: Request tracing with structured logging
Authentication: require_auth or optional_auth middleware
Custom Extractors: TnId, IdTag, Auth extract from request context
Handler: Business logic execution
Response: JSON serialization, error handling

Custom Extractors

Axum extractors provide typed access to request context:

TnId: Tenant ID (database primary key, i64)
IdTag: Tenant identifier string (e.g., “alice.example.com”)
Auth: Full authentication context (tn_id, id_tag, scope, etc.)

Error Handling

Custom error enum with automatic HTTP response conversion: NotFound (404), PermissionDenied (403), DbError/Unknown/Parse/Io (500).
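The mapping from error variants to status codes can be sketched as below. The variant names come from the text above, but their payloads and the `status_code` and `read_file` helpers are assumptions; in an Axum application this mapping would typically live in an `IntoResponse` implementation, which is omitted here to keep the example free of external crates.

```rust
use std::fmt;

/// Variant names follow the text above; payloads are assumptions.
#[derive(Debug)]
pub enum Error {
    NotFound,
    PermissionDenied,
    DbError,
    Unknown,
    Parse,
    Io(std::io::Error),
}

impl Error {
    /// HTTP status code a handler would return for this error.
    pub fn status_code(&self) -> u16 {
        match self {
            Error::NotFound => 404,
            Error::PermissionDenied => 403,
            Error::DbError | Error::Unknown | Error::Parse | Error::Io(_) => 500,
        }
    }
}

impl fmt::Display for Error {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            Error::Io(e) => write!(f, "I/O error: {}", e),
            other => write!(f, "{:?}", other),
        }
    }
}

impl std::error::Error for Error {}

/// Lets handlers use `?` on I/O results.
impl From<std::io::Error> for Error {
    fn from(e: std::io::Error) -> Self {
        Error::Io(e)
    }
}

/// Hypothetical helper showing `?` converting an I/O failure into `Error::Io`.
pub fn read_file(path: &str) -> Result<String, Error> {
    Ok(std::fs::read_to_string(path)?)
}
```

Centralizing the conversion keeps handlers short: they return `Result<_, Error>` and the status code follows from the variant.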

Server Modes

Cloudillo supports different deployment modes:

Standalone (Default)

Self-contained single instance:

HTTPS on configured port
Optional HTTP for ACME challenges
All adapters run locally

Use case: Personal servers, small communities

Proxy

Used if Cloudillo is behind a reverse proxy:

Listens on HTTP port
Certificate handling is the responsibility of the proxy

Use case: Managed hosting providers, self-hosting with multiple services on one IP address

Security Architecture

Implemented in Rust

Rust’s ownership and type system rule out most memory-safety and data-race bugs at compile time, which reduces the attack surface.

No Unsafe Code

Cloudillo enforces memory safety:

#![forbid(unsafe_code)]

ABAC Permission System

Cloudillo uses Attribute-Based Access Control (ABAC) for fine-grained permissions across all resources. ABAC provides flexible permission rules based on:

User attributes (identity, roles, relationships)
Resource attributes (owner, visibility, type)
Contextual factors (time, environment)

Key Features:

Five visibility levels (public, private, followers, connected, direct)
Policy-based access control (TOP/BOTTOM policies)
Relationship-aware (following, connections)
Time-based permissions
Custom policy support
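As a rough illustration of how visibility levels and relationships could combine into a read decision, consider the sketch below. The semantics assigned to each level are assumptions, since this document does not define them: owners always pass, `Followers` requires following the owner, `Connected` requires a connection, and `Direct` requires being in an explicit audience list. Policies and time-based rules are left out, and all type and function names are hypothetical.

```rust
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Visibility {
    Public,
    Private,
    Followers,
    Connected,
    Direct,
}

/// Attributes of the requesting user (hypothetical shape).
pub struct Viewer<'a> {
    /// `None` for unauthenticated requests.
    pub id_tag: Option<&'a str>,
    pub follows_owner: bool,
    pub connected_to_owner: bool,
}

/// Attributes of the resource being accessed (hypothetical shape).
pub struct Resource<'a> {
    pub owner: &'a str,
    pub visibility: Visibility,
    /// Explicit recipients, only consulted for `Direct`.
    pub audience: &'a [&'a str],
}

/// Assumed semantics for this sketch; the real ABAC rules may differ.
pub fn can_read(viewer: &Viewer, res: &Resource) -> bool {
    // The owner can always read their own resources.
    if viewer.id_tag == Some(res.owner) {
        return true;
    }
    match res.visibility {
        Visibility::Public => true,
        Visibility::Private => false,
        Visibility::Followers => viewer.follows_owner,
        Visibility::Connected => viewer.connected_to_owner,
        Visibility::Direct => match viewer.id_tag {
            Some(id) => res.audience.iter().any(|a| *a == id),
            None => false,
        },
    }
}
```

The decision depends only on attributes of the viewer and the resource, which is the core idea of ABAC: no per-resource access lists are needed except for direct messages.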

Learn more: ABAC Permission System

Cryptographic Algorithms

P384: Elliptic curve for action signing
ES384: JWT signature algorithm
SHA256: Content addressing and hashing
bcrypt: Password hashing

Security Layers

TLS/HTTPS: All connections encrypted (Rustls)
ACME: Automatic certificate management (Let’s Encrypt)
JWT: Cryptographically signed tokens
Content Addressing: Tamper detection via SHA256
Permission Checks: Authorization at every access point

Bootstrap Process

Initial setup when starting a new instance:

Create tenant with base_id_tag
Set password (if provided)
Generate profile signing key (P384)
Initiate ACME for TLS certificates (if email provided)
Start scheduler and worker pool
Start HTTP/HTTPS servers

Example configuration:

BASE_ID_TAG=alice.example.com        # Required
BASE_PASSWORD=secret                 # Initial password
ACME_EMAIL=alice@example.com         # Let's Encrypt email
MODE=standalone                      # Server mode
LISTEN=0.0.0.0:8443                  # HTTPS binding
DATA_DIR=./data                      # storage path
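Reading these variables into a typed structure could look like the sketch below. The variable names come from the example above; the struct, the function name, the accepted `MODE` values (`standalone`, `proxy`), and the choice to treat everything except `BASE_ID_TAG` as optional are assumptions. The function takes a map so it can be tested without touching the process environment.

```rust
use std::collections::HashMap;

#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum ServerMode {
    Standalone,
    Proxy,
}

/// Hypothetical typed view of the bootstrap environment.
#[derive(Debug, PartialEq)]
pub struct BootstrapConfig {
    pub base_id_tag: String,
    pub base_password: Option<String>,
    pub acme_email: Option<String>,
    pub mode: ServerMode,
    pub listen: Option<String>,
    pub data_dir: Option<String>,
}

pub fn from_vars(vars: &HashMap<String, String>) -> Result<BootstrapConfig, String> {
    // Treat empty values the same as unset ones.
    let get = |key: &str| vars.get(key).filter(|v| !v.is_empty()).cloned();

    let base_id_tag = get("BASE_ID_TAG").ok_or("BASE_ID_TAG is required")?;
    let mode = match get("MODE").as_deref() {
        // Standalone is documented as the default mode.
        None | Some("standalone") => ServerMode::Standalone,
        Some("proxy") => ServerMode::Proxy,
        Some(other) => return Err(format!("unsupported MODE: {}", other)),
    };

    Ok(BootstrapConfig {
        base_id_tag,
        base_password: get("BASE_PASSWORD"),
        acme_email: get("ACME_EMAIL"),
        mode,
        listen: get("LISTEN"),
        data_dir: get("DATA_DIR"),
    })
}
```

At startup this would be called as `from_vars(&std::env::vars().collect())`; a missing `ACME_EMAIL` then simply means the ACME step of the bootstrap is skipped.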

Key Dependencies

Web Framework

axum (0.8): Async web framework
tower: Service abstractions
tower-http: CORS, static files

TLS & Crypto

rustls (0.23): Pure Rust TLS
instant-acme (0.8): ACME client
jsonwebtoken (9.3): JWT handling
p384: Elliptic curve operations

Async Runtime

tokio (1.48): Multi-threaded async

Serialization

serde (1.0): Serialization framework
serde_json: JSON support

Database

sqlx (0.8): Async SQL (SQLite support)

Utilities

image (0.25): Image processing
sha2: SHA256 hashing
croner (3.0): Cron expressions
flume: MPMC channels

Architectural Strengths

✅ Pluggable Adapters: Easy to swap storage backends
✅ Self-Contained: No external services required
✅ Federated: Communicates with other instances
✅ Task-Based: Persistent, resumable async execution
✅ Type-Safe: Leverages Rust’s type system
✅ Memory-Safe: #![forbid(unsafe_code)] applied throughout
✅ Observable: Built-in tracing integration

Next Steps

Identity System - DNS-based identity and key management
Access Control - Token validation and permissions
ABAC Permissions - Attribute-based access control system
Actions & Federation - Action tokens and cross-instance communication
File Storage - Content-addressed storage and processing
RTDB with redb - Query-based real-time database
CRDT Collaborative Editing - Conflict-free collaborative documents
Real-Time Systems Overview - Introduction to RTDB and CRDT