The Vortex sink writes data to Vortex files (.vtx), a columnar format that delivers roughly 100x faster random access and 10–20x faster sequential scans than Parquet, at the cost of larger files on disk. Use it when you need fast analytical queries on recent data without a database.

Configuration

```toml
[sinks.analytics]
type = "vortex"
path = "/data/vortex"
rotation = "hourly"
```
| Field | Default | Notes |
|---|---|---|
| `path` | — | Output directory (required) |
| `rotation` | `"hourly"` | `"hourly"` or `"daily"` |
| `buffer_size` | `10000` | Rows buffered before flush |
| `flush_interval` | `"60s"` | Time-based flush interval |
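A fuller configuration showing every field might look like this (a sketch; `analytics` is just an example sink name, and the values shown are illustrative, not recommendations):

```toml
[sinks.analytics]
type = "vortex"
path = "/data/vortex"      # required: output directory
rotation = "daily"         # "hourly" (default) or "daily"
buffer_size = 50000        # rows buffered per bucket before flush
flush_interval = "30s"     # periodic flush regardless of buffer fill
```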
Compression is automatic — Vortex selects the optimal encoding per column using cascading strategies (dictionary, run-length, ALP for timestamps, FSST for strings, Zstd as a final stage). There’s nothing to configure.

Tables

The Vortex sink writes the full Tell schema — 10 tables covering events, logs, identity, and group data:
| File | Populated by |
|---|---|
| `events.vtx` | TRACK events |
| `logs.vtx` | Log entries |
| `snapshots.vtx` | Integration snapshots |
| `sessions.vtx` | CONTEXT events (device, location, session metadata) |
| `users.vtx` | IDENTIFY events (core identity) |
| `user_devices.vtx` | IDENTIFY events (device-to-user links) |
| `user_traits.vtx` | IDENTIFY events (key-value traits) |
| `groups.vtx` | GROUP events (core group identity) |
| `user_groups.vtx` | GROUP events (user-to-group links) |
| `group_traits.vtx` | GROUP events (key-value group traits) |
IDENTIFY events fan out to users, user_devices, and user_traits. GROUP events fan out to groups, user_groups, and group_traits. CONTEXT events extract device and location fields into sessions. Fields in every table are ordered for predicate pushdown — frequently filtered columns (timestamp, workspace_id) come first, large payloads last.
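The IDENTIFY fan-out can be sketched in Python. This is a conceptual illustration only: the field names (`devices`, `traits`, and so on) are assumptions, not the actual Tell event schema.

```python
def fan_out_identify(event: dict) -> dict[str, list[dict]]:
    """Split one IDENTIFY event into rows for the three identity tables.

    Conceptual sketch only -- field names here are illustrative,
    not the real Tell schema.
    """
    # Frequently filtered columns (timestamp, workspace_id) lead every row.
    base = {"timestamp": event["timestamp"], "workspace_id": event["workspace_id"]}

    users_row = {**base, "user_id": event["user_id"]}

    device_rows = [
        {**base, "user_id": event["user_id"], "device_id": d}
        for d in event.get("devices", [])
    ]

    trait_rows = [
        {**base, "user_id": event["user_id"], "key": k, "value": str(v)}
        for k, v in event.get("traits", {}).items()
    ]

    return {
        "users": [users_row],
        "user_devices": device_rows,
        "user_traits": trait_rows,
    }


event = {
    "timestamp": "2024-01-01T00:00:00Z",
    "workspace_id": "ws_1",
    "user_id": "u_42",
    "devices": ["dev_a"],
    "traits": {"plan": "pro", "seats": 5},
}
rows = fan_out_identify(event)
# one users row, one user_devices row, two user_traits rows
```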

File organization

```
vortex/
└── {workspace_id}/
    └── {date}/
        └── {hour}/
            ├── events.vtx
            ├── logs.vtx
            ├── snapshots.vtx
            ├── sessions.vtx
            ├── users.vtx
            ├── user_devices.vtx
            ├── user_traits.vtx
            ├── groups.vtx
            ├── user_groups.vtx
            └── group_traits.vtx
```
With daily rotation, the {hour}/ level is omitted.
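The bucket path logic above can be sketched as a small helper. This is an assumption-laden sketch: the real sink may format the date and hour segments differently.

```python
from datetime import datetime, timezone


def bucket_dir(root: str, workspace_id: str, ts: datetime,
               rotation: str = "hourly") -> str:
    """Compute the output directory for one workspace/time bucket.

    Sketch of the layout described above; segment formatting
    (ISO date, zero-padded hour) is an assumption.
    """
    parts = [root, workspace_id, ts.strftime("%Y-%m-%d")]
    if rotation == "hourly":
        # With daily rotation the {hour}/ level is omitted.
        parts.append(ts.strftime("%H"))
    return "/".join(parts)


ts = datetime(2024, 3, 1, 14, 30, tzinfo=timezone.utc)
hourly = bucket_dir("vortex", "ws_1", ts)                    # vortex/ws_1/2024-03-01/14
daily = bucket_dir("vortex", "ws_1", ts, rotation="daily")   # vortex/ws_1/2024-03-01
```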

Flushing

Data is flushed to disk in three situations:
  1. Buffer threshold — when a workspace/time bucket accumulates buffer_size rows
  2. Periodic flush — every flush_interval (default 60s), regardless of buffer fill
  3. Shutdown — graceful shutdown flushes all remaining buffers
Events and logs use zero-allocation builders on the hot path — no per-event heap allocation. Cold-path tables (users, groups, sessions) buffer rows in memory and convert to Arrow on flush.
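The first two flush triggers can be sketched as a minimal per-bucket buffer. This is not the sink's actual implementation (which is not shown here), just an illustration of the threshold and interval rules under assumed defaults:

```python
import time


class Buffer:
    """Per-bucket buffer that flushes on row count or elapsed time.

    Minimal sketch of the flush triggers described above; the real
    sink's internals and defaults may differ.
    """

    def __init__(self, buffer_size: int = 10_000, flush_interval: float = 60.0):
        self.buffer_size = buffer_size
        self.flush_interval = flush_interval
        self.rows: list[dict] = []
        self.last_flush = time.monotonic()

    def append(self, row: dict) -> bool:
        """Add a row; return True if this append triggered a flush."""
        self.rows.append(row)
        if len(self.rows) >= self.buffer_size:
            self.flush()
            return True
        return False

    def tick(self) -> bool:
        """Periodic check: flush if flush_interval has elapsed."""
        if time.monotonic() - self.last_flush >= self.flush_interval:
            self.flush()
            return True
        return False

    def flush(self) -> None:
        # In the real sink this is where rows would be written out as .vtx.
        self.rows.clear()
        self.last_flush = time.monotonic()


buf = Buffer(buffer_size=3, flush_interval=60.0)
buf.append({"n": 1})
buf.append({"n": 2})
flushed = buf.append({"n": 3})  # third row hits the threshold
```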

When to use Vortex

Use Vortex when you need fast analytical reads without running ClickHouse — random access is ~100x faster than Parquet and sequential scans are 10–20x faster. The trade-off is larger files on disk (0.99x ratio vs Parquet’s 0.10–0.14x) and slower writes (1.7M events/s vs Parquet’s 2.3–2.6M). Use Parquet for cold archival where compression matters more than read speed. Parquet files are also more widely supported by third-party tools. Use Arrow IPC for local development and inter-process communication where zero-copy reads matter and you’re already using Polars or DuckDB.
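Using the ratios above, a back-of-the-envelope comparison for 100 GB of raw event data (illustrative arithmetic only, not a benchmark):

```python
raw_gb = 100

vortex_gb = raw_gb * 0.99        # ~99 GB on disk
parquet_lo_gb = raw_gb * 0.10    # ~10 GB at the best Parquet ratio
parquet_hi_gb = raw_gb * 0.14    # ~14 GB at the worst Parquet ratio

# Vortex trades roughly 7-10x more disk for its read-speed advantage.
disk_penalty = vortex_gb / parquet_hi_gb
```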

What’s next

  • Sinks overview — compare all available sinks
  • Parquet sink — columnar archival with Snappy/Zstd/LZ4 compression
  • Routing — control which sources send data to which sinks