The Vortex sink writes data to Vortex files (.vtx) — a columnar format that delivers 100x faster random access and 10–20x faster sequential scans than Parquet, with comparable compression ratios. Use it when you need fast analytical queries on recent data without a database.
## Configuration
| Field | Default | Notes |
|---|---|---|
| path | — | Output directory (required) |
| rotation | "hourly" | "hourly" or "daily" |
| buffer_size | 10,000 | Rows buffered before flush |
| flush_interval | "60s" | Time-based flush interval |
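Putting the fields together, a sink definition might look like the following. This is an illustrative sketch only — the YAML format and the `sinks.vortex` nesting are assumptions, not documented syntax; the field names and defaults come from the table above.

```yaml
# Hypothetical Vortex sink configuration — structure is assumed,
# field names match the table above.
sinks:
  vortex:
    path: /var/lib/tell/vortex   # required output directory
    rotation: hourly             # or "daily"
    buffer_size: 10000           # rows buffered per bucket before flush
    flush_interval: 60s          # time-based flush interval
```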
## Tables
The Vortex sink writes the full Tell schema — 10 tables covering events, logs, identity, and group data:

| File | Populated by |
|---|---|
| events.vtx | TRACK events |
| logs.vtx | Log entries |
| snapshots.vtx | Integration snapshots |
| sessions.vtx | CONTEXT events (device, location, session metadata) |
| users.vtx | IDENTIFY events (core identity) |
| user_devices.vtx | IDENTIFY events (device-to-user links) |
| user_traits.vtx | IDENTIFY events (key-value traits) |
| groups.vtx | GROUP events (core group identity) |
| user_groups.vtx | GROUP events (user-to-group links) |
| group_traits.vtx | GROUP events (key-value group traits) |
IDENTIFY events fan out to users, user_devices, and user_traits. GROUP events fan out to groups, user_groups, and group_traits. CONTEXT events extract device and location fields into sessions.
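The fan-out can be sketched in plain Python. This is illustrative only — the event shape, column names, and function name are assumptions, not the sink's internal API:

```python
# Illustrative sketch: one IDENTIFY event fans out into rows for the
# users, user_devices, and user_traits tables. Field names are assumed.
def fan_out_identify(event):
    users_row = {"user_id": event["user_id"], "timestamp": event["timestamp"]}
    device_rows = [
        {"user_id": event["user_id"], "device_id": d}
        for d in event.get("device_ids", [])
    ]
    trait_rows = [
        {"user_id": event["user_id"], "key": k, "value": v}
        for k, v in event.get("traits", {}).items()
    ]
    return users_row, device_rows, trait_rows

users, devices, traits = fan_out_identify({
    "user_id": "u1",
    "timestamp": 1700000000,
    "device_ids": ["d1", "d2"],
    "traits": {"plan": "pro"},
})
# one core identity row, two device-link rows, one key-value trait row
```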
Fields in every table are ordered for predicate pushdown — frequently filtered columns (timestamp, workspace_id) come first, large payloads last.
## File organization
With daily rotation, the {hour}/ level is omitted.
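Under hourly rotation, a plausible on-disk layout looks like the sketch below. Only the per-table files, the workspace/time bucketing, and the {hour}/ level are stated in this page; the exact directory names and nesting order are assumptions:

```
{path}/
  {workspace_id}/
    {date}/
      {hour}/
        events.vtx
        logs.vtx
        users.vtx
        ...
```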
## Flushing
Data is flushed to disk in three situations:

- Buffer threshold — when a workspace/time bucket accumulates buffer_size rows
- Periodic flush — every flush_interval (default 60s), regardless of buffer fill
- Shutdown — graceful shutdown flushes all remaining buffers
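The three triggers can be sketched as a small buffer class. This is a minimal illustration of the behavior described above, not the sink's actual implementation; all names are assumed:

```python
import time

# Sketch of the three flush triggers: buffer threshold, periodic
# flush, and graceful shutdown. Names and structure are illustrative.
class BucketBuffer:
    def __init__(self, buffer_size=10_000, flush_interval=60.0):
        self.buffer_size = buffer_size
        self.flush_interval = flush_interval
        self.rows = []
        self.flushed = []          # stands in for writing a .vtx file
        self.last_flush = time.monotonic()

    def flush(self):
        if self.rows:
            self.flushed.append(list(self.rows))
            self.rows.clear()
        self.last_flush = time.monotonic()

    def append(self, row):
        self.rows.append(row)
        if len(self.rows) >= self.buffer_size:   # 1. buffer threshold
            self.flush()

    def tick(self):
        # 2. periodic flush, driven by a timer loop
        if time.monotonic() - self.last_flush >= self.flush_interval:
            self.flush()

    def shutdown(self):
        # 3. graceful shutdown drains whatever remains
        self.flush()

buf = BucketBuffer(buffer_size=3)
for i in range(7):
    buf.append({"n": i})
buf.shutdown()
# two threshold flushes of 3 rows each, then a final flush of 1 row
```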
## When to use Vortex
Use Vortex when you need fast analytical reads without running ClickHouse — random access is ~100x faster than Parquet and sequential scans are 10–20x faster. The trade-off is larger files on disk (0.99x compression ratio vs Parquet's 0.10–0.14x) and slower writes (1.7M events/s vs Parquet's 2.3–2.6M). Use Parquet for cold archival where compression matters more than read speed; Parquet files are also more widely supported by third-party tools. Use Arrow IPC for local development and inter-process communication where zero-copy reads matter and you're already using Polars or DuckDB.

## What's next
- Sinks overview — compare all available sinks
- Parquet sink — columnar archival with Snappy/Zstd/LZ4 compression
- Routing — control which sources send data to which sinks