The Vortex sink writes data to Vortex files (.vtx), a columnar format that delivers roughly 100x faster random access and 10–20x faster sequential scans than Parquet, at the cost of larger files on disk. Use it when you need fast analytical queries on recent data without a database.

Configuration

```toml
[sinks.analytics]
type = "vortex"
path = "/data/vortex"
rotation = "hourly"
```
| Field | Default | Notes |
|---|---|---|
| `path` | — | Output directory (required) |
| `rotation` | `"hourly"` | `"hourly"` or `"daily"` |
| `buffer_size` | `10000` | Rows buffered before flush |
| `flush_interval` | `"60s"` | Time-based flush interval |
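A fuller configuration showing every field might look like this (a sketch; `analytics` is just an example sink name, and the values shown are illustrative, not recommendations):

```toml
[sinks.analytics]
type = "vortex"
path = "/data/vortex"      # required: output directory
rotation = "daily"         # "hourly" (default) or "daily"
buffer_size = 50000        # rows buffered per bucket before flush
flush_interval = "30s"     # periodic flush regardless of buffer fill
```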
Compression is automatic — Vortex selects the optimal encoding per column using cascading strategies (dictionary, run-length, ALP for timestamps, FSST for strings, Zstd as a final stage). There’s nothing to configure.

Tables

The Vortex sink writes the full Tell schema — 10 tables covering events, logs, identity, and group data:
| File | Populated by |
|---|---|
| `events.vtx` | TRACK events |
| `logs.vtx` | Log entries |
| `snapshots.vtx` | Integration snapshots |
| `sessions.vtx` | CONTEXT events (device, location, session metadata) |
| `users.vtx` | IDENTIFY events (core identity) |
| `user_devices.vtx` | IDENTIFY events (device-to-user links) |
| `user_traits.vtx` | IDENTIFY events (key-value traits) |
| `groups.vtx` | GROUP events (core group identity) |
| `user_groups.vtx` | GROUP events (user-to-group links) |
| `group_traits.vtx` | GROUP events (key-value group traits) |
IDENTIFY events fan out to users, user_devices, and user_traits. GROUP events fan out to groups, user_groups, and group_traits. CONTEXT events extract device and location fields into sessions. Fields in every table are ordered for predicate pushdown — frequently filtered columns (timestamp, workspace_id) come first, large payloads last.
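The IDENTIFY fan-out can be sketched in Python. This is a conceptual illustration only: the field names (`devices`, `traits`, and so on) are assumptions, not the actual Tell event schema.

```python
def fan_out_identify(event: dict) -> dict[str, list[dict]]:
    """Split one IDENTIFY event into rows for the three identity tables.

    Conceptual sketch only -- field names here are illustrative,
    not the real Tell schema.
    """
    # Frequently filtered columns (timestamp, workspace_id) lead every row.
    base = {"timestamp": event["timestamp"], "workspace_id": event["workspace_id"]}

    users_row = {**base, "user_id": event["user_id"]}

    device_rows = [
        {**base, "user_id": event["user_id"], "device_id": d}
        for d in event.get("devices", [])
    ]

    trait_rows = [
        {**base, "user_id": event["user_id"], "key": k, "value": str(v)}
        for k, v in event.get("traits", {}).items()
    ]

    return {
        "users": [users_row],
        "user_devices": device_rows,
        "user_traits": trait_rows,
    }


event = {
    "timestamp": "2024-01-01T00:00:00Z",
    "workspace_id": "ws_1",
    "user_id": "u_42",
    "devices": ["dev_a"],
    "traits": {"plan": "pro", "seats": 5},
}
rows = fan_out_identify(event)
# one users row, one user_devices row, two user_traits rows
```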

File organization

```
vortex/
└── {workspace_id}/
    └── {date}/
        └── {hour}/
            ├── events.vtx
            ├── logs.vtx
            ├── snapshots.vtx
            ├── sessions.vtx
            ├── users.vtx
            ├── user_devices.vtx
            ├── user_traits.vtx
            ├── groups.vtx
            ├── user_groups.vtx
            └── group_traits.vtx
```
With daily rotation, the {hour}/ level is omitted.
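The bucket path logic above can be sketched as a small helper. This is an assumption-laden sketch: the real sink may format the date and hour segments differently.

```python
from datetime import datetime, timezone


def bucket_dir(root: str, workspace_id: str, ts: datetime,
               rotation: str = "hourly") -> str:
    """Compute the output directory for one workspace/time bucket.

    Sketch of the layout described above; segment formatting
    (ISO date, zero-padded hour) is an assumption.
    """
    parts = [root, workspace_id, ts.strftime("%Y-%m-%d")]
    if rotation == "hourly":
        # With daily rotation the {hour}/ level is omitted.
        parts.append(ts.strftime("%H"))
    return "/".join(parts)


ts = datetime(2024, 3, 1, 14, 30, tzinfo=timezone.utc)
hourly = bucket_dir("vortex", "ws_1", ts)                    # vortex/ws_1/2024-03-01/14
daily = bucket_dir("vortex", "ws_1", ts, rotation="daily")   # vortex/ws_1/2024-03-01
```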

Flushing

Data is flushed to disk in three situations:
  1. Buffer threshold — when a workspace/time bucket accumulates buffer_size rows
  2. Periodic flush — every flush_interval (default 60s), regardless of buffer fill
  3. Shutdown — graceful shutdown flushes all remaining buffers
Events and logs use zero-allocation builders on the hot path — no per-event heap allocation. Cold-path tables (users, groups, sessions) buffer rows in memory and convert to Arrow on flush.
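The first two flush triggers can be sketched as a minimal per-bucket buffer. This is not the sink's actual implementation (which is not shown here), just an illustration of the threshold and interval rules under assumed defaults:

```python
import time


class Buffer:
    """Per-bucket buffer that flushes on row count or elapsed time.

    Minimal sketch of the flush triggers described above; the real
    sink's internals and defaults may differ.
    """

    def __init__(self, buffer_size: int = 10_000, flush_interval: float = 60.0):
        self.buffer_size = buffer_size
        self.flush_interval = flush_interval
        self.rows: list[dict] = []
        self.last_flush = time.monotonic()

    def append(self, row: dict) -> bool:
        """Add a row; return True if this append triggered a flush."""
        self.rows.append(row)
        if len(self.rows) >= self.buffer_size:
            self.flush()
            return True
        return False

    def tick(self) -> bool:
        """Periodic check: flush if flush_interval has elapsed."""
        if time.monotonic() - self.last_flush >= self.flush_interval:
            self.flush()
            return True
        return False

    def flush(self) -> None:
        # In the real sink this is where rows would be written out as .vtx.
        self.rows.clear()
        self.last_flush = time.monotonic()


buf = Buffer(buffer_size=3, flush_interval=60.0)
buf.append({"n": 1})
buf.append({"n": 2})
flushed = buf.append({"n": 3})  # third row hits the threshold
```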

When to use Vortex

Use Vortex when you need fast analytical reads without running ClickHouse — random access is ~100x faster than Parquet and sequential scans are 10–20x faster. The trade-off is larger files on disk (0.99x ratio vs Parquet’s 0.10–0.14x) and slower writes (1.7M events/s vs Parquet’s 2.3–2.6M). Use Parquet for cold archival where compression matters more than read speed. Parquet files are also more widely supported by third-party tools. Use Arrow IPC for local development and inter-process communication where zero-copy reads matter and you’re already using Polars or DuckDB.
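Using the ratios above, a back-of-the-envelope comparison for 100 GB of raw event data (illustrative arithmetic only, not a benchmark):

```python
raw_gb = 100

vortex_gb = raw_gb * 0.99        # ~99 GB on disk
parquet_lo_gb = raw_gb * 0.10    # ~10 GB at the best Parquet ratio
parquet_hi_gb = raw_gb * 0.14    # ~14 GB at the worst Parquet ratio

# Vortex trades roughly 7-10x more disk for its read-speed advantage.
disk_penalty = vortex_gb / parquet_hi_gb
```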

What’s next

  • Sinks overview — compare all available sinks
  • Parquet sink — columnar archival with Snappy/Zstd/LZ4 compression
  • Routing — control which sources send data to which sinks