## Configuration
| Field | Default | Notes |
|---|---|---|
| `path` | — | Output directory (required) |
| `compression` | `"snappy"` | One of `"snappy"`, `"zstd"`, `"lz4"`, or `"uncompressed"` |
| `rotation` | `"hourly"` | `"hourly"` or `"daily"` |
| `buffer_size` | 10,000 | Rows buffered in memory before a flush |
| `row_group_size` | 100,000 | Maximum rows per row group |
| `flush_interval` | `"60s"` | Time-based flush interval |
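The document does not show the configuration file itself, so the following is a hypothetical YAML sketch: the field names and defaults come from the table above, while the top-level `parquet:` nesting and the example path are assumptions.

```yaml
# Hypothetical layout; only the field names and values are taken from
# the configuration table, the surrounding structure is assumed.
parquet:
  path: /var/lib/pipeline/parquet   # required; example path
  compression: zstd                 # default: "snappy"
  rotation: daily                   # default: "hourly"
  buffer_size: 10000                # rows buffered before flush
  row_group_size: 100000            # max rows per row group
  flush_interval: 60s               # time-based flush interval
```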
## Schemas
Data is written to separate Parquet files by type, each with a schema tuned for that type. Fields are ordered for predicate pushdown: frequently filtered columns come first, large payloads last.

- **Events** (9 fields): `timestamp`, `batch_timestamp`, `workspace_id`, `event_type`, `event_name`, `device_id`, `session_id`, `source_ip`, `payload`
- **Logs** (10 fields): `timestamp`, `batch_timestamp`, `workspace_id`, `level`, `event_type`, `source`, `service`, `session_id`, `source_ip`, `payload`
- **Snapshots** (7 fields): `timestamp`, `batch_timestamp`, `workspace_id`, `source`, `entity`, `source_ip`, `payload`
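For illustration, the Events schema expressed with PyArrow. The field names and ordering come from the list above; the concrete Arrow types and the timestamp precision are assumptions, since this document does not specify them.

```python
import pyarrow as pa

# Sketch of the Events schema. Types are assumptions: the document only
# specifies field names and their order.
events_schema = pa.schema([
    ("timestamp", pa.timestamp("us", tz="UTC")),        # assumed precision
    ("batch_timestamp", pa.timestamp("us", tz="UTC")),  # assumed precision
    ("workspace_id", pa.string()),
    ("event_type", pa.string()),
    ("event_name", pa.string()),
    ("device_id", pa.string()),
    ("session_id", pa.string()),
    ("source_ip", pa.string()),
    ("payload", pa.string()),  # large payload last, per the field ordering
])
```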
## File organization
## Reading Parquet files
Parquet is a standard format readable by most analytics tools, including SQL engines that expose a `file()` table function, as well as PyArrow.
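For example, with PyArrow (the file path and filter value below are placeholders; the actual on-disk naming scheme depends on the rotation settings):

```python
import pyarrow.parquet as pq

# Read an entire rotated file; the path is a placeholder.
table = pq.read_table("events/2024-01-01.parquet")

# Column pruning plus a row filter: Parquet row-group statistics let the
# reader skip row groups that cannot match the predicate, which is why
# frequently filtered columns are placed first in these schemas.
clicks = pq.read_table(
    "events/2024-01-01.parquet",
    columns=["timestamp", "workspace_id", "event_type", "event_name"],
    filters=[("event_type", "=", "click")],
)
print(clicks.num_rows)
```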