The Parquet sink writes data to Apache Parquet files — a columnar format optimized for analytical queries with excellent compression ratios. Use it for cold storage and data warehousing.

Configuration

[sinks.archive]
type = "parquet"
path = "/data/parquet"
compression = "snappy"
rotation = "hourly"
| Field | Default | Notes |
| --- | --- | --- |
| path | | Output directory (required) |
| compression | "snappy" | "snappy", "zstd", "lz4", or "uncompressed" |
| rotation | "hourly" | "hourly" or "daily" |
| buffer_size | 10,000 | Rows buffered before flush |
| row_group_size | 100,000 | Max rows per row group |
| flush_interval | "60s" | Time-based flush interval |
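
To make the interaction between buffer_size and flush_interval concrete, here is a minimal Python sketch of a buffer that flushes when either threshold is reached. The class name FlushBuffer and its methods are hypothetical, and the "whichever comes first" rule is an assumption; the sink's actual flush logic may differ (for example, it may run the time-based flush on a background timer rather than on insert).

```python
import time
from dataclasses import dataclass, field


@dataclass
class FlushBuffer:
    """Hypothetical buffer illustrating size- and time-based flush triggers."""

    buffer_size: int = 10_000      # rows buffered before flush (documented default)
    flush_interval: float = 60.0   # seconds, mirroring flush_interval = "60s"
    rows: list = field(default_factory=list)
    last_flush: float = field(default_factory=time.monotonic)

    def should_flush(self, now: float) -> bool:
        # Assumption: flush when EITHER threshold is reached; never flush an empty buffer.
        if not self.rows:
            return False
        return (
            len(self.rows) >= self.buffer_size
            or now - self.last_flush >= self.flush_interval
        )

    def add(self, row, now=None):
        """Buffer one row; return the flushed batch if a flush triggered, else None."""
        now = time.monotonic() if now is None else now
        self.rows.append(row)
        if self.should_flush(now):
            batch, self.rows = self.rows, []
            self.last_flush = now
            return batch
        return None
```

Under this model, a smaller buffer_size or shorter flush_interval trades larger, better-compressed writes for lower latency between ingestion and data appearing on disk.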

Schemas

Data is separated into different Parquet files by type, each with an optimized schema. Fields are ordered for predicate pushdown: frequently filtered columns come first, large payloads last.

Events (9 fields): timestamp, batch_timestamp, workspace_id, event_type, event_name, device_id, session_id, source_ip, payload
Logs (10 fields): timestamp, batch_timestamp, workspace_id, level, event_type, source, service, session_id, source_ip, payload
Snapshots (7 fields): timestamp, batch_timestamp, workspace_id, source, entity, source_ip, payload
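
The column orders above can be captured as plain Python lists, which is handy when selecting columns by name or checking which ones exist in every file type. This is only an illustrative sketch: the SCHEMAS dictionary and the shared_columns helper are hypothetical names, and column types are omitted because they are not listed here.

```python
# Column order per file type, copied from the field lists above.
# Types are intentionally omitted; only names and order are documented.
SCHEMAS = {
    "events": [
        "timestamp", "batch_timestamp", "workspace_id", "event_type",
        "event_name", "device_id", "session_id", "source_ip", "payload",
    ],
    "logs": [
        "timestamp", "batch_timestamp", "workspace_id", "level", "event_type",
        "source", "service", "session_id", "source_ip", "payload",
    ],
    "snapshots": [
        "timestamp", "batch_timestamp", "workspace_id", "source",
        "entity", "source_ip", "payload",
    ],
}


def shared_columns(schemas):
    """Return the columns present in every schema, in the first schema's order."""
    first, *rest = schemas.values()
    return [col for col in first if all(col in other for other in rest)]
```

Columns returned by shared_columns(SCHEMAS) are the ones that can be queried uniformly across events, logs, and snapshots.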

File organization

parquet/
└── {workspace_id}/
    └── {date}/
        └── {hour}/
            ├── events.parquet
            ├── logs.parquet
            └── snapshots.parquet
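
As a sketch of how a reader might locate files in this layout, the helper below builds the directory for a given workspace and hour. The function name hourly_dir is hypothetical, and two details are assumptions inferred from the example paths in this page rather than documented guarantees: dates are formatted as YYYY-MM-DD and hours are zero-padded to two digits. It also assumes hourly rotation and UTC timestamps.

```python
from datetime import datetime, timezone
from pathlib import PurePosixPath


def hourly_dir(root, workspace_id, ts):
    """Build {root}/{workspace_id}/{date}/{hour} for a timestamp.

    Assumptions (not confirmed by the sink's documentation):
    - date is formatted as YYYY-MM-DD
    - hour is zero-padded to two digits
    - timestamps are interpreted in UTC
    """
    ts = ts.astimezone(timezone.utc)
    return (
        PurePosixPath(root)
        / str(workspace_id)
        / ts.strftime("%Y-%m-%d")
        / ts.strftime("%H")
    )


# Example: the events file for workspace 1 at 10:30 UTC on 2025-01-15.
events_file = hourly_dir(
    "parquet", 1, datetime(2025, 1, 15, 10, 30, tzinfo=timezone.utc)
) / "events.parquet"
```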

Reading Parquet files

Parquet is a standard format readable by most analytics tools:
-- DuckDB (SQL)
SELECT * FROM 'parquet/1/2025-01-15/10/*.parquet';

# Polars
import polars as pl
df = pl.read_parquet("parquet/1/2025-01-15/10/events.parquet")

# Pandas
import pandas as pd
df = pd.read_parquet("parquet/1/2025-01-15/10/events.parquet")
Also readable by Apache Spark, ClickHouse (file() table function), and PyArrow.
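
Because Parquet is columnar, readers can often skip columns they do not need. The sketch below builds a glob that matches one file type across every hour of a day, plus a DuckDB query string that selects only a few columns. The helper names day_glob and duckdb_query are hypothetical, and the default column list is just an example; read_parquet with a glob pattern is a standard DuckDB feature.

```python
KINDS = ("events", "logs", "snapshots")


def day_glob(root, workspace_id, date, kind="events"):
    """Return a glob matching one file type for every hour of a given day."""
    if kind not in KINDS:
        raise ValueError(f"unknown file type: {kind!r}")
    return f"{root}/{workspace_id}/{date}/*/{kind}.parquet"


def duckdb_query(pattern, columns=("timestamp", "event_type", "payload")):
    """Build a DuckDB SQL string that reads only the listed columns."""
    cols = ", ".join(columns)
    return f"SELECT {cols} FROM read_parquet('{pattern}')"


pattern = day_glob("parquet", 1, "2025-01-15")
query = duckdb_query(pattern)
```

Keeping the glob to a single file type avoids mixing schemas. If a pattern does match events, logs, and snapshots together, their columns differ, and DuckDB's union_by_name option to read_parquet may be needed.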

When to use Parquet vs Arrow IPC

Use Parquet for cold data: archival, compliance, data warehousing. Compression ratios are excellent (especially with Zstd) and the format is widely supported.

Use Arrow IPC for hot data: real-time dashboards, frequent reads, inter-process communication. Arrow IPC is ~10x faster to read, but files are larger.