The pattern transform extracts recurring patterns from log messages using the Drain algorithm. A message like "User alice logged in from 10.0.0.1" becomes the pattern "User <*> logged in from <*>" — grouping thousands of similar messages into a handful of templates. Patterns are used by Tell’s anomaly detection to spot unusual log activity.

Quick start

[[routing.rules.transformers]]
type = "pattern_matcher"
That’s it. The defaults work well for most log volumes. Each log message gets a pattern ID attached, enabling pattern-based grouping and anomaly scoring downstream.

How it works

The Drain algorithm builds a tree of log patterns:
  1. Incoming messages are tokenized (split on whitespace)
  2. Tokens that look like variables — numbers, IPs, UUIDs, URLs, timestamps, paths, emails — are detected automatically
  3. Messages are matched against existing patterns by similarity
  4. If a match is found, the pattern’s count increments. If not, a new pattern is created.
The result is a set of templates like:
"User <*> logged in from <*>"              count: 14,203
"Payment <*> failed with code <*>"         count: 847
"Request to <*> timed out after <*> ms"    count: 312
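The matching loop above can be sketched in a few lines of Python. This is an illustrative toy, not Tell's actual Drain implementation: only numbers, IPv4 addresses, and UUIDs are masked here, and the similarity metric (positional token agreement) is an assumption.

```python
import re

# Variable-looking tokens are masked to <*>, then each message is matched
# against known templates by positional token similarity.
VAR_RE = re.compile(
    r"^(\d+"                                          # plain number
    r"|\d{1,3}(\.\d{1,3}){3}"                         # IPv4 address
    r"|[0-9a-f]{8}(-[0-9a-f]{4}){3}-[0-9a-f]{12})$",  # UUID
    re.IGNORECASE,
)

def mask(message: str) -> list[str]:
    """Tokenize on whitespace, replacing variable-like tokens with <*>."""
    return ["<*>" if VAR_RE.match(t) else t for t in message.split()]

def similarity(a: list[str], b: list[str]) -> float:
    """Fraction of positions where two same-length token lists agree."""
    return sum(x == y for x, y in zip(a, b)) / len(a)

def match(message: str, templates: dict[str, int], threshold: float = 0.5) -> str:
    """Return the template for a message, creating or generalizing as needed."""
    tokens = mask(message)
    for tmpl, count in list(templates.items()):
        t_tokens = tmpl.split()
        if len(t_tokens) == len(tokens) and similarity(tokens, t_tokens) >= threshold:
            # Matched: generalize any still-differing tokens to <*>
            # and increment the pattern's count.
            merged = " ".join(x if x == y else "<*>" for x, y in zip(t_tokens, tokens))
            del templates[tmpl]
            templates[merged] = count + 1
            return merged
    tmpl = " ".join(tokens)  # no match above the threshold: new pattern
    templates[tmpl] = 1
    return tmpl

templates: dict[str, int] = {}
match("User alice logged in from 10.0.0.1", templates)
match("User bob logged in from 10.0.0.2", templates)
# templates is now {"User <*> logged in from <*>": 2}
```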

Similarity threshold

The similarity_threshold controls how aggressively messages are clustered:
[[routing.rules.transformers]]
type = "pattern_matcher"
similarity_threshold = 0.5
| Value | Effect |
| --- | --- |
| Lower (0.3) | More lenient — fewer patterns, broader clusters |
| Default (0.5) | Balanced clustering |
| Higher (0.7) | More strict — more patterns, tighter clusters |
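To make the threshold concrete, here is a minimal positional-similarity calculation. The metric (fraction of token positions that agree) is an assumption for illustration; Tell's exact scoring may differ.

```python
def token_similarity(a: str, b: str) -> float:
    """Fraction of positions where two same-length messages share a token."""
    ta, tb = a.split(), b.split()
    return sum(x == y for x, y in zip(ta, tb)) / len(ta)

s = token_similarity(
    "User alice logged in from 10.0.0.1",
    "User bob logged out from 10.0.0.1",
)
# 4 of 6 tokens agree -> similarity ~0.67: the messages share a pattern
# at the default threshold of 0.5, but split into two patterns at 0.7
```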

Persistence

By default, patterns live in memory and are lost on restart. Enable file persistence to save them:
[[routing.rules.transformers]]
type = "pattern_matcher"
persistence_enabled = true
persistence_file = "/var/lib/tell/patterns.json"
Patterns are saved in the background — persistence doesn’t slow down the transform pipeline.
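One common way to keep persistence off the hot path is a write-behind save with an atomic rename. The sketch below is hypothetical, not Tell's actual code: a snapshot of the patterns is serialized on a background thread, and the rename guarantees readers never observe a half-written file.

```python
import json
import os
import tempfile
import threading

def save_patterns(patterns: dict[str, int], path: str) -> None:
    """Serialize patterns to a temp file, then rename it into place."""
    fd, tmp = tempfile.mkstemp(dir=os.path.dirname(path) or ".")
    with os.fdopen(fd, "w") as f:
        json.dump(patterns, f)
    os.replace(tmp, path)  # atomic when tmp and path share a filesystem

def save_in_background(patterns: dict[str, int], path: str) -> threading.Thread:
    """Snapshot the patterns and persist them without blocking the pipeline."""
    snapshot = dict(patterns)  # the transform can keep mutating the original
    t = threading.Thread(target=save_patterns, args=(snapshot, path))
    t.start()
    return t
```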

Caching

The pattern matcher uses a 3-level cache for performance:
| Level | What it caches | Typical hit rate |
| --- | --- | --- |
| L1 | Exact message hash | 70–80% |
| L2 | Normalized template hash | Catches similar messages |
| L3 | Drain tree lookup | Fallback for new messages |
Most messages hit L1 (identical to a recent message) and skip the tree entirely. The cache_size setting controls L1 capacity.
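The tiered lookup can be sketched as follows. The class and names here are illustrative, not Tell's API; the Drain tree itself is stubbed out as a callback, and L1 is modeled as a simple LRU keyed on the exact message.

```python
import re
from collections import OrderedDict

# Toy variable masker for the L2 key (numbers and dotted numbers only).
VAR_RE = re.compile(r"\b\d+(\.\d+)*\b")

class TieredCache:
    def __init__(self, l1_capacity: int = 100_000):
        self.l1: OrderedDict[str, str] = OrderedDict()  # exact message -> pattern
        self.l2: dict[str, str] = {}                    # masked template -> pattern
        self.capacity = l1_capacity

    def lookup(self, message: str, tree_lookup) -> str:
        if message in self.l1:             # L1: exact message seen recently
            self.l1.move_to_end(message)
            return self.l1[message]
        masked = VAR_RE.sub("<*>", message)
        pattern = self.l2.get(masked)      # L2: same template, different variables
        if pattern is None:
            pattern = tree_lookup(masked)  # L3: full Drain tree walk
            self.l2[masked] = pattern
        self.l1[message] = pattern
        if len(self.l1) > self.capacity:   # evict the least recently used entry
            self.l1.popitem(last=False)
        return pattern
```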

Reference

| Field | Default | Description |
| --- | --- | --- |
| type | | Must be "pattern_matcher" |
| similarity_threshold | 0.5 | How similar messages must be to share a pattern (0.0–1.0) |
| max_child_nodes | 100 | Maximum branches per tree node |
| cache_size | 100000 | L1 cache capacity (exact message hashes) |
| persistence_enabled | false | Save patterns to disk |
| persistence_file | | File path for pattern storage |
| enabled | true | Set to false to disable |
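Putting it all together, a fully specified transformer block looks like this. Values are the defaults from the table above, except that persistence is switched on (and given the path from the persistence example) to show persistence_file in context:

```toml
[[routing.rules.transformers]]
type = "pattern_matcher"
enabled = true
similarity_threshold = 0.5
max_child_nodes = 100
cache_size = 100000
persistence_enabled = true
persistence_file = "/var/lib/tell/patterns.json"
```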

What’s next