MATCH
Find ordered sequences of events using pattern matching with quantifiers.
Syntax
MATCH pattern+
[AGG aggExpression [, aggExpression]*] [,]?
[BY groupClause [, groupClause]*] [,]?
[SORT [BY]? sortExpression [, sortExpression]*] [,]?
[WITHIN interval] [,]?
Parameters
- pattern - Named dataset reference with optional quantifier (
*,+,?,{n}) specifying sequence requirements - aggExpression - Aggregation expression applied to matched rows (e.g.,
total = sum(value)) - interval - Time interval specifying the maximum duration for the entire pattern sequence (e.g.,
5m,1h) - groupClause - Field or expression to group pattern matching by
- sortExpression - Field or expression to order results by, with optional
ASCorDESCdirection
Description
The MATCH command finds ordered sequences of events across multiple named
datasets using regular expression-style pattern matching. Unlike WINDOW,
which performs unordered correlation, MATCH requires that events occur in a
specific temporal sequence.
You specify patterns using named datasets (defined with DEF statements)
followed by optional quantifiers. Quantifiers include * (zero or more), +
(one or more), ? (zero or one), {n} (exactly n), and {n,m} (between n
and m occurrences).
Implicit Timestamp Ordering
MATCH commands automatically sort by the timestamp field unless you explicitly specify a different SORT clause. This default ordering ensures that pattern matching operates on temporally ordered event sequences.
If you explicitly provide a SORT clause, the automatic timestamp ordering is disabled, and your custom ordering takes effect.
Time Constraints with WITHIN
The WITHIN clause constrains the total duration from the start of the first pattern to the end of the last pattern. For example, WITHIN 5m ensures that the entire pattern sequence completes within 5 minutes. The constraint must be a positive value and cannot be negative.
When using WITHIN:
- You must specify exactly one
SORTexpression (or use the implicittimestampordering) - The
WITHINconstraint measures on the actualSORTfield, not hardcoded to timestamp - The
SORTfield type must be compatible with theWITHINtype:TIMESTAMPsort fields work withINTERVAL(e.g.,5m) orCALENDAR_INTERVAL(e.g.,1y,3mon)- Numeric sort fields require matching numeric
WITHINtypes (e.g.,INTsort withINTwithin)
- If you don't specify a
SORTclause, the implicittimestampordering is used automatically
The BY clause partitions data for independent pattern matching
within each group.
Aggregating matched rows with AGG
The AGG clause computes aggregations over the rows that
participated in each matched sequence. This is distinct from
regular AGG — match aggregation functions only see the rows
that were part of the match, not all rows in the time window.
The following functions are available in MATCH AGG:
| Function | Description |
|---|---|
count() | Count all matched rows |
count(x) | Count non-null values of x across matched rows |
sum(x) | Sum numeric values across matched rows |
avg(x) | Average numeric values across matched rows |
min(x) | Minimum value across matched rows |
max(x) | Maximum value across matched rows |
first(x) | Value of x from the first matched row |
last(x) | Value of x from the last matched row |
array_agg(x) | Collect values of x from matched rows into an array |
count_distinct(x) | Count distinct values of x across matched rows |
These functions share names with the regular aggregation functions
but operate exclusively on matched rows. You can combine multiple
aggregations in a single AGG clause.
Examples
Basic Pattern Matching with WITHIN
DEF login = FROM events WHERE event_type = "login";
DEF suspicious = FROM events WHERE event_type = "suspicious_activity";
MATCH login suspicious WITHIN 5m
Finds sequences where a login event is followed by suspicious activity within 5 minutes.
Multiple Patterns with Time Constraint
DEF file_access = FROM events WHERE event_type = "file_access";
DEF data_exfil = FROM events WHERE event_type = "data_transfer";
MATCH file_access+ data_exfil BY user_id WITHIN 10m
Finds sequences where one or more file access events are followed by data transfer, all occurring within 10 minutes, grouped by user.
Aggregating over matched sequences
DEF failed = FROM events WHERE event_type = "login_failed";
DEF success = FROM events WHERE event_type = "login_success";
MATCH failed+ success
AGG attempt_count = count(),
first_failure = first(timestamp),
last_failure = last(timestamp)
BY user_id
WITHIN 10m
Finds brute force login sequences and computes how many events
participated in each match, along with the timestamps of the first
and last matched events. The count() here counts only the rows
that matched the failed+ success pattern, not every row in the
10-minute window.
Numeric Ordering with WITHIN
DEF login = FROM events WHERE event_type = "login";
DEF action = FROM events WHERE event_type = "action";
MATCH login action SORT row_number WITHIN 10
Finds sequences where a login is followed by an action within 10 row numbers. The WITHIN constraint measures on the SORT field (row_number), ensuring the distance from first to last event is at most 10.
Related Commands
- WINDOW - Unordered correlation and aggregation