Skip to main content

MATCH

Find ordered sequences of events using pattern matching with quantifiers.

Syntax

MATCH pattern+
[AGG aggExpression [, aggExpression]*] [,]?
[BY groupClause [, groupClause]*] [,]?
[SORT [BY]? sortExpression [, sortExpression]*] [,]?
[WITHIN interval] [,]?

Parameters

  • pattern - Named dataset reference with optional quantifier (*, +, ?, {n}) specifying sequence requirements
  • aggExpression - Aggregation expression applied to matched rows (e.g., total = sum(value))
  • interval - Time interval specifying the maximum duration for the entire pattern sequence (e.g., 5m, 1h)
  • groupClause - Field or expression to group pattern matching by
  • sortExpression - Field or expression to order results by, with optional ASC or DESC direction

Description

The MATCH command finds ordered sequences of events across multiple named datasets using regular expression-style pattern matching. Unlike WINDOW, which performs unordered correlation, MATCH requires that events occur in a specific temporal sequence.

You specify patterns using named datasets (defined with DEF statements) followed by optional quantifiers. Quantifiers include * (zero or more), + (one or more), ? (zero or one), {n} (exactly n), and {n,m} (between n and m occurrences).

Implicit Timestamp Ordering

MATCH commands automatically sort by the timestamp field unless you explicitly specify a different SORT clause. This default ordering ensures that pattern matching operates on temporally ordered event sequences.

If you explicitly provide a SORT clause, the automatic timestamp ordering is disabled, and your custom ordering takes effect.

Time Constraints with WITHIN

The WITHIN clause constrains the total duration from the start of the first pattern to the end of the last pattern. For example, WITHIN 5m ensures that the entire pattern sequence completes within 5 minutes. The constraint must be a positive value and cannot be negative.

When using WITHIN:

  • You must specify exactly one SORT expression (or use the implicit timestamp ordering)
  • The WITHIN constraint measures on the actual SORT field, not hardcoded to timestamp
  • The SORT field type must be compatible with the WITHIN type:
    • TIMESTAMP sort fields work with INTERVAL (e.g., 5m) or CALENDAR_INTERVAL (e.g., 1y, 3mon)
    • Numeric sort fields require matching numeric WITHIN types (e.g., INT sort with INT within)
  • If you don't specify a SORT clause, the implicit timestamp ordering is used automatically

The BY clause partitions data for independent pattern matching within each group.

Aggregating matched rows with AGG

The AGG clause computes aggregations over the rows that participated in each matched sequence. This is distinct from regular AGG — match aggregation functions only see the rows that were part of the match, not all rows in the time window.

The following functions are available in MATCH AGG:

FunctionDescription
count()Count all matched rows
count(x)Count non-null values of x across matched rows
sum(x)Sum numeric values across matched rows
avg(x)Average numeric values across matched rows
min(x)Minimum value across matched rows
max(x)Maximum value across matched rows
first(x)Value of x from the first matched row
last(x)Value of x from the last matched row
array_agg(x)Collect values of x from matched rows into an array
count_distinct(x)Count distinct values of x across matched rows

These functions share names with the regular aggregation functions but operate exclusively on matched rows. You can combine multiple aggregations in a single AGG clause.

Examples

Basic Pattern Matching with WITHIN

DEF login = FROM events WHERE event_type = "login";
DEF suspicious = FROM events WHERE event_type = "suspicious_activity";

MATCH login suspicious WITHIN 5m

Finds sequences where a login event is followed by suspicious activity within 5 minutes.

Multiple Patterns with Time Constraint

DEF file_access = FROM events WHERE event_type = "file_access";
DEF data_exfil = FROM events WHERE event_type = "data_transfer";

MATCH file_access+ data_exfil BY user_id WITHIN 10m

Finds sequences where one or more file access events are followed by data transfer, all occurring within 10 minutes, grouped by user.

Aggregating over matched sequences

DEF failed = FROM events WHERE event_type = "login_failed";
DEF success = FROM events WHERE event_type = "login_success";

MATCH failed+ success
AGG attempt_count = count(),
first_failure = first(timestamp),
last_failure = last(timestamp)
BY user_id
WITHIN 10m

Finds brute force login sequences and computes how many events participated in each match, along with the timestamps of the first and last matched events. The count() here counts only the rows that matched the failed+ success pattern, not every row in the 10-minute window.

Numeric Ordering with WITHIN

DEF login = FROM events WHERE event_type = "login";
DEF action = FROM events WHERE event_type = "action";

MATCH login action SORT row_number WITHIN 10

Finds sequences where a login is followed by an action within 10 row numbers. The WITHIN constraint measures on the SORT field (row_number), ensuring the distance from first to last event is at most 10.

  • WINDOW - Unordered correlation and aggregation