Introducing Hamelin
Hamelin is a pipe-based query language for event analytics which targets the specific challenges detection engineers face when analyzing security events. The language makes event correlation straightforward, letting you define patterns, correlate them across time windows, and match ordered sequences of events.
Key Features
🔄 Pipe-Based
You write queries that read naturally from top to bottom. Each operation
connects to the next using the pipe operator |. Pipe-based languages let you
build queries incrementally, making them easier to read, write, and test than
approaches that rely heavily on nested subqueries.
FROM events
| WHERE event.action == 'login'
| WITHIN -1hr
| SELECT user.email, timestamp
🕐 Event-Native
Hamelin offers shorthand for working with timestamped events. Time intervals are written as simple expressions that match how you think about time. You can reference relative timestamps and truncate them to specific boundaries.
// Reference relative time
| WITHIN -15m // events within the last 15 minutes
| WITHIN -1h // events within the last hour
| WITHIN -7d // events within the last 7 days
// Truncate timestamps to boundaries
| SELECT timestamp // truncate to hour boundary
| SELECT timestamp // truncate to day boundary
| SELECT timestamp@4h // truncate to 4-hour boundaries
| SELECT timestamp@15m // truncate to 15-minute boundaries
🪟 Sliding Windows
Sliding windows move continuously with each event, giving you insights without gaps or duplicates. You can aggregate data over these moving time windows to detect patterns as they happen.
FROM events
| WHERE event.action == 'login'
| WINDOW count()
BY user.id
WITHIN -15m
🎯 Correlation of Named Subqueries
Named subqueries let you define specific event patterns and correlate them within sliding windows. You can drop these patterns into sliding windows and write correlations around them. Hamelin makes it straightforward to aggregate over specific patterns while also aggregating over the entire group of events.
DEF failed_logins = FROM events
| WHERE event.action == 'login_failed';
DEF successful_logins = FROM events
| WHERE event.action == 'login_success';
FROM failed = failed_logins, success = successful_logins
| WINDOW failures = count(failed),
successes = count(success),
total = count(),
BY user.id
WITHIN -5m
| WHERE successes >= 1 && failures / total > 0.2
This query demonstrates correlating failed and successful login events to detect
brute force attacks. Named subqueries define distinct event patterns:
failed_logins filters to login failure events while successful_logins
filters to login success events. The sliding window aggregates these patterns by
user over 5-minute periods, counting failures, successes, and total events. The
final filter identifies users who had at least one successful login where failed
attempts represent more than 20% of their total login activity within that
window.
🔍 Ordered Matching of Named Subqueries
You can ask Hamelin to match ordered patterns across events. Aggregations over sliding windows work well for many use cases, but others require that you search for specific events followed by other specific events. You can do that in Hamelin using regular expression quantifiers applied to named subqueries.
DEF failed_logins = FROM events
| WHERE event.action == 'login_failed';
DEF successful_logins = FROM events
| WHERE event.action == 'login_success';
MATCH failed_logins{10,} successful_logins+ WITHIN 10m
This searches for 10 failed logins followed by at least one successful login, with the entire sequence completing within a 10 minute period. The sliding window approach might miss attack patterns where timing and sequence matter, but ordered matching can detect the exact progression of a brute force attack.
🔗 Event Type Expansion
You can query across different event types without worrying about schema
differences. Hamelin automatically sets missing fields to null when they don't
exist in a particular event type.
FROM login_events, logout_events, error_events
// Filters by user.email when if this field exists in a row.
// Drops rows where this field does not exist
// (because NULL does not equal any string).
| WHERE user.email == '[email protected]'
🗂️ Structured Types
Hamelin supports structured types like structs, arrays, and maps to represent complex data. These types make data modeling more familiar, and reduce the need to rely too much on joins in analytic queries.
// Create struct literals with nested data
SET login_metadata = {
ip_address: '192.168.1.100',
user_agent: 'Mozilla/5.0',
location: 'San Francisco'
}
// Access nested fields using dot notation
| WHERE login_metadata.ip_address != '192.168.1.100'
// Use arrays to store multiple related values
| SET failed_attempts = [
{timestamp: '2024-01-15T14:25:00Z', reason: 'invalid_password'},
{timestamp: '2024-01-15T14:27:00Z', reason: 'account_locked'}
]
// Use maps when key data is high cardinality
// Using structs for this use case creates too many fields.
| SET host_metrics = map(
'web-server-01': {cpu: 85.2, memory: 72.1},
'web-server-02': {cpu: 91.7, memory: 68.9},
'db-primary-01': {cpu: 67.3, memory: 89.4}
)
// Look up map values using index notation
| WHERE host_metrics['web-server-01'].cpu > 80
📡 Array Broadcasting
Hamelin makes working with arrays simpler by offering broadcasting, which helps you distribute operations over each member of an array. It does this when you apply an operation to an array that makes more sense to be applied to each of its members. Broadcasting lets you work with arrays using simple, familiar syntax without asking you to resort to functional programming or inefficient unnesting.
| WHERE any(failed_attempts.reason == 'invalid_password')
This example demonstrates how the equality operator == broadcasts across the
reason field of each element in the failed_attempts array. This example
demonstrates two broadcasts:
- first, the lookup of the
reasonfield changes an array-of-struct into an array-of-string - second, applying equality to the resulting array applies it to each member
Hamelin can do this automatically because it is type-aware. It knows that
comparing equality between array(string) and string makes more sense to
broadcast: an array can never be equal to a string, but a member of an
array(string) might be.
🔀 Semi-Structured Types
Hamelin lets you parse json into instances of the variant type. This helps you
handle semi-structured data that doesn't fit nicely into fixed schemas. You can
parse JSON strings, access their fields, and convert them to more structured
types. This makes working with JSON feel fairly native.
// Parse JSON strings into the variant type
FROM logs
| SET event_data = parse_json(raw_json)
// Access nested fields using dot notation
| WHERE event_data.level AS string == 'ERROR'
// Access json array elements with index notation
| SET first_tag = event_data.tags[0]
// Cast variant data to structured types when you need type safety.
// Values that do not match will be null.
| SET user_info = event_data.user AS {id: int, name: string}
🚨 Excellent Error Messages
Hamelin provides clear, helpful error messages. Error messages point directly to the problematic Hamelin code and explain exactly what went wrong, rather than showing cryptic messages about generated SQL.
This matters especially when AI assistants write queries. AI tools need precise descriptions of errors to fix queries and complete tasks. Clear error messages let AI assistants debug queries effectively by giving the context needed to correct mistakes.
FROM simba.sysmon_events
| AGG count() BY host.hostname
| SET hostname = lower(host.hostname)
generates the error
Error: problem doing translation
╭─[ :3:24 ]
│
3 │ | SET hostname = lower(host.hostname)
│ ──┬─
│ ╰─── error while translating
│
│ Note: unbound field reference: host
│
│ the following entries in the environment are close:
│ - `host.hostname` (you must actually wrap with ``)
───╯
Here, the user has forgotten to escape an identifier that contains a dot character.
FROM simba.sysmon_events
| WINDOW count(),
all(winlog.event_data.events)
BY host.hostname
generates the error
Error: problem doing translation
╭─[ :3:10 ]
│
3 │ all(winlog.event_data.events)
│ ──────────────┬─┬────────────
│ ╰──────────────── could not find a matching function definition
│ │
│ ╰────────────── variant
│
│ Note: Attempted all(x=boolean)
│ - Type mismatch for x: expected boolean, got variant
│
│ Attempted all(x=array(boolean))
│ - Type mismatch for x: expected array(boolean), got variant
│
───╯
Here, the user has forgotten to cast variant to a primitive type so that it can be matched against the function call. (A future version of Hamelin will probably coerce this automatically!)