Lineage sub-struct
Hamelin tracks where data comes from when you assign results to struct fields
in FROM or MATCH clauses. This lineage tracking lets you correlate events
from different sources while maintaining visibility into which upstream source
contributed each piece of data. You can reference this lineage information to
build complex pattern detection queries.
How lineage sub-struct works
Hamelin creates a composite record that preserves the source of each piece of
data when you assign query results to struct fields. This happens automatically
when you use assignment syntax in FROM or MATCH clauses. As an example,
consider tracking both failed and successful login events:
FROM failed = failed_logins, success = successful_logins
| WINDOW failures = count(failed),
         successes = count(success),
         total = count()
  BY user.id
  WITHIN -5m
The failed = failed_logins assignment creates a struct field that is populated
only for events from the failed_logins source, and success = successful_logins
creates a second struct field that is populated only for events from the
successful_logins source. An event from failed_logins therefore has failed
populated and success set to NULL; an event from successful_logins has success
populated and failed set to NULL. Hamelin maintains this lineage information
throughout the query pipeline.
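The row shape described above can be modeled outside Hamelin. The following Python sketch is an illustration only, not Hamelin's implementation: from_with_lineage is a hypothetical helper, the sample events are invented, and the model ignores any fields Hamelin may also expose at the top level of each row (such as user.id).

```python
def from_with_lineage(**sources):
    """Union the named sources into rows that carry one struct per alias.

    Only the alias an event came from is populated; every other alias is
    None, which stands in for NULL.
    """
    aliases = list(sources)
    rows = []
    for alias, events in sources.items():
        for event in events:
            row = {name: None for name in aliases}
            row[alias] = event
            rows.append(row)
    return rows


# Hypothetical sample data for the two upstream sources.
failed_logins = [{"user_id": "alice", "source_ip": "10.0.0.5"}]
successful_logins = [{"user_id": "alice", "source_ip": "10.0.0.9"}]

rows = from_with_lineage(failed=failed_logins, success=successful_logins)
for row in rows:
    print(row)
```

Each printed row has exactly one populated struct, which is what lets later pipeline stages tell the two sources apart.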
Accessing lineage data
You can reference the assigned struct fields directly in queries. The field names become available for filtering, aggregation, and selection:
FROM failed = security_alerts, success = login_events
| WHERE failed.severity > 'medium' OR success.user_id IS NOT NULL
| SELECT failed.alert_type, success.login_time, failed.source_ip
Each event gets lineage tags that indicate which source it came from. Events
from security_alerts will have the failed field populated with their data
and success as NULL. Events from login_events will have the success field
populated with their data and failed as NULL. This lets you access any field
from the original data while knowing exactly which source contributed each
event.
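To make the result of the SELECT above concrete, here is a small Python model of the projection. It assumes that reading a field through a NULL struct yields NULL rather than an error, which the description above implies but does not state outright. The field helper and the sample values are hypothetical.

```python
def field(row, path):
    """Resolve 'alias.field'; a NULL (None) struct yields None."""
    alias, name = path.split(".", 1)
    struct = row[alias]
    return None if struct is None else struct.get(name)


# One row per source, shaped like the lineage rows described above.
rows = [
    {"failed": {"alert_type": "port_scan", "source_ip": "203.0.113.7"},
     "success": None},
    {"failed": None,
     "success": {"login_time": "2024-01-01T00:00:00Z", "user_id": "alice"}},
]

columns = ["failed.alert_type", "success.login_time", "failed.source_ip"]
selected = [{col: field(row, col) for col in columns} for row in rows]

for out in selected:
    print(out)
```

Under this assumption, every output row has NULLs in the columns that belong to the other source, so a NULL check on a lineage column is enough to tell where a row originated.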
Pattern correlation with lineage
Lineage tracking lets you correlate events that arrive from different sources. As an example, consider detecting brute-force attacks by correlating failed attempts with eventual successes:
DEF failed_logins = FROM events
| WHERE event.action == 'login_failed';
DEF successful_logins = FROM events
| WHERE event.action == 'login_success';
FROM failed = failed_logins, success = successful_logins
| WINDOW failures = count(failed),
         successes = count(success),
         total = count()
  BY user.id
  WITHIN -5m
| WHERE successes >= 1 && failures / total > 0.2
| SELECT user.id,
         failed_count = failures,
         success_count = successes,
         failure_rate = failures / total
This query correlates two distinct event patterns within sliding windows. The
lineage tracking lets you distinguish events by source: events from
failed_logins have the failed struct populated, while events from
successful_logins have the success struct populated. You can then access
source-specific fields and aggregate based on event lineage.
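The arithmetic behind the filter can be checked with a small Python model. This is a sketch under stated assumptions, not Hamelin's implementation: it treats count(alias) as counting only rows where that alias is not NULL, treats / as fractional division, and works on rows that already fall inside a single window. The helper names, the top-level user_id key (standing in for user.id), and the sample rows are hypothetical.

```python
from collections import defaultdict


def window_counts(rows):
    """Per-user failures, successes, and total for rows in one window."""
    counts = defaultdict(lambda: {"failures": 0, "successes": 0, "total": 0})
    for row in rows:
        c = counts[row["user_id"]]
        c["failures"] += row["failed"] is not None    # models count(failed)
        c["successes"] += row["success"] is not None  # models count(success)
        c["total"] += 1                               # models count()
    return dict(counts)


def suspicious(counts):
    """Keep users with at least one success and a failure rate above 0.2."""
    return {
        user: c
        for user, c in counts.items()
        if c["successes"] >= 1 and c["failures"] / c["total"] > 0.2
    }


def failed_row(user):
    return {"user_id": user, "failed": {"action": "login_failed"}, "success": None}


def success_row(user):
    return {"user_id": user, "failed": None, "success": {"action": "login_success"}}


rows = (
    [failed_row("alice")] * 3 + [success_row("alice")]  # 3 of 4 failed
    + [success_row("bob")]                              # no failures
    + [failed_row("carol")] * 4                         # no success
)

flagged = suspicious(window_counts(rows))
print(flagged)
```

Only alice passes both conditions: bob has a failure rate of 0, and carol has failures but no success in the window.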
MATCH clause lineage
The MATCH command also supports lineage tracking when you assign pattern
results to struct fields. As an example, consider detecting brute force
patterns that span multiple login attempts:
DEF failed_logins = FROM events
| WHERE event.action == 'login_failed';
DEF successful_logins = FROM events
| WHERE event.action == 'login_success';
MATCH failed_logins = failed_logins{10,}, successful_logins = successful_logins+ WITHIN 10m
| AGG failed_count = count(failed_logins),
      success_count = count(successful_logins),
      first_failed_ip = min(failed_logins.source_ip),
      success_duration = max(successful_logins.timestamp) - min(successful_logins.timestamp)
  BY user_id
This pattern detects sequences where at least 10 failed login attempts are
followed by one or more successful logins, with the entire pattern completing
within a 10-minute window. The assignments (failed_logins = and
successful_logins =) create lineage tags that identify which pattern each event
matched; in this example each alias reuses the name of the definition it is
bound to. Events matching the failed login pattern have the failed_logins
struct populated, while events matching the successful login pattern have the
successful_logins struct populated.
The AGG command then operates on these lineage-tagged events to calculate
metrics specific to each pattern type. The count(failed_logins) aggregation
counts only events that matched the failed login pattern, while
count(successful_logins) counts only events that matched the successful login
pattern. Similarly, min(failed_logins.source_ip) reads the source_ip field only
from events in the failed login pattern, and the timestamp calculations use the
timestamp field only from events in the successful login pattern.
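The quantifiers in the MATCH pattern read much like regular-expression quantifiers, and that analogy can be sketched in Python. This is an illustration under assumptions, not a description of how Hamelin evaluates MATCH: events are reduced to single-letter labels (F for a failed login, S for a successful one), the find_match helper is hypothetical, and only the greedy regex match is tested against the window, so shorter candidate matches are not explored.

```python
import re

# F{10,}S+ mirrors failed_logins{10,} followed by successful_logins+.
PATTERN = re.compile(r"F{10,}S+")


def find_match(events, window_seconds=600):
    """Return the first matched run of events, or None.

    events is a list of (timestamp_seconds, label) tuples sorted by time,
    where label is "F" or "S". A match must span at most window_seconds.
    """
    labels = "".join(label for _, label in events)
    for m in PATTERN.finditer(labels):
        matched = events[m.start():m.end()]
        if matched[-1][0] - matched[0][0] <= window_seconds:
            return matched
    return None


# Hypothetical events for one user: ten failures, then one success.
events = [(i * 10, "F") for i in range(10)] + [(120, "S")]

match = find_match(events)
failed_count = sum(1 for _, label in match if label == "F")
success_count = sum(1 for _, label in match if label == "S")
print(failed_count, success_count)
```

Dropping one failure from the input, or moving the success more than 600 seconds after the first failure, makes find_match return None, which corresponds to the pattern not matching.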
Benefits of lineage tracking
The lineage sub-struct provides several advantages for complex data analysis. You can correlate events from multiple sources while keeping clear attribution of where each piece of data originated. This avoids ambiguity in queries where several upstream sources expose fields with similar names.
The feature also enables pattern detection across different event types. You can write queries that aggregate and filter across multiple event patterns while accessing specific fields from each pattern type. This supports use cases like security monitoring, user behavior analysis, and system performance correlation.