
Lineage sub-struct

Hamelin tracks where data comes from when you assign results to struct fields in FROM or MATCH clauses. This lineage tracking lets you correlate events from different sources while maintaining visibility into which upstream source contributed each piece of data. You can reference this lineage information to build complex pattern detection queries.

How lineage sub-struct works

Hamelin creates a composite record that preserves the source of each piece of data when you assign query results to struct fields. This happens automatically when you use assignment syntax in FROM or MATCH clauses. As an example, consider tracking both failed and successful login events:

FROM failed = failed_logins, success = successful_logins
| WINDOW failures = count(failed),
         successes = count(success),
         total = count()
  BY user.id
  WITHIN -5m

The failed = failed_logins assignment creates a struct field that is populated for events from the failed logins source, while success = successful_logins creates another struct field that is populated for events from the successful logins source. Events from failed_logins have the failed field populated and success set to NULL. Events from successful_logins have the success field populated and failed set to NULL. Because count(failed) counts only events whose failed field is populated, failures counts failed logins, successes counts successful logins, and total = count() counts every event in the window. Hamelin maintains this lineage information throughout the query pipeline.
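The NULL-filling behavior can be modeled in a few lines of Python. This is a minimal sketch and not Hamelin's implementation. It assumes events are plain dicts and represents NULL as None. The helpers tag_lineage (modeling the FROM assignment) and count_field (modeling count() and count(field)) are hypothetical names chosen for illustration.

```python
from typing import Any, Optional

Row = dict[str, Any]


def tag_lineage(sources: dict[str, list[Row]]) -> list[Row]:
    """Model FROM a = src_a, b = src_b: union the sources into one stream.

    Each output row gets one struct field per assigned name (hypothetical model).
    Only the field named after the row's own source is populated; the others
    are None, standing in for NULL.
    """
    names = list(sources)
    out: list[Row] = []
    for name, rows in sources.items():
        for row in rows:
            out.append({other: (dict(row) if other == name else None) for other in names})
    return out


def count_field(rows: list[Row], field: Optional[str] = None) -> int:
    """Model count() and count(field): count(field) skips rows where field is NULL."""
    if field is None:
        return len(rows)
    return sum(1 for row in rows if row.get(field) is not None)


rows = tag_lineage({
    "failed": [{"user_id": "alice"}, {"user_id": "alice"}],
    "success": [{"user_id": "alice"}],
})
print(count_field(rows, "failed"), count_field(rows, "success"), count_field(rows))
# prints: 2 1 3
```

Each modeled row has exactly one non-NULL lineage field. This is why the three counts in the WINDOW example separate cleanly by source.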

Accessing lineage data

You can reference the assigned struct fields directly in queries. The field names become available for filtering, aggregation, and selection:

FROM failed = security_alerts, success = login_events
| WHERE failed.severity == 'high' OR success.user_id IS NOT NULL
| SELECT failed.alert_type, success.login_time, failed.source_ip

Each event gets lineage tags that indicate which source it came from. Events from security_alerts will have the failed field populated with their data and success as NULL. Events from login_events will have the success field populated with their data and failed as NULL. This lets you access any field from the original data while knowing exactly which source contributed each event.
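The following Python sketch models the NULL handling, using the same dict-and-None representation of events. It assumes, as the SELECT above implies, that reading a field through a NULL struct yields NULL rather than an error. The helpers get_field, has_login, and project are hypothetical, and the output keys are chosen for illustration only. The sketch models just the success.user_id IS NOT NULL half of the WHERE clause.

```python
from typing import Any

Row = dict[str, Any]


def get_field(row: Row, struct: str, name: str) -> Any:
    """Read struct.name; a NULL (None) struct yields NULL instead of an error."""
    value = row.get(struct)
    return None if value is None else value.get(name)


def has_login(row: Row) -> bool:
    # Models: success.user_id IS NOT NULL
    return get_field(row, "success", "user_id") is not None


def project(row: Row) -> Row:
    # Models: SELECT failed.alert_type, success.login_time, failed.source_ip
    # (output keys are illustrative, not Hamelin's column naming rules)
    return {
        "alert_type": get_field(row, "failed", "alert_type"),
        "login_time": get_field(row, "success", "login_time"),
        "source_ip": get_field(row, "failed", "source_ip"),
    }


alert = {"failed": {"alert_type": "brute_force", "source_ip": "10.0.0.5"}, "success": None}
login = {"failed": None, "success": {"user_id": "alice", "login_time": "09:00"}}
print(project(alert))  # login_time is None: this event came from security_alerts
print(has_login(alert), has_login(login))  # False True
```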

Pattern correlation with lineage

Lineage tracking enables sophisticated event correlation patterns. As an example, consider detecting brute force attacks by correlating failed attempts with eventual successes:

DEF failed_logins = FROM events
| WHERE event.action == 'login_failed';

DEF successful_logins = FROM events
| WHERE event.action == 'login_success';

FROM failed = failed_logins, success = successful_logins
| WINDOW failures = count(failed),
         successes = count(success),
         total = count()
  BY user.id
  WITHIN -5m
| WHERE successes >= 1 && failures / total > 0.2
| SELECT user.id,
         failed_count = failures,
         success_count = successes,
         failure_rate = failures / total
This query correlates two distinct event patterns within sliding windows. Lineage tracking lets you distinguish events by source: events from failed_logins have the failed struct populated, while events from successful_logins have the success struct populated. You can then access source-specific fields and aggregate based on event lineage.
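The windowed correlation can be approximated in Python to make the arithmetic concrete. This sketch makes several assumptions:

- timestamps are integers in seconds;
- the trailing window includes both endpoints;
- failures / total uses true (floating-point) division;
- the window emits one output row per event.

Hamelin's exact window boundaries and numeric semantics may differ. Event and window_stats are hypothetical names.

```python
from dataclasses import dataclass

WINDOW_SECONDS = 5 * 60  # WITHIN -5m, assuming timestamps in seconds


@dataclass
class Event:
    ts: int
    user_id: str
    action: str  # 'login_failed' or 'login_success'


def window_stats(events: list[Event]) -> list[dict]:
    # DEF failed_logins / DEF successful_logins plus the FROM assignment:
    # each kept event is tagged with exactly one populated lineage field.
    rows = []
    for e in events:
        if e.action == "login_failed":
            rows.append({"ts": e.ts, "user_id": e.user_id, "failed": e, "success": None})
        elif e.action == "login_success":
            rows.append({"ts": e.ts, "user_id": e.user_id, "failed": None, "success": e})

    results = []
    for row in rows:
        # BY user.id WITHIN -5m: same-user rows in the trailing window [ts - 5m, ts].
        window = [
            r for r in rows
            if r["user_id"] == row["user_id"] and row["ts"] - WINDOW_SECONDS <= r["ts"] <= row["ts"]
        ]
        failures = sum(1 for r in window if r["failed"] is not None)    # count(failed)
        successes = sum(1 for r in window if r["success"] is not None)  # count(success)
        total = len(window)                                             # count(), never 0
        # WHERE successes >= 1 && failures / total > 0.2 (true division assumed)
        if successes >= 1 and failures / total > 0.2:
            results.append({
                "user_id": row["user_id"],
                "failed_count": failures,
                "success_count": successes,
                "failure_rate": failures / total,
            })
    return results


events = [Event(0, "alice", "login_failed"), Event(10, "alice", "login_failed"),
          Event(20, "alice", "login_failed"), Event(30, "alice", "login_success")]
print(window_stats(events))  # one row for alice: 3 failures, 1 success, rate 0.75
```

Consider the row for alice's success at t = 30 seconds. Its window holds three failures and one success, so failures / total = 3 / 4 = 0.75, and the row passes the filter. The earlier rows have no success in their windows, so they are dropped.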

MATCH clause lineage

The MATCH command also supports lineage tracking when you assign pattern results to struct fields. As an example, consider detecting brute force patterns that span multiple login attempts:

DEF failed_logins = FROM events
| WHERE event.action == 'login_failed';

DEF successful_logins = FROM events
| WHERE event.action == 'login_success';

MATCH failed_logins = failed_logins{10,}, successful_logins = successful_logins+ WITHIN 10m
| AGG failed_count = count(failed_logins),
      success_count = count(successful_logins),
      min_failed_ip = min(failed_logins.source_ip),
      success_duration = max(successful_logins.timestamp) - min(successful_logins.timestamp)
  BY user.id

This pattern detects sequences where at least 10 failed login attempts are followed by one or more successful logins, with the entire pattern completing within a 10-minute window. The assignments (failed_logins = and successful_logins =) create lineage tags that identify which pattern each event matched. Events matching the failed login pattern have the failed_logins struct populated, while events matching the successful login pattern have the successful_logins struct populated.

The AGG command then operates on these lineage-tagged events to calculate metrics specific to each pattern type. The count(failed_logins) aggregation counts only events that matched the failed login pattern, while count(successful_logins) counts only events that matched the successful login pattern. Similarly, min(failed_logins.source_ip) accesses the source_ip field specifically from events in the failed login pattern, and the timestamp calculations work with the timestamp field from events in the successful login pattern.
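The MATCH semantics can be approximated in Python. The sketch encodes each user's login sequence as a string of F (failed) and S (success) symbols and runs the regular expression F{10,}S+ over it, mirroring the quantifiers failed_logins{10,} and successful_logins+. It makes several assumptions:

- events are partitioned by user before matching;
- the pattern must be a contiguous run;
- finditer returns only greedy, non-overlapping matches, so a match longer than ten minutes is discarded rather than searched for a shorter one.

Login and match_brute_force are hypothetical names, and the result keys are illustrative.

```python
import re
from dataclasses import dataclass

MATCH_WINDOW = 10 * 60             # WITHIN 10m, assuming timestamps in seconds
PATTERN = re.compile(r"F{10,}S+")  # failed_logins{10,} then successful_logins+


@dataclass
class Login:
    ts: int
    user_id: str
    source_ip: str
    ok: bool  # True for a successful login


def match_brute_force(events: list[Login]) -> list[dict]:
    # Assumption: partition by user and order by time before matching.
    by_user: dict[str, list[Login]] = {}
    for e in sorted(events, key=lambda ev: ev.ts):
        by_user.setdefault(e.user_id, []).append(e)

    results = []
    for user, seq in by_user.items():
        symbols = "".join("S" if e.ok else "F" for e in seq)
        for m in PATTERN.finditer(symbols):
            matched = seq[m.start():m.end()]
            if matched[-1].ts - matched[0].ts > MATCH_WINDOW:
                continue  # whole pattern must complete within the window
            # Lineage split: which sub-pattern each matched event belongs to.
            failed = [e for e in matched if not e.ok]
            success = [e for e in matched if e.ok]
            results.append({
                "user_id": user,
                "failed_count": len(failed),     # count(failed_logins)
                "success_count": len(success),   # count(successful_logins)
                # min() over IP strings is lexicographic, not chronological
                "min_failed_ip": min(e.source_ip for e in failed),
                "success_duration": max(e.ts for e in success) - min(e.ts for e in success),
            })
    return results


attempts = [Login(i * 10, "alice", "10.0.0.5", False) for i in range(10)]
attempts += [Login(100, "alice", "10.0.0.5", True), Login(160, "alice", "10.0.0.5", True)]
print(match_brute_force(attempts))  # one match: 10 failures, 2 successes, duration 60
```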

Benefits of lineage tracking

Lineage sub-struct provides several key advantages for complex data analysis. You can correlate events from multiple sources while maintaining clear attribution of where each piece of data originated. This eliminates ambiguity in queries where data comes from multiple upstream sources that share the same field names.

The feature also enables pattern detection across different event types. You can write queries that aggregate and filter across multiple event patterns while accessing specific fields from each pattern type. This supports use cases like security monitoring, user behavior analysis, and system performance correlation.