Skip to main content

Type expansion

Hamelin expands types when you query multiple datasets. If you write FROM events, logs, Hamelin creates an expanded type that includes all fields from both sources. This lets you write queries that work across datasets without worrying about schema differences.

How type expansion works

Hamelin constructs expanded types by combining field names from all source types. Fields with the same name get aligned into a single field in the expanded type. This lets you write queries that work across datasets with consistent field naming. As an example, consider searching across different authentication logs with varying schemas:

FROM security_logs, audit_logs, access_logs
| WHERE action = 'login' OR event_type = 'authentication'
| SELECT timestamp, user_id, source_ip, action, event_type, session_id

Each logging system has its own schema:

Security logs track authentication attempts with IP addresses:

{timestamp: timestamp, user_id: string, source_ip: string, action: string}

Audit logs capture detailed session information:

{timestamp: timestamp, user_id: string, event_type: string, session_id: string}

Access logs record basic user activity:

{timestamp: timestamp, user_id: string, source_ip: string}

The expanded result type becomes:

{
timestamp: timestamp,
user_id: string,
source_ip: string,
action: string,
event_type: string,
session_id: string
}

Rows from security_logs will have NULL for the event_type and session_id fields. Rows from audit_logs will have NULL for the source_ip and action fields. Rows from access_logs will have NULL for the action, event_type, and session_id fields.

The result contains rows like:

timestampuser_idsource_ipactionevent_typesession_id
2024-01-15 10:30:00alice.smith192.168.1.100loginNULLNULL
2024-01-15 10:31:00alice.smithNULLNULLauthenticationsess_abc123
2024-01-15 10:32:00alice.smith192.168.1.100NULLNULLNULL

All three datasets contribute to the same timestamp and user_id fields because they use identical field names. You can filter and select on shared fields without knowing which source contributed each row. This lets you write a single query to search for authentication events across all systems, even though each system logs different fields. The expanded type accommodates all possible fields, and you can filter on any field that exists in any source.

Nested type expansion

Type expansion works with nested structures. Hamelin expands the type hierarchy to accommodate nested fields from different sources. As an example, consider combining user data from different systems:

FROM user_profiles, account_settings
| SELECT user.name, user.email, user.preferences

Each system has its own nested user structure:

User profiles contain basic identity information:

{user: {name: string, email: string}}

Account settings store user preferences:

{user: {preferences: string}}

The expanded result type becomes:

{user: {name: string, email: string, preferences: string}}

This means you can access user.name from profile data and user.preferences from settings data in the same query, even though the original sources have different nested structures.

note

Hamelin maintains stable field ordering when merging nested structures. Fields from the first source appear first, then fields from the second source are added in their original order. This consistent ordering means you can rely on the structure of expanded types being predictable across queries.

Array literal expansion

Type expansion also happens when you create array literals containing struct types with different schemas. Just like FROM clauses, Hamelin creates an expanded type that accommodates all fields from every struct in the array. As an example, consider creating an array mixing user records with different available fields:

SET mixed_users = [
{name: 'Alice', age: 30, department: 'Engineering'},
{name: 'Bob', email: '[email protected]', age: 25},
{name: 'Carol', email: '[email protected]', department: 'Sales'}
]

Each struct has its own schema:

The first user record has name, age, and department:

{name: string, age: number, department: string}

The second user record has name, email, and age:

{name: string, email: string, age: number}

The third user record has name, email, and department:

{name: string, email: string, department: string}

The expanded array type becomes:

[{name: string, age: number, department: string, email: string}]

Each element gets NULL values for missing fields. The first element has NULL for email. The second element has NULL for department. The third element has NULL for age. This lets you create arrays from structs with different schemas while maintaining type consistency across all elements.

You can then query the expanded array just like any other dataset:

SET mixed_users = [
{name: 'Alice', age: 30, department: 'Engineering'},
{name: 'Bob', email: '[email protected]', age: 25},
{name: 'Carol', email: '[email protected]', department: 'Sales'}
]
| UNNEST mixed_users

The query works across all elements regardless of which fields were originally present in each struct. Missing fields appear as NULL in the results, just like with FROM clause expansion.

The results would be:

nameagedepartmentemail
Alice30EngineeringNULL
Bob25NULL[email protected]
CarolNULLSales[email protected]