WINDOW: aggregating over sliding windows

The WINDOW command creates aggregations that slide across windows of data. This lets you calculate running totals, moving averages, and time-based metrics without losing the detail of individual events. Each row gets its own calculation based on a sliding window of related rows around it.

Window calculations

The WINDOW command supports two main types of calculations that operate on the sliding window of data. Aggregation functions like count(), sum(), avg(), min(), and max() calculate summary statistics across all rows in the current window frame.

FROM sales
| WINDOW
    total_sales = sum(amount),
    avg_sale = avg(amount),
    sale_count = count()
  BY region
  WITHIN -7d

This creates a 7-day rolling summary for each region, showing total sales, average sale amount, and number of sales within the sliding window.

Window-specific functions like row_number(), rank(), dense_rank(), and lag() analyze the position and relationships between rows within the window without aggregating the data.

FROM events
| WINDOW
    event_number = row_number(),
    event_rank = rank(),
    previous_value = lag(score, 1)
  BY user_id
  SORT timestamp

This assigns sequence numbers, ranks events by timestamp order, and shows the previous score value for each user's events.

You can combine multiple calculations in a single WINDOW command, and each calculation receives the same set of rows determined by the window frame, but produces different analytical results based on its specific function behavior:

FROM metrics
| WINDOW
    recent_count = count(),
    running_total = sum(value),
    current_rank = row_number(),
    percentile_rank = percent_rank()
  BY service
  SORT timestamp
  WITHIN -1hr

This example mixes aggregation functions (count(), sum()) with window-specific functions (row_number(), percent_rank()) to create comprehensive analytics for each service within a 1-hour sliding window.

When explicit names aren't provided for window calculations, Hamelin automatically generates field names from expressions. Learn more about this in Automatic Field Names.

WINDOW command parts

The WINDOW command has three optional clauses that control how the sliding window behaves. Each clause serves a specific purpose in defining which data gets included in each calculation.

WINDOW calculations
  BY grouping_fields     // optional: partitions data
  SORT ordering_fields   // optional: defines row order
  WITHIN frame_range     // optional: defines window size

BY clause: partitioning data

The BY clause divides data into separate groups, with each group getting its own independent sliding window. This lets you create per-user, per-host, or per-category calculations without mixing data across different entities.

With BY fields specified: Hamelin creates separate windows for each unique combination of those fields. This partitioning ensures that calculations for different users, devices, or categories remain completely independent. Each partition maintains its own window state, preventing data from different entities from interfering with each other. Here's how to create separate counting windows for each user:

FROM events
| WINDOW count()
  BY user_id

Without BY fields: Hamelin treats all data as one big group. This creates a single window that processes all events together, regardless of their source or category. The calculation accumulates across every row in the dataset, which proves useful for global metrics or overall trend analysis. This example creates one counting window that includes all events:

FROM events
| WINDOW count()

SORT clause: ordering rows

The SORT clause controls the order of rows within each window partition. This ordering determines which rows come "before" and "after" each row, affecting functions like row_number() and defining the direction of the sliding window.

With SORT specified: Hamelin uses the explicit ordering. The sort order determines which rows come before and after each current row in the window calculation. When you want to analyze transactions by value rather than time, you can sort by amount to create value-based rankings and running totals:

FROM transactions
| WINDOW
    running_total = sum(amount),
    transaction_rank = rank()
  BY account_id
  SORT amount DESC

Without SORT specified: Hamelin automatically orders by event timestamp. This chronological ordering makes sense for most time-series analysis where you want to track how metrics evolve over time. The automatic timestamp ordering eliminates the need to explicitly specify time-based sorting in typical analytical scenarios. This example creates a chronological sequence count for each user:

FROM events
| WINDOW event_sequence = count()
  BY user_id

WITHIN clause: defining the window frame

The WITHIN clause controls how much data gets included in the window around each row.

With WITHIN specified: Hamelin uses the explicit frame size. This sliding frame moves with each row, always maintaining the specified time period or row count. When you need to count events within a specific time window, you can specify the exact duration:

FROM events
| WINDOW count()
  BY user_id
  WITHIN -1hr

For each event, this counts all events for that user in the hour leading up to that event's timestamp.

Without WITHIN specified: Hamelin uses ..0r (from the beginning of the partition up to the current row). This default behavior creates cumulative calculations that include all rows from the start of each partition up to the current row. The cumulative approach works well for running totals, progressive counts, and other metrics that should include all historical data. This example creates a running count for each user from their first event:

FROM events
| WINDOW cumulative_count = count()
  BY user_id

Window frames

The WITHIN clause accepts different types of frame specifications that control how much data gets included around each row. Frame specifications determine whether the window slides based on time intervals, specific row counts, or bounded ranges between two points. Understanding these frame types lets you create exactly the sliding behavior you need for different analytical scenarios.

Value-based frames

Intervals like -5min or -1hr create sliding windows based on the values in the sorted field. Because the most common sort order is by timestamp, these frames typically create time-based windows that slide through data chronologically. The window maintains a consistent value range (usually time duration) regardless of how many events occur within that period. Value-based frames work particularly well for temporal metrics like monitoring system performance or analyzing user activity patterns over fixed time periods.

FROM metrics
| WINDOW avg_cpu = avg(cpu_usage)
  BY hostname
  WITHIN -5min

For each metric record, this calculates the average CPU usage for that host over the 5 minutes leading up to that metric's timestamp.

You can also use value-based frames with non-timestamp fields when the data is sorted by those values. This example calculates running statistics for orders based on order amounts, looking at orders within a $5 range of each current order:

FROM orders
| WINDOW
    nearby_orders = count(),
    avg_nearby_amount = avg(amount)
  BY customer_id
  SORT amount
  WITHIN -5

Row-based frames

Row counts create windows based on a specific number of surrounding rows rather than time periods. This approach proves valuable when you need consistent sample sizes for statistical calculations or when events occur at irregular intervals. Row-based frames ensure that each calculation includes exactly the specified number of data points, making comparisons more reliable across different time periods. This example counts events and assigns sequence numbers using a 4-row window (current row plus 3 preceding rows):

FROM events
| WINDOW
    recent_events = count(),
    event_sequence = row_number()
  BY user_id
  SORT timestamp
  WITHIN -3r

Range frames

Range frames like -2hr..-1hr create windows between two specific offsets from the current row. This capability lets you analyze data from specific time periods without including the current time period in the calculation. Range frames prove particularly useful for lag analysis, where you want to compare current metrics against historical periods, or when you need to exclude recent data that might be incomplete. This example counts events from the hour that ended one hour before each current event:

FROM events
| WINDOW previous_hour_count = count()
  BY user_id
  WITHIN -2hr..-1hr

Range frames can be unbounded by omitting one end of the range. An unbounded range like -2hr.. creates a window that extends infinitely in one direction from a starting point. This technique proves useful when you want all data from a specific threshold forward, such as calculating cumulative metrics that start counting only after a certain time delay. Unbounded ranges help create progressive totals that begin from meaningful starting points rather than the very beginning of the dataset. This example counts all events for each user starting from 2 hours before each current event:

FROM events
| WINDOW cumulative_count = count()
  BY user_id
  WITHIN -2hr..

Warning: Interval frames like -1hr create sliding windows, while unbounded ranges like -1hr.. include all future rows in the dataset. The unbounded version creates a massive window instead of the sliding window you typically want for temporal analysis.

Window calculations​

WINDOW command parts​

BY clause: partitioning data​

SORT clause: ordering rows​

WITHIN clause: defining the window frame​

Window frames​

Value-based frames​

Row-based frames​

Range frames​