Aggregation Functions
Functions that operate on groups of rows to produce summary values when used with AGG or WINDOW commands.
count([expression])
Returns the number of rows in each group during aggregation.
Parameters
- expression (optional) - When provided, only counts rows where this expression evaluates to a non-null value
Description
The count() function returns the number of rows in each group when you use it
with AGG or WINDOW commands. Unlike SQL, Hamelin uses count() rather than
count(*). When no grouping is specified, it counts all rows in the result.
When you provide an expression parameter, count() only counts rows where that
expression evaluates to a non-null value. This lets you count specific
conditions without using separate filtering.
When you omit an explicit field name, Hamelin automatically generates the
field name count() or count(expression) that you can reference using
backticks in subsequent commands.
count_distinct(x)
Counts the number of distinct values in each group.
Parameters
- x - Expression of any type to count distinct values for
Description
The count_distinct() function counts only unique values within each group,
ignoring duplicates. Null values are not counted. This provides the same
functionality as SQL's COUNT(DISTINCT x) but with cleaner syntax. For
example, counting distinct user IDs shows how many unique users performed
actions, regardless of how many actions each user performed.
approx_distinct(x)
Returns an approximate count of distinct values in each group.
Parameters
- x - Expression of any type to count distinct values for
Description
The approx_distinct() function provides an approximate count of distinct
values using probabilistic algorithms that are much faster and use less
memory than exact counting, especially for high-cardinality data. The result
is typically accurate within a few percent. This is particularly useful when
you need fast estimates for large datasets where exact precision isn't critical.
count_if(condition)
Counts the number of rows where a condition is true in each group.
Parameters
- condition - Boolean expression that must evaluate to true for the row to be counted
Description
The count_if() function counts only rows where the specified condition
evaluates to true. This provides a more concise alternative to using WHERE
clauses or case() expressions for conditional counting.
When you omit an explicit field name, Hamelin automatically generates the
field name count_if(condition) that you can reference using backticks in
subsequent commands.
sum(x)
Returns the sum of all values in each group.
Parameters
- x - Numeric expression to sum
Description
The sum() function adds up all non-null values of the specified expression
within each group. If all values are null, it returns null. The result type
matches the input type for exact numeric types.
avg(x)
Returns the average (arithmetic mean) of all values in each group.
Parameters
- x - Numeric expression to average
Description
The avg() function calculates the arithmetic mean of all non-null values
within each group. If all values are null, it returns null. The result is
always a floating-point type regardless of input type.
stddev(x)
Returns the standard deviation of all values in each group.
Parameters
- x - Numeric expression to calculate standard deviation for
Description
The stddev() function calculates the sample standard deviation of all
non-null values within each group. If there are fewer than two non-null
values, it returns null. The result is always a floating-point type.
approx_percentile(x, percentile)
Returns an approximate percentile value for the specified expression.
Parameters
- x - Numeric expression to calculate percentile for
- percentile - Numeric value between 0.0 and 1.0 representing the desired percentile
Description
The approx_percentile() function calculates an approximate percentile using
efficient algorithms suitable for large datasets. The percentile parameter
should be between 0.0 (minimum) and 1.0 (maximum). For example, 0.5 returns
the median, 0.95 returns the 95th percentile.
min(x)
Returns the minimum value in each group.
Parameters
- x - Expression of numeric, string, or timestamp type
Description
The min() function finds the smallest value within each group. It works with
numeric types (returning the numerically smallest), strings (lexicographic
ordering), and timestamps (chronologically earliest). If all values are null,
it returns null.
max(x)
Returns the maximum value in each group.
Parameters
- x - Expression of numeric, string, or timestamp type
Description
The max() function finds the largest value within each group. It works with
numeric types (returning the numerically largest), strings (lexicographic
ordering), and timestamps (chronologically latest). If all values are null,
it returns null.
any_value(x)
Returns an arbitrary value from each group.
Parameters
- x - Expression of any type to select a value from
Description
The any_value() function returns an arbitrary non-null value from each group.
This is useful when you're grouping data and need to include a field that's
not part of the grouping criteria, where you know all values in the group are
the same or where you don't care which specific value is selected. Unlike
min() or max(), this function doesn't impose any ordering overhead, making
it more efficient when you just need any representative value from the group.
array_agg(x)
Collects all values in each group into an array.
Parameters
- x - Expression of any type to collect into an array
Description
The array_agg() function creates an array containing all non-null values
from the specified expression within each group. The order of elements in the
resulting array follows the order specified by any SORT clause in the
aggregation command. If there are no non-null values, it returns an empty array.
map_agg(key, value)
Collects key-value pairs in each group into a map.
Parameters
- key - Expression of any type to use as map keys
- value - Expression of any type to use as map values
Description
The map_agg() function creates a map from key-value pairs within each group.
If the same key appears multiple times, only the last value is retained. The
order of processing follows any SORT clause in the aggregation command.
multimap_agg(key, value)
Collects key-value pairs in each group into a map where each key maps to an array of values.
Parameters
- key - Expression of any type to use as map keys
- value - Expression of any type to collect into arrays
Description
The multimap_agg() function creates a map where each unique key maps to an
array of all values associated with that key within each group. This preserves
all values for duplicate keys, unlike map_agg() which keeps only the last value.
any(x)
Returns true if any value in the group is true (logical OR aggregation).
Parameters
- x - Boolean expression to test
Description
The any() function performs logical OR aggregation on boolean values within
each group. It returns true if at least one value is true, false if all values
are false, and null if all values are null.
all(x)
Returns true if all values in the group are true (logical AND aggregation).
Parameters
- x - Boolean expression to test
Description
The all() function performs logical AND aggregation on boolean values within
each group. It returns true if all values are true, false if at least one
value is false, and null if all values are null.