The make_set_if aggregation function in APL creates a set of distinct values from a column, including only values from rows that satisfy a condition. This lets you filter and deduplicate in a single aggregation, which is especially useful when you need the relevant, unique entries from a large dataset.

Use make_set_if when you need to summarize the distinct occurrences that meet a particular condition, for example when analyzing logs, traces, or security events.

For users of other query languages

If you come from other query languages, this section explains how to adjust your existing queries to achieve the same results in APL.

Usage

Syntax

make_set_if(column, predicate, [max_size])

Parameters

  • column: The column from which distinct values will be aggregated.
  • predicate: A condition that filters the values to be aggregated.
  • [max_size]: (Optional) Specifies the maximum number of elements in the resulting set. If omitted, the default is 1048576.

Returns

The make_set_if function returns a dynamic array of distinct values from the specified column that satisfy the given condition.
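
The behavior described above can be modeled outside APL. The following Python sketch only illustrates the documented semantics and is not how APL implements the aggregation. The helper name mirrors the APL function, but the row format (a list of dicts), the sample rows, the first-seen ordering, and the choice to keep the first max_size distinct values are all assumptions made for the sketch.

```python
from typing import Callable, Iterable, Optional


def make_set_if(
    rows: Iterable[dict],
    column: str,
    predicate: Callable[[dict], bool],
    max_size: Optional[int] = None,
) -> list:
    """Collect distinct values of `column` from rows that satisfy `predicate`."""
    seen: dict = {}  # dict keys keep insertion order, so the output is deterministic
    for row in rows:
        if max_size is not None and len(seen) >= max_size:
            break  # assumption of this sketch: stop once the cap is reached
        if predicate(row):
            seen.setdefault(row[column], None)  # duplicates are ignored
    return list(seen)


# Hypothetical rows, invented for illustration only.
rows = [
    {"geo.city": "Berlin", "req_duration_ms": 620},
    {"geo.city": "Tokyo", "req_duration_ms": 120},
    {"geo.city": "Berlin", "req_duration_ms": 910},
    {"geo.city": "Paris", "req_duration_ms": 700},
]

slow_cities = make_set_if(rows, "geo.city", lambda r: r["req_duration_ms"] > 500)
capped = make_set_if(rows, "geo.city", lambda r: r["req_duration_ms"] > 500, max_size=1)
print(slow_cities)  # ['Berlin', 'Paris']
print(capped)       # ['Berlin']
```

Berlin appears once even though two slow requests came from it, and Tokyo is excluded because its only request does not satisfy the condition.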

Use case examples

In this use case, you’re analyzing HTTP logs and want to get the distinct cities from which requests originated, but only for requests that took longer than 500 ms.

Query

['sample-http-logs']
| summarize make_set_if(['geo.city'], req_duration_ms > 500) by ['method']

Output

method  make_set_if_geo.city
GET     ['New York', 'San Francisco']
POST    ['Berlin', 'Tokyo']

This query returns the distinct cities from which requests took more than 500 ms, grouped by HTTP request method.
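
To make the grouping step concrete, here is a rough Python equivalent of the query's logic. It is a sketch under stated assumptions, not a description of how APL evaluates the query: the log rows are hypothetical, and sorting the result is only there to make the printed output deterministic.

```python
from collections import defaultdict

# Hypothetical log rows, invented for illustration; not real query output.
logs = [
    {"method": "GET", "geo.city": "New York", "req_duration_ms": 750},
    {"method": "GET", "geo.city": "New York", "req_duration_ms": 1200},
    {"method": "GET", "geo.city": "Chicago", "req_duration_ms": 90},
    {"method": "POST", "geo.city": "Berlin", "req_duration_ms": 530},
    {"method": "POST", "geo.city": "Tokyo", "req_duration_ms": 480},
]

cities_by_method = defaultdict(set)
for row in logs:
    group = cities_by_method[row["method"]]  # group key, as in `by ['method']`
    if row["req_duration_ms"] > 500:         # the predicate
        group.add(row["geo.city"])           # set membership drops duplicates

# Sort each set so the printed result is stable.
result = {method: sorted(cities) for method, cities in cities_by_method.items()}
print(result)  # {'GET': ['New York'], 'POST': ['Berlin']}
```

Each method keeps its own set, so a city that is slow for GET requests does not appear under POST unless a slow POST request also came from it.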

List of related aggregations

  • make_list_if: Similar to make_set_if, but returns a list that can include duplicates instead of a distinct set.
  • make_set: Aggregates distinct values without a conditional filter.
  • countif: Counts rows that satisfy a specific condition. Use it when you need a count rather than the distinct values themselves.
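
The differences between these functions are easiest to see side by side. The Python sketch below mimics each one on the same hypothetical data. The values, the is_slow condition, and the variable names are invented for illustration, and the sorting exists only to make the sets print in a stable order.

```python
# Hypothetical (city, duration in ms) pairs, invented for illustration.
rows = [("Berlin", 620), ("Tokyo", 120), ("Berlin", 910), ("Paris", 700)]


def is_slow(duration_ms: int) -> bool:
    """Hypothetical condition used by the conditional aggregations."""
    return duration_ms > 500


list_if = [city for city, ms in rows if is_slow(ms)]         # like make_list_if: duplicates kept
set_if = sorted({city for city, ms in rows if is_slow(ms)})  # like make_set_if: distinct, filtered
set_all = sorted({city for city, _ in rows})                 # like make_set: distinct, no filter
count_if = sum(1 for _, ms in rows if is_slow(ms))           # like countif: number of matching rows

print(list_if)   # ['Berlin', 'Berlin', 'Paris']
print(set_if)    # ['Berlin', 'Paris']
print(set_all)   # ['Berlin', 'Paris', 'Tokyo']
print(count_if)  # 3
```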