The topk aggregation in Axiom Processing Language (APL) allows you to identify the top k results based on a specified field. This is especially useful when you want to quickly analyze large datasets and extract the most significant values, such as the top-performing queries, most frequent errors, or highest latency requests.

Use topk to find the most common or relevant entries in datasets, especially in log analysis, telemetry data, and monitoring systems. This aggregation helps you focus on the most important data points, filtering out the noise.

The topk aggregation in APL returns estimated results. The estimation trades accuracy for speed: topk is fast and light on resources even on large or high-cardinality datasets, but its results can be approximate.

For completely accurate results, use the top operator.
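As a rough sketch of the exact alternative, the following query counts events per status code and then keeps the five largest counts with the top operator. It assumes the sample-http-logs dataset with a status field, and that count() names its output column count_ by default; adjust both to match your data.

```kusto
// Sketch of an exact counterpart to topk(status, 5).
// Assumes count() produces a column named count_.
['sample-http-logs']
| summarize count() by status
| top 5 by count_
```

Because every event is counted before ranking, this form can be slower than topk on large or high-cardinality datasets.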

For users of other query languages

If you come from other query languages, this section explains how to adjust your existing queries to achieve the same results in APL.

Usage

Syntax

topk(field, k)

Parameters

  • field: The field or expression to rank the results by.
  • k: The number of top results to return.

Returns

A table with the top k values of the specified field and an estimated count for each value.
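To illustrate how the syntax fits into a longer pipeline, here is a minimal sketch that filters events before ranking. The method and uri fields are assumptions about the sample-http-logs dataset; substitute fields from your own dataset.

```kusto
// Illustrative only: the method and uri fields are assumed to exist.
['sample-http-logs']
| where method == "GET"
| summarize topk(uri, 10)
```

Filtering first limits the estimation to the events you care about, in this case the ten most frequently requested URIs among GET requests.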

Use case examples

When analyzing HTTP logs, you can use the topk aggregation to find the five most frequent HTTP status codes.

Query

['sample-http-logs'] 
| summarize topk(status, 5)

Output

status  count_
200     1500
404     400
500     200
301     150
302     100

This query groups the logs by HTTP status and returns the 5 most frequent statuses with their estimated counts.

Related operators and functions

  • top: Returns the top N rows sorted by the specified fields. Unlike topk, it returns exact results, so use it when accuracy matters more than speed.
  • sort: Orders the dataset based on one or more fields, which is useful if you need a complete ordered list rather than the top k values.
  • extend: Adds calculated fields to your dataset, which can be useful in combination with topk to create custom rankings.
  • count: Aggregates the dataset by counting occurrences, often used in conjunction with topk to find the most common values.