This page explains how to use the histogram aggregation function in APL.
The histogram aggregation in APL groups numeric values into intervals, or “bins,” and counts how many values fall into each one. This is useful for visualizing the distribution of data, such as response times, request durations, or other continuous numeric fields. You can use it to analyze patterns and trends in datasets like logs, traces, or metrics. It is especially helpful when you need to summarize a large volume of data into a digestible form that shows how values are distributed.
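The following Python sketch illustrates what grouping values into bins means. It is not APL and makes no claim about how APL computes histograms internally. It assumes fixed-width bins labeled by their lower bound, and the function name fixed_width_histogram and the sample durations are hypothetical.

```python
from collections import Counter


def fixed_width_histogram(values, bin_width):
    """Group numeric values into fixed-width bins and count each bin.

    Each bin is labeled by its lower bound, so with bin_width=100 a value
    of 142 lands in the bin that starts at 100.
    """
    if bin_width <= 0:
        raise ValueError("bin_width must be positive")
    # Floor each value to the lower bound of the bin it falls into.
    counts = Counter((v // bin_width) * bin_width for v in values)
    # Return (bin_start, count) rows ordered by bin.
    return sorted(counts.items())


# Hypothetical request durations in milliseconds.
durations_ms = [12, 48, 95, 101, 142, 150, 199, 230, 260]
print(fixed_width_histogram(durations_ms, 100))
```

Each returned pair plays the same role as one row of a histogram result: a bin and the number of values in it.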
The histogram aggregation is ideal for identifying peaks, valleys, and outliers in your data. For example, you can analyze the distribution of request durations in web server logs, or of span durations in OpenTelemetry traces, to understand performance bottlenecks.
The histogram aggregation in APL is a statistical aggregation that returns estimated results. The estimation trades accuracy for speed: histogram is fast and light on resources even on large or high-cardinality datasets, but it doesn't provide precise results.

Splunk SPL users
In Splunk SPL, the closest equivalent to APL's histogram is the timechart or histogram command, which groups events into time buckets. In APL, by contrast, the histogram function bins numeric values and lets you control the number of bins precisely.

ANSI SQL users
In ANSI SQL, you can combine a GROUP BY clause with range calculations to achieve a result similar to APL's histogram. APL's histogram function simplifies the process by calculating bin intervals automatically.

The histogram function takes the following parameters:

- numeric_field: The numeric field to create a histogram for, such as request duration or span duration.
- number_of_bins: The number of bins (intervals) to use for grouping the numeric values.

The histogram aggregation returns a table in which each row represents a bin, together with the number of occurrences (count) that fall within that bin.
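To show how the two parameters shape the result, here is a hypothetical Python function that takes a list of values (standing in for numeric_field) and number_of_bins, and returns (bin_start, count) rows. It assumes equal-width bins spanning the observed minimum to maximum. APL may choose bin boundaries differently, and histogram_by_bin_count is an illustrative name, not an APL function. Computing the width and bin index by hand like this is roughly the range calculation a SQL GROUP BY approach would need.

```python
def histogram_by_bin_count(values, number_of_bins):
    """Split [min, max] of values into equal-width bins and count each one.

    Assumption: equal-width bins over the observed range. Returns a list of
    (bin_start, count) rows, one per bin, including empty bins.
    """
    if number_of_bins < 1:
        raise ValueError("number_of_bins must be at least 1")
    if not values:
        return []
    lo, hi = min(values), max(values)
    width = (hi - lo) / number_of_bins
    counts = [0] * number_of_bins
    for v in values:
        if width == 0:
            # All values are identical, so everything goes into the first bin.
            idx = 0
        else:
            # The maximum value would map to index number_of_bins, so clamp
            # it into the last bin (the last bin is closed on the right).
            idx = min(int((v - lo) / width), number_of_bins - 1)
        counts[idx] += 1
    return [(lo + i * width, c) for i, c in enumerate(counts)]


print(histogram_by_bin_count([0, 10, 20, 30, 40, 50, 60, 70, 80, 90, 100], 4))
```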
Use the histogram aggregation to analyze the distribution of request durations in web server logs. Example output:

| req_duration_ms_bin | count |
|---|---|
| 0 | 50 |
| 100 | 200 |
| 200 | 120 |
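As a sketch of reading this output for peaks and relative frequency, the Python snippet below copies the counts from the example table, treats each req_duration_ms_bin value as the lower bound of its bin (an assumption), and prints each bin's share of the total with a simple text bar scaled to the peak bin.

```python
# Bin values and counts copied from the example output table.
# Assumption: each bin value is that bin's lower bound.
rows = [(0, 50), (100, 200), (200, 120)]

total = sum(count for _, count in rows)
# The peak is the bin with the most observations.
peak_bin, peak_count = max(rows, key=lambda row: row[1])

for bin_start, count in rows:
    share = count / total
    # The peak bin gets a 40-character bar; other bars scale proportionally.
    bar = "#" * round(40 * count / peak_count)
    print(f"{bin_start:>5} {count:>5} {share:6.1%} {bar}")

print(f"peak bin: {peak_bin} ({peak_count} of {total} requests)")
```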
Related aggregations:

- percentile: Use percentile when you need the specific value below which a given percentage of observations fall. It can provide more precise distribution analysis.
- avg: Use avg to calculate the average value of a numeric field. It's useful when you care more about central tendency than about the distribution.
- sum: The sum function adds up the values in a numeric field, which helps you determine overall totals.
- count: Use count when you need a simple tally of rows or events, often alongside histogram for more basic summarization.
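To contrast percentile with histogram, the sketch below computes an exact percentile in Python using the nearest-rank method. A histogram only records how many values fall in each bin, so it can locate a percentile only to within a bin, while the raw values pin it down exactly. The function name and sample data are hypothetical, and APL's percentile aggregation may compute or estimate its result differently.

```python
import math


def nearest_rank_percentile(values, p):
    """Return the smallest value such that at least p percent of values are <= it.

    Uses the nearest-rank method on the exact data.
    """
    if not values:
        raise ValueError("values must not be empty")
    if not 0 < p <= 100:
        raise ValueError("p must be in (0, 100]")
    ordered = sorted(values)
    # Rank is 1-based: the ceil(p/100 * N)-th smallest value.
    rank = math.ceil(p / 100 * len(ordered))
    return ordered[rank - 1]


# Hypothetical request durations in milliseconds.
durations_ms = [12, 48, 95, 101, 142, 150, 199, 230, 260]
print(nearest_rank_percentile(durations_ms, 50))
```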