# percentile

> This page explains how to use the percentile aggregation function in APL.

The `percentile` aggregation function in Axiom Processing Language (APL) allows you to calculate the value below which a given percentage of data points fall. It’s particularly useful when you need to analyze distributions and want to summarize the data using specific thresholds, such as the 90th or 95th percentile. This function can be valuable in performance analysis, trend detection, or identifying outliers across large datasets.

You can apply the `percentile` function to various use cases, such as analyzing log data for request durations, OpenTelemetry traces for service latencies, or security logs to assess risk patterns.

<Note>
  The `percentile` aggregation in APL is a statistical aggregation that returns estimated results. The estimation comes with the benefit of speed at the expense of accuracy. This means that `percentile` is fast and light on resources even on a large or high-cardinality dataset, but it doesn’t provide precise results.
</Note>

## For users of other query languages

If you come from other query languages, this section explains how to adjust your existing queries to achieve the same results in APL.

<AccordionGroup>
  <Accordion title="Splunk SPL users">
    In Splunk SPL, the `percentile` function is referred to as `perc` or `percentile`. APL's `percentile` function works similarly, but the syntax is different. The main difference is that APL requires you to explicitly define the column on which you want to apply the percentile and the target percentile value.

    <CodeGroup>
      ```sql Splunk example
      | stats perc95(req_duration_ms)
      ```

      ```kusto APL equivalent
      ['sample-http-logs']
      | summarize percentile(req_duration_ms, 95)
      ```
    </CodeGroup>
  </Accordion>

  <Accordion title="ANSI SQL users">
    In ANSI SQL, you might use the `PERCENTILE_CONT` or `PERCENTILE_DISC` functions to compute percentiles. In APL, the `percentile` function provides a simpler syntax while offering similar functionality.

    <CodeGroup>
      ```sql SQL example
      SELECT PERCENTILE_CONT(0.95) WITHIN GROUP (ORDER BY req_duration_ms) FROM sample_http_logs;
      ```

      ```kusto APL equivalent
      ['sample-http-logs']
      | summarize percentile(req_duration_ms, 95)
      ```
    </CodeGroup>
  </Accordion>
</AccordionGroup>

## Usage

### Syntax

```kusto
percentile(column, percentile)
```

### Parameters

* **column**: The name of the column to calculate the percentile on. This must be a numeric field.
* **percentile**: The target percentile value (between 0 and 100).

### Returns

The function returns the value from the specified column that corresponds to the given percentile.

## Use case examples

<Tabs>
  <Tab title="Log analysis">
    In log analysis, you can use the `percentile` function to identify the 95th percentile of request durations, which gives you an idea of the tail-end latencies of requests in your system.

    **Query**

    ```kusto
    ['sample-http-logs']
    | summarize percentile(req_duration_ms, 95)
    ```

    [Run in Playground](https://play.axiom.co/axiom-play-qf1k/query?initForm=%7B%22apl%22%3A%22%5B%27sample-http-logs%27%5D%20%7C%20summarize%20percentile%28req_duration_ms%2C%2095%29%22%7D)

    **Output**

    | percentile\_req\_duration\_ms |
    | ----------------------------- |
    | 1200                          |

    This query calculates the 95th percentile of request durations, showing that 95% of requests take less than or equal to 1200ms.
  </Tab>

  <Tab title="OpenTelemetry traces">
    For OpenTelemetry traces, you can use the `percentile` function to identify the 90th percentile of span durations for specific services, which helps to understand the performance of different services.

    **Query**

    ```kusto
    ['otel-demo-traces']
    | where ['service.name'] == 'checkoutservice'
    | summarize percentile(duration, 90)
    ```

    [Run in Playground](https://play.axiom.co/axiom-play-qf1k/query?initForm=%7B%22apl%22%3A%22%5B%27otel-demo-traces%27%5D%20%7C%20where%20%5B%27service.name%27%5D%20%3D%3D%20%27checkoutservice%27%20%7C%20summarize%20percentile%28duration%2C%2090%29%22%7D)

    **Output**

    | percentile\_duration |
    | -------------------- |
    | 300ms                |

    This query calculates the 90th percentile of span durations for the `checkoutservice`, helping to assess high-latency spans.
  </Tab>

  <Tab title="Security logs">
    In security logs, you can use the `percentile` function to calculate the 99th percentile of response times for a specific set of status codes, helping you focus on outliers.

    **Query**

    ```kusto
    ['sample-http-logs']
    | where status == '500'
    | summarize percentile(req_duration_ms, 99)
    ```

    [Run in Playground](https://play.axiom.co/axiom-play-qf1k/query?initForm=%7B%22apl%22%3A%22%5B%27sample-http-logs%27%5D%20%7C%20where%20status%20%3D%3D%20%27500%27%20%7C%20summarize%20percentile%28req_duration_ms%2C%2099%29%22%7D)

    **Output**

    | percentile\_req\_duration\_ms |
    | ----------------------------- |
    | 2500                          |

    This query identifies that 99% of requests resulting in HTTP 500 errors take less than or equal to 2500ms.
  </Tab>
</Tabs>

## List of related aggregations

* [**avg**](/apl/aggregation-function/avg): Use `avg` to calculate the average of a column, which gives you the central tendency of your data. In contrast, `percentile` provides more insight into the distribution and tail values.
* [**min**](/apl/aggregation-function/min): The `min` function returns the smallest value in a column. Use this when you need the absolute lowest value instead of a specific percentile.
* [**max**](/apl/aggregation-function/max): The `max` function returns the highest value in a column. It’s useful for finding the upper bound, while `percentile` allows you to focus on a specific point in the data distribution.
* [**stdev**](/apl/aggregation-function/stdev): `stdev` calculates the standard deviation of a column, which helps measure data variability. While `stdev` provides insight into overall data spread, `percentile` focuses on specific distribution points.