# join

> This page explains how to use the join operator in APL.

The `join` operator in Axiom Processing Language (APL) combines rows from two datasets based on matching values in specified columns. Use `join` to correlate data from different sources or datasets, such as linking logs to traces or enriching logs with additional metadata.

This operator is useful when you want to:

* Combine information from two datasets with shared keys.
* Analyze relationships between different types of events.
* Enrich existing data with supplementary details.

<Note>
  The `join` operator is currently in public preview. For more information, see [Feature states](/platform-overview/roadmap#feature-states).

  The preview of the `join` operator works with variable limits depending on the structure of your dataset. For the left side of the join, the limit is 50,000 rows when the dataset has fewer than 100 fields. This limit decreases linearly as the field count increases. For example, the limit is 25,000 rows when your dataset has 200 fields, 12,500 rows at 400 fields, and 10,000 rows at more than 500 fields. The right side of the join has a consistent limit of 50,000 rows.
</Note>

## Kinds of join

The kinds of join and their typical use cases are the following:

* `inner` (default): Returns rows where the join conditions exist in both datasets. All matching rows from the right dataset are included for each matching row in the left dataset. Useful to retain all matches without limiting duplicates.
* `innerunique`:  Matches rows from both datasets where the join conditions exist in both. For each row in the left dataset, only the first matching row from the right dataset is returned. Optimized for performance when duplicate matching rows on the right dataset are irrelevant.
* `leftouter`:  Returns all rows from the left dataset. If a match exists in the right dataset, the matching rows are included; otherwise, columns from the right dataset are `null`. Retains all data from the left dataset, enriching it with matching data from the right dataset.
* `rightouter`: Returns all rows from the right dataset. If a match exists in the left dataset, the matching rows are included; otherwise, columns from the left dataset are `null`. Retains all data from the right dataset, enriching it with matching data from the left dataset.
* `fullouter`: Returns all rows from both datasets. Matching rows are combined, while non-matching rows from either dataset are padded with `null` values. Combines both datasets while retaining unmatched rows from both sides.
* `leftanti`: Returns rows from the left dataset that have no matches in the right dataset. Identifies rows in the left dataset that don’t have corresponding entries in the right dataset.
* `rightanti`: Returns rows from the right dataset that have no matches in the left dataset. Identifies rows in the right dataset that don’t have corresponding entries in the left dataset.
* `leftsemi`: Returns rows from the left dataset that have at least one match in the right dataset. Only columns from the left dataset are included. Filters rows in the left dataset based on existence in the right dataset.
* `rightsemi`: Returns rows from the right dataset that have at least one match in the left dataset. Only columns from the right dataset are included. Filters rows in the right dataset based on existence in the left dataset.

<Note>
  The preview of the `join` operator currently only supports `inner` join. Support for other kinds of join is coming soon.
</Note>

### Summary of kinds of join

| Kind of join  | Behavior                                                              | Matches returned                   |
| ------------- | --------------------------------------------------------------------- | ---------------------------------- |
| `inner`       | All matches between left and right datasets                           | Multiple matches allowed           |
| `innerunique` | First match for each row in the left dataset                          | Only unique matches                |
| `leftouter`   | All rows from the left, with matching rows from the right or `null`   | Left-dominant                      |
| `rightouter`  | All rows from the right, with matching rows from the left or `null`   | Right-dominant                     |
| `fullouter`   | All rows from both datasets, with unmatched rows padded with `null`   | Complete join                      |
| `leftanti`    | Rows in the left dataset with no matches in the right dataset         | No matches                         |
| `rightanti`   | Rows in the right dataset with no matches in the left dataset         | No matches                         |
| `leftsemi`    | Rows in the left dataset with at least one match in the right dataset | Matching rows (left dataset only)  |
| `rightsemi`   | Rows in the right dataset with at least one match in the left dataset | Matching rows (right dataset only) |

### Choose the right kind of join

* Use `inner` for standard joins where you need all matches.
* Use `leftouter` or `rightouter` when you need to retain all rows from one dataset.
* Use `leftanti` or `rightanti` to find rows that don’t match.
* Use `fullouter` for complete combinations of both datasets.
* Use `leftsemi` or `rightsemi` to filter rows based on existence in another dataset.

## For users of other query languages

If you come from other query languages, this section explains how to adjust your existing queries to achieve the same results in APL.

<AccordionGroup>
  <Accordion title="Splunk SPL users">
    The `join` operator in APL works similarly to the `join` command in Splunk SPL. However, APL provides additional flexibility by supporting various join types (for example, `inner`, `outer`, `leftouter`). Splunk uses a single default join type.

    <CodeGroup>
      ```sql Splunk example
      index=logs | join type=inner [search index=traces]
      ```

      ```kusto APL equivalent
      ['sample-http-logs']
      | join kind=inner ['otel-demo-traces'] on id == trace_id
      ```
    </CodeGroup>
  </Accordion>

  <Accordion title="ANSI SQL users">
    The `join` operator in APL resembles SQL joins but uses distinct syntax. SQL uses `FROM` and `ON` clauses, whereas APL uses the `join` operator with explicit `kind` and `on` clauses.

    <CodeGroup>
      ```sql SQL example
      SELECT *
      FROM logs
      JOIN traces
      ON logs.id = traces.trace_id
      ```

      ```kusto APL equivalent
      ['sample-http-logs']
      | join kind=inner ['otel-demo-traces'] on id == trace_id
      ```
    </CodeGroup>
  </Accordion>
</AccordionGroup>

## Usage

### Syntax

```kusto
LeftDataset
| join kind=KindOfJoin RightDataset on Conditions
```

### Parameters

* `LeftDataset`: The first dataset, also known as the outer dataset or the left side of the join. If you expect one of the datasets to contain consistently less data than the other, specify the smaller dataset as the left side of the join.
* `RightDataset`: The second dataset, also known as the inner dataset or the right side of the join.
* `KindOfJoin`: Optionally, the [kind of join](#kinds-of-join) to perform.
* `Conditions`: The conditions for matching rows. The conditions are equality expressions that determine how Axiom matches rows from the `LeftDataset` (left side of the equality expression) with rows from the `RightDataset` (right side of the equality expression). The two sides of the equality expression must have the same data type.
  * To join datasets on a field that has the same name in the two datasets, simply use the field name. For example, `on id`.
  * To join datasets on a field that has different names in the two datasets, define the two field names in an equality expression such as `on id == trace_id`.
  * You can use expressions in the join conditions. For example, to compare two fields of different data types, use `on id_string == tostring(trace_id_int)`.
  * You can define multiple join conditions. To separate conditions, use commas (`,`). Don’t use `and`. For example, `on id == trace_id, span == span_id`.

### Returns

The `join` operator returns a new table containing rows that match the specified join condition. The fields from the left and right datasets are included.

## Use case example

Join HTTP logs with trace data to correlate user activity with performance metrics.

**Query**

```kusto
['otel-demo-traces']
| join kind=inner ['otel-demo-logs'] on trace_id
```

**Output**

| \_time     | trace\_id | span\_id | service.name | duration |
| ---------- | --------- | -------- | ------------ | -------- |
| 2024-12-01 | trace123  | span123  | frontend     | 500ms    |

This query links user activity in HTTP logs to trace data to investigate performance issues.

## List of related operators

* [union](/apl/tabular-operators/union-operator): Combines rows from multiple datasets without requiring a matching condition.
* [where](/apl/tabular-operators/where-operator): Filters rows based on conditions, often used with `join` for more precise results.