union operator in APL allows you to combine the results of two or more queries into a single output. The operator is useful when you need to analyze or compare data from different datasets or tables in a unified manner. By using union, you can merge multiple sets of records, keeping all data from the source tables without applying any aggregation or filtering.
The union operator is particularly helpful in scenarios like log analysis, tracing OpenTelemetry events, or correlating security logs across multiple sources. You can use it to perform comprehensive investigations by bringing together information from different datasets into one query.
Union of two datasets
To understand how theunion operator works, consider these datasets:
Server requests
| _time | status | method | trace_id |
|---|---|---|---|
| 12:10 | 200 | GET | 1 |
| 12:15 | 200 | POST | 2 |
| 12:20 | 503 | POST | 3 |
| 12:25 | 200 | POST | 4 |
| _time | trace_id | message |
|---|---|---|
| 12:12 | 1 | foo |
| 12:21 | 3 | bar |
| 13:35 | 27 | baz |
Server requests and Application logs would result in a new dataset with all the rows from both DatasetA and DatasetB.
A union of requests and logs would produce the following result set:
| _time | status | method | trace_id | message |
|---|---|---|---|---|
| 12:10 | 200 | GET | 1 | |
| 12:12 | 1 | foo | ||
| 12:15 | 200 | POST | 2 | |
| 12:20 | 503 | POST | 3 | |
| 12:21 | 3 | bar | ||
| 12:25 | 200 | POST | 4 | |
| 13:35 | 27 | baz |
For users of other query languages
If you come from other query languages, this section explains how to adjust your existing queries to achieve the same results in APL.Splunk SPL users
Splunk SPL users
In Splunk SPL, the
append command works similarly to the union operator in APL. Both operators are used to combine multiple datasets. However, while append in Splunk typically adds one dataset to the end of another, APL’s union merges datasets while preserving all records.ANSI SQL users
ANSI SQL users
In ANSI SQL, the
UNION operator performs a similar function to the APL union operator. Both are used to combine the results of two or more queries. However, SQL’s UNION removes duplicates by default, whereas APL’s union keeps all rows unless you use union with=kind=unique.Usage
Syntax
Parameters
T1, T2, T3, ...: Tables or query results you want to combine into a single output.withsource: Optional, adds a field to the output where each value specifies the source dataset of the row. Specify the name of this additional field inFieldName.
Returns
Theunion operator returns all rows from the specified tables or queries. If fields overlap, they are merged. Non-overlapping fields are retained in their original form.
Use case examples
- Log analysis
- OpenTelemetry traces
- Security logs
In log analysis, you can use the Output
This query combines two datasets (HTTP logs and security logs) and filters the combined data to show only those entries where the HTTP status code is 500.
union operator to combine HTTP logs from different sources, such as web servers and security systems, to analyze trends or detect anomalies.Query| _time | id | status | uri | method | geo.city | geo.country | req_duration_ms |
|---|---|---|---|---|---|---|---|
| 2024-10-17 12:34:56 | user123 | 500 | /api/login | GET | London | UK | 345 |
| 2024-10-17 12:35:10 | user456 | 500 | /api/update-profile | POST | Berlin | Germany | 123 |
Other examples
Basic union
This example combines all rows fromgithub-push-event and github-pull-request-event without any transformation or filtering.
Filter after union
This example combines the datasets, and then filters the data to only include rows where themethod is GET.
Aggregate after union
This example combines the datasets and summarizes the data, counting the occurrences of each combination ofcontent_type and actor.
Filter and project specific data from combined log sources
This query combines GitHub pull request event logs and GitHub push events, filters by actions made bygithub-actions[bot], and displays key event details such as time, repository, commits, head , id.
Union with field removing
This example removes thecontent_type and commits field in the datasets sample-http-logs and github-push-event before combining the datasets.
Filter after union
This example performs a union and then filters the resulting set to only include rows where themethod is GET.
Union with order by
After the union, the result is ordered by thetype field.
Union with joint conditions
This example performs a union and then filters the resulting dataset for rows wherecontent_type contains the letter a and city is seattle.
Union and count unique values
After the union, the query calculates the number of uniquegeo.city and repo entries in the combined dataset.
Union using withsource
The example below returns the union of all datasets that match the patterngithub* and counts the number of events in each.
Best practices for the union operator
To maximize the effectiveness of the union operator in APL, here are some best practices to consider:- Before using the
unionoperator, ensure that the fields being merged have compatible data types. - Use
projectorproject-awayto include or exclude specific fields. This can improve performance and the clarity of your results, especially when you only need a subset of the available data.