Exploring Traces

Distributed tracing in Axiom allows you to observe how requests propagate through your distributed systems. This could involve a user request passing through several microservices and resources until the requested information is retrieved and returned. By tracing these requests, you can see how these microservices interact, pinpoint issues, measure latency, and follow the life of a request through your application's architecture.

Traces and Spans

  1. Trace: A trace is a representation of a single operation or transaction as it moves through a system. A trace is made up of multiple spans.

  2. Span: Each span represents a logical unit of work in the system with a start and end time. For example, an HTTP request handling process might be a span.

Each span includes metadata such as unique identifiers (trace_id and span_id), start and end times, parent-child relationships with other spans, and optional events, logs, or other details that help describe the span's operation.
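
To make the parent-child relationship concrete, the following Python sketch groups a handful of spans into a tree using only the trace_id, span_id, and parent_span_id fields. The sample spans and the build_tree and render helpers are hypothetical illustrations, not part of any Axiom API.

```python
from collections import defaultdict

# Hypothetical sample spans; only the identifier fields matter here.
spans = [
    {"trace_id": "t1", "span_id": "a", "parent_span_id": None, "name": "GET /checkout"},
    {"trace_id": "t1", "span_id": "b", "parent_span_id": "a", "name": "cart.lookup"},
    {"trace_id": "t1", "span_id": "c", "parent_span_id": "a", "name": "payment.charge"},
    {"trace_id": "t1", "span_id": "d", "parent_span_id": "c", "name": "db.insert"},
]


def build_tree(spans):
    """Return (root_spans, children_by_parent_id) for the spans of one trace."""
    children = defaultdict(list)
    roots = []
    for span in spans:
        parent = span["parent_span_id"]
        if parent is None:
            roots.append(span)  # a span without a parent is a root span
        else:
            children[parent].append(span)
    return roots, children


def render(span, children, depth=0):
    """Return the span names as indented lines, visiting children depth-first."""
    lines = ["  " * depth + span["name"]]
    for child in children[span["span_id"]]:
        lines.extend(render(child, children, depth + 1))
    return lines


roots, children = build_tree(spans)
tree_lines = render(roots[0], children)
print("\n".join(tree_lines))
```

The span with no parent_span_id is the root of the trace, and every other span hangs off the span whose span_id matches its parent_span_id.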

Trace Schema Overview

Field             Type       Description
trace_id          String     Unique identifier for a trace
span_id           String     Unique identifier for a span within a trace
parent_span_id    String     Identifier of the parent span
name              String     Name of the span, e.g. the operation
kind              String     Type of the span (e.g. client, server, producer)
duration          Timespan   Duration of the span
error             Boolean    Whether this span contains an error
status.code       String     Status of the span (e.g. null, OK, error)
status.message    String     Status message of the span
attributes        Object     Key-value pairs providing additional metadata
events            Array      Timestamped events associated with the span
links             Array      Links to related spans or external resources
resource          Object     Information about the source of the span

Below we explore the various ways Axiom can be used to analyze and interrogate your trace data from simple overviews to complex queries.

Browsing traces with the OpenTelemetry App

The Axiom OpenTelemetry app automatically detects any OpenTelemetry trace data flowing into your datasets and publishes dashboards that let you easily browse your trace data:

OpenTelemetry Traces app


Navigating the App

  • Use the Filter Bar at the top of the app to narrow the charts to a specific service or operation.
  • Use the Search Input to find a trace ID in the selected time period.
  • Use the Slowest Operations chart to identify performance issues across services and traces.
  • Use the Top Errors list to quickly identify the worst-offending causes of errors.
  • Use the Results table to get an overview and navigate between services, operations, and traces.

Viewing a Trace

Clicking any trace ID in a results table opens the "trace waterfall" view, which shows that span in the context of the entire trace from start to finish.


Customizing the App

Should you want to customize the app to your own liking, use the fork button at any time to duplicate an editable version for you and your team.


Querying Traces

In Axiom, trace events are just like any other events inside datasets, which means you can query them directly in the UI. This is powerful, but there are some details to keep in mind before querying:

  • Directly aggregating upon the duration field will produce aggregate values across every span in the dataset. This is usually not the desired outcome when wanting to inspect a service's performance or robustness.

  • For request, rate, and duration aggregations, it's best to include only root spans, which you can select with isnull(parent_span_id).
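
The effect of the two points above can be seen on a small in-memory example. This Python sketch uses hypothetical spans with made-up durations in milliseconds; it only illustrates why an average over every span differs from an average over root spans, and is not how Axiom evaluates queries.

```python
from statistics import mean

# Hypothetical spans from two traces; durations are in milliseconds.
spans = [
    {"trace_id": "t1", "span_id": "a", "parent_span_id": None, "duration": 120},
    {"trace_id": "t1", "span_id": "b", "parent_span_id": "a", "duration": 30},
    {"trace_id": "t1", "span_id": "c", "parent_span_id": "a", "duration": 60},
    {"trace_id": "t2", "span_id": "d", "parent_span_id": None, "duration": 200},
    {"trace_id": "t2", "span_id": "e", "parent_span_id": "d", "duration": 10},
]

# Averaging every span mixes short child spans with whole requests.
avg_all = mean(s["duration"] for s in spans)

# Keeping only root spans mirrors the isnull(parent_span_id) filter.
root_spans = [s for s in spans if s["parent_span_id"] is None]
avg_root = mean(s["duration"] for s in root_spans)

print(f"all spans:  n={len(spans)}, avg={avg_all} ms")
print(f"root spans: n={len(root_spans)}, avg={avg_root} ms")
```

The all-span average is pulled down by the child spans and counts five "requests" where only two took place, while the root-span figures describe the two requests as a caller experienced them.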


The Waterfall Traces View

Axiom provides a view for inspecting traces as a waterfall, with each span in the trace shown in relation to its parent and child spans:

OpenTelemetry Traces app

The trace waterfall is accessible when the query is executed on a dataset with trace data and when the _time and trace_id fields are present in the results.
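
As a rough mental model of what a waterfall conveys, the Python sketch below draws a text version of one. It assumes each span's start offset and nesting depth have already been derived from the trace data; the start_ms, duration_ms, and depth keys and the waterfall helper are hypothetical and do not reflect how Axiom renders the view.

```python
# Hypothetical spans: start_ms is the offset from the start of the trace,
# duration_ms is the span length, and depth is the nesting level under the root.
spans = [
    {"name": "GET /checkout", "start_ms": 0, "duration_ms": 100, "depth": 0},
    {"name": "cart.lookup", "start_ms": 10, "duration_ms": 20, "depth": 1},
    {"name": "payment.charge", "start_ms": 40, "duration_ms": 50, "depth": 1},
]


def waterfall(spans, width=20):
    """Return one text row per span: a bar scaled to the trace length, then the name."""
    total = max(s["start_ms"] + s["duration_ms"] for s in spans)
    rows = []
    for s in sorted(spans, key=lambda s: s["start_ms"]):
        offset = s["start_ms"] * width // total
        length = max(1, s["duration_ms"] * width // total)  # keep tiny spans visible
        bar = " " * offset + "#" * length
        indent = "  " * s["depth"]
        rows.append(f"{bar:<{width}} {indent}{s['name']}")
    return rows


rows = waterfall(spans)
print("\n".join(rows))
```

Each bar starts where the span started and is as long as the span lasted, so gaps and overlaps between child spans are visible at a glance.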


Example Queries

Below is a collection of queries to help you get started with traces in Axiom. All of the queries can be run in the Axiom Play sandbox.

Number of requests and average response time

['otel-demo-traces']
| where isnull(parent_span_id)
| summarize count(),
            avg(duration),
            percentiles_array(duration, 95, 99, 99.9)
  by bin_auto(_time)

Top five slowest operations

['otel-demo-traces']
| summarize count(), avg(duration) by name
| sort by avg_duration desc
| limit 5

Top 5 errors per service and operation

['otel-demo-traces']
| summarize topk(['status.message'], 5) by ['service.name'], name
| limit 5
