Write Evaluations skill for AI agents

Last week, we added the Write Evaluations skill for AI agents, made significant performance improvements to how Axiom renders query results, improved trace viewing for long-running traces, and enhanced the review experience in AI Engineering.

Write Evaluations skill for AI agents

In case you missed it: we launched a dedicated AI engineering toolkit, a lens over generative AI machine data built for teams shipping AI products. Alongside rich telemetry and AI-native trace visualization, the toolkit now includes a full evaluation framework: offline evals to catch regressions before you deploy, and online evals to score live production traffic continuously.

The Write Evaluations skill builds on that foundation. It turns AI agents into evaluation suite authors for AI capabilities. Install it in Claude Code, Cursor, Amp, Codex, or any compatible agent. Your AI assistant can then discover available evaluations, compose and execute evaluations against Axiom's AI Engineering dataset, and iterate on results, all from natural language prompts like "Write an evaluation for the support agent's message categorization function."
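To make the idea of an offline eval concrete, here is a minimal, framework-agnostic sketch in Python. It is not Axiom's evaluation API: `Case`, `categorize`, and `run_eval` are hypothetical names invented for illustration, and the keyword-based `categorize` merely stands in for a real message categorization capability. The sketch scores a function against labeled cases and gates on an accuracy threshold, which is the basic shape of catching a regression before deploying.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass(frozen=True)
class Case:
    """One labeled example: an input message and the category we expect."""
    message: str
    expected: str


def categorize(message: str) -> str:
    """Hypothetical stand-in for the capability under test."""
    text = message.lower()
    if "refund" in text or "charge" in text:
        return "billing"
    if "error" in text or "crash" in text:
        return "bug"
    return "other"


def run_eval(
    fn: Callable[[str], str],
    cases: List[Case],
    threshold: float,
) -> Tuple[bool, float, List[Tuple[Case, str]]]:
    """Run fn over every case; return (passed, accuracy, failures)."""
    if not cases:
        raise ValueError("an eval needs at least one case")
    failures = []
    for case in cases:
        got = fn(case.message)
        if got != case.expected:
            failures.append((case, got))
    accuracy = (len(cases) - len(failures)) / len(cases)
    return accuracy >= threshold, accuracy, failures


if __name__ == "__main__":
    cases = [
        Case("I was charged twice", "billing"),
        Case("The app crashes on login", "bug"),
        Case("How do I export data?", "other"),
    ]
    passed, accuracy, failures = run_eval(categorize, cases, threshold=0.9)
    print(passed, accuracy, failures)
```

In practice, the list of failures is what an agent would iterate on: each entry pairs the labeled case with the output the function actually produced.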

Install all Axiom skills at once:

npx skills add axiomhq/skills

For agent-specific setup instructions, see the documentation.

Faster event rendering

We've made significant performance improvements to how Axiom renders query results. Event rows now render up to 4x faster, computed results and the events table are more responsive, and full-page re-renders no longer trigger on simple search parameter changes.

These improvements are most noticeable when exploring large result sets or working with datasets that have many fields.

Smarter trace viewing for long-running traces

Previously, trace span loading used a fixed ±1-hour time window, so spans in traces longer than two hours could be missed entirely. Axiom now uses an adaptive approach: it loads the fast ±1-hour window for immediate results and, at the same time, checks a wider 24-hour window. If the wider check finds additional spans, the view updates automatically.

This means you can now view complete traces that span up to 24 hours without any extra steps, which is especially useful if you're running long AI pipelines or batch processing workloads.
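The adaptive loading pattern described above can be sketched roughly as follows. This is an illustration under stated assumptions, not Axiom's implementation: `Span`, `make_fetcher`, and `load_trace` are hypothetical names, the span store is an in-memory list, and the 24-hour window is assumed to be centred on the anchor time (±12 hours), which the text above does not specify.

```python
import asyncio
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Awaitable, Callable, List


@dataclass(frozen=True)
class Span:
    span_id: str
    start: datetime


Fetch = Callable[[datetime, timedelta], Awaitable[List[Span]]]


def make_fetcher(all_spans: List[Span]) -> Fetch:
    """Hypothetical in-memory stand-in for a real span query backend."""
    async def fetch(center: datetime, radius: timedelta) -> List[Span]:
        lo, hi = center - radius, center + radius
        return [s for s in all_spans if lo <= s.start <= hi]
    return fetch


async def load_trace(
    fetch: Fetch,
    center: datetime,
    on_update: Callable[[List[Span]], None],
) -> List[Span]:
    narrow = timedelta(hours=1)   # fast ±1-hour window
    wide = timedelta(hours=12)    # assumption: 24 hours total, centred on the anchor

    # Start the wide query in the background so it does not delay first results.
    wide_task = asyncio.create_task(fetch(center, wide))

    # Show the narrow-window spans as soon as they arrive.
    spans = sorted(await fetch(center, narrow), key=lambda s: s.start)
    on_update(list(spans))

    # Merge in any spans that only the wide window found, then update again.
    seen = {s.span_id for s in spans}
    extra = [s for s in await wide_task if s.span_id not in seen]
    if extra:
        spans = sorted(spans + extra, key=lambda s: s.start)
        on_update(list(spans))
    return spans
```

The design choice worth noting is that `on_update` fires a second time only when the wide window contributes new spans, so short traces cost no extra redraw.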

More of our favorite changes

Monitor types on the empty state placeholder are now clickable, navigating directly to monitor creation with the selected type.
Numeric column headers in query results are now right-aligned for better readability.
Field selections in the query results table now persist across queries.
Dashboards default to dark theme in Axiom Playground.