Evals for AI engineering

Mano Toth, Senior Technical Writer

December 5, 2025

This week, we added support for offline evaluations (evals) to the AI engineering workflow and improved alerting for data availability. With these updates, you can systematically evaluate your AI capabilities and get notified when your data starts flowing again.

Offline evals for AI engineering

Evals are systematic tests that measure how well your AI features perform by automatically running your AI code against test collections and evaluating the results using custom scorers.

Instead of manually testing AI outputs or relying on anecdotal checks, evals provide a data-driven approach to:

Catch regressions before they reach production
Compare different models, prompts, or approaches
Track quality improvements over time
Ensure capabilities meet your quality benchmarks

The evaluation framework uses a declarative Eval function that lets you define test suites directly in your codebase with ground truth data, scoring functions, and configurable flags for experimentation. The Axiom AI SDK captures detailed OpenTelemetry traces for each evaluation run, allowing you to analyze results in depth.
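To make the moving parts concrete, here is a minimal, self-contained sketch of the general pattern: a test collection with ground truth, a task under test, and named scorers whose results are averaged. Everything in it (`runEval`, `EvalDefinition`, `classifyTicket`, `exactMatch`) is hypothetical and written for illustration only. It is not the Axiom AI SDK API, it runs the task synchronously for brevity, and it doesn't emit OpenTelemetry traces. Refer to the documentation for the actual `Eval` function.

```typescript
// Hypothetical sketch: these types and runEval are illustrative only,
// not the Axiom AI SDK API.

// One test case: an input plus the ground-truth output expected for it.
interface TestCase<I, O> {
  input: I;
  expected: O;
}

// A scorer compares the actual output with the expected output
// and returns a score between 0 and 1.
type Scorer<O> = (output: O, expected: O) => number;

interface EvalDefinition<I, O> {
  name: string;
  data: TestCase<I, O>[];
  // The capability under test. A real AI task would usually be async.
  task: (input: I) => O;
  scorers: Record<string, Scorer<O>>;
}

interface EvalResult {
  name: string;
  // Mean score per scorer across all test cases.
  averages: Record<string, number>;
}

function runEval<I, O>(def: EvalDefinition<I, O>): EvalResult {
  const totals: Record<string, number> = {};
  for (const scorerName of Object.keys(def.scorers)) {
    totals[scorerName] = 0;
  }

  // Run the task on every test case and accumulate each scorer's score.
  for (const testCase of def.data) {
    const output = def.task(testCase.input);
    for (const [scorerName, scorer] of Object.entries(def.scorers)) {
      totals[scorerName] = (totals[scorerName] ?? 0) + scorer(output, testCase.expected);
    }
  }

  const averages: Record<string, number> = {};
  for (const [scorerName, total] of Object.entries(totals)) {
    averages[scorerName] = def.data.length === 0 ? 0 : total / def.data.length;
  }
  return { name: def.name, averages };
}

// Scorer: 1 when the output matches the ground truth exactly, otherwise 0.
const exactMatch: Scorer<string> = (output, expected) =>
  output === expected ? 1 : 0;

// Stand-in for an AI capability so the sketch runs without a model.
const classifyTicket = (text: string): string =>
  text.toLowerCase().includes("refund") ? "billing" : "other";

const result = runEval({
  name: "ticket-classification",
  data: [
    { input: "I want a refund", expected: "billing" },
    { input: "The app crashes on login", expected: "bug" },
  ],
  task: classifyTicket,
  scorers: { exactMatch },
});

console.log(result.name, result.averages);
```

In this sketch, the stand-in classifier gets one of the two cases right, so the `exactMatch` average is 0.5. Comparing that number between runs is what lets you spot a regression after a prompt or model change.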

See the blog post, or learn how to set up and run evals in the documentation.

Alerts when data returns after no-data state

Monitors with Alert on no data enabled now send notifications when data returns after a no-data alert state. Previously, you only received alerts when data stopped flowing, but not when it resumed. Now you get notified both when data disappears and when it comes back, giving you complete visibility into your data availability.
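The behavior described above can be pictured as a small state machine that notifies on both transitions. The following is a sketch under assumptions, not Axiom's implementation: the names `MonitorState`, `evaluateMonitor`, and the notification labels are hypothetical, and "no data" is assumed to mean that a monitor run found zero events.

```typescript
// Hypothetical sketch of the notification logic, not Axiom's implementation.

type MonitorState = "ok" | "no-data";
type MonitorNotification = "no-data" | "data-returned" | null;

interface Transition {
  state: MonitorState;
  notification: MonitorNotification;
}

// Decide the next state and which notification (if any) to send
// after one run of the monitor.
// Assumption: zero events in the evaluated window counts as "no data".
function evaluateMonitor(previous: MonitorState, eventCount: number): Transition {
  const hasData = eventCount > 0;

  if (previous === "ok" && !hasData) {
    // Data stopped flowing: this was the only case that notified previously.
    return { state: "no-data", notification: "no-data" };
  }
  if (previous === "no-data" && hasData) {
    // Data resumed: the newly added notification.
    return { state: "ok", notification: "data-returned" };
  }
  // No state change, so nothing is sent.
  return { state: previous, notification: null };
}

// Simulate five consecutive monitor runs with these event counts.
const eventCounts = [12, 0, 0, 7, 9];
let monitorState: MonitorState = "ok";
const sent: string[] = [];

for (const count of eventCounts) {
  const next = evaluateMonitor(monitorState, count);
  monitorState = next.state;
  if (next.notification !== null) {
    sent.push(next.notification);
  }
}

console.log(sent);
```

In this simulation, notifications fire only on state changes: once when the count first drops to zero and once when events reappear. The repeated zero and the later non-zero runs send nothing.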

More of our favorite changes

Improved bar chart hover interactions and visual feedback
Improved ability to save views even when there are no events in the selected time range
Enhanced authentication error handling for better user experience
