This week, we added support for offline evaluations (evals) to the AI engineering workflow and improved alerting for data availability. These updates let you evaluate AI capabilities systematically and get notified when your data starts flowing again.
Offline evals for AI engineering
Evals are systematic tests that measure how well your AI features perform by automatically running your AI code against test collections and evaluating the results using custom scorers.
Instead of manually testing AI outputs or relying on anecdotal checks, evals provide a data-driven approach to:
- Catch regressions before they reach production
- Compare different models, prompts, or approaches
- Track quality improvements over time
- Ensure capabilities meet your quality benchmarks
The evaluation framework uses a declarative Eval function that lets you define test suites directly in your codebase with ground truth data, scoring functions, and configurable flags for experimentation. The Axiom AI SDK captures detailed OpenTelemetry traces for each evaluation run, allowing you to analyze results in depth.
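To make the idea of a declarative eval concrete, here is a minimal, self-contained sketch of a test collection, a task, and a custom scorer. It does not use the Axiom AI SDK and should not be read as its API: `runEval`, `EvalConfig`, `classifyTicket`, and `exactMatch` are hypothetical names invented for illustration, and the "AI capability" is a keyword-based stand-in rather than a model call. Tracing and flags are omitted.

```typescript
// Hypothetical sketch: none of these names come from the Axiom AI SDK.
type TestCase<I, E> = { input: I; expected: E };
type Scorer<O, E> = (output: O, expected: E) => number; // score in [0, 1]

interface EvalConfig<I, O, E> {
  data: TestCase<I, E>[];                  // test collection with ground truth
  task: (input: I) => Promise<O>;          // the capability under test
  scorers: Record<string, Scorer<O, E>>;   // named scoring functions
}

interface EvalResult {
  perCase: Record<string, number>[];       // one score map per test case
  mean: Record<string, number>;            // mean score per scorer
}

// Runs the task on every test case and applies every scorer to the output.
async function runEval<I, O, E>(config: EvalConfig<I, O, E>): Promise<EvalResult> {
  const scorerNames = Object.keys(config.scorers);
  const perCase: Record<string, number>[] = [];
  for (const { input, expected } of config.data) {
    const output = await config.task(input);
    const scores: Record<string, number> = {};
    for (const name of scorerNames) {
      scores[name] = config.scorers[name](output, expected);
    }
    perCase.push(scores);
  }
  const mean: Record<string, number> = {};
  for (const name of scorerNames) {
    const total = perCase.reduce((sum, s) => sum + s[name], 0);
    mean[name] = perCase.length > 0 ? total / perCase.length : 0;
  }
  return { perCase, mean };
}

// Stand-in for an AI capability: a keyword-based ticket classifier.
async function classifyTicket(text: string): Promise<string> {
  return /refund|charge|invoice/i.test(text) ? "billing" : "technical";
}

// Custom scorer: 1 when the output equals the ground truth, otherwise 0.
const exactMatch: Scorer<string, string> = (output, expected) =>
  output === expected ? 1 : 0;

const evalConfig: EvalConfig<string, string, string> = {
  data: [
    { input: "I was charged twice", expected: "billing" },
    { input: "The app crashes on login", expected: "technical" },
    { input: "Where is my invoice?", expected: "billing" },
    { input: "Please cancel my subscription", expected: "billing" },
  ],
  task: classifyTicket,
  scorers: { exactMatch },
};
```

Calling `runEval(evalConfig)` resolves to per-case scores plus a mean per scorer. In this sketch the last test case is misclassified, so the mean `exactMatch` score is 0.75, which is the kind of signal you could compare across models or prompts to catch a regression.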
See the blog post or learn more about setting up and running evals in the documentation.
Alerts when data returns after no-data state
Monitors with Alert on no data enabled now send a notification when data returns after a no-data alert state. Previously, you were alerted only when data stopped flowing, not when it resumed. You are now notified at both transitions, which gives you complete visibility into your data availability.
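The behavior described above amounts to notifying on both state transitions. The sketch below models that logic under simplifying assumptions: it is not Axiom's monitor implementation, the names `checkMonitor`, `simulate`, `MonitorState`, and `Notification` are hypothetical, and a check interval is treated as "no data" whenever its event count is zero.

```typescript
// Hypothetical sketch: not Axiom's monitor implementation.
type MonitorState = "ok" | "no-data";
type Notification = "no-data" | "data-returned";

interface CheckResult {
  state: MonitorState;
  notification: Notification | null;
}

// One check interval: derive the new state from the event count and
// notify only when the state changes in either direction.
function checkMonitor(previous: MonitorState, eventCount: number): CheckResult {
  const state: MonitorState = eventCount > 0 ? "ok" : "no-data";
  if (previous === "ok" && state === "no-data") {
    return { state, notification: "no-data" }; // data stopped flowing
  }
  if (previous === "no-data" && state === "ok") {
    return { state, notification: "data-returned" }; // data resumed
  }
  return { state, notification: null }; // no transition, no notification
}

// Replays a series of per-interval event counts, starting from "ok".
function simulate(eventCounts: number[]): (Notification | null)[] {
  let state: MonitorState = "ok";
  const notifications: (Notification | null)[] = [];
  for (const count of eventCounts) {
    const result = checkMonitor(state, count);
    state = result.state;
    notifications.push(result.notification);
  }
  return notifications;
}
```

For example, `simulate([12, 0, 0, 7, 9])` yields a `"no-data"` notification at the second interval and a `"data-returned"` notification at the fourth, with no notification for intervals where the state is unchanged.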
More of our favorite changes
- Fixed an issue that prevented monitor edits from being saved
- Improved bar chart hover interactions and visual feedback
- Fixed incorrect percentage display for unitless metrics
- Resolved query toolbar button overflow issues that could hide controls
- Fixed failed queries in timeline view that were causing display issues
- Enabled saving views even when the selected time range contains no events
- Enhanced authentication error handling for better user experience