Learn how to iterate on your AI capabilities by using production data and evaluation scores to drive improvements.
The iteration workflow described here is in active development. Axiom is working with design partners to shape what’s built. Contact Axiom to get early access and join a small group of teams shaping these tools.
The Iterate stage is where the Rudder workflow comes full circle. It’s the process of taking the real-world performance data from the Observe stage and the quality benchmarks from the Measure stage, and using them to make concrete improvements to your AI capabilities. This creates a cycle of continuous, data-driven enhancement.
Identifying opportunities for improvement
Iteration begins with insight. The telemetry you gather while observing your capability in production is a goldmine for finding areas to improve. By analyzing traces in the Axiom Console, you can:
- Find real-world user inputs that caused your capability to fail or produce low-quality output.
- Identify high-cost or high-latency interactions that could be optimized.
- Discover common themes in user feedback that point to systemic weaknesses.
These examples can be used to create a new, more robust collection of reference data for offline testing.
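For illustration, here is a minimal sketch of how flagged production traces might become a draft test collection. The `CapabilityTrace` shape, the selection thresholds, and the output file format are assumptions made for this example, not a documented Axiom schema.

```typescript
import { writeFileSync } from "node:fs";

// Hypothetical shape of a trace exported from the Observe stage.
// Field names are illustrative, not a documented Axiom schema.
interface CapabilityTrace {
  traceId: string;
  input: string;          // the user input that triggered the capability
  output: string;         // what the capability actually produced
  userFeedback?: "positive" | "negative";
  durationMs: number;
  costUsd: number;
}

// A ground-truth case pairs a real input with the output you expect.
interface GroundTruthCase {
  input: string;
  expected: string;       // filled in by a human reviewer
  source: string;         // trace ID, so the case can be traced back to production
}

// Keep the traces worth turning into test cases: explicit negative feedback,
// or interactions that were unusually slow or expensive.
function selectCandidates(traces: CapabilityTrace[]): CapabilityTrace[] {
  return traces.filter(
    (t) => t.userFeedback === "negative" || t.durationMs > 10_000 || t.costUsd > 0.05
  );
}

// Convert candidates into draft cases for human review.
function toDraftCases(traces: CapabilityTrace[]): GroundTruthCase[] {
  return traces.map((t) => ({
    input: t.input,
    expected: "", // a reviewer writes the correct answer before the case is used
    source: t.traceId,
  }));
}

// Persist the draft collection so it can be reviewed and then used for
// offline evaluation against future versions of the capability.
export function exportDraftCollection(traces: CapabilityTrace[], path: string): void {
  writeFileSync(path, JSON.stringify(toDraftCases(selectCandidates(traces)), null, 2));
}
```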
Testing changes against ground truth
After you make a change, such as revising your capability’s Prompt object, you need to verify that it’s actually an improvement. The best way to do this is to run an “offline evaluation”: testing your new version (the “challenger”) against the same ground truth collection you used in the Measure stage.
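As a rough sketch of what such an offline evaluation might look like in code, assuming a ground-truth collection stored as JSON and a scorer of your own choosing (neither is a specific Axiom API):

```typescript
import { readFileSync } from "node:fs";

interface GroundTruthCase {
  input: string;
  expected: string;
}

interface EvalResult extends GroundTruthCase {
  actual: string;
  score: number; // 0 to 1, from whatever scorer you defined in the Measure stage
}

// Your capability under test: typically a prompt version plus a model call.
type Capability = (input: string) => Promise<string>;

// Example scorer: exact match. Swap in the scorer you used in the Measure stage.
function scoreOutput(expected: string, actual: string): number {
  return expected.trim() === actual.trim() ? 1 : 0;
}

// Run a candidate version against the same ground-truth collection used in the
// Measure stage, so scores stay comparable across versions.
export async function runOfflineEval(
  collectionPath: string,
  capability: Capability
): Promise<EvalResult[]> {
  const cases: GroundTruthCase[] = JSON.parse(readFileSync(collectionPath, "utf8"));
  const results: EvalResult[] = [];
  for (const c of cases) {
    const actual = await capability(c.input);
    results.push({ ...c, actual, score: scoreOutput(c.expected, actual) });
  }
  return results;
}

// A quick aggregate for comparing a challenger against the current champion
// before digging into per-case diffs in the Console.
export function meanScore(results: EvalResult[]): number {
  return results.reduce((sum, r) => sum + r.score, 0) / results.length;
}
```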
The Axiom Console will provide views to compare these evaluation runs side-by-side:
- A/B Comparison Views: See the outputs of two different prompt versions for the same input, making it easy to spot regressions or improvements.
- Leaderboards: Track evaluation scores across all versions of a capability to see a clear history of its quality over time.
This ensures you can validate changes with data before they ever reach your users.
Deploying with confidence
Once you’re satisfied with the challenger’s performance, you can promote it to become the new “champion” using the SDK’s `deploy` function.
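The exact signature of `deploy` depends on the SDK, so the sketch below only illustrates the champion/challenger gate you might place in front of it. The `EvalSummary` shape and the improvement threshold are assumptions for this example.

```typescript
interface EvalSummary {
  promptVersion: string;
  meanScore: number; // aggregate score from the offline evaluation above
}

// Promote the challenger only if it beats the current champion by a clear
// margin on the same ground-truth collection. The `deploy` callback stands in
// for the SDK's deploy function; this sketch does not assume its signature.
export async function promoteIfBetter(
  champion: EvalSummary,
  challenger: EvalSummary,
  deploy: (promptVersion: string) => Promise<void>,
  minImprovement = 0.02
): Promise<boolean> {
  if (challenger.meanScore >= champion.meanScore + minImprovement) {
    await deploy(challenger.promptVersion);
    return true;
  }
  console.log(
    `Keeping champion ${champion.promptVersion}: challenger scored ` +
      `${challenger.meanScore.toFixed(3)} vs ${champion.meanScore.toFixed(3)}`
  );
  return false;
}
```

Gating the promotion on a minimum improvement, rather than any improvement at all, helps avoid churn from score differences that are within the noise of your evaluation.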
What’s next?
By completing the Iterate stage, you have closed the loop. Your improved capability is now in production, and you can return to the Observe stage to monitor its performance and identify the next opportunity for improvement.
This cycle of creating, measuring, observing, and iterating is the core of the Rudder workflow, enabling you to build better AI systems, backed by data.