May 28, 2025

#product, #engineering

What building AI features taught us about the future of observability

Author
Dominic Chapman

Head of Product

Last month, we did something unusual at Axiom. We put our entire company—engineers, designers, salespeople, technical support, everyone—into small teams and gave them one week to build with AI. No lengthy planning cycles. No months of requirements gathering. Just pure, focused building.

The result? Three production-ready AI features that have rolled out to customers, and a treasure trove of insights about what it really takes to build AI features with confidence.

But here’s the thing: while many are talking about AI’s potential, we discovered that the real challenge isn’t getting AI to work—it’s knowing when it’s working well.

The features we’re announcing today

Let’s start with what you can use right now:

Natural language querying

Press Cmd/Ctrl+K in Axiom’s Query editor and describe your goal in plain English. Watch as Axiom generates the corresponding APL (Axiom Processing Language) query for you.

This isn’t just about making queries easier (though it definitely does that). It’s about democratizing access to your observability data. New team members can start exploring data immediately, while experienced users can shape complex queries faster.

Dashboard generation

Creating dashboards from scratch is like staring at a blank canvas—it can be overwhelming and time-consuming. Now, select any dataset and let Axiom generate an entire dashboard in seconds.

Behind the scenes, Axiom analyzes your recent events and dataset schema to produce relevant queries and valuable dashboard elements. Early feedback suggests this gives teams a massive head start. Think of it as having an experienced engineer sketch out your monitoring strategy, which you can then refine.

Smart query naming

Small improvements compound. Every time you save a query, Axiom now suggests a descriptive name automatically. No more “Logs query” cluttering your saved queries.

The story behind AI week

These features didn’t emerge from a typical product roadmap. They came from an experiment in radical focus and hands-on learning in our “AI week”.

The premise was simple: Form small teams (2-4 people) across all departments. Pick a feature or process that could be augmented or replaced with AI. Build and evaluate a functioning demo within one week.

Neil, our CEO, said:

We want everybody to develop empathy for AI practitioners through hands-on experience. We want you to develop concrete understanding of key concepts.

Ten teams participated. Projects ranged from the practical (the features we’re announcing today) to the ambitious (an AI that analyzes distributed traces to identify root causes in a multi-turn fashion). Some worked. Some didn’t. All taught us something valuable.

What we learned about building AI features

1. Effectiveness first, efficiency second

The teams that succeeded followed a counterintuitive pattern: they started manually, documenting a repeated human process. Before writing any code, they validated their ideas with services like ChatGPT or Claude, copying and pasting data and manually checking outputs.

Only after proving the AI could solve the problem did they build automation around it. This approach—watching how a human expert completes a task, then replicating it with AI—turned out to be crucial.

The lesson? Until you can describe what effective looks like, there’s no point optimizing for efficiency.

2. Evaluation is everything

Here’s what separates AI features that ship from those that stall: systematic evaluation. Without it, we’re stuck in “vibe coding”—making changes based on intuition rather than data.

The most successful teams built evaluation frameworks including:

  • Fixture-based testing for their prompts
  • Rubrics to track performance across multiple dimensions
  • Curated examples of good outputs for given inputs
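
For a sense of what “fixture-based” means in practice, here is a minimal sketch in Python. The fixtures, the rubric dimensions, and the `generate_apl` stub are assumptions made for illustration, not our actual test suite:

    # A minimal, illustrative harness for fixture-based prompt testing. The
    # fixtures, rubric dimensions, and `generate_apl` stub are assumptions
    # for this sketch, not Axiom's actual suite.
    from dataclasses import dataclass

    @dataclass
    class Fixture:
        prompt: str              # natural-language input
        must_contain: list[str]  # fragments the generated APL should include

    FIXTURES = [
        Fixture("count errors by service over the last hour",
                must_contain=["summarize", "count()", "by"]),
        Fixture("show the slowest requests today",
                must_contain=["sort", "desc"]),
    ]

    def generate_apl(prompt: str) -> str:
        # In the real feature this is an LLM call; stubbed so the sketch runs.
        return "['logs'] | where level == 'error' | summarize count() by service"

    def score(fixture: Fixture, output: str) -> dict:
        # Rubric: track several dimensions per fixture, not one pass/fail bit.
        return {
            "prompt": fixture.prompt,
            "contains_expected": all(s in output for s in fixture.must_contain),
            "is_nonempty": bool(output.strip()),
            "length_ok": len(output) < 2000,
        }

    results = [score(f, generate_apl(f.prompt)) for f in FIXTURES]
    passed = sum(r["contains_expected"] for r in results)
    print(f"{passed}/{len(FIXTURES)} fixtures matched their expected patterns")

Curated input/output examples slot in the same way: each known-good pair becomes another fixture the suite replays whenever a prompt changes.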

3. Observability for AI is different

Traditional monitoring asks “Is it up?” and “How fast is it?” AI observability asks harder questions:

  • Which step in a multi-step workflow failed?
  • How much did that conversation cost?
  • Is output quality degrading over time?
  • Are we seeing out-of-distribution inputs?

We learned that every LLM invocation needs to capture:

  • Input/output pairs for debugging
  • Latency and cost for optimization
  • Intermediate steps for complex workflows
  • User feedback beyond simple thumbs up and thumbs down
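
In practice, that meant wrapping each model call so it emits a structured event. The sketch below is illustrative only: `call_model`, the per-token prices, and the field names are assumptions rather than our production instrumentation.

    # Illustrative only: `call_model`, the per-token prices, and the event
    # fields are assumptions, not Axiom's actual instrumentation.
    import json
    import time
    import uuid

    PRICE_PER_1K_INPUT = 0.003   # assumed pricing; check your provider's rates
    PRICE_PER_1K_OUTPUT = 0.015

    def call_model(prompt: str) -> dict:
        # Stand-in for a real LLM call; returns text plus token usage.
        return {"text": "['logs'] | summarize count() by bin_auto(_time)",
                "input_tokens": 42, "output_tokens": 18}

    def invoke_with_telemetry(step: str, prompt: str, trace_id: str) -> str:
        start = time.monotonic()
        result = call_model(prompt)
        event = {
            "trace_id": trace_id,    # ties intermediate steps of a workflow together
            "step": step,            # which part of the workflow ran
            "input": prompt,         # input/output pairs for debugging
            "output": result["text"],
            "latency_ms": round((time.monotonic() - start) * 1000, 2),
            "cost_usd": result["input_tokens"] / 1000 * PRICE_PER_1K_INPUT
                        + result["output_tokens"] / 1000 * PRICE_PER_1K_OUTPUT,
            "user_feedback": None,   # attached later, ideally richer than thumbs up/down
        }
        print(json.dumps(event))     # in practice, ship this to your event store
        return result["text"]

    invoke_with_telemetry("generate_query", "count errors by service", str(uuid.uuid4()))

Because every invocation carries a shared trace ID, the intermediate steps of a multi-step workflow can be stitched back together when something goes wrong.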

4. Context is your competitive advantage

Generic LLMs are powerful, but domain-specific context makes them valuable. The teams that built the most impressive features were those that effectively marshalled Axiom-specific knowledge:

  • Dataset schemas and recent events for dashboard generation
  • Common query patterns for natural language processing
  • Naming conventions for query suggestions

Building an “address book” of relevant context sources—and managing their quality—turned out to be as important as prompt engineering.
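Concretely, an address book can start as little more than a registry of named fetchers, so each context source can be validated and versioned on its own. The source names and fetchers in this sketch are illustrative, not Axiom internals:

    # A sketch of a context "address book": each source has a named fetcher,
    # so its output can be checked and improved independently. Source names
    # and contents are illustrative placeholders.
    from typing import Callable

    CONTEXT_SOURCES: dict[str, Callable[[str], str]] = {
        "dataset_schema": lambda dataset: f"fields in {dataset}: _time, level, service, duration_ms",
        "recent_events":  lambda dataset: f"three sample events from {dataset} ...",
        "query_patterns": lambda dataset: "common pattern: summarize count() by bin_auto(_time)",
    }

    def build_context(dataset: str, sources: list[str]) -> str:
        # Assemble only the sources a given feature needs, in a stable order.
        return "\n\n".join(f"## {name}\n{CONTEXT_SOURCES[name](dataset)}" for name in sources)

    print(build_context("http-logs", ["dataset_schema", "recent_events"]))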

5. AI workflows compound error

When you chain AI calls together (analyze error → extract keywords → search logs → summarize findings), errors multiply. If each of those four steps is right 95% of the time, the full chain succeeds only about 81% of the time (0.95⁴ ≈ 0.81).

This mathematical reality forced us to think differently about complex AI features. Instead of building monolithic prompts, we need observable, testable components that we can improve independently.
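
Here is a rough Python sketch of that decomposition, with placeholder implementations standing in for the real LLM and query steps:

    # A sketch of the decomposition we mean: each step is a small function
    # that can be tested and evaluated on its own, rather than one
    # monolithic prompt. The step bodies are placeholders.
    def analyze_error(log_line: str) -> str:
        return "timeout connecting to payments-db"      # placeholder for an LLM step

    def extract_keywords(analysis: str) -> list[str]:
        return ["timeout", "payments-db"]                # placeholder for an LLM step

    def search_logs(keywords: list[str]) -> list[str]:
        return [f"event mentioning {k}" for k in keywords]  # placeholder for a query step

    def summarize(findings: list[str]) -> str:
        return f"{len(findings)} related events found"   # placeholder for an LLM step

    def root_cause(log_line: str) -> str:
        # Composing the chain stays trivial; the value is that every boundary
        # is a place to capture telemetry and run fixtures against known inputs.
        return summarize(search_logs(extract_keywords(analyze_error(log_line))))

    print(root_cause("ERROR payment failed: upstream timeout"))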

What this means for Axiom’s future

These three features are just the beginning. Through AI Week, we experienced firsthand what our customers building AI features face every day: the challenge of shipping AI with confidence.

We’ve also been spending time with some of the most forward-thinking companies on the planet—watching them rethink entire product categories through generative AI and learning from their processes. They’re all hitting the same challenges:

  • How do you evaluate if an AI feature is actually working?
  • How do you manage the context and data that powers these systems?
  • How do you observe and debug complex AI workflows?
  • How do you capture meaningful feedback to drive improvements?

Here’s our belief: As AI features evolve from simple prompts to sophisticated workflows to fully agentic systems, engineering teams need infrastructure that grows with them.

Axiom has always been about making observability accessible to developers. Now, we’ll be extending that mission to AI engineering. Our data infrastructure—built for capturing, storing, and analyzing events at scale—is uniquely positioned to help teams understand how their AI-powered systems behave.

Try it today

The three AI features we’ve announced are available now:

  • Natural language querying: Press Cmd/Ctrl+K in the Query editor
  • Dashboard generation: Click Generate Dashboard when viewing any dataset
  • Smart query naming: Save any query and see the magic happen

AI features are enabled by default for most organizations. If you don’t see them, you can enable them in your organization settings. As a reminder: AI features in Axiom are powered by leading foundation models through trusted enterprise providers including Amazon Bedrock and Google Gemini. Your inputs and outputs are never used to train generative models.

But, more importantly, if you’re building AI features yourself and struggling with evaluation, observability, or confidence—we want to hear from you. We’re creating a tight-knit group of AI builders to shape what’s next in Axiom. Connect with our team by emailing support@axiom.co and tell us more about what you’re building with AI.
