- Direct signal from production: Collect thumbs up/down, ratings, comments, and implicit signals like regenerations or copies
- Linked to traces: Every feedback event connects to the AI trace that produced the output, so you can see exactly what happened
- Aggregate and prioritize: Spot quality trends across capabilities, filter by feedback type, and focus engineering effort on high-impact issues
- Closes the improvement loop: Production feedback surfaces issues that evaluations miss, feeding back into your test suite
Offline evaluations catch regressions before deployment. You run your capability against curated test cases, score the outputs, and ship with confidence that you haven’t broken what was working.
But offline evaluations only test what you thought to test. They can’t catch the edge cases you didn’t anticipate, the user inputs you didn't imagine, or the subtle quality degradations that accumulate over time. For those, you need signal from the people actually using your AI capability in production.
This is why every AI product with staying power has a feedback mechanism. Thumbs up, thumbs down, optional comments. It’s become ubiquitous because it works. Users tell you when something breaks. The challenge is connecting that signal to action.
Today we're releasing user feedback capture for AI capabilities in Axiom. Purpose-built tooling to collect feedback from end users, link it to the traces that show what your capability did, and surface patterns that guide improvement.
The gap between evaluation and production
Teams running continuous improvement cycles have a workflow that looks roughly like this: capture production traces, have domain experts review failures, turn those failures into test cases, build evaluations, and repeat.
But there’s a bootstrapping problem. How do you know which traces to review? Random sampling catches some issues, but it’s inefficient. You’re looking for needles in a haystack of successful interactions.
User feedback helps address this. When someone clicks thumbs down or writes “this answer was completely wrong,” you’ve found a needle. That trace goes to the top of the review queue. A domain expert examines what happened, documents the failure, and the trace becomes a test case that ensures the issue stays fixed.
Feedback is a lightweight, high-volume signal that comes straight from users. It complements the deep qualitative insight you get from expert annotation with broad coverage you couldn’t achieve through sampling alone.
How it works
Axiom’s feedback system has two parts: a client SDK that captures feedback and links it to traces, and Console views that surface patterns and connect feedback to the AI behavior that produced it.
Capturing feedback
On the server, your AI capability runs inside withSpan, which creates a trace. You extract the trace and span IDs and return them to the client alongside your AI response.
```typescript
import { withSpan } from "axiom/ai";
import type { FeedbackLinks } from "axiom/ai/feedback";

async function handleQuestion(input: string) {
  return await withSpan(
    { capability: "support-agent", step: "respond" },
    async (span) => {
      const response = await generateResponse(input);

      const links: FeedbackLinks = {
        traceId: span.spanContext().traceId,
        spanId: span.spanContext().spanId,
        capability: "support-agent",
      };

      return { response, links };
    }
  );
}
```

On the client, you initialize a feedback client and send feedback events when users interact with your UI. The feedback is linked to the trace, so you can always see what the AI did when a user gave their rating.
```typescript
import { createFeedbackClient, Feedback } from "axiom/ai/feedback";

const { sendFeedback } = createFeedbackClient({
  token: process.env.AXIOM_FEEDBACK_TOKEN,
  dataset: process.env.AXIOM_FEEDBACK_DATASET,
  url: process.env.AXIOM_URL,
});

// User clicks thumbs down.
// `links` is the FeedbackLinks object returned from the server alongside the AI response.
await sendFeedback(
  links,
  Feedback.thumbDown({
    name: "response-quality",
    message: "The answer was incorrect",
  })
);
```

The SDK supports multiple feedback types for different signals:
| Type | Description | Example use |
|---|---|---|
| thumb | Thumbs up (+1) or down (-1) | Response quality rating |
| number | Numeric value | Star rating or relevance score |
| bool | Boolean true/false | “Was this helpful?” |
| text | Free-form string | User comments |
| enum | Constrained string | Issue category selection |
| signal | Event occurred, no value | User copied or regenerated a response |
Signals are particularly useful for implicit feedback. When a user copies your AI response, that’s a positive signal. When they regenerate, that’s often negative. These behavioral cues provide volume that explicit ratings can’t match.
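As a rough sketch of how implicit signals might be wired up, here is one way to hook copy and regenerate actions into the feedback client created above. The `Feedback.signal` constructor is assumed to mirror the `Feedback.thumbDown` pattern, and the handler and signal names (`onCopyResponse`, `response-copied`, and so on) are illustrative; check the SDK reference for the exact API.

```typescript
import { Feedback, type FeedbackLinks } from "axiom/ai/feedback";

// Assumption: Feedback.signal follows the same shape as Feedback.thumbDown.
// `sendFeedback` is the function returned by createFeedbackClient above.

// User copied the AI response to their clipboard: usually a positive implicit signal.
async function onCopyResponse(links: FeedbackLinks) {
  await sendFeedback(links, Feedback.signal({ name: "response-copied" }));
}

// User asked for a regeneration: often a negative implicit signal.
async function onRegenerate(links: FeedbackLinks) {
  await sendFeedback(links, Feedback.signal({ name: "response-regenerated" }));
}
```

Wiring these into existing UI handlers costs a few lines and produces far more events than explicit ratings ever will.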
Analyzing feedback in Console
Feedback events flow into a dedicated view in Axiom’s AI engineering tab. You see a table of feedback events with the feedback name, value, message, timestamp, and a link to the associated trace.
Filter by feedback name to focus on specific signals. When you select a name, a chart appears showing feedback trends over time. A spike in negative feedback after a deployment tells you something changed. A gradual improvement after a fix confirms it landed.
The key interaction is clicking through to the trace. When a user reports a problem, you want to see exactly what your capability did: the prompts, the completions, the tool calls, the retrieved context. One click from a feedback event takes you to the full AI trace in Axiom’s waterfall view and a handy conversation viewer. You’re not guessing what went wrong. You’re seeing it.
Click a feedback event to open the detail panel, which shows all fields including any metadata you attached. From there, “View trace” takes you to the associated AI trace where you can inspect every step of your capability’s execution.
From feedback to improvement
Collecting feedback is the starting point. The value comes from what you do with it.
The immediate use is triage. Negative feedback surfaces traces worth investigating. Your team reviews them, understands what went wrong, and fixes the issue. This is reactive, but it’s reactive to real user problems rather than hypothetical ones.
The deeper use is building your evaluation suite from production reality. When you find a trace where your capability failed, document what should have happened and add it to a collection. That failure becomes a test case. Your evaluation suite grows from real-world edge cases rather than examples you imagined during development.
Over time, patterns emerge. If you’re categorizing feedback (using enum types or text analysis), you can aggregate failures into themes. Maybe 30% of negative feedback mentions hallucinated features. Maybe a specific user cohort reports more issues than others. These patterns direct engineering effort to high-impact problems.
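For example, if users pick an issue category when they leave negative feedback, a sketch like the following would feed that theme data into Axiom. The `Feedback.enum` constructor and its option names here are assumptions modeled on `Feedback.thumbDown` above, not a documented signature:

```typescript
// Illustrative only: the enum constructor and its options are modeled on the
// feedback-type table above, not taken from a documented SDK signature.
await sendFeedback(
  links,
  Feedback.enum({
    name: "issue-category",
    value: "hallucinated-feature", // one of a fixed set your team defines
  })
);
```

Because the category set is fixed, aggregating events per category in the Console’s feedback view is enough to surface which themes dominate.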
This is the continuous improvement loop that world-class AI engineering teams run. Production traces and user feedback reveal issues. Domain experts annotate what went wrong. Annotations become test cases. Evaluations verify fixes. The cycle continues. Feedback is the signal that makes the loop turn.
The bigger picture
User feedback is one piece of a larger system we’re building for AI engineering in Axiom.
We started with observability: rich telemetry capture for prompts, completions, tool calls, and costs. Then offline evaluations: systematic testing against curated collections before deployment. Now feedback capture: direct signal from production that surfaces issues evaluations miss.
What’s coming next:
- Review workflows: Give domain experts a workspace to annotate flagged traces and document failures in structured ways
- Online evaluations: Run scorers against live production traffic for real-time quality monitoring
The goal is a system where production insights directly strengthen your test coverage, and your evaluation results directly inform what to ship. Feedback is the bridge that connects what users experience to what you measure and improve.
Get started
User feedback is available now in Axiom’s AI SDK and Console.
- Read the documentation for setup and integration patterns
- Explore the SDK for the full feedback API
The teams shipping the most reliable AI capabilities aren’t just testing before deployment. They’re listening after it. That practice is now native to Axiom.