Designing MCP servers for wide schemas and large result sets

Neil Jagdish Patel, CEO / Co-founder
October 7, 2025

At Axiom, we live and breathe data at a massive scale. Our customers send us petabytes of logs, traces, and events daily - data that is often incredibly wide, regularly spanning thousands of fields with deeply nested JSON. While this is a goldmine for observability and analytics, it poses a fascinating challenge for AI assistants and other clients that talk to our MCP server (GitHub).

The problem is simple: if your server is too chatty - sending verbose payloads with repetitive keys and uncapped results - the client pays a steep price in tokens and latency. For the end user, this translates to a frustrating experience: slower answers and hitting context limits faster, which means fewer useful steps in a single session.

That's why we designed our MCP server to prioritize compact contexts from the start. The idea is straightforward: empower any client to handle more queries and think more sharply without constant manual tweaks. In the sections below, I'll walk through the key design choices that made the biggest difference for us.

Optimizing the tabular result format

Most MCP best-practice guides focus on not simply wrapping REST APIs, and instead thinking deeply about your use cases and tuning the tool list to match them.

That's sound advice, but somewhere behind the tool call there is usually still an API, and there's far less discussion about how to format the responses those tool calls send back to the client. We knew this would be an issue for us for the reasons discussed above.

And so our first, and arguably biggest, win came from rethinking our default format for tabular data. JSON is the lingua franca of APIs for a reason - it's human-readable, machine-parseable, and self-describing. But for tabular results, it's a real chatterbox. Keys are repeated on every single row, and the overhead from brackets and commas adds up surprisingly fast.

LLMs don't need all that ceremony to understand a table. In most cases, a clean set of column names followed by the values is more than enough - especially since the user rarely sees the raw payload anyway.

Let's look at a small, real-world example: five log events with six fields. First, the JSON, pretty-printed so we can see what's going on.

[
  {
    "time": "2025-09-29T12:00:01Z",
    "service": "api-gateway",
    "level": "error",
    "message": "upstream timeout contacting user service",
    "user_id": "u_18392",
    "duration_ms": 1203
  },
  {
    "time": "2025-09-29T12:00:02Z",
    "service": "api-gateway",
    "level": "info",
    "message": "retrying request to user service",
    "user_id": "u_18392",
    "duration_ms": 87
  },
  {
    "time": "2025-09-29T12:00:02Z",
    "service": "billing-worker",
    "level": "warn",
    "message": "invoice total missing tax_id field",
    "user_id": "u_99801",
    "duration_ms": 342
  },
  {
    "time": "2025-09-29T12:00:03Z",
    "service": "auth",
    "level": "error",
    "message": "jwt expired for session",
    "user_id": "u_77110",
    "duration_ms": 15
  },
  {
    "time": "2025-09-29T12:00:04Z",
    "service": "auth",
    "level": "info",
    "message": "refreshed session token",
    "user_id": "u_77110",
    "duration_ms": 22
  }
]

Now the same in CSV:

time,service,level,message,user_id,duration_ms
2025-09-29T12:00:01Z,api-gateway,error,upstream timeout contacting user service,u_18392,1203
2025-09-29T12:00:02Z,api-gateway,info,retrying request to user service,u_18392,87
2025-09-29T12:00:02Z,billing-worker,warn,invoice total missing tax_id field,u_99801,342
2025-09-29T12:00:03Z,auth,error,jwt expired for session,u_77110,15
2025-09-29T12:00:04Z,auth,info,refreshed session token,u_77110,22

Token counts use the OpenAI o200k_base encoding via gpt-tokenizer and are exact for the strings shown; results will vary by model and tokenizer. To keep the comparison fair, the JSON payload was minified before counting.

Format                Bytes    Approx. tokens
JSON array (5 rows)   753      235
CSV (5 rows)          442      166

For the same 5 rows, CSV used 166 tokens and JSON used 235 tokens. That is 69 fewer tokens, about 29 percent less, or roughly 14 tokens saved per row, with identical fields and values. At this rate the savings scale linearly: about 690 tokens at 50 rows, about 1,380 at 100 rows, and about 13,800 at 1,000 rows, with the percentage reduction staying close to 29 percent assuming similar content.

Whether returning aggregations or raw events, the benefit of using CSV is clear for our MCP server, especially with multiple turns during an investigation.

And you might be thinking: doesn't this sacrifice clarity? In our experience, for tabular data, the answer is a resounding no. The kinds of models our MCP server is likely to be used with handle CSV with headers wonderfully. And for the rare cases where a client truly needs nested structures or typed objects, it can request those fields directly. We also plan to add support for more formats in the future, configurable via flags (explained below).
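To make the conversion concrete, here is a minimal sketch in Go; the function and types are illustrative rather than our actual implementation, and they assume the tool result already arrives as ordered column names plus rows of string values.

// A sketch of rendering a tabular tool result as CSV instead of JSON.
package main

import (
	"encoding/csv"
	"fmt"
	"strings"
)

// rowsToCSV emits the column names once, followed by one line of values per
// row, instead of repeating every key on every JSON object.
func rowsToCSV(columns []string, rows [][]string) (string, error) {
	var sb strings.Builder
	w := csv.NewWriter(&sb)
	if err := w.Write(columns); err != nil {
		return "", err
	}
	if err := w.WriteAll(rows); err != nil { // WriteAll also flushes
		return "", err
	}
	return sb.String(), nil
}

func main() {
	columns := []string{"time", "service", "level", "message", "user_id", "duration_ms"}
	rows := [][]string{
		{"2025-09-29T12:00:01Z", "api-gateway", "error", "upstream timeout contacting user service", "u_18392", "1203"},
		{"2025-09-29T12:00:04Z", "auth", "info", "refreshed session token", "u_77110", "22"},
	}
	out, err := rowsToCSV(columns, rows)
	if err != nil {
		panic(err)
	}
	fmt.Print(out)
}

The point is simply that the keys appear once in the header, and every subsequent line is pure values.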

Bounding result size with a global budget

Once we felt comfortable about the format of a single table, the next challenge was that APL (Axiom’s query language) allows returning multiple tables per query:

http-logs
| where region startswith "us-" and status_code < 400
| summarize count(), histogram(duration, 15) by bin_auto(_time), method, status_code

The query above will produce three tables:

  1. A timeseries table of counts across N buckets, where N is determined automatically from the size of the time range (e.g. 1-minute buckets if the time range is the last 30 minutes).
  2. A timeseries table of the histogram across the same N buckets.
  3. A totals table that summarizes both timeseries per group.

We left the limit off the query above, which means it could return up to 50,000 groups (the API can go even higher, but the MCP server doesn't support that yet). We also used bin_auto, which we'll come back to later; depending on the time range, there could be hundreds or thousands of buckets per timeseries too.

Our goal, then, is to avoid blowing up the user's context while still returning a relevant result set that can be acted upon.

For this, we set a global cell budget: a cap on the maximum number of cells we return per result set. We prioritize totals and summary tables first, as they give the model the most important context for reasoning. Then we distribute the remaining budget evenly across the data tables, trimming rows where necessary. Every trimmed table includes a clear, actionable note:

Showing 100 of 2,340 rows...

This keeps response sizes predictable and makes the server's behavior explicit. The model isn't left guessing; it knows it's seeing a slice and has clear instructions on how to get more.

The cell area (rows × columns) adapts to the shape of the result set, but it is always bounded by the budget's maximum size.
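Here is a rough sketch in Go of how such a budget might be allocated; the table type, the 5,000-cell figure, and the allocation helper are illustrative assumptions, not our actual code.

// A sketch of allocating a global cell budget across result tables.
package main

import "fmt"

type table struct {
	name    string
	rows    int
	cols    int
	summary bool // totals / summary tables get priority
}

// applyBudget decides how many rows of each table to emit so that the total
// number of cells (rows x columns) stays within maxCells.
func applyBudget(tables []table, maxCells int) map[string]int {
	keep := make(map[string]int)
	remaining := maxCells

	// 1. Summary tables get first claim on the budget.
	var data []table
	for _, t := range tables {
		if !t.summary {
			data = append(data, t)
			continue
		}
		rows := min(t.rows, remaining/max(t.cols, 1))
		keep[t.name] = rows
		remaining -= rows * t.cols
	}

	// 2. Split whatever is left evenly across the data tables, trimming rows.
	if len(data) > 0 {
		share := remaining / len(data)
		for _, t := range data {
			keep[t.name] = min(t.rows, share/max(t.cols, 1))
		}
	}
	return keep
}

func main() {
	tables := []table{
		{name: "totals", rows: 40, cols: 6, summary: true},
		{name: "count_series", rows: 2340, cols: 5},
		{name: "histogram_series", rows: 2340, cols: 5},
	}
	// With an illustrative 5,000-cell budget, the totals table is kept in full
	// and each timeseries is trimmed to 476 of its 2,340 rows.
	fmt.Println(applyBudget(tables, 5000))
}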

Choosing the right columns when schemas are wide

A global cell budget is great, but it immediately creates a new question: when a table has thousands of fields but you only have room to show, say, 20, which 20 do you pick? A random or alphabetical selection would be simple, but it would almost certainly hide the most important information.

To solve this, we don't just truncate the data; we intelligently select the most valuable fields using a heuristic scoring system. Our server analyzes each column and gives it a score based on a few key principles:

  • Prioritize the essentials - We start with a priority list of common, high-value field names. Columns like timestamp, service.name, status, and trace_id get an immediate head start because they are fundamental to observability.
  • Look at the data itself -  This is where it gets fun. If we have a sample of rows, we analyze them. Does this column actually contain data, or is it mostly empty? Is the data varied, or is it the same value repeated over and over? Columns with a high "fill rate" and multiple unique values get a significant boost, as they are much more likely to be informative.
  • Favor summaries - We also give extra points to shorter field names and any fields that represent aggregations (like count or avg_duration), as they tend to carry important summary information.

After scoring every field, we simply pick the top N. This intelligent selection means that even when we have to shrink a massive table, the summary the model sees is far more likely to contain the signal instead of the noise, giving it the best possible context to work with.
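Here is a small sketch of what a scoring heuristic along these lines could look like in Go; the weights, the priority list, and the helper names are illustrative, not the exact ones our server uses.

// A sketch of heuristically scoring columns to pick the top N for display.
package main

import (
	"fmt"
	"sort"
	"strings"
)

// priorityFields gives common observability fields an immediate head start.
var priorityFields = map[string]bool{
	"timestamp": true, "_time": true, "service.name": true,
	"status": true, "trace_id": true,
}

// scoreColumn rates one column from its name and a sample of its values.
func scoreColumn(name string, sample []string) int {
	score := 0
	if priorityFields[name] {
		score += 100
	}
	filled, unique := 0, map[string]bool{}
	for _, v := range sample {
		if v != "" {
			filled++
			unique[v] = true
		}
	}
	if len(sample) > 0 && filled*2 >= len(sample) {
		score += 20 // high fill rate: the column actually contains data
	}
	if len(unique) > 1 {
		score += 20 // varied values are more likely to be informative
	}
	if strings.HasPrefix(name, "count") || strings.HasPrefix(name, "avg_") {
		score += 15 // aggregations tend to carry summary information
	}
	return score - len(name)/10 // mild preference for shorter names
}

// topColumns returns the n best-scoring column names.
func topColumns(samples map[string][]string, n int) []string {
	names := make([]string, 0, len(samples))
	for name := range samples {
		names = append(names, name)
	}
	sort.Slice(names, func(i, j int) bool {
		return scoreColumn(names[i], samples[names[i]]) > scoreColumn(names[j], samples[names[j]])
	})
	if n > len(names) {
		n = len(names)
	}
	return names[:n]
}

func main() {
	samples := map[string][]string{
		"_time":               {"12:00:01Z", "12:00:02Z", "12:00:03Z"},
		"status":              {"200", "500", "200"},
		"internal.debug_blob": {"", "", ""},
	}
	fmt.Println(topColumns(samples, 2)) // picks _time and status over the empty column
}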

Cap results at the source, not just in the formatter

Trimming in formatting helps, but it doesn't save you from the overhead of computing and serializing massive results upstream. Why fetch, compute, and serialize a massive result set only to chop it down at the last second? That's wasted work. Instead, we push limits as close to the data source as possible.

This includes obvious things like applying LIMIT clauses in the query itself, but also capping automated features like aggregation bin grouping. We touched on bin_auto earlier, a feature that automatically groups time-series data into reasonable buckets. While the Axiom Console is optimized to render thousands of buckets per chart, sending that many to an MCP client is a recipe for context bloat.

Look at how quickly the payload for a simple three-field histogram grows, both in JSON and CSV formats.

Buckets   JSON bytes   JSON approx. tokens   CSV bytes   CSV approx. tokens
100       3,494        874                   1,109       278
15        508          127                   163         41

The difference is stark. A 100-bucket histogram is over 6x larger than a 15-bucket one.

To help with this, we added support for a new query parameter called maxBinAutoGroups, which won't win any naming awards but does the job of reining in the auto-binner for clients, like our MCP server, that need to control fidelity.

Through testing we found about 15 buckets were enough to initially convey the shape of the distribution without drowning the client. If a task needs higher resolution, the client can always ask for it via bin directly.

This principle extends to time windows, too. We default to a narrow, recent window (e.g., the last hour), which is a great starting point for most operational questions. The client can always widen the window when it needs more history. The mantra is start small, expand on demand.
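A rough sketch of applying these source-side defaults might look like the following; the request struct and the default values are illustrative (only the APL limit operator and the maxBinAutoGroups parameter come from the discussion above).

// A sketch of pushing limits to the source before a query ever runs.
package main

import (
	"fmt"
	"strings"
	"time"
)

// queryRequest is an illustrative stand-in for whatever is sent to the API.
type queryRequest struct {
	APL              string
	StartTime        time.Time
	EndTime          time.Time
	MaxBinAutoGroups int
}

// withSourceCaps applies lean defaults: a row limit in the query itself, a cap
// on auto-binning, and a narrow, recent time window.
func withSourceCaps(apl string) queryRequest {
	if !strings.Contains(apl, "| limit") {
		apl += "\n| limit 100" // cap rows at the source, not in the formatter
	}
	now := time.Now()
	return queryRequest{
		APL:              apl,
		StartTime:        now.Add(-1 * time.Hour), // start small, expand on demand
		EndTime:          now,
		MaxBinAutoGroups: 15, // enough to convey the shape of a distribution
	}
}

func main() {
	req := withSourceCaps(`http-logs
| summarize count() by bin_auto(_time), status_code`)
	fmt.Printf("%+v\n", req)
}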

Keep discovery light and prompts opt-in

Even before a client asks a single question, there's a cost to establishing a connection: discovering the available tools, resources, and prompts. We call this the "idle context" size, and it can get out of hand if you're not careful.

Our server takes a configurable approach - only a minimal core toolset is exposed by default. Specialized tool families (like observability helpers) are available behind a flag. This keeps the initial handshake light and removes potential distractions for the model.
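As a sketch, gating tool families behind flags could look something like this; the registry shape and the individual tool names are made up for illustration (only the core and otel family names appear in the URL examples further down).

// A sketch of exposing only a core toolset unless a family is enabled.
package main

import "fmt"

// toolFamilies maps a family flag to the tools it would register.
var toolFamilies = map[string][]string{
	"core": {"query", "list_datasets", "get_schema"},
	"otel": {"list_services", "find_traces", "service_dependencies"},
}

// enabledTools returns the tools to advertise during discovery, keeping the
// idle context small unless the client opts in to more.
func enabledTools(families []string) []string {
	if len(families) == 0 {
		families = []string{"core"} // lean default
	}
	var tools []string
	for _, f := range families {
		tools = append(tools, toolFamilies[f]...)
	}
	return tools
}

func main() {
	fmt.Println(enabledTools(nil))                      // core only
	fmt.Println(enabledTools([]string{"core", "otel"})) // opt in to observability helpers
}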

We believe this will allow us to experiment more easily with different toolsets (potentially by data shape, use-case, or integration) without bloating up the core value the MCP server brings. Expect more on this soon!

Optionality through the server URL

Our philosophy is simple: defaults should be lean, but full fidelity should always be just one parameter away. We empower clients to choose the right trade-off for their task by embedding options directly in the server URL. This creates discoverable "presets" for different use cases.

Here are a few examples to show the idea:

# expanded result set size for models with large context and great recall
https://mcp.axiom.co/mcp?max_cells=10000

# core tools plus observability tools, prompts, and resources
https://mcp.axiom.co/mcp?tools=core,otel
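A minimal sketch of turning those URL parameters into server configuration might look like this; the config struct and its defaults are illustrative, and only the max_cells and tools parameters come from the examples above.

// A sketch of reading presets from the MCP server URL's query parameters.
package main

import (
	"fmt"
	"net/url"
	"strconv"
	"strings"
)

type serverConfig struct {
	MaxCells int      // global cell budget
	Tools    []string // enabled tool families
}

// configFromURL parses max_cells and tools out of the connection URL, falling
// back to lean defaults when a parameter is absent.
func configFromURL(raw string) (serverConfig, error) {
	u, err := url.Parse(raw)
	if err != nil {
		return serverConfig{}, err
	}
	q := u.Query()

	cfg := serverConfig{MaxCells: 5000, Tools: []string{"core"}} // illustrative defaults
	if v := q.Get("max_cells"); v != "" {
		if n, err := strconv.Atoi(v); err == nil {
			cfg.MaxCells = n
		}
	}
	if v := q.Get("tools"); v != "" {
		cfg.Tools = strings.Split(v, ",")
	}
	return cfg, nil
}

func main() {
	cfg, err := configFromURL("https://mcp.axiom.co/mcp?max_cells=10000&tools=core,otel")
	if err != nil {
		panic(err)
	}
	fmt.Printf("%+v\n", cfg)
}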

Wrapping up

Limitations and trade-offs

No design is without trade-offs, and it's important to be upfront about them.

  • CSV drops types and nesting. This is perfectly fine for most tabular analysis and a natural fit for a column store like our own EventDB, but it's a poorer choice when communicating structure is important.
  • Global budgets create slices. The model sometimes needs to ask for more rows, but we consider that a healthy trade for predictable performance. We keep an eye on this through our own use of the MCP server in our AI features and agents.
  • Bin caps reduce resolution. This could be an issue for certain use cases, but APL's bin allows controlling this directly.
  • Narrow time windows can hide older events. The model can easily request a wider range.

We prefer clear, explicit trade-offs over "magic" that might do the wrong thing. Defaults that survive real-world workloads are always better than clever defaults that fail at scale.

Ideas for the MCP protocol

While we've built a system that works well for us, we think a few additions to the MCP protocol itself could make life easier for everyone and help establish "small-by-default" as the norm.

  • Budget hints: Let servers advertise soft budgets, allowing a client to say, "Give me a response that fits within 2,000 tokens."
  • Paging hints: Let clients work with smaller chunks of data and page through the same result set when necessary.
  • Result provenance: A standard block in the response that lists which caps were applied, making it trivial for a client to know what was omitted and link to full fidelity outputs.

None of these would fundamentally change how MCP works, but they would reduce guesswork and improve the out-of-the-box experience everywhere.


When we set out to build our MCP server, we had an inkling of some of the issues we'd face, but it was still the kind of journey that kept tossing new challenges at us.

Our approach boiled down to being pragmatic and making some judgement calls about how folks are likely to use the server. The proof will be in the pudding, though, so we'll keep listening to users, tweaking, and adjusting. This feels like one of those endeavors with no clear end in sight, so we'll just have to stay vigilant.

We encourage you to try it and would love to hear your feedback. Our goal is to make our MCP server a great citizen inside your LLMs!

