September 5, 2024

#product

Sampling could be a costly mistake


Author
Christina Noren

Advisor

Many organizations only capture and store a fraction of the event data generated by their infrastructure. It’s understandable. The volume of data grew faster than their logging tools’ capability to handle it all, and the cost skyrocketed. The only option was to sample. People who made that decision on behalf of their company can’t be blamed.

The good news is now there’s a choice. Try Axiom today for free and you’ll likely find that you can stop sampling and reduce your logging budget. Axiom was founded on the principle that organizations — and their engineers — should always be able to see and query 100% of their event data, at a fraction of what it would cost to deploy previous technologies.

Sampling to slash the cost of logging may make sense for a special class of businesses with a very large number of customers whose experiences (at least from an infrastructure event perspective) are largely identical. Facebook doesn't need to log every event generated by every user, because a problem for one user is almost always a problem for millions. And if one user's session fails, little is lost compared to, say, a retailer losing a payment.

But for most companies, the money saved by throwing away information won’t matter if unavailable log data causes much more expensive problems.

Lost observability

As the number of ways customers interact with the infrastructure multiplies, the associated events take on what's called high cardinality: the individual fields in events generated by different customers can take on more and more distinct values. Some customers are using the company's app in English on an iPhone, some in Russian on an Android Xiaomi, and so forth. Not only will their apps have different User-Agent strings; one app might also trigger infrastructure events that the other doesn't. Sampling could lose Xiaomi-generated events that indicate a problem Americans aren't having.

This is a simple and obvious example for illustration. The real threat is that sampling will skip esoteric edge cases that engineers may not even realize exist.
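To make the risk concrete, here's a minimal sketch in plain Python, with hypothetical numbers, of how a uniform 1% sample can erase a rare class of events entirely:

```python
import random

random.seed(42)

# Hypothetical event stream: 10,000 events, of which only 20 come from a
# rare client class (say, one device/locale combination) that is failing.
events = [{"source": "common", "error": False} for _ in range(9980)]
events += [{"source": "rare", "error": True} for _ in range(20)]
random.shuffle(events)

# A typical 1% uniform sample: keep each event with probability 0.01.
sample = [e for e in events if random.random() < 0.01]

rare_kept = sum(1 for e in sample if e["source"] == "rare")
print(f"Sampled {len(sample)} of {len(events)} events; "
      f"kept {rare_kept} of 20 rare error events")
```

With only 20 rare error events in a stream of 10,000, a 1% uniform sample keeps a fraction of one of them on average, so on most runs the failing device class simply vanishes from the logs.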

Holes in security

Cyberattacks rarely come in through obvious doors. The best (or perhaps worst) hackers might spend years on low-and-slow attacks. They connect through an obscure access point. They slowly defeat one barrier or remove one safeguard at a time to avoid setting off alarms. Sampling can throw away the events their subtle and infrequent moves generate. In fact, they may be counting on that, attacking through methods that generate events they know are frequently sampled out and never seen or analyzed. After a breach, sampled events can also fail to contain the important evidence of each step in the attack and each possible exfiltration of data, making it impossible to reconstitute the attacker’s complete trail with certainty.

Stunted AI training

It’s the basic nature of machine learning and large language models that the more fine-grained the data used to train them, the more sophisticated, accurate, and insightful their mechanical logic will be. Every drop counts when building out an LLM or a RAG pipeline. Most organizations will begin using infrastructure event data to train in-house AI systems soon, if not already. That’s a good reason to stop sampling today. When the training begins, the difference between success and failure may come not only from the totality of new incoming data, but from the completeness and detail of historical archives. Those who have the foresight to start now will have an advantage over their competitors.

Full logs as part of your product

A growing number of B2B SaaS companies offer products in which logging for each customer is a core component. Some, like our customer Hapn, track their customers’ assets — in Hapn’s case, real-world treasures from dump trucks to pets fitted with a wireless Hapn tracker. Beyond the records of all transactions with each tracker, Hapn can also provide all related infrastructure logs to prove to customers that its systems are working smoothly: if a tractor’s daily report doesn’t appear, there’s definitely a problem out in the field.

A more typical emerging example is Subkeys. Many new startups serve as a proxy between their client companies and those clients’ human customers, outsourcing an intricate, critical service — in Subkeys’ case, API key generation, management, monitoring, and reporting. Here again, Subkeys makes 100% of all logs related to each client available, so clients never have to wonder whether something is happening that they can’t see. Full logging lets clients feel as fully aware as if they hosted the API key system themselves.

Axiom lets you keep every event

Axiom’s developers optimized, and continue to improve, every single step of ingesting, storing, and querying events to use as few bytes and compute cycles as possible. In fact, Axiom uses no off-the-shelf components such as Kafka routing or Parquet data storage. Instead, it uses custom algorithms and outperforms even Parquet at optimizing real-world event datastores.

To remove both the risk and the worry of logging 100% of everything on your network, we’ve kept our pricing both low and simple. Customers only pay for the amount of event data that Axiom ingests, and they only pay for each event once. After that, every stored event is available to them as often as they want, without additional fees.

Sampling isn't a best practice. It evolved as a coping mechanism for old software architectures and old business models that couldn't keep up. The still-rising tide of infrastructure data flooded their antiquated tanks. It's wrong to call it a tsunami: it's a rich, growing, vibrant ocean of data where new forms of information and insight flourish, evolve, and proliferate. No engineer, in their bones, really wants to delete a single bit of data; they might find a use for it later. The good news is that now you can keep it all.

Interested in a trial of Axiom? Please reach out to our team with any questions via email, on X, or through our community Discord.
