June 29, 2023

#engineering, #product

Observability: A brilliant idea whose name has been hijacked

Software marketers have diluted the concept of Observability, a property of systems introduced in 1960 by engineer, mathematician and inventor Rudolf E. Kálmán to improve control systems for aerospace guidance, industrial processes, electrical power, and robotics. Kálmán defined, in mathematical terms, the ability to infer the internal state of a system from knowledge of its external outputs. As distributed software systems emerged in the past decade, deep thinkers including Charity Majors at Honeycomb applied Kálmán’s theory to modern SaaS and microservices infrastructure.

The driver for their focus was the widespread adoption of Agile and DevOps methodologies. DevOps has given software developers new responsibility for production, rather than letting them throw releases over the wall to Ops. The spread of this Shift Left movement coincided with increasingly complex, unpredictable software systems and a growing number and breadth of users. Together, these obliterated developers’ former confidence that they could anticipate and catch all potential problems before deploying their software to production. They might not know, for example, that there’s a problem with users on Android phones who have Cyrillic characters in their account names.

To enable continuous problem-solving in live software systems, developers needed tools designed with an Observability mindset. Tools like Honeycomb, Dynatrace, Lightstep, and, yes, Axiom are built to empower developers to find and quantify the impact of unexpected patterns and problems in their systems’ behavior. They do so by analyzing the external outputs of software systems — logs, metrics, and traces — from which the internal states of all system components can be inferred.

You can’t observe what you don’t have

Today, though, the word Observability has been hijacked by makers of legacy monitoring and logging tools that can’t affordably process all logs, metrics, and traces. They offer simplistic dashboards and monitoring for already-expected problems, with limited ability to visualize unexpected anomalies under real-world loads. As a result, developers resort to filtering and sampling their systems’ signals. They make the best decisions they can about which dimensions won’t matter, and which have values they can spot without seeing every record.

When you filter or sample on the assumption that you can predict which information you’ll need, you regress, perhaps unwittingly, to the old mindset that you can know ahead of time which problems you’ll have to debug. Yet, as every developer knows, you can’t plan for the unexpected.

Sampling thwarts Observability

Developers who apply sampling or filtering in Splunk, Datadog, New Relic or other tools to keep their budgets down literally don’t know what they are missing. Calling this Observability is a misuse of the word by companies willing to truncate the truth to fit what they have to offer. They’ve successfully compromised the concept.
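
To make the cost of sampling concrete, here is a minimal sketch in Python. Everything in it is an assumption for illustration: the event fields (`platform`, `name_script`, `status`), the traffic volumes, and the 1% sample rate are hypothetical, and the uniform random sampler is a stand-in, not a description of how any tool named above actually samples. It builds 100,000 synthetic events in which only 20 fail, all from Android users with Cyrillic account names, then counts failures by segment on the full data and on a sample.

```python
import random
from collections import Counter


def make_events(total=100_000, rare_every=5_000):
    """Build synthetic request events (hypothetical schema).

    One event in every `rare_every` is a failing request from an
    Android user with a Cyrillic account name: 20 out of 100,000.
    """
    events = []
    for i in range(total):
        rare = i % rare_every == 37
        events.append({
            "platform": "android" if (rare or i % 2) else "ios",
            "name_script": "cyrillic" if rare else "latin",
            "status": 500 if rare else 200,
        })
    return events


def failures_by_segment(events):
    """Count server errors per (platform, name_script) pair."""
    return Counter(
        (e["platform"], e["name_script"])
        for e in events
        if e["status"] >= 500
    )


def sample(events, rate, seed):
    """Keep each event independently with probability `rate`."""
    rng = random.Random(seed)
    return [e for e in events if rng.random() < rate]


SAMPLE_RATE = 0.01  # assumed rate, chosen only for illustration

events = make_events()
full = failures_by_segment(events)
sampled = failures_by_segment(sample(events, SAMPLE_RATE, seed=0))

# Probability that a uniform sample keeps none of the failing events.
miss_probability = (1 - SAMPLE_RATE) ** sum(full.values())

print("failures in full data:   ", dict(full))
print("failures in sampled data:", dict(sampled))
print(f"chance the sample misses every failure: {miss_probability:.1%}")
```

With 20 failing events and a 1% sample, the chance that the sample contains none of them is 0.99^20, or roughly 82%. Most of the time the sampled view shows nothing wrong for that segment, while the full data shows all 20 failures.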

A core motivation for Axiom’s founding was to create a logging system that could affordably centralize, manage and analyze 100% of event data from all software in all systems. For Observability, Axiom provides a good start with an easy learning curve to query and chart all your data on any dimension. We make it easy to route all your event data into one place: We have APIs, SDKs for Node, Go, Rust, and Python, a data source plugin for Grafana, and a growing number of other connection enablers.

Meanwhile, if it isn’t really “all your data, all the time,” it can’t deliver the Observability that developers need as software becomes ever more complex. If you’re at the forefront of software’s new frontiers, you deserve to see the full picture.

In search of true Observability? Talk to us today!

Our new pricing starts as low as $25 per month, free for personal projects. No surprise bills, ever. Contact us today to get started: sales@axiom.co
