Case Study


Axiom lets Salad’s distributed AI computing customers see and use their own logs


Featured
Cory Rieth

Senior Tech Product Manager, Salad

Shawn Rushefsky

Senior Generative AI Solutions Architect, Salad


Salad is the world’s largest distributed cloud, used by AI/ML companies for inference and training and by other large-scale computing clients.

“We were blind, and now with Axiom we can see.”

Cory Rieth

Senior Tech Product Manager

Takeaways

  • Axiom lets Salad build a platform to serve themselves and their customers from a single source of truth with a simple cost structure that doesn’t penalize Salad’s sprawling distributed network.
  • Salad engineers have visibility into the tens of thousands of home PC nodes running at once, so they can be proactive about problems rather than waiting for customers to report them.
  • They serve each customer’s events out to the Salad UI through Axiom’s query API, providing observability and monitoring automatically to each customer.
  • Customers who want to do their own deeper analysis can load their events into their own Axiom instance or other tools.

About Salad

Salad is the world’s largest distributed cloud, with more than 1 million PCs with AI-enabled GPUs on the network. They offer this capacity to computing customers as an affordable, easily scalable alternative cloud with the lowest GPU prices on the market. Their infrastructure is well suited to large-scale AI/ML operations like inference, providing a cost-efficient alternative to expensive hyperscalers and hard-to-get, enterprise-grade GPUs like the Nvidia A100.

They are also an environmentally friendly cloud. By using individual PCs instead of data centers, they avoid burdening any one region with the water consumed for electricity generation and cooling. They don’t call themselves an Airbnb for GPUs, but they’re OK when someone else does. Shawn Rushefsky, Senior Generative AI Solutions Architect, says, “You can run many applications on an RTX 4090 GPU in a home PC and get 98% of the performance on an A100 in a data center, all for one-tenth of the cost.”

Today, companies like Civitai and Pareto AI use SaladCloud to run workloads like AI image generation, voice AI, large language models (LLMs), and more at a low cost. For example, AI transcription of 1 million YouTube videos can be done via Salad at almost 90% less cost than traditional cloud options. And one use case in which Salad stands nearly alone in the market is data collection and annotation. Websites and Internet infrastructure providers have identified and blocked the IP addresses of large server farms used to gather data from sites. Salad’s GPUs, by contrast, sit on individual PCs with residential IP addresses, which websites and networks are extremely wary of blocking. This helps companies collect and annotate valuable data easily.

The challenge: A million home PCs that provide distributed AI computing

Salad’s customers containerize their workloads for deployment and request the number of GPUs they need. Salad manages the rest, running the workloads on individual GPUs around the world. This approach is scalable, efficient, and affordable for customers, but their unique configuration prevents most logging products from meeting their needs.

The team cites several reasons:

  • With other tools, customers had to do some of the setup on their own nodes to enable event data collection, leaving many nodes silent.
  • Having Salad engineers connect to customers’ Docker containers to do the setup posed obvious security risks.
  • Customers are often newer to development and run into problems with their containers, which are difficult to debug without good visibility.
  • Most logging products charge per node or per seat, which becomes wildly disproportionate for Salad’s tens of thousands of individual nodes, each in a separate location.
  • Trying to fit other tools into their budget required sampling the data, which deprives engineers of the observability they need to be proactive, or even reactive, about customer troubles.

Cory Rieth, Senior Technical Product Manager, wants every event generated. “Some customers are using Salad for the first time, or even using docker containers for the first time,” he says. “That creates some fun situations that are difficult to debug without visibility. We were blind, and now with Axiom we can see.”

The solution: A platform for both Salad and their customers

“We set up Axiom so we could troubleshoot our customers, and that evolved into, ‘What if they didn't have to ask us about it?’”

Shawn Rushefsky

Senior Generative AI Solutions Architect

Building robust logging and observability solutions is a complex and expensive undertaking. The build-vs-buy decision was quick for Salad engineers. Shawn concludes, “I recommend that basically nobody should build their own logging solution unless they want to be a logging provider.”

Axiom has enabled Salad to avoid this burden and focus on their core business, while still providing a high-quality experience for their customers.

Axiom is both flexible and performant, and it doesn’t bill per node or by other metrics that a network of home PCs would send sky-high. Salad pays only for the data Axiom ingests, and it pays for that only once. There are no further downstream fees for storage, retention time, or exporting data from Axiom to other destinations.

Three use cases, one datastore

To start, events from all customer containers are sent to Salad’s Axiom organization, so Salad’s software engineers can understand what’s happening for their customers rather than relying on customers to report issues.

Second, they provide customers with a frictionless, reliable way to see their Salad logs. Logging is enabled for every customer by default. The logs are sent to Salad’s Axiom instance, from which Salad’s UI retrieves and renders them via Axiom’s query API.

└ Salad’s UI displays the logs from the user’s own containers, pulled via the Axiom API.
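
For a rough sense of how that pattern can work (this is an illustrative sketch, not Salad’s actual code), a UI backend could fetch a container group’s recent logs through Axiom’s APL query endpoint. The dataset name, field names, and time window below are hypothetical:

```typescript
// Illustrative sketch: fetch a container group's recent logs from Axiom's
// APL query endpoint so a UI can render them. The dataset name, field names,
// and response handling are hypothetical, not Salad's actual schema.
const AXIOM_QUERY_URL = "https://api.axiom.co/v1/datasets/_apl?format=legacy";

export async function fetchContainerLogs(
  containerGroupId: string,
  apiToken: string
): Promise<unknown[]> {
  // APL query against a hypothetical "container-logs" dataset.
  const apl = `['container-logs']
    | where container_group_id == "${containerGroupId}"
    | sort by _time desc
    | limit 100`;

  const res = await fetch(AXIOM_QUERY_URL, {
    method: "POST",
    headers: {
      Authorization: `Bearer ${apiToken}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify({
      apl,
      // Query the last hour of events.
      startTime: new Date(Date.now() - 60 * 60 * 1000).toISOString(),
      endTime: new Date().toISOString(),
    }),
  });

  if (!res.ok) {
    throw new Error(`Axiom query failed: ${res.status} ${await res.text()}`);
  }

  const data = await res.json();
  // In the legacy response format, matching events are returned under `matches`.
  return data.matches ?? [];
}
```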

A third option lets more advanced users go beyond Salad’s default log display. They can ship the events their containers generate, collected in Salad’s Axiom, to their own Axiom instance (or another tool) for deeper analytics.

└ Advanced Salad users can ship their containers’ events via the Axiom API to their own Axiom instance or another tool for analysis.
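
Again as an illustrative sketch (the dataset name, event shape, and token handling are placeholders, not Salad’s implementation), forwarding a batch of events to a customer’s own Axiom dataset could go through Axiom’s ingest endpoint:

```typescript
// Illustrative sketch: ship a batch of container events to a customer's own
// Axiom dataset via the ingest endpoint. Event fields are placeholders.
type ContainerEvent = {
  _time: string; // ISO 8601 timestamp
  container_group_id: string;
  message: string;
};

export async function shipEventsToCustomerAxiom(
  events: ContainerEvent[],
  customerDataset: string,
  customerApiToken: string
): Promise<void> {
  const url = `https://api.axiom.co/v1/datasets/${customerDataset}/ingest`;

  const res = await fetch(url, {
    method: "POST",
    headers: {
      Authorization: `Bearer ${customerApiToken}`,
      // The ingest endpoint accepts a JSON array of event objects.
      "Content-Type": "application/json",
    },
    body: JSON.stringify(events),
  });

  if (!res.ok) {
    throw new Error(`Axiom ingest failed: ${res.status} ${await res.text()}`);
  }
}
```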

“The absence of negative feedback is the biggest feedback.”

Shawn Rushefsky

Senior Generative AI Solutions Architect

Cory details the night-and-day difference in incident detection and response Axiom provides: “There'll be a customer who’s trying to get a container group up, and I can see that it’s not getting up and running. I can quickly look and see they’re misconfigured or they’re missing some library in their docker image. I can set up monitoring tools to see if they fix the issue over time.”

What, then, do customers contact Salad to complain about? Shawn says customers often ask for next-level features, but rarely gripe about what Salad already provides them through Axiom: “The absence of negative feedback is the biggest feedback.”


Interested in learning more about Axiom?

Sign up for free or contact us at sales@axiom.co to talk with one of the team about our enterprise plans.
