Migrating from Sumo Logic Query Language to Axiom Procssing Langauge (APL)

Introduction

In the sphere of data analytics and log management, being able to query data efficiently and effectively is of paramount importance.

This guide dives into why APL could be a superior choice for your data needs, the differences between Sumo Logic and APL, and the potential benefits you could reap from migrating from Sumo Logic to APL. Let's explore the compelling case for APL as a robust, powerful tool for handling your complex data querying requirements.

APL is powerful and flexible and uses a pipe (|) operator for chaining commands, and it provides a richer set of functions and operators for more complex queries.

Benefits of Migrating from SumoLogic to APL

  • Scalability and Performance: APL was built with scalability in mind. It handles very large volumes of data more efficiently and provides quicker query execution compared to Sumo Logic, making it a suitable choice for organizations with extensive data requirements. APL is designed for high-speed data ingestion, real-time analytics, and providing insights across structured, semi-structured data. It's also optimized for time-series data analysis, making it highly efficient for log and telemetry data.

  • Advanced Analytics Capabilities: With APL’s support for aggregation and conversion functions and more advanced statistical visualization, organizations can derive more sophisticated insights from their data.

Query Examples

Let's see some examples of how to convert SumoLogic queries to APL.

Parse, and Extract Operators

Extract from and to fields. For example, if a raw event contains From: Jane To: John, then from=Jane and to=John.

Sumo Logic:

* | parse "From: * To: *" as (from, to)

APL:

['sample-http-logs']
| extend (method) == extract("From: (.*?) To: (.*)", 1, method)

Extract Source IP with Regex

In this section, we will utilize a regular expression to identify the four octets of an IP address. This will help us efficiently extract the source IP addresses from the data.

Sumo Logic:

*| parse regex "(\<src_i\>\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})"

APL:

['sample-http-logs']
| extend ip = extract("(\\d{1,3}\\.\\d{1,3}\\.\\d{1,3}\\.\\d{1,3})", 1, "23.45.67.90")

Extract Visited URLs

This section focuses on identifying all URL addresses visited and extracting them to populate the "url" field. This method provides an organized way to track user activity using APL.

Sumo Logic:

_sourceCategory=apache 
| parse "GET * " as url

APL:

['sample-http-logs']
| where method == "GET"
| project url = extract(@"(\w+)", 1, method)

Extract Data from Source Category Traffic

This section aims to identify and analyze traffic originating from the Source Category. We will extract critical information including the source addresses, the sizes of messages transmitted, and the URLs visited, providing valuable insights into the nature of the traffic using APL.

Sumo Logic:

_sourceCategory=apache
| parse "* " as src_IP
| parse " 200 * " as size
| parse "GET * " as url

APL:

['sample-http-logs']
| extend src_IP = extract("^(\\S+)", 0, uri)
| extend size = extract("^(\\S+)", 1, status)
| extend url = extract("^(\\S+)", 1, method)

Calculate Bytes Transferred per Source IP

In this part, we will compute the total number of bytes transferred to each source IP address. This will allow us to gauge the data volume associated with each source using APL.

Sumo Logic:

_sourceCategory=apache 
| parse "* " as src_IP 
| parse " 200 * " as size 
| count, sum(size) by src_IP

APL:

['sample-http-logs']
| extend src_IP = extract("^(\\S+)", 1, uri)
| extend size = toint(extract("200", 0, status))
| summarize count(), sum(size) by src_IP

Compute Average HTTP Response Size

In this section, we will calculate the average size of all successful HTTP responses. This metric helps us to understand the typical data load associated with successful server responses.

Sumo Logic:

_sourceCategory=apache 
| parse " 200 * " as size 
| avg(size)

APL:

Get the average value from a string:

['sample-http-logs']
| extend number = todouble(extract("\\d+(\\.\\d+)?", 0, status))
| summarize Average = avg(number)

Extract Data with Missing Size Field (NoDrop)

This section focuses on extracting key parameters like src, size, and URL, even when the size field may be absent from the log message.

Sumo Logic:

_sourceCategory=apache 
| parse "* " as src_IP 
| parse " 200 * " as size nodrop 
| parse "GET * " as url

APL:

['sample-http-logs']
| where content_type == "text/css"
| extend src_IP = extract("^(\\S+)", 1, ['id'])
| extend size = toint(extract("(\\w+)", 1, status))
| extend url = extract("GET", 0, method)

Count URL Visits

This section is dedicated to identifying the frequency of visits to a specific URL. By counting these occurrences, we can gain insights into website popularity and user behavior.

Sumo Logic:

_sourceCategory=apache 
| parse "GET * " as url 
| count by url

APL:

['sample-http-logs']
| extend url = extract("^(\\S+)", 1, method)
| summarize Count = count() by url

Page Count by Source IP

In this section, we will identify the total number of pages associated with each source IP address. This analysis will allow us to understand the volume of content generated or hosted by each source.

Sumo Logic:

_sourceCategory=apache 
| parse "* -" as src_ip 
| count by src_ip

APL:

['sample-http-logs']
| extend src_ip = extract(".*", 0,  ['id'])
| summarize Count = count() by src_ip

Reorder Pages by Load Frequency

We aim to identify the total number of pages per source IP address in this section. Following this, the pages will be reordered based on the frequency of loads, which will provide insights into the most accessed content.

Sumo Logic:

_sourceCategory=apache 
| parse "* " as src_ip 
| parse "GET * " as url 
| count by url 
| sort by _count

APL:

['sample-http-logs']
| extend src_ip = extract(".*", 0, ['id'])
| extend url = extract("(GET)", 1, method)
| where isnotnull(url)
| summarize _count = count() by url, src_ip
| order by _count desc

Identify the top 10 requested pages.

Sumo Logic:

* | parse "GET * " as url 
| count by url 
| top 10 url by _count

APL:

['sample-http-logs']
| where method == "GET"
| top 10 by method desc

Top 10 IPs by Bandwidth Usage

In this section, we aim to identify the top 10 source IP addresses based on their bandwidth consumption.

Sumo Logic:

_sourceCategory=apache 
| parse " 200 * " as size 
| parse "* -" as src_ip 
| sum(size) as total_bytes by src_ip 
| top 10 src_ip by total_bytes

APL:

['sample-http-logs']
| extend size = req_duration_ms
| summarize total_bytes = sum(size) by ['id']
| top 10 by total_bytes desc

Top 6 IPs by Number of Hits

This section focuses on identifying the top six source IP addresses according to the number of hits they generate. This will provide insight into the most frequently accessed or active sources in the network.

Sumo Logic

_sourceCategory=apache 
| parse "* -" as src_ip 
| count by src_ip 
| top 100 src_ip by _count

APL:

['sample-http-logs']
| extend src_ip = extract("^(\\S+)", 1, user_agent)
| summarize _count = count() by src_ip
| top 6 by _count desc

Timeslice and Transpose

For the Source Category "apache", count by status_code and timeslice of 1 hour.

Sumo Logic:

_sourceCategory=apache*
| parse "HTTP/1.1\" * * \"" as (status_code, size)
| timeslice 1h
| count by _timeslice, status_code

APL:

['sample-http-logs']
| extend status_code = extract("^(\\S+)", 1, method)
| where status_code == "POST"
| summarize count() by status_code, bin(_time, 1h)

Hourly Status Code Count for "Text" Source

In this section, We aim to count instances by status_code, grouped into one-hour timeslices, and then transpose status_code to column format. This will help us understand the frequency and timing of different status codes.

Sumo Logic:

_sourceCategory=text*
| parse "HTTP/1.1\" * * \"" as (status_code, size)
| timeslice 1h
| count by _timeslice, status_code
| transpose row _timeslice column status_code

APL:

['sample-http-logs']
| where content_type startswith 'text/css'
| extend status_code= status
| summarize count() by bin(_time, 1h), content_type, status_code

Status Code Count in 5 Time Buckets

In this example, we will perform a count by 'status_code', sliced into five time buckets across the search results. This will help analyze the distribution and frequency of status codes over specific time intervals.

Sumo Logic:

_sourceCategory=apache*
| parse "HTTP/1.1\" * * \"" as (status_code, size)
| timeslice 5 buckets
| count by _timeslice, status_code

APL:

['sample-http-logs']
| where content_type  startswith 'text/css'
| extend p=("HTTP/1.1\" * * \""), tostring( is_tls)
| extend status_code= status
| summarize count() by bin(_time, 12m), status_code

Grouped Status Code Count

In this example, we will count messages by status code categories. We will group all messages with status codes in the 200s, 300s, 400s, and 500s together, we are also groupint the method requests with the GET, POST, PUT, DELETE attributes. This will provide an overview of the response status distribution.

Sumo Logic:

_sourceCategory=Apache/Access
| timeslice 15m
| if (status_code matches "20*",1,0) as resp_200
| if (status_code matches "30*",1,0) as resp_300
| if (status_code matches "40*",1,0) as resp_400
| if (status_code matches "50*",1,0) as resp_500
| if (!(status_code matches "20*" or status_code matches "30*" or status_code matches "40*" or status_code matches "50*"),1,0) as resp_others
| count(*), sum(resp_200) as tot_200, sum(resp_300) as tot_300, sum(resp_400) as tot_400, sum(resp_500) as tot_500, sum(resp_others) as tot_others by _timeslice

APL:

['sample-http-logs']
| extend MethodCategory = case(
   method == "GET", "GET Requests",
   method == "POST", "POST Requests",
   method == "PUT", "PUT Requests",
   method == "DELETE", "DELETE Requests",
   "Other Methods")
| extend StatusCodeCategory = case(
   status startswith "2", "Success",
   status startswith "3", "Redirection",
   status startswith "4", "Client Error",
   status startswith "5", "Server Error",
   "Unknown Status")
| extend ContentTypeCategory = case(
   content_type == "text/csv", "CSV",
   content_type == "application/json", "JSON",
   content_type == "text/html", "HTML",
   "Other Types")
| summarize Count=count() by bin_auto(_time), StatusCodeCategory, MethodCategory, ContentTypeCategory

Conditional Operators

For the Source Category "apache", find all messages with a client error status code (40*):

Sumo Logic:

_sourceCategory=apache*
| parse "HTTP/1.1\" * * \"" as (status_code, size)
| where status_code matches "40*"

APL:

['sample-http-logs']
| where content_type startswith 'text/css'
| extend p = ("HTTP/1.1\" * * \"")
| where status == "200"

Browser-based Hit Count

In this query example, we aim to count the number of hits by browser. This analysis will provide insights into the different browsers used to access the source and their respective frequencies.

Sumo Logic:

_sourceCategory=Apache/Access
| extract "\"[A-Z]+ \S+ HTTP/[\d\.]+\" \S+ \S+ \S+ \"(?<agent>[^\"]+?)\""
| if (agent matches "*MSIE*",1,0) as ie
| if (agent matches "*Firefox*",1,0) as firefox
| if (agent matches "*Safari*",1,0) as safari
| if (agent matches "*Chrome*",1,0) as chrome
| sum(ie) as ie, sum(firefox) as firefox, sum(safari) as safari, sum(chrome) as chrome

APL:

['sample-http-logs']
| extend ie = case(tolower(user_agent) contains "msie", 1, 0)
| extend firefox = case(tolower(user_agent) contains "firefox", 1, 0)
| extend safari = case(tolower(user_agent) contains "safari", 1, 0)
| extend chrome = case(tolower(user_agent) contains "chrome", 1, 0)
| summarize data = sum(ie), lima = sum(firefox), lo = sum(safari), ce = sum(chrome)

Use the where operator to match only weekend days.

Sumo Logic:

* | parse "day=*:" as day_of_week
| where day_of_week in ("Saturday","Sunday")

APL:

['sample-http-logs']
| extend day_of_week = dayofweek(_time)
| where day_of_week == 1 or day_of_week == 0

Extract Numeric Version Numbers

In this section, we will identify version numbers that match numeric values 2, 3, or 1. We will utilize the num operator to convert these strings into numerical format, facilitating easier analysis and comparison.

Sumo Logic:

* | parse "Version=*." as number | num(number)
| where number in (2,3,6)

APL:

['sample-http-logs']
| extend p= (req_duration_ms)
| extend number=toint(p)
| where number in (2,3,6)

Making the Leap: Transform Your Data Analytics with APL

As we've navigated through the process of migrating from Sumo Logic to APL, we hope you've found the insights valuable. The powerful capabilities of Axiom Processing Lnaguage are now within your reach, ready to empower your data analytics journey.

Ready to take the next step in your data analytics journey? Dive deeper into APL and discover how it can unlock even more potential in your data. Check out our APL learning resources and tutorials to become proficient in APL, and join our community forums to engage with other APL users. Together, we can redefine what's possible in data analytics. Remember, the migration to APL is not just a change, it's an upgrade. Embrace the change, because better data analytics await you.

Begin your APL journey today!

Was this page helpful?