There are countless blogs, articles, and Splunk ‘answers’ regarding the optimization of Splunk queries (and here’s another one). In this article, I’d like to share a few tips that I’ve found consistently improve query performance. The tips are listed in the order in which they are applied within a search.
1. Minimize the number of trips to the indexers
Namely, avoid the subsearches that come with ‘join’ and ‘append’. While the ‘join’ and ‘append’ commands are widely used and familiar to most of us, they are not necessarily the most efficient commands. Why is this the case? A few problems include:
- Both commands make use of a subsearch (the stuff between the square brackets). Every use of these commands adds another trip to the indexers, along with all of the communication and overhead that trip involves.
- Subsearches have limitations. By default, they have a timeout of 60s and a limit of 50000 events (see subsearch_maxtime and subsearch_maxout in limits.conf). When either limit is hit, the subsearch results are truncated, which leads to incorrect answers. This can easily go unnoticed, so pay attention to the messages that are returned when you use these commands.
What’s the solution? The above problems can be mitigated by combining your subsearch with your primary search and accomplishing the ‘join’ with the use of a stats command. An example of this is shown below.
Using join (before)
index=_internal sourcetype=splunkd component=Metrics
| stats count as metric_count by host
| join type=left host
    [search index=_audit sourcetype=audittrail
    | stats count as audit_count by host]
| table host metric_count audit_count
Using stats (after)
(index=_internal sourcetype=splunkd component=Metrics) OR (index=_audit sourcetype=audittrail)
| stats count(eval(sourcetype="splunkd")) as metric_count count(eval(sourcetype="audittrail")) as audit_count by host
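The technique used above can also replace the ‘append’ command. Note that the stats version returns every host found in either data set, whereas the left join keeps only hosts from the primary search; add a filter such as | where metric_count > 0 if you need that behavior. In general, I rely on the stats command to avoid the use of the following commands: dedup, table, join, append.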
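As a minimal sketch of the ‘append’ case, assume you want a per-host event count for each of the two sourcetypes from the join example. The field name event_count is illustrative and not part of the original searches. The first query uses append and therefore a subsearch:

```spl
index=_internal sourcetype=splunkd component=Metrics
| stats count as event_count by host
| append
    [search index=_audit sourcetype=audittrail
    | stats count as event_count by host]
```

The second query pulls both data sets in one base search and lets stats split them by sourcetype:

```spl
(index=_internal sourcetype=splunkd component=Metrics) OR (index=_audit sourcetype=audittrail)
| stats count as event_count by host sourcetype
```

The rewritten search returns the same per-host counts with an extra sourcetype column identifying the source of each row, and it is not subject to the subsearch limits.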
2. Minimize the amount of data coming back from the indexers
Another item mentioned in many articles is to filter your data early in order to lower the number of events returned. While this cuts down on the number of events (vertical), there can also be substantial benefits to limiting the number of fields that are retrieved (horizontal).
By using the ‘fields’ streaming command early in your SPL, you lower not only the sheer amount of data that is pulled from the indexers, but also the amount that has to be transferred to, and processed by, the search head.
Where possible, I’ve made it a habit to use the fields command right after the first pipe of my SPL.
| fields <field list>
| fields - _raw
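To make this concrete, here is a hedged sketch of a search that trims fields immediately after the base search. It assumes the per_sourcetype_thruput metrics events in _internal, which normally carry series and kb fields; substitute whichever fields your own search actually needs.

```spl
index=_internal sourcetype=splunkd component=Metrics group=per_sourcetype_thruput
| fields host series kb
| stats sum(kb) as total_kb by host series
```

Because the fields command keeps the internal fields _raw and _time by default, a separate | fields - _raw is what actually drops the raw event text when later commands don’t need it.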
A sample job that I created showed the following improvements when simply limiting the number of fields early within the query:
| Query | # of Fields | Disk Usage | Events | Time Spent |
|---|---|---|---|---|
| Query without use of fields | 155 | 8458240 | 498478 | 166s |
| Query with use of fields | 18 | 5681152 | 498478 | 103s |
3. Perform calculations on the smallest amount of data
Hold off on calculations that use commands such as eval, lookup, and foreach until you have a succinct data set that’s been culled by the steps above. Combine commands where possible. For example, I combine all eval statements into one comma-delimited eval statement.
Separate eval statements (before)
| eval var1="value1"
| eval var2="value2"
| eval var3="value3"
Combined eval statement (after)
| eval var1="value1", var2="value2", var3="value3"
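A combined eval also works when one expression depends on another, because the expressions are processed left to right and later ones can reference fields created earlier in the same statement. The sketch below uses makeresults to generate a single dummy event so it can be run anywhere; the field names and values are made up for illustration.

```spl
| makeresults
| eval total_kb=2048, total_mb=round(total_kb/1024, 2), total_gb=round(total_mb/1024, 4)
```

Here total_mb is computed from total_kb, and total_gb from total_mb, all within one eval.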
4. Use non-streaming commands as late in the query as possible
Save non-streaming, transforming commands for last. These are the commands that actually produce the answers you’re looking for, such as stats, chart, and timechart.
In summary, the basic structure of my queries follows a format similar to the one below:
- Base query: base query
- Minimize data: | fields <list of fields>
- Combine/Summarize data: | use of stats for join/append/summarizations
- Execute calculations: | eval, lookup, etc.
- Format the data: | stats, chart, timechart, etc.
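Putting the steps together, a query that follows this structure might look like the sketch below. It reuses the indexes from the join example; the kb field (assumed to be present on per_sourcetype_thruput metrics events) and the output field names are assumptions chosen for illustration.

```spl
(index=_internal sourcetype=splunkd component=Metrics group=per_sourcetype_thruput) OR (index=_audit sourcetype=audittrail)
| fields host sourcetype kb
| stats sum(kb) as total_kb count(eval(sourcetype="audittrail")) as audit_count by host
| eval total_mb=round(total_kb/1024, 2)
| fields host total_mb audit_count
| sort - total_mb
```

The base search makes a single trip to the indexers, fields trims the data before it reaches the search head, stats does the combining that a join would otherwise do, and the eval runs only on the handful of summarized rows.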
As we all know, every query and requirement is different, and the thoughts above aren’t strict rules, but rather guidelines I’ve found helpful.