We’ve all been bitten by simple omissions made while laying the foundation for a Splunk environment. You’re winding down your evening, for example, when the call comes in that Splunk is down.
You race to troubleshoot the problem, only to discover it could have been prevented during your initial Splunk deployment. With that in mind, here are 3 Splunk best practices that will save you time (and face), and let you enjoy your evenings in peace.
Splunk Best Practice #1: Use Volumes to Manage Your Indexes
We’ll assume you’re already familiar with the maxTotalDataSizeMB setting in the indexes.conf file – it’s used to set the maximum size per index (default: 500,000 MB).
While maxTotalDataSizeMB is your first line of defense to avoid reaching the minimum free disk space before indexing halts, volumes will protect you from a miscalculation made when creating a new index. Even if you’ve diligently sized your indexes to account for the right growth, retention, and space available, another admin who creates an index in your absence may not be as prudent.
Enter volumes: they let you bind indexes together and ensure that, combined, they do not surpass a set limit. Volumes are configured via indexes.conf and require a very simple stanza:
```ini
[volume:CustomerIndexes]
path = /san/splunk
maxVolumeDataSizeMB = 120000
```
In the example above, the stanza tells Splunk we want to define a volume called “CustomerIndexes.”
In addition, we want it to use the path “/san/splunk” to store the associated indexes. Finally, we want to limit the total size of all of the indexes assigned to this volume to 120,000 MB.
No doubt your mind has already conceived the next step, which is where we assign indexes to our “CustomerIndexes” volume. This is also done in indexes.conf, by prefixing your index’s homePath (hot/warm buckets) and coldPath with the name of the volume:
```ini
[AppIndex]
homePath = volume:CustomerIndexes/AppIndex/db
coldPath = volume:CustomerIndexes/AppIndex/colddb
thawedPath = $SPLUNK_DB/AppIndex/thaweddb

[RouterIndex]
homePath = volume:CustomerIndexes/RouterIndex/db
coldPath = volume:CustomerIndexes/RouterIndex/colddb
thawedPath = $SPLUNK_DB/RouterIndex/thaweddb
```
PRO TIP: use $_index_name to reference the name of your index definition:
```ini
[RouterIndex]
homePath = volume:CustomerIndexes/$_index_name/db
coldPath = volume:CustomerIndexes/$_index_name/colddb
thawedPath = $SPLUNK_DB/$_index_name/thaweddb
```
Why this approach? A new admin might overlook an index-level setting, but a volume is hard to miss when creating a new index via indexes.conf or the web interface. Once indexes are assigned to a volume, the volume’s “maxVolumeDataSizeMB” setting caps their combined size, regardless of each index’s “maxTotalDataSizeMB” setting.
If left to their own devices, AppIndex and RouterIndex would each grow to their default maximum size of 500,000 MB, taking up a total of 1 TB of storage. With volumes, we no longer have to worry about this. As a bonus, there is nothing stopping you from using separate volumes for cold and warm/hot buckets, in case you have different tiers of storage available.
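To illustrate that bonus point, here is a sketch of a tiered setup in indexes.conf. The volume names (FastStorage, SlowStorage), paths, and sizes are hypothetical – adjust them to your own storage layout:

```ini
# Hot/warm buckets on faster storage, cold buckets on cheaper storage
[volume:FastStorage]
path = /ssd/splunk
maxVolumeDataSizeMB = 60000

[volume:SlowStorage]
path = /san/splunk
maxVolumeDataSizeMB = 240000

[AppIndex]
homePath = volume:FastStorage/$_index_name/db
coldPath = volume:SlowStorage/$_index_name/colddb
thawedPath = $SPLUNK_DB/$_index_name/thaweddb
```

Each volume enforces its own cap, so the hot/warm tier and the cold tier are protected independently.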
Splunk Best Practice #2: Use Apps and Add-Ons Wherever Possible
As cliché as this phrase is nowadays, in the Splunk world, it pays to be the admin that says, “there’s an app for that”.
Yes, you can download apps from Splunkbase to extend Splunk’s functionality. Furthermore, you’ve probably already used the deployment server to manage inputs and technical add-ons (TAs) on universal forwarders. But what about using apps to manage an environment? It’s not only possible. It’s recommended.
Let’s assume all of our Splunk nodes are configured to use our deployment server. If they aren’t, you can run this handy CLI command on each instance:
```shell
splunk set deploy-poll <IP_address/hostname>:<port>
splunk restart
```
After you’ve done that, continue by identifying stanzas that will be common across groups of nodes (search heads, indexers, forwarders, etc.) or all nodes. For instance, there are two useful stanzas in outputs.conf used to make sure every node is aware of the indexers it needs to forward data to:
```ini
[tcpout]
defaultGroup = Indexers

[tcpout:Indexers]
server = IndexerA:9997, IndexerB:9996
```
Deployment server directory structure
Next, create the following directory structure on your deployment server to accommodate the new app’s config files. Then place your version of outputs.conf, with the stanzas above, in the “local” subdirectory. In this example, we’re naming our app “all_outputs”.
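Under a default install, the structure might look like this (the app name “all_outputs” follows the example above; your $SPLUNK_HOME may differ):

```
$SPLUNK_HOME/etc/deployment-apps/
└── all_outputs/
    └── local/
        └── outputs.conf
```

Apps placed under deployment-apps become available for distribution once the deployment server picks them up.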
That was the hard part! Go ahead and repeat this exercise and create a new app on the deployment server for every config file that a group of nodes has in common. Here are a few ideas:
- All search heads usually share the same search peers, and this can be accomplished via an app that provides distsearch.conf
- Indexers will need to have the same version of props.conf and transforms.conf to consistently parse the data they ingest
- Forwarders can use an app configuring the allowRemoteLogin setting via server.conf, allowing them to be managed remotely
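For the forwarder case above, the app’s server.conf would carry a fragment like this (a sketch – whether you want remote login always allowed depends on your security posture):

```ini
# server.conf in the forwarder management app
[general]
allowRemoteLogin = always
```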
In order to tie everything together, log on to your deployment server’s GUI and go to Settings > Forwarder Management. Create server classes for the different groups of nodes in your Splunk environment. Assign the appropriate apps and hosts to each server class.
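The same mapping can also be expressed directly in serverclass.conf on the deployment server. The class name “search_heads” and the hostname pattern below are hypothetical examples:

```ini
# serverclass.conf on the deployment server
[serverClass:search_heads]
whitelist.0 = sh-*

[serverClass:search_heads:app:all_outputs]
restartSplunkd = true
```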
Here comes the fun part. Next time someone calls you up asking how to stand up a new heavy forwarder (or any other instance type), you can answer, “There’s an app for that.”
Splunk Best Practice #3: Keep an Eye on Free Disk Space
We know from experience that Splunk frequently checks the free space available on any partition containing indexes. Before executing a search, it also checks for enough free space where the search dispatch directory is mounted (usually wherever Splunk is installed).
By default, the threshold is set at 5,000 MB and is configurable via the “minFreeSpace” setting in server.conf. When it’s reached, expect a call from your users informing you that Splunk has stopped indexing, or that searches are not working.
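If you need to adjust the threshold, the setting lives in the [diskUsage] stanza of server.conf; the value below is simply the default, shown for illustration:

```ini
# server.conf
[diskUsage]
minFreeSpace = 5000
```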
It’s important to keep a close eye on this when your instance is running on a partition with less than 20 GB of free space, because Splunk will use several GB for its own processes. It’s difficult to predict exactly how an environment will grow: several directories grow with the daily use of Splunk and are not governed by the limits set on indexes or volumes.
The top places to look for growth in an environment:
- Dispatch directory ($SPLUNK_HOME/var/run/splunk/dispatch)
- KV store directory ($SPLUNK_DB/kvstore)
- Configuration bundle directory ($SPLUNK_HOME/var/run/splunk/cluster/remote-bundle)
- Knowledge bundle directory ($SPLUNK_HOME/var/run/searchpeers)
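A quick way to size these up is a small helper like the one below – a sketch, not a Splunk tool; the function name is our own, and the commented invocation assumes SPLUNK_HOME and SPLUNK_DB are set in your environment:

```shell
# report_dir_sizes DIR...  -- prints disk usage for each directory that exists
report_dir_sizes() {
  for d in "$@"; do
    if [ -d "$d" ]; then
      du -sh "$d"
    fi
  done
}

# Typical Splunk growth spots (adjust to your install):
# report_dir_sizes "$SPLUNK_HOME/var/run/splunk/dispatch" \
#                  "$SPLUNK_DB/kvstore" \
#                  "$SPLUNK_HOME/var/run/splunk/cluster/remote-bundle" \
#                  "$SPLUNK_HOME/var/run/searchpeers"
```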
There’s a way to avoid any surprises: use your monitoring tool of choice to alert on low disk space. A favorite fix of ours is implementing NMON across the cluster. It provides all types of useful metrics for troubleshooting and monitoring your environment, and it conveniently includes a predefined low disk space alert you can adjust to your environment.
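If you don’t have a monitoring tool in place yet, even a minimal cron-able check buys you a warning before Splunk hits its threshold. This is a generic sketch (the function name is ours; the 5,000 MB figure mirrors the minFreeSpace default discussed above):

```shell
# check_free_mb PARTITION THRESHOLD_MB
# Returns non-zero (and prints a warning) when free space on PARTITION
# drops below THRESHOLD_MB, so it can feed cron, Nagios checks, etc.
check_free_mb() {
  # df -Pm: POSIX output format, sizes in MB; column 4 is "Available"
  free_mb=$(df -Pm "$1" | awk 'NR==2 {print $4}')
  if [ "$free_mb" -lt "$2" ]; then
    echo "WARNING: only ${free_mb}MB free on $1 (threshold: $2 MB)"
    return 1
  fi
  echo "OK: ${free_mb}MB free on $1"
}

# Example: check_free_mb /san/splunk 5000
```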