I ran a sizable Splunk instance for a few years and have seen dozens of my customers’ Splunk environments out in the wild, and I want to help you avoid some common mistakes.
This post is most helpful if you are just standing Splunk up for the first time, but all of these apply even if your company has run Splunk for years.
1. Never Edit the Default Version of a Configuration File.
Splunk lives and dies by configuration files, and you can tune Splunk in all kinds of ways by changing settings in these files.
A common mistake, one I made myself when I was new to Splunk, is to edit .conf files in the /default/ subdirectories in Splunk in order to change something. Your changes will be applied and Splunk will run merrily along right up until you upgrade Splunk or upgrade the app where that configuration lived. Depending on what those configurations were, that upgrade could wreak havoc with your Splunk environment.
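The good news is that Splunk makes the right approach safe and easy: put your changes in the matching file under the /local/ subdirectory instead. Settings in /local/ take precedence over those in /default/, and upgrades overwrite the /default/ versions of files but leave the /local/ versions alone, preserving all of your hard-won tweaks and improvements.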
Pro tip: the local version only needs your specific changes, not the entire contents of the default version of the file.
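To make that precedence concrete, here is a simplified Python model of a /local/ setting overriding the same setting in /default/. It is a sketch, not Splunk's actual merge logic: real Splunk layers more directories (system, app, and user level) with its own precedence rules. The props.conf stanza name and values below are hypothetical.

```python
import configparser

# Hypothetical default/props.conf shipped with an app (overwritten on upgrade)
DEFAULT_CONF = """
[my_sourcetype]
SHOULD_LINEMERGE = true
TRUNCATE = 10000
"""

# Hypothetical local/props.conf: only the one setting you actually changed
LOCAL_CONF = """
[my_sourcetype]
TRUNCATE = 50000
"""


def effective_settings(default_text: str, local_text: str) -> dict[str, dict[str, str]]:
    """Simplified model: read default first, then let local override key by key."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.optionxform = str  # preserve key case instead of lowercasing
    parser.read_string(default_text)
    parser.read_string(local_text)  # a second read merges into the existing stanza
    return {stanza: dict(parser[stanza]) for stanza in parser.sections()}


print(effective_settings(DEFAULT_CONF, LOCAL_CONF))
```

Only TRUNCATE appears in the local file, yet the effective configuration still carries SHOULD_LINEMERGE from default, which is why your local file can stay small.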
2. What Does “All Time” Mean in Your Company?
Splunk’s search interface includes an option to search over All Time. Think carefully about what this means: how long do you plan to keep your Splunk data? Who might need to run a search over all time, however long that might be? How many results would be returned?
I mention this because many people new to searching in Splunk will run all-time searches; I have even seen a customer’s alert configured to run every five minutes and search over all time. Each run took 13 minutes, far longer than the five-minute interval between runs. After I changed it to look back only six minutes, it ran in under one second.
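For scheduled alerts, the schedule and the search window sit side by side in savedsearches.conf. The sketch below renders a stanza for an alert like the one described above: it runs every five minutes and looks back six, so consecutive windows overlap by a minute rather than leaving gaps. The alert name and search are hypothetical, and a real alert needs additional alert and action settings that are omitted here; cron_schedule, enableSched, dispatch.earliest_time and dispatch.latest_time are standard savedsearches.conf settings.

```python
import configparser
import io


def alert_stanza(name: str, search: str, every_minutes: int, overlap_minutes: int = 1) -> str:
    """Render a savedsearches.conf stanza whose lookback is the interval plus a small overlap."""
    lookback = every_minutes + overlap_minutes
    parser = configparser.ConfigParser(interpolation=None)
    parser.optionxform = str  # keep Splunk's mixed-case setting names
    parser[name] = {
        "search": search,
        "enableSched": "1",
        "cron_schedule": f"*/{every_minutes} * * * *",
        "dispatch.earliest_time": f"-{lookback}m",  # relative time: N minutes ago
        "dispatch.latest_time": "now",
    }
    buf = io.StringIO()
    parser.write(buf)
    return buf.getvalue()


# Hypothetical alert: count server errors in a hypothetical "web" index
print(alert_stanza("Web 5xx errors", "index=web status>=500 | stats count", every_minutes=5))
```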
There are use cases where searching over all time might make sense. My advice is to limit how far back the user role in Splunk can search, removing the all-time option from basic users, and only allow it for users you know have been trained well enough to have a solid understanding of Splunk. This will reduce the number of extremely long-running searches, reducing the load on Splunk and helping keep your users happy.
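One way to apply that limit is the srchTimeWin setting in a role's stanza in authorize.conf, which caps a search's time range in seconds. Following tip 1, the change belongs in a local authorize.conf. The sketch below renders such a stanza for the built-in user role; the 30-day limit is an arbitrary example, not a recommendation, and it is worth checking the authorize.conf documentation for your Splunk version on how limits combine when a user holds several roles.

```python
import configparser
import io

SECONDS_PER_DAY = 24 * 60 * 60


def role_time_limit(role: str, max_days: int) -> str:
    """Render an authorize.conf stanza that caps how far back a role's searches can reach."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.optionxform = str  # keep the setting name's exact case
    parser[f"role_{role}"] = {"srchTimeWin": str(max_days * SECONDS_PER_DAY)}
    buf = io.StringIO()
    parser.write(buf)
    return buf.getvalue()


print(role_time_limit("user", 30))
```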
3. Real-time Sounds Really Cool.
The other gotcha in Splunk’s search user interface is the option of running real-time searches.
As with all-time searches, carefully consider who might have a good business case for seeing data in Splunk the moment it gets indexed. Without that discipline, real-time sounds great and it is entertaining to watch data roll in before your eyes, but I guarantee people will set up dashboards full of real-time searches and bring even the beefiest Splunk environment to its knees.
I ran Splunk against a production application stack for a large e-commerce retailer for years and used real-time searches maybe once a month, for a few minutes, when we needed to know exactly what was happening during application startup. For alerts, ask the alert's audience whether running the search once a minute is sufficient for their needs – it often is.
If it is not, plan ahead and dedicate extra hardware to running real-time searches, particularly CPU cores on your search heads.
You can grant the real-time search (rtsearch) and scheduled real-time search (schedule_rtsearch) capabilities in the Roles portion of Splunk’s Access Controls settings. It’s easy to give someone real-time capabilities; it is much harder to take them away.
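Because capabilities are easy to hand out, it helps to audit them periodically. The sketch below parses authorize.conf text and reports which roles end up with rtsearch or schedule_rtsearch, either directly or through the semicolon-separated importRoles inheritance list. The sample stanzas and role names are hypothetical, and it reads a single file's text; a real deployment merges several authorize.conf files, so treat this as a starting point rather than an authoritative report.

```python
import configparser
from collections.abc import Mapping
from typing import Optional

REALTIME_CAPS = ("rtsearch", "schedule_rtsearch")

# Hypothetical authorize.conf content for illustration only
SAMPLE = """
[role_user]
rtsearch = enabled

[role_power]
importRoles = user
schedule_rtsearch = enabled

[role_analyst]
importRoles = user;power
"""


def _stanza(parser: configparser.ConfigParser, role: str) -> Mapping[str, str]:
    """Return the role's stanza, or an empty mapping if the role isn't defined."""
    section = f"role_{role}"
    return parser[section] if parser.has_section(section) else {}


def effective_realtime(parser: configparser.ConfigParser, role: str,
                       seen: Optional[set[str]] = None) -> set[str]:
    """Real-time capabilities a role has directly or inherits via importRoles."""
    seen = set() if seen is None else seen
    if role in seen:  # guard against import cycles and repeat visits
        return set()
    seen.add(role)
    stanza = _stanza(parser, role)
    caps = {c for c in REALTIME_CAPS if stanza.get(c, "").strip().lower() == "enabled"}
    for parent in stanza.get("importRoles", "").split(";"):
        if parent.strip():
            caps |= effective_realtime(parser, parent.strip(), seen)
    return caps


def roles_with_realtime(authorize_conf: str) -> dict[str, list[str]]:
    """Map each role that has any real-time capability to a sorted list of them."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.optionxform = str  # importRoles is mixed-case
    parser.read_string(authorize_conf)
    report = {}
    for section in parser.sections():
        if section.startswith("role_"):
            role = section[len("role_"):]
            caps = effective_realtime(parser, role)
            if caps:
                report[role] = sorted(caps)
    return report


print(roles_with_realtime(SAMPLE))
```

In the sample, analyst never lists a real-time capability itself yet inherits both, and inheritance like this is one reason a capability can be harder to take back than to grant.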
4. Remember the Main.
Splunk allows you to set up user-defined indexes. It’s also pretty smart – if you do not tell it which index to send data to, it sends that data to the default index, called Main.
When your Splunk environment is new, this might not seem so bad, and it makes it easy to remember where to search. However, as you add data sources and your Splunk audience grows, Main starts to get messy.
Splitting your data out into separate indexes will make searches run faster, and it is best done early: moving production data to a new index after it has been going to Main for a long time is a tough balancing act. Separate indexes also give you more control over how much data you hang onto and for how long – you might need security logs for one year, but development environment application logs for just a week. And they let you define who is able to see that data – do your application developers need to see who’s logging on to your Unix servers?
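Retention is set per index in indexes.conf. The sketch below renders stanzas for two hypothetical indexes matching the examples above, converting days into the frozenTimePeriodInSecs value Splunk expects. The homePath, coldPath and thawedPath values follow the conventional $SPLUNK_DB layout, and the index names are assumptions. Splunk ages out whole buckets rather than individual events, and by default frozen data is deleted, so treat the period as approximate and confirm that behavior is what you want.

```python
import configparser
import io

SECONDS_PER_DAY = 86_400

# Hypothetical index names with the retention periods from the example above
RETENTION_DAYS = {"security": 365, "app_dev": 7}


def indexes_conf(retention_days: dict[str, int]) -> str:
    """Render indexes.conf stanzas, one per index, with retention given in days."""
    parser = configparser.ConfigParser(interpolation=None)  # write values exactly as given
    parser.optionxform = str  # keep Splunk's mixed-case setting names
    for name, days in retention_days.items():
        parser[name] = {
            "homePath": f"$SPLUNK_DB/{name}/db",
            "coldPath": f"$SPLUNK_DB/{name}/colddb",
            "thawedPath": f"$SPLUNK_DB/{name}/thaweddb",
            "frozenTimePeriodInSecs": str(days * SECONDS_PER_DAY),
        }
    buf = io.StringIO()
    parser.write(buf)
    return buf.getvalue()


print(indexes_conf(RETENTION_DAYS))
```

To route a data source into one of these indexes instead of Main, set index on its input stanza in inputs.conf.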
Pro tip: when you split those indexes out, be sure to add them to the “Indexes searched by default” section of Access Controls for the appropriate roles. This saves users from having to know which index their data is in.
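In authorize.conf, “Indexes searched by default” corresponds to srchIndexesDefault, and the indexes a role may search at all correspond to srchIndexesAllowed; both take semicolon-separated lists. The sketch below renders a stanza for a hypothetical app_developer role and refuses to set a default index the role is not allowed to search. That check is an illustrative safeguard added here, not a rule Splunk enforces, and it does not understand wildcard patterns in the allowed list.

```python
import configparser
import io


def role_index_stanza(role: str, allowed: list[str], default: list[str]) -> str:
    """Render authorize.conf index settings for one role."""
    not_allowed = [idx for idx in default if idx not in allowed]
    if not_allowed:
        # Illustrative check: a default index the role can't search is almost certainly a mistake
        raise ValueError(f"default indexes not in allowed list: {not_allowed}")
    parser = configparser.ConfigParser(interpolation=None)
    parser.optionxform = str  # keep Splunk's mixed-case setting names
    parser[f"role_{role}"] = {
        "srchIndexesAllowed": ";".join(allowed),
        "srchIndexesDefault": ";".join(default),
    }
    buf = io.StringIO()
    parser.write(buf)
    return buf.getvalue()


# Hypothetical role: developers search their own index by default and cannot see security data
print(role_index_stanza("app_developer", allowed=["app_dev", "main"], default=["app_dev"]))
```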
5. Test Before You Deploy.
Splunk can be installed in all kinds of places. Even if you do not have the budget for testing in an exact replica of your production environment, you can install and run it on a laptop to try out ingesting a new data source or an unfamiliar app before you send it to production. Try Splunk n’ a Box and you can mock up your full production environment on your laptop.
SP6 is a Splunk consulting firm focused on Splunk professional services, including Splunk deployment, ongoing Splunk administration, and Splunk development. SP6 also has a separate division that offers Splunk recruitment and places Splunk professionals into direct-hire (FTE) roles for companies that need help hiring their own full-time staff in today's challenging market.