Splunk Knowledge Objects & Eventtypes: Background
It’s important to keep your eventtypes clear, concise, and restrictive to the applicable dataset to avoid unnecessary Knowledge Object (KO) generation and memory usage.
All Splunk Knowledge Objects (KOs) generated during search time are maintained in memory. Every field name extracted from the data, every eventtype, and every tag applied to each event takes up a little bit of memory. That is the cost of our wonderful “schema on the fly” concept!
Now, a single KO doesn’t take up much memory. However, that minuscule amount adds up as the number of events returned by a search increases. Run a search that returns 10 million events, and the KOs generated and maintained in memory are multiplied across all 10 million of them!
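As a rough illustration of that scaling, assume each event carries f extracted fields, e eventtypes, and t tags. The numbers below are hypothetical, chosen only to show the arithmetic, and are not measurements:

```latex
K \approx N\,(f + e + t), \qquad
N = 10^{7},\; f = 20,\; e = 2,\; t = 3
\;\Rightarrow\; K \approx 10^{7} \times 25 = 2.5 \times 10^{8}
```

Under the same assumptions, one wrongly applied eventtype that carries a single tag accounts for 2 × 10^7 of those KO instances on its own.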
Eventtypes are a filtering construct applied to a specific dataset to aid in searching, reporting, and dashboard generation for that dataset.
An eventtype definition is supposed to identify a particular subset of a dataset, often for the purpose of applying ‘tags’ to that subset of data.
Both the generation of the eventtype and the subsequently applied tags are KOs, and therefore consume some amount of memory. Incorrectly configuring and applying an eventtype definition to data not related to the designed purpose causes unnecessary memory usage. To prevent this, write clear, concise, and restrictive eventtype definitions.
The Splunk Technology Add-on (TA) called Splunk Add-on for Unix and Linux (https://splunkbase.splunk.com/app/833/) is one TA that causes unnecessary, bloated eventtype and tag generation in datasets unrelated to Unix, Linux, or any other *nix data. The definitions used for (4) eventtypes in this TA are so broad that they falsely create eventtypes and subsequent tags in searches of some non-*nix datasets.
This article provides guidance on refining those (4) specific eventtypes within the Splunk Add-on for Unix and Linux. The same concepts can be applied to other TAs.
Structure of Eventtypes in Splunk
Each eventtype has (2) required elements, plus up to (2) optional elements. The (2) required elements are the ‘declaration’ and the ‘definition’. The optional elements (always commented out, and so not part of the active configuration) list the potential tags and data models associated with the declared eventtype.
The following is an example of an eventtype, configured in eventtypes.conf:
[sample_signal]
search = sourcetype=foo signal=bar corrupted
# tags = operations services configuration
# datamodel = change
In the above sample, the first line is the declaration element (enclosed in square brackets, ‘[‘ and ‘]’); that is, it ‘declares’ the eventtype title to be “sample_signal”. It is best to use a descriptive title that clearly and concisely identifies the subset of the overall dataset that the eventtype applies to.
The second line is the definition: the search itself. This is the filter applied to the selected dataset to identify a subset of that data. The terms of the definition must be met for the underlying event to have this eventtype (and possibly accompanying tags) assigned and the corresponding KOs generated in memory. In the example, only events that have a sourcetype of foo, a signal field with a value of bar, and the word corrupted somewhere in the event will be identified by the eventtype filter sample_signal.
Lines 3 and 4 are optional comments that identify the potential tag KOs to be generated for this event and the appropriate data models this event may support.
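Note that the commented tags line is documentation only; in Splunk, tags are attached to an eventtype in tags.conf. A minimal sketch of what that could look like for the sample eventtype, assuming the three tags named in the comment are the ones you want enabled:

```ini
# tags.conf (sketch; tag names taken from the commented "tags" line of the sample)
[eventtype=sample_signal]
operations = enabled
services = enabled
configuration = enabled
```

Every event that matches sample_signal would then carry these three tags, which is why an overly broad definition multiplies tag KOs as well as eventtype KOs.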
Eventtypes in the Splunk Add On for Unix and Linux
The declaration, or title, of the eventtypes in the *nix TA is not usually an issue. The titles represent characteristics that an event should match when filtering event data, such as [login_authentication], [passwd-auth-failure], or [sshd_authentication].
Most, but not all, of the eventtype definitions in the TA are specific enough to identify precisely the events the eventtype KO should apply to. One way to tighten a definition is to include the sourcetype or index in it. A good number of the eventtypes in this TA do specify the sourcetype. Additionally, the newest version of the TA (8.10 released June 24, 2020) even uses the punct field, identifying specific punctuation in an event as a definition filter. This makes it a VERY precise filter for that type of event!
There are (4) eventtypes in the TA that still use vague definitions, the end result being that incorrect events/datasets get an eventtype assigned to them, increasing unneeded memory usage.
Example One: This eventtype does not use the optional lines; it has only a declaration and its definition:
[nix_configs]
search = source="/etc/*" OR source="*.conf" OR source="*.cfg"
The issue with this eventtype is that the definition is too broad. ANY event that has a source with an ending of .conf or .cfg gets assigned this eventtype. Sources that include Splunk configuration files (using .conf) would get assigned the eventtype of “nix_configs”. Clearly, this would be an incorrect assignment of an eventtype to a dataset.
The other (3) eventtype definitions in the Splunk Add-on for Unix and Linux present similar issues: their definitions are too broad and are easily applied, inadvertently, to events in other datasets.
search = (NOT sourcetype=stash) error OR critical OR failure OR fail OR failed OR fatal
#tags = error
search = (NOT sourcetype=stash) kernel
#tags = os unix kernel
search = source="*.log" OR source="*.log.*" OR source="*/log/*" OR source="/var/adm/*" OR source="access*" OR source="*error*" OR sourcetype="syslo*" NOT source=usersWithLoginPrivs NOT sourcetype=lastlog
Solution: There are multiple ways in which to resolve this unnecessary expansion of eventtype KOs in memory.
Method 1: Add index and/or sourcetype to every eventtype definition in the TA.
For example, in the first eventtype, change
search = source="/etc/*" OR source="*.conf" OR source="*.cfg"
to
search = index=linux source="/etc/*" OR source="*.conf" OR source="*.cfg"
Note: As we are only talking about (4) eventtypes in this particular TA, this is the simplest and most straightforward way to ensure these eventtypes get applied to ONLY *nix data.
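Applied to the other (3) broad definitions shown earlier, Method 1 might look like the sketch below. It assumes your *nix data lives in an index named linux (as in the example above), leaves each stanza header as it ships in the TA, and adds parentheses only to make the intended grouping explicit. Placing the overrides in the TA's local/eventtypes.conf rather than editing default/eventtypes.conf keeps them from being overwritten on upgrade.

```ini
# local/eventtypes.conf (sketch)
# Keep each stanza header exactly as it appears in the TA's default/eventtypes.conf;
# only the search lines are shown here, mirroring the listings above.

# error eventtype
search = index=linux (NOT sourcetype=stash) (error OR critical OR failure OR fail OR failed OR fatal)

# kernel eventtype
search = index=linux (NOT sourcetype=stash) kernel

# all-logs eventtype
search = index=linux (source="*.log" OR source="*.log.*" OR source="*/log/*" OR source="/var/adm/*" OR source="access*" OR source="*error*" OR sourcetype="syslo*") NOT source=usersWithLoginPrivs NOT sourcetype=lastlog
```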
Method 2: This can be used if many eventtypes have to be modified. It entails creating an additional eventtype that defines the index or sourcetype, and then referencing that eventtype in the definition of every other eventtype.
For example, add the index definition eventtype:
[nix_index]
search = index=linux
Then, in every subsequent eventtype definition, add this eventtype into it:
search = eventtype=nix_index source="/etc/*" OR source="*.conf" OR source="*.cfg"
Note: This effectively adds an index name to every eventtype definition, narrowing the scope of which events the eventtype can be applied to. The number of eventtypes generated and memory used will be reduced.
Method 3: This method makes use of a macro within Splunk.
This entails creating a macro in macros.conf that defines the index name or sourcetype, similar to the eventtype index naming method. The macro is then called within every eventtype definition search.
[nix-indexes]
definition = index=linux
Then, in eventtypes.conf, for every eventtype definition, add this macro:
search = `nix-indexes` source="/etc/*" OR source="*.conf" OR source="*.cfg"
Note: This effectively adds an index name to every eventtype definition via the macro definition, again, narrowing the scope of which events the eventtype can be applied to.
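Putting the two files together, a sketch of Method 3 might look like this. The second index in the macro is hypothetical and is included only to show that the scope can later be widened in one place without touching any eventtype:

```ini
# macros.conf
[nix-indexes]
# "linux" comes from the example above; "linux_dev" is a hypothetical second index
definition = (index=linux OR index=linux_dev)

# eventtypes.conf
[nix_configs]
search = `nix-indexes` source="/etc/*" OR source="*.conf" OR source="*.cfg"
```

The parentheses in the macro definition keep the index terms grouped together once the macro is expanded in front of the source filters.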
Eventtypes: In Conclusion
If you’ve ever searched a dataset not related to a *nix OS, looked at the listing of eventtypes, and seen nix_configs, nix_errors, or nix-all-logs in the listing, now you know why. Poorly restricted eventtype definitions are the cause. The lesson also carries over to any custom, homegrown add-ons you might develop: keep your eventtype definitions clear, concise, and restrictive to the applicable dataset to avoid unnecessary KO generation and memory usage.