Spath Issues
In a word: arrays
In general, Splunk's `spath` command is great. It works with both XML and JSON, and it's fairly straightforward to use. However, one area where `spath` struggles is with arrays. This limitation is also clearly seen with Splunk's automatic JSON field extraction.
While these issues can often be avoided in simple homogeneous arrays, that's not always the case. Sometimes data is messy.
Splunk can handle this data set without issue:
{
"records": [
{
"Name": "name",
"Value": "Ron Swanson"
},
{
"Name": "worst_food",
"Value": "salad"
},
{
"Name": "ex-wife",
"Value": "Tammy"
}
]
}
Using `spath`, this will result in the following new multivalued fields:
records{}.Name | records{}.Value |
---|---|
name | Ron Swanson |
worst_food | salad |
ex-wife | Tammy |
Test this for yourself in Splunk:
| makeresults | eval _raw="{\"records\":[{\"Name\":\"name\",\"Value\":\"Ron Swanson\"},{\"Name\":\"worst_food\",\"Value\":\"salad\"},{\"Name\":\"ex-wife\",\"Value\":\"Tammy\"}]}" | spath
But wait, that was easy, you say! Sure, but didn't Ron have 2 ex-wives named Tammy? Right you are.
Let's take a look at what happens when both of Ron's ex-wives are properly accounted for:
{
"records": [
{
"Name": "name",
"Value": "Ron Swanson"
},
{
"Name": "worst_food",
"Value": "salad"
},
{
"Name": "ex-wife",
"Value": [
"Tammy one",
"Tammy two"
]
},
{
"Name": "best_food",
"Value": "steak"
}
]
}
This produces a much more confusing set of new multivalued fields:
records{}.Name | records{}.Value | records{}.Value{} |
---|---|---|
name | Ron Swanson | Tammy one |
worst_food | salad | Tammy two |
ex-wife | steak | |
best_food | | |
From this output, both name and worst_food are still retrievable, but how would you know that ex-wife is the field that goes with `records{}.Value{}` rather than being estranged from steak?
Test this for yourself in Splunk:
| makeresults | eval _raw="{\"records\":[{\"Name\":\"name\",\"Value\":\"Ron Swanson\"},{\"Name\":\"worst_food\",\"Value\":\"salad\"},{\"Name\":\"ex-wife\",\"Value\":[\"Tammy one\",\"Tammy two\"]},{\"Name\":\"best_food\",\"Value\":\"steak\"}]}" | spath
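To see why the pairing gets lost, here is a rough Python model of flattening by path. It is only a sketch and not how Splunk implements `spath`: the `flatten` helper is hypothetical, and the one assumption borrowed from the output above is the naming rule (dots between keys, `{}` for each array level).

```python
import json
from collections import defaultdict


def flatten(node, path="", out=None):
    """Collect leaf values under spath-style names (hypothetical helper)."""
    if out is None:
        out = defaultdict(list)
    if isinstance(node, dict):
        for key, value in node.items():
            flatten(value, f"{path}.{key}" if path else key, out)
    elif isinstance(node, list):
        # Every array level adds "{}" to the field name; the item index is dropped.
        for item in node:
            flatten(item, path + "{}", out)
    else:
        out[path].append(node)
    return out


raw = (
    '{"records":[{"Name":"name","Value":"Ron Swanson"},'
    '{"Name":"worst_food","Value":"salad"},'
    '{"Name":"ex-wife","Value":["Tammy one","Tammy two"]},'
    '{"Name":"best_food","Value":"steak"}]}'
)

fields = flatten(json.loads(raw))
for name, values in fields.items():
    print(name, values)
```

Because the array index is thrown away, `records{}.Name` ends up with four values while `records{}.Value` has only three, so nothing is left to say which value belonged to which record.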
The same problem happens if the keys vary per item in the array. For example, say an array captures changes and current values at the same time:
{
"records": [
{
"Name": "name",
"Value": "Ron Swanson"
},
{
"Name": "worst_food",
"OldValue": "salad",
"NewValue": "fish"
},
{
"Name": "ex-wife",
"OldValue": "Tammy one",
"NewValue": "Tammy two"
}
]
}
INCOMPLETE EXAMPLE
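As a stand-in, here is a small Python sketch that groups the values of the records above by key, the way the earlier output tables are laid out. This is a model only, not verified Splunk output; the `records{}.<key>` column names are an assumption carried over from the field names shown earlier on this page.

```python
import json
from collections import defaultdict

raw = (
    '{"records":[{"Name":"name","Value":"Ron Swanson"},'
    '{"Name":"worst_food","OldValue":"salad","NewValue":"fish"},'
    '{"Name":"ex-wife","OldValue":"Tammy one","NewValue":"Tammy two"}]}'
)

# One column per key; the record a value came from is not kept.
columns = defaultdict(list)
for record in json.loads(raw)["records"]:
    for key, value in record.items():
        columns[f"records{{}}.{key}"].append(value)

for name, values in columns.items():
    print(f"{name}: {values}")

# Reading the columns position by position pairs the wrong values:
print(columns["records{}.Name"][0], "<->", columns["records{}.OldValue"][0])
```

In this model the Name column has three values, Value has one, and OldValue and NewValue have two each. Lined up by position, "salad" lands next to "name" instead of "worst_food".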
For those interested in a more real-life example, here's the ultimate demonstration of how and where `spath`-style processing falls apart. This is part of a nested JSON from a Microsoft Office 365 management event.
[
{
"NewValue": [
{
"RelyingParty": "*",
"State": 1,
"RememberDevicesNotIssuedBefore": "2018-11-08T19:37:42.7363619Z"
}
],
"OldValue": [],
"Name": "StrongAuthenticationRequirement"
},
{
"NewValue": "StrongAuthenticationRequirement",
"OldValue": null,
"Name": "Included Updated Properties"
}
]
This results in fields that look like this:
{}.Name | {}.NewValue | {}.NewValue{}.RelyingParty | {}.NewValue{}.RememberDevicesNotIssuedBefore | {}.NewValue{}.State | {}.OldValue |
---|---|---|---|---|---|
StrongAuthenticationRequirement | StrongAuthenticationRequirement | * | 2018-11-08T19:37:42.7363619Z | 1 | null |
Included Updated Properties | | | | | |
Note the use of the `{}.` prefix for all the fields. That's because the data is in an array without a named top-level key.
That's a mess. Good luck.
Test this for yourself in Splunk:
| makeresults | eval _raw="[{\"Name\":\"StrongAuthenticationRequirement\",\"OldValue\":[],\"NewValue\":[{\"RelyingParty\":\"*\",\"State\":1,\"RememberDevicesNotIssuedBefore\":\"2018-11-08T19:37:42.7363619Z\"}]},{\"Name\":\"Included Updated Properties\",\"OldValue\":null,\"NewValue\":\"StrongAuthenticationRequirement\"}]" | spath
Note that all of the examples above assume that simply having multivalued fields line up horizontally is good enough. In reality, a bit of clever search magic is still required to make this type of output usable. This topic is explored more deeply on other pages, but for now let's leave you with one example.
Let's revisit example 1, since it's the simplest. This is what's required to get Name and Value into their own fields and process them in pairs:
| makeresults | eval _raw="{\"records\":[{\"Name\":\"name\",\"Value\":\"Ron Swanson\"},{\"Name\":\"worst_food\",\"Value\":\"salad\"},{\"Name\":\"ex-wife\",\"Value\":\"Tammy\"}]}"
| spath | fields - _*
| eval records=mvzip('records{}.Name', 'records{}.Value', "::DELIM::")
| mvexpand records
| fields - records{}.*
| eval records=split(records, "::DELIM::")
| eval Name=mvindex(records,0)
| eval Value=mvindex(records,1)
| fields - records
The output looks like:
Name | Value |
---|---|
name | Ron Swanson |
worst_food | salad |
ex-wife | Tammy |
This may look similar to the results in example 1. The big difference is that these values are now all within their own events. They are no longer multivalued fields that are part of a single event. This means that you can now manipulate this data using normal field operations instead of being limited to the various `mv*()` functions of `eval`.
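For readers who think more easily in a general-purpose language, the zip, expand, and split steps of that search can be sketched in Python. This only mirrors the logic of `mvzip`, `mvexpand`, `split`, and `mvindex` on plain lists; it says nothing about how Splunk implements them, and the variable names are made up for illustration.

```python
# The two multivalued fields from example 1, as plain lists.
names = ["name", "worst_food", "ex-wife"]
values = ["Ron Swanson", "salad", "Tammy"]
DELIM = "::DELIM::"

# Like mvzip: join the two fields position by position.
zipped = [f"{n}{DELIM}{v}" for n, v in zip(names, values)]

# Like mvexpand: one event per zipped value.
events = [{"records": z} for z in zipped]

# Like split + mvindex: pull Name and Value back out of each event.
rows = []
for event in events:
    parts = event["records"].split(DELIM)
    rows.append({"Name": parts[0], "Value": parts[1]})

for row in rows:
    print(row["Name"], "|", row["Value"])
```

The pairing is purely positional, so this works only because both lists have the same number of values in the same order. That is exactly the property the messier examples above don't have.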