Spath Issues

Lowell Alleman edited this page Nov 17, 2018 · 2 revisions

What's wrong with Spath?

In a word: arrays

In general, Splunk's spath command is great. It works with both XML and JSON, and it's fairly straightforward to use. However, one area where spath struggles is arrays. The same limitation shows up in Splunk's automatic JSON field extraction.

While these issues can often be avoided in simple homogeneous arrays, that's not always the case. Sometimes data is messy.

Example 1

Splunk can handle this data set without issue:

{
    "records": [
        {
            "Name": "name",
            "Value": "Ron Swanson"
        },
        {
            "Name": "worst_food",
            "Value": "salad"
        },
        {
            "Name": "ex-wife",
            "Value": "Tammy"
        }
    ]
}

Using spath this will result in the following new multivalued fields:

| records{}.Name | records{}.Value |
| --- | --- |
| name | Ron Swanson |
| worst_food | salad |
| ex-wife | Tammy |

Test this for yourself in Splunk:

| makeresults | eval _raw="{\"records\":[{\"Name\":\"name\",\"Value\":\"Ron Swanson\"},{\"Name\":\"worst_food\",\"Value\":\"salad\"},{\"Name\":\"ex-wife\",\"Value\":\"Tammy\"}]}" | spath

Example 2

But wait, that was easy, you say! Sure, but didn't Ron have 2 ex-wives named Tammy? Right you are.

Let's take a look at what happens when both of Ron's ex-wives are properly accounted for:

{
    "records": [
        {
            "Name": "name",
            "Value": "Ron Swanson"
        },
        {
            "Name": "worst_food",
            "Value": "salad"
        },
        {
            "Name": "ex-wife",
            "Value": [
                "Tammy one",
                "Tammy two"
            ]
        },
        {
            "Name": "best_food",
            "Value": "steak"
        }
    ]
}

This produces a much more confusing set of new multivalued fields:

| records{}.Name | records{}.Value | records{}.Value{} |
| --- | --- | --- |
| name | Ron Swanson | Tammy one |
| worst_food | salad | Tammy two |
| ex-wife | steak | |
| best_food | | |

From this output, both name and worst_food are still retrievable, but how would you know that ex-wife is the field that goes with records{}.Value{} rather than being estranged from steak?

Test this for yourself in Splunk:

| makeresults | eval _raw="{\"records\":[{\"Name\":\"name\",\"Value\":\"Ron Swanson\"},{\"Name\":\"worst_food\",\"Value\":\"salad\"},{\"Name\":\"ex-wife\",\"Value\":[\"Tammy one\",\"Tammy two\"]},{\"Name\":\"best_food\",\"Value\":\"steak\"}]}" | spath
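
One way to sidestep this ambiguity is to keep spath from flattening the whole array at once: extract each array element as its own JSON string, expand those strings into separate events, and only then extract fields from each one. The search below is a minimal sketch of that idea, not necessarily the best approach for every data set. The field name record is an arbitrary choice made for this illustration.

```
| makeresults | eval _raw="{\"records\":[{\"Name\":\"name\",\"Value\":\"Ron Swanson\"},{\"Name\":\"worst_food\",\"Value\":\"salad\"},{\"Name\":\"ex-wife\",\"Value\":[\"Tammy one\",\"Tammy two\"]},{\"Name\":\"best_food\",\"Value\":\"steak\"}]}"
| spath path=records{} output=record
| mvexpand record
| spath input=record
| fields - _raw record
```

With this approach each event should carry a single Name, so the ex-wife event is the only one with a Value{} field and steak stays attached to best_food.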

Example 3

The same problem happens if the keys vary per item in the array. For example, say an array captures changes and current values at the same time:

{
    "records": [
        {
            "Name": "name",
            "Value": "Ron Swanson"
        },
        {
            "Name": "worst_food",
            "OldValue": "salad",
            "NewValue": "fish"
        },
        {
            "Name": "ex-wife",
            "OldValue": "Tammy one",
            "NewValue": "Tammy two"
        }
    ]
}

INCOMPLETE EXAMPLE
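
To experiment with this data set in the meantime, the search below follows the same pattern as the other examples. It is simply the JSON above escaped into an eval, and the resulting field layout is deliberately not reproduced here.

```
| makeresults | eval _raw="{\"records\":[{\"Name\":\"name\",\"Value\":\"Ron Swanson\"},{\"Name\":\"worst_food\",\"OldValue\":\"salad\",\"NewValue\":\"fish\"},{\"Name\":\"ex-wife\",\"OldValue\":\"Tammy one\",\"NewValue\":\"Tammy two\"}]}" | spath
```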

Example 4

For those interested in a more real-life case, here's the ultimate example of how and where spath-style processing falls apart. This is part of a nested JSON payload from a Microsoft Office 365 management event.

[
  {
    "NewValue": [
      {
        "RelyingParty": "*", 
        "State": 1, 
        "RememberDevicesNotIssuedBefore": "2018-11-08T19:37:42.7363619Z"
      }
    ], 
    "OldValue": [], 
    "Name": "StrongAuthenticationRequirement"
  }, 
  {
    "NewValue": "StrongAuthenticationRequirement", 
    "OldValue": null, 
    "Name": "Included Updated Properties"
  }
]

This results in fields that look like this:

| field name | value |
| --- | --- |
| {}.Name | StrongAuthenticationRequirement, Included Updated Properties |
| {}.NewValue | StrongAuthenticationRequirement |
| {}.NewValue{}.RelyingParty | * |
| {}.NewValue{}.RememberDevicesNotIssuedBefore | 2018-11-08T19:37:42.7363619Z |
| {}.NewValue{}.State | 1 |
| {}.OldValue | null |

Note the use of the {}. prefix for all the fields. That's because the data is in an array without a named top-level key. That's a mess. Good luck.

Test this for yourself in Splunk:

| makeresults | eval _raw="[{\"Name\":\"StrongAuthenticationRequirement\",\"OldValue\":[],\"NewValue\":[{\"RelyingParty\":\"*\",\"State\":1,\"RememberDevicesNotIssuedBefore\":\"2018-11-08T19:37:42.7363619Z\"}]},{\"Name\":\"Included Updated Properties\",\"OldValue\":null,\"NewValue\":\"StrongAuthenticationRequirement\"}]" | spath

What we didn't talk about

Note that all of the above examples assumed that simply having multivalued fields line up horizontally is good enough. In reality, a bit of clever search magic is still required to make this type of output usable. This topic is explored more deeply on other pages, but for now let's leave you with one example.

Let's revisit example 1, since it's the simplest. This is what's required to get Name/Value into their own fields and process them in pairs:

| makeresults | eval _raw="{\"records\":[{\"Name\":\"name\",\"Value\":\"Ron Swanson\"},{\"Name\":\"worst_food\",\"Value\":\"salad\"},{\"Name\":\"ex-wife\",\"Value\":\"Tammy\"}]}"
| spath | fields - _* 
| eval records=mvzip('records{}.Name', 'records{}.Value', "::DELIM::")
| mvexpand records
| fields - records{}.*
| eval records=split(records, "::DELIM::")
| eval Name=mvindex(records,0)
| eval Value=mvindex(records,1)
| fields - records

The output looks like:

| Name | Value |
| --- | --- |
| name | Ron Swanson |
| worst_food | salad |
| ex-wife | Tammy |

This may look similar to the results in example 1. The big difference is that these values now each live in their own event. They are no longer multivalued fields within a single event. This means that you can now manipulate this data using normal field operations instead of being limited to the various mv*() functions of eval.
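
As a small illustration of that last point, the search below repeats the example 1 pipeline and appends an ordinary where clause to pick out a single pair. Only the final line is new, and the choice of worst_food as the value to filter on is arbitrary.

```
| makeresults | eval _raw="{\"records\":[{\"Name\":\"name\",\"Value\":\"Ron Swanson\"},{\"Name\":\"worst_food\",\"Value\":\"salad\"},{\"Name\":\"ex-wife\",\"Value\":\"Tammy\"}]}"
| spath | fields - _*
| eval records=mvzip('records{}.Name', 'records{}.Value', "::DELIM::")
| mvexpand records
| fields - records{}.*
| eval records=split(records, "::DELIM::")
| eval Name=mvindex(records,0)
| eval Value=mvindex(records,1)
| fields - records
| where Name="worst_food"
```

Because each pair is now its own event, the filter should leave only the worst_food / salad row, with no mv*() functions involved.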