Search requests fail on a fresh install with the GEMET INSPIRE thesaurus present #8535

Open
jahow opened this issue Dec 5, 2024 · 10 comments

@jahow
Contributor

jahow commented Dec 5, 2024

Describe the bug

On a fresh install of GeoNetwork 4.4.7-SNAPSHOT, if the GEMET INSPIRE thesaurus is present and no record has been added yet, search requests sometimes fail on every attempt. The response is typically an HTTP 500 error with the following body:

{
    "servlet": "spring",
    "message": "Error is: Bad Request.\nRequest:\n...",
    "url": "/geonetwork/srv/api/search/records/_search",
    "status": "400"
}

The enclosed Elasticsearch error is typically the following (the field can vary):

{
  "error": {
    "root_cause": [
      {
        "type": "illegal_argument_exception",
        "reason": "Fielddata is disabled on [OrgForResourceObject.default] in [gn-records]. Text fields are not optimised for operations that require per-document field data like aggregations and sorting, so these operations are disabled by default. Please use a keyword field instead. Alternatively, set fielddata=true on [OrgForResourceObject.default] in order to load field data by uninverting the inverted index. Note that this can use significant memory."
      }
    ],
    "type": "search_phase_execution_exception",
    "reason": "all shards failed",
    "phase": "query",
    "grouped": true,
    "failed_shards": [
      {
        "shard": 0,
        "index": "gn-records",
        "node": "B6mBDNS0S-enufKBDM7j5g",
        "reason": {
          "type": "illegal_argument_exception",
          "reason": "Fielddata is disabled on [OrgForResourceObject.default] in [gn-records]. Text fields are not optimised for operations that require per-document field data like aggregations and sorting, so these operations are disabled by default. Please use a keyword field instead. Alternatively, set fielddata=true on [OrgForResourceObject.default] in order to load field data by uninverting the inverted index. Note that this can use significant memory."
        }
      }
    ],
    "caused_by": {
      "type": "illegal_argument_exception",
      "reason": "Fielddata is disabled on [OrgForResourceObject.default] in [gn-records]. Text fields are not optimised for operations that require per-document field data like aggregations and sorting, so these operations are disabled by default. Please use a keyword field instead. Alternatively, set fielddata=true on [OrgForResourceObject.default] in order to load field data by uninverting the inverted index. Note that this can use significant memory.",
      "caused_by": {
        "type": "illegal_argument_exception",
        "reason": "Fielddata is disabled on [OrgForResourceObject.default] in [gn-records]. Text fields are not optimised for operations that require per-document field data like aggregations and sorting, so these operations are disabled by default. Please use a keyword field instead. Alternatively, set fielddata=true on [OrgForResourceObject.default] in order to load field data by uninverting the inverted index. Note that this can use significant memory."
      }
    }
  },
  "status": 400
}

The problem does not reproduce reliably.
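
Because the field named in the error can vary, it may help to extract it from the response when comparing failures. The following is a minimal sketch and not part of GeoNetwork: the function name `fielddata_errors` is hypothetical, the sample reason string is shortened, and the code assumes the Elasticsearch error body has the shape shown above.

```python
import re

# Matches the wording of the Elasticsearch message quoted in this issue.
FIELDDATA_RE = re.compile(r"Fielddata is disabled on \[([^\]]+)\] in \[([^\]]+)\]")


def fielddata_errors(es_error: dict) -> set:
    """Return the (field, index) pairs named in 'Fielddata is disabled' root causes."""
    found = set()
    for cause in es_error.get("error", {}).get("root_cause", []):
        match = FIELDDATA_RE.search(cause.get("reason", ""))
        if match:
            found.add((match.group(1), match.group(2)))
    return found


# Trimmed copy of the error body above (reason text shortened for brevity).
sample = {
    "error": {
        "root_cause": [
            {
                "type": "illegal_argument_exception",
                "reason": "Fielddata is disabled on [OrgForResourceObject.default] in [gn-records].",
            }
        ]
    },
    "status": 400,
}

print(fielddata_errors(sample))
```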

To Reproduce
Steps to reproduce the behavior:

  1. Start a PostgreSQL database and an Elasticsearch 8.14.3 instance
  2. Put the GEMET INSPIRE Thesaurus in the web/src/main/webapp/WEB-INF/data/config/codelist/external/thesauri/theme directory
  3. Start GN with mvn jetty:run
  4. Go to http://localhost:8080/geonetwork/srv/eng/catalog.search

An error message shows up and the results list is broken.

Expected behavior
No error, the message "no records yet" appears.

Log file
No significant message in the GeoNetwork log.

Additional context
Preliminary discussion on Discourse: https://discourse.osgeo.org/t/clean-out-geonetwork-for-re-install/111527/13

@jahow
Contributor Author

jahow commented Dec 5, 2024

Interestingly, when this state occurs, the Elasticsearch index mapping for the OrgForResourceObject field looks like this:

        "OrgForResourceObject": {
          "properties": {
            "default": {
              "type": "text",
              "fields": {
                "keyword": { "type": "keyword", "ignore_above": 256 }
              }
            },
            "langeng": {
              "type": "text",
              "fields": {
                "keyword": { "type": "keyword", "ignore_above": 256 }
              }
            }
          }
        }

OrgForResourceObject.default is indexed as text, although the records.json config should make it a keyword:

      {
        "org": {
          "match": "*Org*Object",
          "mapping": {
            "type": "object",
            "properties": {
              "default": {
                "type": "keyword",
                "copy_to": ["any.default", "organisationName.default"]
              },
              "langeng": {
                "type": "keyword",
                "copy_to": ["any.langeng", "organisationName.langeng"]
              },
              ...
              "link": {
                "type": "keyword"
              }
            }
          }
        }
      },
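
As a sanity check that the `org` template pattern covers the failing field, the wildcard can be tried locally. This is a rough sketch: it uses Python's `fnmatch` as a stand-in for the simple wildcard matching of dynamic template `match` patterns, which is only an approximation (`fnmatch` also gives special meaning to `?` and `[...]`). The helper name `template_matches` and the second field name are made up for illustration.

```python
from fnmatch import fnmatchcase


def template_matches(pattern: str, field_name: str) -> bool:
    """Approximate a dynamic template `match` pattern where `*` is the only wildcard."""
    return fnmatchcase(field_name, pattern)


# The field from the error message matches the pattern from records.json.
print(template_matches("*Org*Object", "OrgForResourceObject"))  # True
# An arbitrary name without "Org" in it does not.
print(template_matches("*Org*Object", "resourceTitleObject"))   # False
```

Since the pattern does match the field name, the text mapping seen in the index is more consistent with the template not being applied at index creation than with a wrong pattern.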

@josegar74
Member

@jahow That is strange. If you reproduce the problem again, please run Admin console > Tools > Delete index and reindex so that the index definition is recreated, and check whether that solves the problem.

It's not a solution, only a way to check whether it works that way: it seems that the index is somehow being created without the right index configuration.
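
To verify whether the index definition was applied, before or after reindexing, the mapping returned by Elasticsearch's get-mapping API (`GET /gn-records/_mapping`) can be inspected. Below is a minimal sketch, assuming the response has already been fetched and parsed into a dict. The function name `field_type` is illustrative, and the `response` sample is a trimmed copy of the mapping posted earlier in this thread.

```python
def field_type(mapping_response: dict, index: str, path: str):
    """Return the mapped type of a dotted field path, or None if it is not mapped."""
    node = mapping_response[index]["mappings"]
    for part in path.split("."):
        node = node.get("properties", {}).get(part)
        if node is None:
            return None
    # Object fields carry no explicit "type" in the mapping output.
    return node.get("type", "object")


# Trimmed sample shaped like a get-mapping response for the gn-records index.
response = {
    "gn-records": {
        "mappings": {
            "properties": {
                "OrgForResourceObject": {
                    "properties": {
                        "default": {
                            "type": "text",
                            "fields": {"keyword": {"type": "keyword", "ignore_above": 256}},
                        }
                    }
                }
            }
        }
    }
}

print(field_type(response, "gn-records", "OrgForResourceObject.default"))  # text
```

A result of `text` where records.json declares `keyword` would indicate that the index was created without the expected configuration.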

@jahow
Contributor Author

jahow commented Dec 5, 2024

Yes, there's something going on where the index is created without the proper configuration. Looking into it.

@josegar74
Member

OK, fields defined in records.json should use those definitions. Some fields are not explicitly listed in records.json, so their type is inferred; in certain cases, if the metadata has invalid content (for example in a date field) and the field is not explicitly listed, this can cause problems. But that does not seem to be the case for this field.

@jahow
Contributor Author

jahow commented Dec 5, 2024

Indeed, I think what happens is that in some scenarios the gn-records.json configuration file is not correctly read from the data dir, so the index is created without a proper mapping.

I've seen this happen as well for thesauri that were not correctly registered by GeoNetwork on startup. This is going to be tricky to track down though...

@josegar74
Member

@jahow In any case, the error message you point to always happens when the catalog / index is empty.

Loading some metadata fixes it, but I haven't checked the exact error.

@matself
Contributor

matself commented Dec 6, 2024

Can this be related to "Elasticsearch / Update to 8.14.3." #8337? That is a major change from 4.4.4.
On the server where I ran into this problem, I am running ES 7.17.15.
I can report that GN 4.2.2 and GN 4.4.4 are working fine in that environment.
Is an upgrade to ES 8.14 necessary?

@jahow
Contributor Author

jahow commented Dec 6, 2024

@matself I really don't think this is related to the ES version. I have seen this happen on both 7.x and 8.x, although I can't seem to reproduce it lately... Still trying to figure out what the differentiating factor is.

@matself
Contributor

matself commented Dec 6, 2024 via email

@josegar74
Member

@matself If you use an Elasticsearch server older than 8.14, you need to create the index manually as described in #8337 (comment), but I'm not sure whether the issue you are facing is related to this.

If possible, I would try upgrading to Elasticsearch server 8.14 and verify whether the issue still happens.
