-
Notifications
You must be signed in to change notification settings - Fork 0
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Update geopoint to geoshape #306
Conversation
Why these changes are being introduced: * Given that OpenSearch can process WKT strings and the values tend to be bounding boxes rather than points, we will use the geo_shape type instead of the geo_point type How this addresses that need: * Update geopoint > geoshape in OpenSearch mapping * Remove include_in_parent property from locations field Side effects of this change: * This will impact how the TIMDEX API constructs queries Relevant ticket(s): * https://mitlibraries.atlassian.net/browse/GDT-116
@ehanson8 does dev1 opensearch have this data loaded? |
No, given the breaking nature of this change, we wanted to keep testing local until we had a chance to discuss in case this is a bad idea for a reason that we didn't realize |
* Update include_in_parent and doc_values properties after discussion on the merits of several approaches
Pull Request Test Coverage Report for Build 7576158927
💛 - Coveralls |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me!
Noting for posterity that both docs_value=false
or include_in_parent=false
under the locations.geoshape
property have the desired effect of avoiding the error:
Details: {"type": "illegal_argument_exception", "reason": "DocValuesField "locations.geoshape" appears more than once in this document (only one value is allowed per field)"}
where docs_value=false
has the advantage of keeping the index size smaller, with the potential trade-off of slower queries.
Confirmed with the following query that the document was found:
{
"query": {
"bool": {
"must": {
"match_all": {}
},
"filter": {
"geo_bounding_box": {
"locations.geoshape": {
"left": -110,
"right": -109,
"top": 44,
"bottom": 43
}
}
}
}
}
}
and this query fails to find it as the longitude values are outside of the box:
{
"query": {
"bool": {
"must": {
"match_all": {}
},
"filter": {
"geo_bounding_box": {
"locations.geoshape": {
"left": -50,
"right": -49,
"top": 44,
"bottom": 43
}
}
}
}
}
}
Helpful background context
While testing the updated mapping, we ran into an OpenSearch error when a record could not be ingested if it contained more than one
geo_shape
type. This caused issues because we mapped bothdcat_bbox
andlocn_geometry
toLocation.geoshape
, which is ingested as an OpenSearchgeo_shape
type. While MIT records do not have different values for these fields, we have already found examples in OGM records where the fields differ and we would prefer not to lose those unique values.We discovered that removing
include_in_parent
property allowed us to successfully index the records. @JPrevost This has obvious implications for the GraphQL queries so we wanted to get your input and approval before proceeding with this change.How can a reviewer manually see the effects of these changes?
Ensure that
TIMDEX_OPENSEARCH_ENDPOINT
is not set as an env var and spin up a local Docker container with:In new terminal window, create a new
gismit
index:Copy the index name and promote the index to the
gismit
alias:Set
Dev1
credentials andbulk-index
the following S3 file withLocation.geoshape
to see that it successfully indexes:Includes new or updated dependencies?
NO
What are the relevant tickets?
Developer
Code Reviewer