Adds Places aggregation #795
Conversation
Some of these are questions to make sure I'm not mistaking something, but there's at least one test (the eastern vs. western hemisphere box test) which I think needs to be tweaked in order to still be valid. Or maybe the test itself isn't meaningful, as it relies on OpenSearch working correctly rather than on anything we're actually doing in the application. Regardless, the change as currently proposed seems to allow the test to pass incorrectly.
I also have a concern about the year bucketing being applied, because the most recent bucket seems inconsistent with the rest of them.
As you mentioned in the PR, I didn't look at each cassette individually - but did inspect a few of the ones that changed to see how the anonymization was being applied.
I'm happy to chat more about this on Tuesday, though - you're right that there was a lot involved in regenerating all these cassettes, and I'm glad that you took this on.
post '/graphql', params: { query: '{
    search(geobox: {
      minLongitude: 0,
      minLatitude: -90,
      maxLongitude: 180,
      maxLatitude: 90
-   }) {
+   },
+   source: "MIT GIS Resources") {
I think I understand why we're adding source parameters to this search: the tests now run against a bigger index, so a source filter is needed to narrow results to only GIS records. But if that is the case, the same parameter should also be added to the cassette on line 343 to keep them in sync. Otherwise the different results asserted on line 369 could be due to different source values rather than different geobox definitions.
Good catch. I thought I had done that but clearly didn't. I'll fix.
The reason I am making this change is solely that the test failed with the larger index: the hits for both queries came back as 10,000, so I limited it to just MIT data... poorly :)
Will fix.
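One way to keep the two cassettes in sync is to build both hemisphere queries from a single helper, so only the geobox bounds can differ. This is a hypothetical sketch, not code from the repository; the `geobox_query` helper and the `hits` field selection are illustrative assumptions:

```ruby
# Build a GraphQL geobox query string. The source filter is identical for
# both hemispheres, so any difference in results reflects only the geobox.
# NOTE: this helper is a sketch for illustration, not the repo's actual code.
def geobox_query(min_longitude:, max_longitude:, source: 'MIT GIS Resources')
  <<~GRAPHQL
    {
      search(geobox: {
        minLongitude: #{min_longitude},
        minLatitude: -90,
        maxLongitude: #{max_longitude},
        maxLatitude: 90
      },
      source: "#{source}") {
        hits
      }
    }
  GRAPHQL
end

eastern = geobox_query(min_longitude: 0, max_longitude: 180)
western = geobox_query(min_longitude: -180, max_longitude: 0)
```

With this shape, a reviewer can see at a glance that the only variable between the two recorded requests is the hemisphere bounds.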
I've adjusted this test so that both hemispheres use the same source filter. I've also reconfigured it to use a single cassette for the whole test, since one cassette can hold responses for all of the queries that occur during the test, which is generally a bit easier to maintain.
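A sketch of the single-cassette pattern, assuming the VCR gem (not the repo's actual test): since every GraphQL request POSTs to the same URI, matching on the request body keeps each recorded response paired with its query. The `eastern_query` and `western_query` helpers are assumed, not real code from this project.

```ruby
# One cassette holds the recorded interactions for every request this test
# makes; matching on method + uri + body pairs each response with its query.
VCR.use_cassette('graphql geobox hemispheres',
                 match_requests_on: %i[method uri body]) do
  post '/graphql', params: { query: eastern_query } # eastern_query: assumed helper
  eastern_hits = response.parsed_body.dig('data', 'search', 'hits')

  post '/graphql', params: { query: western_query } # western_query: assumed helper
  western_hits = response.parsed_body.dig('data', 'search', 'hits')

  refute_equal(eastern_hits, western_hits)
end
```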
@@ -367,8 +369,8 @@ def setup
    refute_equal(eastern_hits, western_hits)
  end

- test 'graphqlv2 geobox search with keyword search' do
-   VCR.use_cassette('graphqlv2 geobox with keyword') do
+ test 'graphql geobox search with keyword search' do
Same question for this test as the other geospatial tests - should we be adding the source parameter here?
We shouldn't need to in this case because our assertions don't care if there are more than 10,000 results.
@@ -398,9 +400,9 @@ def setup
    end
  end

- test 'graphqlv2 geobox search with geodistance search' do
+ test 'graphql geobox search with geodistance search' do
Same question for this test as the other geospatial tests - should we be adding the source parameter here?
We shouldn't need to in this case because our assertions don't care if there are more than 10,000 results.
@matt-bernhardt I think I have addressed this, and it is ready for another review. Thanks!
I think this looks good now - thanks. I reviewed all the tests in the graphql controller file, and I think I get it all.
Force-pushed 4f3b4df to 9c46914
Why are these changes being introduced:
* geo ui has requested the ability to filter on additional fields

Relevant ticket(s):
* https://mitlibraries.atlassian.net/browse/GDT-168
* https://mitlibraries.atlassian.net/browse/GDT-169

How does this address that need:
* Adds aggregation in OpenSearch for places by utilizing `Subjects` with a `kind` of `Dublin Core; Spatial`

Not included:
* Aggregations for `Access Type` and `Data Type` are not yet included, as we are still working out exactly how that data should be stored, as is `years`, which will also be handled separately

Document any side effects to this change:
The default index we used in tests was changed from the old value to `all-current`, which should make it easier to generate new tests and cassettes in the future, as the old value (`timdex-prod`) was no longer used anywhere but in our tests. Changing aggregations required rebuilding most of our cassettes; while I was rebuilding most of them, I opted to rebuild them all. This includes automatic scrubbing of sensitive data when using AWS OpenSearch instances (scrubs credentials and the URI of the instance). The README has instructions on how to generate cassettes successfully. I generated cassettes from our current dev1 data, which means a lot of explicitly checked values changed in the tests. While updating tests for value changes, I also took the opportunity to remove most of the remnants of the v2 naming we had used during the transition from v1. I have not yet renamed the test file, as I was worried that would make the review more complicated. A follow-on change to rename that file will be forthcoming.
Force-pushed 9c46914 to 3102332
NOTE:
`test/controllers/graphql_controller_v2_test.rb` has significant changes, and GitHub's code review window will not show them by default. Please make sure to review those changes. The cassettes are less interesting; you can probably just confirm that one or two indeed scrub the sensitive data and ignore the rest, as it was all automated.

Why are these changes being introduced:
* geo ui has requested the ability to filter on additional fields
Relevant ticket(s):
* https://mitlibraries.atlassian.net/browse/GDT-168
* https://mitlibraries.atlassian.net/browse/GDT-169

How does this address that need:
* Adds aggregation in OpenSearch for places by utilizing `Subjects` with a `kind` of `Dublin Core; Spatial`
* `date_ranges` and `places`
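As a sketch of what such an aggregation request might look like (the field paths, the nested mapping, and the `.keyword` sub-fields are assumptions for illustration, not the actual implementation):

```ruby
# Hypothetical OpenSearch aggregation for places: bucket subject terms,
# restricted to subjects whose kind is "Dublin Core; Spatial".
# Field paths ("subjects" as a nested field, ".keyword" sub-fields) are
# illustrative assumptions only.
def places_aggregation(size: 1000)
  {
    places: {
      nested: { path: 'subjects' },
      aggs: {
        only_spatial: {
          filter: {
            term: { 'subjects.kind.keyword' => 'Dublin Core; Spatial' }
          },
          aggs: {
            place_names: {
              terms: { field: 'subjects.value.keyword', size: size }
            }
          }
        }
      }
    }
  }
end
```

The inner filter keeps non-spatial subject kinds out of the buckets, so the `place_names` terms aggregation returns only place values.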
Not included:
* Aggregations for `Access Type` and `Data Type` are not yet included, as we are still working out exactly how that data should be stored

Document any side effects to this change:
The default index we used in tests was changed from the old value to `all-current`, which should make it easier to generate new tests and cassettes in the future, as the old value (`timdex-prod`) was no longer used anywhere but in our tests. Changing aggregations required rebuilding most of our cassettes; while I was rebuilding most of them, I opted to rebuild them all. This includes automatic scrubbing of sensitive data when using AWS OpenSearch instances (scrubs credentials and the URI of the instance). The README has instructions on how to generate cassettes successfully.

I generated cassettes from our current dev1 data, which means a lot of explicitly checked values changed in the tests. While updating tests for value changes, I also took the opportunity to remove most of the remnants of the v2 naming we had used during the transition from v1. I have not yet renamed the test file, as I was worried that would make the review more complicated. A follow-on change to rename that file will be forthcoming.
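The scrubbing described above can be done with VCR's sensitive-data filters. A sketch of the kind of configuration involved (the placeholder strings and ENV variable names here are assumptions, not necessarily what this project uses):

```ruby
require 'vcr'

VCR.configure do |config|
  config.cassette_library_dir = 'test/vcr_cassettes'
  config.hook_into :webmock
  # Replace the real OpenSearch endpoint and AWS credentials in recorded
  # cassettes with stable placeholders. ENV variable names are illustrative.
  config.filter_sensitive_data('FAKE_OPENSEARCH_URL') { ENV['OPENSEARCH_URL'] }
  config.filter_sensitive_data('FAKE_AWS_ACCESS_KEY') { ENV['AWS_ACCESS_KEY_ID'] }
  config.filter_sensitive_data('FAKE_AWS_SECRET_KEY') { ENV['AWS_SECRET_ACCESS_KEY'] }
end
```

Because the filters run at record time, anyone regenerating cassettes against a live AWS OpenSearch instance gets scrubbed output automatically rather than relying on a manual cleanup pass.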
Developer
* our guide and all issues introduced by these changes have been resolved or opened as new issues (link to those issues in the Pull Request details above)

Code Reviewer
* (not just this pull request message)

Requires database migrations?
NO

Includes new or updated dependencies?
NO