Adds Places aggregation #795

Merged: 2 commits merged into main from gdt-168-aggregations on Feb 22, 2024

Conversation


@JPrevost JPrevost commented Feb 16, 2024

NOTE: test/controllers/graphql_controller_v2_test.rb has significant changes, and GitHub's code review window will not show them by default. Please make sure to review those changes. The cassettes are less interesting; you can probably just confirm that one or two indeed scrub the sensitive data and ignore the rest, as it was all automated.

Why are these changes being introduced:

  • geo ui has requested the ability to filter on additional fields

Relevant ticket(s):

  • https://mitlibraries.atlassian.net/browse/GDT-168
  • https://mitlibraries.atlassian.net/browse/GDT-169

How does this address that need:

  • Adds aggregation in OpenSearch for fixed date_ranges
  • Adds aggregation in OpenSearch for places by utilizing Subjects with a kind of Dublin Core; Spatial (a rough sketch of the aggregation shape follows below)
  • Adds GraphQL aggregation capabilities for date_ranges and places
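
For orientation, the places aggregation boils down to an OpenSearch terms aggregation over Subjects whose kind is Dublin Core; Spatial, alongside a fixed date_range aggregation. The sketch below is illustrative only: the field paths, bucket boundaries, and sizes are assumptions, not the actual definitions in app/models/aggregations.rb.

```ruby
# Illustrative sketch only. Field paths ("subjects.kind", "subjects.value.keyword",
# "dates.value") and the bucket boundaries are assumptions, not the real
# definitions in app/models/aggregations.rb. If subjects is a nested mapping,
# the places aggregation would additionally need to be wrapped in a nested agg.
def example_aggregations
  {
    places: {
      # Restrict to spatial subjects, then bucket the subject values.
      filter: { term: { 'subjects.kind' => 'Dublin Core; Spatial' } },
      aggs: {
        place_names: {
          terms: { field: 'subjects.value.keyword', size: 100 }
        }
      }
    },
    date_ranges: {
      # Fixed, non-overlapping date buckets.
      date_range: {
        field: 'dates.value',
        ranges: [
          { key: 'before 1900', to: '1900-01-01' },
          { key: '1900 to 1999', from: '1900-01-01', to: '2000-01-01' },
          { key: '2000 to present', from: '2000-01-01' }
        ]
      }
    }
  }
end
```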

Not included

  • Aggregations for Access Type and Data Type are not yet included as we are still working out exactly how that data should be stored

Document any side effects to this change:

The default index we used in tests was changed from the old value to all-current, which should make it easier to generate new tests and cassettes in the future, as the old value (timdex-prod) was no longer used anywhere but in our tests.
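
The mechanism for that default isn't shown in this description, but the usual pattern is an environment variable with a fallback. The variable name below is a guess for illustration only; the real name and default are documented in the README.

```ruby
# Hypothetical sketch; check the README for the real variable name and default.
def default_index
  ENV.fetch('OPENSEARCH_INDEX', 'all-current')
end
```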

Changing aggregations required rebuilding most of our cassettes. Since I was rebuilding most of them anyway, I opted to rebuild them all. This includes automatic sensitive data scrubbing when using AWS OpenSearch instances (scrubs credentials and the URI of the instance). The README has instructions on how to generate cassettes successfully.
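
The exact scrubbing setup lives in the test configuration; as a rough illustration (the placeholder strings and ENV variable names below are assumptions, not necessarily this project's), VCR's filter_sensitive_data hook is the standard way to do this kind of redaction:

```ruby
# test_helper-style sketch; ENV names and placeholder strings are assumptions.
require 'vcr'

VCR.configure do |config|
  config.cassette_library_dir = 'test/vcr_cassettes'
  config.hook_into :webmock

  # Replace the OpenSearch endpoint and AWS credentials with placeholders so
  # recorded cassettes never contain sensitive values.
  config.filter_sensitive_data('<OPENSEARCH_URL>') { ENV['OPENSEARCH_URL'] }
  config.filter_sensitive_data('<AWS_ACCESS_KEY_ID>') { ENV['AWS_ACCESS_KEY_ID'] }
  config.filter_sensitive_data('<AWS_SECRET_ACCESS_KEY>') { ENV['AWS_SECRET_ACCESS_KEY'] }
end
```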

I generated cassettes from our current dev1 data, which means a lot of explicitly checked values changed in the tests. While updating tests for value changes, I also took this opportunity to remove most of the remnants of the v2 naming we had used during the transition from v1. I have not yet renamed the test file, as I was worried that would make the review more complicated. A follow-on change that just renames that file will be forthcoming.

Developer

  • All new ENV is documented in README
  • All new ENV has been added to Heroku Pipeline, Staging and Prod
  • ANDI or Wave has been run in accordance with
    our guide and
    all issues introduced by these changes have been resolved or opened as new
    issues (link to those issues in the Pull Request details above)
  • Stakeholder approval has been confirmed (or is not needed)

Code Reviewer

  • The commit message is clear and follows our guidelines
    (not just this pull request message)
  • There are appropriate tests covering any new functionality
  • The documentation has been updated or is unnecessary
  • The changes have been verified
  • New dependencies are appropriate or there were no changes

Requires database migrations?

NO

Includes new or updated dependencies?

NO

@mitlib mitlib temporarily deployed to timdex-pr-795 February 16, 2024 19:14 Inactive
@matt-bernhardt matt-bernhardt self-assigned this Feb 16, 2024
@matt-bernhardt matt-bernhardt left a comment

Some of these are questions to make sure I'm not misunderstanding something, but there's at least one test (the eastern vs western hemisphere box test) which I think needs to be tweaked in order to still be valid. Or maybe the test itself isn't meaningful, as it relies on OpenSearch working correctly rather than anything we're actually doing in the application. Regardless, the change as currently proposed seems to allow the test to pass incorrectly.

I also have a concern about the year bucketing being applied, because the most recent bucket seems inconsistent with the rest of them.

As you mentioned in the PR, I didn't look at each cassette individually - but did inspect a few of the ones that changed to see how the anonymization was being applied.

I'm happy to chat more about this on Tuesday, though - you're right that there was a lot involved in regenerating all these cassettes, and I'm glad that you took this on.

Resolved review threads on app/models/aggregations.rb (outdated) and test/models/opensearch_test.rb.
    post '/graphql', params: { query: '{
      search(geobox: {
        minLongitude: 0,
        minLatitude: -90,
        maxLongitude: 180,
        maxLatitude: 90
-     }) {
+     },
+     source: "MIT GIS Resources") {
@matt-bernhardt (Member) commented:

I think I understand why we're adding source parameters to this search: now that the tests run against a bigger index, a source filter is needed to get down to only GIS records. But if that is the case, the same parameter should also be added to the cassette on line 343 to keep them in sync. Otherwise the different results asserted on line 369 could be due to different source values rather than different geobox definitions.

@JPrevost (Member Author) replied:

Good catch. I thought I had done that but clearly didn't. I'll fix.

The reason I am making this change is solely because the test failed with the larger index: the hits for both queries came back as 10,000, so I limited to just MIT data... poorly :)

Will fix.
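
For background on why both counts plateaued at 10,000: OpenSearch only tracks hits.total exactly up to 10,000 by default; beyond that it reports 10,000 as a lower bound unless the query asks for an exact count. The snippet below is just a sketch of the relevant request option, not something this PR changes.

```ruby
# Sketch only: shows the OpenSearch option, not code from this PR.
query_body = {
  track_total_hits: true, # count all matches instead of capping hits.total at 10,000
  query: { match_all: {} }
}
```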

@JPrevost (Member Author) replied:

I've adjusted this test so both hemispheres use the same source filter. I've also reconfigured it to use a single cassette for the whole test, as we can instruct the cassette to hold responses for all of the queries that take place during that test, which is generally a bit easier to maintain.
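
As a rough illustration of that single-cassette pattern (the test name, cassette name, and GraphQL fields below are placeholders, not the actual code in test/controllers/graphql_controller_v2_test.rb), one VCR.use_cassette block can record and replay every request made inside it; matching on the request body keeps the two GraphQL POSTs distinct:

```ruby
# Placeholder test/cassette names; the GraphQL query fields are assumptions too.
test 'graphql geobox search differs by hemisphere' do
  eastern = '{ search(geobox: { minLongitude: 0, minLatitude: -90, maxLongitude: 180, maxLatitude: 90 }, source: "MIT GIS Resources") { hits } }'
  western = '{ search(geobox: { minLongitude: -180, minLatitude: -90, maxLongitude: 0, maxLatitude: 90 }, source: "MIT GIS Resources") { hits } }'

  # One cassette holds the responses for both POSTs; matching on the request
  # body keeps them distinct when the cassette is replayed.
  VCR.use_cassette('graphql geobox hemispheres',
                   match_requests_on: %i[method uri body]) do
    post '/graphql', params: { query: eastern }
    eastern_hits = response.parsed_body.dig('data', 'search', 'hits')

    post '/graphql', params: { query: western }
    western_hits = response.parsed_body.dig('data', 'search', 'hits')

    refute_equal(eastern_hits, western_hits)
  end
end
```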

@@ -367,8 +369,8 @@ def setup
      refute_equal(eastern_hits, western_hits)
    end

-   test 'graphqlv2 geobox search with keyword search' do
-     VCR.use_cassette('graphqlv2 geobox with keyword') do
+   test 'graphql geobox search with keyword search' do
@matt-bernhardt (Member) commented:

Same question for this test as the other geospatial tests - should we be adding the source parameter here?

@JPrevost (Member Author) replied:

We shouldn't need to in this case because our assertions don't care if there are more than 10,000 results.

@@ -398,9 +400,9 @@ def setup
      end
    end

-   test 'graphqlv2 geobox search with geodistance search' do
+   test 'graphql geobox search with geodistance search' do
@matt-bernhardt (Member) commented:

Same question for this test as the other geospatial tests - should we be adding the source parameter here?

@JPrevost (Member Author) replied:

We shouldn't need to in this case because our assertions don't care if there are more than 10,000 results.

@mitlib mitlib temporarily deployed to timdex-pr-795 February 20, 2024 19:47 Inactive
@JPrevost (Member Author) commented:

@matt-bernhardt I think I have addressed your feedback and this is ready for another review. Thanks!

@JPrevost JPrevost changed the title Adds additional aggregations Adds Places aggregation Feb 22, 2024
@matt-bernhardt matt-bernhardt left a comment

I think this looks good now - thanks. I reviewed all the tests in the graphql controller file, and I think I get it all.

Why are these changes being introduced:

* geo ui has requested the ability to filter on additional fields

Relevant ticket(s):

* https://mitlibraries.atlassian.net/browse/GDT-168
* https://mitlibraries.atlassian.net/browse/GDT-169

How does this address that need:

* Adds aggregation in OpenSearch for places by utilizing `Subjects` with
  a `kind` of `Dublin Core; Spatial`

Not included

* Aggregations for `Access Type` and `Data Type` are not yet included
  as we are still working out exactly how that data should be stored, as
  well as `years`, which will also be handled separately

Document any side effects to this change:

The default index we used in tests was changed from the old value to
`all-current` which should make it easier to generate new tests and
cassettes in the future as the old value (`timdex-prod`) was no longer
used anywhere but in our tests.

Changing aggregations required rebuilding most of our cassettes. Since I
was rebuilding most of them anyway, I opted to rebuild them all. This
includes automatic sensitive data scrubbing when using AWS OpenSearch
instances (scrubs credentials and the URI of the instance). The README
has instructions on how to generate cassettes successfully.

I generated cassettes from our current dev1 data, which means a lot of
explicitly checked values changed in the tests. While updating tests
for value changes, I also took this opportunity to remove most of the
remnants of the v2 naming we had used during the transition from v1.
I have not yet renamed the test file, as I was worried that would make
the review more complicated. A follow-on change that just renames that
file will be forthcoming.
@JPrevost JPrevost force-pushed the gdt-168-aggregations branch from 9c46914 to 3102332 on February 22, 2024 19:56
@JPrevost JPrevost temporarily deployed to timdex-pr-795 February 22, 2024 19:56 Inactive
@JPrevost JPrevost merged commit 6feb507 into main Feb 22, 2024
3 of 4 checks passed
@JPrevost JPrevost deleted the gdt-168-aggregations branch February 22, 2024 20:11