Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: Update metadata to include HLS granule ID and links to Fmask layer #48

Merged
merged 9 commits into from
Jan 8, 2025

Conversation

ceholden
Copy link
Collaborator

@ceholden ceholden commented Dec 20, 2024

Description

This PR adds two extra pieces of metadata to the HLS-VI product metadata and addresses these tickets,

How I did it

  • I added the original HLS granule ID as an <AdditionalAttribute> section
  • I added two links (HTTP and S3) to the Fmask layer from the source HLS granule to the OnlineAccessURLs section. The links are well formed so we can simply construct the links based on the original HLS product granule ID.
  • I ensured that the XML file was indented correctly before writing.
  • I updated the test "golden (meta)data" with the outputs after manually verifying that it looked OK. I also tested one of the links to the Fmask layer and was able to download it ✅
  • I updated the Granule.xsd to define the Input_GranuleUR and used the GranuleUR type as its type because both granule IDs should share the same validation (e.g., 1 <= length <= 250)
  • Couple package changes,
    • Bumped lxml to latest so we can use the indent method and have an easier time installing it locally. This should also resolve 5 of the 7 Dependabot changes.
    • I also relaxed the NumPy pin to make it easier to compile locally
  • Along the way my text editor caught a few typos

How you can test it

I updated the test "golden (meta)data" with these two metadata additions. You can run the metadata tests to compare against these by running,

pytest tests -k metadata

(the -k metadata scopes the run to just the metadata tests, which is nice because the other ones involve a lot of computation so take longer)

@ceholden ceholden force-pushed the ceh/update-metadata-fmask-granule-id branch from edc7f5a to c1048b3 Compare December 20, 2024 22:49
@ceholden ceholden marked this pull request as ready for review December 20, 2024 22:54
setup.py Outdated
@@ -9,8 +9,10 @@
"dataclasses",
"geojson",
"importlib_resources",
"lxml==3.6.0",
"numpy~=1.19.0",
"lxml==5.3.0",
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just noticed there was a PR to pin this but I don't know the backstory,
#34

It looks like there (shockingly) is a wheel for lxml==5.3.0 for py36, so maybe this is fine? I'm guessing the pin to an older version was to match the old version of the C library in our container. Maybe we can bump this to use the binary and then drop the libxml2-dev and libxslt1-dev?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I recall an issue with this a while back due to my having used a later version like you're doing here, but @sharkinsspatial pointed out the reason we need to pin to the older version, so perhaps he can refresh my memory here.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @chuckwondo for the review and calling this out!

I reverted the change here after figuring out how to do indentation of XML without bumping versions. Python>=3.9 and lxml>=4.5 have an indent() function but our dependencies are too old for those!

I found a way to do it using xml.dom.minidom thanks to Google

@ceholden ceholden requested a review from madhuksridhar January 7, 2025 15:36
Copy link
Collaborator

@chuckwondo chuckwondo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks great, and good eyes on finding the various spelling errors! Just check w/Sean regarding the lxml dependency version.

setup.py Outdated
@@ -9,8 +9,10 @@
"dataclasses",
"geojson",
"importlib_resources",
"lxml==3.6.0",
"numpy~=1.19.0",
"lxml==5.3.0",
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I recall an issue with this a while back due to my having used a later version like you're doing here, but @sharkinsspatial pointed out the reason we need to pin to the older version, so perhaps he can refresh my memory here.

@ceholden ceholden requested a review from chuckwondo January 8, 2025 20:08
@ceholden
Copy link
Collaborator Author

ceholden commented Jan 8, 2025

Thanks for the review @chuckwondo! I found that PR reverting the lxml version bump and let's definitely keep that pinned

I found a way to have our XML nicely indented with our existing code dependencies. It's not as nice as indent() available in more recent versions, but the solution was only a little wacky. I pushed this change in,
359a6ae

Would you mind re-reviewing with this change added?

Copy link
Collaborator

@chuckwondo chuckwondo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks Chris!

@ceholden ceholden merged commit 56d2d9a into main Jan 8, 2025
1 check passed
@ceholden ceholden deleted the ceh/update-metadata-fmask-granule-id branch January 8, 2025 22:12
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants