unsupported datatype listing group delta_time #17

Closed
rwegener2 opened this issue Nov 7, 2023 · 2 comments

Comments

@rwegener2
Collaborator

rwegener2 commented Nov 7, 2023

Description

While running a listGroup() call on an ATL03 product, the group gt1l/heights/delta_time returns an error about not supporting the datatype:

H5Coro encountered an error listing the group gt1l/heights/delta_time: unsupported datatype: 6
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Cell In[40], line 1
----> 1 variables, attributes = h5obj.listGroup('/gt1l/heights', w_attr=True, w_inspect=True)

ValueError: too many values to unpack (expected 2)

From the code it looks like this means that some kind of RuntimeError was encountered while reading.

When I hard reset h5coro back to this commit it seems to work fine, so it may be related to changes from the following commit.
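For reference, the ValueError at the end of the traceback is Python's generic error for unpacking an iterable into fewer names than it has items. The following is a minimal sketch, assuming (as a later comment in this thread explains) that listGroup() returns a single dictionary keyed by variable name. The `fake_listing` dictionary and its contents are hypothetical stand-ins, not real h5coro output:

```python
# Hypothetical stand-in for a group listing returned as one dictionary.
# The keys are variable names from this thread; the empty values are placeholders.
fake_listing = {
    "delta_time": {},
    "h_ph": {},
    "lat_ph": {},
}

message = None
try:
    # Two-name unpacking iterates over the dictionary's keys (three here),
    # so Python cannot fit them into two target names.
    variables, attributes = fake_listing
except ValueError as err:
    message = str(err)

print(message)
```

If that assumption holds, the ValueError would be a side effect of the changed return type rather than of the "unsupported datatype" message printed above it.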

Minimum Reproducible Example

from h5coro import h5coro, s3driver
import earthaccess

auth = earthaccess.login()
s3_creds = auth.get_s3_credentials(daac='NSIDC')

my_bucket = 'nsidc-cumulus-prod-protected'
filepath = 'ATLAS/ATL03/006/2019/11/30/ATL03_20191130112041_09860505_006_01.h5'

h5obj = h5coro.H5Coro(f'{my_bucket}/{filepath}', s3driver.S3Driver,
                     credentials={"aws_access_key_id": s3_creds["accessKeyId"],
                                 "aws_secret_access_key": s3_creds["secretAccessKey"],
                                 "aws_session_token": s3_creds["sessionToken"], })

variables, attributes = h5obj.listGroup('/gt1l/heights', w_attr=True, w_inspect=True)  # Error on this line
@betolink
Contributor

betolink commented Nov 29, 2023

I tested this and got the same error with both versions, though I see fewer errors with the latest code from main.

%pip uninstall -y h5coro
%pip install git+https://github.com/ICESat2-SlideRule/h5coro.git@c69e6f0

groups = h5obj.listGroup('/gt1l/heights', w_attr=True, w_inspect=True) 

stderr output:

H5Coro encountered error reading /gt1l/heights/ph_id_channel/DIMENSION_LIST: variable length data types require reading a global heap, which is not yet supported
H5Coro encountered error reading /gt1l/heights/lon_ph/DIMENSION_LIST: variable length data types require reading a global heap, which is not yet supported
H5Coro encountered error reading /gt1l/heights/ph_id_count/DIMENSION_LIST: variable length data types require reading a global heap, which is not yet supported
H5Coro encountered error reading /gt1l/heights/pce_mframe_cnt/DIMENSION_LIST: variable length data types require reading a global heap, which is not yet supported
H5Coro encountered error reading /gt1l/heights/h_ph/DIMENSION_LIST: variable length data types require reading a global heap, which is not yet supported
H5Coro encountered error reading /gt1l/heights/ph_id_pulse/DIMENSION_LIST: variable length data types require reading a global heap, which is not yet supported
H5Coro encountered error reading /gt1l/heights/signal_conf_ph/DIMENSION_LIST: variable length data types require reading a global heap, which is not yet supported
H5Coro encountered error reading /gt1l/heights/dist_ph_along/DIMENSION_LIST: variable length data types require reading a global heap, which is not yet supported
H5Coro encountered error reading /gt1l/heights/dist_ph_across/DIMENSION_LIST: variable length data types require reading a global heap, which is not yet supported
H5Coro encountered error reading /gt1l/heights/lat_ph/DIMENSION_LIST: variable length data types require reading a global heap, which is not yet supported
H5Coro encountered error reading /gt1l/heights/weight_ph/DIMENSION_LIST: variable length data types require reading a global heap, which is not yet supported
H5Coro encountered error reading /gt1l/heights/quality_ph/DIMENSION_LIST: variable length data types require reading a global heap, which is not yet supported
H5Coro encountered error reading /gt1l/heights/delta_time/REFERENCE_LIST: unsupported datatype: 6

with

%pip uninstall -y h5coro
%pip install git+https://github.com/ICESat2-SlideRule/h5coro.git@main

groups = h5obj.listGroup('/gt1l/heights', w_attr=True, w_inspect=True) 

stdout

H5Coro encountered an error listing the group gt1l/heights/delta_time: unsupported datatype: 6

which is weird because the original errors were sent to stderr. The changes in the latest commits didn't modify the code in the metadata readers, so I wonder if there is something else going on. @jpswinski

@jpswinski
Member

@rwegener2 @betolink Thanks for reporting this. It looks like there are a couple of things going on. I'll try to address each of them.

  1. @rwegener2 The PR Ux #15 made some significant changes to how groups are listed and to the data structures returned to the user. You can take a look there for all the details, but in short, the data returned for a group listing is now a single dictionary without the flattened paths, so unpacking the result into two variables, as your script does, raises the ValueError. Please let me know what you think. The hope is that the data is now returned in a way that is more intuitive and easier to work with, along the lines of some of the discussions we were having back at the end of the summer.

Here is an update to your script that lists the group with the latest code base:

group = h5obj.listGroup('/gt1l/heights', w_attr=True, w_inspect=True)
for variable, listing in group.items():
    print(f'{variable}:')
    for key, value in listing.items():
        print(f'  {key}: {value}')
  2. @betolink I was able to reproduce what you are seeing. The difference between the older and latest versions of the code is that the older version choked on variable-length data types, which are used for the DIMENSION_LIST attribute in the ICESat-2 data. The latest code gracefully ignores them and moves on. But even with the latest version, the delta_time variable in the test @rwegener2 is running has a REFERENCE_LIST attribute that uses a compound data type, which the code does not support. So when the code inspected the heights group and reached the delta_time variable, it printed an error message saying that it couldn't list the attributes of the delta_time group. Note that it was still able to inspect all of the other variables in the group, because none of them has a REFERENCE_LIST attribute with a compound data type.

  3. I committed an update to the code to gracefully handle compound data types by just ignoring them and moving on. This should allow @rwegener2's script to run without errors. As a future effort, we will want to add support for these and the other unsupported data types. So far I have not seen a dataset where the data itself has one of these special data types (it is always in the attributes), so it probably isn't a high priority. But let me know if it becomes one.

  4. You'll also notice in the latest commit a few cosmetic updates to error messages to make them more consistent.
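To illustrate the "ignore and move on" behavior described above, here is a minimal sketch of the general pattern. It is not the h5coro implementation: every name in it (`SUPPORTED_TYPES`, `UnsupportedDatatype`, `read_attribute`, `list_attributes`) and the sample attribute values are hypothetical and exist only for illustration.

```python
# Hypothetical stand-ins: none of these names come from h5coro itself.
SUPPORTED_TYPES = {"int", "float", "string"}


class UnsupportedDatatype(Exception):
    """Raised by the mock reader for datatypes it cannot decode."""


def read_attribute(name, datatype, value):
    """Mock reader: return the value, or raise for an unsupported datatype."""
    if datatype not in SUPPORTED_TYPES:
        raise UnsupportedDatatype(f"unsupported datatype: {datatype}")
    return value


def list_attributes(raw_attributes):
    """Return the readable attributes plus the names of those skipped."""
    readable, skipped = {}, []
    for name, (datatype, value) in raw_attributes.items():
        try:
            readable[name] = read_attribute(name, datatype, value)
        except UnsupportedDatatype:
            # Skip this attribute only; keep listing the rest.
            skipped.append(name)
    return readable, skipped


# Made-up attribute table loosely modeled on the names in this thread.
raw = {
    "units": ("string", "seconds"),
    "REFERENCE_LIST": ("compound", None),
    "DIMENSION_LIST": ("variable_length", None),
}

readable, skipped = list_attributes(raw)
print(readable)
print(skipped)
```

The point of catching the error per attribute, rather than around the whole listing, is that one unreadable attribute no longer prevents the remaining attributes of the variable from being returned.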
