split up metadata #7

d70-t · 2022-05-20T17:11:15Z

Currently the entire metadata tree is part of a single output object. Due to the 2 MB size limit, this will quickly cause trouble for larger datasets. A natural approach would be to split the metadata object into multiple objects along the hierarchy inherent to zarr (e.g. one DAG-CBOR block per variable plus one per dataset, instead of a single object per dataset).
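
To make the idea concrete, here is a rough sketch of such a split. It is not the project's code: `put_block` and `split_metadata` are hypothetical helpers, a plain dict stands in for the block store, and JSON plus a SHA-256 hex digest stand in for DAG-CBOR encoding and real CIDs. Links use the `{"/": ...}` form known from DAG-JSON.

```python
import hashlib
import json


def put_block(store, obj):
    """Serialize obj and store it under its content hash (stand-in for a CID)."""
    data = json.dumps(obj, sort_keys=True).encode()
    key = hashlib.sha256(data).hexdigest()
    store[key] = data
    return key


def split_metadata(store, tree):
    """Write one block per zarr variable and a root block that links to them."""
    root = {}
    for name, value in tree.items():
        if isinstance(value, dict) and ".zarray" in value:
            # A variable: move it into its own block and keep only a link.
            root[name] = {"/": put_block(store, value)}
        else:
            # Dataset-level metadata stays inline in the root block.
            root[name] = value
    return put_block(store, root)


store = {}
tree = {
    ".zgroup": {"zarr_format": 2},
    "temperature": {".zarray": {"shape": [10]}, ".zattrs": {"units": "K"}},
    "pressure": {".zarray": {"shape": [10]}, ".zattrs": {"units": "Pa"}},
}
root_key = split_metadata(store, tree)
```

With this layout the root block only grows with the number of variables, while each variable's metadata and chunk references count against the size limit separately.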

We might still need to resort to HAMT if the dictionaries on a single hierarchical level become too large, but that might still be quite far away. We probably also want to introduce further hierarchical levels within the chunk keys of a single zarr variable, as proposed here and demonstrated here. This would reduce the number of items per dictionary while aligning IPLD objects to (to-be-introduced) zarr-shards, which in turn may lead to better locality within block requests.
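
As a rough illustration of the extra level within chunk keys, the sketch below groups a flat chunk dictionary into one sub-dictionary per shard. It assumes dot-separated chunk keys such as `3.2`; `group_by_shard` and `chunks_per_shard` are hypothetical names, and the actual shard layout would have to follow whatever zarr sharding ends up specifying.

```python
def group_by_shard(chunks, chunks_per_shard):
    """Nest a flat {chunk_key: value} dict into {shard_key: {chunk_key: value}}."""
    nested = {}
    for key, value in chunks.items():
        index = [int(part) for part in key.split(".")]
        # The shard index is the chunk index divided by the shard extent per axis.
        shard = ".".join(str(i // n) for i, n in zip(index, chunks_per_shard))
        nested.setdefault(shard, {})[key] = value
    return nested


# 4 x 4 chunks with placeholder values, grouped into shards of 2 x 2 chunks.
chunks = {f"{i}.{j}": f"cid-{i}-{j}" for i in range(4) for j in range(4)}
nested = group_by_shard(chunks, (2, 2))
```

Each sub-dictionary could then become its own block, so that chunks that are neighbours in index space end up behind the same link.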
