Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Separate simulation metadata from (sxs) metadata #64

Open
wants to merge 5 commits into
base: master
Choose a base branch
from

Conversation

SSL32081
Copy link

This pull request aims to fix #63.

The current behaviour of the metadata in nrcatalogtools.waveform.WaveformModes is that after the instance is created, the simulation metadata is stored in both self.metadata and self.sim_metadata, whereas the necessary metadata needed for the internals of sxs (e.g. TimeSeries and WaveformModes) to work is stored in self._metadata.

This is the current behaviour explicitly:

h_rit = rit_catalog.get('RIT:BBH:0001-n100-id3')

h_rit.metadata.keys()
> odict_keys(['Unnamed__0', 'catalog_tag', 'resolution_tag', 'id_tag', 'run_name', 'data_type',  ...
h_rit._metadata.keys()
> dict_keys(['time', 'time_axis', 'modes_axis', 'ell_min', 'ell_max', 'metadata', ...
h_rit.sim_metadata.keys()
> odict_keys(['Unnamed__0', 'catalog_tag', 'resolution_tag', 'id_tag', 'run_name', 'data_type',  ...

This mismatch between metadata and _metadata causes issues in operations as a sxs.WaveformModes or sxs.TimeSeries. In particular, whenever the ndarray.view() method is invoked (e.g. [1] and [2]), the keys in _metadata will be overwritten/combined with those in metadata, which then lead to the slicing issue in #63.

This PR addresses this issue by creating an additional hidden attribute _sim_metadata to store the simulation metadata, thereby completely separate metadata from sim_metadata.

After this PR, the behaviour should become:

h_rit = rit_catalog.get('RIT:BBH:0001-n100-id3')

h_rit.metadata.keys()
> dict_keys(['time', 'time_axis', 'modes_axis', 'ell_min', 'ell_max', 'metadata', ...
h_rit._metadata.keys()
> dict_keys(['time', 'time_axis', 'modes_axis', 'ell_min', 'ell_max', 'metadata', ...
h_rit.sim_metadata.keys()
> odict_keys(['Unnamed__0', 'catalog_tag', 'resolution_tag', 'id_tag', 'run_name', 'data_type',  ...
h_rit._sim_metadata.keys()
> odict_keys(['Unnamed__0', 'catalog_tag', 'resolution_tag', 'id_tag', 'run_name', 'data_type',  ...

Copy link
Collaborator

@adivijaykumar adivijaykumar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the PR @SSL32081! I think I have not hit this error so far because I use the get_modes method, but I do think this is worth fixing.

Would it be possible for you to add the same changes to maya.py?

@prayush At some point we should think of refactoring these to reduce code duplication :)

@@ -69,6 +69,8 @@ def get(self, sim_name):
metadata = self.get_metadata(sim_name)
if type(metadata) is not dict and hasattr(metadata, "to_dict"):
metadata = metadata.to_dict()
elif isinstance(metadata, dict):
metadata = dict(metadata.items())
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am confused why this check is needed. Wouldn't metadata before and after this action be the same thing?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is just a quick fix for another small issue. Tbh, I think it could be addressed more appropriately from the sxs side.

The issue is that somehow the type of metadata becomes the sxs.Metadata, and the update method of which supports syntax like: metadata.update(new_meta), but not metadata.update(**new_data).

But then the latter syntax is used in here. Since I am not sure whether sxs expects the metadata there has to be a pure dict or that they are not aware of this issue, I opt for this simple fix here.

Going back to the original question, the sxs.Metadata is a subclass from OrderedDict, so the isinstance and items() methods work, and converting it to a simple dict revert the update method back to the usual one.

@@ -165,7 +165,7 @@ def load_from_h5(cls, file_path_or_open_file, metadata={}, verbosity=0):
data[:, idx] = amp_interp(times) * np.exp(1j * phase_interp(times))

w_attributes = {}
w_attributes["metadata"] = metadata
# w_attributes["metadata"] = metadata
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No need to comment this line out, please remove it entirely. Please also remove the other commented out # w_attribute below.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, just did.

@SSL32081
Copy link
Author

SSL32081 commented Dec 19, 2024

Thanks for the PR @SSL32081! I think I have not hit this error so far because I use the get_modes method, but I do think this is worth fixing.

Would it be possible for you to add the same changes to maya.py?

Sure, let me have a look at that!

But since maya.py inherits from catalog.py, I suppose this problem is solved for maya as well.

@SSL32081
Copy link
Author

SSL32081 commented Dec 19, 2024

Actually, may I ask why these two attributes are set as class attributes? (see here)

        # Set the file path attribute
        cls._filepath = h5_file.filename
        # If _metadata is not already
        # a set attribute, then set
        # it here.


        try:
            cls._sim_metadata
        except AttributeError:
            cls._sim_metadata = metadata

To me, it seems to make more sense to set them as instance attributes later in the __new__ method.

The current behaviour is like this:

from nrcatalogtools import RITCatalog
rit_catalog = RITCatalog.load(download=False)

rit_code = 'RIT:BBH:0001-n100-id3'
h_rit_1 = rit_catalog.get(rit_code)
print('BBH:0001:', h_rit_1.sim_metadata['catalog_tag'])
print('BBH:0001:', h_rit_1._filepath)

rit_code = 'RIT:BBH:0003-n100-id0'
h_rit_3 = rit_catalog.get(rit_code)
print('BBH:0003:', h_rit_3.sim_metadata['catalog_tag'])
print('BBH:0003:', h_rit_3._filepath)

## Output:
# BBH:0001: RIT:BBH:0001
# BBH:0001: None
# BBH:0003: RIT:BBH:0001
# BBH:0003: None

(I can quickly push a commit that fixes this)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Slicing WaveformModes does not work
2 participants