Each attribute in metadata needs to exist somewhere else in the OME-Zarr file #212

Closed
tcompa opened this issue Nov 17, 2022 · 8 comments
Labels
flexibility Support more workflow-execution use cases

Comments

@tcompa
Collaborator

tcompa commented Nov 17, 2022

This would be the best-case scenario. Let's see whether we can enforce it strictly.

@jluethi
Collaborator

jluethi commented Nov 22, 2022

Results from some conversations with people using OME-Zarr:

  1. We should not be using the OME-XML transitionary metadata. It is not meant for long-term use, and we don't need to get into it.
  2. We should be able to store arbitrary key-value pairs with metadata in the .zattrs. Let's think about where in the .zattrs they should go for each case and, if it's something more general, let's try to contribute it back to the spec
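
As an illustration of point 2, here is a minimal sketch of adding custom key-value pairs to a group's .zattrs file using only the standard library. The helper name `update_zattrs`, the `"fractal"` namespace key, and the example key are assumptions made up for this sketch; none of them comes from the OME-NGFF spec or from the Fractal code base.

```python
import json
import tempfile
from pathlib import Path


def update_zattrs(group_dir: Path, namespace: str, new_items: dict) -> dict:
    """Merge key-value pairs under a single namespace key of a group's .zattrs."""
    zattrs_path = group_dir / ".zattrs"
    attrs = json.loads(zattrs_path.read_text()) if zattrs_path.exists() else {}
    # Custom entries live under one key, so they cannot collide with
    # spec-defined keys such as "multiscales" or "plate".
    attrs.setdefault(namespace, {}).update(new_items)
    zattrs_path.write_text(json.dumps(attrs, indent=4))
    return attrs


# Usage on a throwaway directory standing in for a Zarr group
with tempfile.TemporaryDirectory() as tmp:
    group_dir = Path(tmp)
    (group_dir / ".zattrs").write_text(json.dumps({"multiscales": []}))
    attrs = update_zattrs(group_dir, "fractal", {"example_key": "example_value"})
    reloaded = json.loads((group_dir / ".zattrs").read_text())
```

Keeping everything under one namespace key is a design choice of this sketch; where such pairs should actually go is exactly the open question above.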

@tcompa
Collaborator Author

tcompa commented Nov 22, 2022

> We should not be using the OME-XML transitionary metadata.

Are we currently using any of those? (or: is there an easily-accessible list?)

@jluethi
Collaborator

jluethi commented Nov 22, 2022

> Are we currently using any of those? (or: is there an easily-accessible list?)

No, we aren't using it. It would have been one of the options to store additional metadata, see details here: https://ngff.openmicroscopy.org/latest/#bf2raw

I'm glad we don't need to get into this :)

@tcompa
Collaborator Author

tcompa commented Nov 22, 2022

Current items that are in metadata and that we could think about moving to .zattrs:

  1. Lists of plates/wells/images. These are already present in the .zattrs files, but it is convenient to duplicate them in the metadata to avoid a lot of parsing and string styling here and there.
  2. Number of pyramid levels. This is also already available in the .zattrs files. If we accept that tasks should start by reading this information, then we can remove it from the metadata and only use the OME-NGFF information.
  3. Coarsening factor. This is not directly available, but it can be retrieved with a bit of work from the scale transformations. It would be a good candidate (in my view) for being stored in the .zattrs, but I'm not sure I would suggest it as part of the specs - since it would mix with the scale information.
  4. Channel list. This is totally fractal-custom, and it currently has an (ordered!) list like ["A01_C01", ...]. To be updated based on Refactor: How do we refer to channels? #211.
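
For items 2 and 3, a rough sketch of how the number of pyramid levels and the coarsening factor could be derived from the multiscales metadata. It assumes the OME-NGFF 0.4 layout, where each dataset carries a `scale` entry in `coordinateTransformations`; the function name `pyramid_info` and the example values are made up for illustration, and the factor is taken from the first two levels only.

```python
def pyramid_info(zattrs: dict):
    """Return (number of levels, per-axis coarsening factor between levels 0 and 1)."""
    datasets = zattrs["multiscales"][0]["datasets"]
    scales = []
    for dataset in datasets:
        # Pick the scale transformation of this pyramid level
        scale = next(
            t["scale"]
            for t in dataset["coordinateTransformations"]
            if t["type"] == "scale"
        )
        scales.append(scale)
    num_levels = len(scales)
    if num_levels < 2:
        # A single level carries no coarsening information
        return num_levels, None
    factors = [coarse / fine for fine, coarse in zip(scales[0], scales[1])]
    return num_levels, factors


# Example metadata with axes (c, z, y, x) and three levels, coarsened 2x in y and x
example_zattrs = {
    "multiscales": [
        {
            "datasets": [
                {"path": str(level), "coordinateTransformations": [
                    {"type": "scale", "scale": [1.0, 1.0, 0.5 * 2**level, 0.5 * 2**level]}
                ]}
                for level in range(3)
            ]
        }
    ]
}
num_levels, factors = pyramid_info(example_zattrs)
```

This is the "bit of work" mentioned in item 3: the factor is not stored anywhere, it is a ratio of scale values, which is why it mixes with the scale information.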

Then there are a few attributes which we use to smoothly propagate some parameters within pairs of associated tasks (server-side ref fractal-analytics-platform/fractal-server#6). This is related to #177, that is, the choice of how to split arguments into two sets: the ones to be filled in automatically by fractal (e.g. component will most likely be here) and the ones to be specified as part of args.

Currently, we use:

  5. original_paths, to pass the image folders from the zarr-creation task to yokogawa_to_zarr.
  6. replicate_zarr attributes (again, there are some paths of the original file to replicate), which are then used for the MIP task. This one we could in principle refactor, since it is probably a bit redundant.
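
To make the task-pair pattern concrete, here is a toy sketch of metadata used as a buffer between two tasks. All function names (`init_task`, `compute_task`, `run_pair`) and the example paths are hypothetical and only illustrate the idea; they are not the actual Fractal tasks or the fractal-server logic.

```python
def init_task(image_folders):
    """First task of the pair: returns a metadata update for the orchestrator."""
    return {"original_paths": list(image_folders)}


def compute_task(component, metadata):
    """Second task: reads the parameters that the first task left in metadata."""
    return {"component": component, "input_folders": metadata["original_paths"]}


def run_pair(image_folders, components):
    """Hypothetical orchestrator: merges the update, then runs one task per component."""
    metadata = {}
    metadata.update(init_task(image_folders))
    return {c: compute_task(c, metadata) for c in components}


result = run_pair(["/data/plate1"], ["B/03/0"])
```

Here `component` is filled in by the orchestrator, while `original_paths` travels through metadata, which mirrors the split of arguments discussed in #177.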

Unless I've missed something, this is the current list.
Note that some of these parameters already exist in two forms, for (non-)multiplexing cases.

@tcompa
Collaborator Author

tcompa commented Nov 22, 2022

Broadly speaking, I think that adding some key-value pairs to some .zattrs could be extremely helpful, e.g., for #211 (and maybe somehow also for #199 or #200).

@jluethi
Collaborator

jluethi commented Nov 22, 2022

From the task side, it's very attractive if a task only needs a path to an OME-Zarr file and content parameters (like a model choice for cellpose), and can get the rest of the metadata directly from the OME-Zarr file.

Things like the list of plates, wells (& images?) are a different case, though, because they are used by Fractal, not by the individual task, right? If it makes sense to have some of that metadata on the Fractal side, that doesn't take away anything from the generalizability of the tasks.

=> Things that are needed for the tasks to run should be read from the OME-Zarr metadata wherever possible. Additional metadata is ok to be Fractal-specific. For me, this goes hand in hand with your suggestion of separating component info from args; maybe there are some inputs like component that we keep Fractal-specific.

Concretely:

  • Pyramid levels may be a nice thing to load from the metadata. It would be nice if tasks can reliably read it from the OME-Zarr. (but not urgent)
  • Coarsening factor: Hmm, I don't think we should use OME-Zarr metadata for things we don't envision contributing back to the spec. Fine if we think that it will make it into the spec. But for this case: Either we continue to use our metadata (default) or we process it reliably from the existing metadata in the OME-Zarr
  • Let's tackle the channel list! That will be an interesting question of where it should go and whether we add information like the light-path specific parts (e.g. A01_C01) or some other metadata as key-value pairs to the OME-Zarr. This strengthens the point from Refactor: How do we refer to channels? #211 again that we may need multiple ways to refer to channels, also depending on what metadata is available in the OME-Zarr.
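
One possible shape for "multiple ways to refer to channels", as a sketch only: look a channel up either by its `label` (part of the `omero` channel metadata in OME-NGFF 0.4) or by a custom `wavelength_id` key. The `wavelength_id` key, the function name `find_channel_index`, and the example values are assumptions for illustration and not something the spec defines.

```python
def find_channel_index(zattrs: dict, *, label=None, wavelength_id=None) -> int:
    """Return the index of the unique channel matching a label or a wavelength_id."""
    if (label is None) == (wavelength_id is None):
        raise ValueError("Specify exactly one of label or wavelength_id")
    if label is not None:
        key, value = "label", label
    else:
        key, value = "wavelength_id", wavelength_id
    channels = zattrs["omero"]["channels"]
    matches = [i for i, channel in enumerate(channels) if channel.get(key) == value]
    if len(matches) != 1:
        raise ValueError(f"Expected one channel with {key}={value!r}, found {len(matches)}")
    return matches[0]


# Example metadata: "wavelength_id" is a hypothetical custom key-value pair
example_zattrs = {
    "omero": {
        "channels": [
            {"label": "DAPI", "wavelength_id": "A01_C01"},
            {"label": "GFP", "wavelength_id": "A02_C02"},
        ]
    }
}
index_by_label = find_channel_index(example_zattrs, label="DAPI")
index_by_wavelength = find_channel_index(example_zattrs, wavelength_id="A02_C02")
```

A lookup like this would let a task accept whichever identifier is available in a given OME-Zarr, instead of relying on an ordered channel list kept outside the file.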

@tcompa
Collaborator Author

tcompa commented Dec 1, 2022

Note that once PR #239 is merged, we won't have channel_list in metadata any more.

@tcompa
Collaborator Author

tcompa commented Sep 15, 2023

Yesterday we re-discussed this issue, and we identified multiple (current) uses of metadata. Most of them should be deprecated, in favor of more specific sources of information.

  1. Provide read/write access to component list ("read" from fractal-server, "write" from a combination of tasks and fractal-server). We plan to defer this functionality to a new task (ref Introduce import-ome-zarr task #521). TBD later if this has to be a standard task or the "init" phase of a new, more complex, task object.
  2. Store dataset information (e.g. coarsening_xy and the number of pyramid levels). This use is now deferred to Extract attributes from ome-zarr rather than from metadata (whenever possible) #351; and then this use should be deprecated, and these parameters should not belong to Dataset.meta.
  3. Store the dataset history -> this should not pass through meta any more - ref Move Dataset.meta["history"] into Dataset.history fractal-server#838.
  4. Exchange information in a pair of tasks - ref How do task pairs share information? #299. This is a way of using meta as a temporary buffer.

Since each use of metadata is related to a specific issue, I'm closing this one.


2 participants