Refactor: How do we refer to channels? #211

jluethi · 2022-11-17T07:16:15Z

We currently refer to channels via index or via the string we use in parsing (i.e. A01_C01). There are two issues that aren't covered well by this:

Multiplexing: We can have A01_C01 in each cycle => we need further specification to know what we are processing
Parsing different microscopes: Other microscopes like e.g. the MD have a totally different file name structure, don't contain e.g. an A01_C01.

Thus, the broader question: How do we refer to a given channel in a way that works for multiplexing as well?
Channels have the following so far: An identifier string (e.g. A01_C01), an index (e.g. 0, only within an OME-Zarr image) and a name (e.g. DAPI).
There is additional metadata we currently don't have, but which may be important in some use-cases: acquisition number (hard-coded to 0 atm) and some microscope metadata (e.g. wavelength, laser used and such)

The identifier string isn't really relevant to the users, but we could get them used to it. For most use-cases, e.g. segmentation or napari workflows, users would probably related to the channel by its name. For illumination correction, the wavelength is the most important criteria. For registration, it's typically current cycle vs. cycle 0 on a specific wavelength or identifier.

Conclusion: We should have different options to refer to channels.
But we can start with 1-2 for the moment, as long as they are general enough to handle multiplexing. Selection by name is reasonable to me (if we make them unique, see below). Alternatively, selection by channel identifier (A01_C01) and acquisition number is also unique.
=> Can we define a channel selection function that can take different identifiers and as long as they can be mapped to a unique channel?

Question:
Should we enforce channel names to be unique? Would simplify selection of channels by name only and be useful when we make measurements on channel names (because we could use e.g. the channel name as a suffix to identify which intensity was measured) => I'd argue for unique channel names

The text was updated successfully, but these errors were encountered:

jluethi · 2022-11-25T09:53:34Z

We should enforce unique channel names during parsing (e.g. have DAPI_1, DAPI_2, DAPI_3, not just 3x DAPI). Then we can use channel names as a unique identifier. Let's expand on it here: #231

Ways of addressing channels

Then, there are 3 valid ways to specify a channel:

Directly by its index in the image (everything eventually maps back to this)
By its wavelength identifier and, in multiplexing, its acquisition (optional, our default experiments don't have an acquisition metadata, but then the wavelength needs to be unique [this is enforced by the way we parse the data]).
By its channel name: the label in the omero metadata

On 2: let's use the A01_C01 as a wavelength identifier for Yokogawa for the time being. If we add more microscopes, maybe we generalize this to more to a string that is more relevant to the user, but it should always be some identifier string

Where is the channel information stored?

We may want to have a cached place within Fractal (e.g. within the dataset or the metadata) to cache this information for easy access. But the ground truth should be in the OME-Zarr file. If it's not necessary to cache the information for Fractal, even better! :) [also, if we see the data within metadata/dataset as a cache of what's in the OME-Zarr file, it makes it easier to accept an external OME-Zarr file, we may just have to set/refresh some caches on "import"]
The whole omero metadata we are currently using has been labeled "transitional", but it's not clear yet where it will go. For simplicity, I suggest we introduce a new key-value pair into the "omero" > "channels" list and treat this list as the place where we find relevant channel info. We can then start the conversation with the OME-NGFF community on where such channel info should go. It will be easier when we have everything we want in one place (=> all in the "omero" > "channels" list that concerns channel selection).

Is natively stored as part of the OME-Zarr file
Let's store it as an additional key-value pair name wavelength_id in the omero metadata (see below)
Already part of the omero metadata (see below)

Example omero metadata

"omero": {
        "channels": [
            {
                "active": true,
                "coefficient": 1,
                "color": "00FFFF",
                "family": "linear",
                "inverted": false,
                "label": "DAPI",
                "wavelength_id": "A01_C01",
                "window": {
                    "end": 700,
                    "max": 65535,
                    "min": 0,
                    "start": 50
                }
            },
            {
                "active": true,
                "coefficient": 1,
                "color": "FF00FF",
                "family": "linear",
                "inverted": false,
                "label": "nanog",
                "wavelength_id": "A01_C02",
                "window": {
                    "end": 200,
                    "max": 65535,
                    "min": 0,
                    "start": 20
                }
            },
            {
                "active": true,
                "coefficient": 1,
                "color": "FFFF00",
                "family": "linear",
                "inverted": false,
                "label": "Lamin B1",
                "wavelength_id": "A02_C03",
                "window": {
                    "end": 1500,
                    "max": 65535,
                    "min": 0,
                    "start": 50
                }
            }
        ],
        "id": 1,
        "name": "TBD",
        "version": "0.4"
    }

I tested this new omero metadata on the 2x2 example manually and the napari viewer doesn't have an issue with us adding such additional key-value pairs. Thus, the omero metadata can be the centralized place where we put all the metadata for each image.

Edge case handling

Because we use the transitional omero metadata, which is optional, we don't know whether someone who brings their own OME-Zarr file will have this metadata. We can enforce it within Fractal for our standard workflows, but we should be able to handle the case where they are not available and fail gracefully otherwise.

Specifically: Option 1 is always possible, options 2 & 3 are dependent on what metadata is available.

Also, this check should allow us to catch non-unique channel names or missing channel names

If we are able to do input inference (see #200), then this would become very relevant. Even if we only do it at runtime (i.e. each task checks whether the label selection its getting is valid for the OME-Zarr file it should process before it starts running), this would be useful and allow us to gracefully catch channel selection that is either not available or non-unique.

tcompa · 2022-11-25T13:22:44Z

Preliminary to-do list for the next PR(s)

OME-zarr-creation tasks (multiplexing or not)

Upon creating the OMERO metadata for each image, always add all channels to the OMERO metadata.
Always include, for each channel, a wavelength_id attribute (e.g. A01_C01).
Always include, for each channel, a label attribute. If it is provided to the task, use it. If not, default to something related to wavelength_id (e.g A01_C01 or 0_A01_C01).
Explicitly verify that label attributes are globally (AKA per-well) unique, fail otherwise.
EDIT: Explicitly verify that wavelength_id attributes are unique for each image, fail otherwise.
Add tests

Helper functions:

Tasks:

For each task that loops over channels (e.g. illumination_correction), use the helper functions to correctly assign channel parameters to the right array index.
NOT INCLUDED IN THE FIRST VERSION For each task that only acts on a given image (e.g. cellpose_segmentation or the current version of napari_workflows) and need to be channel-aware:
- Expose at least the wavelength_id (EDIT: this is in-place, but we should rename the argument).
- As for the previous point: provide a meaningful name
- Also expose the label field, when appropriate, and use helper function to get the matching channel.

Broader:

Remove channel_list from dataset metadata - ref Each attribute in metadata needs to exist somewhere else in the OME-Zarr file #212.
Clean up docstrings
Update changelog

@jluethi, anything I missed from our discussion?

jluethi · 2022-11-25T13:37:21Z

Looks good!

Upon creating the OMERO metadata for each image, always add all channels to the OMERO metadata.

Aren't we already doing this? Also, we just add the channels contained in that image, not channels from other acquisitions, right?

Less urgent: Create a second helper function that (in its first version)

That's the one to load channels by name in the multiplexing case, correct?

For each task that only acts on a given image (e.g. cellpose_segmentation or the current version of napari_workflows) expose one or both of the label and wavelength_id arguments and use the helper functions to map this to a given array index.

If we build the "run-per-image" functionality first, then using label will not work in the multiplexing case (because labels are unique per plate, so a label will not be available in all the cycles. But it's a good thing to select in the non-multiplexed case. We can think of making this fail for multiplexing with a relevant error message while keeping the general functionality.

tcompa · 2022-12-01T16:25:59Z

I updated the check-list above with the current status of #239

tcompa · 2022-12-01T16:41:26Z

Still missing:

Explicitly verify that wavelength_id attributes are unique for each image, fail otherwise.

jluethi · 2022-12-01T17:12:36Z

Explicitly verify that wavelength_id attributes are unique for each image, fail otherwise.

Let's discuss tomorrow, not sure we should have this as such a hard condition

jluethi · 2022-12-02T08:51:46Z

Explicitly verify that wavelength_id attributes are unique for each image, fail otherwise.

I agree now on this, just the between images in a well case to be discussed :)

tcompa · 2022-12-02T09:36:23Z

TO-DOs:

At the beginning of create-ome-zarr tasks, check validity of allowed_channels (that is, uniqueness of wavelength_id).
In cellpose_segmentation expose both wavelength_id and channel_label (for what is now labeling_channel).
Do not allow both of them simultaneously.
Rename arguments for cellpose task.
Fix docstring parameters.
In cellpose_segmentation, capture failures for channel not found in image. This enables usecase X below.
In napari_workflows_wrapper (which is a per-image task) expose both wavelength_id and channel_label (this is part of the input_specs definition).
Fix docstring with examples of input_specs / output_specs.

In the future, support Usecase X with some other solution which does not create 39 useless jobs.
In the future, include napari-workflows wrapper task at the well level, with its own logic to select channels (from the same cycle or from different cycles).

Usecase X:

Run cellpose task with channel_label="DAPI_1" in a multiplexing well with 40 cycles.
40 jobs are created (one per image).
39 jobs fail gracefully, doing nothing.
1 job acts on DAPI channel of cycle 0.

jluethi added this to the 2) Parsing multiplexed data into OME-Zarr milestone Nov 17, 2022

jluethi moved this to TODO in Fractal Project Management Nov 17, 2022

jluethi added this to Fractal Project Management Nov 17, 2022

jluethi mentioned this issue Nov 17, 2022

Approach for multiplexing workflows: new parallelization levels #199

Closed

tcompa mentioned this issue Nov 22, 2022

Each attribute in metadata needs to exist somewhere else in the OME-Zarr file #212

Closed

This was referenced Nov 24, 2022

Make cellpose task work with multiplexed processing #227

Closed

Overview Multiplexing #228

Open

Multiplexing parameter handling #230

Closed

Enforce unique channel names #231

Closed

jluethi mentioned this issue Nov 25, 2022

Where does omero metadata go? #232

Closed

jluethi added the High Priority Current Priorities & Blocking Issues label Nov 25, 2022

tcompa added a commit that referenced this issue Dec 1, 2022

First commit for lib_channels (ref #211)

88e8779

tcompa mentioned this issue Dec 1, 2022

Channels refactor #239

Merged

tcompa linked a pull request Dec 1, 2022 that will close this issue

Channels refactor #239

Merged

tcompa closed this as completed in #239 Dec 2, 2022

Repository owner moved this from TODO to Done in Fractal Project Management Dec 2, 2022

tcompa mentioned this issue Jun 8, 2023

Add Pydantic model for Channels #386

Closed

jluethi removed this from Fractal Project Management Apr 4, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Refactor: How do we refer to channels? #211

Refactor: How do we refer to channels? #211

jluethi commented Nov 17, 2022

jluethi commented Nov 25, 2022

tcompa commented Nov 25, 2022 •

edited

Loading

jluethi commented Nov 25, 2022

tcompa commented Dec 1, 2022

tcompa commented Dec 1, 2022 •

edited

Loading

jluethi commented Dec 1, 2022

jluethi commented Dec 2, 2022

tcompa commented Dec 2, 2022 •

edited

Loading

Refactor: How do we refer to channels? #211

Refactor: How do we refer to channels? #211

Comments

jluethi commented Nov 17, 2022

jluethi commented Nov 25, 2022

Ways of addressing channels

Where is the channel information stored?

Edge case handling

tcompa commented Nov 25, 2022 • edited Loading

jluethi commented Nov 25, 2022

tcompa commented Dec 1, 2022

tcompa commented Dec 1, 2022 • edited Loading

jluethi commented Dec 1, 2022

jluethi commented Dec 2, 2022

tcompa commented Dec 2, 2022 • edited Loading

tcompa commented Nov 25, 2022 •

edited

Loading

tcompa commented Dec 1, 2022 •

edited

Loading

tcompa commented Dec 2, 2022 •

edited

Loading