
Expand cellpose options #650

Merged · 26 commits · Feb 15, 2024

Conversation

@jluethi (Collaborator) commented Jan 31, 2024

Closes #649
Closes #401

Checklist before merging

  • I added an appropriate entry to CHANGELOG.md

github-actions bot commented Jan 31, 2024

Coverage report

File                                                 Lines missing coverage
fractal_tasks_core/tasks/cellpose_segmentation.py    145-146
fractal_tasks_core/tasks/cellpose_transforms.py      181-183, 197-199, 206-210

This report was generated by python-coverage-comment-action

@jluethi (Collaborator, Author) commented Jan 31, 2024

I tested this new version of the task on the UZH servers, but the Enum part of Fractal web isn't deployed there yet. It also doesn't solve the actual issue we had with sparse images (see comments in #649).
=> No immediate hurry on this; I'll go over it again in the coming days, hopefully.

@tcompa marked this pull request as draft on February 1, 2024, 07:46
@jluethi (Collaborator, Author) commented Feb 6, 2024

I added a more complex set of normalization options now.

Remaining todos:

  • Figure out why the docstrings for the attributes in my CellposeCustomNormalizer do not get added to the manifest (do I need to specify the model somewhere? It reads the rest of it.)
  • Test the task in an actual Fractal server & web instance: Is the manifest valid? Does the interface work?
  • Can I find a way to constrain the lower_percentile & upper_percentile to be min 0.0 and max 100.0 in the pydantic model? Would that be used correctly in the Fractal manifest afterwards?
  • Add tests for the new normalization functionality
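On the percentile-constraint question in the list above, pydantic `Field` constraints should cover it. A minimal sketch with a hypothetical reduced model (assuming pydantic is available; `ge`/`le` are enforced at validation time and, to my understanding, also emitted as `minimum`/`maximum` keywords in the generated JSON schema):

```python
from typing import Optional

from pydantic import BaseModel, Field, ValidationError


class PercentileNormalizer(BaseModel):
    """Hypothetical reduced model, only to illustrate the constraint."""

    # ge/le constrain accepted values and surface in the JSON schema
    # as "minimum"/"maximum", so the manifest carries them as well.
    lower_percentile: Optional[float] = Field(None, ge=0.0, le=100.0)
    upper_percentile: Optional[float] = Field(None, ge=0.0, le=100.0)


print(PercentileNormalizer(lower_percentile=1.0, upper_percentile=99.0))

try:
    PercentileNormalizer(upper_percentile=150.0)
except ValidationError:
    print("values outside [0, 100] are rejected")
```

Whether fractal-web then renders the `minimum`/`maximum` keywords as input constraints is a separate question to verify in the interface.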

@jluethi (Collaborator, Author) commented Feb 7, 2024

Testing in a local deployment: the task works in a local Fractal server, but the normalize entry is not shown in the interface. It's supposed to appear between Flow-Threshold & Anisotropy; instead, I now get a double line there.

[Screenshot 2024-02-07, 09:45:22]

And I'm seeing some validator errors in the console:
[Screenshot 2024-02-07, 09:46:35]

@jluethi (Collaborator, Author) commented Feb 7, 2024

Maybe it's an issue with having Optional floats with a default value of None?

@jluethi (Collaborator, Author) commented Feb 7, 2024

I updated the manifest creation scripts to include my models, but things still show up weirdly in the interface.

I now looked at the manifest to try and see if the issue could be there.

Other external pydantic models are referenced like this:

          "channel": {
            "$ref": "#/definitions/ChannelInputModel",
            "title": "Channel",
            "description": "Primary channel for segmentation; requires either `wavelength_id` (e.g. `A01_C01`) or `label` (e.g. `DAPI`)."
          },

But my new entry is set up like this:

          "normalize": {
            "title": "Normalize",
            "default": {
              "default_normalize": true,
              "lower_percentile": null,
              "upper_percentile": null,
              "lower_bound": null,
              "upper_bound": null
            },
            "allOf": [
              {
                "$ref": "#/definitions/CellposeCustomNormalizer"
              }
            ],
            "description": "By default, data is normalized so 0.0=1st percentile and 1.0=99th percentile of image intensities in each channel. This automatic normalization can lead to issues when the image to be segmented is very sparse. You can turn off the automated rescaling and either provide your own rescaling percentiles or fixed rescaling upper and lower bound integers."
          },

I can see this making sense. I don't fully understand the "allOf" references, but maybe that's how this is supposed to be specified?
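For what it's worth, this "allOf" + "default" combination can be reproduced with a toy model whose field uses a model instance as the default. A sketch with a hypothetical reduced model (assuming pydantic; versions differ in whether the `$ref` is wrapped in `allOf`):

```python
import json

from pydantic import BaseModel


class CellposeCustomNormalizer(BaseModel):
    # Reduced stand-in for the real model, just to inspect schema generation.
    default_normalize: bool = True


class TaskArgs(BaseModel):
    # A model *instance* as default, mirroring the task's call signature.
    normalize: CellposeCustomNormalizer = CellposeCustomNormalizer()


# pydantic v2 renamed .schema() to .model_json_schema()
schema = (
    TaskArgs.model_json_schema()
    if hasattr(TaskArgs, "model_json_schema")
    else TaskArgs.schema()
)
# The default instance is serialized inline under "default", while the model
# itself becomes a $ref (wrapped in "allOf" by some pydantic versions).
print(json.dumps(schema["properties"]["normalize"], indent=2))
```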

But I still get the following validator error in the console:

[Screenshot 2024-02-07, 17:44:38]

@tcompa Do you have any idea how to debug this to figure out why it fails?

@tcompa (Collaborator) commented Feb 8, 2024

A first sanity check:

This is the current schema:

{
  "title": "CellposeSegmentation",
  "type": "object",
  "properties": {
    "input_paths": {
      "title": "Input Paths",
      "type": "array",
      "items": {
        "type": "string"
      },
      "description": "List of input paths where the image data is stored as OME-Zarrs. Should point to the parent folder containing one or many OME-Zarr files, not the actual OME-Zarr file. Example: `[\"/some/path/\"]`. This task only supports a single input path. (standard argument for Fractal tasks, managed by Fractal server)."
    },
    "output_path": {
      "title": "Output Path",
      "type": "string",
      "description": "This parameter is not used by this task. (standard argument for Fractal tasks, managed by Fractal server)."
    },
    "component": {
      "title": "Component",
      "type": "string",
      "description": "Path to the OME-Zarr image in the OME-Zarr plate that is processed. Example: `\"some_plate.zarr/B/03/0\"`. (standard argument for Fractal tasks, managed by Fractal server)."
    },
    "metadata": {
      "title": "Metadata",
      "type": "object",
      "description": "This parameter is not used by this task. (standard argument for Fractal tasks, managed by Fractal server)."
    },
    "level": {
      "title": "Level",
      "type": "integer",
      "description": "Pyramid level of the image to be segmented. Choose `0` to process at full resolution."
    },
    "channel": {
      "$ref": "#/definitions/ChannelInputModel",
      "title": "Channel",
      "description": "Primary channel for segmentation; requires either `wavelength_id` (e.g. `A01_C01`) or `label` (e.g. `DAPI`)."
    },
    "channel2": {
      "$ref": "#/definitions/ChannelInputModel",
      "title": "Channel2",
      "description": "Second channel for segmentation (in the same format as `channel`). If specified, cellpose runs in dual channel mode. For dual channel segmentation of cells, the first channel should contain the membrane marker, the second channel should contain the nuclear marker."
    },
    "input_ROI_table": {
      "title": "Input Roi Table",
      "default": "FOV_ROI_table",
      "type": "string",
      "description": "Name of the ROI table over which the task loops to apply Cellpose segmentation. Examples: `FOV_ROI_table` => loop over the field of views, `organoid_ROI_table` => loop over the organoid ROI table (generated by another task), `well_ROI_table` => process the whole well as one image."
    },
    "output_ROI_table": {
      "title": "Output Roi Table",
      "type": "string",
      "description": "If provided, a ROI table with that name is created, which will contain the bounding boxes of the newly segmented labels. ROI tables should have `ROI` in their name."
    },
    "output_label_name": {
      "title": "Output Label Name",
      "type": "string",
      "description": "Name of the output label image (e.g. `\"organoids\"`)."
    },
    "use_masks": {
      "title": "Use Masks",
      "default": true,
      "type": "boolean",
      "description": "If `True`, try to use masked loading and fall back to `use_masks=False` if the ROI table is not suitable. Masked loading is relevant when only a subset of the bounding box should actually be processed (e.g. running within `organoid_ROI_table`)."
    },
    "relabeling": {
      "title": "Relabeling",
      "default": true,
      "type": "boolean",
      "description": "If `True`, apply relabeling so that label values are unique for all objects in the well."
    },
    "diameter_level0": {
      "title": "Diameter Level0",
      "default": 30.0,
      "type": "number",
      "description": "Expected diameter of the objects that should be segmented in pixels at level 0. Initial diameter is rescaled using the `level` that was selected. The rescaled value is passed as the diameter to the `CellposeModel.eval` method."
    },
    "model_type": {
      "title": "Model Type",
      "default": "cyto2",
      "enum": [
        "cyto",
        "nuclei",
        "tissuenet",
        "livecell",
        "cyto2",
        "general",
        "CP",
        "CPx",
        "TN1",
        "TN2",
        "TN3",
        "LC1",
        "LC2",
        "LC3",
        "LC4"
      ],
      "type": "string",
      "description": "Parameter of `CellposeModel` class. Defines which model should be used. Typical choices are `nuclei`, `cyto`, `cyto2`, etc."
    },
    "pretrained_model": {
      "title": "Pretrained Model",
      "type": "string",
      "description": "Parameter of `CellposeModel` class (takes precedence over `model_type`). Allows you to specify the path of a custom trained cellpose model."
    },
    "cellprob_threshold": {
      "title": "Cellprob Threshold",
      "default": 0.0,
      "type": "number",
      "description": "Parameter of `CellposeModel.eval` method. Valid values between -6 and 6. From Cellpose documentation: \"Decrease this threshold if cellpose is not returning as many ROIs as you\u2019d expect. Similarly, increase this threshold if cellpose is returning too many ROIs, particularly from dim areas.\""
    },
    "flow_threshold": {
      "title": "Flow Threshold",
      "default": 0.4,
      "type": "number",
      "description": "Parameter of `CellposeModel.eval` method. Valid values between 0.0 and 1.0. From Cellpose documentation: \"Increase this threshold if cellpose is not returning as many ROIs as you\u2019d expect. Similarly, decrease this threshold if cellpose is returning too many ill-shaped ROIs.\""
    },
    "normalize": {
      "title": "Normalize",
      "default": {
        "default_normalize": true,
        "lower_percentile": null,
        "upper_percentile": null,
        "lower_bound": null,
        "upper_bound": null
      },
      "allOf": [
        {
          "$ref": "#/definitions/CellposeCustomNormalizer"
        }
      ],
      "description": "By default, data is normalized so 0.0=1st percentile and 1.0=99th percentile of image intensities in each channel. This automatic normalization can lead to issues when the image to be segmented is very sparse. You can turn off the automated rescaling and either provide your own rescaling percentiles or fixed rescaling upper and lower bound integers."
    },
    "anisotropy": {
      "title": "Anisotropy",
      "type": "number",
      "description": "Ratio of the pixel sizes along Z and XY axis (ignored if the image is not three-dimensional). If `None`, it is inferred from the OME-NGFF metadata."
    },
    "min_size": {
      "title": "Min Size",
      "default": 15,
      "type": "integer",
      "description": "Parameter of `CellposeModel` class. Minimum size of the segmented objects (in pixels). Use `-1` to turn off the size filter."
    },
    "augment": {
      "title": "Augment",
      "default": false,
      "type": "boolean",
      "description": "Parameter of `CellposeModel` class. Whether to use cellpose augmentation to tile images with overlap."
    },
    "net_avg": {
      "title": "Net Avg",
      "default": false,
      "type": "boolean",
      "description": "Parameter of `CellposeModel` class. Whether to use cellpose net averaging to run the 4 built-in networks (useful for `nuclei`, `cyto` and `cyto2`, not sure it works for the others)."
    },
    "use_gpu": {
      "title": "Use Gpu",
      "default": true,
      "type": "boolean",
      "description": "If `False`, always use the CPU; if `True`, use the GPU if possible (as defined in `cellpose.core.use_gpu()`) and fall-back to the CPU otherwise."
    },
    "batch_size": {
      "title": "Batch Size",
      "default": 8,
      "type": "integer",
      "description": "number of 224x224 patches to run simultaneously on the GPU (can make smaller or bigger depending on GPU memory usage)"
    },
    "invert": {
      "title": "Invert",
      "default": false,
      "type": "boolean",
      "description": "invert image pixel intensity before running network (if True, image is also normalized)"
    },
    "tile": {
      "title": "Tile",
      "default": true,
      "type": "boolean",
      "description": "tiles image to ensure GPU/CPU memory usage limited (recommended)"
    },
    "tile_overlap": {
      "title": "Tile Overlap",
      "default": 0.1,
      "type": "number",
      "description": "fraction of overlap of tiles when computing flows"
    },
    "resample": {
      "title": "Resample",
      "default": true,
      "type": "boolean",
      "description": "run dynamics at original image size (will be slower but create more accurate boundaries)"
    },
    "interp": {
      "title": "Interp",
      "default": true,
      "type": "boolean",
      "description": "interpolate during 2D dynamics (not available in 3D) (in previous versions it was False, now it defaults to True)"
    },
    "stitch_threshold": {
      "title": "Stitch Threshold",
      "default": 0.0,
      "type": "number",
      "description": "if stitch_threshold>0.0 and not do_3D and equal image sizes, masks are stitched in 3D to return volume segmentation"
    },
    "overwrite": {
      "title": "Overwrite",
      "default": true,
      "type": "boolean",
      "description": "If `True`, overwrite the task output."
    }
  },
  "required": [
    "input_paths",
    "output_path",
    "component",
    "metadata",
    "level",
    "channel"
  ],
  "additionalProperties": false,
  "definitions": {
    "ChannelInputModel": {
      "title": "ChannelInputModel",
      "description": "A channel which is specified by either `wavelength_id` or `label`.",
      "type": "object",
      "properties": {
        "wavelength_id": {
          "title": "Wavelength Id",
          "type": "string",
          "description": "Unique ID for the channel wavelength, e.g. `A01_C01`."
        },
        "label": {
          "title": "Label",
          "type": "string",
          "description": "Name of the channel."
        }
      }
    },
    "CellposeCustomNormalizer": {
      "title": "CellposeCustomNormalizer",
      "description": "Validator to handle different normalization scenarios for Cellpose models",
      "type": "object",
      "properties": {
        "default_normalize": {
          "title": "Default Normalize",
          "default": true,
          "type": "boolean",
          "description": "Whether to use the default Cellpose normalization approach (rescaling the image between the 1st and 99th percentile)"
        },
        "lower_percentile": {
          "title": "Lower Percentile",
          "type": "number",
          "description": "Specify a custom lower-bound percentile for rescaling as a float value between 0 and 100. Set to 1 to run the same as the default. You can only specify percentiles or bounds, not both."
        },
        "upper_percentile": {
          "title": "Upper Percentile",
          "type": "number",
          "description": "Specify a custom upper-bound percentile for rescaling as a float value between 0 and 100. Set to 99 to run the same as the default, set to e.g. 99.99 if the default rescaling was too harsh. You can only specify percentiles or bounds, not both."
        },
        "lower_bound": {
          "title": "Lower Bound",
          "type": "integer",
          "description": "Explicit lower bound value to rescale the image at. Needs to be an integer, e.g. 100. You can only specify percentiles or bounds, not both."
        },
        "upper_bound": {
          "title": "Upper Bound",
          "type": "integer",
          "description": "Explicit upper bound value to rescale the image at. Needs to be an integer, e.g. 2000. You can only specify percentiles or bounds, not both."
        }
      }
    }
  }
}

This is how it's rendered in fractal-web:
[image]

This is how it's rendered in https://rjsf-team.github.io/react-jsonschema-form/:
[image]


Let's note again the difference between two entries:

    "channel2": {
      "$ref": "#/definitions/ChannelInputModel",
      "title": "Channel2",
      "description": "Second channel for segmentation (in the same format as `channel`). If specified, cellpose runs in dual channel mode. For dual channel segmentation of cells, the first channel should contain the membrane marker, the second channel should contain the nuclear marker."
    },

    "normalize": {
      "title": "Normalize",
      "default": {
        "default_normalize": true,
        "lower_percentile": null,
        "upper_percentile": null,
        "lower_bound": null,
        "upper_bound": null
      },
      "allOf": [
        {
          "$ref": "#/definitions/CellposeCustomNormalizer"
        }
      ],
      "description": "By default, data is normalized so 0.0=1st percentile and 1.0=99th percentile of image intensities in each channel. This automatic normalization can lead to issues when the image to be segmented is very sparse. You can turn off the automated rescaling and either provide your own rescaling percentiles or fixed rescaling upper and lower bound integers."
    },

coming from

def cellpose_segmentation(
    *,
    ...
    channel2: Optional[ChannelInputModel] = None,
    ...
    normalize: CellposeCustomNormalizer = CellposeCustomNormalizer(
        default_normalize=True
    ),
...

My first guess is that creating an instance of the normalizer and using it as the default for the argument cannot be transformed into a nice default within the JSON schema. In general, it'd be better to say normalize: Optional[CellposeCustomNormalizer] = None, and then create a default instance from within the task. I haven't tested this yet, I'll dig deeper later.
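The suggested pattern would look roughly like this (a sketch; CellposeCustomNormalizer is a reduced dataclass stand-in for the real pydantic model, and cellpose_segmentation_sketch is not the real task function):

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CellposeCustomNormalizer:
    # Reduced stand-in for the real pydantic model.
    default_normalize: bool = True


def cellpose_segmentation_sketch(
    *,
    normalize: Optional[CellposeCustomNormalizer] = None,
) -> CellposeCustomNormalizer:
    # Keep None as the signature default (which serializes cleanly into the
    # JSON schema) and build the actual default instance inside the task.
    if normalize is None:
        normalize = CellposeCustomNormalizer()
    return normalize


print(cellpose_segmentation_sketch())  # -> CellposeCustomNormalizer(default_normalize=True)
```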

@tcompa (Collaborator) commented Feb 8, 2024

As of a9185ba, this is the new rendering:
[image]

@jluethi (Collaborator, Author) commented Feb 8, 2024

My first guess is that creating an instance of the normalizer and using it as the default for the argument cannot be transformed into a nice default within the JSON schema. In general, it'd be better to say normalize: Optional[CellposeCustomNormalizer] = None, and then create a default instance from within the task. I haven't tested this yet, I'll dig deeper later.

Ah, that makes sense! And it gets a default object (with default_normalize = True) in your screenshot above, so that's the behavior I was trying to ensure by setting a model instance as the default. Thanks for that fix!

I'll test a few more things with it now! :)

@jluethi (Collaborator, Author) commented Feb 8, 2024

Ok, it works as expected for me now in the interface as well! :) Thanks @tcompa !

On the remaining list:

  • Test the task in an actual Fractal server & web instance: Is the manifest valid? Does the interface work? => Done
  • Can I find a way to constrain the lower_percentile & upper_percentile to be min 0.0 and max 100.0 in the pydantic model? Would that be used correctly in the Fractal manifest afterwards? => used Pydantic Field constraints, mostly works, will report a minor issue on the Fractal web side later
  • Add tests for the new normalization functionality => WIP

@jluethi (Collaborator, Author) commented Feb 8, 2024

The consequence of this is that when using fractal-web, we can never run the task with normalize=None

I can see how this is a general problem. For this task, that's actually fine.

As far as we understand (debugged with @mfranzon), the CellposeCustomNormalizer.default_normalize attribute is redundant: if any of the other four values is set, it is automatically False. And there is no use case where we would want to simultaneously set CellposeCustomNormalizer.default_normalize and e.g. CellposeCustomNormalizer.lower_bound (which is why the validator has to check this condition and raise an error if appropriate).

Unfortunately, there is one such case: one can set CellposeCustomNormalizer.default_normalize=False and leave all the other elements as None. That is supported and has a specific behavior in Cellpose (though not a typically useful one in my tests).

@tcompa (Collaborator) commented Feb 8, 2024

As of our call:

There are mainly four branches to be covered:

  1. Do not apply any normalization at all;
  2. Apply the default CP normalization;
  3. Apply a user-provided bound-based normalization;
  4. Apply a user-provided percentiles-based normalization.

And the expected default behavior, when there is no user input, is case number 2 (apply default CP normalization).
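The four branches above could translate into an image-rescaling helper along these lines (a framework-free numpy sketch under assumed semantics, not the PR's actual implementation; it assumes the options were already validated, e.g. that both percentiles are set together):

```python
from typing import Optional

import numpy as np


def normalized_img_sketch(
    img: np.ndarray,
    mode: str = "default",  # "default", "custom", or "no_normalization"
    lower_percentile: Optional[float] = None,
    upper_percentile: Optional[float] = None,
    lower_bound: Optional[int] = None,
    upper_bound: Optional[int] = None,
) -> np.ndarray:
    """Rescale an image according to the four branches discussed above."""
    if mode == "no_normalization":  # branch 1: leave the image untouched
        return img
    if mode == "default":  # branch 2: Cellpose-like 1%/99% rescaling
        lo, hi = np.percentile(img, 1), np.percentile(img, 99)
    elif lower_bound is not None:  # branch 3: explicit integer bounds
        lo, hi = lower_bound, upper_bound
    else:  # branch 4: user-provided percentiles
        lo = np.percentile(img, lower_percentile)
        hi = np.percentile(img, upper_percentile)
    return (img - lo) / (hi - lo)


img = np.arange(101, dtype=float)
print(normalized_img_sketch(img, mode="custom", lower_bound=0, upper_bound=100).max())  # -> 1.0
```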

@tcompa (Collaborator) commented Feb 8, 2024

Here is a naive, untested example:

from typing import Optional, Literal
from pydantic import BaseModel, Field

class CellposeCustomNormalizer(BaseModel):
    type: Optional[Literal["custom", "no_normalization"]] = None
    lower_percentile: Optional[float] = Field(None, ge=0, le=100)
    upper_percentile: Optional[float] = Field(None, ge=0, le=100)
    lower_bound: Optional[int] = None
    upper_bound: Optional[int] = None

    @property
    def apply_normalization(self) -> bool:
        # returns False when we do not want to apply any normalization
        # returns True when _some_ normalization has to be applied (CP-default, bounds, or percentiles)
        return self.type != "no_normalization"
        
    def validator_TBD(self):
        # This should enforce some relevant constraints, like:
        # if type=None, then no other attribute can be set
        # if type="no_normalization", then no other attributes can be set
        # if type="custom", then either the bounds or the percentiles attributes must be set, but not both
        pass

def task(
    ...
    normalize: Optional[CellposeCustomNormalizer] = None,
    ...):

In this version, we'd have

  1. Do not apply any normalization at all; -> type="no_normalization"
  2. Apply the default CP normalization; -> type=None (i.e. no input)
  3. Apply a user-provided bound-based normalization; -> type="custom"
  4. Apply a user-provided percentiles-based normalization. -> type="custom"

Something I'm not happy about: I think that once you modify the default, you may not be able to get back to the original situation. That is, I don't think that Optional[Literal] translates into an enum with three options (None, custom and no_normalization). I'll iterate on this later.

This preliminary example is in the direction of coming up with a model with no default attributes - which would then "fix" the JSON-schema glitch.
The other option is to actually provide a default in the function call signature, and make sure that it's transformed into a JSON Schema that is easily understood from fractal-web.
A third option is to extend fractal-web to handle this "new" kind of JSON schema.

@jluethi (Collaborator, Author) commented Feb 8, 2024

Very interesting thinking & the direction looks promising, thanks Tommaso!

The other thing to consider: we want to make sure users understand what they are running by default, e.g. "if I don't modify anything here, it will apply the default Cellpose normalization".

@tcompa (Collaborator) commented Feb 9, 2024

I have a better-defined proposal (based on the previous one), which consists of using the model and call signature below. I'll document the Python behavior and the JSON-Schema behavior in two follow-up comments.

Call-signature (simplified) and task function

@validate_arguments
def cellpose_segmentation(
    *,
    # Fractal arguments
    channel: ChannelInputModel,
    channel2: Optional[ChannelInputModel] = None,
    normalize: Optional[CellposeCustomNormalizer] = None,
    normalizeV2: Optional[CellposeCustomNormalizerV2] = None,
) -> dict[str, Any]:

    if normalizeV2 is None:
        normalizeV2 = CellposeCustomNormalizerV2()

Model for argument

class CellposeCustomNormalizerV2(BaseModel):
    """
    Validator to handle different normalization scenarios for Cellpose models

    If `type` is unset or set to `default`, Cellpose default normalization is
    used and no other parameters can be specified.
    If `type` is set to `no_normalization`, no normalization is used and no
    other parameters can be specified.
    If `type` is set to `custom`, either percentiles or explicit integer
    bounds can be applied.

    Attributes:
        type:
            One of `default` (Cellpose default normalization), `custom`
            (using the other custom parameters) or `no_normalization`.
        lower_percentile: Specify a custom lower-bound percentile for rescaling
            as a float value between 0 and 100. Set to 1 to run the same as
            the default. You can only specify percentiles or bounds, not both.
        upper_percentile: Specify a custom upper-bound percentile for rescaling
            as a float value between 0 and 100. Set to 99 to run the same as
            default, set to e.g. 99.99 if the default rescaling was too harsh.
            You can only specify percentiles or bounds, not both.
        lower_bound: Explicit lower bound value to rescale the image at.
            Needs to be an integer, e.g. 100.
            You can only specify percentiles or bounds, not both.
        upper_bound: Explicit upper bound value to rescale the image at.
            Needs to be an integer, e.g. 2000.
            You can only specify percentiles or bounds, not both.
    """
    type: Optional[Literal["default", "custom", "no_normalization"]] = None
    lower_percentile: Optional[float] = Field(None, ge=0, le=100)
    upper_percentile: Optional[float] = Field(None, ge=0, le=100)
    lower_bound: Optional[int] = None
    upper_bound: Optional[int] = None


    @root_validator
    def validate_conditions(cls, values):
        
        # Replace type=None with type="default"
        type = values.get("type")
        if type is None:
            type = "default"
            values["type"] = type

        # Extract values of custom parameters
        lower_percentile = values.get("lower_percentile")
        upper_percentile = values.get("upper_percentile")
        lower_bound = values.get("lower_bound")
        upper_bound = values.get("upper_bound")

        # Verify that custom parameters are only provided when type="custom"
        if type != "custom":
            if lower_percentile is not None:
                raise ValueError(
                    f"Type='{type}' but {lower_percentile=}. "
                    "Hint: set type='custom'."
                    )
            if upper_percentile is not None:
                raise ValueError(
                    f"Type='{type}' but {upper_percentile=}. "
                    "Hint: set type='custom'."
                    )
            if lower_bound is not None:
                raise ValueError(
                    f"Type='{type}' but {lower_bound=}. "
                    "Hint: set type='custom'."
                )
            if upper_bound is not None:
                raise ValueError(
                    f"Type='{type}' but {upper_bound=}. "
                    "Hint: set type='custom'."
                )

        # The only valid options are:
        # 1. Both percentiles are set and both bounds are unset
        # 2. Both bounds are set and both percentiles are unset
        are_percentiles_set = (
            lower_percentile is not None,
            upper_percentile is not None,
            )
        are_bounds_set = (
            lower_bound is not None,
            upper_bound is not None,
            )
        if len(set(are_percentiles_set)) != 1:
            raise ValueError(
                "Both lower_percentile and upper_percentile must be set "
                "together."
                )
        if len(set(are_bounds_set)) != 1:
            raise ValueError(
                "Both lower_bound and upper_bound must be set together"
                )
        if lower_percentile is not None and lower_bound is not None:
            raise ValueError(
                "You cannot set both explicit bounds and percentile bounds "
                "at the same time. Hint: use only one of the two options."
            )

        return values

@tcompa (Collaborator) commented Feb 9, 2024

Pydantic-model behavior:

>>> from fractal_tasks_core.tasks.cellpose_transforms import CellposeCustomNormalizerV2

>>> CellposeCustomNormalizerV2()
CellposeCustomNormalizerV2(type='default', lower_percentile=None, upper_percentile=None, lower_bound=None, upper_bound=None)

>>> CellposeCustomNormalizerV2(type="default")
CellposeCustomNormalizerV2(type='default', lower_percentile=None, upper_percentile=None, lower_bound=None, upper_bound=None)

>>> CellposeCustomNormalizerV2(lower_bound=1)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "pydantic/main.py", line 341, in pydantic.main.BaseModel.__init__
pydantic.error_wrappers.ValidationError: 1 validation error for CellposeCustomNormalizerV2
__root__
  Type='default' but lower_bound=1. Hint: set type='custom'. (type=value_error)


>>> CellposeCustomNormalizerV2(type="default", lower_bound=1)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "pydantic/main.py", line 341, in pydantic.main.BaseModel.__init__
pydantic.error_wrappers.ValidationError: 1 validation error for CellposeCustomNormalizerV2
__root__
  Type='default' but lower_bound=1. Hint: set type='custom'. (type=value_error)


>>> CellposeCustomNormalizerV2(type="no_normalization")
CellposeCustomNormalizerV2(type='no_normalization', lower_percentile=None, upper_percentile=None, lower_bound=None, upper_bound=None)


>>> CellposeCustomNormalizerV2(type="no_normalization", lower_bound=1)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "pydantic/main.py", line 341, in pydantic.main.BaseModel.__init__
pydantic.error_wrappers.ValidationError: 1 validation error for CellposeCustomNormalizerV2
__root__
  Type='no_normalization' but lower_bound=1. Hint: set type='custom'. (type=value_error)


>>> CellposeCustomNormalizerV2(type="custom")
CellposeCustomNormalizerV2(type='custom', lower_percentile=None, upper_percentile=None, lower_bound=None, upper_bound=None)

>>> CellposeCustomNormalizerV2(type="custom", lower_percentile=2, upper_percentile=90)
CellposeCustomNormalizerV2(type='custom', lower_percentile=2.0, upper_percentile=90.0, lower_bound=None, upper_bound=None)

>>> CellposeCustomNormalizerV2(type="custom", lower_bound=1, upper_bound=10)
CellposeCustomNormalizerV2(type='custom', lower_percentile=None, upper_percentile=None, lower_bound=1, upper_bound=10)


>>> CellposeCustomNormalizerV2(type="custom", lower_percentile=2)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "pydantic/main.py", line 341, in pydantic.main.BaseModel.__init__
pydantic.error_wrappers.ValidationError: 1 validation error for CellposeCustomNormalizerV2
__root__
  Both lower_percentile and upper_percentile must be set together. (type=value_error)

>>> CellposeCustomNormalizerV2(type="custom", lower_bound=2)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "pydantic/main.py", line 341, in pydantic.main.BaseModel.__init__
pydantic.error_wrappers.ValidationError: 1 validation error for CellposeCustomNormalizerV2
__root__
  Both lower_bound and upper_bound must be set together (type=value_error)

>>> CellposeCustomNormalizerV2(type="custom", lower_percentile=2, upper_percentile=90, lower_bound=2)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "pydantic/main.py", line 341, in pydantic.main.BaseModel.__init__
pydantic.error_wrappers.ValidationError: 1 validation error for CellposeCustomNormalizerV2
__root__
  Both lower_bound and upper_bound must be set together (type=value_error)

>>> CellposeCustomNormalizerV2(type="custom", lower_percentile=2, upper_percentile=90, lower_bound=2, upper_bound=123)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "pydantic/main.py", line 341, in pydantic.main.BaseModel.__init__
pydantic.error_wrappers.ValidationError: 1 validation error for CellposeCustomNormalizerV2
__root__
  You cannot set both explicit bounds and percentile bounds at the same time. Hint: use only one of the two options. (type=value_error)
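For context, the rescaling these options control can be sketched roughly as follows (an illustrative numpy sketch of percentile- vs. bound-based rescaling; `normalize_image` is a hypothetical helper, not the task's actual implementation):

```python
import numpy as np


def normalize_image(
    img,
    lower_percentile=None,
    upper_percentile=None,
    lower_bound=None,
    upper_bound=None,
):
    """Rescale an image to [0, 1] from percentiles or explicit bounds."""
    if lower_percentile is not None:
        lo = np.percentile(img, lower_percentile)
        hi = np.percentile(img, upper_percentile)
    elif lower_bound is not None:
        lo, hi = lower_bound, upper_bound
    else:
        # Cellpose-style default: 1st and 99th percentiles
        lo, hi = np.percentile(img, 1), np.percentile(img, 99)
    return np.clip((img - lo) / (hi - lo), 0.0, 1.0)


img = np.arange(100, dtype=float).reshape(10, 10)
out = normalize_image(img, lower_bound=0, upper_bound=99)
print(out.min(), out.max())  # 0.0 1.0
```

With sparse images, the default 99th percentile can land on background, which is exactly the failure mode that motivates the explicit-bounds option.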


tcompa commented Feb 9, 2024

JSON Schema (simplified call signature):

{
  "title": "CellposeSegmentation",
  "type": "object",
  "properties": {
    "channel": {
      "$ref": "#/definitions/ChannelInputModel",
      "title": "Channel",
      "description": "Primary channel for segmentation; requires either `wavelength_id` (e.g. `A01_C01`) or `label` (e.g. `DAPI`)."
    },
    "channel2": {
      "$ref": "#/definitions/ChannelInputModel",
      "title": "Channel2",
      "description": "Second channel for segmentation (in the same format as `channel`). If specified, cellpose runs in dual channel mode. For dual channel segmentation of cells, the first channel should contain the membrane marker, the second channel should contain the nuclear marker."
    },
    "normalize": {
      "$ref": "#/definitions/CellposeCustomNormalizer",
      "title": "Normalize",
      "description": "By default, data is normalized so 0.0=1st percentile and 1.0=99th percentile of image intensities in each channel. This automatic normalization can lead to issues when the image to be segmented is very sparse. You can turn off the automated rescaling and either provide your own rescaling percentiles or fixed rescaling upper and lower bound integers."
    },
    "normalizeV2": {
      "$ref": "#/definitions/CellposeCustomNormalizerV2",
      "title": "Normalizev2",
      "description": "By default, data is normalized so 0.0=1st percentile and"
    }
  },
  "required": [
    "channel"
  ],
  "additionalProperties": false,
  "definitions": {
    "ChannelInputModel": {
      "title": "ChannelInputModel",
      "description": "A channel which is specified by either `wavelength_id` or `label`.",
      "type": "object",
      "properties": {
        "wavelength_id": {
          "title": "Wavelength Id",
          "type": "string",
          "description": "Unique ID for the channel wavelength, e.g. `A01_C01`."
        },
        "label": {
          "title": "Label",
          "type": "string",
          "description": "Name of the channel."
        }
      }
    },
    "CellposeCustomNormalizer": {
      "title": "CellposeCustomNormalizer",
      "description": "Validator to handle different normalization scenarios for Cellpose models",
      "type": "object",
      "properties": {
        "default_normalize": {
          "title": "Default Normalize",
          "default": true,
          "type": "boolean",
          "description": "Whether to use the default Cellpose normalization approach (rescaling the image between the 1st and 99th percentile)"
        },
        "lower_percentile": {
          "title": "Lower Percentile",
          "type": "number",
          "description": "Specify a custom lower-bound percentile for rescaling as a float value between 0 and 100. Set to 1 to run the same as default. You can only specify percentiles or bounds, not both."
        },
        "upper_percentile": {
          "title": "Upper Percentile",
          "type": "number",
          "description": "Specify a custom upper-bound percentile for rescaling as a float value between 0 and 100. Set to 99 to run the same as default, set to e.g. 99.99 if the default rescaling was too harsh. You can only specify percentiles or bounds, not both."
        },
        "lower_bound": {
          "title": "Lower Bound",
          "type": "integer",
          "description": "Explicit lower bound value to rescale the image at. Needs to be an integer, e.g. 100. You can only specify percentiles or bounds, not both."
        },
        "upper_bound": {
          "title": "Upper Bound",
          "type": "integer",
          "description": "Explicit upper bound value to rescale the image at. Needs to be an integer, e.g. 2000. You can only specify percentiles or bounds, not both."
        }
      }
    },
    "CellposeCustomNormalizerV2": {
      "title": "CellposeCustomNormalizerV2",
      "description": "Validator to handle different normalization scenarios for Cellpose models",
      "type": "object",
      "properties": {
        "type": {
          "title": "Type",
          "enum": [
            "default",
            "custom",
            "no_normalization"
          ],
          "type": "string",
          "description": "One of `default` (Cellpose default normalization), `custom` (using the other custom parameters) or `no_normalization`."
        },
        "lower_percentile": {
          "title": "Lower Percentile",
          "minimum": 0,
          "maximum": 100,
          "type": "number",
          "description": "Specify a custom lower-bound percentile for rescaling as a float value between 0 and 100. Set to 1 to run the same as default. You can only specify percentiles or bounds, not both."
        },
        "upper_percentile": {
          "title": "Upper Percentile",
          "minimum": 0,
          "maximum": 100,
          "type": "number",
          "description": "Specify a custom upper-bound percentile for rescaling as a float value between 0 and 100. Set to 99 to run the same as default, set to e.g. 99.99 if the default rescaling was too harsh. You can only specify percentiles or bounds, not both."
        },
        "lower_bound": {
          "title": "Lower Bound",
          "type": "integer",
          "description": "Explicit lower bound value to rescale the image at. Needs to be an integer, e.g. 100. You can only specify percentiles or bounds, not both."
        },
        "upper_bound": {
          "title": "Upper Bound",
          "type": "integer",
          "description": "Explicit upper bound value to rescale the image at. Needs to be an integer, e.g. 2000. You can only specify percentiles or bounds, not both."
        }
      }
    }
  }
}

Behavior in fractal-web sandbox
(screenshot)

Behavior in react-jsonschema-form: (three screenshots)


tcompa commented Feb 9, 2024

Something I'm not happy about is that I think that if you modify the default once then you may not be able to get back to the original situation. That is, I don't think that Optional[Literal] translates into an enum with three options (None, custom and no_normalization). I'll iterate on this later.

This is still present, but mitigated by the fact that now there is a "default" entry in the enum, and then one could easily decide to use that one and provide a non-null value for normalize={"type": "default"}. In my view that's a good-enough trade-off.


tcompa commented Feb 9, 2024

@jluethi I'll let you evaluate if this option fits with the expectation. If so, in principle you should just replace the current Pydantic model with the new one, and apply minor changes to the parts of the task where this object is used.


tcompa commented Feb 9, 2024

@jluethi
Copy link
Collaborator Author

jluethi commented Feb 9, 2024

I like the logic of the new Enum with the 3 options. And the checks look good.
=> I like the new logic, thanks for revisiting it!

One detail that I'm not 100% happy with yet:

This is still present, but mitigated by the fact that now there is a "default" entry in the enum, and then one could easily decide to use that one and provide a non-null value for normalize={"type": "default"}. In my view that's a good-enough trade-off.

The thing that confuses me somewhat: We now avoid setting defaults twice:

  1. The parameter of the Cellpose model doesn't have a default, but we set it in an if normalizeV2 is None: normalizeV2 = CellposeCustomNormalizerV2() check
  2. The type doesn't have a default, and we set it the same way (in such an if check).

Now this achieves the desired behavior of running default normalization if the user doesn't set anything. But it's even less transparent than the version before (where we already discussed that the default = None plus if-check inside the function was suboptimal).
If such a setup is necessary to do this without revisiting the whole pydantic schemas for this, I'm ok with that and find it an acceptable trade-off.
But do we gain anything from

type: Optional[Literal["default", "custom", "no_normalization"]] = None

vs.

type: Literal["default", "custom", "no_normalization"] = "default"

?

Because it worked fine (as far as I could judge) with

default_normalize: bool = True

And that has the benefit that the user interface displays what default is being used in the web. Is there a reason that type couldn't be a literal with a default (instead of an optional literal)?
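As an aside, the enum entries that JSON Schema generation derives from such a Literal are simply its type arguments, which the standard library exposes directly (stdlib-only illustrative sketch):

```python
from typing import Literal, get_args

# The Literal used for the normalizer's `type` field
NormType = Literal["default", "custom", "no_normalization"]

# Schema generation maps these arguments to the JSON-schema "enum" entries
print(get_args(NormType))  # ('default', 'custom', 'no_normalization')
```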


jluethi commented Feb 9, 2024

Also, neat new validation approaches!


tcompa commented Feb 9, 2024

Briefly

And that has the benefit that the user interface displays what default is being used in the web. Is there a reason that type couldn't be a literal with a default (instead of an optional literal)?

I fully agree, the best (-> most transparent) option is

type: Literal["default", "custom", "no_normalization"] = "default"

My previous understanding, however, was that this immediately leads to the allOf issue as in your first version (ref pydantic/pydantic#2592), and then it was easier to avoid it. I'm now trying to verify this assumption, and it turns out it's likely wrong. More on this shortly. If it's true that we hit no allOf issue, then I fully agree with switching to the more explicit version.


tcompa commented Feb 9, 2024

Provisional information: the allOf issue only shows up when the function argument has a model-instance default:

from typing import Literal, Optional

from devtools import debug
from pydantic.decorator import ValidatedFunction

from fractal_tasks_core.dev.lib_args_schemas import _validate_function_signature
from fractal_tasks_core.dev.lib_args_schemas import _remove_args_kwargs_properties
from fractal_tasks_core.dev.lib_args_schemas import _remove_pydantic_internals

from pydantic import BaseModel


class Normalizer(BaseModel):
    type: Literal["A", "B"] = "A"


def task_function_1(normalizer: Normalizer):
    pass

def task_function_2(normalizer: Optional[Normalizer] = None):
    pass

def task_function_3(normalizer: Normalizer = Normalizer()):
    pass


# GENERATE JSON SCHEMA
for task_function in [task_function_1, task_function_2, task_function_3]:
    _validate_function_signature(task_function)
    vf = ValidatedFunction(task_function, config=None)
    schema = vf.model.schema()
    schema = _remove_args_kwargs_properties(schema)
    schema = _remove_pydantic_internals(schema)
    debug(schema)

Output:

    schema: {
        'title': 'TaskFunction1',
        'type': 'object',
        'properties': {
            'normalizer': {
                '$ref': '#/definitions/Normalizer',    # <--------- this is OK
            },
        },
        'required': [
            'normalizer',
        ],
        'additionalProperties': False,
        'definitions': {
            'Normalizer': {
                'title': 'Normalizer',
                'type': 'object',
                'properties': {
                    'type': {
                        'title': 'Type',
                        'default': 'A',
                        'enum': ['A', 'B'],
                        'type': 'string',
                    },
                },
            },
        },
    } (dict) len=6

    schema: {
        'title': 'TaskFunction2',
        'type': 'object',
        'properties': {
            'normalizer': {
                '$ref': '#/definitions/Normalizer',    # <--------- this is OK
            },
        },
        'additionalProperties': False,
        'definitions': {
            'Normalizer': {
                'title': 'Normalizer',
                'type': 'object',
                'properties': {
                    'type': {
                        'title': 'Type',
                        'default': 'A',
                        'enum': ['A', 'B'],
                        'type': 'string',
                    },
                },
            },
        },
    } (dict) len=5

    schema: {
        'title': 'TaskFunction3',
        'type': 'object',
        'properties': {
            'normalizer': {
                'title': 'Normalizer',
                'default': {
                    'type': 'A',
                },
                'allOf': [
                    {
                        '$ref': '#/definitions/Normalizer',      # <--------- this is NOT OK
                    },
                ],
            },
        },
        'additionalProperties': False,
        'definitions': {
            'Normalizer': {
                'title': 'Normalizer',
                'type': 'object',
                'properties': {
                    'type': {
                        'title': 'Type',
                        'default': 'A',
                        'enum': ['A', 'B'],
                        'type': 'string',
                    },
                },
            },
        },
    } (dict) len=5


tcompa commented Feb 9, 2024

As discussed with @jluethi: let's go with the option mocked as

from pydantic import BaseModel


class Normalizer(BaseModel):
    type: Literal["A", "B"] = "A"

def task_function_2(normalizer: Optional[Normalizer] = None):
    if normalizer is None:
        normalizer = Normalizer()

with JSON Schema

{
  "title": "TaskFunction2",
  "type": "object",
  "properties": {
    "normalizer": {
      "$ref": "#/definitions/Normalizer"
    }
  },
  "additionalProperties": false,
  "definitions": {
    "Normalizer": {
      "title": "Normalizer",
      "type": "object",
      "properties": {
        "type": {
          "title": "Type",
          "default": "A",
          "enum": [
            "A",
            "B"
          ],
          "type": "string"
        }
      }
    }
  }
}

The glitch described in #650 is still present:

which is problematic for at least two reasons:

  1. It's highly counterintuitive, since the call signature says normalize: Optional[CellposeCustomNormalizer] = None
  2. It may be plain wrong, in case we slightly modify the content of that if branch, e.g. to change some default.

We accept this trade-off for the moment, and perhaps we'll move towards option 3 after some fractal-web updates (fractal-analytics-platform/fractal-web#413, fractal-analytics-platform/fractal-web#412).


tcompa commented Feb 15, 2024

Option 3 from #650 (comment) is now also supported as of fractal-web 0.9.2 (thanks @zonia3000!), so let's go with this one right away: it's the "best" option, i.e. it has no known unintended effects (especially regarding possible unexpected behaviors when the argument isn't provided).


The mocked version is then

import json
from typing import Literal

from pydantic.decorator import ValidatedFunction

from fractal_tasks_core.dev.lib_args_schemas import _validate_function_signature
from fractal_tasks_core.dev.lib_args_schemas import _remove_args_kwargs_properties
from fractal_tasks_core.dev.lib_args_schemas import _remove_pydantic_internals

from pydantic import BaseModel


class Normalizer(BaseModel):
    type: Literal["default", "custom"] = "default"


def task_function_3(normalizer: Normalizer = Normalizer()):
    pass


# GENERATE JSON SCHEMA
_validate_function_signature(task_function_3)
vf = ValidatedFunction(task_function_3, config=None)
schema = vf.model.schema()
schema = _remove_args_kwargs_properties(schema)
schema = _remove_pydantic_internals(schema)
print(json.dumps(schema, indent=2))

producing this schema

{
  "title": "TaskFunction3",
  "type": "object",
  "properties": {
    "normalizer": {
      "title": "Normalizer",
      "default": {
        "type": "default"
      },
      "allOf": [
        {
          "$ref": "#/definitions/Normalizer"
        }
      ]
    }
  },
  "additionalProperties": false,
  "definitions": {
    "Normalizer": {
      "title": "Normalizer",
      "type": "object",
      "properties": {
        "type": {
          "title": "Type",
          "default": "default",
          "enum": [
            "default",
            "custom"
          ],
          "type": "string"
        }
      }
    }
  }
}

and this UI in fractal-web 0.9.2:
(screenshot)


While the actual model (untested!) should look like

from typing import Literal, Optional

from pydantic import BaseModel, Field, root_validator


class CellposeCustomNormalizerV2(BaseModel):
    """
    Validator to handle different normalization scenarios for Cellpose models

    If `type="default"`, then Cellpose default normalization is
    used and no other parameters can be specified.
    If `type="no_normalization"`, then no normalization is used and no
    other parameters can be specified.
    If `type="custom"`, then either percentiles or explicit integer
    bounds can be applied.

    Attributes:
        type:
            One of `default` (Cellpose default normalization), `custom`
            (using the other custom parameters) or `no_normalization`.
        lower_percentile: Specify a custom lower-bound percentile for rescaling
            as a float value between 0 and 100. Set to 1 to run the same as
            default. You can only specify percentiles or bounds, not both.
        upper_percentile: Specify a custom upper-bound percentile for rescaling
            as a float value between 0 and 100. Set to 99 to run the same as
            default, set to e.g. 99.99 if the default rescaling was too harsh.
            You can only specify percentiles or bounds, not both.
        lower_bound: Explicit lower bound value to rescale the image at.
            Needs to be an integer, e.g. 100.
            You can only specify percentiles or bounds, not both.
        upper_bound: Explicit upper bound value to rescale the image at.
            Needs to be an integer, e.g. 2000.
            You can only specify percentiles or bounds, not both.
    """
    type: Optional[Literal["default", "custom", "no_normalization"]] = None
    lower_percentile: Optional[float] = Field(None, ge=0, le=100)
    upper_percentile: Optional[float] = Field(None, ge=0, le=100)
    lower_bound: Optional[int] = None
    upper_bound: Optional[int] = None


    @root_validator
    def validate_conditions(cls, values):
        
        # Extract values
        type = values.get("type")
        lower_percentile = values.get("lower_percentile")
        upper_percentile = values.get("upper_percentile")
        lower_bound = values.get("lower_bound")
        upper_bound = values.get("upper_bound")

        # Verify that custom parameters are only provided when type="custom"
        if type != "custom":
            if lower_percentile is not None:
                raise ValueError(
                    f"Type='{type}' but {lower_percentile=}. "
                    "Hint: set type='custom'."
                    )
            if upper_percentile is not None:
                raise ValueError(
                    f"Type='{type}' but {upper_percentile=}. "
                    "Hint: set type='custom'."
                    )
            if lower_bound is not None:
                raise ValueError(
                    f"Type='{type}' but {lower_bound=}. "
                    "Hint: set type='custom'."
                )
            if upper_bound is not None:
                raise ValueError(
                    f"Type='{type}' but {upper_bound=}. "
                    "Hint: set type='custom'."
                )

        # The only valid options are:
        # 1. Both percentiles are set and both bounds are unset
        # 2. Both bounds are set and both percentiles are unset
        are_percentiles_set = (
            lower_percentile is not None,
            upper_percentile is not None,
            )
        are_bounds_set = (
            lower_bound is not None,
            upper_bound is not None,
            )
        if len(set(are_percentiles_set)) != 1:
            raise ValueError(
                "Both lower_percentile and upper_percentile must be set "
                "together."
                )
        if len(set(are_bounds_set)) != 1:
            raise ValueError(
                "Both lower_bound and upper_bound must be set together"
                )
        if lower_percentile is not None and lower_bound is not None:
            raise ValueError(
                "You cannot set both explicit bounds and percentile bounds "
                "at the same time. Hint: use only one of the two options."
            )

        return values
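The cross-field rules above can be exercised in isolation as plain Python (a behavior-equivalent restatement of the root validator with a hypothetical helper name, so it runs without pydantic; the ge/le range constraints on the percentiles are omitted here):

```python
from typing import Optional


def check_normalizer(
    type: str = "default",
    lower_percentile: Optional[float] = None,
    upper_percentile: Optional[float] = None,
    lower_bound: Optional[int] = None,
    upper_bound: Optional[int] = None,
) -> None:
    """Raise ValueError under the same conditions as the root validator above."""
    customs = (lower_percentile, upper_percentile, lower_bound, upper_bound)
    if type != "custom" and any(v is not None for v in customs):
        raise ValueError(f"Type={type!r} but custom parameters were set. Hint: set type='custom'.")
    if (lower_percentile is None) != (upper_percentile is None):
        raise ValueError("Both lower_percentile and upper_percentile must be set together.")
    if (lower_bound is None) != (upper_bound is None):
        raise ValueError("Both lower_bound and upper_bound must be set together.")
    if lower_percentile is not None and lower_bound is not None:
        raise ValueError("You cannot set both explicit bounds and percentile bounds at the same time.")


check_normalizer()  # default: OK
check_normalizer(type="custom", lower_percentile=1.0, upper_percentile=99.99)  # OK
try:
    check_normalizer(type="no_normalization", lower_bound=100)
except ValueError as err:
    print("rejected:", err)
```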


jluethi commented Feb 15, 2024

This PR should be ready now. I switched to using Option 3 described above: using CellposeCustomNormalizer() as the default value for the argument.
Therefore, this version of the tasks requires fractal-web 0.9.2 (because of the allOf in the JSON schema).

@jluethi jluethi marked this pull request as ready for review February 15, 2024 12:22
@jluethi jluethi merged commit cb37b44 into main Feb 15, 2024
18 checks passed
@jluethi jluethi deleted the 649-cellpose-options branch February 15, 2024 12:51
Successfully merging this pull request may close these issues:

- Expose additional cellpose options in the task
- Add Pydantic model for cellpose models?