Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add Pydantic model for cellpose models? #401

Closed
tcompa opened this issue Jun 7, 2023 · 8 comments · Fixed by #650
Closed

Add Pydantic model for cellpose models? #401

tcompa opened this issue Jun 7, 2023 · 8 comments · Fixed by #650
Labels
JSON Schemas july2023 Maintenance work planned for July 2023

Comments

@tcompa
Copy link
Collaborator

tcompa commented Jun 7, 2023

Refs:

@tcompa tcompa changed the title Consider having a new model for cellpose models Add Pydantic model for cellpose models? Jun 12, 2023
@tcompa
Copy link
Collaborator Author

tcompa commented Jun 19, 2023

For the record.

Script

from pydantic.decorator import ValidatedFunction
from pydantic.decorator import validate_arguments
from pydantic import ValidationError
from enum import Enum
from cellpose import models
from devtools import debug
from lib_args_schemas import _validate_function_signature
from lib_args_schemas import _remove_args_kwargs_properties
from lib_args_schemas import _remove_pydantic_internals


# CREATE TYPE (see also https://github.com/tiangolo/fastapi/issues/13)
ModelInCellposeZoo = Enum(
    "ModelInCellposeZoo",
    {value: value for value in models.MODEL_NAMES},
    type=str,
)
debug(models.MODEL_NAMES)


# USE PYDANTIC DECORATOR

@validate_arguments
def task_function_2(model: ModelInCellposeZoo = "cyto2"):
    pass


try:
    task_function_2(model="asd")
    raise RuntimeError("This line should never be executed.")
except ValidationError:
    print("All good, we raised a ValidationError.")


def task_function(model: ModelInCellposeZoo = "cyto2"):
    pass


# GENERATE JSON SCHEMA
_validate_function_signature(task_function)
vf = ValidatedFunction(task_function, config=None)
schema = vf.model.schema()
schema = _remove_args_kwargs_properties(schema)
schema = _remove_pydantic_internals(schema)

debug(schema)

Output

test.py:18 <module>
    models.MODEL_NAMES: [
        'cyto',
        'nuclei',
        'tissuenet',
        'livecell',
        'cyto2',
        'general',
        'CP',
        'CPx',
        'TN1',
        'TN2',
        'TN3',
        'LC1',
        'LC2',
        'LC3',
        'LC4',
    ] (list) len=15
All good, we raised a ValidationError.
test.py:46 <module>
    schema: {
        'title': 'TaskFunction',
        'type': 'object',
        'properties': {
            'model': {
                'default': 'cyto2',
                'allOf': [
                    {
                        '$ref': '#/definitions/ModelInCellposeZoo',
                    },
                ],
            },
        },
        'additionalProperties': False,
        'definitions': {
            'ModelInCellposeZoo': {
                'title': 'ModelInCellposeZoo',
                'description': 'An enumeration.',
                'enum': [
                    'cyto',
                    'nuclei',
                    'tissuenet',
                    'livecell',
                    'cyto2',
                    'general',
                    'CP',
                    'CPx',
                    'TN1',
                    'TN2',
                    'TN3',
                    'LC1',
                    'LC2',
                    'LC3',
                    'LC4',
                ],
                'type': 'string',
            },
        },
    } (dict) len=5

@tcompa
Copy link
Collaborator Author

tcompa commented Jun 19, 2023

@tcompa tcompa added the july2023 Maintenance work planned for July 2023 label Jul 17, 2023
@tcompa
Copy link
Collaborator Author

tcompa commented Jul 20, 2023

I explored this issue a bit further, and still could not find a nice solution.

Option A: Use Literal

Here is a use_literal.py script`:

from typing import Literal
from cellpose import models
from pydantic import ValidationError
from pydantic.decorator import validate_arguments

@validate_arguments
def task_function(model_type: Literal[tuple(models.MODEL_NAMES)] = "cyto2"):
    print(f"Here we go: {model_type=} is valid.")
    print()


task_function("cyto2")

try:
    task_function("invalid_model_type")
except ValidationError:
    print("Providing and invalid model type leads to ValidationError.")

which works as expected:

$ poetry  run python use_literal.py 
Here we go: model_type='cyto2' is valid.

Providing and invalid model type leads to ValidationError.

What's wrong?

The way we use Literal is not how it's meant to be, and indeed mypy tells us that Invalid type: Literal[...] cannot contain arbitrary expressions [valid-type].

Option B: Use Enum

Here is a use_enum.py script

from enum import Enum

from cellpose import models
from pydantic import ValidationError
from pydantic.decorator import validate_arguments

CellposeModelType = Enum(
    "CellposeModelType",
    {value: value for value in models.MODEL_NAMES},
    type=str,
)


@validate_arguments
def task_function(model_type: CellposeModelType = "cyto2"):
    print(f"Here we go: {model_type=} is valid.")
    print(
        "NOTE: if you want the actual model_type string, "
        f"you need to access {model_type.name=}"
    )
    print()


task_function("cyto2")

try:
    task_function("invalid_model_type")
except ValidationError:
    print("Providing and invalid model type leads to ValidationError.")

which works as expected:

$ poetry  run python use_enum.py 
Here we go: model_type=<CellposeModelType.cyto2: 'cyto2'> is valid.
NOTE: if you want the actual model_type string, you need to access model_type.name='cyto2'

Providing and invalid model type leads to ValidationError.

What's wrong?

Similar to #343, Pydantic will coerce "cyto2" into model_type = <CellposeModelType.cyto2: 'cyto2'>, and we would need to use its .name (or .value) attribute to access the actual original name that we provided as an input (i.e. the string "cyto2").

Another issue: "cyto2" is not a valid default, because it's not a CellposeModelType instance; it is only accepted because it can be coerced to this type.

Schemas

With create_schemas.py:

from devtools import debug
from pydantic.decorator import ValidatedFunction
from use_enum import task_function as task_function_enum
from use_literal import task_function as task_function_literal

from fractal_tasks_core.dev.lib_args_schemas import (
    _remove_args_kwargs_properties,
)
from fractal_tasks_core.dev.lib_args_schemas import _remove_pydantic_internals

vf_literal = ValidatedFunction(task_function_literal, config=None)
schema_literal = vf_literal.model.schema()
schema_literal = _remove_args_kwargs_properties(schema_literal)
schema_literal = _remove_pydantic_internals(schema_literal)
debug(schema_literal)

vf_enum = ValidatedFunction(task_function_enum, config=None)
schema_enum = vf_enum.model.schema()
schema_enum = _remove_args_kwargs_properties(schema_enum)
schema_enum = _remove_pydantic_internals(schema_enum)
debug(schema_enum)

we can notice that Option A would lead to the expected schema, while Option B would produce a more complex one:

$ poetry run python create_schemas.py 
Here we go: model_type=<CellposeModelType.cyto2: 'cyto2'> is valid.
NOTE: if you want the actual model_type string, you need to access model_type.name='cyto2'

Providing and invalid model type leads to ValidationError.
Here we go: model_type='cyto2' is valid.

Providing and invalid model type leads to ValidationError.
2023-07-20 12:27:10,330; INFO; [_remove_args_kwargs_properties] END
2023-07-20 12:27:10,330; INFO; [_remove_pydantic_internals] END
create_schemas.py:15 <module>
    schema_literal: {
        'title': 'TaskFunction',
        'type': 'object',
        'properties': {
            'model_type': {
                'title': 'Model Type',
                'default': 'cyto2',
                'enum': [
                    'cyto',
                    'nuclei',
                    'tissuenet',
                    'livecell',
                    'cyto2',
                    'general',
                    'CP',
                    'CPx',
                    'TN1',
                    'TN2',
                    'TN3',
                    'LC1',
                    'LC2',
                    'LC3',
                    'LC4',
                ],
                'type': 'string',
            },
        },
        'additionalProperties': False,
    } (dict) len=4
2023-07-20 12:27:10,355; INFO; [_remove_args_kwargs_properties] END
2023-07-20 12:27:10,355; INFO; [_remove_pydantic_internals] END
create_schemas.py:21 <module>
    schema_enum: {
        'title': 'TaskFunction',
        'type': 'object',
        'properties': {
            'model_type': {
                'default': 'cyto2',
                'allOf': [
                    {
                        '$ref': '#/definitions/CellposeModelType',
                    },
                ],
            },
        },
        'additionalProperties': False,
        'definitions': {
            'CellposeModelType': {
                'title': 'CellposeModelType',
                'description': 'An enumeration.',
                'enum': [
                    'cyto',
                    'nuclei',
                    'tissuenet',
                    'livecell',
                    'cyto2',
                    'general',
                    'CP',
                    'CPx',
                    'TN1',
                    'TN2',
                    'TN3',
                    'LC1',
                    'LC2',
                    'LC3',
                    'LC4',
                ],
                'type': 'string',
            },
        },
    } (dict) len=5

TL;DR

I think Option B is to be discarded. Option A is by far closer to the expected behavior, but I'm not yet 100% sure that we'd like to rely on a non-standard use of Literal.

@tcompa
Copy link
Collaborator Author

tcompa commented Jul 20, 2023

Option C

Similar to Option A, but we build the forward-reference string.

from typing import Literal
from cellpose import models
from pydantic import ValidationError
from pydantic.decorator import validate_arguments


cellpose_model_strings = ", ".join(
    [f'"{model_type}"' for model_type in models.MODEL_NAMES]
)
CellposeModelType = f"Literal[{cellpose_model_strings}]"
print(CellposeModelType)


@validate_arguments
def task_function(model_type: CellposeModelType = "cyto2"):
    print(f"Here we go: {model_type=} is valid.")
    print()


task_function("cyto2")

try:
    task_function("invalid_model_type")
except ValidationError:
    print("Providing and invalid model type leads to ValidationError.")
$ poetry run python use_literal_string.py 
Literal["cyto", "nuclei", "tissuenet", "livecell", "cyto2", "general", "CP", "CPx", "TN1", "TN2", "TN3", "LC1", "LC2", "LC3", "LC4"]
Here we go: model_type='cyto2' is valid.

Providing and invalid model type leads to ValidationError.
$ poetry run python create_schema.py 
Literal["cyto", "nuclei", "tissuenet", "livecell", "cyto2", "general", "CP", "CPx", "TN1", "TN2", "TN3", "LC1", "LC2", "LC3", "LC4"]
Here we go: model_type='cyto2' is valid.

Providing and invalid model type leads to ValidationError.
2023-07-20 13:40:10,380; INFO; [_remove_args_kwargs_properties] END
2023-07-20 13:40:10,380; INFO; [_remove_pydantic_internals] END
create_schema.py:14 <module>
    schema_literal: {
        'title': 'TaskFunction',
        'type': 'object',
        'properties': {
            'model_type': {
                'title': 'Model Type',
                'default': 'cyto2',
                'enum': [
                    'cyto',
                    'nuclei',
                    'tissuenet',
                    'livecell',
                    'cyto2',
                    'general',
                    'CP',
                    'CPx',
                    'TN1',
                    'TN2',
                    'TN3',
                    'LC1',
                    'LC2',
                    'LC3',
                    'LC4',
                ],
                'type': 'string',
            },
        },
        'additionalProperties': False,
    } (dict) len=4

What's wrong

Using a dynamically-defined string as a forward reference does not comply with mypy:

Variable "fractal_tasks_core.tasks.tmp.use_literal_string.CellposeModelType" is not valid as a type [valid-type]

@tcompa
Copy link
Collaborator Author

tcompa commented Jul 21, 2023

After discussion with @mfranzon:

  • Option A is clearly the best one: it is clean and transparent, and ignoring the mypy error is not a big deal. Note that we use type hints at runtime in two cases: schema generation and argument validation. The mypy issue concerns the non-runtime use of static code check, which is not something we are strictly enforcing in fractal-tasks-core.
  • Option B is clearly not viable: it is cumbersome to use the parameter within the task, and the generated schema is not the right one
  • Option C is just a more obscure rephrasing of Option A.

Let's move on with option A.

@tcompa
Copy link
Collaborator Author

tcompa commented Jul 21, 2023

Yet another problem appeared, and I think this one leads to the conclusion that for now we cannot implement the feature required in the current issue.

Option A works, but what gets written into the JSON Schema is the list of cellpose.models.MODEL_NAMES for the specific Cellpose version available in the specific environment where schema-generation scripts are run. This is currently 2.2.2, but fractal-tasks-core also accepts other Cellpose versions. And if Cellpose 2.3 adds a new model to MODEL_NAMES, the JSON Schema won't be valid any more, or we would need to re-create it and release a new version of fractal-tasks-core.

More generally, the JSON Schema would depend on the specific version of Cellpose that was available upon schema-generation, which may differ from the one coming in the task-execution environment.

Note that the current solution (performing this check within the task) is obviously safe in terms of schemas, since the schema only asks for a string (rather than enum) variable. The downside is that the user could provide and invalid model type, and this error would only be caught at runtime.

I'm closing this issue since I judge the benefit (model-type check in fractal-web) is not worth the cost (the risk of having to re-release fractal-tasks-core at "random" times, or the requirement that we pin Cellpose to a specific version in our dependencies).

Feel free to re-open for further discussion.

@jluethi
Copy link
Collaborator

jluethi commented Jan 31, 2024

Returning to this issue now that we have better support for Enums / Literals in Fractal web.

Thinking about this again: Cellpose is not changing default models often. And we have its version pinned anyway as: cellpose >=2.2,<2.3

The benefit of actually seeing models in the web interface would be significant enough to warrant this complexity. Users are much more likely to try valid models if those models are a dropdown than if they have to search for the model string. And as long as we keep the cellpose version pinned, there should be a match between the manifest and what gets installed.

@jluethi jluethi mentioned this issue Jan 31, 2024
1 task
@jluethi
Copy link
Collaborator

jluethi commented Jan 31, 2024

I now included this in #650 using the A option from above.
The expected list of models is also included in the expected error message in the tests when the task is run with a different version of cellpose that contains a new potential model list.
It should make the tests fail if one updates the cellpose version (to something where the model selection changes, which should be quite rare) and does not update the manifest accordingly.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
JSON Schemas july2023 Maintenance work planned for July 2023
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants