Update test dataset #240

Closed
tcompa opened this issue Dec 1, 2022 · 6 comments · Fixed by #322
Labels
Backlog Backlog issues we may eventually fix, but aren't a priority maintenance

Comments

@tcompa
Collaborator

tcompa commented Dec 1, 2022

While working on #239, I thought a few updates of the test datasets would be useful.

The first one is that we should put a new zarr on Zenodo that complies with the new rules (that is, the omero channels should always also include a wavelength_id attribute).
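
(Editor's sketch, not part of the original comment.) A minimal way to check the rule above, assuming a Zarr v2 layout where the image group's attributes live in a `.zattrs` JSON file and the channels sit under `omero` → `channels`. The helper names, the channel labels, and the `A01_C01` value are hypothetical and only serve as an illustration.

```python
import json
from pathlib import Path


def channels_missing_wavelength_id(zattrs: dict) -> list:
    """Return the indices of omero channels without a non-empty wavelength_id."""
    channels = zattrs.get("omero", {}).get("channels", [])
    return [i for i, ch in enumerate(channels) if not ch.get("wavelength_id")]


def check_image_group(image_dir: str) -> list:
    """Load `.zattrs` from an image group directory and run the check.

    Assumption: Zarr v2 on-disk layout (attributes stored in `.zattrs`).
    """
    zattrs = json.loads((Path(image_dir) / ".zattrs").read_text())
    return channels_missing_wavelength_id(zattrs)


# Hypothetical metadata: the second channel lacks wavelength_id.
example = {
    "omero": {
        "channels": [
            {"label": "DAPI", "wavelength_id": "A01_C01"},
            {"label": "GFP"},
        ]
    }
}
print(channels_missing_wavelength_id(example))
```

An empty list would mean that every channel carries the attribute; a test fixture could assert this before using a downloaded dataset.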

tcompa added a commit that referenced this issue Dec 1, 2022
@tcompa tcompa added the Backlog Backlog issues we may eventually fix, but aren't a priority label Dec 1, 2022
@jluethi
Collaborator

jluethi commented Dec 1, 2022

Once this is in place, I can update the examples we have so that they produce the new output and we can just update the zenodo dataset with a new version (already did this once for one of them).

@tcompa
Collaborator Author

tcompa commented Dec 1, 2022

For testing, this is currently taken care of with a workaround in a fixture: e3690ef

@tcompa
Collaborator Author

tcompa commented Dec 1, 2022

> Once this is in place, I can update the examples we have so that they produce the new output and we can just update the zenodo dataset with a new version (already did this once for one of them).

(of course this is the way to go..)

@jluethi jluethi added this to the maintenance milestone Jan 20, 2023
@jluethi
Collaborator

jluethi commented Feb 24, 2023

@tcompa I updated the main 2 datasets on Zenodo. Because the data changed, they also get a new identifier. They can now be found here:

Tiny: https://zenodo.org/record/7674545
2x2: https://zenodo.org/record/7674571

I ran the tiny at higher cellpose labeling resolution than before. So if you update this in the test, check whether it still matches well (are we testing for getting the exact same data actually? Could anyway be tricky, not sure whether cellpose is pixel-perfect reproducible).

I used those workflows to generate the datasets:
Tiny: https://github.com/fractal-analytics-platform/fractal-demos/tree/1a82185bc0132095275ea012155e04b666c33ba1/examples/01_cardio_tiny_dataset
2x2: https://github.com/fractal-analytics-platform/fractal-demos/tree/1a82185bc0132095275ea012155e04b666c33ba1/examples/02_cardio_small
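
(Editor's sketch, not part of the original comment.) If exact equality of label images turns out to be too strict, one tolerant alternative is to compare only the foreground masks. The function below is a hypothetical illustration using plain nested lists; it is not something the test suite is known to do, and the threshold to accept would be a judgment call.

```python
def foreground_iou(labels_a, labels_b):
    """Intersection-over-union of the non-zero pixels of two 2D label images.

    Label values themselves are ignored, so a relabeling of the same
    segmentation still gives a score of 1.0.
    """
    intersection = 0
    union = 0
    for row_a, row_b in zip(labels_a, labels_b):
        for a, b in zip(row_a, row_b):
            in_a = a != 0
            in_b = b != 0
            intersection += in_a and in_b
            union += in_a or in_b
    # Two empty images are treated as identical.
    return 1.0 if union == 0 else intersection / union


# Hypothetical 2x3 label images: same objects, different label values,
# and one pixel that is foreground only in the old image.
old_labels = [[0, 1, 1], [0, 2, 2]]
new_labels = [[0, 5, 5], [0, 0, 7]]
print(foreground_iou(old_labels, new_labels))
```

A check of this kind tolerates different label numbering and small boundary differences, at the cost of not detecting merged or split objects.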

@tcompa
Collaborator Author

tcompa commented Feb 27, 2023

Thanks!

To do:

  • Update the way they are used in the tests; that is, remove the workaround introduced in e3690ef.

@tcompa tcompa linked a pull request Feb 27, 2023 that will close this issue
@tcompa
Collaborator Author

tcompa commented Feb 27, 2023

For the record: we do not use the multiplexing dataset in the CI.

> are we testing for getting the exact same data actually?

We aren't. The zarr datasets are mostly used as a valid starting point for additional processing steps, but we do not check that tasks produce exactly the same output.
