Workflow submission scenarios #261

Closed
jluethi opened this issue Nov 14, 2022 · 3 comments
Labels
flexibility Support more workflow-execution use cases High Priority Current Priorities & Blocking Issues Overview

Comments

@jluethi
Collaborator

jluethi commented Nov 14, 2022

While working out the web GUI interactions, I came up with a more detailed version of workflow submissions. I'm summarizing here such that we have a centralized overview of the scenarios (see also draft of new functional specs, submission section).

  1. A user submits a full workflow for the first time (already supported)
  2. A user reruns a workflow from the beginning (should be supported in the 1.x release, right?)
  3. A user runs a workflow until a given step (e.g. in a 4-task workflow, the user runs the first 2 => running the preprocessing steps, but not the image analysis steps yet)
  4. A user continues the run of a workflow (e.g. after scenario 3, the user now runs the second part of the workflow, tasks 3 & 4)
  5. A user tests parameters of the workflow on a subset of the data. E.g. the user ran scenario 3 (tasks 1 & 2), then runs multiple options for task 3 on a subset of the data. These subsets are saved to temporary files and do not overwrite the main OME-Zarr file => see Running a workflow on a subset of data #109
  6. A user reruns part of a workflow (e.g. reruns tasks 3 & 4). There need to be some restrictions on what can be rerun; I will cover this in a separate issue.

Here is a sketch of these scenarios:

IMG_1699

There is a main workflow (see discussion here: #236) that the user sees & where the user edits parameters. They first submit tasks 1 & 2 (workflow submission ID1). Then, the user experiments with options for task 3 on a subset of the data (workflow submission IDs 2-4). Finally, they submit the best-fitting parameters on the whole dataset & run tasks 3 & 4 (workflow submission ID5).
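To make the scenarios concrete, here is a minimal sketch of how a submission could be expressed as a contiguous range of tasks within the main workflow. The names `Submission`, `select_tasks`, `first_task_index` and `last_task_index` are hypothetical and only illustrate the idea; they are not taken from the existing code base.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Submission:
    """One workflow submission covering a contiguous task range (hypothetical)."""

    submission_id: int
    first_task_index: int  # inclusive, 0-based
    last_task_index: int  # inclusive, 0-based
    on_subset: bool = False  # True for parameter tests on a subset of the data


def select_tasks(
    task_list: List[str],
    first: Optional[int] = None,
    last: Optional[int] = None,
) -> List[str]:
    """Return the tasks to run, defaulting to the whole workflow."""
    if first is None:
        first = 0
    if last is None:
        last = len(task_list) - 1
    if not 0 <= first <= last < len(task_list):
        raise ValueError(f"Invalid task range {first}..{last}")
    return task_list[first : last + 1]


workflow = ["task1", "task2", "task3", "task4"]

# Scenario 3: run until a given step (submission ID1 in the sketch).
id1 = Submission(submission_id=1, first_task_index=0, last_task_index=1)
# Scenario 5: test task 3 on a subset of the data (submission IDs 2-4).
id2 = Submission(submission_id=2, first_task_index=2, last_task_index=2, on_subset=True)
# Scenario 4: continue the run on the whole dataset (submission ID5).
id5 = Submission(submission_id=5, first_task_index=2, last_task_index=3)

for sub in (id1, id2, id5):
    print(sub.submission_id, select_tasks(workflow, sub.first_task_index, sub.last_task_index))
```

With this shape, scenarios 1 and 2 are simply the default range (no `first` or `last` given), and scenario 6 is the same call as scenario 4 applied to tasks that already ran.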

@jluethi jluethi added the High Priority Current Priorities & Blocking Issues label Dec 14, 2022
@jluethi jluethi added Priority Important, but not the highest priority and removed High Priority Current Priorities & Blocking Issues labels Jan 11, 2023
@tcompa tcompa added the High Priority Current Priorities & Blocking Issues label Mar 7, 2023
@jluethi jluethi removed the High Priority Current Priorities & Blocking Issues label Mar 15, 2023
@tcompa
Collaborator

tcompa commented May 23, 2023

I'm dumping here some meeting notes concerning the "running a workflow from task A to task B" feature.


Running workflow from A to B: to be more clearly defined => what do we need to clarify? User stories? Implementation approach?

User stories:

  1. I’m not ready for the whole workflow yet. I just want to look at the image first (=> run 0 to n)
  2. Continue a workflow: n to m
  3. Something failed at step 3 (because input parameters were wrong). Let me correct parameters and rerun from there
  4. Tricky rerun part: clean up old outputs? Partial failure? => not initial scope [reserved keyword argument of the task]
  5. I’m just testing things (trying some parameters)

Goal: Tackle user stories 1-3

Assumptions:

Current status
Datasets are little metadata layers (collection of resources => paths + metadata)
Running a workflow has an input & output. Those are for the whole workflow => no clear access to intermediate states
Use case 3 is tricky => no info on what the last valid dataset is [some history is stored, but only at the end of the workflow]
Database is never touched from beginning to end of the workflow (separation of concerns between runner & db)
Job status is retrieved from disk (from the metadata.json, which is updated after every task [at least the history is updated, metadata dict updates are optional]) => this file is only used by the job monitoring endpoint
At the end of the workflow, db is updated
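The current behaviour described above can be sketched as follows. This is an illustration under stated assumptions, not the actual runner: the function names `run_tasks` and `last_valid_state`, the task signature, and the exact layout of the metadata file are hypothetical. Only the idea (history appended after every task, metadata updates optional, file read back from disk) comes from the notes.

```python
import json
from pathlib import Path
from typing import Callable, Dict, List, Optional, Tuple

# A task is a (name, function) pair; the function may return a dict of
# metadata updates or None (metadata updates are optional).
Task = Tuple[str, Callable[[Dict], Optional[Dict]]]


def run_tasks(tasks: List[Task], metadata_path: Path, metadata: Dict) -> Dict:
    """Run tasks in order, writing the metadata file after each successful task."""
    for name, func in tasks:
        update = func(metadata) or {}  # a failing task raises before any write
        metadata = {**metadata, **update}
        metadata["history"] = metadata.get("history", []) + [name]
        metadata_path.write_text(json.dumps(metadata))
    return metadata


def last_valid_state(metadata_path: Path) -> Dict:
    """Read the state written after the last task that completed."""
    if not metadata_path.exists():
        return {"history": []}
    return json.loads(metadata_path.read_text())
```

Because the file is only rewritten after a task succeeds, a failure at step 3 leaves the state from the end of step 2 on disk, which is the information needed to restart from there.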

New
If a workflow fails, should the metadata be updated?
What we can do now: Update to the last valid state, i.e. to the state at the end of step 2.
Writing the last valid state

Requirement: We need to select the correct task to restart from
User needs to start from the correct dataset (the output dataset)
This will require an additional check: IO compatibility of tasks: We only check for input of workflow and output of workflow, not for each task
Tasks can modify dataset types

Q: Does the user need to define the output dataset? If the task defines (& changes) the output type, not necessarily
=> Tasks define their input/output
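A minimal sketch of the per-task IO compatibility check mentioned above, assuming each task declares one input type and one output type. `TaskSpec`, `check_chain` and the type names used in the example are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class TaskSpec:
    """Declared input/output dataset types of a task (hypothetical)."""

    name: str
    input_type: str
    output_type: str


def check_chain(tasks: List[TaskSpec], dataset_type: str) -> str:
    """Check every task against the type produced so far.

    Returns the dataset type after the last task, or raises ValueError
    at the first task whose input type does not match.
    """
    current = dataset_type
    for task in tasks:
        if task.input_type != current:
            raise ValueError(
                f"{task.name} expects {task.input_type!r} but would receive {current!r}"
            )
        current = task.output_type  # tasks can modify the dataset type
    return current


workflow = [
    TaskSpec("convert", input_type="image", output_type="zarr"),
    TaskSpec("project", input_type="zarr", output_type="zarr"),
    TaskSpec("segment", input_type="zarr", output_type="zarr"),
]

# Restarting from the second task requires the dataset produced by the first one.
print(check_chain(workflow[1:], dataset_type="zarr"))
```

Since the function returns the type after the last task, the output dataset type can be derived from the tasks themselves rather than defined by the user, in line with the answer to the question above.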

Runner has no access to the db => monitoring & updates go via the metadata file

@tcompa tcompa added High Priority Current Priorities & Blocking Issues and removed Priority Important, but not the highest priority labels Jul 6, 2023
@tcompa
Collaborator

tcompa commented Jul 10, 2023

Use case 6:
Switch back and forth between 3D and 2D OME-Zarrs, e.g. go back to 3D dataset after performing MIP.

Likely this could go through a more structured Dataset.meta attribute. Instead of just a list parallelization_level -> list_of_components, we could have something a bit more structured (possibly mimicking some part of the OME-Zarr structure).
This work would happen mostly on the task side, and possibly lead to some limited fractal-server updates.

The guiding principle in this should be that "Each attribute in metadata needs to exist somewhere else in the OME-Zarr file", and we would also need to rely on "Bootstrap metadata from an existing OME-Zarr file".
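As a rough illustration of "bootstrapping metadata from an existing OME-Zarr file", the sketch below derives component lists from the plate and well attributes of an HCS OME-Zarr, passed in as already-parsed dictionaries. The function name `bootstrap_components` and the output keys (`plate`, `well`, `image`) are assumptions made for this example; the input layout follows the `plate.wells[].path` and `well.images[].path` fields of the OME-NGFF plate and well metadata.

```python
from typing import Dict, List


def bootstrap_components(
    plate_name: str,
    plate_attrs: Dict,
    well_attrs_by_path: Dict[str, Dict],
) -> Dict[str, List[str]]:
    """Build component lists per level from OME-Zarr attributes only.

    Every value returned here is derived from the Zarr attributes, so the
    metadata holds nothing that does not also exist in the OME-Zarr file.
    """
    well_paths = [well["path"] for well in plate_attrs["plate"]["wells"]]
    images = [
        f"{plate_name}/{well_path}/{image['path']}"
        for well_path in well_paths
        for image in well_attrs_by_path[well_path]["well"]["images"]
    ]
    return {
        "plate": [plate_name],
        "well": [f"{plate_name}/{well_path}" for well_path in well_paths],
        "image": images,
    }


# Hypothetical attributes of a plate with two wells, one image each.
plate_attrs = {"plate": {"wells": [{"path": "B/03"}, {"path": "B/05"}]}}
well_attrs_by_path = {
    "B/03": {"well": {"images": [{"path": "0"}]}},
    "B/05": {"well": {"images": [{"path": "0"}]}},
}

print(bootstrap_components("plate.zarr", plate_attrs, well_attrs_by_path))
```

The same function could be applied to either the 3D plate or the 2D (MIP) plate, which is one way switching back and forth between them could work without carrying extra state in the metadata.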

Other related fractal-tasks-core issues:

@jluethi
Collaborator Author

jluethi commented Sep 27, 2023

Reviewed. Functionality of 1-4 is already implemented, 5 is indeed covered with:
fractal-analytics-platform/fractal-tasks-core#342
fractal-analytics-platform/fractal-tasks-core#279


2 participants