Review I/O of tasks #785
Labels: flexibility (Support more workflow-execution use cases), High Priority (Current Priorities & Blocking Issues)
Comments
Some other thoughts coming up: (1 task)
This issue in the current form is made obsolete by ongoing V2 work.
I'm reviewing the methods for task execution (in view of #261), and I (re)discovered the somewhat cumbersome way we define the I/O parameters of tasks. This was probably implemented with a clear target in mind (the typical image->zarr conversion), but it no longer sounds fully right.
---
The `execute_tasks` function (streamlined in #780, so that it's also a bit more readable now) iterates over a list of wftasks. For each wftask, either `call_single_task` or `call_parallel_task` is called (and waited for), and then the loop continues. The main I/O structure of these functions is defined in the `task_pars` argument and in the returned object: both are `TaskParameters` objects (a minimal sketch of such a container is given below). We should then review how the I/O `TaskParameters` objects are constructed for each wftask, and especially for the first one.

Tl;DR: The current way these objects are constructed does not seem robust, and was probably fine-tuned to work only on the typical Fractal workflows.
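For reference, here is a minimal sketch of what such a parameter container could look like. This is not the actual fractal-server definition; the field names are simply taken from the discussion below.

```python
# Hypothetical sketch, not the actual fractal-server TaskParameters model.
from dataclasses import dataclass, field
from pathlib import Path
from typing import Any


@dataclass
class TaskParametersSketch:
    # Paths the task reads from; in practice only the very first task
    # ever receives more than one entry (see below).
    input_paths: list[Path]
    # Single path the task writes to.
    output_path: Path
    # Free-form metadata dictionary propagated from task to task.
    metadata: dict[str, Any] = field(default_factory=dict)
```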
First wftask
Input parameters
The definition of what will be passed as the input `TaskParameters` object to the first task is (e.g.) in `app/runner/_local/__init__.py`, where the `input_paths`, `output_path` and `input_metadata` values come directly from the DB and are obtained in `app/runner/__init__.py` (a hedged sketch of this construction is given below). There is something clearly suspicious here: the first task already needs some `output_dataset` properties, even if `output_dataset` is meant to be the output of the whole workflow.
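As a rough paraphrase of that construction (not the actual code in `app/runner/__init__.py`; the dataset attribute names `paths` and `meta` are assumptions), and reusing the `TaskParametersSketch` from above:

```python
# Hypothetical paraphrase, not the actual fractal-server code.
def first_task_parameters(input_dataset, output_dataset) -> TaskParametersSketch:
    return TaskParametersSketch(
        # Read locations come from the *input* dataset...
        input_paths=[Path(p) for p in input_dataset.paths],
        # ...but the write location is already taken from the *output*
        # dataset, even though that dataset should only matter at the end
        # of the workflow.
        output_path=Path(output_dataset.paths[0]),
        metadata=dict(input_dataset.meta),
    )
```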
Output parameters
At the end of `call_single_task` (and the same holds for `call_parallel_task`), there is a definition of the return value (a hedged sketch is given below) which strictly enforces a task structure where every subsequent task will only act on a given path, both for input and output. Also: this shows why we only support multiple paths as the input of the very first task (which essentially means only the `create_ome_zarr_multiplex` task).
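Paraphrased with the same sketch types (again, not the actual code), the returned object collapses the I/O onto the single output path:

```python
# Hypothetical paraphrase of the return value of call_single_task /
# call_parallel_task, not the actual code.
def next_task_parameters(
    task_pars: TaskParametersSketch, updated_metadata: dict
) -> TaskParametersSketch:
    return TaskParametersSketch(
        # The output path of the task that just ran becomes the only
        # input path of the next one...
        input_paths=[task_pars.output_path],
        # ...and the output path never changes from here on.
        output_path=task_pars.output_path,
        metadata=updated_metadata,
    )
```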
Intermediate wftask
An intermediate wftask (starting from the second one) will always have an input `TaskParameters` object which was the output of a previous one, so by construction its `input_paths` and `output_path` can only be identical (apart from the former being cast to a list); this is illustrated below. The same holds for its output `TaskParameters` object, which will look like the output of the first wftask described above.
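In terms of the sketches above, the constraint is simply this (purely illustrative, with made-up paths):

```python
# Purely illustrative: whatever the previous parameters were, the next
# task's input paths collapse onto the single output path.
prev = TaskParametersSketch(
    input_paths=[Path("/data/raw/img_1.tif"), Path("/data/raw/img_2.tif")],
    output_path=Path("/data/output.zarr"),
)
nxt = next_task_parameters(prev, updated_metadata={})
assert nxt.input_paths == [nxt.output_path]
```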
Last wftask
As should be clear from what is described above, the last wftask is no different from any intermediate wftask, because the `output_dataset` was already used in the very first task.
Notes and questions
What kind of workflows are actually supported?
If I'm getting it right, we only support a workflow such that the first task reads from the input-dataset paths and writes to the output-dataset path, while every subsequent task both reads from and writes to that same output-dataset path (see the sketch below).
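Putting the sketches together, the overall loop behaves roughly like this (a paraphrase of `execute_tasks`, not the actual implementation; `run_task` is a hypothetical stand-in for `call_single_task`/`call_parallel_task`):

```python
# Hypothetical paraphrase of the execute_tasks loop, built on the sketches
# above: each task's returned parameters become the next task's input, so
# after the first iteration the I/O is pinned to the single output path.
def run_task(wftask, task_pars: TaskParametersSketch) -> dict:
    # Stand-in for call_single_task / call_parallel_task: run the task and
    # return its updated metadata (a no-op here).
    return task_pars.metadata


def execute_tasks_sketch(wftask_list, input_dataset, output_dataset):
    task_pars = first_task_parameters(input_dataset, output_dataset)
    for wftask in wftask_list:
        task_pars = next_task_parameters(task_pars, run_task(wftask, task_pars))
    return task_pars
```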
Why does it currently work, for typical image->zarr workflows?
This use case perfectly matches what is described: the first task is the one that goes from the input (image) dataset to the output (zarr) dataset, and all the others keep using the output (zarr) dataset.
Why does it currently work, for a zarr->zarr workflow?
This use case always has the same (zarr) dataset both as an input and as an output, so that it's not affected by the current I/O definitions.
Can we make a clear example of a workflow that would fail because of how we define I/O parameters?
The following workflow is not supported: first a task acting on the raw images (e.g. a `compress_tif` task - if it were available), and then the conversion tasks (`create_ome_zarr` and then `yokogawa_to_ome_zarr`); a concrete illustration follows.
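Using the sketches above, the failure can be made concrete (all paths are made up, and `compress_tif` is hypothetical):

```python
# Purely illustrative: the first task is a hypothetical compress_tif that
# reads the raw tif folder.
raw = Path("/data/raw_tifs")
zarr = Path("/data/output.zarr")

first = TaskParametersSketch(input_paths=[raw], output_path=zarr)
second = next_task_parameters(first, updated_metadata={})

# The conversion task that runs next would need the raw tif path as its
# input, but by construction it only sees the zarr output path.
assert second.input_paths == [zarr]  # /data/raw_tifs is no longer reachable
```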
Where to move from here?