Workflows as programs #1

Open
3 tasks
genehack opened this issue Aug 14, 2024 · 4 comments
@genehack
Contributor

genehack commented Aug 14, 2024

This is a meta-issue for tracking work around "workflows as programs". Extracted from Q3 planning meeting notes by @genehack. Fleshed out by @tsibley.

Background

Todos

@huddlej

huddlej commented Nov 5, 2024

One question that occurred to me during @tsibley's lab meeting on this topic today is: do we want to guarantee that users can run our Nextstrain workflows with snakemake instead of with nextstrain?

I was thinking about how the workflow authors need to define shared functions (like shared/functions.smk from the measles PR) within and across pathogen repos and how nextstrain run could always inject shared logic into the workflows or provide a runtime environment that guarantees those functions will be available. But that kind of approach would break the workflows with vanilla snakemake. Even if code injection is a terrible idea, it made me wonder about our long-term guarantees that Nextstrain workflows will always work with Snakemake...

@tsibley tsibley transferred this issue from another repository Nov 6, 2024
@genehack
Contributor Author

genehack commented Nov 7, 2024

Copying this (and mildly cleaning it up) from a comment I sent to @tsibley on Slack after his lab meeting:

  • I suspect for many current “authors” this is going to feel like more work for little-to-no return; I think one way to assuage some of that would be to try as hard as possible to make the requirements of extra “stuff” in config.yaml and the snakemake rules as minimal as possible — for example, instead of requiring helpers to do path translations, what if nextstrain run pre-parsed the workflow config and the snakerules and automatically applied those transformations? (or attempted to and introspected the filesystem with fallbacks, etc.)

  • I wonder if something like the mpox clade I builds, or something WA DOH is working on, wouldn’t make for a better initial implementation target — a lot (most? all?) of the benefits to doing this will be to the “external consumer” part of the experience; would make sense to do prototype runs on something where we can get feedback from somebody in that external pool as quickly as possible
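To make the first suggestion above concrete, here is a rough sketch of what automatic path translation over a parsed config might look like. Everything in it is an assumption for illustration: the function name `rewrite_config_paths`, the two directory arguments, and the "analysis directory first, workflow directory second" lookup order are hypothetical and not taken from `nextstrain run` or any existing workflow.

```python
from pathlib import Path


def rewrite_config_paths(config, analysis_dir, workflow_dir):
    """Hypothetical: return a copy of `config` with file paths resolved.

    Any string value that names an existing file under `analysis_dir`
    (checked first) or `workflow_dir` (fallback) is replaced by that
    file's full path. All other values are left untouched.
    """
    bases = (Path(analysis_dir), Path(workflow_dir))

    def resolve(value):
        if isinstance(value, dict):
            return {key: resolve(item) for key, item in value.items()}
        if isinstance(value, list):
            return [resolve(item) for item in value]
        if isinstance(value, str):
            for base in bases:
                candidate = base / value
                if candidate.is_file():
                    return str(candidate)
        return value

    return resolve(config)
```

Note that this kind of filesystem introspection is a heuristic: a config string that is not meant as a path but happens to match a file name would also be rewritten.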

@tsibley
Member

tsibley commented Nov 7, 2024

To @huddlej and @genehack's suggestions re: adding more "magic": I think it'd be more complicating and mystifying for authors (and readers) if we use techniques like code injection or config/workflow rewriting or similar. I don't want to rule out using such "magic" techniques in the future—they may prove to be the best way to meet our eventual goals/expectations—but for now I think it would be way more complex and fragile than making clear guidelines that all file paths from the config that have defaults should be wrapped in a function call.
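A minimal sketch of what wrapping a config file path in a function call could look like, written as plain Python of the kind that can live in a Snakefile. The helper name `resolve_config_path`, its parameters, and the lookup order are assumptions for illustration; this is not the actual function from shared/functions.smk.

```python
from pathlib import Path


def resolve_config_path(path, analysis_dir, workflow_dir):
    """Hypothetical helper: resolve a file path taken from the config.

    Absolute paths are returned as-is. Relative paths are looked up in
    the user's analysis directory first, then in the workflow directory
    where the default files are assumed to live.
    """
    path = Path(path)
    if path.is_absolute():
        return path

    for base in (Path(analysis_dir), Path(workflow_dir)):
        candidate = base / path
        if candidate.exists():
            return candidate

    raise FileNotFoundError(
        f"{path} not found in {analysis_dir} or {workflow_dir}"
    )
```

Because the call is explicit, a reader of the workflow can see exactly which config values are treated as overridable paths, and the workflow still runs under plain Snakemake.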

As @huddlej notes, adding magic also ruins the ability to run workflows directly with Snakemake, and at least for now, I think we want to preserve that ability. (Though again, I wouldn't rule it out for the future.)

To @genehack's explicit concern (and @huddlej's implicit concern, if I'm reading correctly) around the extra authoring work: the extra burden for authoring new workflows doesn't feel high to me. The function-wrapped config values are going to be mostly copied/cribbed from existing workflows and/or our pathogen-repo-guide template. Our Snakemake best practices will include appropriate guidance as reference material.

While this does create new work to update existing workflows, that cost is paid once and the work is straightforward. We also do not need to do it all at once; only as we want a given pathogen to support nextstrain run.

@tsibley
Member

tsibley commented Nov 7, 2024

For @genehack's thoughts around initial implementations, I agree that we should consider that and try to pick from existing partners. I was thinking of INRB (see also) in addition to WA DOH. There are also many external users of ncov that could likely benefit from this work and potentially be willing and able to provide valuable feedback if we make them aware of it and how to try it out (e.g. perhaps a post on the discussion forum).

Measles was merely a convenient example for this initial "how do we actually do this" phase (the same way zika or zika-tutorial has been that in the past, though much less so in recent years).

3 participants