Allow different (multiple) inputs #106
Force-pushed from 82abda3 to c6d92ed
Comments regarding sequence merge
input:
    metadata = lambda w: collect_inputs(segment=w.segment)
output:
    metadata = "results/sequences_merged_{segment}.fasta"
Naming nitpick:
- input:
-     metadata = lambda w: collect_inputs(segment=w.segment)
- output:
-     metadata = "results/sequences_merged_{segment}.fasta"
+ input:
+     sequences = lambda w: collect_inputs(segment=w.segment)
+ output:
+     sequences = "results/sequences_merged_{segment}.fasta"
additional_inputs:
  - name: secret
    metadata: secret.tsv
    sequencs: secret_{segment}.fasta
typo
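A misspelled key such as `sequencs` can slip through silently if the workflow only reads the keys it knows about. Below is a minimal sketch of a check that would flag it. The function name `validate_inputs` and the allowed key set are assumptions for illustration, not part of this PR.

```python
# Hypothetical validator for entries of `inputs` / `additional_inputs`.
# The allowed keys are an assumption based on the config snippet above.
ALLOWED_KEYS = {"name", "metadata", "sequences"}


def validate_inputs(entries):
    """Return a list of human-readable problems found in the input entries."""
    problems = []
    seen_names = set()
    for index, entry in enumerate(entries):
        name = entry.get("name")
        label = name if name else f"entry {index}"
        if not name:
            problems.append(f"{label}: missing 'name'")
        elif name in seen_names:
            problems.append(f"{label}: duplicate name")
        else:
            seen_names.add(name)
        # Any key outside the allowed set is reported, which catches typos.
        for key in sorted(set(entry) - ALLOWED_KEYS):
            problems.append(f"{label}: unknown key '{key}'")
    return problems


entries = [
    {"name": "secret", "metadata": "secret.tsv", "sequencs": "secret_{segment}.fasta"},
]
for problem in validate_inputs(entries):
    print(problem)
```

Running a check like this when the config is loaded would surface the typo before any rule runs.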
By having all phylogenetic workflows start from two lists of inputs (`config.inputs`, `config.additional_inputs`) we enable a broad range of uses with a consistent interface.

1. Using local ingest files is trivial (see added docs) and doesn't need a bunch of special-cased logic that is prone to falling out of date (as it had indeed done).
2. Adding extra / private data follows a similar pattern, with an additional config list being used so that we are explicit that the new data is additional and enforce an ordering, which is needed for predictable `augur merge` behaviour. The canonical data can be removed / replaced via step (1) if needed.

I considered adding additional data after the subtype-filtering step, which would avoid the need to add subtype in the metadata but requires encoding this in the config overlay. I felt the chosen way was simpler and more powerful.

Note that this workflow uses an old version of the CI workflow, <https://github.com/nextstrain/.github/blob/v0/.github/workflows/pathogen-repo-ci.yaml#L233-L240>, which copies `example_data`. We could upgrade to the latest version and use a config overlay to swap out the canonical inputs with the example data.
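The ordering described above can be sketched in plain Python. This is an illustration only: the real `collect_inputs` in this PR is called with just `segment`, whereas this sketch takes `config` and `kind` explicitly, and the `ncbi` entry, its paths, and the `ha` segment are made-up example values.

```python
from typing import List, Tuple


def collect_inputs(config: dict, segment: str, kind: str) -> List[Tuple[str, str]]:
    """Return (name, path) pairs for `kind` ("metadata" or "sequences").

    Canonical `inputs` always come first and `additional_inputs` last,
    so the order seen by a downstream merge step is predictable.
    """
    entries = list(config.get("inputs", [])) + list(config.get("additional_inputs", []))
    collected = []
    for entry in entries:
        path = entry.get(kind)
        if path is None:
            continue  # this input does not provide this kind of file
        # Fill in the {segment} placeholder where the path has one.
        collected.append((entry["name"], path.format(segment=segment)))
    return collected


# Hypothetical config; only the "secret" entry mirrors the snippet in this PR.
example_config = {
    "inputs": [
        {"name": "ncbi", "metadata": "data/metadata.tsv",
         "sequences": "data/sequences_{segment}.fasta"},
    ],
    "additional_inputs": [
        {"name": "secret", "metadata": "secret.tsv",
         "sequences": "secret_{segment}.fasta"},
    ],
}

for name, path in collect_inputs(example_config, "ha", "sequences"):
    print(f"{name}={path}")
```

Keeping the two lists separate, rather than letting users append to a single list, is what guarantees that additional data always follows the canonical data.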
Force-pushed from c6d92ed to 05b4622
Force-pushed from fb99903 to c60554a
Closing in favor of #112