Allow different (multiple) inputs #106

Closed

@jameshadfield jameshadfield commented Dec 2, 2024

By having all phylogenetic workflows start from two lists of inputs
(`config.inputs`, `config.additional_inputs`) we enable a broad range of
uses with a consistent interface.

  1. Using local ingest files is trivial (see added docs) and doesn't need
    a bunch of special-cased logic that is prone to falling out of date
    (as it had indeed done).
  2. Adding extra / private data follows a similar pattern, with an
    additional config list being used so that we are explicit that the
    new data is additional and so that we enforce an ordering, which is
    needed for predictable `augur merge` behaviour. The canonical data can
    be removed / replaced via step (1) if needed.
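
A minimal sketch of how the two lists might be flattened into one ordered list of paths, with canonical inputs first and additional inputs last. The helper name `ordered_input_paths`, its signature, and the `ncbi` entry and file names are hypothetical; this is not the `collect_inputs` function from this PR. The `{segment}` placeholder follows the `secret_{segment}.fasta` pattern used elsewhere in this PR.

```python
from typing import Any, Dict, List, Optional


def ordered_input_paths(
    config: Dict[str, Any], kind: str, segment: Optional[str] = None
) -> List[str]:
    """Return the `kind` paths (e.g. "metadata" or "sequences") from
    config["inputs"] followed by config["additional_inputs"].

    Hypothetical helper: the order of the returned list is the order in
    which the files would be handed to a merge step.
    """
    entries = [*config.get("inputs", []), *config.get("additional_inputs", [])]
    paths = []
    for entry in entries:
        if kind not in entry:
            continue  # an entry may supply only metadata or only sequences
        path = entry[kind]
        if segment is not None:
            # Fill in the segment wildcard without touching any other braces
            path = path.replace("{segment}", segment)
        paths.append(path)
    return paths


# Hypothetical config; only the "secret" entry mirrors the PR's example
example_config = {
    "inputs": [
        {"name": "ncbi", "metadata": "ncbi.tsv", "sequences": "ncbi_{segment}.fasta"},
    ],
    "additional_inputs": [
        {"name": "secret", "metadata": "secret.tsv", "sequences": "secret_{segment}.fasta"},
    ],
}
```

Keeping the additional list separate means the additional files always come after the canonical ones, however the individual lists are edited.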

I considered adding additional data after the subtype-filtering step,
which would avoid the need to add subtype in the metadata but requires
encoding this in the config overlay. I felt the chosen way was simpler
and more powerful.
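
Because additional data is merged before subtype filtering, its metadata needs a subtype column. A small sketch of a pre-flight check for this, assuming the metadata is a TSV whose header contains a column literally named `subtype`; the function name and the sample header are hypothetical.

```python
import csv
import io


def missing_columns(tsv_text, required=("subtype",)):
    """Return the required column names that are absent from the TSV header."""
    reader = csv.reader(io.StringIO(tsv_text), delimiter="\t")
    header = next(reader, [])  # an empty file yields an empty header
    return [column for column in required if column not in header]
```

A non-empty result could be reported before the merge runs, so that rows are not silently dropped by the later filtering step.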

Note that this workflow uses an old version of the CI workflow,
<https://github.com/nextstrain/.github/blob/v0/.github/workflows/pathogen-repo-ci.yaml#L233-L240>
which copies `example_data`. We could upgrade to the latest version
and use a config overlay to swap out the canonical inputs with the
example data.
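
A rough sketch of what such an overlay could look like, assuming overlays update nested dictionaries key by key and replace list values such as `inputs` wholesale. This illustrates the idea only and is not claimed to be the exact merge logic used by the workflow; `apply_overlay` and all paths are hypothetical.

```python
import copy


def apply_overlay(base, overlay):
    """Return a new config with `overlay` applied on top of `base`.

    Nested dicts are merged recursively; any other value (including a
    list such as `inputs`) is replaced as a whole.
    """
    merged = copy.deepcopy(base)
    for key, value in overlay.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = apply_overlay(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


# Hypothetical canonical config and CI overlay
base_config = {
    "inputs": [{"name": "canonical", "metadata": "data/metadata.tsv"}],
    "options": {"a": 1, "b": 2},
}
ci_overlay = {
    "inputs": [{"name": "example", "metadata": "example_data/metadata.tsv"}],
    "options": {"b": 3},
}
ci_config = apply_overlay(base_config, ci_overlay)
```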

See added docs for examples.

@victorlin victorlin left a comment

Comments regarding sequence merge

Snakefile
Comment on lines +320 to +332
    input:
        metadata = lambda w: collect_inputs(segment=w.segment)
    output:
        metadata = "results/sequences_merged_{segment}.fasta"

Naming nitpick:

Suggested change
    -    input:
    -        metadata = lambda w: collect_inputs(segment=w.segment)
    -    output:
    -        metadata = "results/sequences_merged_{segment}.fasta"
    +    input:
    +        sequences = lambda w: collect_inputs(segment=w.segment)
    +    output:
    +        sequences = "results/sequences_merged_{segment}.fasta"

    additional_inputs:
      - name: secret
        metadata: secret.tsv
        sequencs: secret_{segment}.fasta

typo
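
A misspelled key like this would otherwise be ignored silently, so one option is to validate each entry's keys when the config is loaded. A sketch, assuming the only recognised keys are `name`, `metadata` and `sequences` (inferred from the snippet above, not confirmed by the PR); `unknown_keys` is a hypothetical helper.

```python
# Assumed set of recognised keys for an input entry
ALLOWED_KEYS = {"name", "metadata", "sequences"}


def unknown_keys(entries):
    """Map each entry's name (or its index) to any unrecognised keys it contains."""
    problems = {}
    for index, entry in enumerate(entries):
        extra = sorted(set(entry) - ALLOWED_KEYS)
        if extra:
            problems[entry.get("name", f"#{index}")] = extra
    return problems
```

Raising an error when the result is non-empty would surface the typo at startup instead of producing a merge with no additional sequences.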

@jameshadfield

Closing in favor of #112
