Taxon to Biogeochemical Cycle

This Nextflow workflow predicts functions involved in major biogeochemical cycles (carbon, sulfur, nitrogen) from taxonomic affiliations, which can be obtained from metabarcoding or metagenomic sequencing. It relies on EsMeCaTa and bigecyhmm.

Requirements

Nextflow: to run tabigecy.nf.
esmecata, bigecyhmm and several Python packages for visualisation: these can be installed with the following pip command: pip install esmecata bigecyhmm seaborn pandas plotly kaleido
EsMeCaTa precomputed database: it can be downloaded from this Zenodo archive. The size of this precomputed database is 4 GB.
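
Before launching the workflow, it can help to confirm that the requirements listed above are available. The following is a minimal sketch, assuming that each Python package is importable under the same name as in the pip command and that the Nextflow executable is called nextflow on the PATH; neither assumption is stated by the tools themselves.

```python
import importlib.util
import shutil

# Assumption: import names match the names used in the pip command above.
PYTHON_MODULES = ("esmecata", "bigecyhmm", "seaborn", "pandas", "plotly", "kaleido")
# Assumption: the Nextflow launcher is installed as "nextflow" on the PATH.
EXECUTABLES = ("nextflow",)


def missing_requirements(modules=PYTHON_MODULES, executables=EXECUTABLES):
    """Return (missing_modules, missing_executables) as two lists."""
    # find_spec returns None when a top-level module cannot be located.
    missing_modules = [m for m in modules if importlib.util.find_spec(m) is None]
    # shutil.which returns None when no executable of that name is on the PATH.
    missing_executables = [e for e in executables if shutil.which(e) is None]
    return missing_modules, missing_executables


if __name__ == "__main__":
    modules, executables = missing_requirements()
    print("Missing Python modules:", modules or "none")
    print("Missing executables:", executables or "none")
```

This check only covers the software; the precomputed database still has to be downloaded separately.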

Usage

This workflow can be run with Nextflow in two ways:

by downloading this repository and running the tabigecy.nf file.
by passing the repository name to the nextflow command: nextflow run ArnaudBelcour/tabigecy ...

You can print the help with the following command:

nextflow run ArnaudBelcour/tabigecy --help

By default, the workflow uses the files located in the directory from which it is launched. It requires the following files:

EsMeCaTa input file.
EsMeCaTa precomputed database.

Optionally, it can take:

Abundance file giving, for each row of the EsMeCaTa input file, its abundance in the different samples.
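
To illustrate how the two tabular inputs relate to each other, here is a minimal sketch that writes a small EsMeCaTa input file and a matching abundance file. The column names observation_name and taxonomic_affiliation are assumed from the EsMeCaTa input format, and the abundance layout (one row per observation, one column per sample) is an assumption inferred from the description above; check the EsMeCaTa and tabigecy documentation before relying on it. The taxa, sample names and counts are invented for illustration.

```python
import csv
from pathlib import Path

# Hypothetical example data: names, lineages and counts are illustrative only.
TAXA = {
    "Cluster_1": "Bacteria;Proteobacteria;Gammaproteobacteria;Enterobacterales;Enterobacteriaceae;Escherichia",
    "Cluster_2": "Bacteria;Firmicutes;Bacilli;Bacillales;Bacillaceae;Bacillus",
}
ABUNDANCES = {
    "Cluster_1": {"sample_1": 120, "sample_2": 0},
    "Cluster_2": {"sample_1": 35, "sample_2": 210},
}


def write_inputs(folder):
    """Write both TSV files into `folder` and return their paths."""
    folder = Path(folder)
    folder.mkdir(parents=True, exist_ok=True)
    infile = folder / "esmecata_input_file.tsv"
    abundfile = folder / "abundance.tsv"

    # Assumed EsMeCaTa input layout: one row per observation with its lineage.
    with open(infile, "w", newline="") as handle:
        writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
        writer.writerow(["observation_name", "taxonomic_affiliation"])
        for name, lineage in TAXA.items():
            writer.writerow([name, lineage])

    # Assumed abundance layout: same row names, one column per sample.
    samples = sorted({s for row in ABUNDANCES.values() for s in row})
    with open(abundfile, "w", newline="") as handle:
        writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
        writer.writerow(["observation_name"] + samples)
        for name in TAXA:
            writer.writerow([name] + [ABUNDANCES[name][s] for s in samples])

    return infile, abundfile
```

Writing both files from the same set of observation names keeps the rows of the abundance file aligned with the rows of the EsMeCaTa input file.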

At the end, it creates an output folder containing the EsMeCaTa output folder, the bigecyhmm output folder and the visualisation output folder. To run the workflow on your own files, specify the input files on the command line:

nextflow run ArnaudBelcour/tabigecy --infile esmecata_input_file.tsv --inAbundfile abundance.tsv --precomputedDB esmecata_database.zip --visualisationScript create_bigecyhmm_plot.py --outputFolder output_folder --coreBigecyhmm 5
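
When the workflow is launched from a script, the command above can be assembled programmatically. The following is a minimal sketch that only reuses the flags shown in the command above; the helper name build_tabigecy_command is hypothetical, and treating every flag other than --infile and --precomputedDB as optional is an assumption, not something confirmed by the workflow.

```python
import shlex


def build_tabigecy_command(infile, precomputed_db, output_folder="output_folder",
                           abundance_file=None, visualisation_script=None,
                           core_bigecyhmm=None, pipeline="ArnaudBelcour/tabigecy"):
    """Return the nextflow command as a list of arguments (hypothetical helper)."""
    cmd = [
        "nextflow", "run", pipeline,
        "--infile", str(infile),
        "--precomputedDB", str(precomputed_db),
        "--outputFolder", str(output_folder),
    ]
    # Assumption: these flags can be left out when their value is not provided.
    if abundance_file is not None:
        cmd += ["--inAbundfile", str(abundance_file)]
    if visualisation_script is not None:
        cmd += ["--visualisationScript", str(visualisation_script)]
    if core_bigecyhmm is not None:
        cmd += ["--coreBigecyhmm", str(core_bigecyhmm)]
    return cmd


if __name__ == "__main__":
    command = build_tabigecy_command(
        "esmecata_input_file.tsv",
        "esmecata_database.zip",
        abundance_file="abundance.tsv",
        core_bigecyhmm=5,
    )
    # Print a shell-safe version; pass `command` to subprocess.run to execute it.
    print(shlex.join(command))
```

Building the command as a list rather than a single string avoids quoting problems when file paths contain spaces.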

Output

An output folder (by default called output_folder) is created. It contains three subfolders:

output_1_esmecata: the output folder of the esmecata precomputed command. For more information, see the EsMeCaTa README.
output_2_bigecyhmm: the output folder of the bigecyhmm command. For more information, see the bigecyhmm README.
output_3_visualisation: the output folder for the visualisation of the predictions, combined with the sample abundances if an abundance file was given.
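
After a run, the layout described above can be checked with a few lines of Python. This is a minimal sketch that only tests for the presence of the three subfolders; the helper name missing_subfolders is hypothetical and the contents of each subfolder are not inspected.

```python
from pathlib import Path

# Subfolder names as listed in the Output section above.
EXPECTED_SUBFOLDERS = (
    "output_1_esmecata",
    "output_2_bigecyhmm",
    "output_3_visualisation",
)


def missing_subfolders(output_folder="output_folder"):
    """Return the expected subfolders that are absent from `output_folder`."""
    root = Path(output_folder)
    return [name for name in EXPECTED_SUBFOLDERS if not (root / name).is_dir()]


if __name__ == "__main__":
    missing = missing_subfolders()
    if missing:
        print("Missing subfolders:", ", ".join(missing))
    else:
        print("All expected subfolders are present.")
```

An empty result means all three steps of the workflow produced their output folder.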
