Merge pull request #94 from sanger-tol/dev
1.1 release
muffato authored Jan 4, 2024
2 parents 3f2d401 + e5164d5 commit 81c53c6
Showing 36 changed files with 623 additions and 151 deletions.
9 changes: 7 additions & 2 deletions .github/workflows/ci.yml
@@ -1,7 +1,12 @@
name: nf-core CI
# This workflow runs the pipeline with the minimal test dataset to check that it completes without any syntax errors
on:
workflow_dispatch:
push:
branches:
- dev
pull_request:
release:
types: [published]

env:
NXF_ANSI_LOG: false
@@ -19,7 +24,7 @@ jobs:
strategy:
matrix:
NXF_VER:
- "23.04.0"
- "22.10.1"
- "latest-everything"
steps:
- name: Check out pipeline code
12 changes: 6 additions & 6 deletions .github/workflows/fix-linting.yml
@@ -8,21 +8,21 @@ jobs:
# Only run if comment is on a PR with the main repo, and if it contains the magic keywords
if: >
contains(github.event.comment.html_url, '/pull/') &&
contains(github.event.comment.body, '@nf-core-bot fix linting') &&
contains(github.event.comment.body, '@sanger-tolsoft fix linting') &&
github.repository == 'sanger-tol/genomenote'
runs-on: ubuntu-latest
steps:
# Use the @nf-core-bot token to check out so we can push later
# Use the @sanger-tolsoft token to check out so we can push later
- uses: actions/checkout@v3
with:
token: ${{ secrets.nf_core_bot_auth_token }}
token: ${{ secrets.sangertolsoft_access_token }}

# Action runs on the issue comment, so we don't get the PR by default
# Use the gh cli to check out the PR
- name: Checkout Pull Request
run: gh pr checkout ${{ github.event.issue.number }}
env:
GITHUB_TOKEN: ${{ secrets.nf_core_bot_auth_token }}
GITHUB_TOKEN: ${{ secrets.sangertolsoft_access_token }}

- uses: actions/setup-node@v3

@@ -46,8 +46,8 @@ jobs:
- name: Commit & push changes
if: steps.prettier_status.outputs.result == 'fail'
run: |
git config user.email "core@nf-co.re"
git config user.name "nf-core-bot"
git config user.email "105875386+sanger-tolsoft@users.noreply.github.com"
git config user.name "sanger-tolsoft"
git config push.default upstream
git add .
git status
4 changes: 2 additions & 2 deletions .github/workflows/linting.yml
@@ -22,7 +22,7 @@ jobs:
run: npm install -g editorconfig-checker

- name: Run ECLint check
run: editorconfig-checker -exclude README.md $(find .* -type f | grep -v '.git\|.py\|.md\|json\|yml\|yaml\|html\|css\|work\|.nextflow\|build\|nf_core.egg-info\|log.txt\|Makefile')
run: editorconfig-checker -exclude README.md $(find .* -type f | grep -v '.git\|.py\|.md\|cff\|json\|yml\|yaml\|html\|css\|work\|.nextflow\|build\|nf_core.egg-info\|log.txt\|Makefile')

Prettier:
runs-on: ubuntu-latest
@@ -84,7 +84,7 @@ jobs:
- name: Install dependencies
run: |
python -m pip install --upgrade pip
pip install nf-core
pip install nf-core==2.8
- name: Run nf-core lint
env:
29 changes: 29 additions & 0 deletions .github/workflows/sanger_test.yml
@@ -0,0 +1,29 @@
name: sanger-tol LSF tests

on:
workflow_dispatch:
jobs:
run-tower:
name: Run LSF tests
runs-on: ubuntu-latest
steps:
- name: Launch workflow via tower
uses: seqeralabs/action-tower-launch@v2
with:
workspace_id: ${{ secrets.TOWER_WORKSPACE_ID }}
access_token: ${{ secrets.TOWER_ACCESS_TOKEN }}
compute_env: ${{ secrets.TOWER_COMPUTE_ENV }}
revision: ${{ github.sha }}
workdir: ${{ secrets.TOWER_WORKDIR_PARENT }}/work/${{ github.repository }}/work-${{ github.sha }}
parameters: |
{
"outdir": "${{ secrets.TOWER_WORKDIR_PARENT }}/results/${{ github.repository }}/results-${{ github.sha }}",
}
profiles: test,sanger,singularity,cleanup

- uses: actions/upload-artifact@v3
with:
name: Tower debug log file
path: |
tower_action_*.log
tower_action_*.json
43 changes: 43 additions & 0 deletions .github/workflows/sanger_test_full.yml
@@ -0,0 +1,43 @@
name: sanger-tol LSF full size tests

on:
push:
branches:
- main
- dev
workflow_dispatch:
jobs:
run-tower:
name: Run LSF full size tests
runs-on: ubuntu-latest
steps:
- name: Sets env vars for push
run: |
echo "REVISION=${{ github.sha }}" >> $GITHUB_ENV
if: github.event_name == 'push'

- name: Sets env vars for workflow_dispatch
run: |
echo "REVISION=${{ github.sha }}" >> $GITHUB_ENV
if: github.event_name == 'workflow_dispatch'

- name: Launch workflow via tower
uses: seqeralabs/action-tower-launch@v2
with:
workspace_id: ${{ secrets.TOWER_WORKSPACE_ID }}
access_token: ${{ secrets.TOWER_ACCESS_TOKEN }}
compute_env: ${{ secrets.TOWER_COMPUTE_ENV }}
revision: ${{ env.REVISION }}
workdir: ${{ secrets.TOWER_WORKDIR_PARENT }}/work/${{ github.repository }}/work-${{ env.REVISION }}
parameters: |
{
"outdir": "${{ secrets.TOWER_WORKDIR_PARENT }}/results/${{ github.repository }}/results-${{ env.REVISION }}",
}
profiles: test_full,sanger,singularity,cleanup

- uses: actions/upload-artifact@v3
with:
name: Tower debug log file
path: |
tower_action_*.log
tower_action_*.json
1 change: 1 addition & 0 deletions .nf-core.yml
@@ -13,6 +13,7 @@ lint:
- .github/ISSUE_TEMPLATE/bug_report.yml
- lib/NfcoreTemplate.groovy
- .github/CONTRIBUTING.md
- .github/workflows/linting.yml
nextflow_config:
- manifest.name
- manifest.homePage
53 changes: 53 additions & 0 deletions CHANGELOG.md
@@ -3,6 +3,59 @@
The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/)
and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).

## [[1.1.0](https://github.com/sanger-tol/genomenote/releases/tag/1.1.0)] - Golden Retriever - [2024-01-04]

### Enhancements & fixes

- The pipeline now queries the NCBI Taxonomy API rather than
[GoaT](https://goat.genomehubs.org/api) to establish the list of lineages on
which to run BUSCO. The possible lineages are now defined [in the pipeline
configuration](assets/mapping_taxids-busco_dataset_name.eukaryota_odb10.2019-12-16.txt)
but can be overridden with the `--lineage_tax_ids` parameter.
- The pipeline will now immediately fail if the assembly can't be retrieved by
the [datasets](https://www.ncbi.nlm.nih.gov/datasets/docs/v2/download-and-install/)
command-line tool.
- Pipeline information is now outputted in `pipeline_info/genomenote/` instead
of `genomenote_info/`.
- `maxRetries` increased to 5 to cope with large datasets.
- BUSCO now runs in "scratch" mode, i.e. off a temporary directory, as the
number of files it creates could otherwise overwhelm a network filesystem.
- `SORT`, `FASTK`, and `MERQURYFK` can now put their temporary files in the
work directory rather than `/tmp`. Turn that on with the `--use_work_dir_as_temp`
flag.
- The memory requirement of `SORT` is adjusted to account for some overheads
and prevent the job from being killed.
- All resource requirements (memory, time, CPUs) now fit the actual usage. This
is achieved by automatically adjusting to the size of the input whenever
possible.
- Genomes with sequences longer than 2 Gbp are now supported thanks to
upgrading FastK and MerquryFK.
- Fixed a bug that was causing the Completeness to be reported as 0 in the
statistics CSV file, when the k-mer database was constructed from BAM files.
- QV/Completeness can now be computed off 10X sequencing data.
- Minimal version of Nextflow downgraded from 23.04 to 22.10. 22.10 is tested as
part of our continuous integration (CI) pipeline.
- The "test" profile now runs faster, thanks to tuning some Busco/Metaeuk
parameters.
- The "test_full" profile is now tested automatically when updating the `dev`
and `main` branches.
- The pipeline now supports Hi-C alignment files in the BAM format.
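
The entries above on `maxRetries` and input-sized resource requirements can be illustrated with a small sketch. This is not the pipeline's actual formula: the function name `memory_request_gb`, the base value, and the per-Gbp coefficient are all hypothetical, chosen only to show how a request might grow with both the input size and the retry attempt.

```python
def memory_request_gb(
    genome_size_gbp: float,
    attempt: int,
    base_gb: float = 1.0,      # hypothetical fixed overhead
    gb_per_gbp: float = 2.0,   # hypothetical cost per Gbp of input
    max_retries: int = 5,      # value quoted in the changelog
) -> float:
    """Return a memory request that scales with input size and attempt number."""
    # With maxRetries = 5, a task can run at most 6 times (1 try + 5 retries).
    if not 1 <= attempt <= max_retries + 1:
        raise ValueError(f"attempt must be between 1 and {max_retries + 1}")
    # Size-dependent estimate, multiplied by the attempt so retries ask for more.
    return (base_gb + gb_per_gbp * genome_size_gbp) * attempt


if __name__ == "__main__":
    for attempt in (1, 2, 3):
        print(attempt, memory_request_gb(1.5, attempt))
```

Multiplying by the attempt number is one common way to make retries useful: a job killed for exceeding its memory limit is resubmitted with a larger request instead of failing the same way again.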

### Parameters

| Old parameter | New parameter |
| ------------- | ---------------------- |
| | --lineage_tax_ids |
| | --use_work_dir_as_temp |

### Software dependencies

| Dependency | Old version | New version |
| ----------- | ------------------------------------------ | ------------------------------------------ |
| `datasets` | 14.2 | 15.12 |
| `FastK` | `f18a4e6d2207539f7b84461daebc54530a9559b0` | `427104ea91c78c3b8b8b49f1a7d6bbeaa869ba1c` |
| `MerquryFK` | `8ae344092df5dcaf83cfb7f90f662597a9b1fc61` | `d00d98157618f4e8d1a9190026b19b471055b22e` |

## [[1.0.0](https://github.com/sanger-tol/genomenote/releases/tag/1.0.0)] - Czechoslovakian Wolfdog - [2023-05-18]

Initial release of sanger-tol/genomenote, created with the [nf-core](https://nf-co.re/) template.
38 changes: 38 additions & 0 deletions CITATION.cff
@@ -0,0 +1,38 @@
# This CITATION.cff file was generated with cffinit.
# Visit https://bit.ly/cffinit to generate yours today!

cff-version: 1.2.0
title: sanger-tol/genomenote v1.1.0
message: >-
If you use this software, please cite it using the
metadata from this file.
type: software
authors:
- given-names: Cibin
family-names: Sadasivan Baby
affiliation: Wellcome Sanger Institute
orcid: "https://orcid.org/0000-0002-8538-276X"
- given-names: Matthieu
family-names: Muffato
affiliation: Wellcome Sanger Institute
orcid: "https://orcid.org/0000-0002-7860-3560"
- given-names: Guoying
family-names: Qi
affiliation: Wellcome Sanger Institute
orcid: "https://orcid.org/0000-0003-1262-8973"
- given-names: Priyanka
family-names: Surana
affiliation: Wellcome Sanger Institute
orcid: "https://orcid.org/0000-0002-7167-0875"
- given-names: Bethan
family-names: Yates
affiliation: Wellcome Sanger Institute
orcid: "https://orcid.org/0000-0003-1658-1762"
identifiers:
- type: doi
value: 10.5281/zenodo.7949384
repository-code: "https://github.com/sanger-tol/genomenote"
license: MIT
commit: TODO
version: 1.1.0
date-released: "2022-10-07"
4 changes: 4 additions & 0 deletions CITATIONS.md
@@ -24,6 +24,10 @@

> Abdennur, Nezar, and Leonid A Mirny. “Cooler: Scalable Storage for Hi-C Data and Other Genomically Labeled Arrays.” Bioinformatics, vol. 36, no. 1, 2019, pp. 311–316., https://doi.org/10.1093/bioinformatics/btz540.
- [Crumble](https://github.com/jkbonfield/crumble)

> James K Bonfield, Shane A McCarthy, and Richard Durbin. "Crumble: reference free lossy compression of sequence quality values" Bioinformatics, Volume 35, Issue 2, January 2019, Pages 337–339, https://doi.org/10.1093/bioinformatics/bty608
- [FastK](https://github.com/thegenemyers/FASTK)

- [MerquryFK](https://github.com/thegenemyers/MERQURY.FK)
12 changes: 6 additions & 6 deletions README.md
@@ -3,7 +3,7 @@
[![GitHub Actions Linting Status](https://github.com/sanger-tol/genomenote/workflows/nf-core%20linting/badge.svg)](https://github.com/sanger-tol/genomenote/actions?query=workflow%3A%22nf-core+linting%22)
[![Cite with Zenodo](http://img.shields.io/badge/DOI-10.5281/zenodo.7949384-1073c8?labelColor=000000)](https://doi.org/10.5281/zenodo.7949384)

[![Nextflow](https://img.shields.io/badge/nextflow%20DSL2-%E2%89%A523.04.0-23aa62.svg)](https://www.nextflow.io/)
[![Nextflow](https://img.shields.io/badge/nextflow%20DSL2-%E2%89%A522.10.1-23aa62.svg)](https://www.nextflow.io/)
[![run with conda](http://img.shields.io/badge/run%20with-conda-3EB049?labelColor=000000&logo=anaconda)](https://docs.conda.io/en/latest/)
[![run with docker](https://img.shields.io/badge/run%20with-docker-0db7ed?labelColor=000000&logo=docker)](https://www.docker.com/)
[![run with singularity](https://img.shields.io/badge/run%20with-singularity-1d355c.svg?labelColor=000000)](https://sylabs.io/docs/)
@@ -22,9 +22,9 @@
3. Filter BED ([`GNU sort`](https://www.gnu.org/software/coreutils/manual/html_node/sort-invocation.html), [`filter bed`](https://raw.githubusercontent.com/sanger-tol/genomenote/main/bin/filter_bed.sh))
4. Contact maps ([`Cooler cload`](https://cooler.readthedocs.io/en/latest/cli.html#cooler-cload-pairs), [`Cooler zoomify`](https://cooler.readthedocs.io/en/latest/cli.html#cooler-zoomify), [`Cooler dump`](https://cooler.readthedocs.io/en/latest/cli.html#cooler-dump))
5. Summary statistics ([`NCBI datasets summary genome accession`](https://www.ncbi.nlm.nih.gov/datasets/docs/v2/reference-docs/command-line/datasets/summary/genome/datasets_summary_genome_accession/))
6. Genome completeness ([`GoaT API`](https://goat.genomehubs.org/api-docs/), [`BUSCO`](https://busco.ezlab.org))
6. Genome completeness ([`NCBI API`](https://www.ncbi.nlm.nih.gov/datasets/docs/v1/reference-docs/rest-api/), [`BUSCO`](https://busco.ezlab.org))
7. Consensus quality and k-mer completeness ([`FASTK`](https://github.com/thegenemyers/FASTK), [`MERQURY.FK`](https://github.com/thegenemyers/MERQURY.FK))
8. Collated summary table ([`createtable`](https://raw.githubusercontent.com/sanger-tol/genomenote/main/bin/create_table.py))
8. Collated summary table ([`createtable`](bin/create_table.py))
9. Present results and visualisations ([`MultiQC`](http://multiqc.info/), [`R`](https://www.r-project.org/))

## Usage
@@ -44,7 +44,7 @@ mMelMel3,hic,/analysis/mMelMel3.2_paternal_haplotype/read_mapping/hic/GCA_922984
mMelMel3,pacbio,/genomic_data/mMelMel3/pacbio/kmer/k31
```

Each row represents an aligned HiC reads file or an unaligned PacBio reads file or a PacBio k-mer database.
Each row represents an aligned HiC reads file, an unaligned PacBio/10X reads file, or a PacBio/10X k-mer database.

Now, you can run the pipeline using:

@@ -75,13 +75,13 @@ We thank the following people for their assistance in the development of this pi

## Contributions and Support

If you would like to contribute to this pipeline, please see the [contributing guidelines](https://raw.githubusercontent.com/sanger-tol/genomenote/main/.github/CONTRIBUTING.md).
If you would like to contribute to this pipeline, please see the [contributing guidelines](.github/CONTRIBUTING.md).

## Citations

If you use sanger-tol/genomenote for your analysis, please cite it using the following doi: [10.5281/zenodo.7949384](https://doi.org/10.5281/zenodo.7949384)

An extensive list of references for the tools used by the pipeline can be found in the [`CITATIONS.md`](https://raw.githubusercontent.com/sanger-tol/genomenote/main/CITATIONS.md) file.
An extensive list of references for the tools used by the pipeline can be found in the [`CITATIONS.md`](CITATIONS.md) file.

This pipeline uses code and infrastructure developed and maintained by the [nf-core](https://nf-co.re) community, reused here under the [MIT license](https://github.com/nf-core/tools/blob/master/LICENSE).

@@ -0,0 +1,67 @@
422676 aconoidasida
7898 actinopterygii
5338 agaricales
155619 agaricomycetes
33630 alveolata
5794 apicomplexa
6854 arachnida
6656 arthropoda
4890 ascomycota
8782 aves
5204 basidiomycota
68889 boletales
3699 brassicales
134362 capnodiales
33554 carnivora
91561 cetartiodactyla
34395 chaetothyriales
3041 chlorophyta
5796 coccidia
28738 cyprinodontiformes
7147 diptera
147541 dothideomycetes
3193 embryophyta
33392 endopterygota
314146 euarchontoglires
33682 euglenozoa
2759 eukaryota
5042 eurotiales
147545 eurotiomycetes
9347 eutheria
72025 fabales
4751 fungi
314147 glires
1028384 glomerellales
5178 helotiales
7524 hemiptera
7399 hymenoptera
5125 hypocreales
50557 insecta
314145 laurasiatheria
147548 leotiomycetes
7088 lepidoptera
4447 liliopsida
40674 mammalia
33208 metazoa
6029 microsporidia
6447 mollusca
4827 mucorales
1913637 mucoromycota
6231 nematoda
33183 onygenales
9126 passeriformes
5820 plasmodium
92860 pleosporales
38820 poales
5303 polyporales
9443 primates
4891 saccharomycetes
8457 sauropsida
4069 solanales
147550 sordariomycetes
33634 stramenopiles
32523 tetrapoda
155616 tremellomycetes
7742 vertebrata
33090 viridiplantae
71240 eudicots
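
The table above pairs an NCBI taxon ID with a BUSCO dataset name. As a rough sketch of how such a table could be used to pick the lineages to run, the snippet below parses a few rows and filters a list of ancestor taxon IDs against them. The function names `parse_mapping` and `select_lineages`, the whitespace-separated layout, and the example ancestor list are assumptions made for illustration; the pipeline's own lookup code may differ.

```python
from typing import Dict, Iterable, List

# A few rows copied from the mapping table, one "taxon_id name" pair per line.
MAPPING_EXCERPT = """\
2759 eukaryota
33208 metazoa
7742 vertebrata
40674 mammalia
9443 primates
"""


def parse_mapping(text: str) -> Dict[int, str]:
    """Parse 'taxon_id name' lines (space- or tab-separated) into a dict."""
    mapping: Dict[int, str] = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 2:
            continue  # skip blank or malformed lines
        mapping[int(fields[0])] = fields[1]
    return mapping


def select_lineages(ancestors: Iterable[int], mapping: Dict[int, str]) -> List[str]:
    """Keep the ancestors that have a BUSCO dataset, preserving input order."""
    return [mapping[tax_id] for tax_id in ancestors if tax_id in mapping]


if __name__ == "__main__":
    mapping = parse_mapping(MAPPING_EXCERPT)
    # Hypothetical ancestor list, most specific first; IDs absent from the
    # table (here 9606) are ignored.
    ancestors = [9606, 9443, 40674, 7742, 33208, 2759]
    print(select_lineages(ancestors, mapping))
```

Keeping the input order means the caller decides whether the most specific or the most general lineage comes first.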
4 changes: 2 additions & 2 deletions assets/samplesheet.csv
@@ -1,4 +1,4 @@
sample,datatype,datafile
uoEpiScrs1,pacbio,https://tolit.cog.sanger.ac.uk/test-data/Epithemia_sp._CRS-2021b/genomic_data/uoEpiScrs1/pacbio/m64228e_220617_134154.ccs.bc1015_BAK8B_OA--bc1015_BAK8B_OA.rmdup.bam
uoEpiScrs1,pacbio,https://tolit.cog.sanger.ac.uk/test-data/Epithemia_sp._CRS-2021b/genomic_data/uoEpiScrs1/pacbio/m64016e_220621_193126.ccs.bc1008_BAK8A_OA--bc1008_BAK8A_OA.rmdup.bam
uoEpiScrs1,pacbio,https://tolit.cog.sanger.ac.uk/test-data/Epithemia_sp._CRS-2021b/genomic_data/uoEpiScrs1/pacbio/m64228e_220617_134154.ccs.bc1015_BAK8B_OA--bc1015_BAK8B_OA.rmdup.subset.bam
uoEpiScrs1,pacbio,https://tolit.cog.sanger.ac.uk/test-data/Epithemia_sp._CRS-2021b/genomic_data/uoEpiScrs1/pacbio/m64016e_220621_193126.ccs.bc1008_BAK8A_OA--bc1008_BAK8A_OA.rmdup.subset.bam
uoEpiScrs1,hic,https://tolit.cog.sanger.ac.uk/test-data/Epithemia_sp._CRS-2021b/analysis/uoEpiScrs1.1/read_mapping/hic/GCA_946965045.1.unmasked.hic.uoEpiScrs1.subsampled.cram
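
The samplesheet has three columns: `sample`, `datatype`, and `datafile`. A minimal validation sketch is shown below, assuming that the accepted `datatype` values are `hic`, `pacbio`, and `10x`. The `10x` spelling, the function name `read_samplesheet`, and the example paths are assumptions made for illustration and are not taken from the pipeline's own checks.

```python
import csv
import io
from typing import Dict, List

EXPECTED_COLUMNS = ["sample", "datatype", "datafile"]
ALLOWED_DATATYPES = {"hic", "pacbio", "10x"}  # "10x" spelling is an assumption


def read_samplesheet(text: str) -> List[Dict[str, str]]:
    """Parse a samplesheet and raise ValueError on the first invalid row."""
    reader = csv.DictReader(io.StringIO(text))
    if reader.fieldnames != EXPECTED_COLUMNS:
        raise ValueError(f"unexpected header: {reader.fieldnames}")
    rows: List[Dict[str, str]] = []
    # Data rows start on line 2, after the header.
    for line_number, row in enumerate(reader, start=2):
        if not row["sample"] or not row["datafile"]:
            raise ValueError(f"line {line_number}: empty sample or datafile")
        if row["datatype"] not in ALLOWED_DATATYPES:
            raise ValueError(f"line {line_number}: unknown datatype {row['datatype']!r}")
        rows.append(row)
    return rows


if __name__ == "__main__":
    # Hypothetical paths, for illustration only.
    example = (
        "sample,datatype,datafile\n"
        "sampleA,pacbio,/data/sampleA/reads.bam\n"
        "sampleA,hic,/data/sampleA/hic.cram\n"
    )
    for row in read_samplesheet(example):
        print(row["sample"], row["datatype"], row["datafile"])
```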
2 changes: 1 addition & 1 deletion assets/slackreport.json
@@ -3,7 +3,7 @@
{
"fallback": "Plain-text summary of the attachment.",
"color": "<% if (success) { %>good<% } else { %>danger<%} %>",
"author_name": "sanger-tol/readmapping v${version} - ${runName}",
"author_name": "sanger-tol/genomenote v${version} - ${runName}",
"author_icon": "https://www.nextflow.io/docs/latest/_static/favicon.ico",
"text": "<% if (success) { %>Pipeline completed successfully!<% } else { %>Pipeline completed with errors<% } %>",
"fields": [