Tuning Parameters for Plant Assembly #35

Open
FedericoFontana94 opened this issue Nov 19, 2024 · 3 comments

@FedericoFontana94

Hi,

We are looking to optimize the polishing step for reconstructing a plant genome using PacBio HiFi reads (Q30). Specifically, we tested both NextPolish and NextPolish2, followed by a variant calling step on both the raw assembly and the polished genome in order to evaluate the polishing impact.

With NextPolish, we observed a significant reduction in homozygous INDELs (from 3,420 to 945), as expected, but also an increase in homozygous SNPs (from 853 to 2,678). In contrast, using NextPolish2 led to a slight increase in both homozygous INDELs (from 3,420 to 3,741) and SNPs (from 853 to 862).
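As a quick check on the size of these shifts, the counts quoted above can be turned into relative changes: roughly a 72% drop in homozygous INDELs and a 214% rise in homozygous SNPs for NextPolish, against single-digit increases for NextPolish2. This is a minimal sketch; `pct_change` is a hypothetical helper name, and only the counts come from the post.

```shell
#!/usr/bin/env bash
# Relative change, in percent, between a "before" and an "after" count.
# pct_change is a hypothetical helper, not part of NextPolish or NextPolish2.
pct_change() {
  awk -v before="$1" -v after="$2" \
    'BEGIN { printf "%+.1f\n", (after - before) / before * 100 }'
}

# Homozygous variant counts from the post (raw assembly -> polished assembly).
echo "NextPolish   INDELs: $(pct_change 3420 945)%"
echo "NextPolish   SNPs:   $(pct_change 853 2678)%"
echo "NextPolish2  INDELs: $(pct_change 3420 3741)%"
echo "NextPolish2  SNPs:   $(pct_change 853 862)%"
```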

For NextPolish2, we used the default Yak k-mer sizes (21 and 31) and the default minimap2 alignment settings.

My question is: Are there optimal parameters recommended for polishing a plant genome using NextPolish2?

Thank you in advance for your help.

Best regards,
Federico

@moold
Member

moold commented Nov 19, 2024

Can you provide the detailed commands, the input data statistics, and the NextPolish2 version?

@FedericoFontana94
Author

Thank you for the quick response. Here are the commands we used:

$minimap2 \
  --eqx \
  -ax map-hifi \
  -t 20 \
  ../../1_Assemblaggio/HIFIASM/q30/raw_assembly.fasta \
  ../../0_Reads/PacBio/q30_hifi_reads.fastq.gz \
  | samtools sort -@ 20 -o hifi.map.sort.bam -;

samtools index hifi.map.sort.bam;

yak count \
  -o k21.yak \
  -k 21 \
  -b 37 \
  <(zcat ../../0_Reads/Illumina_WGS/trimmed_NextPolish2/trimmed_NP2_R1.fastq.gz) \
  <(zcat ../../0_Reads/Illumina_WGS/trimmed_NextPolish2/trimmed_NP2_R2.fastq.gz);

yak count \
  -o k31.yak \
  -k 31 \
  -b 37 \
  <(zcat ../../0_Reads/Illumina_WGS/trimmed_NextPolish2/trimmed_NP2_R1.fastq.gz) \
  <(zcat ../../0_Reads/Illumina_WGS/trimmed_NextPolish2/trimmed_NP2_R2.fastq.gz);

nextPolish2 -t 20 hifi.map.sort.bam ../../1_Assemblaggio/HIFIASM/q30/raw_assembly.fasta k21.yak k31.yak -o asm.np2.fa

And the input statistics:

  • PacBio HiFi reads (>=Q30): coverage 36.0X;
  • Illumina reads: Coverage 31.7 X;
  • Raw Assembly: BUSCO (C:95.6%[S:76.1%,D:19.5%],F:0.3%,M:4.1%), ContigN50 (6,718,969), Size (694,411,396);
  • NextPolish2: 0.2.1 (bioconda env).

best,
Federico

@moold
Member

moold commented Nov 20, 2024

  1. To maximize correction accuracy, quality filtering of the short reads (e.g. with fastp) is essential: adapter removal, global or quality trimming, and read filtering.
  2. We recommend using >=60X short reads. If you only have 30X reads, you may need to optimize the value of -k.
  3. Try mapping the HiFi reads with Winnowmap; it usually gives better results in our tests.
  4. Try -r.
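
The first and third suggestions could be combined with the commands posted earlier roughly as follows. This is a dry-run sketch rather than a recommended configuration. The assembly and HiFi paths and the `yak` and `nextPolish2` calls are copied from the thread. The raw Illumina file names (`raw_R1.fastq.gz`, `raw_R2.fastq.gz`), the `run` helper, and the fastp thresholds are assumptions. The `meryl` settings (`k=15`, `distinct=0.9998`) and the `map-pb` preset follow the Winnowmap README example. The `-k` and `-r` options are left out because no values were given for them.

```shell
#!/usr/bin/env bash
# Dry-run sketch: each command is printed, and only executed when RUN=1.
set -euo pipefail

ASM=../../1_Assemblaggio/HIFIASM/q30/raw_assembly.fasta   # from the thread
HIFI=../../0_Reads/PacBio/q30_hifi_reads.fastq.gz         # from the thread
R1=raw_R1.fastq.gz   # hypothetical: untrimmed Illumina read 1
R2=raw_R2.fastq.gz   # hypothetical: untrimmed Illumina read 2

# Hypothetical helper: print the command, or execute it when RUN=1.
run() {
  if [ "${RUN:-0}" = "1" ]; then
    bash -c "$1"
  else
    printf '%s\n' "$1"
  fi
}

# 1. Short-read cleanup with fastp (thresholds are assumptions, tune as needed).
run "fastp -i $R1 -I $R2 -o trimmed_R1.fastq.gz -O trimmed_R2.fastq.gz --detect_adapter_for_pe -q 20 -w 8"

# 2. Repetitive k-mers for Winnowmap (values from the Winnowmap README example).
run "meryl count k=15 output merylDB $ASM"
run "meryl print greater-than distinct=0.9998 merylDB > repetitive_k15.txt"

# 3. Map HiFi reads with Winnowmap in place of minimap2, then sort and index.
run "winnowmap -W repetitive_k15.txt -ax map-pb -t 20 $ASM $HIFI | samtools sort -@ 20 -o hifi.map.sort.bam -"
run "samtools index hifi.map.sort.bam"

# 4. k-mer databases from the trimmed short reads (yak commands as in the thread).
for k in 21 31; do
  run "yak count -o k$k.yak -k $k -b 37 <(zcat trimmed_R1.fastq.gz) <(zcat trimmed_R2.fastq.gz)"
done

# 5. Polish (command as in the thread; -k and -r not set here).
run "nextPolish2 -t 20 hifi.map.sort.bam $ASM k21.yak k31.yak -o asm.np2.fa"
```

Running the script as-is only lists the commands, which makes it easy to review paths before committing compute time; `RUN=1 bash script.sh` would execute them, assuming all of the tools are on `PATH`.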
