[*] Error: run seqtrack R script failed. #8

Czirion · 2023-09-01T00:19:05Z

Dear developers,

I am running into a strange problem with Transflow. An earlier run completed successfully, but after changing the snp_threshold I get an error in the transmission analysis module. My dataset has 1,652 samples; the pipeline works fine with smaller datasets.

Here is part of the error message:

=> Using SeqTrack to infer transmission events for all clusters with at least 4 samples.
==> Cluster 1 ... Using longitude and latitude information data.
Done
==> Cluster 2 ... Using longitude and latitude information data.
Done
==> Cluster 3 ... Using longitude and latitude information data.
[*] Error: run seqtrack R script failed.
Full Traceback (most recent call last):
  File "/hpc/home/user/miniconda3/envs/transflow/lib/python3.10/site-packages/snakemake/executors/__init__.py", line 2576, in run_wrapper
    run(
  File "/work/user/transflow/L2/workflow/rules/transmission_detection.smk", line 76, in __rule_transmission_network
  File "/hpc/home/user/miniconda3/envs/transflow/lib/python3.10/site-packages/snakemake/shell.py", line 294, in __new__
    raise sp.CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command 'set -euo pipefail;  python3 /work/user/transflow/L2/workflow/scripts/run_transmission_detection.py --cluster 5.Transmission_cluster/SNP_based_method/samples_cluster_SNP_12.csv --distance 4.SNP_distance/samples_pairwise_distance_matrix.txt --network True --output 5.Transmission_cluster/SNP_based_method --date /work/user/transflow/L2/metadata_date_L2_genomes.tsv --coord True --method trans 2> 5.Transmission_cluster/SNP_based_method/transmission_detection.log' returned non-zero exit status 1.

The complete log

The configfile

The command I am running:
snakemake --snakefile workflow/transmission_analysis.snakefile --configfile configfile.yaml --verbose -c 16

The resources:
A SLURM cluster, using #SBATCH --mem-per-cpu=32G and #SBATCH -c 16

Thank you,

Claudia


cvn001 · 2023-09-01T01:43:31Z

Hi Claudia,

The log file you uploaded shows that the transflow pipeline encountered an error when running the R package SeqTrack.

Please upload the contents of the "seqtrack.log" file in "5.Transmission_cluster/SNP_based_method/cluster_3", so that we can further investigate the cause of the error.

Best,

Xiangchen Li

Czirion · 2023-09-01T15:45:37Z

Thank you Xiangchen Li,

This is the seqtrack.log:

During startup - Warning messages:
1: Setting LC_COLLATE failed, using "C" 
2: Setting LC_TIME failed, using "C" 
3: Setting LC_MESSAGES failed, using "C" 
4: Setting LC_MONETARY failed, using "C" 
5: Setting LC_PAPER failed, using "C" 
6: Setting LC_MEASUREMENT failed, using "C" 
Error in `.rowNamesDF<-`(x, value = value) : 
  duplicate 'row.names' are not allowed
Calls: seqTrack ... row.names<- -> row.names<-.data.frame -> .rowNamesDF<-
In addition: Warning message:
non-unique values when setting 'row.names': ‘M_tb_ERS6403200’, ‘M_tb_ERS6403349’, ‘M_tb_ERS6403653’ 
Execution halted

cvn001 · 2023-09-02T09:28:09Z

Thank you Claudia,

The error message you uploaded shows that some sample names in the first column of the metadata file are duplicated. Please check the "samples.txt" file in "5.Transmission_cluster/SNP_based_method/cluster_3", or open the metadata file in Excel and highlight the duplicate values to check it thoroughly.
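
For a metadata file with many rows, a short script may be easier than Excel. The following is a minimal sketch, assuming the metadata file is tab-separated with the sample name in the first column; the function name `duplicated_first_column` is hypothetical and not part of Transflow.

```python
from collections import Counter


def duplicated_first_column(path, delimiter="\t"):
    """Return {name: count} for first-column values that occur more than once.

    Assumption: the file is delimiter-separated text and the sample name
    is the first field on each line. Blank lines are skipped.
    """
    names = []
    with open(path) as handle:
        for line in handle:
            if line.strip():
                names.append(line.rstrip("\n").split(delimiter)[0].strip())
    return {name: count for name, count in Counter(names).items() if count > 1}
```

Calling `duplicated_first_column("metadata_date_L2_genomes.tsv")` returns an empty dictionary when every sample name in the first column is unique.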

Czirion · 2023-09-02T13:13:50Z

In the metadata file, those sample names appear only once:

M_tb_ERS6403349	2017-05-04	-33.546977	20.72753	Lineage 2	lineage2.2	lineage2.2	lineage2.2	ZAF	Western Cape							False	False	S	S	R	S	R	S	R	R	R	S	S	R	S	MXF_INH_RIF_RFB_LEV_KAN
M_tb_ERS6403653	2017-02-01	-32.2171831	26.6386401	Lineage 2	lineage2.2	lineage2.2	lineage2.2	ZAF	Eastern Cape							False	False	S	S	S	S	I	S	R	R	S	R	I	S		RIF_RFB_EMB
M_tb_ERS6403200	2013-07-17	-33.546977	20.72753	Lineage 2	lineage2.2	lineage2.2	lineage2.2	ZAF	Western Cape							False	False	S	S	S	S	I	S	S	S	S	S	S	S	S	S

In samples.txt, however, they are indeed duplicated.
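
To see exactly where the repeats sit in samples.txt, something like the sketch below could help. It assumes the sample name is the first whitespace-separated token on each line, which is a guess about the file layout; `duplicate_line_numbers` is a hypothetical helper, not a Transflow function.

```python
from collections import defaultdict


def duplicate_line_numbers(path):
    """Map each repeated sample name to the 1-based line numbers where it occurs.

    Assumption: the sample name is the first whitespace-separated token
    on each non-empty line.
    """
    positions = defaultdict(list)
    with open(path) as handle:
        for lineno, line in enumerate(handle, start=1):
            fields = line.split()
            if fields:
                positions[fields[0]].append(lineno)
    return {name: lines for name, lines in positions.items() if len(lines) > 1}
```

Running it on the cluster_3 samples.txt would list the line numbers of each repeated name, which may hint at which step of the pipeline writes the duplicates.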

cvn001 · 2023-09-05T09:25:24Z

Sorry for the late reply, Claudia. I haven't encountered this kind of problem before.

The R error message does not say where the problem occurred, so I cannot pinpoint the cause from it. Could you please send me the input metadata file and the "samples_pairwise_distance_matrix.txt" file for testing? You can delete everything except the sample names from the metadata file.

Czirion · 2023-09-05T19:38:51Z

Of course, here they are:

samples_pairwise_distance_matrix.txt.gz
list of samples

Thanks
