[*] Error: run seqtrack R script failed. #8

Open
Czirion opened this issue Sep 1, 2023 · 6 comments


Czirion commented Sep 1, 2023

Dear developers,

I am having an odd problem while running Transflow. I previously completed a successful run, but after changing the snp_threshold I now get an error in the transmission analysis module. My dataset has 1,652 samples; the pipeline works fine with smaller datasets.

Here is part of the error message:

=> Using SeqTrack to infer transmission events for all clusters with at least 4 samples.
==> Cluster 1 ... Using longitude and latitude information data.
Done
==> Cluster 2 ... Using longitude and latitude information data.
Done
==> Cluster 3 ... Using longitude and latitude information data.
[*] Error: run seqtrack R script failed.
Full Traceback (most recent call last):
  File "/hpc/home/user/miniconda3/envs/transflow/lib/python3.10/site-packages/snakemake/executors/__init__.py", line 2576, in run_wrapper
    run(
  File "/work/user/transflow/L2/workflow/rules/transmission_detection.smk", line 76, in __rule_transmission_network
  File "/hpc/home/user/miniconda3/envs/transflow/lib/python3.10/site-packages/snakemake/shell.py", line 294, in __new__
    raise sp.CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command 'set -euo pipefail;  python3 /work/user/transflow/L2/workflow/scripts/run_transmission_detection.py --cluster 5.Transmission_cluster/SNP_based_method/samples_cluster_SNP_12.csv --distance 4.SNP_distance/samples_pairwise_distance_matrix.txt --network True --output 5.Transmission_cluster/SNP_based_method --date /work/user/transflow/L2/metadata_date_L2_genomes.tsv --coord True --method trans 2> 5.Transmission_cluster/SNP_based_method/transmission_detection.log' returned non-zero exit status 1.

The complete log

The configfile

The command I am running:
snakemake --snakefile workflow/transmission_analysis.snakefile --configfile configfile.yaml --verbose -c 16

The resources:
A SLURM cluster, using #SBATCH --mem-per-cpu=32G and #SBATCH -c 16

Thank you,

Claudia

Owner

cvn001 commented Sep 1, 2023

Hi Claudia,

The log file you uploaded shows that the transflow pipeline encountered an error when running the R package SeqTrack.

Please upload the contents of the "seqtrack.log" file in "5.Transmission_cluster/SNP_based_method/cluster_3", so that we can further investigate the cause of the error.

Best,

Xiangchen Li

Author

Czirion commented Sep 1, 2023

Thank you Xiangchen Li,

This is the seqtrack.log:

During startup - Warning messages:
1: Setting LC_COLLATE failed, using "C" 
2: Setting LC_TIME failed, using "C" 
3: Setting LC_MESSAGES failed, using "C" 
4: Setting LC_MONETARY failed, using "C" 
5: Setting LC_PAPER failed, using "C" 
6: Setting LC_MEASUREMENT failed, using "C" 
Error in `.rowNamesDF<-`(x, value = value) : 
  duplicate 'row.names' are not allowed
Calls: seqTrack ... row.names<- -> row.names<-.data.frame -> .rowNamesDF<-
In addition: Warning message:
non-unique values when setting 'row.names': ‘M_tb_ERS6403200’, ‘M_tb_ERS6403349’, ‘M_tb_ERS6403653’ 
Execution halted

Owner

cvn001 commented Sep 2, 2023

Thank you Claudia,

The error message you uploaded indicates that some sample names in the first column of the metadata file are duplicated. Please check the "samples.txt" file in "5.Transmission_cluster/SNP_based_method/cluster_3", or open the metadata file in Excel and highlight the duplicate values to check it thoroughly.
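As an alternative to checking by eye, the duplicate check described above can be scripted. The following is a minimal sketch, not part of Transflow: the function name `find_duplicate_names` is hypothetical, and it assumes the file is tab-delimited with the sample name in the first column. The in-memory input is illustrative only and is not the real file content.

```python
from collections import Counter
import io


def find_duplicate_names(handle, delimiter="\t"):
    """Return {name: count} for first-column values that occur more than once.

    Assumption: one record per line, sample name in the first column.
    """
    counts = Counter()
    for line in handle:
        line = line.rstrip("\n")
        if not line.strip():
            continue  # skip blank lines
        name = line.split(delimiter)[0].strip()
        counts[name] += 1
    return {name: n for name, n in counts.items() if n > 1}


# Illustrative input only; these rows are made up for the demonstration.
example = io.StringIO(
    "M_tb_ERS6403200\t2013-07-17\n"
    "M_tb_ERS6403349\t2017-05-04\n"
    "M_tb_ERS6403200\t2013-07-17\n"
)
duplicates = find_duplicate_names(example)
print(duplicates)
```

To run it on a real file, pass an open file handle instead of the `io.StringIO` object, for example `with open(path) as fh: find_duplicate_names(fh)`. A header row, if present, is counted like any other line but will not be reported unless it repeats.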

Author

Czirion commented Sep 2, 2023

In the metadata file, those sample names appear only once:

M_tb_ERS6403349	2017-05-04	-33.546977	20.72753	Lineage 2	lineage2.2	lineage2.2	lineage2.2	ZAF	Western Cape							False	False	S	S	R	S	R	S	R	R	R	S	S	R	S	MXF_INH_RIF_RFB_LEV_KAN
M_tb_ERS6403653	2017-02-01	-32.2171831	26.6386401	Lineage 2	lineage2.2	lineage2.2	lineage2.2	ZAF	Eastern Cape							False	False	S	S	S	S	I	S	R	R	S	R	I	S		RIF_RFB_EMB
M_tb_ERS6403200	2013-07-17	-33.546977	20.72753	Lineage 2	lineage2.2	lineage2.2	lineage2.2	ZAF	Western Cape							False	False	S	S	S	S	I	S	S	S	S	S	S	S	S	S

In samples.txt, however, they are indeed duplicated.

Owner

cvn001 commented Sep 5, 2023

Sorry for the late reply, Claudia. I haven't encountered this kind of problem before.

Because the R error message does not include any location information, I cannot tell where the error occurred. Could you please send me the input "metadata" file and the "samples_pairwise_distance_matrix.txt" file for testing? You can delete everything in the metadata file except the sample names.
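Since the metadata lists each name only once, one other place worth ruling out is the labels of the distance matrix. The sketch below is an illustration under stated assumptions, not Transflow code: it assumes the matrix is tab-delimited, with sample names in the first row and in the first column of each following row. The real layout of "samples_pairwise_distance_matrix.txt" may differ, and the function names are hypothetical.

```python
from collections import Counter
import io


def duplicated(values):
    """Return the sorted list of values that occur more than once."""
    counts = Counter(values)
    return sorted(v for v, n in counts.items() if n > 1)


def duplicate_matrix_labels(handle, delimiter="\t"):
    """Return (duplicate column labels, duplicate row labels).

    Assumption: first row is a header whose first cell is a corner cell,
    and the first cell of every other row is that row's label.
    """
    rows = [line.rstrip("\n").split(delimiter) for line in handle if line.strip()]
    if not rows:
        return [], []
    header, body = rows[0], rows[1:]
    column_labels = header[1:]           # drop the corner cell
    row_labels = [row[0] for row in body]
    return duplicated(column_labels), duplicated(row_labels)


# Illustrative matrix only; labels and distances are made up.
example = io.StringIO(
    "\tA\tB\tA\n"
    "A\t0\t3\t0\n"
    "B\t3\t0\t3\n"
    "A\t0\t3\t0\n"
)
dup_columns, dup_rows = duplicate_matrix_labels(example)
print(dup_columns, dup_rows)
```

If both lists come back empty for the real matrix, the duplicates are more likely introduced later, for example when the per-cluster "samples.txt" is written.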

Author

Czirion commented Sep 5, 2023

Of course, here they are:

samples_pairwise_distance_matrix.txt.gz
list of samples

Thanks
