1.mpileup: Pathname not found #31

Open
ivan108 opened this issue Feb 17, 2021 · 8 comments

Comments

@ivan108
Contributor

ivan108 commented Feb 17, 2021

Getting a weird error at the beginning of the 1.mpileup run:

Exception: Pathname not found: /home/jocostello/repositories/HenrikBengtsson/Costello-PSCN-Seq/annotationData/organisms/Homo_sapiens/GRCh37,hg19/UCSC/hg19.fa (/home/jocostello/repositories/HenrikBengtsson/Costello-PSCN-Seq/ exists, but nothing beyond)

However, the pathname actually exists and is readable:

>ls -l /home/jocostello/repositories/HenrikBengtsson/Costello-PSCN-Seq/annotationData/organisms/Homo_sapiens/GRCh37,hg19/UCSC/hg19.fa
-rw-r--r-- 1 henrik cbc 3199905909 Apr  6  2015 /home/jocostello/repositories/HenrikBengtsson/Costello-PSCN-Seq/annotationData/organisms/Homo_sapiens/GRCh37,hg19/UCSC/hg19.fa 
@HenrikBengtsson
Owner

That's odd. Is it reproducible or sporadic? If you run it again, does it work then?

This looks related to #14, which to this date I still have no good explanation for. I've gone through the code responsible for these validations and I cannot spot the mistake. It appears to be some odd race condition with the file system. However, that would at best explain the case where a file that was just written is not immediately available for reading, but in your case, this file has been sitting on the file system for years.

@HenrikBengtsson
Owner

I'll dive into the session details in #32 to see if this possibly could be explained by an outdated package, but I really doubt it.

@ivan108
Contributor Author

ivan108 commented Feb 19, 2021

It seems that the problem could again be with parallel processing, implemented with future.batchtools. When I disable parallel processing, e.g. by deleting .future.R, it works! But of course it is 100 times slower...
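
Deleting `.future.R` loses the parallel setup. A minimal sketch of a reversible alternative, assuming the pipeline reads `.future.R` from the directory it is launched in (that is an assumption, as are the function names `disable_future` and `enable_future` and the `.disabled` suffix):

```shell
# Hypothetical helpers: rename .future.R instead of deleting it, so the
# parallel settings can be restored later.
# Assumption: the pipeline looks for ".future.R" in the given directory.

disable_future() {
  df_dir="${1:-.}"
  if [ -f "$df_dir/.future.R" ]; then
    mv "$df_dir/.future.R" "$df_dir/.future.R.disabled"
    echo "disabled"
  else
    echo "nothing to disable"
  fi
}

enable_future() {
  ef_dir="${1:-.}"
  if [ -f "$ef_dir/.future.R.disabled" ]; then
    mv "$ef_dir/.future.R.disabled" "$ef_dir/.future.R"
    echo "enabled"
  else
    echo "nothing to enable"
  fi
}
```

Calling `disable_future` before a sequential test run and `enable_future` afterwards keeps the original settings file intact.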

@HenrikBengtsson
Owner

Yes, that adds evidence to the hypothesis that there's a race condition involving the file system.

When you run parallel processing, is it sporadic or reproducible? Knowing that would really help me figure out how to troubleshoot this and possibly fix it (or at least lower the risk for it to occur).
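
One way to tell a sporadic failure from a persistent one is to poll the path repeatedly on the affected machine and count how often it is unreadable. A rough sketch; `count_misses` and its defaults are hypothetical and not part of the pipeline:

```shell
# count_misses PATH [TRIES] [DELAY_SECONDS]
# Hypothetical helper: checks TRIES times whether PATH is readable, sleeping
# DELAY_SECONDS between checks, and prints the number of failed checks.
count_misses() {
  cm_path="$1"
  cm_tries="${2:-10}"
  cm_delay="${3:-1}"
  cm_misses=0
  cm_i=0
  while [ "$cm_i" -lt "$cm_tries" ]; do
    if [ ! -r "$cm_path" ]; then
      cm_misses=$((cm_misses + 1))
    fi
    cm_i=$((cm_i + 1))
    # Sleep only between checks, not after the last one
    if [ "$cm_i" -lt "$cm_tries" ]; then
      sleep "$cm_delay"
    fi
  done
  echo "$cm_misses"
}
```

A count of zero or a count equal to the number of tries suggests a stable state on that machine; anything in between suggests the path comes and goes.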

@HenrikBengtsson
Owner

I've just checked the NEWS of R.utils 2.10.1 and R.filesets 2.14.0 and I don't see anything that would address this problem since R.utils 2.7.0 and R.filesets 2.12.1 that you're running (#32), so from this perspective there's no need to update.

@ivan108
Contributor Author

ivan108 commented Feb 21, 2021

Thanks Henrik for your efforts to figure out the issue!
Today I re-ran 1.mpileup just to check the reproducibility of the problem.

The program ran for an hour and generated 205 pileups out of the expected 375 (I have 15 samples for one patient). Then it crashed with a similar error. See the attached log.

1.mpileup.o2017963.txt

@HenrikBengtsson
Owner

Turns out TIPCC has mount issues again; /cbc, where these files live, is empty on n27 (and n20). I've just reported this - hopefully fixed soon.

Though, your log output was from a job running on n18, and there it works, so it could be a TIPCC problem that comes and goes. Time to migrate over to C4...
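
Because the failure depends on which node a job lands on, a per-node check can make the pattern visible. A minimal sketch, with hypothetical helper names `check_path` and `dir_is_empty`; it only tests what the current host can see and does not inspect the mount table:

```shell
# dir_is_empty DIR
# Succeeds if DIR has no entries. A missing or unreadable directory is also
# treated as empty, which matches how a broken mount looks to a job.
dir_is_empty() {
  [ -z "$(ls -A "$1" 2>/dev/null)" ]
}

# check_path PATH
# Prints the host name and whether PATH is readable from this host;
# returns 0 if readable, 1 otherwise.
check_path() {
  cp_path="$1"
  cp_host="$(uname -n)"
  if [ -r "$cp_path" ]; then
    echo "$cp_host: OK $cp_path"
    return 0
  fi
  echo "$cp_host: MISSING $cp_path"
  return 1
}
```

Running `check_path` on the reference file and `dir_is_empty /cbc` from a job on each node, and comparing the output lines, would show which nodes see the file and which do not.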

@HenrikBengtsson
Owner

Update: To fix this, n20 & n27 need to be rebooted. I've taken n20 & n27 offline so that they won't take on any new jobs. As soon as your existing jobs running there finish, let Harry know and he can fix & reboot.
