Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Issue with the detection of deletions that I see are visible #504

Open
tanyasarkjain opened this issue Nov 22, 2024 · 18 comments
Open

Issue with the detection of deletions that I see are visible #504

tanyasarkjain opened this issue Nov 22, 2024 · 18 comments

Comments

@tanyasarkjain
Copy link

Describe the bug
Hi I am running:

CRISPRessoWGS -b "reads.bam -f vegfa.tsv -r hg38.fa --name "output" --exclude_bp_from_right 0 --exclude_bp_from_left 0 --quantification_window_size 10 --min_frequency_alleles_around_cut_to_consider 0.001

Expected behavior
Visually looking at the produced ANALYZED_REGIONS folder, I can see there is a deletion that goes into the target site in my tsv, (the target site is 20bp) however, the deletions are not being counted when I look at the indels, perhaps because it goes beyond the target region. However, even after extending --quantification_window_size 10, and the target location to span the entirety of the deletion, it is still not being identified. Is there another parameter I should be running?

@kclem
Copy link
Member

kclem commented Nov 23, 2024

Hi @tanyasarkjain

Are you using short or long-read sequencing?

How long is your deletion? If the deletion is more than 60% of the length of your region in -f file, it won't align and will be discarded. You can change this behavior with the parameter --default_min_aln_score.

CRISPRessoWGS will only include reads that completely span the region in your -f file. It does this by chopping all reads down to start and end at the regions in your -f file. Unfortunately, if the start or end is deleted, the read may not be included. Could this be the case? Is the deletion completely in your -f region or does it overlap?

Alternatively, you can try this branch: https://github.com/pinellolab/CRISPResso2/tree/wgs-whole-region-deleted which allows for analysis when the start or end (or the whole region) is deleted. We're still testing this branch, so if you have any feedback, that would be great.

Otherwise, these are the steps I'd take to debug:

  1. Identify the read id of the read with the deletion using IGV or another viewer
  2. Check to see if the read made it into the ANALYZED_REGIONS/*.bam and fastq.gz files
  3. Run with --fastq_output to see the alignment and the alignment score for the read

Feel free to reach out if you need additional help or want to walk through this together - k.clement@utah.edu.

@tanyasarkjain
Copy link
Author

tanyasarkjain commented Nov 23, 2024

Screenshot 2024-11-22 at 5 55 19 PM

Hi @kclem , thank you for the response.

Here is a segment of the reads -- there are more below that are not visible in this screenshot.

They are short reads and the reads do span the whole region - however the deletion extends past the target region. I am using BAMs not fastqs also as my input. Thank you! The red indicates the target region, and the deletions seen in the ifrst few reads are not being counted). All these reads where in the ANALYZED_REGIONS folder for the associated region so they seemed to make it in the ANALYZED_REGIONS/*.bam, but when I look and the "deletions" stats in the SUMMARY txt file, I only see a deletion of 1

@kclem
Copy link
Member

kclem commented Nov 25, 2024

Hi @tanyasarkjain,

Yes, it looks like the reads are being discarded because they deletions are not contained within the region.

Can you expand your window to include the bases surrounding the deletion?

Are you able to try this branch which allows these types of reads? https://github.com/pinellolab/CRISPResso2/tree/wgs-whole-region-deleted

@tanyasarkjain
Copy link
Author

tanyasarkjain commented Dec 2, 2024

Well the deletions start within the region, but they do extend outward -- is this not counted? Are there no extra parameters I can include to allow crispressoWGS to consider deletions that extend beyond the window? If I expand the window, some reads that extend across the current region of interest are disregarded as those reads may not span the new window which is larger, which is why I don't want to just expand the window. Im having trouble trying out that branch when I try to clone the repo checkout and install it using setup.py, im getting error messages. Also, on another note - i have 15,000 locations I want to look at. Is there a way to parallelise crispresso WGS?

@kclem
Copy link
Member

kclem commented Dec 3, 2024

  1. CRISPResso will cut reads to the regions you specify in your region file. CRISPResso also calculates the deletion length for deletions, but can't calculate length for deletions that extend beyond the region window. This is why it is not accepted. Are you able to try this branch which allows these types of reads? https://github.com/pinellolab/CRISPResso2/tree/wgs-whole-region-deleted
  2. What error messages are you getting? Are they installation errors or are they errors when you run?
  3. You can parallelize CRISPRessoWGS using the -p parameter (set to 'max' to use all cores)

@tanyasarkjain
Copy link
Author

tanyasarkjain commented Dec 4, 2024

I am okay with the deletions that extend beyond the the region not citing the correct deletion length - I am more interested in if there is a deletion or not. Does Crispresso (for amplicons) work in the exact same way?

I am having trouble with the installation for this branch. I was running python setup.py install and was getting the below error message. Running this on my HPC which has python 3.11 installed. My cluster does not have 3.8 installed


  check.warn(importable)
running build_ext
building 'CRISPResso2.CRISPResso2Align' extension
gcc -pthread -Wsign-compare -DDYNAMIC_ANNOTATIONS_ENABLED=1 -DNDEBUG -O2 -g -pipe -Wall -Werror=format-security -Wp,-D_FORTIFY_SOURCE=2 -Wp,-D_GLIBCXX_ASSERTIONS -fexceptions -fstack-protector-strong -grecord-gcc-switches -m64 -mtune=generic -fasynchronous-unwind-tables -fstack-clash-protection -fcf-protection -D_GNU_SOURCE -fPIC -fwrapv -O2 -g -pipe -Wall -Werror=format-security -Wp,-D_FORTIFY_SOURCE=2 -Wp,-D_GLIBCXX_ASSERTIONS -fexceptions -fstack-protector-strong -grecord-gcc-switches -m64 -mtune=generic -fasynchronous-unwind-tables -fstack-clash-protection -fcf-protection -D_GNU_SOURCE -fPIC -fwrapv -O2 -g -pipe -Wall -Werror=format-security -Wp,-D_FORTIFY_SOURCE=2 -Wp,-D_GLIBCXX_ASSERTIONS -fexceptions -fstack-protector-strong -grecord-gcc-switches -m64 -mtune=generic -fasynchronous-unwind-tables -fstack-clash-protection -fcf-protection -D_GNU_SOURCE -fPIC -fwrapv -fPIC -I/wynton/group/clelland/usftp21.novogene.com/01.RawData/CRISPResso2/crispresso_env/lib64/python3.11/site-packages/numpy/_core/include -I/wynton/group/clelland/usftp21.novogene.com/01.RawData/CRISPResso2/crispresso_env/include -I/usr/include/python3.11 -c CRISPResso2/CRISPResso2Align.c -o build/temp.linux-x86_64-cpython-311/CRISPResso2/CRISPResso2Align.o -w -Ofast
In file included from /usr/include/python3.11/Python.h:38,
                 from CRISPResso2/CRISPResso2Align.c:32:
CRISPResso2/CRISPResso2Align.c: In function ‘__pyx_f_5numpy_PyDataType_SHAPE’:
CRISPResso2/CRISPResso2Align.c:8308:39: error: ‘PyArray_Descr’ {aka ‘struct _PyArray_Descr’} has no member named ‘subarray’
     __Pyx_INCREF(((PyObject*)__pyx_v_d->subarray->shape));
                                       ^~
/usr/include/python3.11/pyport.h:24:38: note: in definition of macro ‘_Py_CAST’
 #define _Py_CAST(type, expr) ((type)(expr))
                                      ^~~~
/usr/include/python3.11/object.h:506:35: note: in expansion of macro ‘_PyObject_CAST’
 #  define Py_INCREF(op) Py_INCREF(_PyObject_CAST(op))
                                   ^~~~~~~~~~~~~~
CRISPResso2/CRISPResso2Align.c:1515:27: note: in expansion of macro ‘Py_INCREF’
   #define __Pyx_INCREF(r) Py_INCREF(r)
                           ^~~~~~~~~
CRISPResso2/CRISPResso2Align.c:8308:5: note: in expansion of macro ‘__Pyx_INCREF’
     __Pyx_INCREF(((PyObject*)__pyx_v_d->subarray->shape));
     ^~~~~~~~~~~~
CRISPResso2/CRISPResso2Align.c:8309:36: error: ‘PyArray_Descr’ {aka ‘struct _PyArray_Descr’} has no member named ‘subarray’
     __pyx_r = ((PyObject*)__pyx_v_d->subarray->shape);
                                    ^~
CRISPResso2/CRISPResso2Align.c: In function ‘__Pyx_PyErr_GetTopmostException’:
CRISPResso2/CRISPResso2Align.c:25717:23: error: ‘_PyErr_StackItem’ {aka ‘struct _err_stackitem’} has no member named ‘exc_type’; did you mean ‘exc_value’?
     while ((exc_info->exc_type == NULL || exc_info->exc_type == Py_None) &&
                       ^~~~~~~~
                       exc_value
CRISPResso2/CRISPResso2Align.c:25717:53: error: ‘_PyErr_StackItem’ {aka ‘struct _err_stackitem’} has no member named ‘exc_type’; did you mean ‘exc_value’?
     while ((exc_info->exc_type == NULL || exc_info->exc_type == Py_None) &&
                                                     ^~~~~~~~
                                                     exc_value
CRISPResso2/CRISPResso2Align.c: In function ‘__Pyx__ExceptionSave’:
CRISPResso2/CRISPResso2Align.c:25731:23: error: ‘_PyErr_StackItem’ {aka ‘struct _err_stackitem’} has no member named ‘exc_type’; did you mean ‘exc_value’?
     *type = exc_info->exc_type;
                       ^~~~~~~~
                       exc_value
CRISPResso2/CRISPResso2Align.c:25733:19: error: ‘_PyErr_StackItem’ {aka ‘struct _err_stackitem’} has no member named ‘exc_traceback’
     *tb = exc_info->exc_traceback;
                   ^~
CRISPResso2/CRISPResso2Align.c: In function ‘__Pyx__ExceptionReset’:
CRISPResso2/CRISPResso2Align.c:25747:26: error: ‘_PyErr_StackItem’ {aka ‘struct _err_stackitem’} has no member named ‘exc_type’; did you mean ‘exc_value’?
     tmp_type = exc_info->exc_type;
                          ^~~~~~~~
                          exc_value
CRISPResso2/CRISPResso2Align.c:25749:22: error: ‘_PyErr_StackItem’ {aka ‘struct _err_stackitem’} has no member named ‘exc_traceback’
     tmp_tb = exc_info->exc_traceback;
                      ^~
CRISPResso2/CRISPResso2Align.c:25750:15: error: ‘_PyErr_StackItem’ {aka ‘struct _err_stackitem’} has no member named ‘exc_type’; did you mean ‘exc_value’?
     exc_info->exc_type = type;
               ^~~~~~~~
               exc_value
CRISPResso2/CRISPResso2Align.c:25752:13: error: ‘_PyErr_StackItem’ {aka ‘struct _err_stackitem’} has no member named ‘exc_traceback’
     exc_info->exc_traceback = tb;
             ^~
CRISPResso2/CRISPResso2Align.c: In function ‘__Pyx__GetException’:
CRISPResso2/CRISPResso2Align.c:25809:30: error: ‘_PyErr_StackItem’ {aka ‘struct _err_stackitem’} has no member named ‘exc_type’; did you mean ‘exc_value’?
         tmp_type = exc_info->exc_type;
                              ^~~~~~~~
                              exc_value
CRISPResso2/CRISPResso2Align.c:25811:26: error: ‘_PyErr_StackItem’ {aka ‘struct _err_stackitem’} has no member named ‘exc_traceback’
         tmp_tb = exc_info->exc_traceback;
                          ^~
CRISPResso2/CRISPResso2Align.c:25812:19: error: ‘_PyErr_StackItem’ {aka ‘struct _err_stackitem’} has no member named ‘exc_type’; did you mean ‘exc_value’?
         exc_info->exc_type = local_type;
                   ^~~~~~~~
                   exc_value
CRISPResso2/CRISPResso2Align.c:25814:17: error: ‘_PyErr_StackItem’ {aka ‘struct _err_stackitem’} has no member named ‘exc_traceback’
         exc_info->exc_traceback = local_tb;
                 ^~
CRISPResso2/CRISPResso2Align.c: In function ‘__Pyx__ExceptionSwap’:
CRISPResso2/CRISPResso2Align.c:26340:26: error: ‘_PyErr_StackItem’ {aka ‘struct _err_stackitem’} has no member named ‘exc_type’; did you mean ‘exc_value’?
     tmp_type = exc_info->exc_type;
                          ^~~~~~~~
                          exc_value
CRISPResso2/CRISPResso2Align.c:26342:22: error: ‘_PyErr_StackItem’ {aka ‘struct _err_stackitem’} has no member named ‘exc_traceback’
     tmp_tb = exc_info->exc_traceback;
                      ^~
CRISPResso2/CRISPResso2Align.c:26343:15: error: ‘_PyErr_StackItem’ {aka ‘struct _err_stackitem’} has no member named ‘exc_type’; did you mean ‘exc_value’?
     exc_info->exc_type = *type;
               ^~~~~~~~
               exc_value
CRISPResso2/CRISPResso2Align.c:26345:13: error: ‘_PyErr_StackItem’ {aka ‘struct _err_stackitem’} has no member named ‘exc_traceback’
     exc_info->exc_traceback = *tb;
             ^~
CRISPResso2/CRISPResso2Align.c: In function ‘__Pyx_AddTraceback’:
CRISPResso2/CRISPResso2Align.c:458:62: error: dereferencing pointer to incomplete type ‘PyFrameObject’ {aka ‘struct _frame’}
   #define __Pyx_PyFrame_SetLineNumber(frame, lineno)  (frame)->f_lineno = (lineno)
                                                              ^~
CRISPResso2/CRISPResso2Align.c:27115:5: note: in expansion of macro ‘__Pyx_PyFrame_SetLineNumber’
     __Pyx_PyFrame_SetLineNumber(py_frame, py_line);

@Colelyman
Copy link
Contributor

@tanyasarkjain Could you check to see if cython is installed? If not, I would try installing cython (using pip install cython) and then try reinstalling CRISPResso. Please let me know if that doesn't work!

@tanyasarkjain
Copy link
Author

tanyasarkjain commented Dec 4, 2024

Installed it but unfortunately still getting errors when i run python setup.py install:


Compiling CRISPResso2/CRISPRessoCOREResources.pyx because it changed.
Compiling CRISPResso2/CRISPResso2Align.pyx because it changed.
[1/2] Cythonizing CRISPResso2/CRISPResso2Align.pyx

Error compiling Cython file:
------------------------------------------------------------
...
    ctypedef unsigned int size_t

cdef extern from "Python.h":
    ctypedef void PyObject

ctypedef np.int_t DTYPE_INT
         ^
------------------------------------------------------------

CRISPResso2/CRISPResso2Align.pyx:20:9: 'int_t' is not a type identifier

Error compiling Cython file:
------------------------------------------------------------
...

cdef extern from "Python.h":
    ctypedef void PyObject

ctypedef np.int_t DTYPE_INT
ctypedef np.uint_t DTYPE_UINT
         ^
------------------------------------------------------------

CRISPResso2/CRISPResso2Align.pyx:21:9: 'uint_t' is not a type identifier
Traceback (most recent call last):
  File "/wynton/group/clelland/usftp21.novogene.com/01.RawData/CRISPResso2/setup.py", line 101, in <module>
    main()
  File "/wynton/group/clelland/usftp21.novogene.com/01.RawData/CRISPResso2/setup.py", line 47, in main
    ext_modules = cythonize(ext_modules, language_level="3")
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/wynton/group/clelland/usftp21.novogene.com/01.RawData/CRISPResso2/crispresso_env/lib64/python3.11/site-packages/Cython/Build/Dependencies.py", line 1154, in cythonize
    cythonize_one(*args)
  File "/wynton/group/clelland/usftp21.novogene.com/01.RawData/CRISPResso2/crispresso_env/lib64/python3.11/site-packages/Cython/Build/Dependencies.py", line 1321, in cythonize_one
    raise CompileError(None, pyx_file)
Cython.Compiler.Errors.CompileError: CRISPResso2/CRISPResso2Align.pyx

@Colelyman
Copy link
Contributor

Thanks for trying that! I would try next to install numpy v2.0.0 or greater. If that doesn't work, please let me know!

@tanyasarkjain
Copy link
Author

Numpy was already installed with version 2.1.3

@Colelyman
Copy link
Contributor

Thank you, which version of CRISPResso are you trying to install? There was a recent breaking change between the numpy versions. If you aren't on the current master branch, I would recommend pulling that and trying to install it that way.

@tanyasarkjain
Copy link
Author

I am trying to install wgs-whole-region-deleted because of the aforementioned reasons where deletions in the current master branch version aren't calculated in the way that's ideal for my analysis

@Colelyman
Copy link
Contributor

Sorry about that, I see the issue! Could you try this branch https://github.com/edilytics/CRISPResso2/tree/wgs-whole-region-deleted-v2.3.1 which has fix you are looking for and is compatible with numpy v2.1.3?

Alternatively, you could try installing numpy 1.26.4 while still using the branch you are on. Please let me know if either of these options work!

@tanyasarkjain
Copy link
Author

Thanks - I will try that. To clarify how the ANALYZED_REGIONS are produced -- is it correct that the reads are trimmed to all which span the region of interest? So those BAMS in ANALYZED_REGIONS should contain all those reads?

@kclem
Copy link
Member

kclem commented Dec 9, 2024

Yes, ANALYZED_REGIONS contains reads that spanned the region of interest, but trimmed to only contain the sequence in the region of interest (not the whole read)

@tanyasarkjain
Copy link
Author

tanyasarkjain commented Dec 9, 2024

Is that correct? When looking at the reads on IGV they extend beyond my 23 bp regions of interest I put, making it seem like the read was not trimmed. (see above screenshot)

@kclem
Copy link
Member

kclem commented Dec 13, 2024

Was the screenshot above from your ANALYZED_REGIONS bam file? I though the IGV screenshot above in your third was raw data.

@tanyasarkjain
Copy link
Author

yes it was

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants