Creating a splice annotation from a GTF file
Dana Wyman edited this page Jan 6, 2020
TranscriptClean was originally designed to correct noncanonical splice junctions using high-confidence junctions derived from mapping short reads to the genome with STAR. If you would rather use splice junctions from a GTF-formatted transcript annotation such as GENCODE, you can convert the GTF into the splice junction file format that TranscriptClean requires with our accessory script, get_SJs_from_gtf.py:
```bash
python accessory_scripts/get_SJs_from_gtf.py --f /path/to/annotation.gtf \
    --g /path/to/reference_genome.fa \
    --o spliceJns.txt
```
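To make the conversion concrete, here is a minimal Python sketch of one way intron coordinates can be derived from a GTF: collect each transcript's exon records, sort them, and treat the gap between consecutive exons as an intron. This is an illustration under stated assumptions, not the code inside get_SJs_from_gtf.py. The function name `introns_from_gtf_lines` is hypothetical, and the sketch assumes exons are grouped by their `transcript_id` attribute and that GTF coordinates are 1-based and inclusive. It does not compute the intron motif, which requires the genome sequence (presumably why the script also takes a reference genome via `--g`).

```python
import re
from collections import defaultdict

# Hypothetical helper: illustrates turning GTF exon records into STAR-style
# intron coordinates. It is NOT the implementation of get_SJs_from_gtf.py.
STRAND_CODE = {"+": 1, "-": 2}  # STAR convention; anything else maps to 0 (undefined)


def introns_from_gtf_lines(lines):
    """Return sorted unique (chrom, intron_start, intron_end, strand_code) tuples."""
    exons = defaultdict(list)  # (chrom, strand, transcript_id) -> [(start, end), ...]
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue  # skip comments and blank lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9 or fields[2] != "exon":
            continue  # only exon records define introns
        chrom, strand, attrs = fields[0], fields[6], fields[8]
        start, end = int(fields[3]), int(fields[4])
        match = re.search(r'transcript_id "([^"]+)"', attrs)
        if match is None:
            continue
        exons[(chrom, strand, match.group(1))].append((start, end))

    junctions = set()  # a set so that introns shared by transcripts appear once
    for (chrom, strand, _tid), coords in exons.items():
        coords.sort()
        # GTF coordinates are 1-based and inclusive, so the intron between two
        # consecutive exons spans (previous exon end + 1) .. (next exon start - 1).
        for (_, prev_end), (next_start, _) in zip(coords, coords[1:]):
            junctions.add((chrom, prev_end + 1, next_start - 1, STRAND_CODE.get(strand, 0)))
    return sorted(junctions)
```

For example, a transcript with exons at 100-200, 300-400, and 500-600 on the + strand yields introns 201-299 and 401-499; a second transcript that skips the middle exon adds 201-499, while any intron it shares with the first transcript would be reported only once.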
The output file follows STAR's SJ.out.tab format, which is described in detail in section 4.4 of the STAR manual.
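Columns:
- Column 1: chromosome
- Column 2: first base of the intron (1-based)
- Column 3: last base of the intron (1-based)
- Column 4: strand (0: undefined, 1: +, 2: -)
- Column 5: intron motif code
  - 0: non-canonical
  - 1: GT/AG
  - 2: CT/AC
  - 3: GC/AG
  - 4: CT/GC
  - 5: AT/AC
  - 6: GT/AT
- Column 6: annotation status (0: unannotated, 1: annotated). This script assigns 1 to all junctions.
- Columns 7, 8, and 9: in STAR output, the number of uniquely mapping reads crossing the junction, the number of multi-mapping reads crossing the junction, and the maximum spliced alignment overhang. This script sets all three to "NA".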
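As an illustration of how these columns fit together, the sketch below computes the column 5 motif code from the genome sequence and formats one row the way this page describes (column 6 set to 1, columns 7 to 9 set to "NA"). The helper names `intron_motif` and `sj_out_tab_row` are hypothetical and not part of TranscriptClean. The sketch assumes the chromosome sequence is already loaded as a Python string and that the motif is read from the first two and last two intron bases on the + strand of the genome, which matches STAR's convention that CT/AC, CT/GC, and GT/AT are the minus-strand views of GT/AG, GC/AG, and AT/AC.

```python
# Hypothetical helpers illustrating the SJ.out.tab layout described above;
# they are not taken from get_SJs_from_gtf.py.

# STAR intron motif codes, keyed by (first two intron bases, last two intron
# bases) as read on the + strand of the genome. Odd codes are + strand motifs;
# each even code is the preceding motif seen from the - strand.
MOTIF_CODES = {
    ("GT", "AG"): 1,
    ("CT", "AC"): 2,
    ("GC", "AG"): 3,
    ("CT", "GC"): 4,
    ("AT", "AC"): 5,
    ("GT", "AT"): 6,
}


def intron_motif(chrom_seq, start, end):
    """Return the STAR motif code for a 1-based, inclusive intron [start, end]."""
    donor = chrom_seq[start - 1:start + 1].upper()   # first two intron bases
    acceptor = chrom_seq[end - 2:end].upper()        # last two intron bases
    return MOTIF_CODES.get((donor, acceptor), 0)     # 0 = non-canonical


def sj_out_tab_row(chrom, start, end, strand_code, chrom_seq):
    """Format one junction as a 9-column, tab-separated SJ.out.tab line."""
    fields = [
        chrom, start, end, strand_code,
        intron_motif(chrom_seq, start, end),
        1,                   # column 6: every junction is marked as annotated
        "NA", "NA", "NA",    # columns 7-9: read counts/overhang are unavailable from a GTF
    ]
    return "\t".join(str(f) for f in fields)
```

For instance, with the toy sequence `"AAAAGTCCCCAGAAAA"`, an intron spanning positions 5 to 12 starts with GT and ends with AG, so it receives motif code 1.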