Skip to content

Commit

Permalink
minor edits
Browse files Browse the repository at this point in the history
  • Loading branch information
smcclatchy committed Dec 9, 2024
1 parent 667e6a5 commit 7ab33af
Show file tree
Hide file tree
Showing 4 changed files with 47 additions and 45 deletions.
64 changes: 33 additions & 31 deletions episodes/create-transcriptome-map.Rmd
Original file line number Diff line number Diff line change
Expand Up @@ -40,16 +40,16 @@ ethr <- summary(object = eperm, alpha = 0.05)
### Local and Distant eQTL

In the previous lesson, we saw that the QTL peak for a gene can lie directly
over the gene that is being mapped. This is called a "local eQTL" because the
over the gene that is being mapped. This is called a _local eQTL_ because the
QTL peak is located near the gene. When the QTL peak is located far from the
gene being mapped, we call this a "distant eQTL". When the QTL peak is located
gene being mapped, we call this a _distal eQTL_. When the QTL peak is located
on the same chromosome as the gene, there are different heuristics regarding the
distance between the gene and its QTL peak that we use to call an eQTL local
or distant. This depends on the type of cross and the resolution of the
or distal. This depends on the type of cross and the resolution of the
recombination block structure.

In this episode, we will create a table containing the QTL peaks for all genes
and the gene positions. We will then classify QTL into local or distant and
and the gene positions. We will then classify QTL into local or distal and
will make a plot of all off the eQTL in relation to the gene positions.

### Transcriptome Map
Expand All @@ -58,11 +58,11 @@ We have encapsulated the code to create a transcriptome map in a file in this
lesson. You can copy this file from the Github repository to use in your eQTL
mapping project. We will read this file in now.

```{r}
```{r, eval=FALSE}
getwd()
```

```{r}
```{r, eval=FALSE}
dir()
```

Expand Down Expand Up @@ -91,11 +91,10 @@ of this workshop. The required column names are:
position in Mb.

First, we will get the gene positions from the annotation and rename the
columns to match what `ggtmap` requires. We need to have columns named "gene_id",
"gene_chr", and "gene_pos". We must rename the columns because, when we are
finished, we will have "chr" and "pos" columns for both the gene and its QTL. We
will also add the "symbol" column since it is nice to have gene symbols in the
data.
columns to match what `ggtmap` requires. We need to have columns named
`gene_id`, `gene_chr`, and `gene_pos`. We must rename the columns because, when
we are finished, we will have `chr` and `pos` columns for both the gene and its QTL. We will also add the `symbol` column since it is nice to have gene symbols
in the data.

```{r get_gene_pos}
gene_pos <- annot |>
Expand Down Expand Up @@ -142,12 +141,14 @@ chromosome.
:::::::::::::::::::::::::::::::::
::::::::::::::::::::::::::::::::::::::::::::::::

We can tabulate the number of local and distant eQTL that we have and add this to
our QTL summary table. A local eQTL occurs when the QTL peaks is directly over the
gene position. But what if it is 2 Mb away? Or 10 Mb? It's possible that a gene
may have a trans eQTL on the same chromosome if the QTL is "far enough" from the
gene. In [Keller et al](https://pmc.ncbi.nlm.nih.gov/articles/PMC5937189/), the
authors selected a 4 Mb of the corresponding gene and we will use this threshold.
We can tabulate the number of local and distal eQTL that we have and add this to
our QTL summary table. A local eQTL occurs when the QTL peaks is directly over
the gene position. But what if it is 2 Mb away? Or 10 Mb? It's possible that a
gene may have a distal eQTL on the same chromosome if the QTL is "far enough"
from the gene. In
[Keller et al](https://pmc.ncbi.nlm.nih.gov/articles/PMC5937189/),
the authors selected a 4 Mb distance from the corresponding gene and we will use
this threshold.

```{r local_summary, warning=FALSE, message=FALSE}
eqtl <- eqtl |>
Expand All @@ -161,7 +162,7 @@ count(eqtl, local)
### Plot Transcriptome Map

In [Keller et al](https://pmc.ncbi.nlm.nih.gov/articles/PMC5937189/), the
authors used a LOD threshold of 7.2 to select genes tovuse in their
authors used a LOD threshold of 7.2 to select genes to use in their
transcriptome map. We will use this threshold to reproduce their results as
closely as possible.

Expand All @@ -174,7 +175,7 @@ ggtmap(data = eqtl |>
local.radius = 4)
```

The plot above is called a "Transcriptome Map" because it shows the positions of
The plot above is called a *transcriptome map* because it shows the positions of
the genes (or transcripts) and their corresponding QTL. The QTL position is
shown on the X-axis and the gene position is shown on the Y-axis. The
chromosomes are listed along the top and right of the plot.
Expand Down Expand Up @@ -214,18 +215,18 @@ possibly through multiple steps.
::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::

If you look at the plot, there are vertical bands of points which stack over a
single QTL location. These are called "eQTL hotspots". Rather than look at the
single QTL location. These are called *eQTL hotspots*. Rather than look at the
transcriptome map, it may be easier to look at the density of the eQTL along
the genome. We have provided a function called "eqtl_density_plot" to do this.
the genome. We have provided a function called `eqtl_density_plot` to do this.


```{r eqtl_density,fig.height=6,fig.width=8,warning=FALSE,message=FALSE}
eqtl_density_plot(eqtl, lod_thr = eqtl_thr)
```

The plot above shows the mouse genome on the X-axis and the number of transcripts
in a 4 Mb window on the Y-axis. It is difficult to say
how many genes must be involved to call something and eQTL hotspot. There are
The plot above shows the mouse genome on the X-axis and the number of
transcripts in a 4 Mb window on the Y-axis. It is difficult to say how many
genes must be involved to call something and eQTL hotspot. There are
permutation-based methods which require a large amount of time and memory. In
this case, we have called hotspots involving more than 100 genes eQTL hotspots.

Expand Down Expand Up @@ -256,16 +257,16 @@ How do the eQTL density plots compare to their results?

::::::::::::::::::::::::::::::::::::::: solution

The eQTL density plots are largely similar. The hotspot on chromosome 11 contains
more genes in our analysis than in the paper.
The eQTL density plots are largely similar. The hotspot on chromosome 11
contains more genes in our analysis than in the paper.

::::::::::::::::::::::::::::::::::::::::::::::::

:::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::

How can we find the location of the eQTL hotspots and the genes which lie
within each eQTL? We have written a function called "get_eqtl_hotspots" to help
you and have provides it in "gg_transcriptome_map.R". It requires the eQTL data,
within each eQTL? We have written a function called `get_eqtl_hotspots` to help
you and have provides it in `gg_transcriptome_map.R`. It requires the eQTL data,
a LOD threshold to filter the peaks, the number of genes required to call a
locus a hotspot, and the radius in Mb around the hotspot to use when selecting
genes which are part of the hotspot. We will use a 2 MB radius around the
Expand Down Expand Up @@ -293,6 +294,7 @@ of the eQTL positions in each hotspot.
```{r get_hotspot_pos}
sapply(hotspots, function(z) { mean(z$qtl_pos) })
```

Let's look at some of the genes in the chromosome 2 hotspot.

```{r headhotspot_2}
Expand Down Expand Up @@ -339,7 +341,7 @@ pca_2 <- princomp(expr_2)
pc1_2 <- pca_2$scores[,1,drop = FALSE]
```

We can then map PC1 as a phenotype and we should be a large QTL peak in the
We can then map PC1 as a phenotype and we should have a large QTL peak in the
same location as the hotspot.

```{r map_pc1_2,fig.width=8,fig.height=6}
Expand All @@ -352,7 +354,7 @@ plot_scan1(pc1_lod, map, main = "PC1 of Chr 2 Hotspot")
```

We can also look at the founder allele effects at the peak. We will use a
function in gg_transcriptome_map.R called "plot_fit1()".
function in `gg_transcriptome_map.R` called `plot_fit1()`.

```{r pc1_2_eff,fig.width=8,fig.height=6}
peaks <- find_peaks(pc1_lod, map, threshold = 50)
Expand Down Expand Up @@ -383,7 +385,7 @@ the allele effects from this plot.
- Local eQTL occur more frequently than distant eQTL.
- Local eQTL appear along the diagonal in a transcriptome map and distant eQTL
appear on the off-diagonal.
- Stacks of eQTL which appear over a single locus are called "eQTL hotspots" and
- Stacks of eQTL which appear over a single locus are called *eQTL hotspots* and
represent sets of genes which are transcriptionally regulated by a single locus.
- The first principal component of genes in and eQTL hotspot can be used to
summarize the genes in the hotspot.
Expand Down
2 changes: 1 addition & 1 deletion episodes/load-explore-data.Rmd
Original file line number Diff line number Diff line change
Expand Up @@ -161,7 +161,7 @@ Males have higher insulin AUC than females.

What does the heavy line bisecting the boxes indicate?
What do the lines at the top and bottom of the boxes indicate?
What does the "whisker" extending from the top and bottom of the boxes indicate?
What does the *whisker* extending from the top and bottom of the boxes indicate?
What do the black dots extending from the whiskers indicate?

:::::::::::::::::::::::: solution
Expand Down
22 changes: 11 additions & 11 deletions episodes/map-many-eqtls.Rmd
Original file line number Diff line number Diff line change
Expand Up @@ -98,7 +98,7 @@ can perform the QTL mapping in parallel on one node, so you will typically
request only one node. In this case, we requested 22 cores on one node.
*sumner2* has nodes with up to 70 cores, but there is a point of diminishing
returns when parallelizing too much. We performed a short test run and found
that we used abouty 100GB of memory, so we requested twice that amount in case
that we used about 100GB of memory, so we requested twice that amount in case
there were memory surges during computation. And we requested one day (24 hours)
of compute time.

Expand All @@ -109,8 +109,8 @@ many separate mapping jobs and combining the results. We have chosen to show
you one simple way in this course. If you have `slurm` expertise, you can try
other methods of breaking up the work.

Next, we call the R script using "singularity exec" to execute the command
which follows the container name. We call "Rscript" to run the eQTL mapping
Next, we call the R script using `singularity exec` to execute the command
which follows the container name. We call `Rscript` to run the eQTL mapping
script, which we placed in the same directory as the bash script.

The inputs to the script are the genoprobs, expression data, covariates,
Expand Down Expand Up @@ -154,7 +154,7 @@ map = readRDS(file.path(base_dir, 'attie_do_map.rds'))
covar$sex = factor(covar$sex)
covar$DOwave = factor(covar$DOwave)
# Craete a matrix of additive covariates.
# Create a matrix of additive covariates.
addcovar = model.matrix(~sex + DOwave, data = covar)[,-1]
# Map all genes.
Expand All @@ -179,12 +179,12 @@ saveRDS(peaks, file = file.path(base_dir, 'attie_do_eqtl_peaks.rds'))

The output of this script is two files.

1. attie_do_eqtl_lod.rds: This is numeric matrix containing LOD scores for all
genes at all markers. We save it as a compressed R data file (*.rds). In this
1. `attie_do_eqtl_lod.rds`: This is numeric matrix containing LOD scores for all
genes at all markers. We save it as a compressed R data file (`*.rds`). In this
case, the file is 11 GB.
2. attie_do_eqtl_peaks.rds: This is a data.frame containing the peaks with LOD
over 6. We save it as a compressed R data file (*.rds). In this case, the file
is 674 KB. You should have downloaded this file into your "data" directory.
2. `attie_do_eqtl_peaks.rds`: This is a data.frame containing the peaks with LOD
over 6. We save it as a compressed R data file (`*.rds`). In this case, the file
is 674 KB. You should have downloaded this file into your `data` directory.

### Finding QTL Peaks

Expand All @@ -194,7 +194,7 @@ Let's load in the QTL peaks that we pre-computed.
peaks <- readRDS(file = "data/attie_do_eqtl_peaks.rds")
```

`peaks` is a data.frame which contains all of the peaks in the same format as
`peaks` is a data frame which contains all of the peaks in the same format as
we saw in the previous lesson. Let's remind ourselves what the peaks table
looks like.

Expand Down Expand Up @@ -321,7 +321,7 @@ for(i in lod_brks) {
results |>
filter(lod_brks >= 6) |>
ggplot(aes(lod_brks, n)) +
geom_line(size = 1.25) +
geom_line(linewidth = 1.25) +
scale_x_continuous(breaks = 0:10 * 10) +
labs(title = "Number of Peaks above LOD Score",
x = "LOD",
Expand Down
4 changes: 2 additions & 2 deletions episodes/maximum-peaks-genes.Rmd
Original file line number Diff line number Diff line change
Expand Up @@ -48,7 +48,7 @@ marker. We could perform a traditional multiple-testing correction such as a
Benjamini-Hochberg false discovery rate (FDR). However, by performing
permutations of the sample labels, scanning the genome, and retaining the
maximum LOD from each permutation, we are effectively adjusting for multiple
testing across the genome because we have selecting only the maximum LOD
testing across the genome because we are selecting only the maximum LOD
across all markers in each permutation.

Multiple testing at the gene level comes from mapping multiple genes. If we use
Expand Down Expand Up @@ -165,7 +165,7 @@ Let's see how many genes we have retained.
nrow(peaks_filt)
```

We still have almost 17,000 genes wiht an FDR of 5% or less.
We still have almost 17,000 genes with an FDR of 5% or less.

Next, let's look at what the range of LOD scores is.

Expand Down

0 comments on commit 7ab33af

Please sign in to comment.