minor edits

smcclatchy · Dec 9, 2024 · 7ab33af · 7ab33af
1 parent 667e6a5
commit 7ab33af
Show file tree

Hide file tree

Showing 4 changed files with 47 additions and 45 deletions.
diff --git a/episodes/create-transcriptome-map.Rmd b/episodes/create-transcriptome-map.Rmd
@@ -40,16 +40,16 @@ ethr       <- summary(object = eperm, alpha  = 0.05)
 ### Local and Distant eQTL
 
 In the previous lesson, we saw that the QTL peak for a gene can lie directly
-over the gene that is being mapped. This is called a "local eQTL" because the
+over the gene that is being mapped. This is called a _local eQTL_ because the
 QTL peak is located near the gene. When the QTL peak is located far from the
-gene being mapped, we call this a "distant eQTL". When the QTL peak is located
+gene being mapped, we call this a _distal eQTL_. When the QTL peak is located
 on the same chromosome as the gene, there are different heuristics regarding the
 distance between the gene and its QTL peak that we use to call an eQTL local
-or distant. This depends on the type of cross and the resolution of the 
+or distal. This depends on the type of cross and the resolution of the 
 recombination block structure.
 
 In this episode, we will create a table containing the QTL peaks for all genes
-and the gene positions. We will then classify QTL into local or distant and 
+and the gene positions. We will then classify QTL into local or distal and 
 will make a plot of all off the eQTL in relation to the gene positions.
 
 ### Transcriptome Map
@@ -58,11 +58,11 @@ We have encapsulated the code to create a transcriptome map in a file in this
 lesson. You can copy this file from the Github repository to use in your eQTL 
 mapping project. We will read this file in now.
 
-```{r}
+```{r, eval=FALSE}
 getwd()
 ```
 
-```{r}
+```{r, eval=FALSE}
 dir()
 ```
 
@@ -91,11 +91,10 @@ of this workshop. The required column names are:
     position in Mb.
 
 First, we will get the gene positions from the annotation and rename the 
-columns to match what `ggtmap` requires. We need to have columns named "gene_id",
-"gene_chr", and "gene_pos". We must rename the columns because, when we are 
-finished, we will have "chr" and "pos" columns for both the gene and its QTL. We
-will also add the "symbol" column since it is nice to have gene symbols in the
-data. 
+columns to match what `ggtmap` requires. We need to have columns named 
+`gene_id`, `gene_chr`, and `gene_pos`. We must rename the columns because, when 
+we are finished, we will have `chr` and `pos` columns for both the gene and its QTL. We will also add the `symbol` column since it is nice to have gene symbols 
+in the data. 
 
 ```{r get_gene_pos}
 gene_pos <- annot |>
@@ -142,12 +141,14 @@ chromosome.
 :::::::::::::::::::::::::::::::::
 ::::::::::::::::::::::::::::::::::::::::::::::::
 
-We can tabulate the number of local and distant eQTL that we have and add this to 
-our QTL summary table. A local eQTL occurs when the QTL peaks is directly over the 
-gene position. But what if it is 2 Mb away? Or 10 Mb? It's possible that a gene 
-may have a trans eQTL on the same chromosome if the QTL is "far enough" from the 
-gene. In [Keller et al](https://pmc.ncbi.nlm.nih.gov/articles/PMC5937189/), the 
-authors selected a 4 Mb of the corresponding gene and we will use this threshold.
+We can tabulate the number of local and distal eQTL that we have and add this to 
+our QTL summary table. A local eQTL occurs when the QTL peaks is directly over 
+the gene position. But what if it is 2 Mb away? Or 10 Mb? It's possible that a 
+gene may have a distal eQTL on the same chromosome if the QTL is "far enough" 
+from the gene. In 
+[Keller et al](https://pmc.ncbi.nlm.nih.gov/articles/PMC5937189/), 
+the authors selected a 4 Mb distance from the corresponding gene and we will use 
+this threshold.
 
 ```{r local_summary, warning=FALSE, message=FALSE}
 eqtl <- eqtl |> 
@@ -161,7 +162,7 @@ count(eqtl, local)
 ### Plot Transcriptome Map
 
 In [Keller et al](https://pmc.ncbi.nlm.nih.gov/articles/PMC5937189/), the 
-authors used a LOD threshold of 7.2 to select genes tovuse in their 
+authors used a LOD threshold of 7.2 to select genes to use in their 
 transcriptome map. We will use this threshold to reproduce their results as 
 closely as possible.
 
@@ -174,7 +175,7 @@ ggtmap(data = eqtl |>
                        local.radius = 4)
 ```
 
-The plot above is called a "Transcriptome Map" because it shows the positions of 
+The plot above is called a *transcriptome map* because it shows the positions of 
 the genes (or transcripts) and their corresponding QTL. The QTL position is 
 shown on the X-axis and the gene position is shown on the Y-axis. The 
 chromosomes are listed along the top and right of the plot. 
@@ -214,18 +215,18 @@ possibly through multiple steps.
 ::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::
 
 If you look at the plot, there are vertical bands of points which stack over a
-single QTL location. These are called "eQTL hotspots". Rather than look at the
+single QTL location. These are called *eQTL hotspots*. Rather than look at the
 transcriptome map, it may be easier to look at the density of the eQTL along
-the genome. We have provided a function called "eqtl_density_plot" to do this.
+the genome. We have provided a function called `eqtl_density_plot` to do this.
 
 
 ```{r eqtl_density,fig.height=6,fig.width=8,warning=FALSE,message=FALSE}
 eqtl_density_plot(eqtl, lod_thr = eqtl_thr)
 ```
 
-The plot above shows the mouse genome on the X-axis and the number of transcripts
-in a 4 Mb window on the Y-axis. It is difficult to say
-how many genes must be involved to call something and eQTL hotspot. There are
+The plot above shows the mouse genome on the X-axis and the number of 
+transcripts in a 4 Mb window on the Y-axis. It is difficult to say how many 
+genes must be involved to call something and eQTL hotspot. There are
 permutation-based methods which require a large amount of time and memory. In
 this case, we have called hotspots involving more than 100 genes eQTL hotspots.
 
@@ -256,16 +257,16 @@ How do the eQTL density plots compare to their results?
 
 ::::::::::::::::::::::::::::::::::::::: solution
 
-The eQTL density plots are largely similar. The hotspot on chromosome 11 contains
-more genes in our analysis than in the paper.
+The eQTL density plots are largely similar. The hotspot on chromosome 11 
+contains more genes in our analysis than in the paper.
 
 ::::::::::::::::::::::::::::::::::::::::::::::::
 
 :::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::
 
 How can we find the location of the eQTL hotspots and the genes which lie
-within each eQTL? We have written a function called "get_eqtl_hotspots" to help 
-you and have provides it in "gg_transcriptome_map.R". It requires the eQTL data,
+within each eQTL? We have written a function called `get_eqtl_hotspots` to help 
+you and have provides it in `gg_transcriptome_map.R`. It requires the eQTL data,
 a LOD threshold to filter the peaks, the number of genes required to call a
 locus a hotspot, and the radius in Mb around the hotspot to use when selecting
 genes which are part of the hotspot. We will use a 2 MB radius around the
@@ -293,6 +294,7 @@ of the eQTL positions in each hotspot.
 ```{r get_hotspot_pos}
 sapply(hotspots, function(z) { mean(z$qtl_pos) })
 ```
+
 Let's look at some of the genes in the chromosome 2 hotspot.
 
 ```{r headhotspot_2}
@@ -339,7 +341,7 @@ pca_2  <- princomp(expr_2)
 pc1_2  <- pca_2$scores[,1,drop = FALSE]
 ```
 
-We can then map PC1 as a phenotype and we should be a large QTL peak in the 
+We can then map PC1 as a phenotype and we should have a large QTL peak in the 
 same location as the hotspot.
 
 ```{r map_pc1_2,fig.width=8,fig.height=6}
@@ -352,7 +354,7 @@ plot_scan1(pc1_lod, map, main = "PC1 of Chr 2 Hotspot")
 ```
 
 We can also look at the founder allele effects at the peak. We will use a 
-function in gg_transcriptome_map.R called "plot_fit1()".
+function in `gg_transcriptome_map.R` called `plot_fit1()`.
 
 ```{r pc1_2_eff,fig.width=8,fig.height=6}
 peaks <- find_peaks(pc1_lod, map, threshold = 50)
@@ -383,7 +385,7 @@ the allele effects from this plot.
 - Local eQTL occur more frequently than distant eQTL.
 - Local eQTL appear along the diagonal in a transcriptome map and distant eQTL
 appear on the off-diagonal.
-- Stacks of eQTL which appear over a single locus are called "eQTL hotspots" and
+- Stacks of eQTL which appear over a single locus are called *eQTL hotspots* and
 represent sets of genes which are transcriptionally regulated by a single locus.
 - The first principal component of genes in and eQTL hotspot can be used to 
 summarize the genes in the hotspot.

diff --git a/episodes/load-explore-data.Rmd b/episodes/load-explore-data.Rmd
@@ -161,7 +161,7 @@ Males have higher insulin AUC than females.
 
 What does the heavy line bisecting the boxes indicate?  
 What do the lines at the top and bottom of the boxes indicate?  
-What does the "whisker" extending from the top and bottom of the boxes indicate?    
+What does the *whisker* extending from the top and bottom of the boxes indicate?    
 What do the black dots extending from the whiskers indicate? 
 
 :::::::::::::::::::::::: solution 

diff --git a/episodes/map-many-eqtls.Rmd b/episodes/map-many-eqtls.Rmd
@@ -98,7 +98,7 @@ can perform the QTL mapping in parallel on one node, so you will typically
 request only one node. In this case, we requested 22 cores on one node. 
 *sumner2* has nodes with up to 70 cores, but there is a point of diminishing 
 returns when parallelizing too much. We performed a short test run and found
-that we used abouty 100GB of memory, so we requested twice that amount in case
+that we used about 100GB of memory, so we requested twice that amount in case
 there were memory surges during computation. And we requested one day (24 hours)
 of compute time. 
 
@@ -109,8 +109,8 @@ many separate mapping jobs and combining the results. We have chosen to show
 you one simple way in this course. If you have `slurm` expertise, you can try
 other methods of breaking up the work.
 
-Next, we call the R script using "singularity exec" to execute the command
-which follows the container name. We call "Rscript" to run the eQTL mapping
+Next, we call the R script using `singularity exec` to execute the command
+which follows the container name. We call `Rscript` to run the eQTL mapping
 script, which we placed in the same directory as the bash script.
 
 The inputs to the script are the genoprobs, expression data, covariates, 
@@ -154,7 +154,7 @@ map   = readRDS(file.path(base_dir, 'attie_do_map.rds'))
 covar$sex    = factor(covar$sex)
 covar$DOwave = factor(covar$DOwave)
 
-# Craete a matrix of additive covariates.
+# Create a matrix of additive covariates.
 addcovar     = model.matrix(~sex + DOwave, data = covar)[,-1]
 
 # Map all genes.
@@ -179,12 +179,12 @@ saveRDS(peaks, file = file.path(base_dir, 'attie_do_eqtl_peaks.rds'))
 
 The output of this script is two files.
 
-1. attie_do_eqtl_lod.rds: This is numeric matrix containing LOD scores for all
-genes at all markers. We save it as a compressed R data file (*.rds). In this 
+1. `attie_do_eqtl_lod.rds`: This is numeric matrix containing LOD scores for all
+genes at all markers. We save it as a compressed R data file (`*.rds`). In this 
 case, the file is 11 GB.
-2. attie_do_eqtl_peaks.rds: This is a data.frame containing the peaks with LOD
-over 6. We save it as a compressed R data file (*.rds). In this case, the file
-is 674 KB. You should have downloaded this file into your "data" directory.
+2. `attie_do_eqtl_peaks.rds`: This is a data.frame containing the peaks with LOD
+over 6. We save it as a compressed R data file (`*.rds`). In this case, the file
+is 674 KB. You should have downloaded this file into your `data` directory.
 
 ### Finding QTL Peaks
 
@@ -194,7 +194,7 @@ Let's load in the QTL peaks that we pre-computed.
 peaks <- readRDS(file = "data/attie_do_eqtl_peaks.rds")
 ```
 
-`peaks` is a data.frame which contains all of the peaks in the same format as
+`peaks` is a data frame which contains all of the peaks in the same format as
 we saw in the previous lesson. Let's remind ourselves what the peaks table 
 looks like.
 
@@ -321,7 +321,7 @@ for(i in lod_brks) {
 results |>
   filter(lod_brks >= 6) |>
   ggplot(aes(lod_brks, n)) +
-    geom_line(size = 1.25) +
+    geom_line(linewidth = 1.25) +
     scale_x_continuous(breaks = 0:10 * 10) +
     labs(title = "Number of Peaks above LOD Score",
          x     = "LOD",

diff --git a/episodes/maximum-peaks-genes.Rmd b/episodes/maximum-peaks-genes.Rmd
@@ -48,7 +48,7 @@ marker. We could perform a traditional multiple-testing correction such as a
 Benjamini-Hochberg false discovery rate (FDR). However, by performing 
 permutations of the sample labels, scanning the genome, and retaining the 
 maximum LOD from each permutation, we are effectively adjusting for multiple
-testing across the genome because we have selecting only the maximum LOD 
+testing across the genome because we are selecting only the maximum LOD 
 across all markers in each permutation.
 
 Multiple testing at the gene level comes from mapping multiple genes. If we use
@@ -165,7 +165,7 @@ Let's see how many genes we have retained.
 nrow(peaks_filt)
 ```
 
-We still have almost 17,000 genes wiht an FDR of 5% or less.
+We still have almost 17,000 genes with an FDR of 5% or less.
 
 Next, let's look at what the range of LOD scores is.