-
Notifications
You must be signed in to change notification settings - Fork 1
/
Copy path2.Evaluation of covariates.Rmd
176 lines (154 loc) · 6.3 KB
/
2.Evaluation of covariates.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
---
title: "Evaluation of covariates"
author: "Bai Ling"
date: "`r format(Sys.Date(), format = '%d %B %Y')`"
output:
html_document:
theme: cerulean
css: style.css
---
```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE, fig.align = "center")
options(stringsAsFactors = FALSE)
setwd("/Users/apple/Documents/GitHub/EWAS/")
```
## Contents
***
- [1.Dependency between p-value and covariate](#dependency)
- [2.Comparison of Omnibus test and naïve test](#simulated)
- [3.Omnibus test to assess the informativeness of EWAS-related covariates](#real)
<br>
<br>
<br>
#### Library R packages and basic setttings
```{r library prepare, message=FALSE, warning=FALSE}
library(tidyverse)
library(readxl)
library(cowplot)
library(IHW)
library(ggforce)
selected_measure <- c("sd.b", "sd.m", "mean", "mad", "dip", "precision", "direction", "chr",
"refgene.pos", "cpg.loc", "dhs", "probe.type", "icc.b", "icc.m")
measure_col <- c("sd.b"="#7570B3", "sd.m"="#E7298A", "mean"="#A6CEE3",
"mad"="#1F78B4", "dip"="#B2DF8A", "precision"="#FFFF00", "icc.b"="#FB9A99",
"icc.m"="#E31A1C", "refgene.pos"="#FDBF6F", "chr"="#A6761D", "cpg.loc"="#1B9E77",
"dhs"="#949494", "direction"="#7FFFD4", "probe.type" ="#FFE4E1")
```
### 1.Dependency between p-value and covariate {#dependency}
***
There are two representative examples for the dependency between p-value and covariate "mean". The silimar relationship for other covariates could be found in Figure S3.
<br>
<br>
#### 1.1 panel A : differential methylation happens in low methylation region
```{r enrichment in low methylation}
# load data
EWAS33 <- readRDS("Data/dataset/EWAS33.RDS")
i <- "mean"
covariate_type <- "ordinal"
# histogram of all raw p-values
p1 <- ggplot(data = EWAS33, aes(x = P.value, y = stat(density))) +
geom_histogram(fill = "#00AFBB", color = "black", binwidth = 0.05 , size = 0.05, boundary = 0) +
theme_classic() +
theme(axis.title = element_text(size = 10),
axis.text = element_text(size = 8))
# scatter plot of mean and p-values
result <- ihw(EWAS33$P.value, covariates = EWAS33[[i]], alpha = 0.05, covariate_type = covariate_type)
result_df <- as.data.frame(result)
f <- ecdf(result_df$covariate)
result_df$percentile_covariate <- f(result_df$covariate)
p2 <- ggplot(data = result_df, aes(x = percentile_covariate, y = -log10(pvalue))) +
geom_hex(bins = 100) +
theme_classic() +
theme(axis.title = element_text(size = 10),
axis.text = element_text(size = 8))
# histogram stratified by covariate mean
result_df$ihwGroup <- groups_by_filter(result_df$group, 5)
p3 <- ggplot(data = result_df, aes(x = pvalue, y = stat(density))) +
geom_histogram(fill = "#00AFBB", color = "black", binwidth = 0.05, boundary = 0, size = 0.05) +
facet_wrap(~ ihwGroup, nrow = 1) +
theme_classic() +
theme(axis.title = element_text(size = 10),
axis.text = element_text(size = 8))
# put figures together
plots <- align_plots(p1, p3, align = "v", axis = "l")
top_row <- plot_grid(plots[[1]], p2, nrow = 1, align = "h")
panelA <- plot_grid(top_row, plots[[2]], ncol = 1)
panelA
```
<br>
#### 1.2 panel B: differential methylation happens in median methylation region
```{r enrichment in median methylation}
# load data
EWAS14 <- readRDS("Data/dataset/EWAS14.RDS")
i <- "mean"
covariate_type <- "ordinal"
# histogram of all raw p-values
p1 <- ggplot(data = EWAS14, aes(x = P.value, y = stat(density))) +
geom_histogram(fill = "#00AFBB", color = "black", binwidth = 0.05 , size = 0.05, boundary = 0) +
theme_classic() +
theme(axis.title = element_text(size = 10),
axis.text = element_text(size = 8))
# scatter plot of mean and p-values
result <- ihw(EWAS14$P.value, covariates = EWAS14[[i]], alpha = 0.05, covariate_type = covariate_type)
result_df <- as.data.frame(result)
f <- ecdf(result_df$covariate)
result_df$percentile_covariate <- f(result_df$covariate)
p2 <- ggplot(data = result_df, aes(x = percentile_covariate, y = -log10(pvalue))) +
geom_hex(bins = 100) +
theme_classic() +
theme(axis.title = element_text(size = 10),
axis.text = element_text(size = 8))
# histogram stratified by covariate mean
result_df$ihwGroup <- groups_by_filter(result_df$group, 5)
p3 <- ggplot(data = result_df, aes(x = pvalue, y = stat(density))) +
geom_histogram(fill = "#00AFBB", color = "black", binwidth = 0.05, boundary = 0, size = 0.05) +
facet_wrap(~ ihwGroup, nrow = 1) +
theme_classic() +
theme(axis.title = element_text(size = 10),
axis.text = element_text(size = 8))
# put figures together
plots <- align_plots(p1, p3, align = "v", axis = "l")
top_row <- plot_grid(plots[[1]], p2, nrow = 1, align = "h")
panelB <- plot_grid(top_row, plots[[2]], ncol = 1)
panelB
```
<br>
#### 1.3 put all panels together
```{r put figures together, fig.width=6.69, fig.height=8}
fs3 <- plot_grid(panelA, panelB, ncol = 1, align = "vl", labels = c("A", "B"), label_size = 18)
fs3
```
```{r, include=FALSE}
save_plot("Figures/FigureS3.pdf", fs3, base_width = 6.69, base_height = 8)
```
### 2.Comparison of Omnibus test and naïve test {#simulated}
***
We performed simulations to assess the type I error and power of the proposed omnibus test and benchmarked against the naïve tests.
<br>
![](Figures/Figure1A.png)
<br>
### 3.Omnibus test to assess the informativeness of EWAS-related covariates {#real}
***
We use omnibus test to evaluate the 14 EWAS-related covariates' informativeness for each EWAS data set. Covariates with p-value less than 0.05 are thought to be informative.
```{r}
omnibus_result <- read.csv("Data/omnibus.result.csv", header = TRUE)
p <- omnibus_result %>%
filter(measure %in% selected_measure) %>%
ggplot(aes(x = reorder(measure, -log10(p.value), FUN = median), y = -log10(p.value))) +
geom_boxplot(aes(fill = measure), outlier.shape = NA, fatten = 1) +
geom_sina(size = 1, alpha = 0.3, shape = 16) +
geom_hline(yintercept = -log10(0.05), linetype = "dashed") +
coord_flip() +
xlab("") +
scale_fill_manual(values = measure_col) +
theme_classic() +
theme(legend.position = "none",
axis.text = element_text(size = 8),
axis.title = element_text(size = 10))
p
```
```{r, include=FALSE}
f1 <- ggdraw() + draw_image("Figures/Figure1A.png", width = 4, height = 4)
f1 <- plot_grid(f1, p, nrow = 1)
ggsave(filename = "Figures/Figure1.pdf", f1, useDingbats = FALSE)
```