-
Notifications
You must be signed in to change notification settings - Fork 4
/
simul_rota.Rmd
244 lines (194 loc) · 8.03 KB
/
simul_rota.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
---
title: Simulate data and fit a 2-species static (aka single-season) occupancy model
à la Rota et al. (2016)
output: html_document
---
```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE, dpi = 600, cache = TRUE, warning = FALSE, message = FALSE)
```
We consider a two-species static occupancy model à la [Rota et al. (2016)](https://besjournals.onlinelibrary.wiley.com/doi/full/10.1111/2041-210X.12587). We simulate data from this model, and fit the model to these data using `Unmarked`.
## Setting the scene
Ignoring the site index, we use the following notation for the occupancy probabilities:
* $\psi_{11}$ is the prob. that species 1 and species 2 are both present;
* $\psi_{10}$ is the prob. that species 1 is present and species 2 is absent;
* $\psi_{01}$ is the prob. that species 1 is absent and species 2 is present;
* $\psi_{00}$ is the prob. that species 1 and species 2 are both absent,
with avec $\psi_{11} + \psi_{10} + \psi_{01} + \psi_{00} = 1.$
The marginal probabilities of occupancy are:
* $\Pr(z_1 = 1) = \Pr(\mbox{species 1 is present}) = \psi_{10} + \psi_{11}$
* $\Pr(z_2 = 1) = \Pr(\mbox{species 2 is present}) = \psi_{01} + \psi_{11}$
* $\Pr(z_1 = 0) = \Pr(\mbox{species 1 is absent}) = \psi_{01} + \psi_{00}$
* $\Pr(z_2 = 0) = \Pr(\mbox{species 2 is absent}) = \psi_{10} + \psi_{00}$
And the conditional probabilities (reminder: $\Pr(\mbox{A|B}) = \Pr(\mbox{A and B})/\Pr(\mbox{B})$):
* $\Pr(z_1 = 1 | z_2 = 0) = \psi_{10} / (\psi_{10} + \psi_{00}) = \Pr(\mbox{species 1 is present given species 2 is absent});$
* $\Pr(z_1 = 1 | z_2 = 1) = \psi_{11} / (\psi_{11} + \psi_{01}) = \Pr(\mbox{species 1 is present given species 2 is present});$
* $\Pr(z_2 = 1 | z_1 = 0) = \psi_{01} / (\psi_{01} + \psi_{00}) = \Pr(\mbox{species 2 is present given species 1 is absent});$
* $\Pr(z_2 = 1 | z_1 = 1) = \psi_{11} / (\psi_{11} + \psi_{10}) = \Pr(\mbox{species 2 is present given species 1 is present}).$
## Data simulation
We will use the package `mipfb` to simulate occupancy state as a multivariate Bernoulli random variable; more about the multivariate Bernoulli can be found in [Dai et al. (2013)](https://arxiv.org/pdf/1206.1874.pdf):
```{r}
library(mipfp)
```
For reproducibility, we set the seed:
```{r}
set.seed(2020)
```
Choose the number of species, the number of sites, and the number of visits:
```{r}
S <- 2 # nb species
N <- 500 # nb sites
J <- 5 # nb visits
```
Let's consider a scenario in which species 2 avoids species 1 while species 1 does not care about species 2 and its presence or absence. To specify this scenario, we will work out the conditional probabilities with, for example:
* $\Pr(z_2 = 1 | z_1 = 0) = 0.6$, species 2 is present with high probability whenever species 1 is absent
* $\Pr(z_2 = 1 | z_1 = 1) = 0.1$, species 2 avoids species 1 when it is present
* $\Pr(z_1 = 1 | z_2 = 0) = \Pr(z_1 = 1 | z_2 = 1) = 0.4$, species 1 does not care about presence/absence of species 2
Now we need to go back to the probabilities of occupancy. Let $x = \psi_{01}$, $y = \psi_{10}$ et $z = \psi_{11}$ soit $1 - x - y - z = \psi_{00}$, then we have a system of 3 equations with 3 unknowns:
$$0.6 = x / (x + 1 - x - y - z) \Leftrightarrow x + 0.6y + 0.6z = 0.6$$
$$0.1 = z / (z + y) \Leftrightarrow -0.1y + 0.9z = 0$$
$$0.4 = y / (y + 1 - x - y - z) \Leftrightarrow 0.4x + y + 0.4z = 0.4$$
which can be solved with the [Mathematica online solver](https://www.wolframalpha.com/input/?i=solve%7Bx%2B0.6y%2B0.6z%3D%3D0.6%2C-0.1y%2B0.9z%3D%3D0%2C0.4x%2By%2B0.4z%3D%3D0.4%7D):
```{r}
psi01 <- 81/175
psi10 <- 36/175
psi11 <- 4/175
psi00 <- 1 - (psi01 + psi10 + psi11) # 54/175
```
We then obtain the marginal occupancy probabilities:
```{r}
psiS1 <- psi10 + psi11
psiS2 <- psi01 + psi11
```
Now we're ready to simulate data from a multivariate Bernoulli (check out `?RMultBinary` and `?ObtainMultBinaryDist`).
First, we calculate the odds ratios:
```{r}
or <- matrix(c(1, (psiS1*(1-psiS2))/(psiS2*(1-psiS1)),
(psiS2*(1-psiS1))/(psiS1*(1-psiS2)), 1), nrow = 2, ncol = 2, byrow = TRUE)
rownames(or) <- colnames(or) <- c("sp1", "sp2")
```
Then the marginal probabilities:
```{r}
marg.probs <- c(psiS1, psiS2)
```
And we estimate the joint probability:
```{r}
p.joint <- ObtainMultBinaryDist(odds = or, marg.probs = marg.probs)
```
At last, we generate $N$ random samples from a bivariate Bernoulli (2 species) with relevant parameters
```{r}
z <- RMultBinary(n = N, mult.bin.dist = p.joint)$binary.sequences
```
Now we add on top the observation. First, we fix the detection probability for each species:
```{r}
ps <- c(0.5,0.9)
```
Then we generate the detection and non-detections for each species, which we store in a list:
```{r}
y <- list()
for (i in 1:S){
y[[i]] <- matrix(NA,N,J)
for (j in 1:N){
for (k in 1:J){
y[[i]][j,k] <- rbinom(1,1,z[j,i]*ps[i])
}
}
}
names(y) <- c('sp1','sp2')
```
## Model fitting
Now let us fit a 2-species static occupancy model to the data we have simulated. We need to load the package `unmarked`:
```{r}
library(unmarked)
```
We format the data as required:
```{r}
data <- unmarkedFrameOccuMulti(y=y)
```
Let's have a look to the data:
```{r}
summary(data)
```
And in particular the detections and non-detections:
```{r}
plot(data)
```
Now we specify the effects we would like to consider on the occupancy and detection probabilities. The thing is that the function `occuMulti` doesn't work directly on the occupancy probabilities but on the so-called natural parameters (in that specific order):
* $f_1 = \log(\psi_{10}/\psi_{00})$;
* $f_2 = \log(\psi_{01}/\psi_{00})$;
* $f_{12} = \log(\psi_{00}\psi_{11} / \psi_{10}\psi_{01})$,
that is:
* $\psi_{11} = \exp(f_1+f_2+f_{12})/\mbox{den}$;
* $\psi_{10} = \exp(f_1)/\mbox{den}$;
* $\psi_{01} = \exp(f_2)/\mbox{den}$,
where $\mbox{den} = 1+\exp(f_1)+\exp(f_2)+\exp(f_1+f_2+f_{12})$:
```{r}
occFormulas <- c('~1','~1','~1')
```
To specify the effects on detection, there is no difficulty:
```{r}
detFormulas <- c('~1','~1')
```
We fit a model with constant natural parameters and constant detection probabilities
```{r}
fit <- occuMulti(detFormulas,occFormulas,data)
```
Display the result:
```{r}
fit
```
Get the natural parameter and detection estimates:
```{r}
mle <- fit@opt$par
names(mle) <- c('f1','f2','f12','lp1','lp2')
```
Get the occupancy estimates:
```{r}
den <- 1 + exp(mle['f1'])+exp(mle['f2'])+exp(mle['f1']+mle['f2']+mle['f12'])
psi11hat <- exp(mle['f1']+mle['f2']+mle['f12'])/den
psi10hat <- exp(mle['f1'])/den
psi01hat <- exp(mle['f2'])/den
```
I do it by hand to understand how `unmarked` works. The easy way is to use `predict(fit,'state')`.
Get the detection estimates:
```{r}
p1hat <- plogis(mle['lp1'])
p2hat <- plogis(mle['lp2'])
```
Again I do it by hand, but `unmarked` can do it for you with `predict(fit,'det')`.
Now compare the parameters we used to simulate the data (left column) to the parameter estimates (right column)
```{r}
res <- data.frame(real = c(psiS1,
psiS2,
psi01,
psi10,
psi11,
ps[1],
ps[2]),
estim = c(psi10hat+psi11hat,
psi01hat+psi11hat,
psi01hat,
psi10hat,
psi11hat,
p1hat,
p2hat))
rownames(res) <- c('marginal_occ1','marginal_occ2','psi01','psi10','psi11','det1','det2')
res
```
If you just want to get the parameter estimates directly:
```{r}
# detection
predict(fit,'det',species=1)[1,]
predict(fit,'det',species=2)[1,]
# marginal occupancy
predict(fit,'state',species=1)[1,]
predict(fit,'state',species=2)[1,]
# conditional occupancy
predict(fit,'state',species=1,cond='sp2')[1,] # species 1 | species 2 present
predict(fit,'state',species=1,cond='-sp2')[1,] # species 1 | species 2 absent
predict(fit,'state',species=2,cond='sp1')[1,] # species 2 | species 1 present
predict(fit,'state',species=2,cond='-sp1')[1,] # species 2 | species 1 absent
```
## R version used
```{r}
sessionInfo()
```