Merge pull request #46 from subugoe/review
Review
Ahobert authored Dec 1, 2020
2 parents 18a762f + 031b6fd commit 26297c1
Showing 22 changed files with 41,969 additions and 31 deletions.
47 changes: 23 additions & 24 deletions analysis.Rmd
@@ -4,12 +4,12 @@ author: "Anne Hobert"
date: "3/6/2020"
output:
github_document:
word_document:
fig_caption: yes
html_document:
fig_caption: yes
keep_md : true
urlcolor: blue
urlcolor: blue
word_document:
fig_caption: yes
---

```{r, echo = FALSE, message = FALSE, warning = FALSE}
@@ -23,13 +23,13 @@ knitr::opts_chunk$set(
fig.asp = 0.618,
out.width = "90%",
fig.align = "center",
dpi = 600
dpi = 600,
dev= c("png", "cairo_ps")
)
```

```{r}
library(tidyverse)
#library(urltools)
library(cowplot)
library(colorblindr)
library(scales)
@@ -52,12 +52,12 @@ pubs_cat <- pubs_cat %>%
sector == "Ressortforschung" ~ "GRA"
),
sector = case_when(
sector == "Hochschulen" ~ "Universities",
sector == "Helmholtz-Gemeinschaft" ~ "Helmholtz Association",
sector == "Max-Planck-Gesellschaft" ~ "Max Planck Society",
sector == "Leibniz-Gemeinschaft" ~ "Leibniz Association",
sector == "Fraunhofer-Gesellschaft" ~ "Fraunhofer Society",
sector == "Ressortforschung" ~ "Government Research\n Agencies"
sector == "Hochschulen" ~ "Universities (UNI)",
sector == "Helmholtz-Gemeinschaft" ~ "Helmholtz Association (HGF)",
sector == "Max-Planck-Gesellschaft" ~ "Max Planck Society (MPS)",
sector == "Leibniz-Gemeinschaft" ~ "Leibniz Association (WGL)",
sector == "Fraunhofer-Gesellschaft" ~ "Fraunhofer Society (FhS)",
sector == "Ressortforschung" ~ "Government Research\n Agencies (GRA)"
),
oa_category = factor(
oa_category,
@@ -121,7 +121,7 @@ pubs_oa_year <- pubs_cat %>%
First, we look at how the overall OA share developed over time. The following figure displays the number of publications associated with one of the German research institutions we considered and highlights the part that is freely accessible online according to Unpaywall over the period from 2010 to 2018. The total number of articles over the whole period is `r sum(pubs_oa_year$number_of_articles)` with an overall OA share of `r round(sum(pubs_oa_year %>% filter(oa_category == "is_oa") %>% .$number_of_articles)/sum(pubs_oa_year$number_of_articles)*100)` %.
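
The two inline figures in this paragraph follow directly from the yearly counts. A minimal sketch in base R, assuming a toy table with the same columns as `pubs_oa_year` (the name `toy_oa_year` and all counts are invented for illustration):

```r
# Toy stand-in for pubs_oa_year; the counts are made up for illustration.
toy_oa_year <- data.frame(
  PUBYEAR = c(2010, 2010, 2011, 2011),
  oa_category = c("is_oa", "not_oa", "is_oa", "not_oa"),
  number_of_articles = c(30, 70, 50, 50)
)

# Total number of articles over the whole period.
total_articles <- sum(toy_oa_year$number_of_articles)

# Articles with at least one freely available full text.
oa_articles <- sum(
  toy_oa_year$number_of_articles[toy_oa_year$oa_category == "is_oa"]
)

# Overall OA share in percent, rounded as in the inline expression.
oa_share_pct <- round(oa_articles / total_articles * 100)
```

With these toy counts, 80 of 200 articles are OA, so `oa_share_pct` is 40.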


```{r, fig.cap="Open access to journal articles from German research institutions according to Unpaywall. Blue area represents journal articles with at least one freely available full-text, grey area represents toll-access articles."}
```{r fig2, fig.cap="Open access to journal articles from German research institutions according to Unpaywall. Blue area represents journal articles with at least one freely available full-text, grey area represents toll-access articles."}
ggplot(pubs_oa_year, aes(x = PUBYEAR, y = number_of_articles)) +
geom_area(aes(fill = fct_rev(oa_category), group = fct_rev(oa_category)), alpha = 0.8, colour = "white") +
scale_fill_manual(
@@ -135,7 +135,6 @@ ggplot(pubs_oa_year, aes(x = PUBYEAR, y = number_of_articles)) +
breaks = scales::extended_breaks()(0:110000)
) +
labs(x = "Publication Year", y = "Total Articles") +
# theme_minimal_hgrid(12) +
theme_minimal_hgrid() +
theme(legend.position = "top",
legend.justification = "right")
@@ -147,7 +146,7 @@ As can be seen, the total number of articles, as well as the part that is OA inc

In order to investigate what role the different sectors play in OA publishing in Germany and how they contribute to the overall OA share, we display the development over time of the number of OA articles for each sector in the following figure. Note that the scales of the y-axes are not the same, since the total publication output varies significantly among sectors.
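
Per-panel y-axis scales of this kind are commonly obtained in ggplot2 with `facet_wrap(scales = "free_y")`. The sketch below assumes that approach; the faceting code itself is not visible in this diff, and `toy_sector_year` with its counts is invented for illustration:

```r
library(ggplot2)

# Toy data: two sectors whose publication output differs by orders of magnitude.
toy_sector_year <- data.frame(
  PUBYEAR = rep(2010:2012, times = 2),
  sector = rep(c("Universities (UNI)", "Fraunhofer Society (FhS)"), each = 3),
  number_of_articles = c(50000, 52000, 55000, 900, 950, 1000)
)

# scales = "free_y" gives every facet its own y-axis range, so the small
# sector is not flattened by the range of the large one.
p <- ggplot(toy_sector_year, aes(x = PUBYEAR, y = number_of_articles)) +
  geom_area(alpha = 0.8) +
  facet_wrap(~ sector, scales = "free_y") +
  labs(x = "Publication Year", y = "Total Articles")
```

With a shared scale, the smaller sector's trend would be barely visible next to the universities; free scales trade direct comparability of heights for readable trends within each panel.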

```{r, fig.asp=1, fig.cap="Open access to journal articles per sector according to Unpaywall. Blue area represents journal articles with at least one freely available full-text, grey area represents toll-access articles. Sectors are ordered by publication output with the highest output top left and lowest at the bottom. Note that scales for the `y-axes` are not the same, since the total publication output varies significantly among sectors."}
```{r fig3, fig.asp=1, fig.cap="Open access to journal articles per sector according to Unpaywall. Blue area represents journal articles with at least one freely available full-text, grey area represents toll-access articles. Sectors are ordered by publication output with the highest output top left and lowest at the bottom. Note that scales for the `y-axes` are not the same, since the total publication output varies significantly among sectors."}
pubs_cat %>%
mutate(oa_category = fct_collapse(
oa_category,
@@ -190,7 +189,7 @@ pubs_cat %>%
legend.justification = "right") +
theme_minimal_hgrid() +
# bold facet names
theme(strip.text = element_text(face="bold")) +
# theme(strip.text = element_text(face="bold")) +
theme(legend.position = "top",
legend.justification = "right")
```
@@ -278,7 +277,7 @@ oa_shares_inst_sector_stats <- oa_shares_inst_sector %>%
```
The following figure displays scatterplots where the OA share of an institution over the whole time period is shown with respect to its publication output.

```{r, fig.asp=1, fig.cap="Open Access shares of research institutions in Germany with respect to their total publication output grouped by the sector they belong to. Only institutions with at least 100 publications are shown. Blue points correspond to single institutions, gray lines are obtained by linear regression within the sector, gray areas are pointwise symmetric 95% t-distribution confidence bands. Scales of the x-axes vary across subplots in order to adapt to the different publication volumes. Dashed lines show the median value per sector for the OA share (red) and the total number of publications (orange)."}
```{r fig4, fig.asp=1, fig.cap="Open Access shares of research institutions in Germany with respect to their total publication output grouped by the sector they belong to. Only institutions with at least 100 publications are shown. Blue points correspond to single institutions, gray lines are obtained by linear regression within the sector, gray areas are pointwise symmetric 95% t-distribution confidence bands. Scales of the x-axes vary across subplots in order to adapt to the different publication volumes. Dashed lines show the median value per sector for the OA share (red) and the total number of publications (orange)."}
point_shapes <- oa_shares_inst_sector %>%
filter(n_total >= 100) %>%
mutate(point_shape = ifelse(inst_label =="", 19, 15)) %>%
@@ -321,7 +320,7 @@ oa_shares_inst_sector %>%
theme_minimal_grid() +
theme(legend.position = "none") +
# bold facet labels
theme(strip.text = element_text(face = "bold"))+
# theme(strip.text = element_text(face = "bold"))+
theme(axis.text=element_text(size=10))
```
The most striking observations from this figure are the high OA shares of most of the Max Planck and Helmholtz institutes and the very low OA shares of almost all of the state and federal institutes as well as those of the Fraunhofer Society. The universities and the Leibniz Association have many institutions with OA shares close to one half. We can also see clearly that the universities have by far the largest publication volumes, followed by the Helmholtz Association. The linear trend of higher OA shares with higher publication volume is most distinct for the university sector (narrowest confidence bands).
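
The regression line and its pointwise 95% confidence band for one sector can be reproduced outside the plot with base R. This is a minimal sketch with invented data, assuming an ordinary least-squares fit of OA share on publication output; `toy_inst` and its values are hypothetical, while the column names `n_total` and `oa_share` follow the naming used in the analysis:

```r
# Toy institutions of one sector: publication output and OA share (made up).
toy_inst <- data.frame(
  n_total = c(120, 400, 900, 1500, 3000, 6000),
  oa_share = c(0.30, 0.35, 0.38, 0.42, 0.47, 0.55)
)

# Ordinary least-squares fit of OA share on publication output.
fit <- lm(oa_share ~ n_total, data = toy_inst)

# Pointwise 95% confidence band for the fitted mean, based on the
# t-distribution; returns a matrix with columns fit, lwr and upr.
band <- predict(fit, newdata = toy_inst, interval = "confidence", level = 0.95)

# Width of the band at each institution; narrower means a tighter trend.
band_width <- band[, "upr"] - band[, "lwr"]
```

A narrow band, as observed for the universities, indicates that the fitted mean is well determined. It does not say that individual institutions lie close to the line.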
@@ -420,7 +419,7 @@ which_significant <- function(ci_lims, grouped = FALSE){
```


```{r, fig.cap="OA shares of German research institutions per sector. The color of the boxes groups sectors into universities with a typically high total journal publication output, research-oriented institutes with a medium journal publication output and practice-oriented institutions with a comparatively low journal publication output. Gray points display the OA shares for individual institutions. Notches indicate approximate 95 % confidence intervals for the median values. Non-overlapping notches imply a strong indication that median values are significantly different."}
```{r fig5, fig.cap="OA shares of German research institutions per sector. The color of the boxes groups sectors into universities with a typically high total journal publication output, research-oriented institutes with a medium journal publication output and practice-oriented institutions with a comparatively low journal publication output. Gray points display the OA shares for individual institutions. Notches indicate approximate 95 % confidence intervals for the median values. Non-overlapping notches imply a strong indication that median values are significantly different."}
oa_shares_inst_sec_boxplot <- pubs_cat %>%
anti_join(exclude_from_inst_analysis, by = c("INST_NAME" = "NAME")) %>%
group_by(sector) %>%
@@ -513,7 +512,7 @@ readr::write_csv(cat_overlap, "data/overlap_oa_categories.csv")

Keeping in mind that our categories are non-exclusive, as just shown, we now visualise the number of articles per category on the national level, that is, without differentiation by sector. As a first step, we investigate the two main OA routes via a journal or via a repository.
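
Because one article can appear under several categories, the split into journal-only, repository-only and overlapping articles has to be made per article rather than per row. A minimal base R sketch, assuming one row per article and host type; `toy_articles` and its values are invented, while `PK_ITEMS` and `oa_host` follow the column names used in the analysis:

```r
# Toy data: one row per (article, host type); articles 1 and 4 have both hosts.
toy_articles <- data.frame(
  PK_ITEMS = c(1, 1, 2, 3, 4, 4),
  oa_host = c("journal", "repository", "journal",
              "repository", "journal", "repository")
)

# Collapse the host types of each article into a single label.
hosts_per_article <- tapply(
  toy_articles$oa_host,
  toy_articles$PK_ITEMS,
  function(h) paste(sort(unique(h)), collapse = " & ")
)

# Count articles per exclusive group: journal only, repository only, both.
overlap_counts <- table(hosts_per_article)
```

Counting rows instead of articles would give six OA records here, although only four distinct articles exist.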

```{r, fig.asp = 0.4, fig.cap="Development of the number of articles per OA host type and their overlap. Highlighted in blue are the number of articles per OA host type with articles made available only via a journal on the left, articles available only in repositories on the right and the overlap, that is, articles openly accessible via both a journal and a repository, in the middle. Grey area shows the remaining OA articles."}
```{r fig6, fig.asp = 0.4, fig.cap="Development of the number of articles per OA host type and their overlap. Highlighted in blue are the number of articles per OA host type with articles made available only via a journal on the left, articles available only in repositories on the right and the overlap, that is, articles openly accessible via both a journal and a repository, in the middle. Grey area shows the remaining OA articles."}
host_overlap <- pubs_cat %>%
filter(oa_category != "not_oa") %>%
mutate(oa_host = fct_collapse(
@@ -565,13 +564,13 @@ host_overlap <- pubs_cat %>%
legend.justification = "right") +
theme_minimal_hgrid() +
# bold facet names
theme(strip.text = element_text(face="bold")) +
# theme(strip.text = element_text(face="bold")) +
theme(legend.position = "top",
legend.justification = "right")
```


```{r, fig.cap="Development of the percentage of journal articles per OA category (as per schema in Table 1) over time. Categories are non-exclusive, that is, some articles may be counted for more than one category. Colors correspond to the OA category. On the left, access provided via a journal is displayed, on the right via repositories. Grey area shows the total percentage of OA via the corresponding route (journal or repository)."}
```{r fig7, fig.cap="Development of the percentage of journal articles per OA category (as per schema in Table 1) over time. Categories are non-exclusive, that is, some articles may be counted for more than one category. Colors correspond to the OA category. On the left, access provided via a journal is displayed, on the right via repositories. Grey area shows the total percentage of OA via the corresponding route (journal or repository)."}
oa_shares_host <- pubs_cat %>%
mutate(PUBYEAR = lubridate::ymd(paste0(PUBYEAR, "-01-01"))) %>%
group_by(PUBYEAR) %>%
@@ -634,9 +633,9 @@ pubs_cat %>%
labs(x = "Publication Year", y = "OA percentage") +
theme_minimal_hgrid() +
scale_x_date(date_labels = "%y") +
theme(legend.position = "none") +
theme(legend.position = "none") # +
# bold facet labels
theme(strip.text = element_text(face = "bold")) + ggsave("lineplot_rel.png", width = 6, height = 4)
# theme(strip.text = element_text(face = "bold"))
```
Observations:

@@ -646,7 +645,7 @@ Observations:
- most prevalent category: subject-specific repos, registered with OpenDOAR

Again, we go one step further and look at sector-specific OA proportions.
```{r, fig.asp=1, fig.cap="OA shares per category and sector. Coloring and size of the points display the percentage in the respective category. Grey numbers display the percentage value explicitly. The bottom row shows the overall OA share of the sectors, the rightmost column the percentage of articles in the corresponding category regardless of the sector (on the national level). Ordering of the sectors is according to total publication output for the entire sector (highest: universities, lowest: Fraunhofer Society)."}
```{r fig8, fig.asp=1, fig.cap="OA shares per category and sector. Coloring and size of the points display the percentage in the respective category. Grey numbers display the percentage value explicitly. The bottom row shows the overall OA share of the sectors, the rightmost column the percentage of articles in the corresponding category regardless of the sector (on the national level). Ordering of the sectors is according to total publication output for the entire sector (highest: universities, lowest: Fraunhofer Society)."}
oa_shares_sector <-pubs_cat %>%
mutate(n_total = n_distinct(PK_ITEMS)) %>%
group_by(sec_abbr) %>%