This repository has been archived by the owner on Mar 27, 2023. It is now read-only.
It just occurred to me during the call with @kjgarza that it might be a good idea to write down the draft design principles for hoad that we've been talking about.
There are three levels of user/target segmentation, which correspond to three levels of our code.

1. Distributed in-memory database
   This database should be as generic as possible, in the extreme case just duplicating the Crossref coverage, but with much better performance and support for arbitrary SQL/dplyr queries.
   Target: Analysts (us).
   Code:
   - setup of the database (currently Google BigQuery, maybe Azure Synapse)
   - batch jobs to seed the db with dumps and incremental updates
   - example queries

2. Domain-specific APIs
   Opinionated queries against 1 that yield domain-specific data objects (small enough to fit into laptop memory).
   A set of (multiple!) tidy data frames that make sense for hybrid open access uptake analysis, i.e. that make it possible to run the plots/analyses in 3.
   Target: R users interested in hybrid OA.
   Code:
   - dplyr/SQL queries against 1
   - additional on-client data wrangling
   - assertions and tests

3. Dashboard
   Views on the data in 2 to answer our business questions.
   Target: HOAD project stakeholders.
   Code:
   - plots (those are also part of the package proper)
   - dashboard (maybe modules are also part of the package)
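
To make the split between levels 1 and 2 a bit more concrete, here is a rough sketch. It is only an illustration: Python's built-in `sqlite3` stands in for the real database (BigQuery) and for dplyr, and the `works` table, its columns, and both function names are hypothetical. Treating "has a license" as "is open access" is a deliberate oversimplification.

```python
import sqlite3


def seed_generic_db():
    """Level 1 stand-in: a generic, query-anything table of works.

    The schema is hypothetical; the real database would mirror Crossref.
    """
    con = sqlite3.connect(":memory:")
    con.execute(
        "CREATE TABLE works ("
        "doi TEXT PRIMARY KEY, journal TEXT, year INTEGER, license TEXT)"
    )
    rows = [
        ("10.1/a", "Journal A", 2018, "cc-by"),
        ("10.1/b", "Journal A", 2018, None),
        ("10.1/c", "Journal A", 2019, "cc-by"),
        ("10.1/d", "Journal B", 2019, None),
    ]
    con.executemany("INSERT INTO works VALUES (?, ?, ?, ?)", rows)
    return con


def oa_uptake_by_journal_year(con):
    """Level 2: an opinionated query that yields one small, tidy table.

    One row per (journal, year), with the article count and the count of
    articles carrying a license (assumed here to mean open access).
    """
    cur = con.execute(
        """
        SELECT journal,
               year,
               COUNT(*) AS n_articles,
               SUM(CASE WHEN license IS NOT NULL THEN 1 ELSE 0 END) AS n_oa
        FROM works
        GROUP BY journal, year
        ORDER BY journal, year
        """
    )
    cols = [d[0] for d in cur.description]
    tidy = [dict(zip(cols, row)) for row in cur.fetchall()]

    # Assertions live next to the query, as in the level 2 code list.
    for row in tidy:
        assert 0 <= row["n_oa"] <= row["n_articles"]
    return tidy
```

Level 3 (plots and dashboard) would then only ever consume the output of functions like `oa_uptake_by_journal_year()`, never the generic table directly.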