Data Management Hub is a distributed data management platform that enables secure collaboration on sensitive data and makes data public in a way that keeps it useful for decades. It will complement the existing networks of institutional repositories and make verifiable (research) data truly findable, accessible, interoperable and reusable by employing a distributed file system (IPFS) and a public blockchain, so that all changes are secured against falsification by digital signatures.
• Executive Summary (drafted 19 December 2016)
• Data Management Hub in 2 min video
• Presentation of Data Management Hub at the Academic Publishing in Europe conference - 10 min video (18 January 2017)
• slides
• DEMO
• Contact us and help to build Data Management Hub
We are living through a fundamental change, progressing from an era in which data was scarce and hard to access to an era in which data exceeds our ability to extract meaning from it.
Data fusion from different fields of science and across institutions is gaining importance in making new discoveries and replicating old ones [1].
Computer science is now called to action to develop new infrastructures which will make it possible to employ data exploration, analysis and mining techniques to extract insight from large datasets with the help of artificial intelligence [2]. Such infrastructures must be resilient, permanent and linked (read ‘useful’).
The explosion of data-intensive research is challenging researchers, publishers and librarians to create new solutions to link publications to research data (and vice versa), to facilitate data mining and to manage the dataset as a potential unit of publication.
Funders of academic research in Europe have consistently emphasized the importance of making research data available alongside original research articles as a means of comprehensive and sustainable reporting of research findings. As a result, a data management plan is mandatory for research grants issued under the FP7 framework agreement, and grantees are advised that “5% of total research expenditure should be spent on properly managing and 'stewarding' data in an integrated fashion.” [11]
With regard to clinical research in particular, there is growing consensus among key organisations about the societal relevance of shared research data: “The International Committee of Medical Journal Editors (ICMJE) believes that there is an ethical obligation to share data generated by interventional clinical trials responsibly because participants have put themselves at risk.” [10]
The Data Management Hub is a distributed data management platform which fits into researchers’ workflows, enables secure collaboration on sensitive data [4] and empowers dissemination of research outcomes so that data will remain useful for decades [5]. The Data Management Hub will link research data and publications permanently to each other.
The platform will support peer review, publication and re-use of open research data. In career terms, researchers who use the Data Management Hub will benefit from unambiguous attribution of data authorship.
The Data Management Hub platform authenticates and timestamps generated research data automatically, and gives scientists a state-of-the-art version management system in order to support replications¹ of scientific studies universally.
The Data Management Hub uses a hybrid topology, based on a distributed network of servers in the cloud and at universities. The platform is built upon two innovative openly licensed technologies:
- the blockchain - an immutable digital public ledger that is distributed, verified and monitored by multiple sources at the same time [6], and
- the Interplanetary File System (IPFS) - a distributed peer-to-peer hypermedia protocol [7].
Publications and underlying research data are stored on IPFS, which also connects all nodes. Each publication is addressed by a unique, immutable cryptographic hash that serves as its permanent URL / persistent identifier.
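The idea of addressing content by its hash can be illustrated with a minimal sketch. It is a simplification under stated assumptions: a plain hex SHA-256 digest stands in for the identifier, whereas IPFS itself wraps digests in its own content identifier format. The function name `content_id` and the sample data are hypothetical.

```python
import hashlib


def content_id(data: bytes) -> str:
    """Return a hex SHA-256 digest, used here as a simplified content identifier."""
    return hashlib.sha256(data).hexdigest()


# Hypothetical sample data: two files that differ in a single character.
original = b"temperature,status\n21.5,ok\n"
modified = b"temperature,status\n21.6,ok\n"

id_original = content_id(original)
id_modified = content_id(modified)

# Identical bytes always map to the same identifier ...
same_again = content_id(original) == id_original
# ... while any change to the bytes yields a different identifier.
differs = id_original != id_modified

print(id_original)
print(same_again, differs)
```

Because the identifier is derived from the bytes alone, anyone who retrieves the data can recompute the digest and check that it matches the address they asked for.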
Permissioned blockchains are immutable records of the transaction history performed by a predefined list of subjects with known identities. Due to their distributed nature, blockchains provide a built-in means of recovery from database corruption and a mechanism for definitive data verification.
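Why such a record resists silent modification can be sketched with a hash-chained list, in which every entry commits to the hash of the entry before it. This is only an illustration of tamper evidence, not of the platform's actual ledger: consensus between nodes and digital signatures are left out, and the record layout (`author`, `content_id`, `prev`, `hash`) is an assumption made for the example.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder "previous hash" for the first record


def entry_hash(body: dict) -> str:
    """Hash the canonical JSON form of a record body (hypothetical layout)."""
    payload = json.dumps(body, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def append(ledger: list, author: str, content_id: str) -> None:
    """Append a record that commits to the hash of the previous record."""
    prev = ledger[-1]["hash"] if ledger else GENESIS
    body = {"author": author, "content_id": content_id, "prev": prev}
    ledger.append({**body, "hash": entry_hash(body)})


def verify(ledger: list) -> bool:
    """Recompute every hash and check each link to the previous record."""
    prev = GENESIS
    for record in ledger:
        body = {key: record[key] for key in ("author", "content_id", "prev")}
        if record["prev"] != prev or record["hash"] != entry_hash(body):
            return False
        prev = record["hash"]
    return True


ledger = []
append(ledger, "alice", "id-of-dataset-v1")
append(ledger, "bob", "id-of-dataset-v2")
intact = verify(ledger)

# Altering an earlier record breaks the chain and is detected on verification.
tampered = [dict(record) for record in ledger]
tampered[0]["content_id"] = "forged"
tamper_detected = not verify(tampered)

print(intact, tamper_detected)
```

In a distributed setting, each node can run the same check on its own copy, so a corrupted copy can be identified and replaced from an intact one.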
The Data Management Hub can be seen as a two-module solution, consisting of a secure workspace and an open data repository:
- The secure workspace enables researchers to collaborate on sensitive data, with access restricted to authenticated users.
- The open data repository is where scientists publish their data, interact with others and share research outcomes. [4]
The distributed network of nodes will take care of open data publishing, archiving and interaction with data on a worldwide scale. Each file submitted to the network is given a unique cryptographic hash (a persistent identifier), which allows the IPFS network to deduplicate identical content automatically and to track the version history of every file. Historic versioning prevents information from being easily erased. Since files can be served by several distributed nodes at once, downloads can be faster. [8]
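How hash-based identifiers lead to deduplication and version tracking can be sketched with a toy in-memory store. It is a minimal illustration, not IPFS itself: the class `ContentStore`, its methods and the file names are hypothetical, and a plain SHA-256 digest stands in for the persistent identifier.

```python
import hashlib


class ContentStore:
    """Toy in-memory store: blocks keyed by digest, plus a version list per name."""

    def __init__(self) -> None:
        self.blocks = {}   # digest -> bytes
        self.history = {}  # name -> list of digests, oldest first

    def add(self, name: str, data: bytes) -> str:
        digest = hashlib.sha256(data).hexdigest()
        # Identical content maps to the same key, so it is stored only once.
        self.blocks.setdefault(digest, data)
        versions = self.history.setdefault(name, [])
        # Record a new version only if the content actually changed.
        if not versions or versions[-1] != digest:
            versions.append(digest)
        return digest

    def get(self, name: str, version: int = -1) -> bytes:
        """Return one version of a named file (the latest by default)."""
        return self.blocks[self.history[name][version]]


store = ContentStore()
store.add("trial.csv", b"a,b\n1,2\n")
store.add("copy-of-trial.csv", b"a,b\n1,2\n")  # duplicate content, no new block
store.add("trial.csv", b"a,b\n1,3\n")          # changed content, new version

print(len(store.blocks), len(store.history["trial.csv"]))  # prints: 2 2
```

Earlier versions stay retrievable, for example with `store.get("trial.csv", 0)`, because adding a new version never overwrites an existing block.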
The Data Management Hub is ideally positioned to support clinical researchers along the different steps of the research workflow. Our goal is to map out the specific data management needs and requirements of the digital biomarkers community and subsequently provide a tailored software solution that integrates with its specific research practices and publication routines.
The Data Management Hub allows researchers to manage research data in a seamless process from experimentation to publication. It enables scientists
- to keep sensitive data secure,
- to collaborate between research groups,
- to make research data FAIR (Findable, Accessible, Interoperable and Reusable) [13], and
- to effortlessly publish data together with original journal articles.
The Data Management Hub must be designed, developed and maintained according to the Principles for Open Scholarly Infrastructures [14].
In turn, the research data infrastructure enabled by the Data Management Hub will be permanent, persistent and linked, and will have transformative potential for the way we do science in the future.
Footnotes
- 1,500 scientists lift the lid on reproducibility, http://www.nature.com/news/1-500-scientists-lift-the-lid-on-reproducibility-1.19970 (cited 17 August 2017)