An R package for building patient-level predictive models using data in the OMOP Common Data Model format.
- Takes a cohort and outcome of interest as input.
- Extracts the necessary data from a database in OMOP Common Data Model format.
- Uses a large set of covariates, including, for example, all drugs, diagnoses, and procedures, as well as age and comorbidity indexes.
- Various machine learning algorithms can be used to develop predictive models.
- Includes functions for evaluating predictive models.
- Includes functions to plot and explore model performance (ROC and calibration plots).
- Supported outcome models are L1-regularized logistic regression, random forest, gradient boosting machines, naive Bayes, KNN, and MLP.
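To give a feel for what L1-regularized logistic regression does, here is a minimal base R sketch using proximal gradient descent. It is an illustration only, not the package's fitting routine; the function names `soft_threshold` and `fit_l1_logistic`, the step size, and the simulated data are all assumptions made for this example.

```r
# Soft-thresholding: the proximal operator of the L1 penalty.
# It shrinks values toward zero and sets small ones exactly to zero.
soft_threshold <- function(x, threshold) sign(x) * pmax(abs(x) - threshold, 0)

# Hypothetical helper: fits a logistic regression with an L1 penalty on the
# coefficients (the intercept is not penalized).
fit_l1_logistic <- function(X, y, lambda, step = 0.1, iterations = 2000) {
  n <- nrow(X)
  beta <- rep(0, ncol(X))
  intercept <- 0
  for (i in seq_len(iterations)) {
    p <- 1 / (1 + exp(-(intercept + X %*% beta)))  # predicted risks
    residual <- as.vector(p - y)
    # Plain gradient step for the intercept.
    intercept <- intercept - step * mean(residual)
    # Gradient step followed by soft-thresholding for the coefficients.
    gradient <- as.vector(crossprod(X, residual)) / n
    beta <- soft_threshold(beta - step * gradient, step * lambda)
  }
  list(intercept = intercept, beta = beta)
}

# Simulated data: only the first two of five covariates affect the outcome.
set.seed(42)
n <- 500
X <- matrix(rnorm(n * 5), nrow = n)
y <- rbinom(n, 1, plogis(-1 + 1.5 * X[, 1] - 1.0 * X[, 2]))

fit <- fit_l1_logistic(X, y, lambda = 0.05)
print(round(fit$beta, 3))
```

Larger values of `lambda` push more coefficients to exactly zero, which is why this kind of penalty is attractive when the covariate set is very large.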
[Figures: calibration plot and ROC plot]
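The quantities behind an ROC plot and a calibration plot can be sketched in a few lines of base R. This is a simplified illustration on simulated predictions, not the package's own evaluation code; the helper names `auc` and `calibration_table` are assumptions made for this example.

```r
set.seed(1)
n <- 1000
risk <- runif(n)               # hypothetical predicted risks
outcome <- rbinom(n, 1, risk)  # outcomes simulated to agree with those risks

# Area under the ROC curve via the rank-sum (Mann-Whitney) identity:
# the probability that a random case is ranked above a random non-case.
auc <- function(prediction, outcome) {
  ranks <- rank(prediction)
  n_pos <- sum(outcome == 1)
  n_neg <- sum(outcome == 0)
  (sum(ranks[outcome == 1]) - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
}

# Calibration: compare mean predicted risk with the observed outcome rate
# within quantile bins of the predicted risk.
calibration_table <- function(prediction, outcome, bins = 10) {
  breaks <- quantile(prediction, probs = seq(0, 1, length.out = bins + 1))
  group <- cut(prediction, breaks = breaks, include.lowest = TRUE)
  data.frame(
    meanPredicted = as.vector(tapply(prediction, group, mean)),
    observedRate = as.vector(tapply(outcome, group, mean))
  )
}

print(auc(risk, outcome))
print(calibration_table(risk, outcome))
```

For a well-calibrated model, the two columns of the calibration table should be close to each other in every bin.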
PatientLevelPrediction is an R package, with some functions implemented in C++ and Python.
Requires R (version 3.3.0 or higher). Installation on Windows requires RTools. Libraries used in PatientLevelPrediction require Java and Python.
The Python installation is required for some of the machine learning algorithms. We advise installing Python 3.6 using Anaconda (https://www.continuum.io/downloads).
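Before installing, it can help to check these prerequisites from within R. The sketch below uses only base R. It merely checks whether executables are found on the path, and using `make` as a proxy for RTools on Windows is an assumption of this example rather than an official test.

```r
# Rough prerequisite check: R version plus Java and Python on the path.
prerequisites <- c(
  "R >= 3.3.0" = getRversion() >= "3.3.0",
  "Java on path" = nzchar(Sys.which("java"))[[1]],
  "Python on path" = nzchar(Sys.which("python"))[[1]]
)

# Assumption: on Windows, finding 'make' suggests that RTools is available.
if (.Platform$OS.type == "windows") {
  prerequisites <- c(
    prerequisites,
    "RTools (make) on path" = nzchar(Sys.which("make"))[[1]]
  )
}

print(prerequisites)
```

A `FALSE` entry only means the tool was not found on the path used by the current R session; it may still be installed elsewhere.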
- Cyclops
- DatabaseConnector
- SqlRender
- FeatureExtraction
- BigKnn
- On Windows, make sure RTools is installed.
- The DatabaseConnector and SqlRender packages require Java. Java can be downloaded from http://www.java.com.
- Random forest, naive Bayes, and MLP require Python 3.6. Python 3.6 can be downloaded from: https://www.continuum.io/downloads.
- In R, use the following commands to download and install PatientLevelPrediction:
install.packages("drat")
drat::addRepo("OHDSI")
install.packages("PatientLevelPrediction")
- We recommend testing your installation by running:
PatientLevelPrediction::checkPlpInstallation()
If the response is anything other than 1 (which indicates that everything works), pass the response number to:
PatientLevelPrediction::interpretInstallCode()
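The two calls above can be chained so that the response is interpreted automatically. This is a sketch under assumptions: the wrapper `check_plp_install` is a hypothetical helper, and it assumes that `interpretInstallCode()` accepts the response number as its first argument, as the instructions above suggest.

```r
# Hypothetical wrapper: run the installation check and, if the response is
# not 1, ask the package to interpret it.
check_plp_install <- function() {
  if (!requireNamespace("PatientLevelPrediction", quietly = TRUE)) {
    message("PatientLevelPrediction is not installed; nothing to check.")
    return(invisible(NA))
  }
  response <- PatientLevelPrediction::checkPlpInstallation()
  if (!isTRUE(all.equal(response, 1))) {
    # Assumption: the response number is passed as the first argument.
    PatientLevelPrediction::interpretInstallCode(response)
  }
  invisible(response)
}

result <- check_plp_install()
```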
Non-Windows users: the package uses Python to implement some of the classifiers, with the pythonInR package as the interface. On Linux and Mac OS, pythonInR uses the Python found on your path (the one that starts when you type the command python). Make sure this is the Anaconda Python rather than a default system Python, unless the default installation already has the required packages: numpy, scikit-learn, and TensorFlow are needed to run the patient-level prediction Python code.
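One way to confirm that the Python on your path has the required packages is to try importing them from R. The sketch below uses only base R; the helper `has_python_module` is a name made up for this example, and the import names `sklearn` and `tensorflow` are the usual Python module names for scikit-learn and TensorFlow.

```r
# Locate the Python that would be started by typing 'python'.
python <- Sys.which("python")[[1]]

# Hypothetical helper: TRUE if 'python -c "import <module>"' exits cleanly.
has_python_module <- function(module, python) {
  if (!nzchar(python)) {
    return(FALSE)  # no Python found on the path
  }
  status <- suppressWarnings(system2(
    python,
    c("-c", shQuote(paste("import", module))),
    stdout = FALSE,
    stderr = FALSE
  ))
  identical(as.integer(status), 0L)
}

modules <- c("numpy", "sklearn", "tensorflow")
available <- vapply(modules, has_python_module, logical(1), python = python)
print(python)
print(available)
```

If any entry is `FALSE`, the Python on the path is probably not the Anaconda installation, or the package still needs to be installed into it.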
Note that for testing you can simulate a random plpData object using the following code:
set.seed(1234)
data(plpDataSimulationProfile)
sampleSize <- 2000
plpData <- PatientLevelPrediction::simulatePlpData(plpDataSimulationProfile, n = sampleSize)
Have a look at the video below for an extensive demo of the package.
- Vignette: Building patient-level predictive models
- Vignette: Adding existing models
- Developer questions/comments/feedback: OHDSI Forum
- We use the GitHub issue tracker for all bugs/issues/enhancements
PatientLevelPrediction is licensed under Apache License 2.0
PatientLevelPrediction is being developed in RStudio.
Beta
- This project is supported in part through the National Science Foundation grant IIS 1251151.