ZuhalAmarkhil/ML-Predicting-Patients-Metastatic-Diagnosis-Period: This project uses Python and machine learning algorithms to analyze real-world healthcare data and predict patients' metastatic diagnosis periods.

Predicting Metastatic Diagnosis Period with Machine Learning

Description

This project analyzes a real-world evidence dataset from Health Verity (HV), one of the largest healthcare data ecosystems in the US, to predict each patient's Metastatic Diagnosis Period (metastatic_diagnosis_period). The HV dataset contains health-related information about patients in the US who were diagnosed with metastatic triple-negative breast cancer. It includes roughly 19k records, each row corresponding to a single patient and her metastatic diagnosis period.

The project covers data preprocessing, data exploration, feature engineering and selection, modeling to predict the metastatic diagnosis period, model evaluation, and writing the predictions to a CSV file.
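The modeling and evaluation steps can be sketched as a hold-out split followed by MAE and RMSE on the held-out rows. This is a minimal sketch, not the notebook's code: scikit-learn's GradientBoostingRegressor is swapped in as a stand-in for the CatBoost, LightGBM, and XGBoost regressors the project uses, and the feature columns ("age", "bmi") and synthetic data are hypothetical.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import mean_absolute_error, mean_squared_error
from sklearn.model_selection import train_test_split


def train_and_evaluate(X: pd.DataFrame, y: pd.Series, seed: int = 42):
    """Hold out 20% of rows, fit a gradient-boosted regressor, return (model, MAE, RMSE)."""
    X_train, X_val, y_train, y_val = train_test_split(
        X, y, test_size=0.2, random_state=seed
    )
    # Stand-in for CatBoostRegressor / LGBMRegressor / XGBRegressor (assumption).
    model = GradientBoostingRegressor(random_state=seed)
    model.fit(X_train, y_train)
    preds = model.predict(X_val)
    mae = mean_absolute_error(y_val, preds)
    # RMSE is the square root of the mean squared error.
    rmse = np.sqrt(mean_squared_error(y_val, preds))
    return model, mae, rmse


# Synthetic, hypothetical data; the real features come from train.csv.
rng = np.random.default_rng(0)
X = pd.DataFrame({
    "age": rng.integers(25, 90, size=500),
    "bmi": rng.normal(28, 5, size=500),
})
y = pd.Series(
    0.5 * X["age"] + rng.normal(0, 5, size=500),
    name="metastatic_diagnosis_period",
)

model, mae, rmse = train_and_evaluate(X, y)
print(f"MAE={mae:.2f}  RMSE={rmse:.2f}")
```

Reporting both metrics is useful because RMSE is never smaller than MAE, and a large gap between them points to a few large errors dominating the squared loss.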

Libraries Utilized

pandas: Used for data manipulation and analysis.
numpy: Utilized for numerical operations and array handling.
seaborn: For statistical data visualization.
matplotlib.pyplot: For creating visualizations.
sklearn.impute.SimpleImputer: To handle missing values.
sklearn.preprocessing.LabelEncoder: For encoding categorical labels.
sklearn.impute.KNNImputer: For imputing missing values using k-nearest neighbors.
matplotlib.colors.LinearSegmentedColormap: For defining custom colormaps.
sklearn.model_selection.train_test_split: To split the dataset into training and testing sets.
sklearn.metrics: For evaluating model performance with mean absolute error and mean squared error.
catboost.CatBoostRegressor: For modeling using gradient boosting with categorical features.
lightgbm.LGBMRegressor: For efficient gradient boosting modeling with large datasets.
xgboost: For optimized distributed gradient boosting modeling.
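The three preprocessing utilities listed above (SimpleImputer, LabelEncoder, KNNImputer) can be combined roughly as follows. This is a hedged sketch assuming categorical gaps are filled with the most frequent value, categories are then label-encoded, and numeric gaps are filled by k-nearest-neighbor imputation; the column names "patient_state", "age", and "bmi" and the helper `preprocess` are hypothetical, and the notebook's actual order of steps may differ.

```python
import numpy as np
import pandas as pd
from sklearn.impute import KNNImputer, SimpleImputer
from sklearn.preprocessing import LabelEncoder


def preprocess(df: pd.DataFrame, cat_cols, num_cols, n_neighbors: int = 5) -> pd.DataFrame:
    """Hypothetical helper: impute categoricals, label-encode them, KNN-impute numerics."""
    out = df.copy()

    # 1. Fill missing categorical values with each column's most frequent category.
    cat_imputer = SimpleImputer(strategy="most_frequent")
    out[cat_cols] = cat_imputer.fit_transform(out[cat_cols])

    # 2. Map each category string to an integer code (one encoder per column).
    for col in cat_cols:
        out[col] = LabelEncoder().fit_transform(out[col])

    # 3. Fill missing numeric values with the mean of the k nearest rows,
    #    measured with a NaN-aware Euclidean distance over the numeric columns.
    knn = KNNImputer(n_neighbors=n_neighbors)
    out[num_cols] = knn.fit_transform(out[num_cols])
    return out


# Tiny hypothetical frame with gaps in every column.
df = pd.DataFrame({
    "patient_state": ["CA", np.nan, "TX", "CA", "NY", "CA"],
    "age": [45, 60, np.nan, 52, 38, 70],
    "bmi": [27.5, np.nan, 31.0, 24.8, np.nan, 29.1],
})

clean = preprocess(df, ["patient_state"], ["age", "bmi"], n_neighbors=2)
print(clean)
```

scikit-learn documents LabelEncoder as an encoder for target labels; OrdinalEncoder is its feature-oriented counterpart and would be a reasonable drop-in for step 2.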

Project Structure

Plots and Charts/: Directory containing visualizations and plots generated during the analysis.
Metadata.md: Contains metadata information about the dataset.
README.md: Project documentation and overview.
analysis_prediction.ipynb: Jupyter notebook with data analysis, modeling, and prediction.
train.csv, test.csv: Datasets for training and testing the machine learning models. The training set (train.csv) is used to fit the models, and the test set (test.csv) is used to evaluate their performance.
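A possible shape for the train.csv/test.csv workflow, ending with the predictions written to a CSV file, is sketched below. It assumes a hypothetical identifier column named "patient_id" and reuses the target name metastatic_diagnosis_period from the dataset description; GradientBoostingRegressor again stands in for the project's boosting libraries, and the helper `fit_and_write_predictions` is illustrative only. Temporary files replace the real CSVs so the sketch runs on its own.

```python
import os
import tempfile

import numpy as np
import pandas as pd
from sklearn.ensemble import GradientBoostingRegressor

TARGET = "metastatic_diagnosis_period"
ID_COL = "patient_id"  # hypothetical identifier column (assumption)


def fit_and_write_predictions(train_path: str, test_path: str, out_path: str) -> pd.DataFrame:
    """Fit on train.csv, predict for test.csv, and write (ID, prediction) rows to out_path."""
    train = pd.read_csv(train_path)
    test = pd.read_csv(test_path)
    # Every column except the ID and the target is treated as a feature.
    features = [c for c in train.columns if c not in (TARGET, ID_COL)]
    model = GradientBoostingRegressor(random_state=0)  # stand-in regressor
    model.fit(train[features], train[TARGET])
    result = pd.DataFrame({
        ID_COL: test[ID_COL],
        TARGET: model.predict(test[features]),
    })
    result.to_csv(out_path, index=False)
    return result


# Demo with small synthetic files in a temporary directory.
rng = np.random.default_rng(1)
with tempfile.TemporaryDirectory() as tmp:
    train_path = os.path.join(tmp, "train.csv")
    test_path = os.path.join(tmp, "test.csv")
    out_path = os.path.join(tmp, "predictions.csv")

    pd.DataFrame({
        ID_COL: range(100),
        "age": rng.integers(25, 90, 100),
        TARGET: rng.integers(0, 365, 100),
    }).to_csv(train_path, index=False)
    pd.DataFrame({
        ID_COL: range(100, 120),
        "age": rng.integers(25, 90, 20),
    }).to_csv(test_path, index=False)

    preds = fit_and_write_predictions(train_path, test_path, out_path)
    written = pd.read_csv(out_path)

print(written.head())
```

Because the feature list is taken from train.csv and the target is excluded, the same code works whether or not test.csv also carries the target column; if it does, the written predictions can be scored against it with the metrics from sklearn.metrics.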
