BREAST CANCER PREDICTION
DATA INFORMATION
Breast cancer is the most common cancer among women and the second leading cause of cancer death in the United States. It occurs when breast tissue grows abnormally, forming what is generally called a tumor. Not all tumors are cancerous, however. A tumor can be one of the following:
Benign - Not cancerous
Pre-malignant - Pre-cancerous
Malignant - Cancerous
Breast cancer can be diagnosed through tests such as MRI, mammogram, ultrasound, and biopsy.
OBJECTIVE
The objective of this project is to predict whether a patient has breast cancer, using data collected during diagnosis of the disease. Because this is clinical data, the model needs to reach an accuracy of at least 95%. We start by analyzing the data and then apply machine learning techniques to build a model.
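To make the 95% target concrete, here is a minimal sketch of how accuracy could be computed and compared against that threshold. The function names `accuracy` and `meets_target`, the constant `TARGET_ACCURACY`, and the label lists are illustrative assumptions, not part of the project; the labels are made up and are not results from any model.

```python
TARGET_ACCURACY = 0.95  # threshold stated in the objective


def accuracy(y_true, y_pred):
    """Return the fraction of predictions that match the true labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("cannot compute accuracy on an empty list")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def meets_target(y_true, y_pred, target=TARGET_ACCURACY):
    """Check whether the accuracy is at least the target value."""
    return accuracy(y_true, y_pred) >= target


# Made-up labels for illustration only: 20 samples, 10 'M' and 10 'B'.
true_labels = ["M"] * 10 + ["B"] * 10
one_mistake = ["M"] * 9 + ["B"] * 11   # 1 of 20 wrong
two_mistakes = ["M"] * 8 + ["B"] * 12  # 2 of 20 wrong

print(meets_target(true_labels, one_mistake))
print(meets_target(true_labels, two_mistakes))
```

In practice the accuracy would be measured on samples held out from training, so that it reflects how the model behaves on patients it has not seen.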
DATA SET
The data set "cancer.csv" contains data on patients who were diagnosed for breast cancer. There are 569 samples, covering both malignant and benign tumor cells.
There are 32 columns and 569 records.
The first column holds the unique ID number of each sample.
The second column holds the diagnosis, with M representing 'Malignant' and B representing 'Benign'.
The remaining columns hold values computed from the cell nuclei, which we will use to build our cancer prediction model, i.e., to predict whether the tumor is benign or malignant.
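The column layout described above can be sketched as a small loader that separates IDs, labels, and feature values by position. This is only an illustration under stated assumptions: the function name `load_samples` is hypothetical, a header row is assumed to be present, and the two-row sample with columns `f1` and `f2` is made up to stand in for "cancer.csv". It uses Python's standard `csv` module; a data-frame library would work equally well.

```python
import csv
import io

# Map the diagnosis codes to numeric labels (1 = malignant, 0 = benign).
LABELS = {"M": 1, "B": 0}


def load_samples(csv_file, has_header=True):
    """Split each record into ID, label, and numeric features.

    Assumes column 0 is the sample ID, column 1 is 'M' or 'B',
    and every remaining column is a numeric measurement.
    """
    reader = csv.reader(csv_file)
    if has_header:
        next(reader, None)  # skip the header row (assumed to be present)
    ids, labels, features = [], [], []
    for row in reader:
        if not row:
            continue  # ignore blank lines
        ids.append(row[0])
        labels.append(LABELS[row[1].strip()])  # KeyError on an unknown code
        features.append([float(value) for value in row[2:]])
    return ids, labels, features


# Tiny made-up sample standing in for cancer.csv (values are illustrative).
demo = io.StringIO(
    "id,diagnosis,f1,f2\n"
    "101,M,17.5,10.2\n"
    "102,B,12.1,14.8\n"
)
ids, labels, features = load_samples(demo)
print(ids, labels, features)
```

With the real file, the same function could be called as `load_samples(open("cancer.csv", newline=""))`, which under the layout above would give 569 labels and 30 feature values per sample.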