The famous Titanic dataset was used for exploration, preparation and modelling with a Logistic Regression model.

Four feature selection techniques were used to choose the best features to include in the model, namely:
- RFE (Recursive Feature Elimination)
- Decision Tree Feature Selection
- Correlation Analysis
- Coefficient Feature Importance (using Logistic Regression)
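
The four techniques above can be sketched with scikit-learn roughly as follows. This is a minimal illustration, not the repository's actual code: it assumes a synthetic dataset from `make_classification` in place of the prepared Titanic features, and the column names (`f0` to `f7`) and the choice of four features for RFE are hypothetical.

```python
import numpy as np
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the prepared Titanic features (assumption).
X_arr, y_arr = make_classification(
    n_samples=500, n_features=8, n_informative=3, n_redundant=2, random_state=0
)
cols = [f"f{i}" for i in range(X_arr.shape[1])]  # hypothetical feature names
# Standardize so that coefficient magnitudes are comparable across features.
X = pd.DataFrame(StandardScaler().fit_transform(X_arr), columns=cols)
y = pd.Series(y_arr, name="target")

# 1. RFE: repeatedly fit the model and drop the weakest feature.
rfe = RFE(LogisticRegression(max_iter=1000), n_features_to_select=4).fit(X, y)
rfe_selected = list(X.columns[rfe.support_])

# 2. Decision tree: impurity-based feature importances.
tree = DecisionTreeClassifier(random_state=0).fit(X, y)
tree_importance = pd.Series(tree.feature_importances_, index=cols)

# 3. Correlation analysis: absolute correlation of each feature with the target.
target_corr = X.corrwith(y).abs()

# 4. Coefficient importance: absolute logistic regression coefficients.
logit = LogisticRegression(max_iter=1000).fit(X, y)
coef_importance = pd.Series(np.abs(logit.coef_[0]), index=cols)

print("RFE selected:", rfe_selected)
print(tree_importance.sort_values(ascending=False).head(4))
print(target_corr.sort_values(ascending=False).head(4))
print(coef_importance.sort_values(ascending=False).head(4))
```

The four rankings will not always agree, so one reasonable approach is to keep the features that several of the methods rank highly.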

The model obtained an accuracy of 100% after the pre-processing required by Logistic Regression, which included:
- Removing outliers.
- Removing multicollinearity - The model assumes that the feature variables are not highly correlated with each other. Highly correlated features should be removed.
- Checking the linearity assumption - Feature variables need to have a linear relationship with the log-odds of the target variable. A log transformation is applied to a feature when that relationship is not present.
- Checking for normal distribution - Feature variables are expected to be approximately normally distributed. If they are not, a log or Box-Cox transform is applied to bring them closer to a normal distribution.
- Feature scaling - The features must be scaled because they may not share the same range of values, which lets features with large values dominate the model and appear more important than the others. Scaling brings them to the same range and therefore gives each feature an equal chance to contribute to the model.
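
The pre-processing steps above can be sketched as follows. This is a hedged illustration rather than the repository's code: the data is synthetic, the column names (`fare`, `fare_with_tax`, `age`) are hypothetical, and the IQR rule, the 0.9 correlation threshold and the use of `log1p` are assumptions chosen for the example.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler

# Synthetic data with a skewed column and a near-duplicate column (assumption).
rng = np.random.default_rng(0)
n = 300
fare = rng.lognormal(mean=3.0, sigma=1.0, size=n)
df = pd.DataFrame({
    "fare": fare,
    "fare_with_tax": fare * 1.1 + rng.normal(0, 0.5, size=n),
    "age": rng.normal(30, 12, size=n).clip(1, 80),
})

def remove_outliers_iqr(frame, column, k=1.5):
    """Keep rows whose value lies within k * IQR of the quartiles."""
    q1, q3 = frame[column].quantile([0.25, 0.75])
    iqr = q3 - q1
    return frame[frame[column].between(q1 - k * iqr, q3 + k * iqr)]

def drop_correlated(frame, threshold=0.9):
    """Drop the later column of every pair with |correlation| above threshold."""
    corr = frame.corr().abs()
    upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))
    to_drop = [c for c in upper.columns if (upper[c] > threshold).any()]
    return frame.drop(columns=to_drop)

# 1. Remove outliers, 2. remove multicollinearity.
cleaned = remove_outliers_iqr(df, "age")
reduced = drop_correlated(cleaned)

# 3. Log transform the right-skewed feature to reduce its skew.
skew_before = reduced["fare"].skew()
transformed = reduced.assign(fare=np.log1p(reduced["fare"]))
skew_after = transformed["fare"].skew()

# 4. Scale every feature to zero mean and unit variance.
scaled = pd.DataFrame(
    StandardScaler().fit_transform(transformed),
    columns=transformed.columns,
    index=transformed.index,
)

print("remaining columns:", list(scaled.columns))
print("fare skew before/after log:", skew_before, skew_after)
```

In a real train/test workflow the scaler would be fitted on the training split only and then applied to the test split, so that no information from the test set leaks into the model.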
Mschlei-48/Titanic-Data-Exploration-Preparation-and-Modelling-
Titanic Dataset Exploration, Visualization and Modelling with Logistic Regression.