diff --git a/My Project/.ipynb_checkpoints/09-11-2020 ML Course Nigeria Project 'Abdulhameed Araromi'-checkpoint.ipynb b/My Project/.ipynb_checkpoints/09-11-2020 ML Course Nigeria Project 'Abdulhameed Araromi'-checkpoint.ipynb new file mode 100644 index 0000000..5258d8a --- /dev/null +++ b/My Project/.ipynb_checkpoints/09-11-2020 ML Course Nigeria Project 'Abdulhameed Araromi'-checkpoint.ipynb @@ -0,0 +1,1696 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Project\n", + "\n", + "In this project, our aim is to build a model for predicting churn. Churn is the percentage of customers that stopped using your company's product or service during a certain time frame. Thus, in the given dataset, our label will be the `Churn` column.\n", + "\n", + "## Steps\n", + "- Read the `churn.csv` file and describe it.\n", + "- Make at least 4 different analyses in the Exploratory Data Analysis section.\n", + "- Pre-process the dataset to get it ready for ML application. (Check for missing data and handle it; do we need scaling, feature extraction, etc.?)\n", + "- Define an appropriate evaluation metric for our case (classification).\n", + "- Train and evaluate Logistic Regression, Decision Trees and one other appropriate algorithm which you can choose from the scikit-learn library.\n", + "- Is there any overfitting or underfitting? Interpret your results and, if there is a problem, try to overcome it in a new section.\n", + "- Create a confusion matrix for each algorithm and display Accuracy, Recall, Precision and F1-Score values.\n", + "- Analyse and compare the results of the 3 algorithms.\n", + "- Select the best performing model on the test dataset, based on the evaluation metric you chose.\n", + "\n", + "\n", + "Good luck :)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "

Abdulhameed Temitope Araromi

" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Data" + ] + }, + { + "cell_type": "code", + "execution_count": 1, + "metadata": {}, + "outputs": [], + "source": [ + "import pandas as pd\n", + "import seaborn as sns\n", + "import numpy as np\n", + "import matplotlib.pyplot as plt" + ] + }, + { + "cell_type": "code", + "execution_count": 2, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
   Churn  AccountWeeks  ContractRenewal  DataPlan  DataUsage  CustServCalls  DayMins  DayCalls  MonthlyCharge  OverageFee  RoamMins
0      0           128                1         1        2.7              1    265.1       110           89.0        9.87      10.0
1      0           107                1         1        3.7              1    161.6       123           82.0        9.78      13.7
2      0           137                1         0        0.0              0    243.4       114           52.0        6.06      12.2
3      0            84                0         0        0.0              2    299.4        71           57.0        3.10       6.6
4      0            75                0         0        0.0              3    166.7       113           41.0        7.42      10.1
" + ], + "text/plain": [ + " Churn AccountWeeks ContractRenewal DataPlan DataUsage CustServCalls \\\n", + "0 0 128 1 1 2.7 1 \n", + "1 0 107 1 1 3.7 1 \n", + "2 0 137 1 0 0.0 0 \n", + "3 0 84 0 0 0.0 2 \n", + "4 0 75 0 0 0.0 3 \n", + "\n", + " DayMins DayCalls MonthlyCharge OverageFee RoamMins \n", + "0 265.1 110 89.0 9.87 10.0 \n", + "1 161.6 123 82.0 9.78 13.7 \n", + "2 243.4 114 52.0 6.06 12.2 \n", + "3 299.4 71 57.0 3.10 6.6 \n", + "4 166.7 113 41.0 7.42 10.1 " + ] + }, + "execution_count": 2, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "# Read csv\n", + "data = pd.read_csv(\"churn.csv\")\n", + "data.head()" + ] + }, + { + "cell_type": "code", + "execution_count": 7, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n", + "RangeIndex: 3333 entries, 0 to 3332\n", + "Data columns (total 11 columns):\n", + " # Column Non-Null Count Dtype \n", + "--- ------ -------------- ----- \n", + " 0 Churn 3333 non-null int64 \n", + " 1 AccountWeeks 3333 non-null int64 \n", + " 2 ContractRenewal 3333 non-null int64 \n", + " 3 DataPlan 3333 non-null int64 \n", + " 4 DataUsage 3333 non-null float64\n", + " 5 CustServCalls 3333 non-null int64 \n", + " 6 DayMins 3333 non-null float64\n", + " 7 DayCalls 3333 non-null int64 \n", + " 8 MonthlyCharge 3333 non-null float64\n", + " 9 OverageFee 3333 non-null float64\n", + " 10 RoamMins 3333 non-null float64\n", + "dtypes: float64(5), int64(6)\n", + "memory usage: 286.6 KB\n" + ] + } + ], + "source": [ + "# Describe our data for each feature and use .info() for get information about our dataset\n", + "# Analys missing values\n", + "data.info()" + ] + }, + { + "cell_type": "code", + "execution_count": 8, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
             Churn  AccountWeeks  ContractRenewal     DataPlan    DataUsage  CustServCalls      DayMins     DayCalls  MonthlyCharge   OverageFee     RoamMins
count  3333.000000   3333.000000      3333.000000  3333.000000  3333.000000    3333.000000  3333.000000  3333.000000    3333.000000  3333.000000  3333.000000
mean      0.144914    101.064806         0.903090     0.276628     0.816475       1.562856   179.775098   100.435644      56.305161    10.051488    10.237294
std       0.352067     39.822106         0.295879     0.447398     1.272668       1.315491    54.467389    20.069084      16.426032     2.535712     2.791840
min       0.000000      1.000000         0.000000     0.000000     0.000000       0.000000     0.000000     0.000000      14.000000     0.000000     0.000000
25%       0.000000     74.000000         1.000000     0.000000     0.000000       1.000000   143.700000    87.000000      45.000000     8.330000     8.500000
50%       0.000000    101.000000         1.000000     0.000000     0.000000       1.000000   179.400000   101.000000      53.500000    10.070000    10.300000
75%       0.000000    127.000000         1.000000     1.000000     1.780000       2.000000   216.400000   114.000000      66.200000    11.770000    12.100000
max       1.000000    243.000000         1.000000     1.000000     5.400000       9.000000   350.800000   165.000000     111.300000    18.190000    20.000000
" + ], + "text/plain": [ + " Churn AccountWeeks ContractRenewal DataPlan DataUsage \\\n", + "count 3333.000000 3333.000000 3333.000000 3333.000000 3333.000000 \n", + "mean 0.144914 101.064806 0.903090 0.276628 0.816475 \n", + "std 0.352067 39.822106 0.295879 0.447398 1.272668 \n", + "min 0.000000 1.000000 0.000000 0.000000 0.000000 \n", + "25% 0.000000 74.000000 1.000000 0.000000 0.000000 \n", + "50% 0.000000 101.000000 1.000000 0.000000 0.000000 \n", + "75% 0.000000 127.000000 1.000000 1.000000 1.780000 \n", + "max 1.000000 243.000000 1.000000 1.000000 5.400000 \n", + "\n", + " CustServCalls DayMins DayCalls MonthlyCharge OverageFee \\\n", + "count 3333.000000 3333.000000 3333.000000 3333.000000 3333.000000 \n", + "mean 1.562856 179.775098 100.435644 56.305161 10.051488 \n", + "std 1.315491 54.467389 20.069084 16.426032 2.535712 \n", + "min 0.000000 0.000000 0.000000 14.000000 0.000000 \n", + "25% 1.000000 143.700000 87.000000 45.000000 8.330000 \n", + "50% 1.000000 179.400000 101.000000 53.500000 10.070000 \n", + "75% 2.000000 216.400000 114.000000 66.200000 11.770000 \n", + "max 9.000000 350.800000 165.000000 111.300000 18.190000 \n", + "\n", + " RoamMins \n", + "count 3333.000000 \n", + "mean 10.237294 \n", + "std 2.791840 \n", + "min 0.000000 \n", + "25% 8.500000 \n", + "50% 10.300000 \n", + "75% 12.100000 \n", + "max 20.000000 " + ] + }, + "execution_count": 8, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "data.describe()" + ] + }, + { + "cell_type": "code", + "execution_count": 9, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "Churn 0\n", + "AccountWeeks 0\n", + "ContractRenewal 0\n", + "DataPlan 0\n", + "DataUsage 0\n", + "CustServCalls 0\n", + "DayMins 0\n", + "DayCalls 0\n", + "MonthlyCharge 0\n", + "OverageFee 0\n", + "RoamMins 0\n", + "dtype: int64" + ] + }, + "execution_count": 9, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "data.isna().sum()" + ] + }, + { + 
"cell_type": "markdown", + "metadata": {}, + "source": [ + "# Exploratory Data Analysis" + ] + }, + { + "cell_type": "code", + "execution_count": 11, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "" + ] + }, + "execution_count": 11, + "metadata": {}, + "output_type": "execute_result" + }, + { + "data": { + "image/png": "iVBORw0KGgoAAAANSUhEUgAAAYsAAAEGCAYAAACUzrmNAAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjMuMSwgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy/d3fzzAAAACXBIWXMAAAsTAAALEwEAmpwYAAAP60lEQVR4nO3dcayd9V3H8fdnMBm6ESEUVtrOsqVTCyqEayXyh0yi1CWmbHNLMRuNErsQZkaymMD+ENQ0WSLbHHPDdBmDmm2k2YZUBSfD6VxkY7dLs9JiXR0Id630spmARtF2X/84T8NZe3p/p7c959z2vl/JyXnO93l+z/lecssnz/P8nuemqpAkaS6vmHQDkqSFz7CQJDUZFpKkJsNCktRkWEiSms6cdAOjcv7559fKlSsn3YYknVK2b9/+fFUtObJ+2obFypUrmZ6ennQbknRKSfJvg+qehpIkNRkWkqQmw0KS1GRYSJKaDAtJUpNhIUlqMiwkSU2GhSSpybCQJDWdtndwn6grfm/LpFvQArT9j2+YdAvSRHhkIUlqMiwkSU2GhSSpybCQJDUZFpKkJsNCktRkWEiSmgwLSVKTYSFJajIsJElNhoUkqcmwkCQ1GRaSpCbDQpLUZFhIkpoMC0lSk2EhSWoyLCRJTYaFJKnJsJAkNRkWkqSmkYVFkhVJvpzkySS7kry3q9+R5LtJdnSvN/eNuS3J3iR7klzbV78iyc5u3V1JMqq+JUlHO3OE+z4IvK+qvpnkNcD2JI906z5cVXf2b5xkNbAeuAS4CPhSkjdW1SHgbmAj8DXgIWAt8PAIe5ck9RnZkUVV7a+qb3bLLwJPAsvmGLIOuL+qXqqqp4C9wJokS4FzquqxqipgC3DdqPqWJB1tLNcskqwELge+3pXek+RbSe5Jcm5XWwY82zdspqst65aPrA/6no1JppNMz87OnswfQZIWtZGHRZJXA58HbqmqF+idUnoDcBmwH/jg4U0HDK856kcXqzZX1VRVTS1ZsuREW5ckdUYaFkleSS8oPl1VXwCoqueq6lBV/QD4BLCm23wGWNE3fDmwr6svH1CXJI3JKGdDBfgk8GRVfaivvrRvs7cAT3TL24D1Sc5KcjGwCni8qvYDLya5stvnDcCDo+pbknS0Uc6Gugp4F7AzyY6u9n7g+iSX0TuV9DTwboCq2pVkK7Cb3kyqm7uZUAA3AfcCZ9ObBeVMKEkao5GFRVV9lcHXGx6aY8wmYNOA+jRw6cnrTpJ0PLyDW5LUZFhIkpoMC0lSk2EhSWoyLCRJTYaFJKnJsJAkNRkWkqQmw0KS1GRYSJKaDAtJUpNhIUlqMiwkSU2GhSSpybCQJDUZFpKkJsNCktRkWEiSmgwLSVKTYSFJajIsJElNhoUkqcmwkCQ1GRaSpCbDQpLUZFhIkpoMC0lS08jCIsmKJF9O8mSSXUne29XPS/JIkm937+f2jbktyd4ke5Jc21e/IsnObt1dSTKqviVJRxvlkcVB4H1V9dPAlcDNSVYDtwKPVtUq4NHuM9269cAlwFrg40nO6PZ1N7ARWNW91o6wb0nSEUYWFlW1v6q+2S2/CDwJLAPWAfd1m90HXNctrwPur6qXquopYC+wJslS4JyqeqyqCtjSN0aSNAZjuWaRZCVwOfB14MKq2g+
9QAEu6DZbBjzbN2ymqy3rlo+sD/qejUmmk0zPzs6e1J9BkhazkYdFklcDnwduqaoX5tp0QK3mqB9drNpcVVNVNbVkyZLjb1aSNNBIwyLJK+kFxaer6gtd+bnu1BLd+4GuPgOs6Bu+HNjX1ZcPqEuSxmSUs6ECfBJ4sqo+1LdqG7ChW94APNhXX5/krCQX07uQ/Xh3qurFJFd2+7yhb4wkaQzOHOG+rwLeBexMsqOrvR/4ALA1yY3AM8DbAapqV5KtwG56M6lurqpD3bibgHuBs4GHu5ckaUxGFhZV9VUGX28AuOYYYzYBmwbUp4FLT153kqTj4R3ckqQmw0KS1GRYSJKaDAtJUpNhIUlqMiwkSU2GhSSpybCQJDUZFpKkJsNCktRkWEiSmgwLSVKTYSFJajIsJElNhoUkqcmwkCQ1GRaSpCbDQpLUZFhIkpoMC0lS01BhkeTRYWqSpNPTmXOtTPIq4EeB85OcC6RbdQ5w0Yh7kyQtEHOGBfBu4BZ6wbCdl8PiBeBjo2tLkrSQzBkWVfUR4CNJfreqPjqmniRJC0zryAKAqvpokl8EVvaPqaotI+pLkrSADBUWSf4ceAOwAzjUlQswLCRpERgqLIApYHVV1SibkSQtTMPeZ/EE8NpRNiJJWriGDYvzgd1Jvphk2+HXXAOS3JPkQJIn+mp3JPlukh3d6819625LsjfJniTX9tWvSLKzW3dXkhz5XZKk0Rr2NNQd89j3vcCfcvR1jQ9X1Z39hSSrgfXAJfSm6X4pyRur6hBwN7AR+BrwELAWeHge/UiS5mnY2VD/cLw7rqqvJFk55ObrgPur6iXgqSR7gTVJngbOqarHAJJsAa7DsJCksRr2cR8vJnmhe/1PkkNJXpjnd74nybe601TndrVlwLN928x0tWXd8pH1Y/W5Mcl0kunZ2dl5tidJOtJQYVFVr6mqc7rXq4C30TvFdLzupjcF9zJgP/DBrj7oOkTNUT9Wn5uraqqqppYsWTKP9iRJg8zrqbNV9RfAL89j3HNVdaiqfgB8AljTrZoBVvRtuhzY19WXD6hLksZo2Jvy3tr38RX07rs47nsukiytqv3dx7fQm5ILsA34TJIP0bvAvQp4vKoOdafArgS+DtwA+NgRSRqzYWdD/Xrf8kHgaXoXpY8pyWeBq+k9sXYGuB24Osll9ILmaXoPKqSqdiXZCuzu9n9zNxMK4CZ6M6vOpndh24vbkjRmw86G+q3j3XFVXT+g/Mk5tt8EbBpQnwYuPd7vlySdPMPOhlqe5IHuJrvnknw+yfL2SEnS6WDYC9yfondd4SJ6U1f/sqtJkhaBYcNiSVV9qqoOdq97AeemStIiMWxYPJ/knUnO6F7vBL43ysYkSQvHsGHx28A7gH+ndzPdbwDHfdFbknRqGnbq7B8BG6rqPwCSnAfcSS9EJEmnuWGPLH72cFAAVNX3gctH05IkaaEZNixe0ffQv8NHFsMelUiSTnHD/g//g8A/Jfkcvbuv38GAG+gkSaenYe/g3pJkmt7DAwO8tap2j7QzSdKCMfSppC4cDAhJWoTm9YhySdLiYlhIkpoMC0lSk2EhSWoyLCRJTYaFJKnJsJAkNRkWkqQmw0KS1GRYSJKaDAtJUpNhIUlqMiwkSU2GhSSpybCQJDUZFpKkppGFRZJ7khxI8kRf7bwkjyT5dvfe/3e9b0uyN8meJNf21a9IsrNbd1eSjKpnSdJgozyyuBdYe0TtVuDRqloFPNp9JslqYD1wSTfm40nO6MbcDWwEVnWvI/cpSRqxkYVFVX0F+P4R5XXAfd3yfcB1ffX7q+qlqnoK2AusSbIUOKeqHquqArb0jZEkjcm4r1lcWFX7Abr3C7r6MuDZvu1mutqybvnI+kBJNiaZTjI9Ozt7UhuXpMVsoVzgHnQdouaoD1RVm6tqqqqmlixZctKak6TFbtxh8Vx3aonu/UBXnwFW9G23HNjX1ZcPqEuSxmjcYbEN2NAtbwAe7KuvT3JWkovpXch
+vDtV9WKSK7tZUDf0jZEkjcmZo9pxks8CVwPnJ5kBbgc+AGxNciPwDPB2gKralWQrsBs4CNxcVYe6Xd1Eb2bV2cDD3UuSNEYjC4uquv4Yq645xvabgE0D6tPApSexNUnScVooF7glSQuYYSFJajIsJElNhoUkqcmwkCQ1GRaSpCbDQpLUZFhIkpoMC0lSk2EhSWoyLCRJTYaFJKnJsJAkNRkWkqQmw0KS1GRYSJKaDAtJUpNhIUlqMiwkSU2GhSSpybCQJDUZFpKkJsNCktRkWEiSmgwLSVKTYSFJajIsJElNEwmLJE8n2ZlkR5LprnZekkeSfLt7P7dv+9uS7E2yJ8m1k+hZkhazSR5ZvKmqLquqqe7zrcCjVbUKeLT7TJLVwHrgEmAt8PEkZ0yiYUlarBbSaah1wH3d8n3AdX31+6vqpap6CtgLrBl/e5K0eE0qLAr42yTbk2zsahdW1X6A7v2Crr4MeLZv7ExXO0qSjUmmk0zPzs6OqHVJWnzOnND3XlVV+5JcADyS5J/n2DYDajVow6raDGwGmJqaGriNJOn4TSQsqmpf934gyQP0Tis9l2RpVe1PshQ40G0+A6zoG74c2DfWhqUF5pk//JlJt6AF6HW/v3Nk+x77aagkP5bkNYeXgV8FngC2ARu6zTYAD3bL24D1Sc5KcjGwCnh8vF1L0uI2iSOLC4EHkhz+/s9U1d8k+QawNcmNwDPA2wGqaleSrcBu4CBwc1UdmkDfkrRojT0squo7wM8NqH8PuOYYYzYBm0bcmiTpGBbS1FlJ0gJlWEiSmgwLSVKTYSFJajIsJElNhoUkqcmwkCQ1GRaSpCbDQpLUZFhIkpoMC0lSk2EhSWoyLCRJTYaFJKnJsJAkNRkWkqQmw0KS1GRYSJKaDAtJUpNhIUlqMiwkSU2GhSSpybCQJDUZFpKkJsNCktRkWEiSmgwLSVKTYSFJajplwiLJ2iR7kuxNcuuk+5GkxeSUCIskZwAfA34NWA1cn2T1ZLuSpMXjlAgLYA2wt6q+U1X/C9wPrJtwT5K0aJw56QaGtAx4tu/zDPALR26UZCOwsfv4n0n2jKG3xeB84PlJN7EQ5M4Nk25BR/P387DbczL28hODiqdKWAz6L1BHFao2A5tH387ikmS6qqYm3Yc0iL+f43GqnIaaAVb0fV4O7JtQL5K06JwqYfENYFWSi5P8CLAe2DbhniRp0TglTkNV1cEk7wG+CJwB3FNVuybc1mLiqT0tZP5+jkGqjjr1L0nSDzlVTkNJkibIsJAkNRkWmpOPWdFCleSeJAeSPDHpXhYDw0LH5GNWtMDdC6yddBOLhWGhufiYFS1YVfUV4PuT7mOxMCw0l0GPWVk2oV4kTZBhobkM9ZgVSac/w0Jz8TErkgDDQnPzMSuSAMNCc6iqg8Dhx6w8CWz1MStaKJJ8FngM+MkkM0lunHRPpzMf9yFJavLIQpLUZFhIkpoMC0lSk2EhSWoyLCRJTYaFdAKSvDbJ/Un+NcnuJA8l2Zjkrybdm3QyGRbSPCUJ8ADw91X1hqpaDbwfuPAE93tK/LljLS7+Ukrz9ybg/6rqzw4XqmpHkh8HrknyOeBSYDvwzqqqJE8DU1X1fJIp4M6qujrJHcBFwErg+ST/ArwOeH33/idVddf4fjTph3lkIc3f4SAY5HLgFnp/B+T1wFVD7O8KYF1V/Wb3+aeAa+k9Kv72JK88oW6lE2BYSKPxeFXNVNUPgB30jhhatlXVf/d9/uuqeqmqngcOcIKnt6QTYVhI87eL3tHAIC/1LR/i5VO+B3n5392rjhjzX0PuQxo7w0Kav78DzkryO4cLSX4e+KU5xjzNywHzttG1Jp1choU0T9V7CudbgF/pps7uAu5g7r/58QfAR5L8I72jBemU4FNnJUlNHllIkpoMC0lSk2EhSWoyLCRJTYaFJKnJsJAkNRkWkqSm/wf03QODNr6OSgAAAABJRU5ErkJggg==\n", + "text/plain": [ 
+ "
" + ] + }, + "metadata": { + "needs_background": "light" + }, + "output_type": "display_data" + } + ], + "source": [ + "# Our label Distribution (countplot)\n", + "sns.countplot(data['Churn'])" + ] + }, + { + "cell_type": "code", + "execution_count": 12, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "" + ] + }, + "execution_count": 12, + "metadata": {}, + "output_type": "execute_result" + }, + { + "data": { + "image/png": "\n", + "text/plain": [ + "
" + ] + }, + "metadata": { + "needs_background": "light" + }, + "output_type": "display_data" + } + ], + "source": [ + "# Example EDA\n", + "sns.distplot(data.AccountWeeks)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "**Let us perform some analysis with the data's features**\n", + "* Group the data by whether the customer wil churn and analyse their different features to know more about how the data behave." + ] + }, + { + "cell_type": "code", + "execution_count": 35, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "" + ] + }, + "execution_count": 35, + "metadata": {}, + "output_type": "execute_result" + }, + { + "data": { + "image/png": "iVBORw0KGgoAAAANSUhEUgAAAYUAAAEGCAYAAACKB4k+AAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjMuMSwgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy/d3fzzAAAACXBIWXMAAAsTAAALEwEAmpwYAAAV9UlEQVR4nO3dfbCedZ3f8feHECEijAIRw4FswBPrBjtGezZDx5ku6mKI7TayO25jq2Ss0zhTiLF1OwP+o+xMHGfHh9JUXWJlpZ2tNPvgGJ+6C1GLTtWY0AgEpNwLCZyQJiGy5SE0Mcm3f5w7Fyfk5JxjkvtcJ7nfr5kz9/37Xdfvur+HOeRz/67HVBWSJAGc1XYBkqTpw1CQJDUMBUlSw1CQJDUMBUlS4+y2CzgZF198cc2bN6/tMiTptLJ58+anq2r2WMtO61CYN28emzZtarsMSTqtJNl+vGXuPpIkNQwFSVLDUJAkNQwFSVLDUJAkNQwFSVKjZ6GQ5NwkG5P8PMnWJLd2+z+ZZEeSLd2fd48ac0uSTpJHkizuVW2SpLH18jqF/cA7qur5JDOBHyX5bnfZ56vqM6NXTrIAWAZcBVwK3JPkDVV1qIc1ShrHmjVr6HQ6rdawY8cOAAYGBlqtA2BwcJCVK1e2XUZP9WymUCOe7zZndn/Ge3jDUuCuqtpfVY8DHWBRr+qTdHp48cUXefHFF9suo2/09IrmJDOAzcAg8IWq+mmSJcBNSW4ANgEfq6pngAHgJ6OGD3f7Xr7NFcAKgLlz5/ayfKnvTYdvxatWrQLgtttua7mS/tDTA81VdaiqFgKXAYuSvAn4EvB6YCGwE/hsd/WMtYkxtrm2qoaqamj27DFv3SFJOkFTcvZRVf0d8APguqra1Q2Lw8CXeWkX0TBw+ahhlwFPTUV9kqQRvTz7aHaSV3ffzwJ+B/hFkjmjVrseeLD7fj2wLMk5Sa4A5gMbe1WfJOlYvTymMAe4s3tc4SxgXVV9K8l/SbKQkV1D24APA1TV1iTrgIeAg8CNnnkkSVOrZ6FQVfcDbxmj/wPjjFkNrO5VTZKk8XlFsySpYShIkhqGgiSpYShIkhqGgiSpYShIkhqGgiSpYShIkhqGgiSpYSgIgL179/KRj3yEvXv3tl2KpBYZCgLg9ttv5/7772ft2rVtlyKpRYaC2Lt3L/fccw8Ad999t7MFqY8ZCuL222/n8OHDABw+fNjZgtTHDAWxYcOGo9pHZg2S+o+hIJKM25bUP3r5kB1N0po1a+h0Oq19/vnnn88zzzxzVPvIw9LbMDg4OC0eGC/1I
2cKYs6cOeO2JfUPZwrTwHT4Vnz99dfzzDPPsHjxYm655Za2y5HUEkNBwMjs4MCBA6xYsaLtUiS1qGe7j5Kcm2Rjkp8n2Zrk1m7/hUnuTvJo9/U1o8bckqST5JEki3tVm441c+ZMBgcHueiii9ouRVKLenlMYT/wjqp6M7AQuC7J1cDNwIaqmg9s6LZJsgBYBlwFXAd8McmMHtYnSXqZnoVCjXi+25zZ/SlgKXBnt/9O4D3d90uBu6pqf1U9DnSARb2qT5J0rJ6efZRkRpItwG7g7qr6KXBJVe0E6L6+trv6APDkqOHD3b6Xb3NFkk1JNu3Zs6eX5UtS3+lpKFTVoapaCFwGLErypnFWH+uKqRpjm2uraqiqhmbPnn2KKpUkwRRdp1BVfwf8gJFjBbuSzAHovu7urjYMXD5q2GXAU1NRnyRpRC/PPpqd5NXd97OA3wF+AawHlndXWw58o/t+PbAsyTlJrgDmAxt7VZ8k6Vi9vE5hDnBn9wyis4B1VfWtJD8G1iX5EPAE8F6AqtqaZB3wEHAQuLGqDvWwPknSy/QsFKrqfuAtY/TvBd55nDGrgdW9qkmSND7vfSRJahgKkqSGoSBJahgKkqSGoSBJahgKkqSGoSBJahgKkqSGoSBJahgKkqSGoSBJahgKkqRGL++SKukErVmzhk6n03YZ08KR/w6rVq1quZLpYXBwkJUrV/Zs+4aCNA11Oh0e3fq/mPsq7x7/il+N7NDYv31Ty5W074nnZ/T8MwwFaZqa+6pDfPytz7ZdhqaRT913Qc8/w2MKkqSGoSBJahgKkqSGoSBJavQsFJJcnuT7SR5OsjXJqm7/J5PsSLKl+/PuUWNuSdJJ8kiSxb2qTZI0tl6efXQQ+FhV3ZfkfGBzkru7yz5fVZ8ZvXKSBcAy4CrgUuCeJG+oKs/Jk6Qp0rOZQlXtrKr7uu+fAx4GBsYZshS4q6r2V9XjQAdY1Kv6JEnHmpJjCknmAW8BftrtuinJ/UnuSPKabt8A8OSoYcOMESJJViTZlGTTnj17elm2JPWdnodCklcBfwl8tKqeBb4EvB5YCOwEPntk1TGG1zEdVWuraqiqhmbPnt2boiWpT/U0FJLMZCQQ/qyq/gqgqnZV1aGqOgx8mZd2EQ0Dl48afhnwVC/rkyQdrZdnHwX4CvBwVX1uVP+cUatdDzzYfb8eWJbknCRXAPOBjb2qT5J0rF6effQ24APAA0m2dPs+DrwvyUJGdg1tAz4MUFVbk6wDHmLkzKUbPfNIkqZWz0Khqn7E2McJvjPOmNXA6l7VJEkan1c0S5IahoIkqWEoSJIahoIkqWEoSJIahoIkqWEoSJIahoIkqWEoSJIahoIkqWEoSJIavbwhnqQTtGPHDl54bgafuu+CtkvRNLL9uRmct2NHTz/DmYIkqeFMQZqGBgYG2H9wJx9/67Ntl6Jp5FP3XcA5A+M96v7kOVOQJDX6eqawZs0aOp1O22VMC0f+O6xatarlSqaHwcFBVq5c2XYZ0pTr61DodDpsefBhDr3ywrZLad1ZBwqAzY/tarmS9s3Y98u2S5BaM6lQSHIe8GJVHU7yBuCNwHer6lc9rW4KHHrlhbz4xne3XYamkVm/OO7DAaUz3mSPKdwLnJtkANgAfBD4aq+KkiS1Y7KhkKraB/wesKaqrgcWjDsguTzJ95M8nGRrklXd/guT3J3k0e7ra0aNuSVJJ8kjSRaf6C8lSToxkw6FJP8Q+BfAt7t9E+16Ogh8rKp+E7gauDHJAuBmYENVzWdk1nFz9wMWAMuAq4DrgC8mmfHr/DKSpJMz2VBYBdwCfL2qtia5Evj+eAOqamdV3dd9/xzwMDAALAXu7K52J/Ce7vulwF1Vtb+qHgc6wKJf43eRJJ2kSR1orqp7GTmucKT9GPCRyX5IknnAW4CfApdU1c7udnYmeW13tQHgJ6OGDXf7Xr6tFcAKgLlz5062BEnSJEz27KM3AH8IzBs9pqreMYmxrwL+EvhoVT2b5
LirjtFXx3RUrQXWAgwNDR2zXJJ04iZ7ncKfA38C/Cfg0GQ3nmQmI4HwZ1X1V93uXUnmdGcJc4Dd3f5h4PJRwy8DnprsZ0mSTt5kjykcrKovVdXGqtp85Ge8ARmZEnwFeLiqPjdq0Xpgeff9cuAbo/qXJTknyRXAfGDjpH8TSdJJm+xM4ZtJ/jXwdWD/kc6qGu/Sz7cBHwAeSLKl2/dx4NPAuiQfAp4A3tvd1tYk64CHGDlz6caqmvSsRJJ08iYbCke+2f+7UX0FXHm8AVX1I8Y+TgDwzuOMWQ2snmRNkqRTbLJnH13R60IkSe0bNxSSvKOqvpfk98ZaPurgsSTpDDDRTOG3ge8BvzvGsgIMBUk6g4wbClX1ie7rB6emHElSmybaffRvx1v+slNNJUmnuYl2H30G2AJ8l5FTUY97ObIk6fQ3USi8lZE7l/5jYDPwNUbucHpG3F5ix44dzNj3f32oio4yY99eduw42HYZUivGvaK5qrZU1c1VtZCRq5OXAg8l+adTUZwkaWpN9oZ4sxm5y+nfZ+QeRbvHH3F6GBgY4P/sP9vHceoos37xHQYGLmm7DKkVEx1o/iDwz4Bzgb8A/qCqzohAkCQda6KZwleABxi5R9Fi4F2jb31dVe5GkqQzyESh8PYpqUKSNC1MdPHa/wBI8k+A71TV4SmpSpLUisk+T2EZ8GiSP07ym70sSJLUnkmFQlW9n5Gzj/4W+NMkP06yIsn5Pa1OkjSlJjtToKqeZeTRmncBc4DrgfuSrOxRbZKkKTapUEjyu0m+zsgdU2cCi6pqCfBm4A97WJ8kaQpN9slr7wU+X1X3ju6sqn1J/uWpL0uS1IbJPnnthnGWbTh15UiS2jTZ3UdXJ/lZkueTHEhyKMmzE4y5I8nuJA+O6vtkkh1JtnR/3j1q2S1JOkkeSbL4xH8lSdKJmuzuo//IyGmpfw4MATcAgxOM+Wp33H9+Wf/nq+ozozuSLOhu/yrgUuCeJG+oqkOTrE864zzx/Aw+dd8FbZfRul37Rr67XvJKL5N64vkZzO/xZ0w2FKiqTpIZ3X+o/zTJ/5xg/XuTzJvk5pcCd1XVfuDxJB1gEfDjydYnnUkGByf6ztU/DnQ6AJzzG/43mU/v/zYmGwr7krwC2JLkj4GdwHkn+Jk3JbkB2AR8rKqeAQaAn4xaZ7jbd4wkK4AVAHPnzj3BEqTpbeVKz/Q+YtWqVQDcdtttLVfSHyZ7ncIHuuveBLwAXA78/gl83peA1wMLGQmWz3b7x3qi25gP8qmqtVU1VFVDs2fPPoESJEnHM9mzj7Z3n6lAVd16oh9WVbuOvE/yZeBb3eYwI0FzxGXAUyf6OZKkEzPR8xQCfIKRGUKAs5IcBNZU1R/9uh+WZE5V7ew2rweOnJm0HvivST7HyIHm+cDGX3f7J2LGvl/6OE7grP83cjLZ4XM9sDlj3y8BH7Kj/jTRTOGjwNuA36qqxwGSXAl8Kcm/qarPH29gkq8B1wAXJxlmJFyuSbKQkV1D24APA1TV1iTrgIeAg8CNU3HmkQfzXtLpPAfA4JX+YwiX+LehvjVRKNwAXFtVTx/pqKrHkrwf+BvguKFQVe8bo/sr46y/Glg9QT2nlAfzXuLBPEkw8YHmmaMD4Yiq2sPIPZAkSWeQiULhwAkukySdhibaffTm49zOIsC5PahHktSiiR7HOWOqCpEktW/SD9mRJJ35DAVJUsNQkCQ1DAVJUsNQkCQ1DAVJUsNQkCQ1DAVJUsNQkCQ1DAVJUsNQkCQ1DAVJUsNQkCQ1DAVJUsNQkCQ1ehYKSe5IsjvJg6P6Lkxyd5JHu6+vGbXsliSdJI8kWdyruiRJx9fLmcJXgete1nczsKGq5gMbum2SLACWAVd1x3wxiQ/4kaQp1rNQqKp7gV++rHspcGf3/Z3Ae0b131VV+6vqcaADLOpVbZKksU31MYVLqmonQPf1td3+AeDJUesNd/skSVNouhxozhh9NeaKyYokm5Js2rNnT4/Lk
qT+MtWhsCvJHIDu6+5u/zBw+aj1LgOeGmsDVbW2qoaqamj27Nk9LVaS+s1Uh8J6YHn3/XLgG6P6lyU5J8kVwHxg4xTXJkl97+xebTjJ14BrgIuTDAOfAD4NrEvyIeAJ4L0AVbU1yTrgIeAgcGNVHepVbZKksfUsFKrqfcdZ9M7jrL8aWN2reiRJE5suB5olSdOAoSBJahgKkqSGoSBJahgKkqSGoSBJahgKkqSGoSBJahgKkqSGoSBJahgKkqSGoSBJahgKkqSGoSBJahgKkqSGoSBJahgKkqSGoSBJahgKkqSGoSBJapzdxocm2QY8BxwCDlbVUJILgf8GzAO2AX9QVc+0UZ8k9as2Zwpvr6qFVTXUbd8MbKiq+cCGbluSNIWm0+6jpcCd3fd3Au9prxRJ6k9thUIBf5Nkc5IV3b5LqmonQPf1tWMNTLIiyaYkm/bs2TNF5UpSf2grFN5WVW8FlgA3JvlHkx1YVWuraqiqhmbPnt27CvvMvn37eOCBB+h0Om2XIqlFrYRCVT3Vfd0NfB1YBOxKMgeg+7q7jdr61fbt2zl8+DC33npr26VIatGUn32U5DzgrKp6rvv+XcAfAeuB5cCnu6/fmOra2rJmzZpWv6Hv27ePAwcOAPDkk0+yYsUKZs2a1Vo9g4ODrFy5srXPl/pZGzOFS4AfJfk5sBH4dlX9d0bC4NokjwLXdtuaAtu3bz+qvW3btnYKkdS6KZ8pVNVjwJvH6N8LvHOq65kO2v5WfM011xzVPnDgALfddls7xUhq1XQ6JVWS1DJDQZLUMBQkSQ1DQZLUMBTEeeedN25bUv8wFMQLL7wwbltS/zAUxNlnnz1uW1L/MBTEwYMHx21L6h+Ggpg3b964bUn9w1AQN9xww1Ht5cuXt1SJpLYZCuKOO+4Yty2pfxgKYnh4+Kj2k08+2VIlktpmKIgk47Yl9Q9DQVx99dXjtiX1D0NBnH/++Ue1L7jggpYqkdQ2Q0H88Ic/PKp97733tlSJpLYZCuKiiy4aty2pfxgKYufOneO2JfUPQ0GS1Jh2oZDkuiSPJOkkubntevrBpZdeOm5bUv+YVqGQZAbwBWAJsAB4X5IF7VZ15tuzZ8+4bUn9Y7rdI3kR0KmqxwCS3AUsBR5qtaoz3Ote9zq2bdt2VFsCWLNmDZ1Op9Uajnz+qlWrWq0DYHBwkJUrV7ZdRk9Nq5kCMACMvsfCcLevkWRFkk1JNvmN9tTYtWvXuG2pTbNmzWLWrFltl9E3pttMYaz7K9RRjaq1wFqAoaGhGmN9/ZquvfZavvnNb1JVJOFd73pX2yVpmjjTvxXrWNNtpjAMXD6qfRnwVEu19I3ly5c3T1ubOXPmMbfSltQ/plso/AyYn+SKJK8AlgHrW67pjHfRRRexZMkSkrBkyRIvXpP62LTafVRVB5PcBPw1MAO4o6q2tlxWX1i+fDnbtm1zliD1uVSdvrvlh4aGatOmTW2XIUmnlSSbq2porGXTbfeRJKlFhoIkqWEoSJIahoIkqXFaH2hOsgfY3nYdZ5CLgafbLkIag3+bp9ZvVNXssRac1qGgUyvJpuOdkSC1yb/NqePuI0lSw1CQJDUMBY22tu0CpOPwb3OKeExBktRwpiBJahgKkqSGoSCSXJfkkSSdJDe3XY90RJI7kuxO8mDbtfQLQ6HPJZkBfAFYAiwA3pdkQbtVSY2vAte1XUQ/MRS0COhU1WNVdQC4C1jack0SAFV1L/DLtuvoJ4aCBoAnR7WHu32S+pChoIzR53nKUp8yFDQMXD6qfRnwVEu1SGqZoaCfAfOTXJHkFcAyYH3LNUlqiaHQ56rqIHAT8NfAw8C6qtrablXSiCRfA34M/L0kw0k+1HZNZzpvcyFJajhTkCQ1DAVJUsNQkCQ1DAVJUsNQkCQ1DAVpAklel+SuJH+b5KEk30myIsm32q5NOtUMBWkcSQJ8HfhBVb2+qhYAHwcuOcntnn0q6pNONf8wpfG9HfhVVf3JkY6q2pLk1
cA7k/wF8CZgM/D+qqok24Chqno6yRDwmaq6JskngUuBecDTSf43MBe4svv676vqP0zdryYdy5mCNL4j/+CP5S3ARxl5DsWVwNsmsb1/ACytqn/ebb8RWMzILcw/kWTmSVUrnSRDQTpxG6tquKoOA1sYmQFMZH1VvTiq/e2q2l9VTwO7OcndUtLJMhSk8W1l5Nv9WPaPen+Il3bHHuSl/7fOfdmYFya5DakVhoI0vu8B5yT5V0c6kvwW8NvjjNnGS0Hy+70rTTr1DAVpHDVyx8jrgWu7p6RuBT7J+M+cuBW4LckPGfn2L502vEuqJKnhTEGS1DAUJEkNQ0GS1DAUJEkNQ0GS1DAUJEkNQ0GS1Pj/DTAwv91M99AAAAAASUVORK5CYII=\n", + "text/plain": [ + "
" + ] + }, + "metadata": { + "needs_background": "light" + }, + "output_type": "display_data" + } + ], + "source": [ + "data_churn = data.groupby('Churn').get_group(1)\n", + "data_no_churn = data.groupby('Churn').get_group(0)\n", + "\n", + "#Check how the DayMins columns for customer that churn vs those that didnt churn varies using boxplot\n", + "#sns.boxplot('DayCalls',data = data_churn)\n", + "sns.boxplot('Churn','DayMins', data = data)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "From the boxplot above, it seems that customer that churn tends to have lower **DayMins** rate than those that wont churn. Although the **DayMins** minimum is significantly low which might not be expected if our assumption that customer customer with lower **DayMins** tends to churn, although detailed explanation about what DayMins mean was not provided. Let continue our comparison and see customer behavior as regards **DataUsage**" + ] + }, + { + "cell_type": "code", + "execution_count": 37, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "" + ] + }, + "execution_count": 37, + "metadata": {}, + "output_type": "execute_result" + }, + { + "data": { + "image/png": 
"iVBORw0KGgoAAAANSUhEUgAAAXgAAAEGCAYAAABvtY4XAAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjMuMSwgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy/d3fzzAAAACXBIWXMAAAsTAAALEwEAmpwYAAAT+ElEQVR4nO3df3BcV3nG8efVDztxAnWyUVzHQRjqFpqhkB8igcmQNrFkZCCGlmlLmGC1hcozgG0oLaUMQ0mnpTNAW2yFTkcTCHJDk8GlaQlNhGXTOM5AATm4cUJSoqZyG9lxpI3dhNhWJO3bP3ZlS4p8tdj33Cud/X5mNNa72r33jWb9+OTsveeYuwsAEJ+6vBsAAIRBwANApAh4AIgUAQ8AkSLgASBSDXk3MNVFF13kK1euzLsNAFgw9u7dO+LuTbP9bF4F/MqVK9Xf3593GwCwYJjZgdP9jCkaAIgUAQ8AkSLgASBSBDwARIqAB5CpYrGoTZs2qVgs5t1K9Ah4AJnq6enR/v37tW3btrxbiR4BDyAzxWJRvb29cnf19vYyig+MgAeQmZ6eHpVKJUnSxMQEo/jACHgAmdm5c6fGx8clSePj4+rr68u5o7gR8AAy09raqoaG8g30DQ0Namtry7mjuBHwADLT0dGhurpy7NTX12v9+vU5dxQ3Ah5AZgqFgtrb22Vmam9vV6FQyLulqM2rxcYAxK+jo0ODg4OM3jNAwAPIVKFQ0NatW/NuoyYwRQMAkSLgASBSBDwARIqAB4BIEfAAECkCHgAiFfQySTMblPS8pAlJ4+7eEvJ8AIBTsrgO/np3H8ngPACAKZiiAYBIhQ54l7TDzPaaWedsTzCzTjPrN7P+4eHhwO0AQO0IHfDXuvuVktZK+pCZXTfzCe7e7e4t7t7S1NQUuB0AqB1BA97dD1b+fEbS3ZKuDnk+AMApwQLezM4zs5dNfi9pjaRHQp0PADBdyKtolkm628wmz/MP7t4b8HwAgCmCBby7PynpDaGODwBIxmWSABApAh4AIsWOTinr6urSwMBArj0MDQ1JklasWJFrH5K0atUqbdy4Me82gJpEwEfo+PHjebcAYB4g4FM2H0armzdvliRt2bIl504A5Ik5eACZKhaL2rRpk4rFYt6tRI+AB5Cp7u5uPfzww+ru7s67legR8AAyUywW1dfXJ0nq6+tjFB8YAQ8gM93d3SqVSpKkUqnEKD4wAh5AZnbt2pVYI10EPIDMuHtijXQR8AAys3r16ml1a2trTp3UBgIeQGY2bNigurpy7NTV1amzc9aN3pASAh5AZgqFwslRe1tbmwqFQs4dxY07WQFkasOGDXr66acZvWeAgAeQqUKhoK1bt+bdRk1gigYAIkXAA0CkCHgAmWKxsewQ8AAy1dPTo/3792vbtm15txI9Ah5AZorFonp7e+Xu6u3tZRQfGAEPIDM9PT0nFxubmJhgFB8YAQ8gMzt37tT4+LgkaXx8/OTSwQiDgAeQmbe85S2JNdJFwAPIzIkTJ6bVo6OjOXVSGwh4AJl58MEHp9V79uzJqZPaQMADyIyZJdZIV/CAN7N6M/uRmX0r9LkAzG8z14OfWSNdWYzgN0t6LIPzAJjnOjs7WQ8+Q0ED3swulfR2SbeFPA+AhaFQKKipqUmS1NTUxHrwgYUewX9R0scllU73BDPrNLN+M+sfHh4O3A6APBWLRR0+fFiSdPjwYe5kDSxYwJvZOyQ94+57k57n7t3u3uLuLZP/sgOIU1dXV2KNdIUcwV8raZ2ZDUq6S9INZnZHwPMBmOd2796dWCNdwQLe3f/E3S9195WS3iPpO+5+c6jzAZj/3D2xRrq4Dh5AZhoaGhJrpCuT36673y/p/izOBWD+mlxo7HQ10sUIHkBmuJM1WwQ8gMwwB58tAh4AIkXAA0CkCHgAiBQBDyAzkwuNna5GuvjtAsjM5Ibbp6uRLgIeACJFwANApAh4AJlZunTptPqCCy7Ip5EaQcADyMzRo0en1Ue
OHMmnkRpBwANApAh4AJm5+OKLp9XLli3LqZPaQMADyMzM5YHr6+tz6qQ2EPAAMnPw4MHEGuki4AFkZuXKlYk10kXAA8jMpz71qcQa6aoq4K3sZjP7dKVuNrOrw7YGIDYzr3vnOviwqh3B/62kN0u6qVI/L+lLQToCEK2enp6TC4zV1dVp27ZtOXcUt2oD/hp3/5CkE5Lk7kckLQrWFYAo7dy58+QCY6VSSX19fTl3FLdqA37MzOoluSSZWZMkloED8DNpbW2dVre1teXUSW2oNuC3Srpb0sVm9heSHpT02WBdAYjSunXrptU33nhjTp3UhqoC3t2/Junjkv5S0iFJ73L37SEbAxCfO+64I7FGuhrmfopkZhdKekbSnVMea3T3sVCNAYjP7t27E2ukq9opmockDUv6iaQnKt//t5k9ZGZXhWoOQFzcPbFGuqoN+F5Jb3P3i9y9IGmtpK9L+qDKl1ACAOaZagO+xd2/PVm4+w5J17n7v0taHKQzANFZsmRJYo10VTUHL+lZM/tjSXdV6t+WdKRy6SSXSwKoyrFjxxJrpKvaEfx7JV0q6Z8l/Yuk5spj9ZJ+a7YXmNk5ZvYDM/sPM3vUzG5JoV8ACxiLjWWr2sskR9x9o7tf4e6Xu/uH3X3Y3V9094HTvGxU0g3u/gZJl0tqN7M3pdQ3gAVo5o1N7e3tOXVSG6pdbKzJzD5vZvea2Xcmv5Je42U/rZSNlS8+Mgdq2O233z6tvu2223LqpDZUO0XzNUmPS3qVpFskDUr64VwvMrN6M9un8jX0fe7+/Vme02lm/WbWPzw8XG3fABag8fHxxBrpqjbgC+7+ZUlj7r7b3X9P0pzTLe4+4e6Xqzx/f7WZvW6W53S7e4u7tzQ1Nf0svQMAElS92Fjlz0Nm9nYzu0Ll0K6Kux+VdL8kJtwAICPVBvyfm9nPSfqYpD+UdJukjya9oDJvv7Ty/bmSWlWe5gFQo84777zEGumq6jp4d/9W5dv/k3R9lcdeLqmncq18naSvTzkOgBrEHHy2qr2K5nNm9nIzazSzXWY2YmY3J73G3R+uXFb5end/nbv/WTotA1ioli9fnlgjXdVO0axx9+ckvUPSU5J+SdIfBesKQJQOHTqUWCNd1QZ8Y+XPt0m6092fDdQPgIjV19cn1khXtWvR3GNmj0s6LumDlS37ToRrC0CMWIsmW9UuVfAJSW9WeVXJMUnHJL0zZGMAgLOTOII3s9+Y8ZCb2Yikfe7+dLi2AMTIzKZt8mFmOXYTv7mmaGbbEfdCSa83s/e7e+J6NAAwVV1dnSYmJqbVCCcx4N39d2d73MxeqfKOTteEaApAnFavXq0dO3acrFtbW3PsJn5n9M+nux/QqStrAKAqa9asSayRrjMKeDN7jcrrvQNA1W699dZpdVdXV06d1Ia5PmS9Ry9dw/1ClZchSLyTFQBmGhwcTKyRrrk+ZP3CjNolFSU94e4vhmkJQKzOOeccnThxYlqNcOb6kHV3Vo0AiN/UcJ+tRrqqXWzsTWb2QzP7qZm9aGYTZvZc6OYAAGeu2g9Zb5V0k6QnJJ0r6QOS+HQEAOaxateikbsPmFm9u09Iut3MvhuwLwDAWao24I+Z2SJJ+8zsc5IOSWIrFgA/k/r6+ml3srKaZFjVTtG8r/LcD0t6QdIrJM1cpwYAEjU2Tr8/ctGiRTl1UhuqDfh3ufsJd3/O3W9x9z9QefMPAKjazKtmjh8/nlMntaHagO+Y5bHfSbEPAEDK5rqT9SZJ75X0KjP75pQfvUzlG54AoGosF5ytuT5k/a7KH6heJOmvpjz+vKSHQzUFIE5XXnml9u7de7K+6qqrcuwmfnPdyXpA0gGVd3MCgLMyc5PtgwcP5tRJbeBOVgCZmRnoBHxY3MkKAJGqej14dx+QVO/uE+5+u6Trw7UFIEbLly+fVl9yySU5dVIbuJMVQGaOHj06rT5y5Eg+jdSIs7mT9d2hmgIQp+uuuy6xRrqqGsG7+wEza6p8f0v
\n", + "text/plain": [ + "
" + ] + }, + "metadata": { + "needs_background": "light" + }, + "output_type": "display_data" + } + ], + "source": [ + "sns.boxplot(x='Churn', y='DataUsage', data=data)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "The boxplot above indicates that people with lower **DataUsage** tend not to churn, and there appear to be several outliers among people who do not churn and have high data usage. A detailed explanation of what **DataUsage** means was not given, so no firm conclusion can be drawn." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Pre-process Data (Decision Tree)\n", + "* Check for duplicate rows and remove them\n", + "* Split the data into train and test sets" + ] + }, + { + "cell_type": "code", + "execution_count": 48, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
" + ], + "text/plain": [ + " Churn AccountWeeks ContractRenewal DataPlan DataUsage \\\n", + "0 0 128 1 1 2.70 \n", + "1 0 107 1 1 3.70 \n", + "2 0 137 1 0 0.00 \n", + "3 0 84 0 0 0.00 \n", + "4 0 75 0 0 0.00 \n", + "... ... ... ... ... ... \n", + "3328 0 192 1 1 2.67 \n", + "3329 0 68 1 0 0.34 \n", + "3330 0 28 1 0 0.00 \n", + "3331 0 184 0 0 0.00 \n", + "3332 0 74 1 1 3.70 \n", + "\n", + " CustServCalls DayMins DayCalls MonthlyCharge OverageFee RoamMins \n", + "0 1 265.1 110 89.0 9.87 10.0 \n", + "1 1 161.6 123 82.0 9.78 13.7 \n", + "2 0 243.4 114 52.0 6.06 12.2 \n", + "3 2 299.4 71 57.0 3.10 6.6 \n", + "4 3 166.7 113 41.0 7.42 10.1 \n", + "... ... ... ... ... ... ... \n", + "3328 2 156.2 77 71.7 10.78 9.9 \n", + "3329 3 231.1 57 56.4 7.67 9.6 \n", + "3330 2 180.8 109 56.0 14.44 14.1 \n", + "3331 2 213.8 105 50.0 7.98 5.0 \n", + "3332 0 234.4 113 100.0 13.30 13.7 \n", + "\n", + "[3333 rows x 11 columns]" + ] + }, + "execution_count": 48, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "from sklearn.model_selection import train_test_split\n", + "# Drop exact duplicate rows, keeping the first occurrence of each\n", + "data_cl = data.drop_duplicates()\n", + "data_cl" + ] + }, + { + "cell_type": "code", + "execution_count": 72, + "metadata": {}, + "outputs": [], + "source": [ + "# Use the de-duplicated frame; features exclude Churn (label) and AccountWeeks\n", + "X, y = data_cl.iloc[:, 2:], data_cl.iloc[:, 0]\n", + "X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.3, random_state = 19, stratify = y)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Decision Tree Training and Evaluation" + ] + }, + { + "cell_type": "code", + "execution_count": 80, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Training accuracy: 0.9502786112301758\n", + "Test Accuracy: 0.922\n" + ] + } + ], + "source": [ + "from sklearn.tree import DecisionTreeClassifier\n", + "clf = DecisionTreeClassifier(max_depth = 6, random_state= 9)\n", + "clf.fit(X_train,
y_train)\n", + "\n", + "print(\"Training accuracy: \", clf.score(X_train, y_train))\n", + "print(\"Test Accuracy: \", clf.score(X_test, y_test))" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "The gap between training and test accuracy is about 0.03, which indicates that the model is not overfitting badly. However, the model is probably biased toward the no-churn label, since there are significantly more samples with label 0 than with label 1. Accuracy alone is therefore misleading here; recall or the macro-averaged F1 score is a better metric, because it shows how the model copes with the imbalanced dataset. Let's plot the confusion matrix to verify." + ] + }, + { + "cell_type": "code", + "execution_count": 82, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Accuracy score: 0.922\n", + "Precision score: 0.8819149990638664\n", + "Recall score: 0.7797136519459569\n", + "F1 score: 0.8192285229579775\n", + " precision recall f1-score support\n", + "\n", + " 0 0.93 0.98 0.96 855\n", + " 1 0.83 0.58 0.68 145\n", + "\n", + " accuracy 0.92 1000\n", + " macro avg 0.88 0.78 0.82 1000\n", + "weighted avg 0.92 0.92 0.92 1000\n", + "\n" + ] + } + ], + "source": [ + "from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score, classification_report,confusion_matrix\n", + "prob = clf.predict(X_test)\n", + "print(\"Accuracy score: \", accuracy_score(y_test, prob))\n", + "print(\"Precision score: \", precision_score(y_test, prob, average = 'macro'))\n", + "print(\"Recall score: \", recall_score(y_test, prob, average = 'macro'))\n", + "print(\"F1 score: \", f1_score(y_test, prob, average = 'macro'))\n", + "print(classification_report(y_test, prob))" + ] + }, + { + "cell_type": "code", + "execution_count": 85, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "Text(91.68, 0.5, 'Actual label')" + ] + }, + "execution_count": 85, + "metadata": {}, + "output_type": "execute_result" + }, + { + "data": { + "image/png":
"\n", + "text/plain": [ + "
" + ] + }, + "metadata": { + "needs_background": "light" + }, + "output_type": "display_data" + } + ], + "source": [ + "cm = confusion_matrix(y_test, prob)\n", + "ax = sns.heatmap(cm, square=True, annot= True, cbar = False)\n", + "ax.set_xlabel('Predicted label', fontsize = 15)\n", + "ax.set_ylabel('Actual label', fontsize = 15)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Since the churn class is under-represented, the model misclassifies 61 of the 145 churn cases as non-churn. Let's try XGBoost next and then tune its parameters." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### XGBOOST" + ] + }, + { + "cell_type": "code", + "execution_count": 86, + "metadata": {}, + "outputs": [], + "source": [ + "import xgboost as xgb" + ] + }, + { + "cell_type": "code", + "execution_count": 99, + "metadata": {}, + "outputs": [], + "source": [ + "dmatrix_train = xgb.DMatrix(data=X_train, label=y_train)\n", + "dmatrix_test = xgb.DMatrix(data=X_test, label=y_test)\n", + "\n", + "param = {'max_depth':6, \n", + " 'eta':0.3, \n", + " 'objective':'multi:softprob', \n", + " 'num_class':2}\n", + "\n", + "num_round = 6\n", + "model = xgb.train(param, dmatrix_train, num_round)\n", + "\n", + "preds = model.predict(dmatrix_test)\n", + "\n", + "# Pick the class with the highest predicted probability for each row\n", + "best_preds = np.argmax(preds, axis=1)" + ] + }, + { + "cell_type": "code", + "execution_count": 103, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Precision: 0.8803212544949875\n", + "Recall: 0.8216172615446662\n", + "Accuracy: 0.93\n", + " precision recall f1-score support\n", + "\n", + " 0 0.95 0.97 0.96 855\n", + " 1 0.82 0.67 0.73 145\n", + "\n", + " accuracy 0.93 1000\n", + " macro avg 0.88 0.82 0.85 1000\n", + "weighted avg 0.93 0.93 0.93 1000\n", + "\n" + ] + } + ], + "source": [ + "# metrics\n", + "print(\"Precision: \", (precision_score(y_test, best_preds, average='macro')))\n", + "print(\"Recall:
\",(recall_score(y_test, best_preds, average='macro')))\n", + "print(\"Accuracy: \", (accuracy_score(y_test, best_preds)))\n", + "print(classification_report(y_test, best_preds))" + ] + }, + { + "cell_type": "code", + "execution_count": 104, + "metadata": {}, + "outputs": [ + { + "data": { + "image/png": "\n", + "text/plain": [ + "
" + ] + }, + "metadata": { + "needs_background": "light" + }, + "output_type": "display_data" + } + ], + "source": [ + "cm = confusion_matrix(y_test, best_preds)\n", + "ax = sns.heatmap(cm, square=True, annot=True, cbar=False)\n", + "ax.set_xlabel('Predicted Labels',fontsize = 15)\n", + "ax.set_ylabel('True Labels',fontsize = 15)\n", + "plt.show()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "With XGBoost there is a slight trade-off in precision, but the recall and F1 scores have increased. Let's search for the best hyperparameters." + ] + }, + { + "cell_type": "code", + "execution_count": 106, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Tuned: {'eta': 0.05, 'gamma': 0.1, 'learning_rate': 0.01, 'max_depth': 5, 'min_child_weight': 1, 'n_estimators': 500}\n", + "Mean of the cv scores is 0.932706\n", + "Train Score 0.958423\n", + "Test Score 0.938000\n", + "Seconds used for refitting the best model on the train dataset: 3.063578\n" + ] + } + ], + "source": [ + "from xgboost.sklearn import XGBClassifier\n", + "from sklearn.model_selection import GridSearchCV, RandomizedSearchCV \n", + "\n", + "# Note: 'eta' is an alias of 'learning_rate' in XGBoost, so searching both is redundant\n", + "param_dict = {\n", + " 'eta': [0.05,0.10,0.20,0.25,0.30],\n", + " 'gamma': [0.0, 0.1, 0.2, 0.4],\n", + " 'max_depth':range(3,10,2),\n", + " 'min_child_weight':range(1,6,2),\n", + " 'learning_rate': [0.001,0.01,0.1,1],\n", + " 'n_estimators': [200,500,1000]\n", + " \n", + "}\n", + "\n", + "xgc = XGBClassifier()\n", + "\n", + "clf = GridSearchCV(xgc,param_dict,cv=2,n_jobs = -1).fit(X_train,y_train)\n", + "\n", + "print(\"Tuned: {}\".format(clf.best_params_)) \n", + "print(\"Mean of the cv scores is {:.6f}\".format(clf.best_score_))\n", + "print(\"Train Score {:.6f}\".format(clf.score(X_train,y_train)))\n", + "print(\"Test Score {:.6f}\".format(clf.score(X_test,y_test)))\n", + "print(\"Seconds used for refitting the best model on the train dataset: {:.6f}\".format(clf.refit_time_))" + ] + }, + { +
"cell_type": "code", + "execution_count": 113, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + " precision recall f1-score support\n", + "\n", + " 0 0.95 0.98 0.96 855\n", + " 1 0.87 0.68 0.76 145\n", + "\n", + " accuracy 0.94 1000\n", + " macro avg 0.91 0.83 0.86 1000\n", + "weighted avg 0.94 0.94 0.93 1000\n", + "\n" + ] + }, + { + "data": { + "text/plain": [ + "Text(91.68, 0.5, 'True Labels')" + ] + }, + "execution_count": 113, + "metadata": {}, + "output_type": "execute_result" + }, + { + "data": { + "image/png": "\n", + "text/plain": [ + "
" + ] + }, + "metadata": { + "needs_background": "light" + }, + "output_type": "display_data" + } + ], + "source": [ + "xgb_pred = clf.predict(X_test)\n", + "print(classification_report(y_test, xgb_pred))\n", + "cm = confusion_matrix(y_test, xgb_pred)\n", + "ax = sns.heatmap(cm, square=True, annot=True, cbar=False)\n", + "ax.set_xlabel('Predicted Labels',fontsize = 15)\n", + "ax.set_ylabel('True Labels',fontsize = 15)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Hyperparameter tuning was worth the time, as both the recall and F1 scores improved compared to choosing the hyperparameters manually." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Logistic Regression\n", + "Logistic regression requires standardized features, so we scale the dataset first." + ] + }, + { + "cell_type": "code", + "execution_count": 136, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "array([[0.41167182, 0.67648946, 0.32758048, ..., 1.99072703, 0.0715836 ,\n", + " 0.08500823],\n", + " [0.41167182, 0.14906505, 0.32758048, ..., 1.56451025, 0.10708191,\n", + " 1.24048169],\n", + " [0.41167182, 0.9025285 , 0.32758048, ..., 0.26213309, 1.57434567,\n", + " 0.70312091],\n", + " ...,\n", + " [0.41167182, 1.83505538, 0.32758048, ..., 0.01858065, 1.73094204,\n", + " 1.3837779 ],\n", + " [0.41167182, 2.08295458, 3.05268496, ..., 0.38390932, 0.81704825,\n", + " 1.87621082],\n", + " [0.41167182, 0.67974475, 0.32758048, ..., 2.66049626, 1.28129669,\n", + " 1.24048169]])" + ] + }, + "execution_count": 136, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "from scipy import stats\n", + "import numpy as np\n", + "z = np.abs(stats.zscore(data))\n", + "z" + ] + }, + { + "cell_type": "code", + "execution_count": 137, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "414\n" + ] + }, + { + "data": { + "text/html": [ + "
" + ], + "text/plain": [ + " index Churn AccountWeeks ContractRenewal DataPlan DataUsage \\\n", + "0 0 0 128 1 1 2.70 \n", + "1 1 0 107 1 1 3.70 \n", + "2 2 0 137 1 0 0.00 \n", + "3 6 0 121 1 1 2.03 \n", + "4 8 0 117 1 0 0.19 \n", + "... ... ... ... ... ... ... \n", + "2914 3327 0 79 1 0 0.00 \n", + "2915 3328 0 192 1 1 2.67 \n", + "2916 3329 0 68 1 0 0.34 \n", + "2917 3330 0 28 1 0 0.00 \n", + "2918 3332 0 74 1 1 3.70 \n", + "\n", + " CustServCalls DayMins DayCalls MonthlyCharge OverageFee RoamMins \n", + "0 1 265.1 110 89.0 9.87 10.0 \n", + "1 1 161.6 123 82.0 9.78 13.7 \n", + "2 0 243.4 114 52.0 6.06 12.2 \n", + "3 3 218.2 88 87.3 17.43 7.5 \n", + "4 1 184.5 97 63.9 17.58 8.7 \n", + "... ... ... ... ... ... ... \n", + "2914 2 134.7 98 40.0 9.49 11.8 \n", + "2915 2 156.2 77 71.7 10.78 9.9 \n", + "2916 3 231.1 57 56.4 7.67 9.6 \n", + "2917 2 180.8 109 56.0 14.44 14.1 \n", + "2918 0 234.4 113 100.0 13.30 13.7 \n", + "\n", + "[2919 rows x 12 columns]" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "text/plain": [ + "2919" + ] + }, + "execution_count": 137, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "# Treat any row with |z-score| > 3 in at least one column as an outlier\n", + "outliers = list(set(np.where(z > 3)[0]))\n", + "\n", + "print(len(outliers))\n", + "\n", + "new_data = data.drop(outliers,axis = 0).reset_index(drop = False)\n", + "display(new_data)\n", + "\n", + "y_new = y[list(new_data[\"index\"])]\n", + "len(y_new)" + ] + }, + { + "cell_type": "code", + "execution_count": 143, + "metadata": {}, + "outputs": [], + "source": [ + "X_new = new_data.drop(['index', 'Churn'], axis = 1)\n", + "\n", + "from sklearn.preprocessing import StandardScaler\n", + "X_scaled = StandardScaler().fit_transform(X_new)" + ] + }, + { + "cell_type": "code", + "execution_count": 156, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Training accuracy: 0.8962310327949095\n", + "Test accuracy: 0.910958904109589\n" + ] + } + ], + "source": [ +
"from sklearn.linear_model import LogisticRegressionCV\n", + "from sklearn.model_selection import train_test_split\n", + "X_train, X_test, y_train, y_test = train_test_split(X_scaled, y_new, test_size = 0.3, random_state = 19, stratify = y_new)\n", + "model = LogisticRegressionCV(cv = 3,solver = 'sag', max_iter = 1000, random_state = 9)\n", + "model.fit(X_train, y_train)\n", + "\n", + "print(\"Training accuracy: \", model.score(X_train, y_train))\n", + "print(\"Test accuracy: \", model.score(X_test, y_test))" + ] + }, + { + "cell_type": "code", + "execution_count": 157, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Precision: 0.8795475113122172\n", + "Recall: 0.6120192307692308\n", + "Accuracy: 0.910958904109589\n", + " precision recall f1-score support\n", + "\n", + " 0 0.91 0.99 0.95 780\n", + " 1 0.85 0.23 0.36 96\n", + "\n", + " accuracy 0.91 876\n", + " macro avg 0.88 0.61 0.66 876\n", + "weighted avg 0.91 0.91 0.89 876\n", + "\n" + ] + } + ], + "source": [ + "preds = model.predict(X_test)\n", + "print(\"Precision: \", (precision_score(y_test, preds, average='macro')))\n", + "print(\"Recall: \",(recall_score(y_test, preds, average='macro')))\n", + "print(\"Accuracy: \", (accuracy_score(y_test, preds)))\n", + "print(classification_report(y_test, preds))" + ] + }, + { + "cell_type": "code", + "execution_count": 158, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "Text(91.68, 0.5, 'Actual Value')" + ] + }, + "execution_count": 158, + "metadata": {}, + "output_type": "execute_result" + }, + { + "data": { + "image/png": "\n", + "text/plain": [ + "
" + ] + }, + "metadata": { + "needs_background": "light" + }, + "output_type": "display_data" + } + ], + "source": [ + "cmx = confusion_matrix(y_test, preds)\n", + "ax = sns.heatmap(cmx, square= True, annot= True, cbar= False)\n", + "ax.set_xlabel(\"Predicted Value\", fontsize = 15)\n", + "ax.set_ylabel(\"Actual Value\", fontsize = 15)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Logistic regression performed worse on recall and F1 score (churn recall of only 0.23), which suggests that plain logistic regression copes poorly with this imbalanced dataset." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Evaluation\n", + "Choosing the best model is demanding, but data processing and analysis are even more so. After training Decision Tree, XGBoost and Logistic Regression models, it was evident that XGBoost performed best, because it builds several trees in sequence, each one correcting the errors of the previous trees. Tuning the hyperparameters with grid search from the scikit-learn library gave the best result, so the tuned XGBoost model was selected. The dataset is imbalanced, so most models will be biased toward the majority class, which in this case is 'Customer not churn (0)'. On the test set, the best-performing model produced 98 true positives and 47 false negatives, so recall on the churn class remains low (0.68).\n", + "* Others:\n", + "* False positives: 15\n", + "* True negatives: about 840 (displayed as 8.4e02 on the heatmap)\n", + "* The model could generally be improved if more positive (churn) cases were available in the dataset." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Preprocessing\n", + "\n", + "- Are there any duplicated values?\n", + "- Do we need to do feature scaling?\n", + "- Do we need to generate new features?\n", + "- Split Train and Test dataset.
(0.7/0.3)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# ML Application\n", + "\n", + "- Define models.\n", + "- Fit models.\n", + "- Evaluate models for both train and test dataset.\n", + "- Generate Confusion Matrix and scores of Accuracy, Recall, Precision and F1-Score.\n", + "- Analyse occurrence of overfitting and underfitting. If there is any of them, try to overcome it within a different section." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Evaluation\n", + "\n", + "- Select the best performing model and write your comments about why you chose this model.\n", + "- Analyse the results and comment on how you can improve the model." + ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.7.9" + } + }, + "nbformat": 4, + "nbformat_minor": 4 +} diff --git a/My Project/09-11-2020 ML Course Nigeria Project 'Abdulhameed Araromi'.ipynb b/My Project/09-11-2020 ML Course Nigeria Project 'Abdulhameed Araromi'.ipynb new file mode 100644 index 0000000..52b2795 --- /dev/null +++ b/My Project/09-11-2020 ML Course Nigeria Project 'Abdulhameed Araromi'.ipynb @@ -0,0 +1,1675 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Project\n", + "\n", + "In this project, our aim is to build a model for predicting churn. Churn is the percentage of customers that stopped using your company's product or service during a certain time frame.
Thus, in the given dataset, our label will be the `Churn` column.\n", + "\n", + "## Steps\n", + "- Read the `churn.csv` file and describe it.\n", + "- Make at least 4 different analyses in the Exploratory Data Analysis section.\n", + "- Pre-process the dataset to get it ready for ML application. (Check for missing data and handle it; decide whether scaling or feature extraction is needed, etc.)\n", + "- Define an appropriate evaluation metric for our case (classification).\n", + "- Train and evaluate Logistic Regression, Decision Trees and one other appropriate algorithm which you can choose from the scikit-learn library.\n", + "- Is there any overfitting or underfitting? Interpret your results and, if there is a problem, try to overcome it in a new section.\n", + "- Create confusion matrices for each algorithm and display Accuracy, Recall, Precision and F1-Score values.\n", + "- Analyse and compare the results of the 3 algorithms.\n", + "- Select the best-performing model based on the evaluation metric you chose, on the test dataset.\n", + "\n", + "\n", + "Good luck :)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "

Abdulhameed Temitope Araromi

" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Data" + ] + }, + { + "cell_type": "code", + "execution_count": 1, + "metadata": {}, + "outputs": [], + "source": [ + "import pandas as pd\n", + "import seaborn as sns\n", + "import numpy as np\n", + "import matplotlib.pyplot as plt" + ] + }, + { + "cell_type": "code", + "execution_count": 2, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
<div>\n", + "<table border=\"1\" class=\"dataframe\">\n", + "  <thead>\n",
+ "    <tr style=\"text-align: right;\"><th></th><th>Churn</th><th>AccountWeeks</th><th>ContractRenewal</th><th>DataPlan</th><th>DataUsage</th><th>CustServCalls</th><th>DayMins</th><th>DayCalls</th><th>MonthlyCharge</th><th>OverageFee</th><th>RoamMins</th></tr>\n", + "  </thead>\n", + "  <tbody>\n",
+ "    <tr><th>0</th><td>0</td><td>128</td><td>1</td><td>1</td><td>2.7</td><td>1</td><td>265.1</td><td>110</td><td>89.0</td><td>9.87</td><td>10.0</td></tr>\n",
+ "    <tr><th>1</th><td>0</td><td>107</td><td>1</td><td>1</td><td>3.7</td><td>1</td><td>161.6</td><td>123</td><td>82.0</td><td>9.78</td><td>13.7</td></tr>\n",
+ "    <tr><th>2</th><td>0</td><td>137</td><td>1</td><td>0</td><td>0.0</td><td>0</td><td>243.4</td><td>114</td><td>52.0</td><td>6.06</td><td>12.2</td></tr>\n",
+ "    <tr><th>3</th><td>0</td><td>84</td><td>0</td><td>0</td><td>0.0</td><td>2</td><td>299.4</td><td>71</td><td>57.0</td><td>3.10</td><td>6.6</td></tr>\n",
+ "    <tr><th>4</th><td>0</td><td>75</td><td>0</td><td>0</td><td>0.0</td><td>3</td><td>166.7</td><td>113</td><td>41.0</td><td>7.42</td><td>10.1</td></tr>\n",
+ "  </tbody>\n", + "</table>\n", + "</div>
" + ], + "text/plain": [ + " Churn AccountWeeks ContractRenewal DataPlan DataUsage CustServCalls \\\n", + "0 0 128 1 1 2.7 1 \n", + "1 0 107 1 1 3.7 1 \n", + "2 0 137 1 0 0.0 0 \n", + "3 0 84 0 0 0.0 2 \n", + "4 0 75 0 0 0.0 3 \n", + "\n", + " DayMins DayCalls MonthlyCharge OverageFee RoamMins \n", + "0 265.1 110 89.0 9.87 10.0 \n", + "1 161.6 123 82.0 9.78 13.7 \n", + "2 243.4 114 52.0 6.06 12.2 \n", + "3 299.4 71 57.0 3.10 6.6 \n", + "4 166.7 113 41.0 7.42 10.1 " + ] + }, + "execution_count": 2, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "# Read csv\n", + "data = pd.read_csv(\"churn.csv\")\n", + "data.head()" + ] + }, + { + "cell_type": "code", + "execution_count": 7, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n", + "RangeIndex: 3333 entries, 0 to 3332\n", + "Data columns (total 11 columns):\n", + " # Column Non-Null Count Dtype \n", + "--- ------ -------------- ----- \n", + " 0 Churn 3333 non-null int64 \n", + " 1 AccountWeeks 3333 non-null int64 \n", + " 2 ContractRenewal 3333 non-null int64 \n", + " 3 DataPlan 3333 non-null int64 \n", + " 4 DataUsage 3333 non-null float64\n", + " 5 CustServCalls 3333 non-null int64 \n", + " 6 DayMins 3333 non-null float64\n", + " 7 DayCalls 3333 non-null int64 \n", + " 8 MonthlyCharge 3333 non-null float64\n", + " 9 OverageFee 3333 non-null float64\n", + " 10 RoamMins 3333 non-null float64\n", + "dtypes: float64(5), int64(6)\n", + "memory usage: 286.6 KB\n" + ] + } + ], + "source": [ + "# Describe our data for each feature and use .info() for get information about our dataset\n", + "# Analys missing values\n", + "data.info()" + ] + }, + { + "cell_type": "code", + "execution_count": 8, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
<div>\n", + "<table border=\"1\" class=\"dataframe\">\n", + "  <thead>\n",
+ "    <tr style=\"text-align: right;\"><th></th><th>Churn</th><th>AccountWeeks</th><th>ContractRenewal</th><th>DataPlan</th><th>DataUsage</th><th>CustServCalls</th><th>DayMins</th><th>DayCalls</th><th>MonthlyCharge</th><th>OverageFee</th><th>RoamMins</th></tr>\n", + "  </thead>\n", + "  <tbody>\n",
+ "    <tr><th>count</th><td>3333.000000</td><td>3333.000000</td><td>3333.000000</td><td>3333.000000</td><td>3333.000000</td><td>3333.000000</td><td>3333.000000</td><td>3333.000000</td><td>3333.000000</td><td>3333.000000</td><td>3333.000000</td></tr>\n",
+ "    <tr><th>mean</th><td>0.144914</td><td>101.064806</td><td>0.903090</td><td>0.276628</td><td>0.816475</td><td>1.562856</td><td>179.775098</td><td>100.435644</td><td>56.305161</td><td>10.051488</td><td>10.237294</td></tr>\n",
+ "    <tr><th>std</th><td>0.352067</td><td>39.822106</td><td>0.295879</td><td>0.447398</td><td>1.272668</td><td>1.315491</td><td>54.467389</td><td>20.069084</td><td>16.426032</td><td>2.535712</td><td>2.791840</td></tr>\n",
+ "    <tr><th>min</th><td>0.000000</td><td>1.000000</td><td>0.000000</td><td>0.000000</td><td>0.000000</td><td>0.000000</td><td>0.000000</td><td>0.000000</td><td>14.000000</td><td>0.000000</td><td>0.000000</td></tr>\n",
+ "    <tr><th>25%</th><td>0.000000</td><td>74.000000</td><td>1.000000</td><td>0.000000</td><td>0.000000</td><td>1.000000</td><td>143.700000</td><td>87.000000</td><td>45.000000</td><td>8.330000</td><td>8.500000</td></tr>\n",
+ "    <tr><th>50%</th><td>0.000000</td><td>101.000000</td><td>1.000000</td><td>0.000000</td><td>0.000000</td><td>1.000000</td><td>179.400000</td><td>101.000000</td><td>53.500000</td><td>10.070000</td><td>10.300000</td></tr>\n",
+ "    <tr><th>75%</th><td>0.000000</td><td>127.000000</td><td>1.000000</td><td>1.000000</td><td>1.780000</td><td>2.000000</td><td>216.400000</td><td>114.000000</td><td>66.200000</td><td>11.770000</td><td>12.100000</td></tr>\n",
+ "    <tr><th>max</th><td>1.000000</td><td>243.000000</td><td>1.000000</td><td>1.000000</td><td>5.400000</td><td>9.000000</td><td>350.800000</td><td>165.000000</td><td>111.300000</td><td>18.190000</td><td>20.000000</td></tr>\n",
+ "  </tbody>\n", + "</table>\n", + "</div>
" + ], + "text/plain": [ + " Churn AccountWeeks ContractRenewal DataPlan DataUsage \\\n", + "count 3333.000000 3333.000000 3333.000000 3333.000000 3333.000000 \n", + "mean 0.144914 101.064806 0.903090 0.276628 0.816475 \n", + "std 0.352067 39.822106 0.295879 0.447398 1.272668 \n", + "min 0.000000 1.000000 0.000000 0.000000 0.000000 \n", + "25% 0.000000 74.000000 1.000000 0.000000 0.000000 \n", + "50% 0.000000 101.000000 1.000000 0.000000 0.000000 \n", + "75% 0.000000 127.000000 1.000000 1.000000 1.780000 \n", + "max 1.000000 243.000000 1.000000 1.000000 5.400000 \n", + "\n", + " CustServCalls DayMins DayCalls MonthlyCharge OverageFee \\\n", + "count 3333.000000 3333.000000 3333.000000 3333.000000 3333.000000 \n", + "mean 1.562856 179.775098 100.435644 56.305161 10.051488 \n", + "std 1.315491 54.467389 20.069084 16.426032 2.535712 \n", + "min 0.000000 0.000000 0.000000 14.000000 0.000000 \n", + "25% 1.000000 143.700000 87.000000 45.000000 8.330000 \n", + "50% 1.000000 179.400000 101.000000 53.500000 10.070000 \n", + "75% 2.000000 216.400000 114.000000 66.200000 11.770000 \n", + "max 9.000000 350.800000 165.000000 111.300000 18.190000 \n", + "\n", + " RoamMins \n", + "count 3333.000000 \n", + "mean 10.237294 \n", + "std 2.791840 \n", + "min 0.000000 \n", + "25% 8.500000 \n", + "50% 10.300000 \n", + "75% 12.100000 \n", + "max 20.000000 " + ] + }, + "execution_count": 8, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "data.describe()" + ] + }, + { + "cell_type": "code", + "execution_count": 9, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "Churn 0\n", + "AccountWeeks 0\n", + "ContractRenewal 0\n", + "DataPlan 0\n", + "DataUsage 0\n", + "CustServCalls 0\n", + "DayMins 0\n", + "DayCalls 0\n", + "MonthlyCharge 0\n", + "OverageFee 0\n", + "RoamMins 0\n", + "dtype: int64" + ] + }, + "execution_count": 9, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "data.isna().sum()" + ] + }, + { + 
"cell_type": "markdown", + "metadata": {}, + "source": [ + "# Exploratory Data Analysis" + ] + }, + { + "cell_type": "code", + "execution_count": 11, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "" + ] + }, + "execution_count": 11, + "metadata": {}, + "output_type": "execute_result" + }, + { + "data": { + "image/png": "iVBORw0KGgoAAAANSUhEUgAAAYsAAAEGCAYAAACUzrmNAAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjMuMSwgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy/d3fzzAAAACXBIWXMAAAsTAAALEwEAmpwYAAAP60lEQVR4nO3dcayd9V3H8fdnMBm6ESEUVtrOsqVTCyqEayXyh0yi1CWmbHNLMRuNErsQZkaymMD+ENQ0WSLbHHPDdBmDmm2k2YZUBSfD6VxkY7dLs9JiXR0Id630spmARtF2X/84T8NZe3p/p7c959z2vl/JyXnO93l+z/lecssnz/P8nuemqpAkaS6vmHQDkqSFz7CQJDUZFpKkJsNCktRkWEiSms6cdAOjcv7559fKlSsn3YYknVK2b9/+fFUtObJ+2obFypUrmZ6ennQbknRKSfJvg+qehpIkNRkWkqQmw0KS1GRYSJKaDAtJUpNhIUlqMiwkSU2GhSSpybCQJDWdtndwn6grfm/LpFvQArT9j2+YdAvSRHhkIUlqMiwkSU2GhSSpybCQJDUZFpKkJsNCktRkWEiSmgwLSVKTYSFJajIsJElNhoUkqcmwkCQ1GRaSpCbDQpLUZFhIkpoMC0lSk2EhSWoyLCRJTYaFJKnJsJAkNRkWkqSmkYVFkhVJvpzkySS7kry3q9+R5LtJdnSvN/eNuS3J3iR7klzbV78iyc5u3V1JMqq+JUlHO3OE+z4IvK+qvpnkNcD2JI906z5cVXf2b5xkNbAeuAS4CPhSkjdW1SHgbmAj8DXgIWAt8PAIe5ck9RnZkUVV7a+qb3bLLwJPAsvmGLIOuL+qXqqqp4C9wJokS4FzquqxqipgC3DdqPqWJB1tLNcskqwELge+3pXek+RbSe5Jcm5XWwY82zdspqst65aPrA/6no1JppNMz87OnswfQZIWtZGHRZJXA58HbqmqF+idUnoDcBmwH/jg4U0HDK856kcXqzZX1VRVTS1ZsuREW5ckdUYaFkleSS8oPl1VXwCoqueq6lBV/QD4BLCm23wGWNE3fDmwr6svH1CXJI3JKGdDBfgk8GRVfaivvrRvs7cAT3TL24D1Sc5KcjGwCni8qvYDLya5stvnDcCDo+pbknS0Uc6Gugp4F7AzyY6u9n7g+iSX0TuV9DTwboCq2pVkK7Cb3kyqm7uZUAA3AfcCZ9ObBeVMKEkao5GFRVV9lcHXGx6aY8wmYNOA+jRw6cnrTpJ0PLyDW5LUZFhIkpoMC0lSk2EhSWoyLCRJTYaFJKnJsJAkNRkWkqQmw0KS1GRYSJKaDAtJUpNhIUlqMiwkSU2GhSSpybCQJDUZFpKkJsNCktRkWEiSmgwLSVKTYSFJajIsJElNhoUkqcmwkCQ1GRaSpCbDQpLUZFhIkpoMC0lS08jCIsmKJF9O8mSSXUne29XPS/JIkm937+f2jbktyd4ke5Jc21e/IsnObt1dSTKqviVJRxvlkcVB4H1V9dPAlcDNSVYDtwKPVtUq4NHuM9269cAlwFrg40nO6PZ1N7ARWNW91o6wb0nSEUYWFlW1v6q+2S2/CDwJLAPWAfd1m90HXNctrwPur6qXquopYC+wJslS4JyqeqyqCtjSN0aSNAZjuWaRZCVwOfB14MKq2g+
9QAEu6DZbBjzbN2ymqy3rlo+sD/qejUmmk0zPzs6e1J9BkhazkYdFklcDnwduqaoX5tp0QK3mqB9drNpcVVNVNbVkyZLjb1aSNNBIwyLJK+kFxaer6gtd+bnu1BLd+4GuPgOs6Bu+HNjX1ZcPqEuSxmSUs6ECfBJ4sqo+1LdqG7ChW94APNhXX5/krCQX07uQ/Xh3qurFJFd2+7yhb4wkaQzOHOG+rwLeBexMsqOrvR/4ALA1yY3AM8DbAapqV5KtwG56M6lurqpD3bibgHuBs4GHu5ckaUxGFhZV9VUGX28AuOYYYzYBmwbUp4FLT153kqTj4R3ckqQmw0KS1GRYSJKaDAtJUpNhIUlqMiwkSU2GhSSpybCQJDUZFpKkJsNCktRkWEiSmgwLSVKTYSFJajIsJElNhoUkqcmwkCQ1GRaSpCbDQpLUZFhIkpoMC0lS01BhkeTRYWqSpNPTmXOtTPIq4EeB85OcC6RbdQ5w0Yh7kyQtEHOGBfBu4BZ6wbCdl8PiBeBjo2tLkrSQzBkWVfUR4CNJfreqPjqmniRJC0zryAKAqvpokl8EVvaPqaotI+pLkrSADBUWSf4ceAOwAzjUlQswLCRpERgqLIApYHVV1SibkSQtTMPeZ/EE8NpRNiJJWriGDYvzgd1Jvphk2+HXXAOS3JPkQJIn+mp3JPlukh3d6819625LsjfJniTX9tWvSLKzW3dXkhz5XZKk0Rr2NNQd89j3vcCfcvR1jQ9X1Z39hSSrgfXAJfSm6X4pyRur6hBwN7AR+BrwELAWeHge/UiS5mnY2VD/cLw7rqqvJFk55ObrgPur6iXgqSR7gTVJngbOqarHAJJsAa7DsJCksRr2cR8vJnmhe/1PkkNJXpjnd74nybe601TndrVlwLN928x0tWXd8pH1Y/W5Mcl0kunZ2dl5tidJOtJQYVFVr6mqc7rXq4C30TvFdLzupjcF9zJgP/DBrj7oOkTNUT9Wn5uraqqqppYsWTKP9iRJg8zrqbNV9RfAL89j3HNVdaiqfgB8AljTrZoBVvRtuhzY19WXD6hLksZo2Jvy3tr38RX07rs47nsukiytqv3dx7fQm5ILsA34TJIP0bvAvQp4vKoOdafArgS+DtwA+NgRSRqzYWdD/Xrf8kHgaXoXpY8pyWeBq+k9sXYGuB24Osll9ILmaXoPKqSqdiXZCuzu9n9zNxMK4CZ6M6vOpndh24vbkjRmw86G+q3j3XFVXT+g/Mk5tt8EbBpQnwYuPd7vlySdPMPOhlqe5IHuJrvnknw+yfL2SEnS6WDYC9yfondd4SJ6U1f/sqtJkhaBYcNiSVV9qqoOdq97AeemStIiMWxYPJ/knUnO6F7vBL43ysYkSQvHsGHx28A7gH+ndzPdbwDHfdFbknRqGnbq7B8BG6rqPwCSnAfcSS9EJEmnuWGPLH72cFAAVNX3gctH05IkaaEZNixe0ffQv8NHFsMelUiSTnHD/g//g8A/Jfkcvbuv38GAG+gkSaenYe/g3pJkmt7DAwO8tap2j7QzSdKCMfSppC4cDAhJWoTm9YhySdLiYlhIkpoMC0lSk2EhSWoyLCRJTYaFJKnJsJAkNRkWkqQmw0KS1GRYSJKaDAtJUpNhIUlqMiwkSU2GhSSpybCQJDUZFpKkppGFRZJ7khxI8kRf7bwkjyT5dvfe/3e9b0uyN8meJNf21a9IsrNbd1eSjKpnSdJgozyyuBdYe0TtVuDRqloFPNp9JslqYD1wSTfm40nO6MbcDWwEVnWvI/cpSRqxkYVFVX0F+P4R5XXAfd3yfcB1ffX7q+qlqnoK2AusSbIUOKeqHquqArb0jZEkjcm4r1lcWFX7Abr3C7r6MuDZvu1mutqybvnI+kBJNiaZTjI9Ozt7UhuXpMVsoVzgHnQdouaoD1RVm6tqqqqmlixZctKak6TFbtxh8Vx3aonu/UBXnwFW9G23HNjX1ZcPqEuSxmjcYbEN2NAtbwAe7KuvT3JWkovpXch
+vDtV9WKSK7tZUDf0jZEkjcmZo9pxks8CVwPnJ5kBbgc+AGxNciPwDPB2gKralWQrsBs4CNxcVYe6Xd1Eb2bV2cDD3UuSNEYjC4uquv4Yq645xvabgE0D6tPApSexNUnScVooF7glSQuYYSFJajIsJElNhoUkqcmwkCQ1GRaSpCbDQpLUZFhIkpoMC0lSk2EhSWoyLCRJTYaFJKnJsJAkNRkWkqQmw0KS1GRYSJKaDAtJUpNhIUlqMiwkSU2GhSSpybCQJDUZFpKkJsNCktRkWEiSmgwLSVKTYSFJajIsJElNEwmLJE8n2ZlkR5LprnZekkeSfLt7P7dv+9uS7E2yJ8m1k+hZkhazSR5ZvKmqLquqqe7zrcCjVbUKeLT7TJLVwHrgEmAt8PEkZ0yiYUlarBbSaah1wH3d8n3AdX31+6vqpap6CtgLrBl/e5K0eE0qLAr42yTbk2zsahdW1X6A7v2Crr4MeLZv7ExXO0qSjUmmk0zPzs6OqHVJWnzOnND3XlVV+5JcADyS5J/n2DYDajVow6raDGwGmJqaGriNJOn4TSQsqmpf934gyQP0Tis9l2RpVe1PshQ40G0+A6zoG74c2DfWhqUF5pk//JlJt6AF6HW/v3Nk+x77aagkP5bkNYeXgV8FngC2ARu6zTYAD3bL24D1Sc5KcjGwCnh8vF1L0uI2iSOLC4EHkhz+/s9U1d8k+QawNcmNwDPA2wGqaleSrcBu4CBwc1UdmkDfkrRojT0squo7wM8NqH8PuOYYYzYBm0bcmiTpGBbS1FlJ0gJlWEiSmgwLSVKTYSFJajIsJElNhoUkqcmwkCQ1GRaSpCbDQpLUZFhIkpoMC0lSk2EhSWoyLCRJTYaFJKnJsJAkNRkWkqQmw0KS1GRYSJKaDAtJUpNhIUlqMiwkSU2GhSSpybCQJDUZFpKkJsNCktRkWEiSmgwLSVKTYSFJajplwiLJ2iR7kuxNcuuk+5GkxeSUCIskZwAfA34NWA1cn2T1ZLuSpMXjlAgLYA2wt6q+U1X/C9wPrJtwT5K0aJw56QaGtAx4tu/zDPALR26UZCOwsfv4n0n2jKG3xeB84PlJN7EQ5M4Nk25BR/P387DbczL28hODiqdKWAz6L1BHFao2A5tH387ikmS6qqYm3Yc0iL+f43GqnIaaAVb0fV4O7JtQL5K06JwqYfENYFWSi5P8CLAe2DbhniRp0TglTkNV1cEk7wG+CJwB3FNVuybc1mLiqT0tZP5+jkGqjjr1L0nSDzlVTkNJkibIsJAkNRkWmpOPWdFCleSeJAeSPDHpXhYDw0LH5GNWtMDdC6yddBOLhWGhufiYFS1YVfUV4PuT7mOxMCw0l0GPWVk2oV4kTZBhobkM9ZgVSac/w0Jz8TErkgDDQnPzMSuSAMNCc6iqg8Dhx6w8CWz1MStaKJJ8FngM+MkkM0lunHRPpzMf9yFJavLIQpLUZFhIkpoMC0lSk2EhSWoyLCRJTYaFdAKSvDbJ/Un+NcnuJA8l2Zjkrybdm3QyGRbSPCUJ8ADw91X1hqpaDbwfuPAE93tK/LljLS7+Ukrz9ybg/6rqzw4XqmpHkh8HrknyOeBSYDvwzqqqJE8DU1X1fJIp4M6qujrJHcBFwErg+ST/ArwOeH33/idVddf4fjTph3lkIc3f4SAY5HLgFnp/B+T1wFVD7O8KYF1V/Wb3+aeAa+k9Kv72JK88oW6lE2BYSKPxeFXNVNUPgB30jhhatlXVf/d9/uuqeqmqngcOcIKnt6QTYVhI87eL3tHAIC/1LR/i5VO+B3n5392rjhjzX0PuQxo7w0Kav78DzkryO4cLSX4e+KU5xjzNywHzttG1Jp1choU0T9V7CudbgF/pps7uAu5g7r/58QfAR5L8I72jBemU4FNnJUlNHllIkpoMC0lSk2EhSWoyLCRJTYaFJKnJsJAkNRkWkqSm/wf03QODNr6OSgAAAABJRU5ErkJggg==\n", + "text/plain": [ 
+ "
" + ] + }, + "metadata": { + "needs_background": "light" + }, + "output_type": "display_data" + } + ], + "source": [ + "# Our label Distribution (countplot)\n", + "sns.countplot(data['Churn'])" + ] + }, + { + "cell_type": "code", + "execution_count": 12, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "" + ] + }, + "execution_count": 12, + "metadata": {}, + "output_type": "execute_result" + }, + { + "data": { + "image/png": "iVBORw0KGgoAAAANSUhEUgAAAYAAAAEGCAYAAABsLkJ6AAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjMuMSwgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy/d3fzzAAAACXBIWXMAAAsTAAALEwEAmpwYAAAu60lEQVR4nO3deZhcZZX48e+p6uo1ve97d/Z0FrInhC2sBhCCosOiAuqAUXBGZ3Rkfj6PPx1n5ue4zYgimyLEEREBNQgSIpCwJIEsZOsknXR3Ot2d9Jbe963e3x91g03TS/V6azmfJ/101b33rTrvU+k697733vOKMQallFLBx2F3AEoppeyhCUAppYKUJgCllApSmgCUUipIaQJQSqkgFWJ3AGORlJRk8vLy7A5DKaX8yr59+84ZY5IHL/erBJCXl8fevXvtDkMppfyKiJwearkOASmlVJDSBKCUUkFKE4BSSgUpTQBKKRWkNAEopVSQ0gSglFJBShOAUkoFKU0ASikVpDQBKKVUkPKrO4FV8HrqnfIR19++JmeaIlEqcHh1BCAiG0SkSESKReT+IdaLiDxgrT8kIssHrHtcRGpF5MigNgkisk1ETlq/4yfeHaWUUt4aNQGIiBN4ELgWKABuE5GCQZtdC8yxfu4BHhqw7glgwxAvfT/wqjFmDvCq9VwppdQ08eYIYDVQbIwpNcb0AE8DGwdtsxHYbDx2A3Eikg5gjHkDaBjidTcCT1qPnwRuGkf8SimlxsmbBJAJVAx4XmktG+s2g6UaY6oArN8pXsSilFJqkniTAGSIZWYc24yLiNwjIntFZG9dXd1kvKRSSim8uwqoEsge8DwLODuObQarEZF0Y0yVNVxUO9RGxphHgUcBVq5cOSlJRQWXka4g0quHVDDzJgHsAeaISD5wBrgVuH3QNluA+0TkaWAN0Hx+eGcEW4A7ge9Zv/80lsCVGs65tm62F9Xx+vFaKho7AEiODiMzNoK5adE4ZKgDVqWCz6gJwBjTJyL3AVsBJ/C4MaZQRDZZ6x8GXgKuA4qBDuCz59uLyG+B9UCSiFQC/9cY80s8X/zPiMjngXLgk5PZMRVcnnqnnN5+N1sOnGV/eSMGiA4PISU6jIb2Hg5XNmOApBlhXD4vmSVZcTgdmghUcBNj/GdUZeXKlUanhAxOo90I1tLVy292n6aisZOLZyexNDuO9NhwxNrb7+13c7y6le1FtVQ1d5EYFcrHl2eRnxQ17Gvq8JAKFCKyzxizcvByvRNY+b2zTZ1s3lVGZ28/n1qTw8KM2A9t43I6WJwZy6KMGI5Xt/Li4Sp+8WYpl8xJ5qqCFEIcWhVFBR9NAMqvtXf3sXlXGSLCFy6dRUZcxIjbiwgL0mOYmRzFi4eqeONkHSdrW7l9dQ6JM8LG9N56cln5O93tUX7LGMNz+ytp7+nnM2tzR/3yHygsxMnHl2fx6TW5NHX08uD2Yk7UtE5htEr5Hk0Aym/tKq3neHUrGxamjenLf6CCjBjuvXw2cRGhPLmzjO1FtfjTeTGlJkITgPJLVc2d/OVINfNSo1
k3K3FCr5UQFcqmy2axOCuWV47W8PSeCnr63JMUqVK+S88BKL/jNoZn91USGerk5hVZ71/pMxGhIQ5uWZlNRmwEWwuraWjvYcOiNNJiwychYqV8kx4BKL9zoKKJquYurluczoywyduHEREunZvMp9fmUtfWzY0/e4vDlc2T9vpK+RpNAMqv9Pa72Xa0hsy4CBZnfvhyz8mwID2GTZfNwuV0cMuju9heNGSVEqX8ng4BKb+yq6Se5s5ePrEia0pLOqTFhPOZC3N5cmcZn3tiDx9blsWKXJ2zSAUWPQJQfqOjp4/tJ2qZlxrNrOQZU/5+MeEu7r5kJjOTZvDc/kp2lZyb8vdUajppAlB+Y3tRHd29bj6yKG3a3jPc5eSOdbkUpMfwwqEqDlQ0Tdt7KzXVNAEov9Da1cvu0nqW5cSRFjO9V+aEOBzcsiqb/KQont1XQVG13jCmAoMmAOUXdpbU0+82rJ9rz8RxLqeDz6zNJS02nKfePU1FQ4ctcSg1mTQBKJ/X3OnZ+1+UGUtS9Njq9UymcJeTu9blMyMshN/t1ZvFlP/TBKB83q93ldHd52b9vGS7Q2FGWAg3r8iiob2HlwtHm/NIKd+mCUD5tI6ePh5/u4x5qdGkx46v3s9km5k0g4tmJbK7tIHi2ja7w1Fq3DQBKJ/29LsVNLT3+MTe/0DXLEwjaUYYz++vpKu33+5wlBoXTQDKZ/X0uXnszVLW5CeQmzj8zF12cDkdfHJFFs2dvWw7WmN3OEqNiyYA5bO2HDxLVXMXX7p8tt2hDCk7IZIVufG8W9ZAc2ev3eEoNWaaAJRPcrsNj+woYX5aNJfOSbI7nGGtn5eCMYYdJ+rsDkWpMdMEoHzS60W1nKxtY9Nlsyal3PNUSYgKZXlOPHv0KED5IU0Ayic9sqOUzLgIrl+Sbncoo9KjAOWvNAEon7PvdCPvljXw95fk43L6/n9RPQpQ/sr3/7pU0HlkRwlxkS5uWZVtdyheO38U8OZJPQpQ/kMTgPIpxbWtvHK0hjvW5hIZ6j/TVSREhbIoM5b95Y309muJCOUfNAEon/LwjlLCXQ7uuijf7lDGbFVeAl29bo6c0WkklX/QBKB8xpmmTv743hluXZVDQlSo3eGM2cykKBKjQtlT1mh3KEp5RROA8hmPvVEKwN2XzrQ5kvEREVbmxlNW305JndYIUr5PE4DyCQ3tPTy9p5yblmWSGecbRd/GY3luPA6BZ/ZU2B2KUqPSBKB8whNvn6K7z82my/xz7/+86HAX89NieHZfpc4XoHyeJgBlu7buPp7cdZprClKZnRJtdzgTtiovnvr2Hv56TIvEKd+mCUDZ7ql3TtPc2csX1/tm0bexmpMaTXpsOM/s1WEg5du8SgAiskFEikSkWETuH2K9iMgD1vpDIrJ8tLYislREdovIARHZKyKrJ6dLyp909fbz2JunuHh2Ekuz4+wOZ1I4RLjxggzeOnmOpo4eu8NRalijJgARcQIPAtcCBcBtIlIwaLNrgTnWzz3AQ160/T7wHWPMUuBb1nMVZJ7dV0ldazdfunyW3aFMquuXpNPnNrxSqMNAynd5c6vlaqDYGFMKICJPAxuBowO22QhsNsYYYLeIxIlIOpA3QlsDxFjtY4GzE++O8nVPvVP+/uN+t+HH24rIjo/gVF0762b5btnnsTpc2Ux8pIvH3iylz20+sO72NTk2RaXUB3kzBJQJDBzMrLSWebPNSG2/AvxARCqAHwL/OtSbi8g91hDR3ro6rbMSSA5VNtHY0cv6eSk+XfJ5PESExZlxlNS10dHdZ3c4Sg3JmwQw1F+m8XKbkdp+EfiqMSYb+Crwy6He3BjzqDFmpTFmZXKyb80Lq8bPbZVPTosJZ16a/1/5M5TFWbG4DRRWtdgdilJD8iYBVAIDyzJm8eHhmuG2GantncDz1uPf4xlqUkGiqLqV2tZuLp2bhCPA9v7Py4gNJyEqVGsDKZ/lTQLYA8wRkXwRCQVuBbYM2mYLcId1Nd
BaoNkYUzVK27PAZdbjK4CTE+yL8iNvnqwjLsLF4sw4u0OZMp5hoFhK6tpo12Eg5YNGPQlsjOkTkfuArYATeNwYUygim6z1DwMvAdcBxUAH8NmR2lovfTfwExEJAbrwXD2kgkBFQwdl9R1cvzgdpyMw9/7PW5wZy44TdRw928Kq/AS7w1HqA7wquG6MeQnPl/zAZQ8PeGyAe71tay1/C1gxlmBVYHjzZB3hLgcrc+PtDmXKpceGkxgVyuEzzZoAlM/RO4HVtKpv66bwbAtr8hMJczntDmfKiQgLM2IpPddGZ0+/3eEo9QGaANS0ervkHA4RLpyZaHco06YgIwa3gaIavRpI+RZNAGraNLb3sO90I0uz44iJcNkdzrTJio8gOiyEo1Wtdoei1Af4z6Sryi8MvNN3sDdO1NHbb7hoduDc8esNhwjz06M5WNlMn84XrHyIHgGoaeE2hnfLGshLjCQtNtzucKZdQXoMPX1uSura7Q5FqfdpAlDTori2jYb2HtYE0dj/QDOTZxDqdHBM7wpWPkQTgJoWu0vriQoLYWFGzOgbByCX08Hc1Bkcq27B7R5cSUUpe2gCUFOusaOHoupWVuXGE+II3v9yC9JjaO3q42Blk92hKAVoAlDTYM+pBoCgvxFqfloMDoFtR3WOAOUbNAGoKdXndrPndCPz06KJjwy1OxxbRYQ6yUuK4hVNAMpHaAJQU+pYVSvt3X1Be/J3sIL0GIpr2yg7p1cDKftpAlBT6r3yRmLCQ5idMsPuUHzC/DTPSfC/HtOjAGU/TQBqyrR193GippWl2fEBW/N/rBKiQpmfFq0JQPkETQBqyhysaMJtYFlOnN2h+JSrFqSyp6yRpo4eu0NRQU4TgJoy71U0khkXQWpM8N35O5KrClLpdxu2F+kc18peWgtITYnqli7ONnXx0SXpXrcZqY5QIFmSGUtydBjbjtZw07JMu8NRQUyPANSUOFDeiENgSVac3aH4HIdDuGpBCjtO1NHdp3MEKPtoAlCTzm0MByqamJsazYwwPcgcylULUmnr7uOd0ga7Q1FBTBOAmnQldW20dPWxLCfwp3wcr4tmJxHucujVQMpWmgDUpDtc2UxYiIP5adF2h+Kzwl1OLpmTzF+P1uCZUlup6acJQE2qfreh8GwLC9JjcDn1v9dIri5I5WxzF4VntUS0sof+hapJVVLXRmdvP4syYu0OxeddOT8Fh8ArhdV2h6KClCYANamOnPEM/8xJ1dIPo0mcEcaqvAQtDqdsowlATZrefrcO/4zRNQvTOF7dyul6LQ6npp9eo6cmzc6Sejp7+1mcqcM/Ixl4w1tnj+c+gP/6y3EunpMMwO1rcmyJSwUf3U1Tk+bFQ2cJC3Fo5c8xSIgKJT02nKM6V7CygSYANSl6+928crRGh3/GYUF6DKfrO2jr7rM7FBVk9C9VTYqdJfU0dfTq8M84FKTHYIDjehSgppkmADUpXj5SRVSoU4d/xiE9Npz4SJcOA6lppwlATVi/27DtaA3r56fo8M84iMj7U0V292pxODV99K9VTdj+8kbOtfXwkYVpdofitxZmxNLnNhyvabU7FBVEvEoAIrJBRIpEpFhE7h9ivYjIA9b6QyKy3Ju2IvJla12hiHx/4t1RdnilsJpQp4PL5yXbHYrfykmMJDo8hCNnmu0ORQWRUe8DEBEn8CBwNVAJ7BGRLcaYowM2uxaYY/2sAR4C1ozUVkQuBzYCS4wx3SKSMpkdU9PDGMPWwhrWzU4kOtxldzh+yyHCwowY9p1upKOnj8hQvUVHTT1vjgBWA8XGmFJjTA/wNJ4v7oE2ApuNx24gTkTSR2n7ReB7xphuAGNM7ST0R02z49WtlDd06PDPJFiUGUtvv+H14zpVpJoe3iSATKBiwPNKa5k324zUdi5wiYi8IyI7RGTVUG8uIveIyF4R2VtXp38YvmZrYTUinglO1MTkJUYxIyyEl45U2R2KChLeJAAZYtngAubDbTNS2xAgHlgLfB
14RkQ+tL0x5lFjzEpjzMrkZB1j9jVbC2tYkRNPcnSY3aH4vfPDQK8dq32/RIRSU8mbBFAJZA94ngWc9XKbkdpWAs9bw0bvAm4gyfvQld0qGjo4VtWiwz+TaFFmLJ29/ew4oSOiaup5kwD2AHNEJF9EQoFbgS2DttkC3GFdDbQWaDbGVI3S9o/AFQAiMhcIBc5NtENq+my16thrApg8eYlRJEaF8tJhnSNATb1RLzUwxvSJyH3AVsAJPG6MKRSRTdb6h4GXgOuAYqAD+OxIba2Xfhx4XESOAD3AnUbnxvML56tZ/u/uctJiwnmr+Jznk1cT5nQI1yxMY8uBM3T19hPuctodkgpgXl1rZox5Cc+X/MBlDw94bIB7vW1rLe8BPj2WYJXvaOvu43R9O5fP16t3J9tHl6Tz23fLee14LdctTrc7HBXA9E5gNS5F1S0YPIXM1ORaOzOR5OgwthwYfKpNqcmlCUCNy9GzLcRFuEiPDbc7lIDjdAg3LMngtaJamjt77Q5HBTBNAGrMevrcnKxtY0FGDENcuasmwY1LM+jpc79/ol2pqaAJQI3ZiZpW+txGh3+m0AVZseQmRuowkJpSmgDUmB2raiHC5SQvMcruUAKWiLDxggx2lpyjtrXL7nBUgNIEoMakt9/N8epWFqRH43To8M9UunFpBm4DLx7S0hBqamgCUGOy51QDnb39OvwzDWanRFOQHsOfdBhITRFNAGpMthZWE+IQZqdE2x1KUNi4NIMDFU2UnWu3OxQVgDQBKK+53YaXC6uZmxpNaIj+15kONy7NQASe319pdygqAOlfsfLaexVN1LR0szBDh3+mS3psBBfPTuK5/Wdwu7VSippcmgCU114+UoXLKcxP0wQwnT6xIoszTZ3sPlVvdygqwGgCUF4xxvCXI9VcPDuJiFAtUDadrilIY0ZYCM/tO2N3KCrAaAJQXik820JlYyfXLtLiZNMtItTJR5ek85cjVbR399kdjgogmgCUV/5ypAqnQ7i6QKd+tMPNK7Lo6OnnL0e0NISaPJoA1KjOD/+snZlAfFSo3eEEpZW58eQmRvLcPr0aSE0er+YDUMHn/KQvADUtXZTWtbMoI/YDy9X0ERFuXp7Fj7edoKKhg+yESLtDUgFAE4Aa1ZGzzQjo5Z/TZLgkG+IQROD3eyv4p2vmTXNUKhDpEJAakTGGQ5XN5CZGEh3usjucoBYXGcqlc5J5Zm8lff1uu8NRAUCPANSIqlu6qGvtZt3SDLtDUUBmXAQ7TtTxby8cZf6geky3r8mxKSrlr/QIQI3oYEUzDoFFGbF2h6KABekxzAgLYU9Zg92hqACgCUANy20MhyqbmJMSTVSYHiz6AqdDWJEbT1FNq04XqSZME4AaVkVDB02dvSzJ0r1/X7IyNx63gX2nG+0ORfk5TQBqWAcqmghxiNb+9zGJM8KYmRzFvtMNuI0WiFPjpwlADanfbThyppkF6TGEubT2j69ZnZdAY0cvxbVtdoei/JgmADWkkro22nv6uUCHf3xSQUYMUWEh7C7VCqFq/DQBqCEdrGgi3OVgbqrO/OWLQhwOVufFU1TdSkN7j93hKD+lCUB9SGtXL0fONrM4M5YQp/4X8VWr8xMRgXd0ngA1TvrXrT7khYNV9PYbVuYm2B2KGkFshIuC9Bj2ljXS06d3Bqux0wSgPuR3eytIiQ4jKz7C7lDUKNbOTKSzt59DlU12h6L8kCYA9QFF1a0crGhiZV4CImJ3OGoU+UlRpESHsau0HqOXhKox0gSgPuCZvRW4nMKy7Di7Q1FeEBEunJVIVXOX3himxkwTgHpfT5+bP7x3hqsLUrX0gx9Zmh1HuMvB42+fsjsU5We8SgAiskFEikSkWETuH2K9iMgD1vpDIrJ8DG2/JiJGRJIm1hU1UX89VkNDew+fXJltdyhqDMJCnKzOS+TlI9VUNHTYHY7yI6MmABFxAg8C1wIFwG0iUjBos2uBOdbPPcBD3rQVkWzgakCnmfIBT+
+pID02nEvnJNsdihqjC2cl4hDhl2/pUYDynjdHAKuBYmNMqTGmB3ga2Dhom43AZuOxG4gTkXQv2v438C+Anr2yWUldG2+cqOOWVdk4HXry19/ERri44YIMntlbQXOHVglV3vEmAWQCFQOeV1rLvNlm2LYiciNwxhhzcKQ3F5F7RGSviOytq6vzIlw1Hpt3luFyCp9ak2t3KGqc/v6SfDp6+vntHj2gVt7xJgEMtTs4eI99uG2GXC4ikcA3gW+N9ubGmEeNMSuNMSuTk3VoYiq0dPXy7L5KbliSQXJ0mN3hqHFamBHLulmJPPF2md4YprziTQKoBAaeFcwCznq5zXDLZwH5wEERKbOW7xeRtLEErybH7/dW0t7Tz2cvyrc7FDVBd18yk+qWLv58aPCfqFIf5k0C2APMEZF8EQkFbgW2DNpmC3CHdTXQWqDZGFM1XFtjzGFjTIoxJs8Yk4cnUSw3xlRPVseUd/rdhid3lrEiN57FWvnT7102N5m5qTN4ZEep3himRjVqAjDG9AH3AVuBY8AzxphCEdkkIpuszV4CSoFi4DHgSyO1nfReqHF7/Xgt5Q0dfPaiPLtDUZPA4RA2XTaLoppWXi+qtTsc5eO8utvHGPMSni/5gcseHvDYAPd623aIbfK8iUNNvl/tPEVaTDgfWaijb4Hihgsy+NErJ3hoewlXzE+1Oxzlw/RO4CB2qLKJt4vruXNdHi4t+xwwXE4Hd1+Sz56yRvaUNdgdjvJh+lcfxH7+egkx4SF8em2O3aGoSXbLqhwSokJ5eHuJ3aEoH6YJIEidrGnl5cJq7lqXR3S4y+5w1CSLCHVy17o8Xj1ey/HqFrvDUT5KK34FqYd2lOByCtHhLp56R28cCkR3XJjLwztKeGh7CT+5dZnd4SgfpAkgCFU0dPCnA2dZm5+gVT8DyFCJfHlOPFsOnOWfr55HTmKkDVEpX6ZDQEHokTdKcAhcrEXfAt7Fs5NwOISH39BzAerDNAEEmarmTp7ZW8knVmQRG6Fj/4EuJsLFipx4nt1bSU1Ll93hKB+jCSDI/Oy1Yowx3Hv5bLtDUdPk0rnJ9Lnd/OLNUrtDUT5GB4AD2OAx4cb2Hp5+t4KVefG8ceKcTVGp6ZYQFcqNF2Twm3fKuffy2cRFhtodkvIRegQQRF4vqkUE1s9LsTsUNc2+uH42HT39/OrtMrtDUT5EE0CQqG/rZn95I6vzE3TsPwjNS4vm6oJUnthZRlt3n93hKB+hCSBIvHa8FqdDuGyuXvkTrL60fhbNnb38Vu/7UBZNAEGgpqWLAxVNrJ2ZqHf9BrFlOfFcNDuRx94spau33+5wlA/Qk8BBYNvRGkJDHFym1/0HrfMXBMxPi+Ht4nq+8dwh1uQnvr/+9jVaDyoY6RFAgKto6OBoVQuXzEkmUu/6DXozk6LIjo/gjRN19Lt1wphgpwkggBljeLmwmqiwEC6anTh6AxXwRIT181Jo7Ojl8Jkmu8NRNtMEEMCKa9s4da6dK+YlExbitDsc5SPmpUWTGhPG9qI63DptZFDTBBCg3G7D1qPVxEe6WJWfYHc4yoc4RFg/N4Xa1m6OntVS0cFME0CA+vPhKs42dXHVglRCHPoxqw9anBVLYlQo24tqdfL4IKbfDAGop8/ND7cWkRYTzgXZcXaHo3yQQ4T185I529zFiZpWu8NRNtEEEIB+885pyhs62LAoDYeI3eEoH7U0O564CBevHdejgGClCSDAtHb18tPXilk3K5E5KTPsDkf5MKdDuHRuMhWNnewqqbc7HGUDTQAB5pEdpTS09/Cv1y5AdO9fjWJFbjzR4SH85NWTdoeibKAJIIDUtHTxi7dKueGCDBZnxdodjvIDLqeDy+Ym886pBnYWa4nwYKMJIIB8/+Ui3G74+jXz7A5F+ZFVeQmkx4bzw1eK9FxAkNEEECAOVzbz3P5KPntRnk7+rcbE5XRw3xWz2V/exPaiOrvDUdNIE0AAMMbw3RePkhgVyr1X6FSPau
w+uSKb7IQIfrRNjwKCiSaAALC1sJp3TzXw1avnEqPlntU4hIY4+Mcr53LkTAtbC2vsDkdNEy0P6ec27yzjf149SUp0GMZ8eB5gpbx109IMfv56MT96pYirFqQQ4tT9w0Cnn7Cfe6v4HA3tPVy/OB2nQy/7VOMX4nTwLxvmc7K2jafe1R2JYKAJwI9VNHTw2vFaFmbEMCc12u5wVAD4yMJU1s1K5MfbTtDU0WN3OGqKeZUARGSDiBSJSLGI3D/EehGRB6z1h0Rk+WhtReQHInLc2v4PIhI3KT0KIt95oRCHCNcvTrc7FBUgRIRv3VBAS2cv//NXvTks0I2aAETECTwIXAsUALeJSMGgza4F5lg/9wAPedF2G7DIGLMEOAH864R7E0ReKazmr8dquXJBCnGRoXaHowLI/LQYbl+Tw693n+akFooLaN4cAawGio0xpcaYHuBpYOOgbTYCm43HbiBORNJHamuMecUY02e13w1kTUJ/gkJHTx/feeEo81KjWTcrye5wVAD6p6vnERXq5N/+fFQvCw1g3iSATKBiwPNKa5k323jTFuBzwF+GenMRuUdE9orI3ro6vUkFPHf8nmnq5Ls3LdITv2pKJESF8s/XzOPNk+d4bv8Zu8NRU8SbBDDUN8zgXYLhthm1rYh8E+gDfjPUmxtjHjXGrDTGrExOTvYi3MC2s/gcT+ws4651eazWmb7UFPrM2lxW5yXwnRcKqW7usjscNQW8uQ+gEsge8DwLOOvlNqEjtRWRO4GPAlcaPc4cVUtXL19/9hD5SVF8Y8N8u8NRAWS4+0cumZPEoTNN/J8/HOaXd67UCrMBxpsjgD3AHBHJF5FQ4FZgy6BttgB3WFcDrQWajTFVI7UVkQ3AN4AbjTEdk9SfgPbdF45S1dzJj/7uAiJCdZJ3NfUSZ4TxLx+Zz2vHa3leh4ICzqgJwDpRex+wFTgGPGOMKRSRTSKyydrsJaAUKAYeA740Ulurzc+AaGCbiBwQkYcnr1uB5+Uj1fx+XyWbLpvF8px4u8NRQeSudXmszkvg2y8UUl6v+2qBRPxp5GXlypVm7969docx7U6da+fGn75FXlIUz37xQsJC/rb3r6Uf1FS7fU0OFQ0dXP/Am+QkRvLspnWEu/QI1J+IyD5jzMrBy/VOYB/X2dPPF/93H06n8NCnl3/gy1+p6ZKdEMl/37KUI2da+M4LhaM3UH5Bi8H5MGMM3/zDYYqqW7lzXR5vnNAZm9T0G3iUedncZH77bgW9fYblufHcvibHxsjURGkC8GFP7Czj+ffOcOWCFOZqrR/lA65akEp5Qwd/PHCG5Ogwu8NRE6RDQD5q29Eavvvno1y1IJXL56XYHY5SADgdwm2rc4gOD2Hz7tNUNOhJYX+mCcAHHaps4h9++x6LMmN54LalOPTaa+VDZoSFcOeFefS73XzuiT00d/baHZIaJ00APqaysYPPPbGXxBmh/PLOVUSG6iid8j0pMeF8ak0uZfXtfOk3++jpc9sdkhoHvQzURzz1TjmtXb089mYpbd19bLp0Fikx4XaHpdSIwkIc/PPvD3L94nQeuG2Z1qbyUcNdBqq7lz6is6efJ3aW0dzZy+cuytcvf+UXbl6RRUN7D//x0jGiw0P4fx9frOUi/IgmAB/Q0dPH5l1l1LZ085kLc8lNjLI7JKW8dvelM2nq7OHB10uIjXBx/7XzNQn4CU0ANuvq7ecLv95HeUMHt67O0cs9lV/62jXzaOns45E3Sgl3Ofnq1XPtDkl5QROAjbr7PHf5vnnyHDcvz2JxZqzdISk1JgNvEpuXFs2KnHh+8upJCs8284s7V9kYmfKGXgVkk95+N/c99R6vF9Xxnx9bzIpcLfCm/JtDhI8tz2RZdhx/PVbLg68X2x2SGoUmABv09bv5ytMH2Ha0hm/fUKC306uA4RDh5hVZLM2O4wdbi/jpqzqxvC/TIaBp1tfv5qvPHOTFw1X8n+vmc9dF+XaHpNSkcojwiRVZCPCjbS
fYd7qRqwtSP3RiWHd87KcJYBr1uw1f+/1BXjh4lm9smM89l86yOySlpsT5I4EQp4PtJ+ro7Xdz3eJ0vTrIx2gCmCb/u/s0z+2r5L2KJq4pSCU2wqW1/FVAc4hw09IMQpzC2yX1dPe52bg0U28W8yGaAKZBv9vw/H7Pl/9VC1JYr8XdVJAQET66OJ0Il5PXjtfS3tPPrauycTn19KMv0AQwiYbao3cbwx/2n2F/eRNXLkjhivmpNkSmlH1EhKsWpBIVFsKfD57l8bdPccfaPLvDUuhVQFPKbQx/eO8M+8obuXJ+Clfql78KYhfOTOSWVdlUNnTy0I5iTp1rtzukoKcJYIq4jeH5/WfYd7qRK+ancOUC/fJXaklWHJ+7OJ+Onn5uevBtdhbrLHd20gQwBdzG8Ny+SvaXN3LlghSu0i9/pd6XnxTFl9bPJiU6jDsef5fH3zqFP1UlDiSaACZZv9vw7L7zJ3xTddhHqSEkRIXy/JfWsX5eCv/256PcvXkvje09docVdDQBTKI+t5un95RzwLrU84r5erWPUsOJDnfx2B0r+NZHC9hxoo7rHnhTh4SmmSaASdLV289vdpdTeLaF6xen66WeSnlBRPjcxfk8/8WLCHc5uf0X7/DV3x2grrXb7tCCgs4INglau3r5wq/3sauknhuXZrAmP9HukJTyO739brYX1fLGiXNEhTn5ylVzuX1NDuEup92h+b3hZgTTBDBBNS1dfPZXeyiqaeXjyzJZlqNVPZWaiNrWLvaUNfB2cT0p0WFsumyWJoIJ0gQwBYprW7nz8T00dvTw0KdXcKax0+6QlAoIt6/JYVdJPT959QS7SxuIi3Rx8/IsbludzewUnTRprHRO4En2+vFavvK7A7icDn53z4UszorV2j5KTaILZyVy4awLeae0ns27T7N5Vxm/fOsUy3LiuH5xOhsWpZEVH2l3mH5NE8AY9bsN/73tBD97vZiC9Bge+cwKshP0P6FSU2XNzETWzEzkXFs3z+2r5MmdZfz7i8f49xePkREXztyUaOakRpOTEInTIVpmegw0AYxBeX0H9z9/iJ0l9dyyMpvvbFyo45JKTZOkGWF84bJZRIe7qG/r5sjZFoqqW3jjZB3bT9QR6nSQnRBBbWsXq/ISWJIVS3S4y+6wfZqeA/BCV28/D+8o4efbSwhxCN++cSF/tzL7Q9vpEJBS06+rt5/SujaK69o5Xd9OdUsXxoAIzEyK4oKsOAoyYliYEUtBegyxkcGXFPQcwDi0dvXyh/fO8NibpVQ0dHLDBRl887oFpMWG2x2aUsoS7nJSkBFLQUYsANcvSee98kYOVTZzqLKJN4vP8fx7Z97fPjbCRVpMOKkx4aTGhJEaE05ydBh3rsuzqQf28eoIQEQ2AD8BnMAvjDHfG7RerPXXAR3AXcaY/SO1FZEE4HdAHlAG/J0xpnGkOKbjCKCjp489ZY389WgNz++vpL2nnyVZsdy/YT7rZieN2FaPAJTyTa1dvVQ1d1HV3EVNSxfVzV3UtXbTb33/CZCVEMHMpBnMTI4iNyGSnMRIsuMjSYsNZ0ZYiF/PZjbuIwARcQIPAlcDlcAeEdlijDk6YLNrgTnWzxrgIWDNKG3vB141xnxPRO63nn9jIp0cidtt6Ol309Pvpqunn+bOXlq6ejnX1kN5fQdl9e2cqGnlQEUTvf2G0BAHH12Szh0X5rE0O26qwlJKTYPocBfR4S7mpv7tEtJ+t+FcWze1rd3UtHQRFRZCSW0b755qoLO3/wPtI1xOUmLCSIgKJS7CRVxkKNHhIUSFhTAjLITIUCcRLicRoU7CXdZPiIMwl5OwEAdhIQ5CQxyEOh24nA5CnILL6cDpEEIcYlty8WYIaDVQbIwpBRCRp4GNwMAEsBHYbDyHE7tFJE5E0vHs3Q/XdiOw3mr/JLCdKUoA3/rTETbvOj3iNnGRLvKTovj8xTNZNyuRlXnxRIbqCJlSgcrpEGsYKJzFmZ7howtnJmKMoa27j8
b2Hho6emnt6qW1q48W63dtSzcdPX109brp6XO/fxQxGfE4BATB+sf5vCAIj3xmBZfOTZ6U9zrPm2+4TKBiwPNKPHv5o22TOUrbVGNMFYAxpkpEhiyeIyL3APdYT9tEpMiLmMfsNHAQ+OP4XyIJCLRKVoHWp0DrDwRenwKtPzBJfbrs3yfUPHeohd4kgKGOTQanvOG28abtiIwxjwKPjqWNHURk71BjbP4s0PoUaP2BwOtToPUHfLtP3lQDrQQGXvOYBZz1cpuR2tZYw0RYv2u9D1sppdREeZMA9gBzRCRfREKBW4Etg7bZAtwhHmuBZmt4Z6S2W4A7rcd3An+aYF+UUkqNwahDQMaYPhG5D9iK51LOx40xhSKyyVr/MPASnktAi/FcBvrZkdpaL/094BkR+TxQDnxyUns2/Xx+mGocAq1PgdYfCLw+BVp/wIf75Fd3AiullJo8OiOYUkoFKU0ASikVpDQBTJCIbBCRIhEptu5o9ksiUiYih0XkgIjstZYliMg2ETlp/fbp6c5E5HERqRWRIwOWDdsHEflX63MrEpGP2BP18Ibpz7dF5Iz1OR0QkesGrPPp/gCISLaIvC4ix0SkUET+0Vrul5/TCP3xj8/JGKM/4/zBc2K7BJgJhOK5l6zA7rjG2ZcyIGnQsu8D91uP7wf+y+44R+nDpcBy4MhofQAKrM8rDMi3Pken3X3woj/fBr42xLY+3x8rznRgufU4Gjhhxe6Xn9MI/fGLz0mPACbm/TIZxpge4Hypi0CxEU+ZDqzfN9kXyuiMMW8ADYMWD9eHjcDTxphuY8wpPFewrZ6OOL01TH+G4/P9Ac9d/8YqFGmMaQWO4akY4Jef0wj9GY5P9UcTwMQMVwLDHxngFRHZZ5XfgEHlOoAhy3X4uOH64M+f3X0icsgaIjo/VOJ3/RGRPGAZ8A4B8DkN6g/4weekCWBiJlzqwodcZIxZjqey670icqndAU0xf/3sHgJmAUuBKuBH1nK/6o+IzACeA75ijGkZadMhlvlcv4boj198TpoAJsabMhl+wRhz1vpdC/wBz2FpIJTrGK4PfvnZGWNqjDH9xhg38Bh/Gz7wm/6IiAvPl+VvjDHPW4v99nMaqj/+8jlpApgYb8pk+DwRiRKR6POPgWuAIwRGuY7h+rAFuFVEwkQkH89cFu/aEN+YnP+StHwMz+cEftIfERHgl8AxY8yPB6zyy89puP74zedk91l0f//BUwLjBJ6z+d+0O55x9mEmnisTDgKF5/sBJAKvAiet3wl2xzpKP36L53C7F8+e1udH6gPwTetzKwKutTt+L/vza+AwcAjPl0m6v/THivFiPEMeh4AD1s91/vo5jdAfv/ictBSEUkoFKR0CUkqpIKUJQCmlgpQmAKWUClKaAJRSKkhpAlBKqSClCUD5NRH5mIgYEZlvcxw3iUiB9fgCETkwYN1tItJh3TCEiCwWkUPjeI/1IvLnSQtaBT1NAMrf3Qa8hecmPDvdhKfSI3iu/849f3MdsA44jqdOzPnnb09rdEoNQROA8ltW/ZWL8Nwgdau1zCkiP7TmNjgkIl+2lq8SkZ0iclBE3hWRaBEJF5FfWdu+JyKXW9veJSI/G/A+fxaR9dbjNhH5D+t1dotIqoisA24EfmDt+efjuUt8jfUSK4AH8XzxY/3ead2B/biI7LHef+OAPvzAWn5IRL4wRN9XWW1mishlA+rOvzcg8Sg1Ik0Ayp/dBLxsjDkBNIjIcuAePF/Ay4wxS4DfWGU6fgf8ozHmAuAqoBO4F8AYsxjPkcSTIhI+yntGAbut13kDuNsYsxPP3Z5fN8YsNcaUADuBdVZpDTewnQ8mgLfx3BH6mjFmFXA5ngQShSehNVvLVwF3W2UDALASzsPARmNMKfA14F5jzFLgEqtvSo1KE4DyZ7fhmYMB6/dteL7cHzbG9AEYYxqAeUCVMWaPtazFWn8xnlv2McYcB04Dc0d5zx7g/Dj8PiBvmO3exvNFvxrYYyWF2S
KSDMywvrivAe63jhq2A+FAjrX8Dmv5O3jKJMyxXncB8ChwgzGmfMB7/VhE/gGIO993pUYTYncASo2HiCQCVwCLRMTgmZ3N4PlSHlzfRIZYdn75UPr44M7RwKOCXvO3+in9DP83tBvP3vvFwC5rWSWeoaqdA97/ZmNM0QeC8hQY+7IxZuug5evx1AYKx3M+4XwF1++JyIt4atDsFpGrrISm1Ij0CED5q08Am40xucaYPGNMNnAK2A9sEpEQ8Mw1i+cEbIaIrLKWRVvr3wA+ZS2bi2fvuwjP9JhLRcQhItl4N2NTK54pAYH3Z4eqAO7ibwlgF/AV/pYAtgJftr7wEZFlA5Z/ccBVQ3OtoSGAJuB64D8HnJeYZYw5bIz5L2AvYOsVUcp/aAJQ/uo2PPMWDPQckAGUA4dE5CBwu/FM13kL8FNr2TY8e9E/B5wichjPOYK7jDHdeIZUTuG5mueHeJLKaJ4Gvm6dhJ1lLXsbCDPGnJ8BaheeyqvnE8B3AZcV6xHrOcAvgKPAfmv5Iww40jDG1AA3AA+KyBrgKyJyxOpbJ/AXL+JVSquBKqVUsNIjAKWUClKaAJRSKkhpAlBKqSClCUAppYKUJgCllApSmgCUUipIaQJQSqkg9f8BB+HCWThoo6EAAAAASUVORK5CYII=\n", + "text/plain": [ + "
" + ] + }, + "metadata": { + "needs_background": "light" + }, + "output_type": "display_data" + } + ], + "source": [ + "# Example EDA\n", + "sns.distplot(data.AccountWeeks)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "**Let us perform some analysis with the data's features**\n", + "* Group the data by whether the customer wil churn and analyse their different features to know more about how the data behave." + ] + }, + { + "cell_type": "code", + "execution_count": 35, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "" + ] + }, + "execution_count": 35, + "metadata": {}, + "output_type": "execute_result" + }, + { + "data": { + "image/png": "iVBORw0KGgoAAAANSUhEUgAAAYUAAAEGCAYAAACKB4k+AAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjMuMSwgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy/d3fzzAAAACXBIWXMAAAsTAAALEwEAmpwYAAAV9UlEQVR4nO3dfbCedZ3f8feHECEijAIRw4FswBPrBjtGezZDx5ku6mKI7TayO25jq2Ss0zhTiLF1OwP+o+xMHGfHh9JUXWJlpZ2tNPvgGJ+6C1GLTtWY0AgEpNwLCZyQJiGy5SE0Mcm3f5w7Fyfk5JxjkvtcJ7nfr5kz9/37Xdfvur+HOeRz/67HVBWSJAGc1XYBkqTpw1CQJDUMBUlSw1CQJDUMBUlS4+y2CzgZF198cc2bN6/tMiTptLJ58+anq2r2WMtO61CYN28emzZtarsMSTqtJNl+vGXuPpIkNQwFSVLDUJAkNQwFSVLDUJAkNQwFSVKjZ6GQ5NwkG5P8PMnWJLd2+z+ZZEeSLd2fd48ac0uSTpJHkizuVW2SpLH18jqF/cA7qur5JDOBHyX5bnfZ56vqM6NXTrIAWAZcBVwK3JPkDVV1qIc1ShrHmjVr6HQ6rdawY8cOAAYGBlqtA2BwcJCVK1e2XUZP9WymUCOe7zZndn/Ge3jDUuCuqtpfVY8DHWBRr+qTdHp48cUXefHFF9suo2/09IrmJDOAzcAg8IWq+mmSJcBNSW4ANgEfq6pngAHgJ6OGD3f7Xr7NFcAKgLlz5/ayfKnvTYdvxatWrQLgtttua7mS/tDTA81VdaiqFgKXAYuSvAn4EvB6YCGwE/hsd/WMtYkxtrm2qoaqamj27DFv3SFJOkFTcvZRVf0d8APguqra1Q2Lw8CXeWkX0TBw+ahhlwFPTUV9kqQRvTz7aHaSV3ffzwJ+B/hFkjmjVrseeLD7fj2wLMk5Sa4A5gMbe1WfJOlYvTymMAe4s3tc4SxgXVV9K8l/SbKQkV1D24APA1TV1iTrgIeAg8CNnnkkSVOrZ6FQVfcDbxmj/wPjjFkNrO5VTZKk8XlFsySpYShIkhqGgiSpYShIkhqGgiSpYShIkhqGgiSpYShIkhqGgiSpYSgIgL179/KRj3yEvXv3tl2KpBYZCgLg9ttv5/7772ft2rVtlyKpRYaC2Lt3L/fccw8Ad999t7MFqY8ZCuL222/n8OHDABw+fNjZgtTHDAWxYcOGo9pHZg2S+o+hIJKM25bUP3r5kB1N0po1a+h0Oq19/vnnn88zzzxzVPvIw9LbMDg4OC0eGC/1I
2cKYs6cOeO2JfUPZwrTwHT4Vnz99dfzzDPPsHjxYm655Za2y5HUEkNBwMjs4MCBA6xYsaLtUiS1qGe7j5Kcm2Rjkp8n2Zrk1m7/hUnuTvJo9/U1o8bckqST5JEki3tVm441c+ZMBgcHueiii9ouRVKLenlMYT/wjqp6M7AQuC7J1cDNwIaqmg9s6LZJsgBYBlwFXAd8McmMHtYnSXqZnoVCjXi+25zZ/SlgKXBnt/9O4D3d90uBu6pqf1U9DnSARb2qT5J0rJ6efZRkRpItwG7g7qr6KXBJVe0E6L6+trv6APDkqOHD3b6Xb3NFkk1JNu3Zs6eX5UtS3+lpKFTVoapaCFwGLErypnFWH+uKqRpjm2uraqiqhmbPnn2KKpUkwRRdp1BVfwf8gJFjBbuSzAHovu7urjYMXD5q2GXAU1NRnyRpRC/PPpqd5NXd97OA3wF+AawHlndXWw58o/t+PbAsyTlJrgDmAxt7VZ8k6Vi9vE5hDnBn9wyis4B1VfWtJD8G1iX5EPAE8F6AqtqaZB3wEHAQuLGqDvWwPknSy/QsFKrqfuAtY/TvBd55nDGrgdW9qkmSND7vfSRJahgKkqSGoSBJahgKkqSGoSBJahgKkqSGoSBJahgKkqSGoSBJahgKkqSGoSBJahgKkqRGL++SKukErVmzhk6n03YZ08KR/w6rVq1quZLpYXBwkJUrV/Zs+4aCNA11Oh0e3fq/mPsq7x7/il+N7NDYv31Ty5W074nnZ/T8MwwFaZqa+6pDfPytz7ZdhqaRT913Qc8/w2MKkqSGoSBJahgKkqSGoSBJavQsFJJcnuT7SR5OsjXJqm7/J5PsSLKl+/PuUWNuSdJJ8kiSxb2qTZI0tl6efXQQ+FhV3ZfkfGBzkru7yz5fVZ8ZvXKSBcAy4CrgUuCeJG+oKs/Jk6Qp0rOZQlXtrKr7uu+fAx4GBsYZshS4q6r2V9XjQAdY1Kv6JEnHmpJjCknmAW8BftrtuinJ/UnuSPKabt8A8OSoYcOMESJJViTZlGTTnj17elm2JPWdnodCklcBfwl8tKqeBb4EvB5YCOwEPntk1TGG1zEdVWuraqiqhmbPnt2boiWpT/U0FJLMZCQQ/qyq/gqgqnZV1aGqOgx8mZd2EQ0Dl48afhnwVC/rkyQdrZdnHwX4CvBwVX1uVP+cUatdDzzYfb8eWJbknCRXAPOBjb2qT5J0rF6effQ24APAA0m2dPs+DrwvyUJGdg1tAz4MUFVbk6wDHmLkzKUbPfNIkqZWz0Khqn7E2McJvjPOmNXA6l7VJEkan1c0S5IahoIkqWEoSJIahoIkqWEoSJIahoIkqWEoSJIahoIkqWEoSJIahoIkqWEoSJIavbwhnqQTtGPHDl54bgafuu+CtkvRNLL9uRmct2NHTz/DmYIkqeFMQZqGBgYG2H9wJx9/67Ntl6Jp5FP3XcA5A+M96v7kOVOQJDX6eqawZs0aOp1O22VMC0f+O6xatarlSqaHwcFBVq5c2XYZ0pTr61DodDpsefBhDr3ywrZLad1ZBwqAzY/tarmS9s3Y98u2S5BaM6lQSHIe8GJVHU7yBuCNwHer6lc9rW4KHHrlhbz4xne3XYamkVm/OO7DAaUz3mSPKdwLnJtkANgAfBD4aq+KkiS1Y7KhkKraB/wesKaqrgcWjDsguTzJ95M8nGRrklXd/guT3J3k0e7ra0aNuSVJJ8kjSRaf6C8lSToxkw6FJP8Q+BfAt7t9E+16Ogh8rKp+E7gauDHJAuBmYENVzWdk1nFz9wMWAMuAq4DrgC8mmfHr/DKSpJMz2VBYBdwCfL2qtia5Evj+eAOqamdV3dd9/xzwMDAALAXu7K52J/Ce7vulwF1Vtb+qHgc6wKJf43eRJJ2kSR1orqp7GTmucKT9GPCRyX5IknnAW4CfApdU1c7udnYmeW13tQHgJ6OGDXf7Xr6tFcAKgLlz5062BEnSJEz27KM3AH8IzBs9pqreMYmxrwL+EvhoVT2b5
LirjtFXx3RUrQXWAgwNDR2zXJJ04iZ7ncKfA38C/Cfg0GQ3nmQmI4HwZ1X1V93uXUnmdGcJc4Dd3f5h4PJRwy8DnprsZ0mSTt5kjykcrKovVdXGqtp85Ge8ARmZEnwFeLiqPjdq0Xpgeff9cuAbo/qXJTknyRXAfGDjpH8TSdJJm+xM4ZtJ/jXwdWD/kc6qGu/Sz7cBHwAeSLKl2/dx4NPAuiQfAp4A3tvd1tYk64CHGDlz6caqmvSsRJJ08iYbCke+2f+7UX0FXHm8AVX1I8Y+TgDwzuOMWQ2snmRNkqRTbLJnH13R60IkSe0bNxSSvKOqvpfk98ZaPurgsSTpDDDRTOG3ge8BvzvGsgIMBUk6g4wbClX1ie7rB6emHElSmybaffRvx1v+slNNJUmnuYl2H30G2AJ8l5FTUY97ObIk6fQ3USi8lZE7l/5jYDPwNUbucHpG3F5ix44dzNj3f32oio4yY99eduw42HYZUivGvaK5qrZU1c1VtZCRq5OXAg8l+adTUZwkaWpN9oZ4sxm5y+nfZ+QeRbvHH3F6GBgY4P/sP9vHceoos37xHQYGLmm7DKkVEx1o/iDwz4Bzgb8A/qCqzohAkCQda6KZwleABxi5R9Fi4F2jb31dVe5GkqQzyESh8PYpqUKSNC1MdPHa/wBI8k+A71TV4SmpSpLUisk+T2EZ8GiSP07ym70sSJLUnkmFQlW9n5Gzj/4W+NMkP06yIsn5Pa1OkjSlJjtToKqeZeTRmncBc4DrgfuSrOxRbZKkKTapUEjyu0m+zsgdU2cCi6pqCfBm4A97WJ8kaQpN9slr7wU+X1X3ju6sqn1J/uWpL0uS1IbJPnnthnGWbTh15UiS2jTZ3UdXJ/lZkueTHEhyKMmzE4y5I8nuJA+O6vtkkh1JtnR/3j1q2S1JOkkeSbL4xH8lSdKJmuzuo//IyGmpfw4MATcAgxOM+Wp33H9+Wf/nq+ozozuSLOhu/yrgUuCeJG+oqkOTrE864zzx/Aw+dd8FbZfRul37Rr67XvJKL5N64vkZzO/xZ0w2FKiqTpIZ3X+o/zTJ/5xg/XuTzJvk5pcCd1XVfuDxJB1gEfDjydYnnUkGByf6ztU/DnQ6AJzzG/43mU/v/zYmGwr7krwC2JLkj4GdwHkn+Jk3JbkB2AR8rKqeAQaAn4xaZ7jbd4wkK4AVAHPnzj3BEqTpbeVKz/Q+YtWqVQDcdtttLVfSHyZ7ncIHuuveBLwAXA78/gl83peA1wMLGQmWz3b7x3qi25gP8qmqtVU1VFVDs2fPPoESJEnHM9mzj7Z3n6lAVd16oh9WVbuOvE/yZeBb3eYwI0FzxGXAUyf6OZKkEzPR8xQCfIKRGUKAs5IcBNZU1R/9uh+WZE5V7ew2rweOnJm0HvivST7HyIHm+cDGX3f7J2LGvl/6OE7grP83cjLZ4XM9sDlj3y8BH7Kj/jTRTOGjwNuA36qqxwGSXAl8Kcm/qarPH29gkq8B1wAXJxlmJFyuSbKQkV1D24APA1TV1iTrgIeAg8CNU3HmkQfzXtLpPAfA4JX+YwiX+LehvjVRKNwAXFtVTx/pqKrHkrwf+BvguKFQVe8bo/sr46y/Glg9QT2nlAfzXuLBPEkw8YHmmaMD4Yiq2sPIPZAkSWeQiULhwAkukySdhibaffTm49zOIsC5PahHktSiiR7HOWOqCpEktW/SD9mRJJ35DAVJUsNQkCQ1DAVJUsNQkCQ1DAVJUsNQkCQ1DAVJUsNQkCQ1DAVJUsNQkCQ1DAVJUsNQkCQ1DAVJUsNQkCQ1ehYKSe5IsjvJg6P6Lkxyd5JHu6+vGbXsliSdJI8kWdyruiRJx9fLmcJXgete1nczsKGq5gMbum2SLACWAVd1x3wxiQ/4kaQp1rNQqKp7gV++rHspcGf3/Z3Ae0b131VV+6vqcaADLOpVbZKksU31MYVLqmonQPf1td3+AeDJUesNd/skSVNouhxozhh9NeaKyYokm5Js2rNnT4/Lk
qT+MtWhsCvJHIDu6+5u/zBw+aj1LgOeGmsDVbW2qoaqamj27Nk9LVaS+s1Uh8J6YHn3/XLgG6P6lyU5J8kVwHxg4xTXJkl97+xebTjJ14BrgIuTDAOfAD4NrEvyIeAJ4L0AVbU1yTrgIeAgcGNVHepVbZKksfUsFKrqfcdZ9M7jrL8aWN2reiRJE5suB5olSdOAoSBJahgKkqSGoSBJahgKkqSGoSBJahgKkqSGoSBJahgKkqSGoSBJahgKkqSGoSBJahgKkqSGoSBJahgKkqSGoSBJahgKkqSGoSBJahgKkqSGoSBJapzdxocm2QY8BxwCDlbVUJILgf8GzAO2AX9QVc+0UZ8k9as2Zwpvr6qFVTXUbd8MbKiq+cCGbluSNIWm0+6jpcCd3fd3Au9prxRJ6k9thUIBf5Nkc5IV3b5LqmonQPf1tWMNTLIiyaYkm/bs2TNF5UpSf2grFN5WVW8FlgA3JvlHkx1YVWuraqiqhmbPnt27CvvMvn37eOCBB+h0Om2XIqlFrYRCVT3Vfd0NfB1YBOxKMgeg+7q7jdr61fbt2zl8+DC33npr26VIatGUn32U5DzgrKp6rvv+XcAfAeuB5cCnu6/fmOra2rJmzZpWv6Hv27ePAwcOAPDkk0+yYsUKZs2a1Vo9g4ODrFy5srXPl/pZGzOFS4AfJfk5sBH4dlX9d0bC4NokjwLXdtuaAtu3bz+qvW3btnYKkdS6KZ8pVNVjwJvH6N8LvHOq65kO2v5WfM011xzVPnDgALfddls7xUhq1XQ6JVWS1DJDQZLUMBQkSQ1DQZLUMBTEeeedN25bUv8wFMQLL7wwbltS/zAUxNlnnz1uW1L/MBTEwYMHx21L6h+Ggpg3b964bUn9w1AQN9xww1Ht5cuXt1SJpLYZCuKOO+4Yty2pfxgKYnh4+Kj2k08+2VIlktpmKIgk47Yl9Q9DQVx99dXjtiX1D0NBnH/++Ue1L7jggpYqkdQ2Q0H88Ic/PKp97733tlSJpLYZCuKiiy4aty2pfxgKYufOneO2JfUPQ0GS1Jh2oZDkuiSPJOkkubntevrBpZdeOm5bUv+YVqGQZAbwBWAJsAB4X5IF7VZ15tuzZ8+4bUn9Y7rdI3kR0KmqxwCS3AUsBR5qtaoz3Ote9zq2bdt2VFsCWLNmDZ1Op9Uajnz+qlWrWq0DYHBwkJUrV7ZdRk9Nq5kCMACMvsfCcLevkWRFkk1JNvmN9tTYtWvXuG2pTbNmzWLWrFltl9E3pttMYaz7K9RRjaq1wFqAoaGhGmN9/ZquvfZavvnNb1JVJOFd73pX2yVpmjjTvxXrWNNtpjAMXD6qfRnwVEu19I3ly5c3T1ubOXPmMbfSltQ/plso/AyYn+SKJK8AlgHrW67pjHfRRRexZMkSkrBkyRIvXpP62LTafVRVB5PcBPw1MAO4o6q2tlxWX1i+fDnbtm1zliD1uVSdvrvlh4aGatOmTW2XIUmnlSSbq2porGXTbfeRJKlFhoIkqWEoSJIahoIkqXFaH2hOsgfY3nYdZ5CLgafbLkIag3+bp9ZvVNXssRac1qGgUyvJpuOdkSC1yb/NqePuI0lSw1CQJDUMBY22tu0CpOPwb3OKeExBktRwpiBJahgKkqSGoSCSXJfkkSSdJDe3XY90RJI7kuxO8mDbtfQLQ6HPJZkBfAFYAiwA3pdkQbtVSY2vAte1XUQ/MRS0COhU1WNVdQC4C1jack0SAFV1L/DLtuvoJ4aCBoAnR7WHu32S+pChoIzR53nKUp8yFDQMXD6qfRnwVEu1SGqZoaCfAfOTXJHkFcAyYH3LNUlqiaHQ56rqIHAT8NfAw8C6qtrablXSiCRfA34M/L0kw0k+1HZNZzpvcyFJajhTkCQ1DAVJUsNQkCQ1DAVJUsNQkCQ1DAVpAklel+SuJH+b5KEk30myIsm32q5NOtUMBWkcSQJ8HfhBVb2+qhYAHwcuOcntnn0q6pNONf8wpfG9HfhVVf3JkY6q2pLk1
cA7k/wF8CZgM/D+qqok24Chqno6yRDwmaq6JskngUuBecDTSf43MBe4svv676vqP0zdryYdy5mCNL4j/+CP5S3ARxl5DsWVwNsmsb1/ACytqn/ebb8RWMzILcw/kWTmSVUrnSRDQTpxG6tquKoOA1sYmQFMZH1VvTiq/e2q2l9VTwO7OcndUtLJMhSk8W1l5Nv9WPaPen+Il3bHHuSl/7fOfdmYFya5DakVhoI0vu8B5yT5V0c6kvwW8NvjjNnGS0Hy+70rTTr1DAVpHDVyx8jrgWu7p6RuBT7J+M+cuBW4LckPGfn2L502vEuqJKnhTEGS1DAUJEkNQ0GS1DAUJEkNQ0GS1DAUJEkNQ0GS1Pj/DTAwv91M99AAAAAASUVORK5CYII=\n", + "text/plain": [ + "
" + ] + }, + "metadata": { + "needs_background": "light" + }, + "output_type": "display_data" + } + ], + "source": [ + "data_churn = data.groupby('Churn').get_group(1)\n", + "data_no_churn = data.groupby('Churn').get_group(0)\n", + "\n", + "#Check how the DayMins columns for customer that churn vs those that didnt churn varies using boxplot\n", + "#sns.boxplot('DayCalls',data = data_churn)\n", + "sns.boxplot('Churn','DayMins', data = data)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "From the boxplot above, it seems that customer that churn tends to have lower **DayMins** rate than those that wont churn. Although the **DayMins** minimum is significantly low which might not be expected if our assumption that customer customer with lower **DayMins** tends to churn, although detailed explanation about what DayMins mean was not provided. Let continue our comparison and see customer behavior as regards **DataUsage**" + ] + }, + { + "cell_type": "code", + "execution_count": 37, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "" + ] + }, + "execution_count": 37, + "metadata": {}, + "output_type": "execute_result" + }, + { + "data": { + "image/png": 
"iVBORw0KGgoAAAANSUhEUgAAAXgAAAEGCAYAAABvtY4XAAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjMuMSwgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy/d3fzzAAAACXBIWXMAAAsTAAALEwEAmpwYAAAT+ElEQVR4nO3df3BcV3nG8efVDztxAnWyUVzHQRjqFpqhkB8igcmQNrFkZCCGlmlLmGC1hcozgG0oLaUMQ0mnpTNAW2yFTkcTCHJDk8GlaQlNhGXTOM5AATm4cUJSoqZyG9lxpI3dhNhWJO3bP3ZlS4p8tdj33Cud/X5mNNa72r33jWb9+OTsveeYuwsAEJ+6vBsAAIRBwANApAh4AIgUAQ8AkSLgASBSDXk3MNVFF13kK1euzLsNAFgw9u7dO+LuTbP9bF4F/MqVK9Xf3593GwCwYJjZgdP9jCkaAIgUAQ8AkSLgASBSBDwARIqAB5CpYrGoTZs2qVgs5t1K9Ah4AJnq6enR/v37tW3btrxbiR4BDyAzxWJRvb29cnf19vYyig+MgAeQmZ6eHpVKJUnSxMQEo/jACHgAmdm5c6fGx8clSePj4+rr68u5o7gR8AAy09raqoaG8g30DQ0Namtry7mjuBHwADLT0dGhurpy7NTX12v9+vU5dxQ3Ah5AZgqFgtrb22Vmam9vV6FQyLulqM2rxcYAxK+jo0ODg4OM3jNAwAPIVKFQ0NatW/NuoyYwRQMAkSLgASBSBDwARIqAB4BIEfAAECkCHgAiFfQySTMblPS8pAlJ4+7eEvJ8AIBTsrgO/np3H8ngPACAKZiiAYBIhQ54l7TDzPaaWedsTzCzTjPrN7P+4eHhwO0AQO0IHfDXuvuVktZK+pCZXTfzCe7e7e4t7t7S1NQUuB0AqB1BA97dD1b+fEbS3ZKuDnk+AMApwQLezM4zs5dNfi9pjaRHQp0PADBdyKtolkm628wmz/MP7t4b8HwAgCmCBby7PynpDaGODwBIxmWSABApAh4AIsWOTinr6urSwMBArj0MDQ1JklasWJFrH5K0atUqbdy4Me82gJpEwEfo+PHjebcAYB4g4FM2H0armzdvliRt2bIl504A5Ik5eACZKhaL2rRpk4rFYt6tRI+AB5Cp7u5uPfzww+ru7s67legR8AAyUywW1dfXJ0nq6+tjFB8YAQ8gM93d3SqVSpKkUqnEKD4wAh5AZnbt2pVYI10EPIDMuHtijXQR8AAys3r16ml1a2trTp3UBgIeQGY2bNigurpy7NTV1amzc9aN3pASAh5AZgqFwslRe1tbmwqFQs4dxY07WQFkasOGDXr66acZvWeAgAeQqUKhoK1bt+bdRk1gigYAIkXAA0CkCHgAmWKxsewQ8AAy1dPTo/3792vbtm15txI9Ah5AZorFonp7e+Xu6u3tZRQfGAEPIDM9PT0nFxubmJhgFB8YAQ8gMzt37tT4+LgkaXx8/OTSwQiDgAeQmbe85S2JNdJFwAPIzIkTJ6bVo6OjOXVSGwh4AJl58MEHp9V79uzJqZPaQMADyIyZJdZIV/CAN7N6M/uRmX0r9LkAzG8z14OfWSNdWYzgN0t6LIPzAJjnOjs7WQ8+Q0ED3swulfR2SbeFPA+AhaFQKKipqUmS1NTUxHrwgYUewX9R0scllU73BDPrNLN+M+sfHh4O3A6APBWLRR0+fFiSdPjwYe5kDSxYwJvZOyQ94+57k57n7t3u3uLuLZP/sgOIU1dXV2KNdIUcwV8raZ2ZDUq6S9INZnZHwPMBmOd2796dWCNdwQLe3f/E3S9195WS3iPpO+5+c6jzAZj/3D2xRrq4Dh5AZhoaGhJrpCuT36673y/p/izOBWD+mlxo7HQ10sUIHkBmuJM1WwQ8gMwwB58tAh4AIkXAA0CkCHgAiBQBDyAzkwuNna5GuvjtAsjM5Ibbp6uRLgIeACJFwANApAh4AJlZunTptPqCCy7Ip5EaQcADyMzRo0en1Ue
OHMmnkRpBwANApAh4AJm5+OKLp9XLli3LqZPaQMADyMzM5YHr6+tz6qQ2EPAAMnPw4MHEGuki4AFkZuXKlYk10kXAA8jMpz71qcQa6aoq4K3sZjP7dKVuNrOrw7YGIDYzr3vnOviwqh3B/62kN0u6qVI/L+lLQToCEK2enp6TC4zV1dVp27ZtOXcUt2oD/hp3/5CkE5Lk7kckLQrWFYAo7dy58+QCY6VSSX19fTl3FLdqA37MzOoluSSZWZMkloED8DNpbW2dVre1teXUSW2oNuC3Srpb0sVm9heSHpT02WBdAYjSunXrptU33nhjTp3UhqoC3t2/Junjkv5S0iFJ73L37SEbAxCfO+64I7FGuhrmfopkZhdKekbSnVMea3T3sVCNAYjP7t27E2ukq9opmockDUv6iaQnKt//t5k9ZGZXhWoOQFzcPbFGuqoN+F5Jb3P3i9y9IGmtpK9L+qDKl1ACAOaZagO+xd2/PVm4+w5J17n7v0taHKQzANFZsmRJYo10VTUHL+lZM/tjSXdV6t+WdKRy6SSXSwKoyrFjxxJrpKvaEfx7JV0q6Z8l/Yuk5spj9ZJ+a7YXmNk5ZvYDM/sPM3vUzG5JoV8ACxiLjWWr2sskR9x9o7tf4e6Xu/uH3X3Y3V9094HTvGxU0g3u/gZJl0tqN7M3pdQ3gAVo5o1N7e3tOXVSG6pdbKzJzD5vZvea2Xcmv5Je42U/rZSNlS8+Mgdq2O233z6tvu2223LqpDZUO0XzNUmPS3qVpFskDUr64VwvMrN6M9un8jX0fe7+/Vme02lm/WbWPzw8XG3fABag8fHxxBrpqjbgC+7+ZUlj7r7b3X9P0pzTLe4+4e6Xqzx/f7WZvW6W53S7e4u7tzQ1Nf0svQMAElS92Fjlz0Nm9nYzu0Ll0K6Kux+VdL8kJtwAICPVBvyfm9nPSfqYpD+UdJukjya9oDJvv7Ty/bmSWlWe5gFQo84777zEGumq6jp4d/9W5dv/k3R9lcdeLqmncq18naSvTzkOgBrEHHy2qr2K5nNm9nIzazSzXWY2YmY3J73G3R+uXFb5end/nbv/WTotA1ioli9fnlgjXdVO0axx9+ckvUPSU5J+SdIfBesKQJQOHTqUWCNd1QZ8Y+XPt0m6092fDdQPgIjV19cn1khXtWvR3GNmj0s6LumDlS37ToRrC0CMWIsmW9UuVfAJSW9WeVXJMUnHJL0zZGMAgLOTOII3s9+Y8ZCb2Yikfe7+dLi2AMTIzKZt8mFmOXYTv7mmaGbbEfdCSa83s/e7e+J6NAAwVV1dnSYmJqbVCCcx4N39d2d73MxeqfKOTteEaApAnFavXq0dO3acrFtbW3PsJn5n9M+nux/QqStrAKAqa9asSayRrjMKeDN7jcrrvQNA1W699dZpdVdXV06d1Ia5PmS9Ry9dw/1ClZchSLyTFQBmGhwcTKyRrrk+ZP3CjNolFSU94e4vhmkJQKzOOeccnThxYlqNcOb6kHV3Vo0AiN/UcJ+tRrqqXWzsTWb2QzP7qZm9aGYTZvZc6OYAAGeu2g9Zb5V0k6QnJJ0r6QOS+HQEAOaxateikbsPmFm9u09Iut3MvhuwLwDAWao24I+Z2SJJ+8zsc5IOSWIrFgA/k/r6+ml3srKaZFjVTtG8r/LcD0t6QdIrJM1cpwYAEjU2Tr8/ctGiRTl1UhuqDfh3ufsJd3/O3W9x9z9QefMPAKjazKtmjh8/nlMntaHagO+Y5bHfSbEPAEDK5rqT9SZJ75X0KjP75pQfvUzlG54AoGosF5ytuT5k/a7KH6heJOmvpjz+vKSHQzUFIE5XXnml9u7de7K+6qqrcuwmfnPdyXpA0gGVd3MCgLMyc5PtgwcP5tRJbeBOVgCZmRnoBHxY3MkKAJGqej14dx+QVO/uE+5+u6Trw7UFIEbLly+fVl9yySU5dVIbuJMVQGaOHj06rT5y5Eg+jdSIs7mT9d2hmgIQp+uuuy6xRrqqGsG7+wEza6p8f0v
YlgDEauo18AgvcQRvZZ8xsxFJj0v6iZkNm9mn5zqwmb3CzP7NzB4zs0fNbHNaTQNYmPbs2TOtfuCBB3LqpDbMNUXzEUnXSnqjuxfc/QJJ10i61sw+OsdrxyV9zN1/WdKbJH3IzC4724YBLFzLli1LrJGuuaZo1ktqc/eRyQfc/Ukzu1nSDkl/c7oXuvshlT+Mlbs/b2aPSVoh6cdn3fUsurq6NDAwEOLQC87k72HzZv6nSZJWrVqljRs35t0GJB0+fDixRrrmCvjGqeE+yd2HzaxxthfMxsxWSrpC0vdn+VmnpE5Jam5urvaQLzEwMKB9jzymiSUXnvExYlH3Ynmec++T/OWpP/Zs3i1gira2Nt1zzz1yd5mZ1qxZk3dLUZsr4F88w5+dZGbnS/qGpI+4+0vufnX3bkndktTS0nJWn8BMLLlQx1/7trM5BCJz7uP35t0Cpujo6NB9992nsbExNTY2av369Xm3FLW55uDfYGbPzfL1vKRfmevglVH+NyR9zd3/KY2GASxchUJBa9eulZlp7dq1KhQKebcUtcSAd/d6d3/5LF8vc/fEKRorrwP6ZUmPuftfp9k0gIVr3bp1WrJkiW688ca8W4le1UsVnIFrVb5B6gYz21f5Yv4EqHHbt2/XCy+8oO3bt+fdSvSCBby7P+ju5u6vd/fLK19MiAI1rFgsqq+vT5LU19enYpF9g0IKOYIHgGm6u7tVKpUkSaVSSd3d3Tl3FDcCHkBmdu3alVgjXQQ8gMzMXIuGtWnCIuABZGbx4sWJNdJFwAPIzLFjxxJrpIuAB5CZJUuWJNZIFwEPIDOjo6OJNdJFwANApAh4AJlh0+1sEfAAMjPzztWRkZesRo4UEfAAMtPW1jatZj34sAh4AJnp6OhQY2N5IdpFixaxHnxgBDyAzLAefLbm2tEJAFLV0dGhwcFBRu8ZIOABZKpQKGjr1q15t1ETmKIBgEgR8AAQKQIeACJFwANApAh4AIgUAQ8gU8ViUZs2bWLD7QwQ8AAy1dPTo/3792vbtm15txI9Ah5AZorFonp7e+Xuuu+++xjFB0bAA8hMT0+PxsbGJEljY2OM4gMj4AFkpq+vT+4uSXJ37dixI+eO4kbAA8jMsmXLEmuki4AHkJnDhw8n1khXsIA3s6+Y2TNm9kiocwBYWNra2mRmkiQzY8OPwEKO4L8qqT3g8QEsMB0dHWpoKC9i29jYyJLBgQULeHd/QNKzoY4PYOFhw49s5b4evJl1SuqUpObm5py7ARAaG35kJ/cPWd29291b3L2lqakp73YABDa54Qej9/ByD3gAQBgEPABEKuRlkndK+p6k15jZU2b2/lDnAgC8VLAPWd39plDHBgDMjSkaAIgUAQ8AkSLgASBSBDwARIqAB4BIEfAAECkCHgAiRcADQKQIeACIFAEPAJEi4AEgUgQ8AESKgAeASOW+ZR+AbHR1dWlgYCDvNjQ0NCRJWrFiRa59rFq1Shs3bsy1h9AIeACZOn78eN4t1AwCHqgR82W0unnzZknSli1bcu4kfszBA0CkCHgAiBQBDwCRYg4eyMB8uYJlPpj8PUzOxde6kFfzEPBABgYGBvTEoz9S8/kTebeSu0Vj5YmD0QP9OXeSv//5aX3Q4xPwQEaaz5/QJ698Lu82MI989qGXBz0+c/AAECkCHgAiRcADQKQIeACIFAEPAJEi4AEgUkED3szazew/zWzAzD4R8lwAgOmCBbyZ1Uv6kqS1ki6TdJOZXRbqfACA6ULe6HS1pAF3f1KSzOwuSe+U9OMQJxsaGlL980Wd/9Dfhzh89UoTknu+PcwnZlJd2Lv1Ek2Ma2hoPL/zVwwNDenZow3asPuC3HoYK5lKvDVPqjOpsS7fX8johOnChqFgxw8Z8Csk/e+U+ilJ18x8kpl1SuqUpObm5jM+2dKlS+fFRgKjo6MqlUp5tzFv1NXVafHiRTl2sEhLly7N8fxl8+L9OToq8d48pa5OdYsX59rCuVLQ96d5oNG
mmf2mpLe6+wcq9fskXe3up11Vp6Wlxfv7WZ8CAKplZnvdvWW2n4X8kPUpSa+YUl8q6WDA8wEApggZ8D+U9Itm9iozWyTpPZK+GfB8AIApgs3Bu/u4mX1Y0rcl1Uv6irs/Gup8AIDpgi4X7O73Sro35DkAALPjTlYAiBQBDwCRIuABIFIEPABEKtiNTmfCzIYlHci7j0hcJGkk7yaA0+D9mZ5XunvTbD+YVwGP9JhZ/+nubgPyxvszG0zRAECkCHgAiBQBH6/uvBsAEvD+zABz8AAQKUbwABApAh4AIkXAR4jNzjFfmdlXzOwZM3sk715qAQEfGTY7xzz3VUnteTdRKwj4+Jzc7NzdX5Q0udk5kDt3f0DSs3n3USsI+PjMttn5ipx6AZAjAj4+NstjXAsL1CACPj5sdg5AEgEfIzY7ByCJgI+Ou49Lmtzs/DFJX2ezc8wXZnanpO9Jeo2ZPWVm78+7p5ixVAEARIoRPABEioAHgEgR8AAQKQIeACJFwANApAh41BQz+3kzu8vM/svMfmxm95pZp5l9K+/egLQR8KgZZmaS7pZ0v7v/grtfJumTkpad5XEb0ugPSBtvTNSS6yWNufvfTT7g7vvMbKmk1Wb2j5JeJ2mvpJvd3c1sUFKLu4+YWYukL7j7r5nZZyRdImmlpBEz+4mkZkmvrvz5RXffmt1/GvBSjOBRSybDezZXSPqIymvov1rStVUc7ypJ73T391bq10p6q8pLNv+pmTWeVbfAWSLggbIfuPtT7l6StE/lkflcvunux6fU/+ruo+4+IukZneXUD3C2CHjUkkdVHnXPZnTK9xM6NX05rlN/T86Z8ZoXqjwGkAsCHrXkO5IWm9nvTz5gZm+U9KsJrxnUqX8U3h2uNSB9BDxqhpdX1vt1SW2VyyQflfQZJa+Xf4ukLWa2R+VRObBgsJokAESKETwARIqAB4BIEfAAECkCHgAiRcADQKQIeACIFAEPAJH6f/G02IcNfmwfAAAAAElFTkSuQmCC\n", + "text/plain": [ + "
" + ] + }, + "metadata": { + "needs_background": "light" + }, + "output_type": "display_data" + } + ], + "source": [ + "sns.boxplot('Churn','DataUsage', data = data)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "The boxplot above indicate people with lower **DataUsage** tends not to churn and there apear to be several outliers for people who doesnt churn and have high data usage. Detailed explanation of what **DataUsage** means was not given, therefore no significant conclusion can be made" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Pre-process Data (Decision Tree)\n", + "* Check for duplicate values and remove\n", + "* Split the data to train-test" + ] + }, + { + "cell_type": "code", + "execution_count": 48, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
" + ], + "text/plain": [ + " Churn AccountWeeks ContractRenewal DataPlan DataUsage \\\n", + "0 0 128 1 1 2.70 \n", + "1 0 107 1 1 3.70 \n", + "2 0 137 1 0 0.00 \n", + "3 0 84 0 0 0.00 \n", + "4 0 75 0 0 0.00 \n", + "... ... ... ... ... ... \n", + "3328 0 192 1 1 2.67 \n", + "3329 0 68 1 0 0.34 \n", + "3330 0 28 1 0 0.00 \n", + "3331 0 184 0 0 0.00 \n", + "3332 0 74 1 1 3.70 \n", + "\n", + " CustServCalls DayMins DayCalls MonthlyCharge OverageFee RoamMins \n", + "0 1 265.1 110 89.0 9.87 10.0 \n", + "1 1 161.6 123 82.0 9.78 13.7 \n", + "2 0 243.4 114 52.0 6.06 12.2 \n", + "3 2 299.4 71 57.0 3.10 6.6 \n", + "4 3 166.7 113 41.0 7.42 10.1 \n", + "... ... ... ... ... ... ... \n", + "3328 2 156.2 77 71.7 10.78 9.9 \n", + "3329 3 231.1 57 56.4 7.67 9.6 \n", + "3330 2 180.8 109 56.0 14.44 14.1 \n", + "3331 2 213.8 105 50.0 7.98 5.0 \n", + "3332 0 234.4 113 100.0 13.30 13.7 \n", + "\n", + "[3333 rows x 11 columns]" + ] + }, + "execution_count": 48, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "from sklearn.model_selection import train_test_split\n", + "bool_df = data.duplicated(keep = False)\n", + "data_cl = data[~bool_df]\n", + "data_cl" + ] + }, + { + "cell_type": "code", + "execution_count": 72, + "metadata": {}, + "outputs": [], + "source": [ + "X, y = data.iloc[:, 2:], data.iloc[:,0] #Choose not to use AccountWeeks\n", + "X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.3, random_state = 19, stratify = y)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Decision Tree Training and Evaluation" + ] + }, + { + "cell_type": "code", + "execution_count": 80, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Training accuracy: 0.9502786112301758\n", + "Test Accuracy: 0.922\n" + ] + } + ], + "source": [ + "from sklearn.tree import DecisionTreeClassifier\n", + "clf = DecisionTreeClassifier(max_depth = 6, random_state= 9)\n", + "clf.fit(X_train, 
y_train)\n", + "\n", + "print(\"Training accuracy: \", clf.score(X_train, y_train))\n", + "print(\"Test Accuracy: \", clf.score(X_test, y_test))" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "The gap between training accuracy (0.950) and test accuracy (0.922) is about 0.03, which indicates that the model is doing well at avoiding overfitting. However, the model has probably overfit to the no-churn label, since there are significantly more 0 labels than 1 labels. Better metrics for this case are the recall score or the macro-averaged F1 score, which show how the model is doing on the imbalanced dataset. Let us compute these metrics and plot the confusion matrix to verify." + ] + }, + { + "cell_type": "code", + "execution_count": 82, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Accuracy score: 0.922\n", + "Precision score: 0.8819149990638664\n", + "Recall score: 0.7797136519459569\n", + "F1 score: 0.8192285229579775\n", + " precision recall f1-score support\n", + "\n", + " 0 0.93 0.98 0.96 855\n", + " 1 0.83 0.58 0.68 145\n", + "\n", + " accuracy 0.92 1000\n", + " macro avg 0.88 0.78 0.82 1000\n", + "weighted avg 0.92 0.92 0.92 1000\n", + "\n" + ] + } + ], + "source": [ + "from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score, classification_report,confusion_matrix\n", + "prob = clf.predict(X_test)\n", + "print(\"Accuracy score: \", accuracy_score(y_test, prob))\n", + "print(\"Precision score: \", precision_score(y_test, prob, average = 'macro'))\n", + "print(\"Recall score: \", recall_score(y_test, prob, average = 'macro'))\n", + "print(\"F1 score: \", f1_score(y_test, prob, average = 'macro'))\n", + "print(classification_report(y_test, prob))" + ] + }, + { + "cell_type": "code", + "execution_count": 85, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "Text(91.68, 0.5, 'Actual label')" + ] + }, + "execution_count": 85, + "metadata": {}, + "output_type": "execute_result" + }, + { + "data": { + "image/png": 
"\n", + "text/plain": [ + "
" + ] + }, + "metadata": { + "needs_background": "light" + }, + "output_type": "display_data" + } + ], + "source": [ + "cm = confusion_matrix(y_test, prob)\n", + "ax = sns.heatmap(cm, square=True, annot= True, cbar = False)\n", + "ax.set_xlabel('Predicted label', fontsize = 15)\n", + "ax.set_ylabel ('Actual label', fontsize = 15)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Since we don't have enough data for churn label, the model mispredict 61 of churn as not churn. Let try XGBOOST to select the best parameters to use" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### XGBOOST" + ] + }, + { + "cell_type": "code", + "execution_count": 86, + "metadata": {}, + "outputs": [], + "source": [ + "import xgboost as xgb" + ] + }, + { + "cell_type": "code", + "execution_count": 99, + "metadata": {}, + "outputs": [], + "source": [ + "dmatrix_train = xgb.DMatrix(data=X_train, label=y_train)\n", + "dmatrix_test = xgb.DMatrix(data=X_test, label=y_test)\n", + "\n", + "param = {'max_depth':6, \n", + " 'eta':0.3, \n", + " 'objective':'multi:softprob', \n", + " 'num_class':2}\n", + "\n", + "num_round = 6\n", + "model = xgb.train(param, dmatrix_train, num_round)\n", + "\n", + "preds = model.predict(dmatrix_test)\n", + "\n", + "best_preds = np.asarray([np.argmax(line) for line in preds])" + ] + }, + { + "cell_type": "code", + "execution_count": 103, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Precision: 0.8803212544949875\n", + "Recall: 0.8216172615446662\n", + "Accuracy: 0.93\n", + " precision recall f1-score support\n", + "\n", + " 0 0.95 0.97 0.96 855\n", + " 1 0.82 0.67 0.73 145\n", + "\n", + " accuracy 0.93 1000\n", + " macro avg 0.88 0.82 0.85 1000\n", + "weighted avg 0.93 0.93 0.93 1000\n", + "\n" + ] + } + ], + "source": [ + "# metrics\n", + "print(\"Precision: \", (precision_score(y_test, best_preds, average='macro')))\n", + "print(\"Recall: 
\",(recall_score(y_test, best_preds, average='macro')))\n", + "print(\"Accuracy: \", (accuracy_score(y_test, best_preds)))\n", + "print(classification_report(y_test, best_preds))" + ] + }, + { + "cell_type": "code", + "execution_count": 104, + "metadata": {}, + "outputs": [ + { + "data": { + "image/png": "\n", + "text/plain": [ + "
" + ] + }, + "metadata": { + "needs_background": "light" + }, + "output_type": "display_data" + } + ], + "source": [ + "cm = confusion_matrix(y_test, best_preds)\n", + "ax = sns.heatmap(cm, square=True, annot=True, cbar=False)\n", + "ax.set_xlabel('Predicted Labels',fontsize = 15)\n", + "ax.set_ylabel('True Labels',fontsize = 15)\n", + "plt.show()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Using XGboost, there have been some trade-off and our recall score and f1 score have increased. Let search for the best hyperparameters" + ] + }, + { + "cell_type": "code", + "execution_count": 106, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Tuned: {'eta': 0.05, 'gamma': 0.1, 'learning_rate': 0.01, 'max_depth': 5, 'min_child_weight': 1, 'n_estimators': 500}\n", + "Mean of the cv scores is 0.932706\n", + "Train Score 0.958423\n", + "Test Score 0.938000\n", + "Seconds used for refitting the best model on the train dataset: 3.063578\n" + ] + } + ], + "source": [ + "from xgboost.sklearn import XGBClassifier\n", + "from sklearn.model_selection import GridSearchCV, RandomizedSearchCV \n", + "\n", + "param_dict = {\n", + " 'eta': [0.05,0.10,0.20,0.25,0.30],\n", + " 'gamma': [0.0, 0.1, 0.2, 0.4],\n", + " 'max_depth':range(3,10,2),\n", + " 'min_child_weight':range(1,6,2),\n", + " 'learning_rate': [0.001,0.01,0.1,1],\n", + " 'n_estimators': [200,500,1000]\n", + " \n", + "}\n", + "\n", + "xgc = XGBClassifier()\n", + "\n", + "clf = GridSearchCV(xgc,param_dict,cv=2,n_jobs = -1).fit(X_train,y_train)\n", + "\n", + "print(\"Tuned: {}\".format(clf.best_params_)) \n", + "print(\"Mean of the cv scores is {:.6f}\".format(clf.best_score_))\n", + "print(\"Train Score {:.6f}\".format(clf.score(X_train,y_train)))\n", + "print(\"Test Score {:.6f}\".format(clf.score(X_test,y_test)))\n", + "print(\"Seconds used for refitting the best model on the train dataset: {:.6f}\".format(clf.refit_time_))" + ] + }, + { + 
"cell_type": "code", + "execution_count": 159, + "metadata": {}, + "outputs": [ + { + "ename": "ValueError", + "evalue": "feature_names mismatch: ['ContractRenewal', 'DataPlan', 'DataUsage', 'CustServCalls', 'DayMins', 'DayCalls', 'MonthlyCharge', 'OverageFee', 'RoamMins'] ['f0', 'f1', 'f2', 'f3', 'f4', 'f5', 'f6', 'f7', 'f8', 'f9']\nexpected RoamMins, MonthlyCharge, DataPlan, DayMins, OverageFee, DayCalls, CustServCalls, ContractRenewal, DataUsage in input data\ntraining data did not have the following fields: f9, f7, f2, f3, f8, f0, f5, f6, f1, f4", + "output_type": "error", + "traceback": [ + "\u001b[1;31m---------------------------------------------------------------------------\u001b[0m", + "\u001b[1;31mValueError\u001b[0m Traceback (most recent call last)", + "\u001b[1;32m\u001b[0m in \u001b[0;36m\u001b[1;34m\u001b[0m\n\u001b[1;32m----> 1\u001b[1;33m \u001b[0mxgb_pred\u001b[0m \u001b[1;33m=\u001b[0m \u001b[0mclf\u001b[0m\u001b[1;33m.\u001b[0m\u001b[0mpredict\u001b[0m\u001b[1;33m(\u001b[0m\u001b[0mX_test\u001b[0m\u001b[1;33m)\u001b[0m\u001b[1;33m\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[0m\u001b[0;32m 2\u001b[0m \u001b[0mprint\u001b[0m\u001b[1;33m(\u001b[0m\u001b[0mclassification_report\u001b[0m\u001b[1;33m(\u001b[0m\u001b[0my_test\u001b[0m\u001b[1;33m,\u001b[0m \u001b[0mxgb_pred\u001b[0m\u001b[1;33m)\u001b[0m\u001b[1;33m)\u001b[0m\u001b[1;33m\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[0;32m 3\u001b[0m \u001b[0mcm\u001b[0m \u001b[1;33m=\u001b[0m \u001b[0mconfusion_matrix\u001b[0m\u001b[1;33m(\u001b[0m\u001b[0my_test\u001b[0m\u001b[1;33m,\u001b[0m \u001b[0mxgb_pred\u001b[0m\u001b[1;33m)\u001b[0m\u001b[1;33m\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[0;32m 4\u001b[0m \u001b[0max\u001b[0m \u001b[1;33m=\u001b[0m \u001b[0msns\u001b[0m\u001b[1;33m.\u001b[0m\u001b[0mheatmap\u001b[0m\u001b[1;33m(\u001b[0m\u001b[0mcm\u001b[0m\u001b[1;33m,\u001b[0m \u001b[0msquare\u001b[0m\u001b[1;33m=\u001b[0m\u001b[1;32mTrue\u001b[0m\u001b[1;33m,\u001b[0m 
\u001b[0mannot\u001b[0m\u001b[1;33m=\u001b[0m\u001b[1;32mTrue\u001b[0m\u001b[1;33m,\u001b[0m \u001b[0mcbar\u001b[0m\u001b[1;33m=\u001b[0m\u001b[1;32mFalse\u001b[0m\u001b[1;33m)\u001b[0m\u001b[1;33m\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[0;32m 5\u001b[0m \u001b[0max\u001b[0m\u001b[1;33m.\u001b[0m\u001b[0mset_xlabel\u001b[0m\u001b[1;33m(\u001b[0m\u001b[1;34m'Predicted Labels'\u001b[0m\u001b[1;33m,\u001b[0m\u001b[0mfontsize\u001b[0m \u001b[1;33m=\u001b[0m \u001b[1;36m15\u001b[0m\u001b[1;33m)\u001b[0m\u001b[1;33m\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n", + "\u001b[1;32m~\\AppData\\Local\\conda\\conda\\envs\\tf\\lib\\site-packages\\sklearn\\utils\\metaestimators.py\u001b[0m in \u001b[0;36m\u001b[1;34m(*args, **kwargs)\u001b[0m\n\u001b[0;32m 117\u001b[0m \u001b[1;33m\u001b[0m\u001b[0m\n\u001b[0;32m 118\u001b[0m \u001b[1;31m# lambda, but not partial, allows help() to work with update_wrapper\u001b[0m\u001b[1;33m\u001b[0m\u001b[1;33m\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[1;32m--> 119\u001b[1;33m \u001b[0mout\u001b[0m \u001b[1;33m=\u001b[0m \u001b[1;32mlambda\u001b[0m \u001b[1;33m*\u001b[0m\u001b[0margs\u001b[0m\u001b[1;33m,\u001b[0m \u001b[1;33m**\u001b[0m\u001b[0mkwargs\u001b[0m\u001b[1;33m:\u001b[0m \u001b[0mself\u001b[0m\u001b[1;33m.\u001b[0m\u001b[0mfn\u001b[0m\u001b[1;33m(\u001b[0m\u001b[0mobj\u001b[0m\u001b[1;33m,\u001b[0m \u001b[1;33m*\u001b[0m\u001b[0margs\u001b[0m\u001b[1;33m,\u001b[0m \u001b[1;33m**\u001b[0m\u001b[0mkwargs\u001b[0m\u001b[1;33m)\u001b[0m\u001b[1;33m\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[0m\u001b[0;32m 120\u001b[0m \u001b[1;31m# update the docstring of the returned function\u001b[0m\u001b[1;33m\u001b[0m\u001b[1;33m\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[0;32m 121\u001b[0m \u001b[0mupdate_wrapper\u001b[0m\u001b[1;33m(\u001b[0m\u001b[0mout\u001b[0m\u001b[1;33m,\u001b[0m \u001b[0mself\u001b[0m\u001b[1;33m.\u001b[0m\u001b[0mfn\u001b[0m\u001b[1;33m)\u001b[0m\u001b[1;33m\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n", + 
"\u001b[1;32m~\\AppData\\Local\\conda\\conda\\envs\\tf\\lib\\site-packages\\sklearn\\model_selection\\_search.py\u001b[0m in \u001b[0;36mpredict\u001b[1;34m(self, X)\u001b[0m\n\u001b[0;32m 485\u001b[0m \"\"\"\n\u001b[0;32m 486\u001b[0m \u001b[0mself\u001b[0m\u001b[1;33m.\u001b[0m\u001b[0m_check_is_fitted\u001b[0m\u001b[1;33m(\u001b[0m\u001b[1;34m'predict'\u001b[0m\u001b[1;33m)\u001b[0m\u001b[1;33m\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[1;32m--> 487\u001b[1;33m \u001b[1;32mreturn\u001b[0m \u001b[0mself\u001b[0m\u001b[1;33m.\u001b[0m\u001b[0mbest_estimator_\u001b[0m\u001b[1;33m.\u001b[0m\u001b[0mpredict\u001b[0m\u001b[1;33m(\u001b[0m\u001b[0mX\u001b[0m\u001b[1;33m)\u001b[0m\u001b[1;33m\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[0m\u001b[0;32m 488\u001b[0m \u001b[1;33m\u001b[0m\u001b[0m\n\u001b[0;32m 489\u001b[0m \u001b[1;33m@\u001b[0m\u001b[0mif_delegate_has_method\u001b[0m\u001b[1;33m(\u001b[0m\u001b[0mdelegate\u001b[0m\u001b[1;33m=\u001b[0m\u001b[1;33m(\u001b[0m\u001b[1;34m'best_estimator_'\u001b[0m\u001b[1;33m,\u001b[0m \u001b[1;34m'estimator'\u001b[0m\u001b[1;33m)\u001b[0m\u001b[1;33m)\u001b[0m\u001b[1;33m\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n", + "\u001b[1;32m~\\AppData\\Local\\conda\\conda\\envs\\tf\\lib\\site-packages\\xgboost\\sklearn.py\u001b[0m in \u001b[0;36mpredict\u001b[1;34m(self, data, output_margin, ntree_limit, validate_features, base_margin)\u001b[0m\n\u001b[0;32m 896\u001b[0m \u001b[0moutput_margin\u001b[0m\u001b[1;33m=\u001b[0m\u001b[0moutput_margin\u001b[0m\u001b[1;33m,\u001b[0m\u001b[1;33m\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[0;32m 897\u001b[0m \u001b[0mntree_limit\u001b[0m\u001b[1;33m=\u001b[0m\u001b[0mntree_limit\u001b[0m\u001b[1;33m,\u001b[0m\u001b[1;33m\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[1;32m--> 898\u001b[1;33m validate_features=validate_features)\n\u001b[0m\u001b[0;32m 899\u001b[0m \u001b[1;32mif\u001b[0m 
\u001b[0moutput_margin\u001b[0m\u001b[1;33m:\u001b[0m\u001b[1;33m\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[0;32m 900\u001b[0m \u001b[1;31m# If output_margin is active, simply return the scores\u001b[0m\u001b[1;33m\u001b[0m\u001b[1;33m\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n", + "\u001b[1;32m~\\AppData\\Local\\conda\\conda\\envs\\tf\\lib\\site-packages\\xgboost\\core.py\u001b[0m in \u001b[0;36mpredict\u001b[1;34m(self, data, output_margin, ntree_limit, pred_leaf, pred_contribs, approx_contribs, pred_interactions, validate_features, training)\u001b[0m\n\u001b[0;32m 1362\u001b[0m \u001b[1;33m\u001b[0m\u001b[0m\n\u001b[0;32m 1363\u001b[0m \u001b[1;32mif\u001b[0m \u001b[0mvalidate_features\u001b[0m\u001b[1;33m:\u001b[0m\u001b[1;33m\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[1;32m-> 1364\u001b[1;33m \u001b[0mself\u001b[0m\u001b[1;33m.\u001b[0m\u001b[0m_validate_features\u001b[0m\u001b[1;33m(\u001b[0m\u001b[0mdata\u001b[0m\u001b[1;33m)\u001b[0m\u001b[1;33m\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[0m\u001b[0;32m 1365\u001b[0m \u001b[1;33m\u001b[0m\u001b[0m\n\u001b[0;32m 1366\u001b[0m \u001b[0mlength\u001b[0m \u001b[1;33m=\u001b[0m \u001b[0mc_bst_ulong\u001b[0m\u001b[1;33m(\u001b[0m\u001b[1;33m)\u001b[0m\u001b[1;33m\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n", + "\u001b[1;32m~\\AppData\\Local\\conda\\conda\\envs\\tf\\lib\\site-packages\\xgboost\\core.py\u001b[0m in \u001b[0;36m_validate_features\u001b[1;34m(self, data)\u001b[0m\n\u001b[0;32m 1934\u001b[0m \u001b[1;33m\u001b[0m\u001b[0m\n\u001b[0;32m 1935\u001b[0m raise ValueError(msg.format(self.feature_names,\n\u001b[1;32m-> 1936\u001b[1;33m data.feature_names))\n\u001b[0m\u001b[0;32m 1937\u001b[0m \u001b[1;33m\u001b[0m\u001b[0m\n\u001b[0;32m 1938\u001b[0m def get_split_value_histogram(self, feature, fmap='', bins=None,\n", + "\u001b[1;31mValueError\u001b[0m: feature_names mismatch: ['ContractRenewal', 'DataPlan', 'DataUsage', 'CustServCalls', 'DayMins', 'DayCalls', 'MonthlyCharge', 'OverageFee', 'RoamMins'] 
['f0', 'f1', 'f2', 'f3', 'f4', 'f5', 'f6', 'f7', 'f8', 'f9']\nexpected RoamMins, MonthlyCharge, DataPlan, DayMins, OverageFee, DayCalls, CustServCalls, ContractRenewal, DataUsage in input data\ntraining data did not have the following fields: f9, f7, f2, f3, f8, f0, f5, f6, f1, f4" + ] + } + ], + "source": [ + "xgb_pred = clf.predict(X_test)\n", + "print(classification_report(y_test, xgb_pred))\n", + "cm = confusion_matrix(y_test, xgb_pred)\n", + "ax = sns.heatmap(cm, square=True, annot=True, cbar=False)\n", + "ax.set_xlabel('Predicted Labels',fontsize = 15)\n", + "ax.set_ylabel('True Labels',fontsize = 15)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Hyperparameter tuning was worth the time: both recall and F1 score improved compared with choosing the hyperparameters manually." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Logistic Regression\n", + "Logistic regression requires us to standardize the dataset first." + ] + }, + { + "cell_type": "code", + "execution_count": 136, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "array([[0.41167182, 0.67648946, 0.32758048, ..., 1.99072703, 0.0715836 ,\n", + " 0.08500823],\n", + " [0.41167182, 0.14906505, 0.32758048, ..., 1.56451025, 0.10708191,\n", + " 1.24048169],\n", + " [0.41167182, 0.9025285 , 0.32758048, ..., 0.26213309, 1.57434567,\n", + " 0.70312091],\n", + " ...,\n", + " [0.41167182, 1.83505538, 0.32758048, ..., 0.01858065, 1.73094204,\n", + " 1.3837779 ],\n", + " [0.41167182, 2.08295458, 3.05268496, ..., 0.38390932, 0.81704825,\n", + " 1.87621082],\n", + " [0.41167182, 0.67974475, 0.32758048, ..., 2.66049626, 1.28129669,\n", + " 1.24048169]])" + ] + }, + "execution_count": 136, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "from scipy import stats\n", + "import numpy as np\n", + "z = np.abs(stats.zscore(data))\n", + "z" + ] + }, + { + "cell_type": "code", + "execution_count": 137, + "metadata": {}, +
"outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "414\n" + ] + }, + { + "data": { + "text/html": [ + "
<div>\n",
+ "<table border=\"1\" class=\"dataframe\">\n",
+ "  <thead>\n",
+ "    <tr><th></th><th>index</th><th>Churn</th><th>AccountWeeks</th><th>ContractRenewal</th><th>DataPlan</th><th>DataUsage</th><th>CustServCalls</th><th>DayMins</th><th>DayCalls</th><th>MonthlyCharge</th><th>OverageFee</th><th>RoamMins</th></tr>\n",
+ "  </thead>\n",
+ "  <tbody>\n",
+ "    <tr><th>0</th><td>0</td><td>0</td><td>128</td><td>1</td><td>1</td><td>2.70</td><td>1</td><td>265.1</td><td>110</td><td>89.0</td><td>9.87</td><td>10.0</td></tr>\n",
+ "    <tr><th>1</th><td>1</td><td>0</td><td>107</td><td>1</td><td>1</td><td>3.70</td><td>1</td><td>161.6</td><td>123</td><td>82.0</td><td>9.78</td><td>13.7</td></tr>\n",
+ "    <tr><th>2</th><td>2</td><td>0</td><td>137</td><td>1</td><td>0</td><td>0.00</td><td>0</td><td>243.4</td><td>114</td><td>52.0</td><td>6.06</td><td>12.2</td></tr>\n",
+ "    <tr><th>3</th><td>6</td><td>0</td><td>121</td><td>1</td><td>1</td><td>2.03</td><td>3</td><td>218.2</td><td>88</td><td>87.3</td><td>17.43</td><td>7.5</td></tr>\n",
+ "    <tr><th>4</th><td>8</td><td>0</td><td>117</td><td>1</td><td>0</td><td>0.19</td><td>1</td><td>184.5</td><td>97</td><td>63.9</td><td>17.58</td><td>8.7</td></tr>\n",
+ "    <tr><th>...</th><td>...</td><td>...</td><td>...</td><td>...</td><td>...</td><td>...</td><td>...</td><td>...</td><td>...</td><td>...</td><td>...</td><td>...</td></tr>\n",
+ "    <tr><th>2914</th><td>3327</td><td>0</td><td>79</td><td>1</td><td>0</td><td>0.00</td><td>2</td><td>134.7</td><td>98</td><td>40.0</td><td>9.49</td><td>11.8</td></tr>\n",
+ "    <tr><th>2915</th><td>3328</td><td>0</td><td>192</td><td>1</td><td>1</td><td>2.67</td><td>2</td><td>156.2</td><td>77</td><td>71.7</td><td>10.78</td><td>9.9</td></tr>\n",
+ "    <tr><th>2916</th><td>3329</td><td>0</td><td>68</td><td>1</td><td>0</td><td>0.34</td><td>3</td><td>231.1</td><td>57</td><td>56.4</td><td>7.67</td><td>9.6</td></tr>\n",
+ "    <tr><th>2917</th><td>3330</td><td>0</td><td>28</td><td>1</td><td>0</td><td>0.00</td><td>2</td><td>180.8</td><td>109</td><td>56.0</td><td>14.44</td><td>14.1</td></tr>\n",
+ "    <tr><th>2918</th><td>3332</td><td>0</td><td>74</td><td>1</td><td>1</td><td>3.70</td><td>0</td><td>234.4</td><td>113</td><td>100.0</td><td>13.30</td><td>13.7</td></tr>\n",
+ "  </tbody>\n",
+ "</table>\n",
+ "<p>2919 rows × 12 columns</p>\n",
+ "</div>
" + ], + "text/plain": [ + " index Churn AccountWeeks ContractRenewal DataPlan DataUsage \\\n", + "0 0 0 128 1 1 2.70 \n", + "1 1 0 107 1 1 3.70 \n", + "2 2 0 137 1 0 0.00 \n", + "3 6 0 121 1 1 2.03 \n", + "4 8 0 117 1 0 0.19 \n", + "... ... ... ... ... ... ... \n", + "2914 3327 0 79 1 0 0.00 \n", + "2915 3328 0 192 1 1 2.67 \n", + "2916 3329 0 68 1 0 0.34 \n", + "2917 3330 0 28 1 0 0.00 \n", + "2918 3332 0 74 1 1 3.70 \n", + "\n", + " CustServCalls DayMins DayCalls MonthlyCharge OverageFee RoamMins \n", + "0 1 265.1 110 89.0 9.87 10.0 \n", + "1 1 161.6 123 82.0 9.78 13.7 \n", + "2 0 243.4 114 52.0 6.06 12.2 \n", + "3 3 218.2 88 87.3 17.43 7.5 \n", + "4 1 184.5 97 63.9 17.58 8.7 \n", + "... ... ... ... ... ... ... \n", + "2914 2 134.7 98 40.0 9.49 11.8 \n", + "2915 2 156.2 77 71.7 10.78 9.9 \n", + "2916 3 231.1 57 56.4 7.67 9.6 \n", + "2917 2 180.8 109 56.0 14.44 14.1 \n", + "2918 0 234.4 113 100.0 13.30 13.7 \n", + "\n", + "[2919 rows x 12 columns]" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "text/plain": [ + "2919" + ] + }, + "execution_count": 137, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "outliers = list(set(np.where(z > 3)[0]))\n", + "\n", + "print(len(outliers))\n", + "\n", + "new_data = data.drop(outliers,axis = 0).reset_index(drop = False)\n", + "display(new_data)\n", + "\n", + "y_new = y[list(new_data[\"index\"])]\n", + "len(y_new)" + ] + }, + { + "cell_type": "code", + "execution_count": 143, + "metadata": {}, + "outputs": [], + "source": [ + "X_new = new_data.drop(['index', 'Churn'], axis = 1)\n", + "\n", + "from sklearn.preprocessing import StandardScaler\n", + "X_scaled = StandardScaler().fit_transform(X_new)" + ] + }, + { + "cell_type": "code", + "execution_count": 156, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Training accuracy: 0.8962310327949095\n", + "Test accuracy: 0.910958904109589\n" + ] + } + ], + "source": [ + 
"from sklearn.linear_model import LogisticRegressionCV\n", + "from sklearn.model_selection import train_test_split\n", + "X_train, X_test, y_train, y_test = train_test_split(X_scaled, y_new, test_size = 0.3, random_state = 19, stratify = y_new)\n", + "model = LogisticRegressionCV(cv = 3,solver = 'sag', max_iter = 1000, random_state = 9)\n", + "model.fit(X_train, y_train)\n", + "\n", + "print(\"Training accuracy: \", model.score(X_train, y_train))\n", + "print(\"Test accuracy: \", model.score(X_test, y_test))" + ] + }, + { + "cell_type": "code", + "execution_count": 157, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Precision: 0.8795475113122172\n", + "Recall: 0.6120192307692308\n", + "Accuracy: 0.910958904109589\n", + " precision recall f1-score support\n", + "\n", + " 0 0.91 0.99 0.95 780\n", + " 1 0.85 0.23 0.36 96\n", + "\n", + " accuracy 0.91 876\n", + " macro avg 0.88 0.61 0.66 876\n", + "weighted avg 0.91 0.91 0.89 876\n", + "\n" + ] + } + ], + "source": [ + "preds = model.predict(X_test)\n", + "print(\"Precision: \", (precision_score(y_test, preds, average='macro')))\n", + "print(\"Recall: \",(recall_score(y_test, preds, average='macro')))\n", + "print(\"Accuracy: \", (accuracy_score(y_test, preds)))\n", + "print(classification_report(y_test, preds))" + ] + }, + { + "cell_type": "code", + "execution_count": 158, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "Text(91.68, 0.5, 'Actual Value')" + ] + }, + "execution_count": 158, + "metadata": {}, + "output_type": "execute_result" + }, + { + "data": { + "image/png": "\n", + "text/plain": [ + "
" + ] + }, + "metadata": { + "needs_background": "light" + }, + "output_type": "display_data" + } + ], + "source": [ + "cmx = confusion_matrix(y_test, preds)\n", + "ax = sns.heatmap(cmx, square= True, annot= True, cbar= False)\n", + "ax.set_xlabel(\"Predicted Value\", fontsize = 15)\n", + "ax.set_ylabel(\"Actual Value\", fontsize = 15)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Logistic regression performed worse on recall and F1 score for the churn class (recall 0.23, F1 0.36), which suggests it is best used when the dataset is not imbalanced. " + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Evaluation\n", + "Choosing the best model is demanding, but data processing and analysis are even more demanding. After training Decision Tree, XGBoost and Logistic Regression models on the data, it was evident that XGBoost performed best, because it builds several trees in sequence, each one trained on the errors of the previous tree. Tuning the hyperparameters with grid search from the scikit-learn library gave the best result, so the tuned XGBoost model was selected. The dataset is imbalanced, so most models tend to overfit to the majority class, which in this case is 'Customer not churn (0)'. The best-performing model predicted 98 true positives and 47 false negatives, which leads to a relatively low recall. \n", + "* Others:\n", + "* False positives: 15\n", + "* True negatives: about 840 (8.4e02). \n", + "* The model can generally be improved if more positive cases are provided in the dataset, i.e. more churn examples would improve the model." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Preprocessing\n", + "\n", + "- Are there any duplicated values?\n", + "- Do we need to do feature scaling?\n", + "- Do we need to generate new features?\n", + "- Split Train and Test dataset. 
(0.7/0.3)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# ML Application\n", + "\n", + "- Define models.\n", + "- Fit models.\n", + "- Evaluate models for both train and test dataset.\n", + "- Generate Confusion Matrix and scores of Accuracy, Recall, Precision and F1-Score.\n", + "- Analyse occurrence of overfitting and underfitting. If there is any of them, try to overcome it within a different section." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Evaluation\n", + "\n", + "- Select the best performing model and write your comments about why choose this model.\n", + "- Analyse results and make comment about how you can improve model." + ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.7.9" + } + }, + "nbformat": 4, + "nbformat_minor": 4 +} diff --git a/Project/churn.csv b/My Project/churn.csv similarity index 100% rename from Project/churn.csv rename to My Project/churn.csv diff --git a/Project/09-11-2020 ML Course Nigeria Project 'name'.ipynb b/Project/09-11-2020 ML Course Nigeria Project 'name'.ipynb deleted file mode 100644 index 1856d7e..0000000 --- a/Project/09-11-2020 ML Course Nigeria Project 'name'.ipynb +++ /dev/null @@ -1,331 +0,0 @@ -{ - "cells": [ - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "# Project\n", - "\n", - "In this project, our aim is to building a model for predicting churn. Churn is the percentage of customers that stopped using your company's product or service during a certain time frame. 
Thus, in the given dataset, our label will be `Churn` column.\n", - "\n", - "## Steps\n", - "- Read the `churn.csv` file and describe it.\n", - "- Make at least 4 different analysis on Exploratory Data Analysis section.\n", - "- Pre-process the dataset to get ready for ML application. (Check missing data and handle them, can we need to do scaling or feature extraction etc.)\n", - "- Define appropriate evaluation metric for our case (classification).\n", - "- Train and evaluate Logistic Regression, Decision Trees and one other appropriate algorithm which you can choose from scikit-learn library.\n", - "- Is there any overfitting and underfitting? Interpret your results and try to overcome if there is any problem in a new section.\n", - "- Create confusion metrics for each algorithm and display Accuracy, Recall, Precision and F1-Score values.\n", - "- Analyse and compare results of 3 algorithms.\n", - "- Select best performing model based on evaluation metric you chose on test dataset.\n", - "\n", - "\n", - "Good luck :)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "

Your Name

" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "# Data" - ] - }, - { - "cell_type": "code", - "execution_count": 2, - "metadata": {}, - "outputs": [], - "source": [ - "import pandas as pd\n", - "import seaborn as sns\n", - "import numpy as np\n", - "import matplotlib.pyplot as plt" - ] - }, - { - "cell_type": "code", - "execution_count": 5, - "metadata": {}, - "outputs": [ - { - "data": { - "text/html": [ - "
<div>\n",
- "<table border=\"1\" class=\"dataframe\">\n",
- "  <thead>\n",
- "    <tr><th></th><th>Churn</th><th>AccountWeeks</th><th>ContractRenewal</th><th>DataPlan</th><th>DataUsage</th><th>CustServCalls</th><th>DayMins</th><th>DayCalls</th><th>MonthlyCharge</th><th>OverageFee</th><th>RoamMins</th></tr>\n",
- "  </thead>\n",
- "  <tbody>\n",
- "    <tr><th>0</th><td>0</td><td>128</td><td>1</td><td>1</td><td>2.7</td><td>1</td><td>265.1</td><td>110</td><td>89.0</td><td>9.87</td><td>10.0</td></tr>\n",
- "    <tr><th>1</th><td>0</td><td>107</td><td>1</td><td>1</td><td>3.7</td><td>1</td><td>161.6</td><td>123</td><td>82.0</td><td>9.78</td><td>13.7</td></tr>\n",
- "    <tr><th>2</th><td>0</td><td>137</td><td>1</td><td>0</td><td>0.0</td><td>0</td><td>243.4</td><td>114</td><td>52.0</td><td>6.06</td><td>12.2</td></tr>\n",
- "    <tr><th>3</th><td>0</td><td>84</td><td>0</td><td>0</td><td>0.0</td><td>2</td><td>299.4</td><td>71</td><td>57.0</td><td>3.10</td><td>6.6</td></tr>\n",
- "    <tr><th>4</th><td>0</td><td>75</td><td>0</td><td>0</td><td>0.0</td><td>3</td><td>166.7</td><td>113</td><td>41.0</td><td>7.42</td><td>10.1</td></tr>\n",
- "  </tbody>\n",
- "</table>\n",
- "</div>
" - ], - "text/plain": [ - " Churn AccountWeeks ContractRenewal DataPlan DataUsage CustServCalls \\\n", - "0 0 128 1 1 2.7 1 \n", - "1 0 107 1 1 3.7 1 \n", - "2 0 137 1 0 0.0 0 \n", - "3 0 84 0 0 0.0 2 \n", - "4 0 75 0 0 0.0 3 \n", - "\n", - " DayMins DayCalls MonthlyCharge OverageFee RoamMins \n", - "0 265.1 110 89.0 9.87 10.0 \n", - "1 161.6 123 82.0 9.78 13.7 \n", - "2 243.4 114 52.0 6.06 12.2 \n", - "3 299.4 71 57.0 3.10 6.6 \n", - "4 166.7 113 41.0 7.42 10.1 " - ] - }, - "execution_count": 5, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "# Read csv\n", - "data = pd.read_csv(\"churn.csv\")\n", - "data.head()" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "# Describe our data for each feature and use .info() for get information about our dataset\n", - "# Analys missing values" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "# Exploratory Data Analysis" - ] - }, - { - "cell_type": "code", - "execution_count": 7, - "metadata": {}, - "outputs": [ - { - "data": { - "text/plain": [ - "" - ] - }, - "execution_count": 7, - "metadata": {}, - "output_type": "execute_result" - }, - { - "data": { - "image/png": 
"[base64-encoded PNG omitted: countplot of the Churn label distribution]\n", - "text/plain": [ - "
" - ] - }, - "metadata": { - "needs_background": "light" - }, - "output_type": "display_data" - } - ], - "source": [ - "# Our label Distribution (countplot)\n" - ] - }, - { - "cell_type": "code", - "execution_count": 3, - "metadata": {}, - "outputs": [ - { - "data": { - "text/plain": [ - "" - ] - }, - "execution_count": 3, - "metadata": {}, - "output_type": "execute_result" - }, - { - "data": { - "image/png": "\n", - "text/plain": [ - "
" - ] - }, - "metadata": { - "needs_background": "light" - }, - "output_type": "display_data" - } - ], - "source": [ - "# Example EDA\n" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "# Preprocessing\n", - "\n", - "- Are there any duplicated values?\n", - "- Do we need to do feature scaling?\n", - "- Do we need to generate new features?\n", - "- Split Train and Test dataset. (0.7/0.3)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "# ML Application\n", - "\n", - "- Define models.\n", - "- Fit models.\n", - "- Evaluate models for both train and test dataset.\n", - "- Generate Confusion Matrix and scores of Accuracy, Recall, Precision and F1-Score.\n", - "- Analyse occurrence of overfitting and underfitting. If there is any of them, try to overcome it within a different section." - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "# Evaluation\n", - "\n", - "- Select the best performing model and write your comments about why choose this model.\n", - "- Analyse results and make comment about how you can improve model." - ] - } - ], - "metadata": { - "kernelspec": { - "display_name": "Python 3", - "language": "python", - "name": "python3" - }, - "language_info": { - "codemirror_mode": { - "name": "ipython", - "version": 3 - }, - "file_extension": ".py", - "mimetype": "text/x-python", - "name": "python", - "nbconvert_exporter": "python", - "pygments_lexer": "ipython3", - "version": "3.7.3" - } - }, - "nbformat": 4, - "nbformat_minor": 4 -}
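The template's ML Application step asks for a confusion matrix together with Accuracy, Recall, Precision and F1-Score. As a minimal sketch of how those four scores follow from the matrix, the snippet below recomputes them from the counts reported in the submitted notebook's Evaluation cell (98 true positives, 47 false negatives, 15 false positives). The true-negative count appears there only as 8.4e02, so the value 840 is an assumption, and the helper name `scores_from_counts` is hypothetical rather than part of scikit-learn.

```python
def scores_from_counts(tp, fn, fp, tn):
    """Return accuracy, precision, recall and F1 for the positive class."""
    total = tp + fn + fp + tn
    accuracy = (tp + tn) / total
    # Guard against empty denominators when a class is never predicted or absent.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Counts from the Evaluation cell; tn=840 is assumed from the rounded "8.4e02".
scores = scores_from_counts(tp=98, fn=47, fp=15, tn=840)
for name, value in scores.items():
    print(f"{name}: {value:.3f}")
```

Under that assumption, recall comes out near 0.68 while accuracy is near 0.94, which shows why accuracy alone is a weak evaluation metric when the label is imbalanced.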