Supervised Problem:
- The problem in which we are given a dataset which has target/dependent/labelled variable.
- Here we know for what we are building the model
Classification Problem:
- The problem where dataset contains categorical target variable.
KNN
- K Nearest Neighbours
- Uses similarity measure ( how much two objects are alike)
- Stores available cases and for finding label for new case, check in it's v neighbor, and says that I am one of them.
- Applications
- Recommended System (E Commerce Websites)
- Concept Search (Internet generates plethora of documents each day, to segeregate them we may use this algorithm)
My Implementation
- Algorithm
- Find the distance between new_data_instance and all existing instances
- Find nearest K points.
- Among them choose which target class occurred in majority.
Catch
- Choose approaprite value for K
Steps
- Define Distance Metric Function (Euclidian & Manhattan)
- Define NearestNeigbours Function
- Define Predict Function
-
X_train --> training data with features and target
-
X_test --> test data without target
-
K: K neighbors
Summary
- Total Attributes - 9
- Number of instances - 768
- Score (Accuracy) - 74.4%
- View Notebook