This is a paper on k-means clustering. It examines the results reported in the k-means++ paper and discusses several seeding methods for center initialization.
This is a Python-based project, so install Python 3 first. You will need the jupyter and scikit-learn packages to run this experiment. Install them using these commands:
pip install -U jupyter
pip install -U scikit-learn
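To confirm that the installation worked, a small check like the one below may help. It is only a sketch: the import names are assumptions (scikit-learn is imported as sklearn, and the jupyter package is assumed to pull in jupyter_core), and missing_packages is a hypothetical helper, not part of this project.

```python
from importlib.util import find_spec

# Assumed mapping from pip package name to importable module name.
REQUIRED = {"scikit-learn": "sklearn", "jupyter": "jupyter_core"}


def missing_packages(required=REQUIRED):
    """Return the pip names of packages whose module cannot be located."""
    return [pip_name for pip_name, module in required.items()
            if find_spec(module) is None]


if __name__ == "__main__":
    missing = missing_packages()
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All required packages were found.")
```

If anything is reported as missing, rerun the matching pip command above.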
Change into the root project directory, then run jupyter notebook to open Jupyter in your browser. There are two notebooks you can run to verify the results: kmeans++_check and compare. They create output CSV files containing the results in the out subdirectory.
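
For readers unfamiliar with the seeding methods mentioned at the top, the sketch below illustrates the difference between uniform random seeding and k-means++ (D² weighted) seeding. It is not the code used in the notebooks. It is a minimal pure-Python illustration, and the function names (kmeanspp_seed, uniform_seed, potential) are hypothetical.

```python
import random


def squared_distance(p, q):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def uniform_seed(points, k, rng=None):
    """Baseline seeding: choose k distinct points uniformly at random."""
    if rng is None:
        rng = random.Random()
    return rng.sample(points, k)


def kmeanspp_seed(points, k, rng=None):
    """k-means++ seeding: the first center is uniform, and each later center
    is drawn with probability proportional to its squared distance to the
    nearest center chosen so far."""
    if rng is None:
        rng = random.Random()
    centers = [rng.choice(points)]
    # d2[i] holds the squared distance from points[i] to its nearest center.
    d2 = [squared_distance(p, centers[0]) for p in points]
    while len(centers) < k:
        if sum(d2) == 0:
            # Every point coincides with a center; fall back to a uniform pick.
            new_center = rng.choice(points)
        else:
            new_center = rng.choices(points, weights=d2, k=1)[0]
        centers.append(new_center)
        d2 = [min(d, squared_distance(p, new_center))
              for d, p in zip(d2, points)]
    return centers


def potential(points, centers):
    """Sum of squared distances from each point to its nearest center."""
    return sum(min(squared_distance(p, c) for c in centers) for p in points)


if __name__ == "__main__":
    rng = random.Random(0)
    # Synthetic data for illustration only: two well-separated groups.
    data = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(50)]
    data += [(rng.gauss(20, 1), rng.gauss(20, 1)) for _ in range(50)]
    print("uniform  :", potential(data, uniform_seed(data, 2, rng)))
    print("k-means++:", potential(data, kmeanspp_seed(data, 2, rng)))
```

In scikit-learn, the same choice is exposed through the init parameter of KMeans, which accepts "k-means++" and "random".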