Clustering Algorithms Project

Clustering is a core technique in unsupervised machine learning. This project aims to implement two clustering algorithms, k-means and k-means++, from scratch in PySpark.
- Dataset dimensions: 4600x58
- Distance metrics: Euclidean and Manhattan
- Cost function: Sum of squared distances
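
The two distance metrics, the cost function, and k-means++ seeding can be sketched in plain Python as below. This is a minimal illustration, not the project's actual code: the function names (`euclidean`, `manhattan`, `nearest`, `cost`, `kmeans_pp_init`) are hypothetical, and the cost is assumed to be the sum over all points of the squared distance to the nearest centroid, using whichever metric is selected. In PySpark, helpers like these would typically be called from inside an RDD `map` over the data points.

```python
import math
import random


def euclidean(p, q):
    """Euclidean (L2) distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def manhattan(p, q):
    """Manhattan (L1) distance between two equal-length vectors."""
    return sum(abs(a - b) for a, b in zip(p, q))


def nearest(point, centroids, dist=euclidean):
    """Return (index, distance) of the centroid closest to `point`."""
    distances = [dist(point, c) for c in centroids]
    idx = min(range(len(centroids)), key=distances.__getitem__)
    return idx, distances[idx]


def cost(points, centroids, dist=euclidean):
    """Sum of squared distances from each point to its nearest centroid.

    Assumption: the same metric is used for assignment and for the cost.
    """
    return sum(nearest(p, centroids, dist)[1] ** 2 for p in points)


def kmeans_pp_init(points, k, dist=euclidean, seed=0):
    """k-means++ seeding.

    The first centroid is drawn uniformly at random; each later centroid
    is drawn with probability proportional to the squared distance from
    the point to its nearest already-chosen centroid.
    Assumption: `points` contains at least `k` distinct points, otherwise
    all weights can become zero and sampling fails.
    """
    rng = random.Random(seed)
    centroids = [rng.choice(points)]
    while len(centroids) < k:
        weights = [nearest(p, centroids, dist)[1] ** 2 for p in points]
        centroids.append(rng.choices(points, weights=weights, k=1)[0])
    return centroids
```

Plain k-means differs from k-means++ only in the seeding step: it picks all `k` initial centroids uniformly at random, whereas the weighted sampling above favours points far from the centroids already chosen.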