Clustering Algorithms Project

Clustering is one of the core techniques in unsupervised machine learning. This project implements two clustering algorithms, k-means and k-means++, from scratch in PySpark.

  • Dataset dimensions: 4600 x 58
  • Distance metrics: Euclidean and Manhattan
  • Cost function: Sum of squared distances
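
To make the pieces listed above concrete, here is a minimal sketch in plain Python of the two distance metrics, the sum-of-squared-distances cost, and k-means++ seeding. It is not the project's PySpark code: the function names (euclidean, manhattan, nearest, cost, kmeans_pp_init) are hypothetical, the points are plain tuples rather than RDD rows, and squaring the Manhattan distance in the cost is an assumption about how the cost is applied to that metric. In a PySpark version, the per-point work in nearest and cost would presumably be done inside a map followed by a reduce.

```python
import random


def euclidean(p, q):
    """Euclidean (L2) distance between two equal-length points."""
    return sum((a - b) ** 2 for a, b in zip(p, q)) ** 0.5


def manhattan(p, q):
    """Manhattan (L1) distance between two equal-length points."""
    return sum(abs(a - b) for a, b in zip(p, q))


def nearest(point, centroids, dist):
    """Return (index, distance) of the centroid closest to `point`."""
    return min(
        ((i, dist(point, c)) for i, c in enumerate(centroids)),
        key=lambda pair: pair[1],
    )


def cost(points, centroids, dist=euclidean):
    """Sum over all points of the squared distance to the nearest centroid."""
    return sum(nearest(p, centroids, dist)[1] ** 2 for p in points)


def kmeans_pp_init(points, k, dist=euclidean, seed=0):
    """k-means++ seeding: pick the first centroid uniformly at random, then
    pick each further centroid with probability proportional to its squared
    distance from the nearest centroid chosen so far.

    Assumes k is at most the number of distinct points; otherwise all
    weights become zero and random.choices raises ValueError.
    """
    rng = random.Random(seed)
    centroids = [rng.choice(points)]
    while len(centroids) < k:
        weights = [nearest(p, centroids, dist)[1] ** 2 for p in points]
        centroids.append(rng.choices(points, weights=weights, k=1)[0])
    return centroids


if __name__ == "__main__":
    data = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0)]
    seeds = kmeans_pp_init(data, k=2)
    print("initial centroids:", seeds)
    print("cost (Euclidean):", cost(data, seeds, euclidean))
    print("cost (Manhattan):", cost(data, seeds, manhattan))
```

Plain k-means differs from k-means++ only in the seeding step: it would draw all k initial centroids uniformly at random instead of weighting by squared distance.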

Git Repository