A cluster is a collection of data points grouped together because of certain similarities. K-means clustering is an unsupervised learning algorithm used to solve clustering problems in machine learning and data science.
The K-means algorithm identifies k centroids and then allocates every data point to the nearest one, while keeping the distance between each point and its centroid as small as possible. The ‘means’ in K-means refers to averaging the data; that is, finding the centroid of each cluster.
What is the K-Means Algorithm?
The k-means clustering algorithm mainly performs two tasks:
- Determines the best positions for the K center points, or centroids, through an iterative process.
- Assigns each data point to its closest centroid. The data points nearest to a particular centroid form a cluster.
Hence each cluster contains data points with some commonalities and is separated from the other clusters. The diagram below illustrates how the K-means clustering algorithm works:
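In code, the two tasks can be sketched as a loop that alternates between assigning points and moving centroids. This is a minimal illustration using only the Python standard library, not a production implementation. The function name `kmeans`, its parameters, the choice of starting from k randomly sampled data points, and the toy data are all assumptions made for this example.

```python
import math
import random


def kmeans(points, k, max_iter=100, seed=0):
    """Cluster points (tuples of floats) into k groups; returns (centroids, labels)."""
    rng = random.Random(seed)
    # Assumption: initialise the centroids with k randomly chosen data points.
    centroids = rng.sample(points, k)
    labels = [0] * len(points)
    for _ in range(max_iter):
        # Assignment step: each point is labelled with its nearest centroid.
        labels = [
            min(range(k), key=lambda j: math.dist(p, centroids[j]))
            for p in points
        ]
        # Update step: each centroid moves to the mean of the points assigned to it.
        new_centroids = []
        for j in range(k):
            members = [p for p, label in zip(points, labels) if label == j]
            if members:
                mean = tuple(sum(axis) / len(members) for axis in zip(*members))
                new_centroids.append(mean)
            else:
                # An empty cluster keeps its previous centroid.
                new_centroids.append(centroids[j])
        # Stop once the centroids no longer move.
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return centroids, labels


# Hypothetical toy data: two well-separated groups of three points.
points = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (8.0, 8.0), (8.2, 7.9), (7.8, 8.1)]
centroids, labels = kmeans(points, k=2)
print(centroids)
print(labels)
```

On data this cleanly separated, the first three points end up in one cluster and the last three in the other. Which cluster receives index 0 depends on the random initialisation.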
How to choose K?
The main drawback of this technique is the ambiguity about K, the number of centroids to initialize. To address this, the performance of the algorithm is measured for different numbers of centroids.
To carry out the evaluation, once the algorithm has converged, the distance between each data point and its cluster centroid is calculated. All of these distances are then summed to give a measure of performance.
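Written as a formula, this measure might look like the expression below. It assumes the squared Euclidean distance, a common choice that the description above does not specify. The symbols are introduced here for illustration: C_j is the set of points assigned to cluster j, \mu_j is that cluster's centroid, and J(K) is the summed measure for K clusters.

```latex
J(K) = \sum_{j=1}^{K} \sum_{x_i \in C_j} \lVert x_i - \mu_j \rVert^2
```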
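As the number of cluster centroids increases, the value of this objective function decreases. The lowest value alone therefore cannot be used to pick K; a common approach is to choose the K beyond which the decrease levels off.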
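The evaluation can be tried out with a short script that fits the algorithm for several values of K and prints the summed measure for each. This is a rough sketch under stated assumptions: it uses the squared Euclidean distance, includes its own compact k-means helper so that it runs on its own, and the name `kmeans_cost`, the toy data, and the number of restarts are all hypothetical. Because a single random start can settle in a poor solution, the sketch keeps the best of several restarts for each K.

```python
import math
import random


def kmeans_cost(points, k, seed, max_iter=100):
    """Run one k-means fit and return the summed squared point-to-centroid distance."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # assumption: start from k random data points
    for _ in range(max_iter):
        # Group the points by their nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda j: math.dist(p, centroids[j]))
            clusters[nearest].append(p)
        # Move each centroid to the mean of its group (empty groups stay put).
        updated = [
            tuple(sum(axis) / len(group) for axis in zip(*group)) if group else centroids[j]
            for j, group in enumerate(clusters)
        ]
        if updated == centroids:
            break
        centroids = updated
    # Sum the squared distance from every point to its closest centroid.
    return sum(min(math.dist(p, c) ** 2 for c in centroids) for p in points)


# Hypothetical toy data: three tight groups of three points each.
points = [
    (0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
    (10.0, 0.0), (10.0, 1.0), (11.0, 0.0),
    (5.0, 9.0), (5.0, 10.0), (6.0, 9.0),
]

# For each candidate K, keep the lowest cost found over a few random restarts.
costs = {
    k: min(kmeans_cost(points, k, seed) for seed in range(10))
    for k in range(1, 6)
}
for k, cost in costs.items():
    print(f"K={k}: cost={cost:.2f}")
```

When reading the printed values, the point of interest is where adding another centroid stops producing a large drop. In practice a library implementation would normally be used instead; scikit-learn's `KMeans`, for example, exposes a comparable quantity as its `inertia_` attribute.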