Handling of outliers differs from case to case. In some cases, it will provide very useful information, and in some cases, it will severely affect the results of the analysis. Having said that, let’s learn about some of the issues that arise due to outliers in the K-means algorithm below.
The centroids will not be a true representation of a cluster in the presence of outliers. The sum of squared errors (SSE) will also be very high in the case of outliers. Small clusters will bond with outliers, which may not be the true representation of the natural patterns of clusters in data. Due to these reasons, outliers need to be removed before proceeding with clustering on the data.