Divisive Hierarchical Clustering with K-means

Written by shiksha.dahiya on 06/23/2021 - 21:19

Clustering is an important analysis tool in many fields, such as pattern recognition, image classification, the biological sciences, marketing, city planning, and document retrieval. Divisive hierarchical clustering is one of the most widely used clustering methods, and combining it with k-means makes it one of the more efficient ones. In this method, a cluster is repeatedly split into k smaller clusters using k-means until every element has its own cluster. The distinctive part is how the initial centers for k-means are chosen: the m-dimensional data is converted into one-dimensional data, the one-dimensional data is divided into k parts, each part is sorted, and the id of the middle element of each part is recorded. The m-dimensional elements whose ids match those recorded ids are taken as the k initial centroids (with k = 4, for example, this yields four fixed initial centroids).
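A minimal sketch of this fixed initialization, assuming the one-dimensional projection is the mean of each point's features (the article does not specify which projection to use):

import numpy as np

def initial_centroids(X, k):
    # Project the m-dimensional points to one dimension.
    # The row mean is an assumed projection; the article only says the
    # m-dimensional data is "converted into one-dimensional data".
    proj = X.mean(axis=1)
    order = np.argsort(proj)                   # sort the 1-D values
    parts = np.array_split(order, k)           # divide the sorted ids into k parts
    mid_ids = [p[len(p) // 2] for p in parts]  # middle element id of each part
    return X[mid_ids]                          # m-dimensional rows with those ids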

K-means is one of the simplest unsupervised learning algorithms for the well-known clustering problem. The procedure partitions a given data set into a certain number of clusters (assume k clusters) fixed a priori. The main idea is to define k centers, one for each cluster. These centers should be placed carefully, because different locations lead to different results, so the better choice is to place them as far away from each other as possible.

The next step is to take each point of the data set and associate it with the nearest center. When no point is pending, the first step is complete and an early grouping is done. At this point one recalculates k new centroids as the barycenters of the clusters resulting from the previous step. After these k new centroids are obtained, a new binding is built between the same data set points and the nearest new center, and a loop is generated. As a result of this loop, the k centers change their location step by step until no more changes occur; in other words, the centers no longer move.
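A minimal sketch of this loop, reusing the initial_centroids helper above; the vectorized distance computation and the convergence test are implementation choices, not part of the original description:

import numpy as np

def kmeans(X, k, max_iter=100):
    centers = initial_centroids(X, k)  # fixed initialization from the sketch above
    for _ in range(max_iter):
        # bind each point to its nearest center (Euclidean distance)
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # recompute each center as the barycenter (mean) of its cluster;
        # this sketch assumes no cluster ever becomes empty
        new_centers = np.array([X[labels == j].mean(axis=0) for j in range(k)])
        if np.allclose(new_centers, centers):  # centers stopped moving
            break
        centers = new_centers
    return labels, centers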

The goal is to implement the divisive hierarchical clustering algorithm with k-means and then apply agglomerative hierarchical clustering to the resulting data, giving an efficient and accurate result in data mining. The initial k centroids are found in a fixed manner instead of being chosen randomly: the one-dimensional data of a particular cluster is divided into k parts, each part is sorted separately, and the id of the middle element in each part is mapped to an id in the m-dimensional data. The m-dimensional elements whose ids match are taken as the initial k centroids of that cluster. Agglomerative hierarchical clustering is then applied to the result, in which each element has its own individual cluster; the clusters are merged based on centroid distance until k clusters are finally obtained.

Divisive hierarchical clustering is one of the most important tasks in data mining, and this method works by grouping objects into a tree of clusters. The top-down strategy starts with all objects in one cluster and subdivides it into smaller and smaller pieces with the k-means algorithm, choosing the initial k centroids in the fixed manner above, until each object forms a cluster on its own. Applying agglomerative hierarchical clustering to that result then yields the k clusters efficiently and with high accuracy.
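The final merge can be sketched with SciPy's hierarchical clustering: the centroid linkage method starts from singleton clusters and merges them by centroid distance, matching the description above. The use of scipy.cluster.hierarchy here is an assumption for illustration, not part of the original method:

import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

def merge_to_k_clusters(X, k):
    # Each point starts as its own cluster; 'centroid' linkage merges the
    # pair of clusters whose centroids (barycenters) are closest.
    Z = linkage(X, method="centroid")
    # Cut the merge tree so that exactly k clusters remain.
    return fcluster(Z, t=k, criterion="maxclust")

# example: 20 random 3-dimensional points merged back into 4 clusters
labels = merge_to_k_clusters(np.random.rand(20, 3), 4)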

Algorithmic steps for k-means clustering

Let X = {x1, x2, x3, ..., xn} be the given set of data points and let V = {v1, v2, ..., vc} be the set of cluster centers.

  1. Find the initial k centroids by the process described above.
  2. Calculate the Euclidean distance between each data point and each cluster center.
  3. Assign each data point to the cluster center whose distance from it is the least among all cluster centers.
  4. Recalculate each new cluster center as the mean of the data points assigned to it: vi = (1/ci) Σ xj, where ci is the number of data points in cluster i and the sum runs over those points.
  5. Recalculate the distance between each data point and the newly obtained cluster centers.
  6. If no data point was reassigned, stop; otherwise repeat from step 3. (A small worked example follows.)
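As a small worked example (with values chosen purely for illustration): take the one-dimensional points {1, 2, 9, 10} with k = 2 and initial centers 1 and 9. Step 3 assigns {1, 2} to the first center and {9, 10} to the second; step 4 recalculates the centers as (1 + 2)/2 = 1.5 and (9 + 10)/2 = 9.5; on the next pass no point changes cluster, so the algorithm stops.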

Algorithm

  1. Initially, all items belong to one cluster, Ci with i = 0.
  2. Split Ci into two sub-clusters, Ci+1 and Ci+2.
  3. Apply k-means on Ci+1 and Ci+2.
  4. Increment the value of i.
  5. Repeat steps 2, 3 and 4 until the desired cluster structure is obtained. (A sketch of this loop in code follows the list.)
  6. For example: node 0 contains the whole data set.
  7. C1 = 2 clusters: nodes 1-2 (node 0 split into two sub-groups, 1 and 2).
  8. C2 = 3 clusters: nodes 2-4 (node 1 split into two sub-groups, 3 and 4).
  9. C3 = 4 clusters: nodes 3-6 (node 2 split into two sub-groups, 5 and 6).
  10. Continue until Ckmax is reached, where Ckmax is the maximum number of clusters.
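A minimal sketch of steps 1-5 in code, reusing the kmeans function sketched earlier. Splitting the largest current cluster at each pass is an assumed policy, since the article does not say which cluster to split next:

import numpy as np

def divisive_clustering(X, k_max):
    # node 0: the whole data set as a single cluster of row indices
    clusters = [np.arange(len(X))]
    while len(clusters) < k_max:           # stop once Ckmax clusters exist
        # choose the largest current cluster to split (assumed policy)
        i = max(range(len(clusters)), key=lambda j: len(clusters[j]))
        idx = clusters.pop(i)
        labels, _ = kmeans(X[idx], 2)      # split it into two sub-clusters
        clusters.append(idx[labels == 0])
        clusters.append(idx[labels == 1])
    return clusters                        # list of index arrays, one per cluster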

Difference between K-Means and Hierarchical clustering

  • Hierarchical clustering can’t handle big data well, but k-means clustering can. This is because the time complexity of k-means is linear, i.e. O(n), while that of hierarchical clustering is quadratic, i.e. O(n²).
  • In k-means clustering, since we start with a random choice of centers, the results produced by running the algorithm multiple times may differ. In hierarchical clustering, the results are reproducible.
  • K-means is found to work well when the shape of the clusters is hyperspherical (like a circle in 2-D or a sphere in 3-D).
  • K-means clustering requires prior knowledge of k, i.e. the number of clusters you want to divide your data into. In hierarchical clustering, by contrast, you can stop at whatever number of clusters you find appropriate by interpreting the dendrogram.