Hierarchical Clustering - An Unsupervised Learning Algorithm
Introduction
Unsupervised learning is a type of machine learning in which we use unlabeled data and try to find patterns in it.
Clustering algorithms fall under the category of unsupervised learning. In these algorithms, we try to group the data into different clusters.
Hierarchical Clustering algorithms build a hierarchy of clusters in which each node is a cluster consisting of the clusters of its child nodes.
To check its implementation in Python CLICK HERE
There are two main strategies in Hierarchical Clustering:
Divisive - A top-down approach: we start with all observations in one large cluster and break it down into smaller ones.
Agglomerative - The opposite of Divisive, a bottom-up approach: each observation starts in its own cluster, and pairs of clusters are merged as they move up the hierarchy.
(Among Data Scientists, Agglomerative is generally used more than Divisive.)
Example of Hierarchical Clustering - An international team of scientists led by UCLA biologists used a dendrogram to report genetic data from more than 900 dogs of 85 breeds and more than 200 wild grey wolves worldwide. They used this diagram to see the similarity in the genetic data of these animals.
Algorithm
The algorithm for Agglomerative Hierarchical Clustering is as follows:
- Create n clusters, one for each data point.
- Compute the proximity matrix.
- Repeat:
- Merge the two closest clusters.
- Update the proximity matrix.
- Until only a single cluster remains.
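The steps above can be sketched in plain Python. This is a minimal single-linkage version for illustration (the function names and the Euclidean distance choice are assumptions, not part of the original post):

```python
import math
from itertools import combinations

def agglomerative(points):
    """Merge the two closest clusters until only one remains.

    Returns the final cluster and the list of merges performed.
    """
    # Step 1: every data point starts as its own cluster
    clusters = [[p] for p in points]
    history = []
    while len(clusters) > 1:
        # Find the closest pair of clusters
        # (single linkage: minimum pairwise point distance)
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda ij: min(
                math.dist(a, b)
                for a in clusters[ij[0]]
                for b in clusters[ij[1]]
            ),
        )
        # Merge the pair and record the merge (this is what a dendrogram draws)
        history.append((clusters[i], clusters[j]))
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return clusters[0], history
```

On a toy data set such as `[(0, 0), (0, 1), (5, 5), (5, 6)]`, the two nearby pairs are merged first, and the final merge joins the two groups, which is exactly the bottom-up behaviour described above.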
The proximity matrix is simply an N x N matrix, where N is the number of training examples. Each entry contains the distance between a pair of clusters.
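For example, the proximity matrix for a set of 2-D points can be built like this (a sketch using Euclidean distance; `math.dist` requires Python 3.8+):

```python
import math

def proximity_matrix(points):
    """Return the N x N matrix of pairwise Euclidean distances."""
    n = len(points)
    return [
        [math.dist(points[i], points[j]) for j in range(n)]
        for i in range(n)
    ]
```

The matrix is symmetric with zeros on the diagonal, since the distance from a point to itself is zero.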
Computing the Proximity Matrix and the Distance between Two Clusters
There are 4 common ways of computing the distance between two clusters:
- Single-Linkage Clustering - the minimum distance between points in the two clusters
- Complete-Linkage Clustering - the maximum distance between points in the two clusters
- Average-Linkage Clustering - the average distance over all pairs of points across the two clusters
- Centroid-Linkage Clustering - the distance between the cluster centroids
Advantages and Disadvantages
Hierarchical Clustering V/S K-Means
Thanks for reading the blog. Drop your feedback and suggestions in the comments. The next post will be on its implementation, so make sure to check it out.
Visit my website - https://chandbud.me/