Hierarchical Clustering - An Unsupervised Learning Algorithm

Introduction

Unsupervised learning is a type of machine learning in which we use unlabeled data and try to find patterns in it.
Clustering algorithms fall under the category of unsupervised learning. These algorithms try to group the data into different clusters.
Hierarchical Clustering algorithms build a hierarchy of clusters, in which each node is a cluster made up of the clusters of its child nodes.

To check its implementation in Python, CLICK HERE

There are two main strategies in Hierarchical Clustering:
  • Divisive
  • Agglomerative

Divisive - A top-down approach: we start with all observations in one large cluster and break it down into smaller ones.
Agglomerative - The opposite of Divisive, a bottom-up approach: each observation starts in its own cluster, and pairs of clusters are merged as we move up the hierarchy.
The resulting hierarchy is usually drawn as a tree diagram called a dendrogram.
(Data scientists generally use Agglomerative more often than Divisive.)
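As a quick sketch of the agglomerative approach, SciPy can compute the merges and the dendrogram layout on a small, made-up dataset (the points below are purely illustrative):

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, dendrogram

# Toy data: two obvious groups of points
X = np.array([[0, 0], [0, 1], [5, 5], [5, 6]], dtype=float)

Z = linkage(X, method="average")    # bottom-up (agglomerative) merges
tree = dendrogram(Z, no_plot=True)  # dendrogram layout, without drawing it
print(tree["ivl"])                  # order of the leaves (one per data point)
```

Passing the same `Z` to `dendrogram` inside a matplotlib figure draws the actual tree.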


Example of Hierarchical Clustering - An international team of scientists led by UCLA biologists used a dendrogram to report genetic data from more than 900 dogs of 85 breeds and more than 200 wild grey wolves worldwide. They used the diagram to see the similarity in the genetic data of these animals.


Algorithm

The algorithm for (agglomerative) Hierarchical Clustering is as follows:
  1. Create n clusters, one for each data point.
  2. Compute the proximity matrix.
  3. Repeat -
    1. Merge the two closest clusters.
    2. Update the proximity matrix.
  4. Until only a single cluster remains.
Now you must be wondering: what is a proximity matrix?
A proximity matrix is simply an n x n matrix, where n is the number of data points. Its entry (i, j) holds the distance between cluster i and cluster j.
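The steps above can be sketched as a naive single-linkage implementation. This is an illustrative toy version (names like `agglomerative` are my own, and real libraries use much faster algorithms), but it follows the four steps literally:

```python
import numpy as np

def agglomerative(points):
    """Naive single-linkage agglomerative clustering.
    Returns the sequence of merges performed."""
    # Step 1: every data point starts as its own cluster
    clusters = [[i] for i in range(len(points))]
    # Step 2: proximity matrix of pairwise Euclidean distances
    dist = np.linalg.norm(points[:, None] - points[None, :], axis=-1)
    merges = []
    # Steps 3-4: repeat until only a single cluster remains
    while len(clusters) > 1:
        best, best_d = (0, 1), np.inf
        # find the two closest clusters (single linkage: minimum pairwise distance)
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = min(dist[i, j] for i in clusters[a] for j in clusters[b])
                if d < best_d:
                    best_d, best = d, (a, b)
        a, b = best
        merges.append((clusters[a], clusters[b], best_d))
        clusters[a] = clusters[a] + clusters[b]  # merge b into a
        del clusters[b]                          # "update" the set of clusters
    return merges

pts = np.array([[0.0, 0.0], [0.0, 1.0], [5.0, 5.0], [5.0, 6.0]])
result = agglomerative(pts)
for left, right, d in result:
    print(left, right, round(d, 2))
```

On these four points, the two tight pairs are merged first, and the final merge joins the two pairs at a much larger distance.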

Computing the Proximity Matrix and the Distance Between Two Clusters

There are 4 common ways of computing the distance between clusters -
  1. Single-Linkage Clustering
    • The minimum distance between clusters
  2. Complete-Linkage Clustering
    • The maximum distance between clusters
  3. Average-Linkage Clustering
    • The average distance between clusters
  4. Centroid-Linkage Clustering
    • The distance between cluster centroids
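All four linkage criteria are available in SciPy, so we can run them side by side on a small made-up dataset (the points and the choice of 3 flat clusters are just for illustration) and see the cluster labels each one produces:

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

# Toy data: two tight pairs plus one far-away point
X = np.array([[0, 0], [0, 1], [5, 5], [5, 6], [20, 0]], dtype=float)

flat = {}
for method in ["single", "complete", "average", "centroid"]:
    Z = linkage(X, method=method)                          # (n-1) x 4 merge table
    flat[method] = fcluster(Z, t=3, criterion="maxclust")  # cut into 3 flat clusters
    print(method, flat[method])
```

Different linkage choices can produce different clusterings on the same data; on well-separated toy data like this, they all recover the same three groups.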

Advantages and Disadvantages

Advantages:
  • No need to specify the number of clusters in advance; the dendrogram lets you pick it afterwards.
  • The dendrogram gives an interpretable picture of how the data is structured at every level.
Disadvantages:
  • It is slow on large datasets (the naive algorithm takes O(n^3) time and O(n^2) memory).
  • Merges cannot be undone, so an early bad merge propagates up the hierarchy.
Hierarchical Clustering V/S K-Means

  • K-Means needs the number of clusters k up front; Hierarchical Clustering does not.
  • K-Means depends on the random initialization of centroids, while agglomerative Hierarchical Clustering is deterministic.
  • K-Means scales to large datasets much better than Hierarchical Clustering.
  • K-Means tends to find roughly spherical clusters; Hierarchical Clustering can capture other shapes depending on the linkage used.
Thanks for reading the blog. Drop your feedback and suggestions in the comments. The next post will be on its implementation, so make sure to check it out.

To check its implementation in Python, CLICK HERE

Visit my website - https://chandbud.me/
