Clustering IRIS Plant Data Using Hierarchical Clustering

Python | Backend | Data Science

There are two main approaches for clustering unlabeled data: K-Means clustering and hierarchical clustering. In K-Means clustering, a centroid is selected for each cluster and data points are assigned to the cluster whose centroid is closest to them. In hierarchical clustering, on the other hand, the distances between points are used to successively merge them into one big cluster, which is then decomposed to obtain K clusters. In this article, we will see how hierarchical clustering can be used to cluster the Iris dataset.

Hierarchical clustering can be broadly categorized into two groups: agglomerative clustering and divisive clustering. In agglomerative clustering, individual data points are merged together in a bottom-up fashion to form bigger clusters, while in divisive clustering, bigger clusters are split to form smaller ones. In this article, we will use agglomerative clustering.

Here's what we will cover:

How Hierarchical Clustering Works

A Simple Example of Hierarchical Clustering

Hierarchical Clustering of Iris Data

How Hierarchical Clustering Works

Before seeing hierarchical clustering in action, let us first understand the theory behind it. The following steps are performed during hierarchical clustering (a short code sketch after the list illustrates the merge loop):

1. In the beginning, every data point in the dataset is treated as its own cluster, which means that we start with N clusters.

2. The distance between all the points is calculated, and the two points closest to each other are joined to form a cluster. At this point, the number of clusters is N-1.

3. Next, the point closest to the cluster formed in step 2 is joined to that cluster, resulting in N-2 clusters.

4. Steps 2 and 3 are repeated until one big cluster is formed.

5. Finally, the big cluster is divided into K smaller clusters with the help of dendrograms. We will study dendrograms with the help of an example in the next section.
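To make these steps concrete, here is a minimal, illustrative sketch of the agglomerative merge loop in plain Python, using single linkage and Euclidean distance. The function and variable names are ours, not from the article's later code, and the implementation favors readability over efficiency.

import numpy as np

def agglomerate(points, k):
    # Step 1: every data point starts as its own cluster (N clusters).
    clusters = [[i] for i in range(len(points))]
    # Steps 2-4: repeatedly merge the two closest clusters until k remain.
    while len(clusters) > k:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # Single linkage: distance between the two closest members.
                d = min(np.linalg.norm(points[i] - points[j])
                        for i in clusters[a] for j in clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a] += clusters[b]
        del clusters[b]
    return clusters

# Example: two well-separated groups of points end up in two clusters.
print(agglomerate(np.array([[8, 12], [12, 17], [81, 65], [70, 75]]), 2))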

It is also important to mention how the distance between points and clusters is calculated. There are several ways to do so; some of them are as follows (a short SciPy sketch after the list shows how to select them):

1. The distance metric itself can be of any type, e.g. Euclidean or Manhattan.

2. The distance between two clusters can be calculated as the distance between their two closest points (single linkage), the distance between their two farthest points (complete linkage), or the distance between their centroids (centroid linkage).

3. The distance can also be calculated as the mean of all pairwise distances between the points of the two clusters (average linkage).
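For reference, SciPy's linkage function supports these strategies directly. The snippet below is only a sketch with made-up sample points, showing how the linkage method is selected:

import numpy as np
from scipy.cluster.hierarchy import linkage

X = np.array([[8, 12], [12, 17], [81, 65], [70, 75]])

# 'single'   -> two closest points between clusters
# 'complete' -> two farthest points between clusters
# 'centroid' -> distance between cluster centroids
# 'average'  -> mean of all pairwise distances between clusters
for method in ('single', 'complete', 'centroid', 'average'):
    merges = linkage(X, method=method, metric='euclidean')
    print(method, merges[:, 2])  # merge distances for each step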

A Simple Example of Hierarchical Clustering

Enough of the theory; let's now see a simple example of hierarchical clustering. Before performing hierarchical clustering on the Iris data, we will perform hierarchical clustering on some dummy data to understand the concept.

Let's first import the required libraries:

import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt

In the next step, we will create some dummy data.

data = np.array([[8, 12], [12, 17], [20, 20],
                 [25, 10], [22, 35], [81, 65],
                 [70, 75], [55, 65], [51, 60], [85, 93]])

Let's plot a scatter plot of these data points along with their labels. Execute the following script:

points = range(1, 11)

plt.figure(figsize=(8, 6))
plt.subplots_adjust(bottom=0.2)
plt.scatter(data[:, 0], data[:, 1], label='True Position', color='r')

for point, x, y in zip(points, data[:, 0], data[:, 1]):
    plt.annotate(point, xy=(x, y), xytext=(-3, 3),
                 textcoords='offset points', ha='right', va='bottom')

plt.show()

In the output, you will see the following graph:

[Figure: scatter plot of the ten labeled dummy data points]

Let's call the above plot Plot 1. We will apply the theory from the last section to form clusters in Plot 1. If we look closely at Plot 1, we can see that points 1 and 2 are closest to each other, so they will be joined to form a cluster in the first step. Next, point 3 is closest to the cluster made by joining points 1 and 2, so the cluster will now contain points 1, 2, and 3. Next, point 4 is closest to this cluster and will also be added to it, and finally point 5 will be added as well.

In the top right of Plot 1, points 8 and 9 are closest to each other, while points 6 and 7 are closest to each other. Hence, two clusters will be formed: one containing points 8 and 9, and the other containing points 6 and 7. Since these two clusters are closer to each other than to point 10, a new cluster will be formed containing points 6, 7, 8, and 9, and point 10 will then be added to it. In the end, we will have two clusters: one with points 1-5 and one with points 6-10.

From the plot above, we can see with the naked eye that if we were to form two clusters from this dataset, we would group points 1-5 in one cluster and points 6-10 in the other. However, human judgment is prone to error. Furthermore, there can be hundreds or thousands of data points, in which case we cannot make such guesses by eye. This is why we use hierarchical clustering.

Let's now see how dendrograms help in hierarchical clustering. Let's draw a dendrogram for our data using the dendrogram and linkage functions from the scipy.cluster.hierarchy module. Look at the following script:

from scipy.cluster.hierarchy import dendrogram, linkage

links = linkage(data, 'single')
points = range(1, 11)

plt.figure(figsize=(8, 6))
dendrogram(links, orientation='top', labels=list(points),
           distance_sort='descending', show_leaf_counts=True)
plt.show()

In the output, you should see the following figure:

[Figure: dendrogram of the dummy data]

You can see how the two clusters are formed, starting from the smallest clusters of two points. The height at which a vertical line joins two clusters represents the Euclidean distance between them. Now let's see how dendrograms help in clustering the data. Suppose we want to divide the data into two clusters. We will draw a horizontal line that crosses exactly two vertical lines, as shown in the figure below:

[Figure: dendrogram with a horizontal cut that crosses two vertical lines, producing two clusters]

In the above case, we will have two clusters in the output. If the horizontal line crosses three vertical lines, we will have three clusters in the output. It depends on the threshold value of the vertical distance that you choose.
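If you prefer to pick that threshold programmatically rather than by drawing a line on the dendrogram, SciPy's fcluster function cuts the linkage matrix at a chosen distance. The threshold value below is only an example, not a value taken from the plots above:

from scipy.cluster.hierarchy import fcluster

# 'links' is the linkage matrix computed earlier; t is an example distance threshold.
labels = fcluster(links, t=18, criterion='distance')
print(labels)  # one cluster label per data point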

Let's now see how to perform agglomerative clustering using the Scikit-learn library. To do so, the AgglomerativeClustering class from the sklearn.cluster module is used. Look at the following script:

from sklearn.cluster import AgglomerativeClustering

# Note: in newer versions of scikit-learn (1.2+), the 'affinity' parameter has been renamed to 'metric'.
groups = AgglomerativeClustering(n_clusters=2, affinity='euclidean', linkage='ward')
groups.fit_predict(data)

The output looks like this:

array([1, 1, 1, 1, 1, 0, 0, 0, 0, 0], dtype=int64)

You can see that the first five points have been clustered together, while the last five points form the second cluster. Let's plot the clustered points:

plt.scatter(data[:, 0], data[:, 1], c=groups.labels_, cmap='cool')
plt.show()

[Figure: the ten dummy data points colored by cluster assignment]

Hierarchical Clustering of Iris Data

The Iris dataset contains plants of three different species: setosa, virginica, and versicolor. The dataset is labeled, and the sepal length, sepal width, petal length, and petal width of each plant are available. We will use these four attributes to cluster the plants into three groups.

The following script imports the Iris dataset.

iris_data = pd.read_csv('https://raw.githubusercontent.com/uiuc-cse/data-fa14/gh-pages/data/iris.csv')

Let's see what the dataset looks like:

iris_data.head()
[Output: the first five rows of the Iris dataset, with sepal_length, sepal_width, petal_length, petal_width, and species columns]

You can see that our dataset contains numerical values for the attributes.

Clustering is an unsupervised technique, so we do not require labels in our dataset. The following script removes the species column, which contains the labels, from the dataset.

iris_data.drop(['species'], axis=1, inplace=True)

Let's now see what the dataset looks like after dropping the column:

[Output: the first five rows of the dataset, now without the species column]

We will plot a pair plot to see if we can find any relationship between the attributes. This can help us reduce the number of dimensions or attributes in our dataset.

Execute the following script:

import seaborn as sns

sns.pairplot(iris_data)
plt.show()
[Figure: pair plot of the four Iris attributes]

From the output, you can clearly see that there is a positive correlation between the petal_length and petal_width columns, which is a good indicator for clustering.
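If you want a numeric confirmation of what the pair plot suggests, a quick look at the correlation matrix works. This small snippet is an optional check, not part of the original walkthrough:

# Pearson correlation between the petal measurements.
print(iris_data[['petal_length', 'petal_width']].corr())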

We will use only these two attributes for clustering because that way it will be easier for us to plot the data. Execute the following script to remove the sepal_length and sepal_width attributes from our dataset.

iris_data = iris_data[['petal_length', 'petal_width']]

Let's now plot our dataset:

sns.scatterplot(x='petal_length', y='petal_width', data=iris_data)
plt.show()

The output looks like this:


[Figure: scatter plot of petal_length vs. petal_width for the Iris data]

Let's now divide our data into three clusters:

from sklearn.cluster import AgglomerativeClustering

groups = AgglomerativeClustering(n_clusters=3, affinity='euclidean', linkage='ward')
groups.fit_predict(iris_data)

The output of the script above looks like this:

array([1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
       1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
       1, 1, 1, 1, 1, 1, 2, 2, 0, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
       2, 2, 2, 2, 0, 2, 0, 2, 2, 2, 2, 0, 2, 2, 2, 2, 2, 0, 2, 2, 2, 2,
       2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 0, 0, 0, 0, 0, 0, 2, 0, 0, 0,
       0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], dtype=int64)

Finally, let's plot the data points to see the three clusters:

plt.scatter(iris_data['petal_length'], iris_data['petal_width'], c=groups.labels_, cmap='cool')
plt.show()

The output of the script above looks like this:

[Figure: Iris data points colored by their cluster assignment]

You can see the Iris data divided into three clusters.
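As an optional sanity check (not part of the original walkthrough), you can compare the cluster assignments against the true species labels by reloading the labeled CSV and cross-tabulating:

# Compare cluster labels with the original species labels.
full_iris = pd.read_csv('https://raw.githubusercontent.com/uiuc-cse/data-fa14/gh-pages/data/iris.csv')
print(pd.crosstab(full_iris['species'], groups.labels_))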

Conclusion

Hierarchical clustering is one of the most popular unsupervised learning algorithms. In this article, we explained the theory behind hierarchical clustering and then implemented it with the help of Python's Scikit-learn library to cluster the Iris data.

Want Help with Data Mining and Statistics projects?

With rich experience in data mining and statistics projects, Coditude is an ideal partner for your data mining software development requirements, providing flexible and cost-effective engagement models.

Reach out to us to learn more about how we can help you.

