Using Isolation Forest for Outlier Detection in Python

Isolation Forest

An anomaly, also known as an outlier, is a data point that lies so far away from the other data points that suspicions arise over the authenticity or truthfulness of the dataset. Hawkins (1980) defines an outlier as an:

"Observation which deviates so much from other observations as to arouse suspicion it was generated by a different mechanism"

Here's the checklist for this guide to isolation forest for outlier detection:

Types of Outliers
Reasons for Outliers
Why Outlier Detection is Important
Outlier Detection Using Isolation Forest
Removing Outliers Can Improve Algorithm Performance
Detecting Fraudulent Credit Card Transactions

Types of Outliers

Depending upon the feature space, outliers can be of two kinds: univariate and multivariate. Univariate outliers are generated by an extreme value in a single feature, and are visible to the naked eye when plotted in one- or two-dimensional feature space. Multivariate outliers are generated by anomalous combinations of values across multiple features.
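As a quick illustration of the difference, here is a minimal sketch; the arrays below are made-up values for illustration, not the dataset used later in this tutorial:

import numpy as np

# Univariate outlier: extreme in a single feature on its own
heights_cm = np.array([160, 165, 170, 168, 172, 260])  # 260 is far outside the range

# Multivariate outlier: each feature looks plausible in isolation,
# but the combination is anomalous (e.g. 150 cm paired with 120 kg)
height_weight = np.array([[170, 70], [165, 62], [180, 85], [150, 120]])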

In addition to categorizing outliers by feature space, we can also group them by type. There are three major types of outliers, each illustrated in the short sketch after the list below:

1. Point Outliers

A point outlier is an observation or data point that lies too far from the other data points in the n-dimensional feature space. These are the simplest type of outliers.

2. Contextual Outliers

Contextual outliers are outliers that depend upon the context. For instance, a temperature of -5 degrees in the north of Africa during summer (June/July) is considered an anomaly, while a temperature of -5 degrees in Norway during December is considered normal.

3. Collective Outliers

Collective outliers are a group of data points that occur closely together but lie far away from the mean of the rest of the data points.
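The sketch below illustrates all three types with made-up values; the arrays and the toy context check are illustrative assumptions, not part of the tutorial's data:

import numpy as np

# 1. Point outlier: a single value far from everything else
readings = np.array([21, 22, 20, 23, 21, 95])  # 95 is a point outlier

# 2. Contextual outlier: the same value can be normal or anomalous
# depending on context (here, -5 degrees in a North African summer)
temperature, location, month = -5, "North Africa", "July"
is_contextual_outlier = temperature < 10 and month in ("June", "July")

# 3. Collective outliers: a tight group of points that sits
# far away from the bulk of the data
series = np.array([21, 22, 20, 23, 80, 81, 82, 21, 22])  # the run of 80s is collective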

Reasons for Outliers

The presence of outliers in a dataset can be attributed to several reasons, some of which are listed below:

1. Errors while performing data entry, especially if the data is entered by a human; the chance of human error remains high.

2. Outliers generated due to errors in experimentation.

3. Outliers generated during the data preprocessing phase.

4. Natural outliers, which arise from the behaviour of the data itself and aren't the result of any error. These are the outliers that should be retained in the dataset.

Why Outlier Detection is Important

Outlier detection is important for two reasons. First, since outliers correspond to aberrations in the dataset, detecting them can help catch fraudulent bank transactions. Consider the scenario where most of the bank transactions of a particular customer take place from a certain geographical location. If a transaction for that customer then takes place from another geographical location, it will be flagged as an outlier. In such cases, further checks, such as a one-time PIN sent to the customer's cell phone, can be used to ensure that the actual user is executing the transaction.

Second, outliers heavily skew the mean and standard deviation of the dataset, which can result in increased classification or regression error. To train a prediction algorithm that generalizes well on unseen data, outliers are therefore often removed from the training data.
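To see this effect concretely, here is a minimal sketch with made-up numbers: a single extreme value drags both the mean and the standard deviation far away from the values that describe the bulk of the data.

import numpy as np

values = np.array([10, 11, 9, 10, 12, 11])
with_outlier = np.append(values, 95)

print(values.mean(), values.std())              # ~10.5 and ~0.96
print(with_outlier.mean(), with_outlier.std())  # ~22.6 and ~29.6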

Outlier Detection Using Isolation Forest

In this section, we will see how outlier detection can be performed using isolation forest, one of the most widely used algorithms for the task. The algorithm builds an ensemble of trees that recursively split the data on randomly chosen features and split values; because outliers are few and different, they are isolated in fewer splits on average, and this shorter average path length is what flags a point as anomalous.

A Simple Example

We will first walk through a simple and intuitive example of isolation forest before moving to a more advanced example, in which isolation forest is used to flag fraudulent transactions.

We will start by importing the required libraries. Execute the following script:

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline

from sklearn.ensemble import IsolationForest

Next, we need to create a two-dimensional array that will contain our dummy dataset. Execute the following script:

X = np.array([[9, 17], [10, 15], [9, 16], [11, 17], [12, 17],
              [10, 21], [12, 18], [13, 20], [10, 21], [12, 13],
              [9, 15], [14, 14], [90, 30], [92, 28], [15, 15],
              [13, 14], [13, 16], [14, 16], [13, 16], [15, 17]])

After that, we will create a pandas dataframe from the two-dimensional array. The dataframe will contain two columns A and B. Run the script below:

new_data = pd.DataFrame(X, columns=["A", "B"])

Let's plot our dataset and see if we can spot any outliers with the naked eye. In the script below, we increase the size of our plot and then plot columns A and B against each other in two-dimensional space.

print(plt.rcParams.get("figure.figsize"))

fig_size = plt.rcParams["figure.figsize"]
fig_size[0] = 10
fig_size[1] = 8
plt.rcParams["figure.figsize"] = fig_size

new_data.plot(x='A', y='B', style='o')


In the output, you will see the following figure:

[Scatter plot of column A against column B: the bulk of the points cluster together, while (90, 30) and (92, 28) sit far away from the cluster]

To the naked eye, the data points at the top right, i.e. (90, 30) and (92, 28), are outliers. Let's see whether the isolation forest algorithm also declares these points outliers. Look at the following script:

iso_forest = IsolationForest(n_estimators=300, contamination=0.10)
iso_forest = iso_forest.fit(new_data)

In the script above, we create an object of the IsolationForest class and fit it on our dataset. The fit method trains the algorithm. To find the outliers, we then pass our dataset to the predict method as shown below:

isof_outliers = iso_forest.predict(new_data)

Outliers are assigned a value of -1, so we can recover the actual data points by using the result of predict as a boolean mask on our dataset, as shown below:

isoF_outliers_values = new_data[isof_outliers == -1]
isoF_outliers_values

In the output, you should see the following result:

[Output: a dataframe containing the two outlier rows, (90, 30) and (92, 28)]

The result shows that the outlier data points predicted by the isolation forest are indeed (90, 30) and (92, 28) as we discussed earlier.
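If you want more than the -1/1 labels, scikit-learn's IsolationForest also exposes a continuous anomaly score through decision_function. The snippet below is an optional addition to the tutorial's code, assuming the iso_forest and new_data objects defined above:

scores = iso_forest.decision_function(new_data)  # more negative = more anomalous
print(new_data.assign(score=scores).sort_values("score").head())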

Removing Outliers Can Improve Algorithm Performance

Removing outliers from the dataset can improve the performance of an algorithm in some cases. Let's now compare the performance of a machine learning algorithm that predicts the value in column B, given the value in column A, before and after removing the outliers. Since the values in column B are continuous, this is a regression problem.

Execute the following script to divide the data into feature and label sets:

X = new_data.drop(['B'], axis=1)
y = new_data[['B']]

Next, we need to divide our data into training and test sets:

from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

We will use the random forest algorithm to predict the values. You can choose any algorithm and see if you achieve better results:

from sklearn.ensemble import RandomForestRegressor

regressor = RandomForestRegressor(n_estimators=20, random_state=0)
regressor.fit(X_train, y_train.values.ravel())  # flatten y to a 1-D array
y_pred = regressor.predict(X_test)

Next, let's see how well the algorithm performs:

from sklearn import metrics

print('Mean Absolute Error:', metrics.mean_absolute_error(y_test, y_pred))
print('Mean Squared Error:', metrics.mean_squared_error(y_test, y_pred))
print('Root Mean Squared Error:', np.sqrt(metrics.mean_squared_error(y_test, y_pred)))

In the output, you should see the following results:

Mean Absolute Error: 2.2758333333333343
Mean Squared Error: 6.115945833333335
Root Mean Squared Error: 2.4730438397515995

Let's now remove the outliers from our dataset and see if we can get better results:

X_train = X_train.drop(isoF_outliers_values.index.values.tolist())
y_train = y_train.drop(isoF_outliers_values.index.values.tolist())

Now, train the algorithm again on the training set and evaluate it on the test set as shown below:

regressor = RandomForestRegressor(n_estimators=20, random_state=0)
regressor.fit(X_train, y_train.values.ravel())
y_pred = regressor.predict(X_test)

print('Mean Absolute Error:', metrics.mean_absolute_error(y_test, y_pred))
print('Mean Squared Error:', metrics.mean_squared_error(y_test, y_pred))
print('Root Mean Squared Error:', np.sqrt(metrics.mean_squared_error(y_test, y_pred)))

In the output, you should see the following results:

Mean Absolute Error: 2.1366666666666663
Mean Squared Error: 5.925653287981859
Root Mean Squared Error: 2.434266478424632

The results show that the algorithm performs better after removing the outliers: the mean absolute error, mean squared error, and root mean squared error have all decreased.

Detecting Fraudulent Credit Card Transactions

One of the most common examples of anomaly detection is the detection of fraudulent credit card transactions. In this section, we will see how the isolation forest algorithm can be used for detecting fraudulent transactions.

The dataset for this section is the Credit Card Fraud Detection dataset, which can be downloaded from Kaggle.

As a first step, we need to import our dataset and drop the Time column. The following script does that:

card_data = pd.read_csv(r'E:\Datasets\creditcard.csv')
card_data = card_data.drop(['Time'], axis=1)

Next, we will divide our dataset into normal transactions and fraudulent transactions. All normal transactions have a value of 0 in the Class column, while fraudulent transactions have a Class of 1:

fraudulent_transactions = card_data.loc[card_data['Class'] == 1]
normal_transactions = card_data.loc[card_data['Class'] == 0]

Since isolation forest is an unsupervised learning technique, we do not need the class labels. The following script removes them:

fraudulent_transactions = fraudulent_transactions.drop(['Class'], axis=1)
normal_transactions = normal_transactions.drop(['Class'], axis=1)

Next, we need to divide our data into three sets: a training set for training the isolation forest, a dev set of held-out normal transactions, and a test set of fraudulent transactions. The following script does that:

train_set, dev_set = train_test_split(normal_transactions, test_size=0.5, random_state=42)
test_set = np.array(fraudulent_transactions)

The next step is to train the isolation forest algorithm on the training set:

classifier = IsolationForest(max_samples=100)
classifier.fit(train_set)

Finally, we evaluate the performance of our algorithm for detecting normal and fraudulent transactions:

train_predictions = classifier.predict(train_set)
dev_predictions = classifier.predict(dev_set)
test_predictions = classifier.predict(test_set)

print("Normal Detection Accuracy:", list(train_predictions).count(1)/train_predictions.shape[0])
print("Fraudulent Detection Accuracy:", list(test_predictions).count(-1)/test_predictions.shape[0])

In the output, you should see the following results:

Normal Detection Accuracy: 0.89999788965721
Fraudulent Detection Accuracy: 0.8821138211382114

The results show that isolation forest has an accuracy of 89.99% for detecting normal transactions and an accuracy of 88.21% for detecting fraudulent transactions, which is pretty decent.
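As a side note, dev_predictions is computed above but never used. A reasonable extra check, sketched below, is to score the held-out normal transactions, since accuracy on the data the model was trained on can be optimistic:

print("Dev Normal Detection Accuracy:", list(dev_predictions).count(1)/dev_predictions.shape[0])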

Conclusion

Anomaly or outlier detection is one of the most important machine learning tasks. It has a variety of applications, ranging from flagging suspicious website logins to catching fraudulent credit card transactions. In this article, the theory of outlier detection has been explained, and fraudulent transaction detection has been worked through as a practical example.

