Fraud Detection Using Machine Learning
CHAPTER 1
INTRODUCTION
The threat posed by financial transaction fraud to organizations and individuals has prompted the
development of cutting-edge methods for detection and prevention. This research study explores the use of
real-time monitoring systems and machine learning algorithms to improve fraud detection and prevention in
financial transactions. The paper addresses the drawbacks of conventional rule-based systems, explains why
real-time monitoring and machine learning should be used, and describes the goals of the research. To
comprehend the current methodologies and pinpoint research gaps, a thorough literature study is done. The
suggested approach includes dimensionality reduction, feature engineering, data preparation, and the application
of machine learning models built into a real-time monitoring system. Results are assessed using performance
measures and contrasted with the performance of current systems. Two proactive fraud prevention techniques
under investigation are adaptive thresholds and dynamic risk scoring. Considerations for scalability and
deployment, including data security and legal compliance, are also covered. The study suggests areas for
additional research in this field and helps in designing reliable fraud detection systems.
This technology holds the potential to minimize financial losses, protect user privacy, and enhance the overall
security of digital payment ecosystems. In this era of constant technological evolution, it is crucial for financial
institutions, finance companies, and payment service providers to implement advanced machine learning models.
1.1.2 TITLE: Fraud Detection in Online Transactions Using Machine Learning.
AUTHOR: Jashandeep Singh.
This extensive research uses state-of-the-art machine learning methods to explore the complex
domain of digital banking real-time fraud detection. The goal is to drastically reduce the frequency of
fraudulent acts in online banking transactions. This project seeks to improve machine learning's ability to
detect and prevent fraudulent online purchases by reducing the number of false positives and tackling
important issues linked to data privacy. The research takes a critical look at how machine-learning
approaches are being used in the ever-changing world of online commerce. In addition to the usual suspects, it
investigates new ideas including deep reinforcement learning, the importance of financial literacy in
enhancing security measures, and unlearning for anomaly detection. This study might have a profound
impact on the digital banking industry, which is why it is important. This project aims to provide a digital
transaction environment that is safer and trustworthy by utilizing machine learning's inherent capabilities and
providing answers to current hurdles. With online transactions playing a crucial role in financial
operations, this study's findings will add to the progress being made in the rapidly evolving world of digital banking.
CHAPTER 2
SYSTEM ANALYSIS
2.1 EXISTING SYSTEM
In existing systems, modern techniques such as artificial neural networks are used. Different machine learning
algorithms are used, such as autoencoders, K-means clustering, and local outlier factors. These algorithms can
be integrated into existing systems; however, a major drawback is their inability to detect fraudulent
transactions prior to payment. Only after the payment is successful is it determined whether the transaction is
valid or fraudulent.
2.1.1 DISADVANTAGES
Post-payment Detection: The inability to detect fraudulent transactions prior to payment means that consumers
are vulnerable to financial losses. Once the payment is made, it becomes much more challenging to recover the
funds or reverse the transaction, leading to potential financial devastation for the consumer.
Limited Preventative Measures: Since the detection of fraudulent transactions occurs after payment, there is a
lack of proactive measures to prevent fraudulent activities in real-time. This reactive approach increases the risk
of successful fraudulent transactions occurring before they are identified, potentially leading to greater losses.
Delayed Response: Detecting fraudulent transactions after payment leads to a delayed response in addressing
the issue. This delay can provide fraudsters with an opportunity to carry out additional fraudulent activities
before any action is taken, further exacerbating the financial impact on the consumer.
Loss of Consumer Trust: Consumers may lose trust in the system if they consistently experience fraudulent
transactions that are only detected after payment. This loss of trust can have long-term consequences for
businesses, leading to decreased customer loyalty and potentially damaging their reputation.
Manual Intervention: In many cases, identifying fraudulent transactions post-payment may require manual
intervention, such as reviewing transaction logs or contacting the consumer directly. This manual process is
time-consuming and resource-intensive, which can increase operational costs for businesses.
Increased Operational Costs: Dealing with fraudulent transactions after payment can result in increased
operational costs for businesses, including expenses related to fraud investigation, customer support, and
potential reimbursement of funds to affected consumers. These additional costs can impact the profitability of
businesses and may ultimately be passed on to consumers in the form of higher prices or fees.
2.2 PROPOSED SYSTEM
In the proposed system, we are implementing algorithms such as Logistic Regression, Decision Tree,
Random Forest, KNN, Support Vector Machine, and Voting Classifier. These algorithms are used to enhance
the accuracy of identifying fraudulent individuals. One major advantage is the ability to identify fraudulent
individuals before the transaction occurs. We aim to reduce financial losses for consumers and identify
fraudsters by providing a website where users can enter suspicious numbers into a search box to verify whether
the number has previously been flagged as fraudulent.
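As an illustration, a minimal sketch of combining these classifiers with scikit-learn is given below; it assumes the project's upi_fraud_dataset.csv with a fraud_risk label column and treats any remaining columns as numeric features (an assumption, not a detail taken from the report).

import pandas as pd
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Dataset and the fraud_risk / upi_number columns follow the project's CSV;
# any additional feature columns are assumed to be numeric here.
df = pd.read_csv("upi_fraud_dataset.csv")
y = df["fraud_risk"]
X = df.drop(columns=["fraud_risk", "upi_number"])

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42)

# The individual classifiers named in the proposed system
estimators = [
    ("lr", LogisticRegression(max_iter=1000)),
    ("dt", DecisionTreeClassifier()),
    ("rf", RandomForestClassifier(n_estimators=100)),
    ("knn", KNeighborsClassifier()),
    ("svm", SVC(probability=True)),
]

# Soft voting averages each model's predicted fraud probability
ensemble = VotingClassifier(estimators=estimators, voting="soft")
ensemble.fit(X_train, y_train)
print("Ensemble accuracy:", ensemble.score(X_test, y_test))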
2.2.1 ADVANTAGES
2.3 ALGORITHM
Logistic regression is a data analysis technique that uses mathematics to find the relationship between
two data factors. It then uses this relationship to predict the value of one of those factors based on the other.
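A minimal sketch of this idea using scikit-learn's LogisticRegression is shown below; the transaction amounts and fraud labels are illustrative toy values, not project data.

import numpy as np
from sklearn.linear_model import LogisticRegression

# Toy data: one factor (transaction amount) used to predict another (fraud flag)
amounts = np.array([[120], [90], [15000], [60], [22000], [75]])
is_fraud = np.array([0, 0, 1, 0, 1, 0])

model = LogisticRegression()
model.fit(amounts, is_fraud)

# Predicted probability of fraud for a new transaction amount
print(model.predict_proba([[18000]])[0][1])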
Processor – Core i3
Monitor
Windows 7 or higher
Python 3.8+
PyCharm IDE
Python Libraries:
CHAPTER 3
MODULE DESCRIPTION
A Dataset is the basic data container in PyMVPA. It serves as the primary form of data storage, but also as a
common container for the results returned by most algorithms.
A fraud detection module is a component within a system or application that is specifically designed to
identify and prevent fraudulent activities. It utilizes various techniques, algorithms, and data analysis methods
to detect suspicious or anomalous behaviour in transaction data.
CHAPTER 4
SYSTEM DESIGN
Keras is an open-source library that provides a Python interface for artificial neural networks. Keras acts
as an interface for the TensorFlow library. It was originally authored by François Chollet.
Pandas (styled as pandas) is a software library written for the Python programming language for data
manipulation and analysis. In particular, it offers data structures and operations for manipulating numerical
tables and time series. It is free software released under the three-clause BSD license.
4.3 What is Fraud Detection in UPI Payments Using Machine Learning?
Fraud detection in Unified Payments Interface (UPI) payments using machine learning involves applying
various algorithms and techniques to analyze transaction data and identify fraudulent activities. Here's how it
can be done:
Data Collection: Gather transaction data from UPI payment platforms, including details such as transaction
amount, timestamp, sender's and receiver's identifiers (e.g., mobile numbers, UPI IDs), transaction remarks, and
related metadata.
Feature Engineering: Extract meaningful features from the transaction data that can help distinguish between
legitimate and fraudulent transactions. These features may include transaction frequency, amount variability,
sender-receiver relationships, geographic locations, time of day, device information, and transaction history.
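For illustration, a hypothetical pandas sketch of such features is shown below; the file name and column names (sender, receiver, amount, timestamp) are assumptions, not the project's actual schema.

import pandas as pd

# Assumed raw transaction columns: sender, receiver, amount, timestamp
tx = pd.read_csv("upi_transactions.csv", parse_dates=["timestamp"])

# How often each sender transacts, and how variable their amounts are
tx["sender_tx_count"] = tx.groupby("sender")["amount"].transform("count")
tx["sender_amount_std"] = tx.groupby("sender")["amount"].transform("std")

# Time of day, and whether this sender-receiver pair appeared earlier in the data
tx["hour"] = tx["timestamp"].dt.hour
tx["known_pair"] = tx.duplicated(subset=["sender", "receiver"], keep="first").astype(int)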
Data Preprocessing: Cleanse and preprocess the collected data to handle missing values, outliers, and
inconsistencies. Standardize or normalize numerical features and encode categorical variables appropriately.
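A sketch of this step with scikit-learn is shown below; the numeric and categorical column names are placeholders.

from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

numeric_cols = ["amount", "sender_tx_count", "hour"]   # assumed names
categorical_cols = ["device_type", "location"]          # assumed names

preprocess = ColumnTransformer([
    # Fill missing numbers with the median, then standardize
    ("num", Pipeline([("impute", SimpleImputer(strategy="median")),
                      ("scale", StandardScaler())]), numeric_cols),
    # Fill missing categories, then one-hot encode them
    ("cat", Pipeline([("impute", SimpleImputer(strategy="most_frequent")),
                      ("encode", OneHotEncoder(handle_unknown="ignore"))]), categorical_cols),
])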
Model Selection: Choose suitable machine learning models for fraud detection tasks. Commonly used
algorithms include logistic regression, decision trees, random forests, support vector machines (SVM),
k-nearest neighbours (KNN), and ensemble methods such as voting classifiers.
Model Training: Train the selected models using historical transaction data labeled as either legitimate or
fraudulent. Use techniques like cross-validation to optimize model hyperparameters and ensure robust
performance.
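As a sketch, assuming X_train and y_train were prepared as in the earlier examples, cross-validated hyperparameter tuning might look like this:

from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

param_grid = {"n_estimators": [100, 300], "max_depth": [None, 10, 20]}

# 5-fold cross-validation, optimizing recall so fewer frauds are missed
search = GridSearchCV(RandomForestClassifier(random_state=42),
                      param_grid, cv=5, scoring="recall")
search.fit(X_train, y_train)
print(search.best_params_, search.best_score_)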
Real-time Monitoring: Deploy the trained models to monitor incoming UPI transactions in real-time.
Evaluate each transaction against the learned patterns and detect any deviations that may indicate potential
fraud.
Scoring and Decision Making: Assign a fraud score or probability to each transaction based on the output of
the deployed models. Define thresholds or rules for classifying transactions as legitimate or fraudulent.
Transactions surpassing certain thresholds may be flagged for further investigation or declined outright.
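A simple thresholding sketch is shown below; the cut-off values are illustrative, not tuned figures from the project.

# fraud_probability comes from the trained model, e.g. model.predict_proba(x)[0][1]
def decide(fraud_probability, review_threshold=0.5, block_threshold=0.9):
    """Map a model score to an action using illustrative cut-off values."""
    if fraud_probability >= block_threshold:
        return "decline"          # very likely fraud: block outright
    if fraud_probability >= review_threshold:
        return "flag_for_review"  # suspicious: send to manual investigation
    return "approve"              # looks legitimate

print(decide(0.95), decide(0.6), decide(0.1))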
Feedback Loop: Incorporate feedback from flagged transactions into the training data to improve the
performance of the fraud detection models over time. Continuously update the models with new data and
retrain them periodically to keep pace with emerging fraud patterns.
Integration with UPI Systems: Integrate the fraud detection module seamlessly with UPI payment platforms
or banking systems. Ensure that the detection process does not introduce significant latency or disrupt the user
experience.
Monitoring and Evaluation: Regularly monitor the performance of the fraud detection system and evaluate
its effectiveness in terms of detection accuracy, false positive rate, and other relevant metrics. Fine-tune the
models and thresholds as needed.
Software design sits at the technical kernel of the software engineering process and is applied regardless
of the development paradigm and area of application. Design is the first step in the development phase for any
engineered product or system. The designer's goal is to produce a model or representation of an entity that will
later be built. Once system requirements have been specified and analyzed, system design is the first
of the three technical activities - design, code, and test - that are required to build and verify software.
The importance can be stated with a single word: "Quality". Design is the place where quality is fostered
in software development. Design provides us with representations of software that can be assessed for quality.
Design is the only way that we can accurately translate a customer's view into a finished software product or
system. Software design serves as a foundation for all the software engineering steps that follow. Without a
strong design we risk building an unstable system - one that will be difficult to test, and one whose quality cannot
be assessed until late in the process.
During design, progressive refinements of data structure, program structure, and procedural details are
developed, reviewed, and documented. System design can be viewed from either a technical or a project
management perspective. From the technical point of view, design is comprised of four activities - architectural
design, data structure design, interface design, and procedural design.
4.5 NORMALIZATION
It is a process of converting a relation to a standard form. The process is used to handle the problems
that can arise due to data redundancy, i.e. repetition of data in the database, to maintain data integrity, and to
handle problems that can arise due to insertion, update, and deletion anomalies.
Decomposition is the process of splitting relations into multiple relations to eliminate anomalies and
maintain data integrity. To do this we use normal forms, or rules for structuring relations.
1. Insertion anomaly: Inability to add data to the database due to the absence of other data.
2. Deletion anomaly: Unintended loss of data due to the deletion of other data.
3. Update anomaly: Data inconsistency resulting from data redundancy and partial update.
4. Normal forms: These are the rules for structuring relations that eliminate anomalies.
A relation is said to be in first normal form if the values in the relation are atomic for every attribute in the
relation. By this we mean simply that no attribute value can be a set of values or, as it is sometimes expressed, a
repeating group.
A relation is said to be in second normal form if it is in first normal form and it satisfies any one
of the following rules:
1) The primary key is not a composite key.
2) No non-key attributes are present.
3) Every non-key attribute is fully functionally dependent on the full set of the primary key.
Transitive Dependency:
If two non-key attributes depend on each other as well as on the primary key then they are said to be
transitively dependent.
The above normalization principles were applied to decompose the data into multiple tables, thereby
eliminating redundancy and maintaining data integrity.
4.6 E – R DIAGRAMS
The relationships within the system are structured through a conceptual ER diagram, which specifies not only
the existential entities but also the standard relations through which the system exists and the cardinalities
that are necessary for the system state to continue.
The Entity Relationship Diagram (ERD) depicts the relationship between the data objects. The ERD is
the notation that is used to conduct the data modeling activity; the attributes of each data object noted in the
ERD can be described using a data object description.
The set of primary components that are identified by the ERD are
1. Data object
2. Relationships
3. Attributes
The primary purpose of the ERD is to represent data objects and their relationships.
4.7 DATA FLOW DIAGRAMS
A data flow diagram is a graphical tool used to describe and analyze the movement of data through a system.
These are the central tool and the basis from which the other components are developed. The transformation of
data from input to output, through processes, may be described logically and independently of the physical
components associated with the system. These are known as the logical data flow diagrams. The physical data
flow diagrams show the actual implementation and movement of data between people, departments and
workstations. A full description of a system actually consists of a set of data flow diagrams. The data flow
diagrams are developed using two familiar notations: Yourdon and Gane & Sarson. Each component in a DFD is
labeled with a descriptive name. Each process is further identified with a number that will be used for
identification purposes. The development of DFDs is done in several levels. Each process in lower-level
diagrams can be broken down into a more detailed DFD in the next level. The top-level diagram is often called
the context diagram. It consists of a single process, which plays a vital role in studying the current system. The
process in the context-level diagram is exploded into other processes at the first level of the DFD.
The idea behind the explosion of a process into more processes is that understanding at one level of detail
is exploded into greater detail at the next level. This is done until no further explosion is necessary and an
adequate amount of detail is described.
Larry Constantine first developed the DFD as a way of expressing system requirements in a graphical
form, which led to a modular design.
A DFD, also known as a "bubble chart", has the purpose of clarifying system requirements and
identifying major transformations that will become programs in system design. It is the starting point of the
design phase that functionally decomposes the requirement specifications down to the lowest level of detail. A
DFD consists of a series of bubbles joined by data flows in the system.
The basic symbols used in a DFD include the process, the data flow, and the data store.
1. Process should be named and numbered for an easy reference. Each name should be representative of the
process.
2. The direction of flow is from top to bottom and from left to right. Data traditionally flows from the source to
the destination, although it may flow back to the source. One way to indicate this is to draw a long flow line
back to the source. An alternative way is to repeat the source symbol as a destination; since it is used more
than once in the DFD, it is marked with a short diagonal.
3. When a process is exploded into lower level details, they are numbered.
4. The names of data stores and destinations are written in capital letters. Process and data flow names have the
first letter of each word capitalized.
A DFD typically shows the minimum contents of a data store. Each data store should contain all the data
elements that flow in and out. Questionnaires should contain all the data elements that flow in and out. Missing
interfaces and redundancies are then accounted for, often through interviews.
1. The DFD shows the flow of data, not of control; loops and decisions are control considerations and do
not appear on a DFD.
2. The DFD does not indicate the time factor involved in any process, or whether the data flow takes
place daily, weekly, monthly or yearly.
1. Current Physical
2. Current Logical
3. New Logical
4. New Physical
4.8.1 CURRENT PHYSICAL:
In the Current Physical DFD, process labels include the names of people or their positions or the names of
computer systems that might provide some of the overall system processing; the labels include an identification
of the technology used to process the data. Similarly, data flows and data stores are often labelled with the
names of the actual physical media on which data are stored, such as file folders, computer files, business forms
or computer tapes.
4.8.2 CURRENT LOGICAL:
The physical aspects of the system are removed as much as possible so that the current system is reduced
to its essence - to the data and the processes that transform them, regardless of their actual physical form.
4.8.3 NEW LOGICAL:
This is exactly like the current logical model if the user were completely happy with the functionality of the
current system but had problems with how it was implemented. Typically, though, the new logical model will
differ from the current logical model by having additional functions, obsolete functions removed, and
inefficient flows recognized.
4.8.4 NEW PHYSICAL:
The new physical represents only the physical implementation of the new system.
4.9.1 PROCESS
1) No process can have only outputs.
2) No process can have only inputs. If an object has only inputs, then it must be a sink.
3) A process has a verb phrase label.
4.9.2 DATA STORE
1) Data cannot move directly from one data store to another data store, a process must move data.
2) Data cannot move directly from an outside source to a data store; a process, which receives data from the
source, must move the data into the data store.
4.9.3 SOURCE OR SINK
1) Data cannot move directly from a source to a sink; it must be moved by a process.
4.9.4 DATA FLOW
1) A data flow has only one direction of flow between symbols. It may flow in both directions between a
process and a data store to show a read before an update. The latter is usually indicated, however, by two
separate arrows since these happen at different times.
2) A join in a DFD means that exactly the same data comes from any of two or more different processes, data
stores or sinks to a common location.
3) A data flow cannot go directly back to the same process it leaves. There must be at least one other process
that handles the data flow, produces some other data flow, and returns the original data flow to the beginning
process.
4) A data flow to a data store means update (delete or change).
5) A data flow from a data store means retrieve or use.
6) A data flow has a noun phrase label. More than one data flow noun phrase can appear on a single arrow as
long as all of the flows on the same arrow move together as one package.
4.9.5 DATA DICTIONARY
After carefully understanding the requirements of the client, the entire data storage requirements are
divided into tables. The tables below are normalized to avoid any anomalies during the course of data entry.
4.10.1 Actor:
A coherent set of roles that users of use cases play when interacting with the use cases.
FIGURE: 3 ACTOR
4.10.2 Use case:
A description of a sequence of actions, including variants, that a system performs and that yields an
observable result of value to an actor.
UML stands for Unified Modeling Language. UML is a language for specifying, visualizing and documenting
the system. This is the step while developing any product after analysis. The goal from this is to produce a
model of the entities involved in the project which later need to be built. The representations of the entities that
are to be used in the product being developed need to be designed using the following diagrams:
Sequence Diagram
Class diagram
Activity Diagram
4.11 USE CASE DIAGRAMS:
Use case diagrams model behavior within a system and help developers understand what the user
requires. The stick man represents what is called an actor. Use case diagrams can be useful for getting an overall
view of the system and clarifying what users can do and, more importantly, what they cannot do. A use case
diagram consists of use cases and actors and shows the interaction between them.
The purpose is to show the interactions between the use cases and the actors.
CHAPTER 5
SYSTEM DEVELOPMENT
5.2 INTRODUCTION TO PYTHON:
Python is currently the most widely used multi-purpose, high-level programming language, which allows
programming in Object-Oriented and Procedural paradigms. Python programs generally are smaller than those of
other programming languages like Java. Programmers have to type relatively less, and the indentation requirement
of the language makes their code readable all the time.
Python is being used by almost all tech-giant companies such as Google, Amazon, and Facebook.
The biggest strength of Python is its huge collection of standard libraries, which can be used for the following:
Machine Learning
Test frameworks
Multimedia
Scientific computing
Now that we know what makes Python a high-level language, let us also have a look at some of its
lesser-known features:
5.2.1 Function annotations: Python allows you to add annotations to function parameters and return
types. These annotations are optional and do not affect the function's behaviour, but they can be used to
document the expected types or provide information to tools such as type checkers.
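For example:

def transfer(amount: float, upi_id: str) -> bool:
    """Annotations document the expected parameter and return types."""
    return amount > 0 and "@" in upi_id

# The annotations remain available for tools and documentation at runtime
print(transfer.__annotations__)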
5.2.2 Coroutines: Python supports coroutines, which are functions that can be paused and resumed.
Coroutines are useful for writing asynchronous code, such as in web servers or networking applications.
5.2.3 Enumerations: Python has a built-in Enum class that allows you to define symbolic names for
values. Enumerations are useful for improving the readability and maintainability of your code.
5.2.4 List comprehensions: Python allows you to create lists in a concise and readable way using list
comprehensions. For example, you can create a list of squares of numbers using [x**2 for x in range(10)].
5.2.5 Extended iterable unpacking: Python allows you to unpack iterables (e.g., lists, tuples, and
dictionaries) into variables using the * and ** operators. This feature makes it easy to work with complex
data structures.
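For example:

first, *middle, last = [10, 20, 30, 40, 50]
print(first, middle, last)        # 10 [20, 30, 40] 50

# ** unpacks a dictionary into keyword arguments
defaults = {"sep": " | ", "end": "\n"}
print("a", "b", **defaults)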
5.2.6 The with statement: Python's with statement allows you to manage resources (such as files or
network connections) cleanly and concisely. The with statement automatically opens and closes the
resource, even if an exception occurs.
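For example:

# The file is closed automatically when the block exits, even on error
with open("transactions.log", "w") as log_file:
    log_file.write("payment checked\n")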
5.2.7 The walrus operator: Python 3.8 introduced the walrus operator (:=), which allows you to assign a
value to a variable as part of an expression. This feature is useful for simplifying code and reducing the
number of lines.
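For example:

values = [4, 9, 16]
# Assign and test in a single expression
if (n := len(values)) > 2:
    print(f"{n} values found")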
5.2.8 The __slots__ attribute: Python allows you to optimize memory usage by using the __slots__ attribute in
classes. This attribute tells Python to allocate memory only for the specified attributes, reducing memory
overhead.
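For example:

class Transaction:
    __slots__ = ("amount", "upi_id")   # only these attributes are allocated

    def __init__(self, amount, upi_id):
        self.amount = amount
        self.upi_id = upi_id

t = Transaction(250.0, "user@upi")
print(t.amount, t.upi_id)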
Python is a popular programming language for multiple reasons, some of which include:
1. Simplicity: Python is easy to learn and read, making it a number one choice for beginners. Its
syntax is straightforward and consistent, allowing developers to write code quickly and efficiently.
2. Versatility: Python is a general-purpose language, which means it can be used for a wide range of
applications, including web development, data analysis, machine learning, and artificial
intelligence.
3. Large community: Python has a large and active community of developers who contribute to its
development and offer support to new users. This community has created a vast array of libraries and frameworks.
4. Open-source: Python is an open-source language, which means its source code is freely available
and can be modified and distributed by anyone. This has led to the creation of many useful libraries and tools.
5. Scalability: Python can be used to build both small and large-scale applications. Its scalability makes it
suitable for projects that grow over time.
These are just a few of the lesser-known features of Python as a language. Python is a powerful and
flexible language that offers many more features and capabilities that can help developers write efficient,
readable, and maintainable code.
5.3 LIBRARIES IN PYTHON
Normally, a library is a collection of books or is a room or place where many books are stored to
be used later. Similarly, in the programming world, a library is a collection of precompiled codes that can
be used later on in a program for some specific well-defined operations. Other than pre-compiled codes, a
library may contain documentation, configuration data, message templates, classes, and values, etc.
A Python library is a collection of related modules. It contains bundles of code that can be used
repeatedly in different programs. It makes Python programming simpler and more convenient for the
programmer, as we don't need to write the same code again and again for different programs. Python
libraries play a very vital role in fields of Machine Learning, Data Science, Data Visualization, etc.
As is stated above, a Python library is simply a collection of codes or modules of codes that we
can use in a program for specific operations. We use libraries so that we don’t need to write the code again
in our program that is already available. But how does it work? Actually, in the MS Windows environment, the
library files have a DLL extension (Dynamic Link Libraries). When we link a library with our program
and run that program, the linker automatically searches for that library. It extracts the functionalities of
that library and interprets the program accordingly. That’s how we use the methods of a library in our
program. We will see further, how we bring in the libraries in our Python programs.
The Python Standard Library contains the exact syntax, semantics, and tokens of Python. It
contains built-in modules that provide access to basic system functionality like I/O and some other core
modules. Most of the Python Libraries are written in the C programming language. The Python standard
library consists of more than 200 core modules. All these work together to make Python a high-level
programming language. Python Standard Library plays a very important role. Without it, the programmers
can’t have access to the functionalities of Python. But other than this, there are several other libraries in
Python that make a programmer’s life easier. Let’s have a look at some of the commonly used libraries:
5.5.1 TensorFlow: This library was developed by Google in collaboration with the Brain Team. It is an
open-source library used for high-level computations. It is also used in machine learning and deep
learning algorithms. It contains a large number of tensor operations. Researchers also use this Python
library for complex mathematical computations.
5.5.2 Matplotlib: This library is responsible for plotting numerical data. And that’s why it is used in data
analysis. It is also an open-source library and plots high-defined figures like pie charts, histograms,
scatter plots, and graphs.
5.5.3 Pandas: Pandas is an important library for data scientists. It is an open-source machine learning
library that provides flexible high-level data structures and a variety of analysis tools. It eases data
analysis, data manipulation, and cleaning of data. Pandas supports operations like sorting, re-indexing,
iteration, concatenation, and conversion of data.
5.5.4 Numpy: The name “Numpy” stands for “Numerical Python”. It is the commonly used library. It is a
popular machine learning library that supports large matrices and multi-dimensional data. It consists of in-
built mathematical functions for easy computations. Even libraries like TensorFlow use Numpy internally
to perform several operations on tensors. Array Interface is one of the key features of this library.
5.5.5 SciPy: The name “SciPy” stands for “Scientific Python”. It is an open-source library used for high-
level scientific computations. This library is built over an extension of Numpy. It works with Numpy to
handle complex computations. While Numpy allows sorting and indexing of array data, the numerical data
code is stored in SciPy. It is also widely used by application developers and engineers.
5.5.6 Scrapy: It is an open-source library that is used for extracting data from websites. It provides very
fast web crawling and high-level screen scraping. It can also be used for data mining and automated
testing of data.
5.5.7 Scikit-learn: It is a famous Python library to work with complex data. Scikit-learn is an open-source
library that supports machine learning. It supports various supervised and unsupervised algorithms like
linear regression, classification, clustering, etc. This library works in association with Numpy and SciPy.
5.5.8 PyGame: This library provides an easy interface to the Simple DirectMedia Layer (SDL), a set of
platform-independent graphics, audio, and input libraries. It is used for developing video games using
computer graphics and audio libraries along with Python programming language.
5.5.9 PyTorch: PyTorch is one of the largest machine learning libraries and optimizes tensor computations. It has
rich APIs to perform tensor computations with strong GPU acceleration. It also helps to solve application
issues related to neural networks.
5.5.10 PyBrain: The name “PyBrain” stands for Python Based Reinforcement Learning, Artificial
Intelligence, and Neural Networks library. It is an open-source library built for beginners in the field of
Machine Learning. It provides fast and easy-to-use algorithms for machine learning tasks. It is flexible
and easily understandable, which is why it is really helpful for developers who are new to research fields.
There are many more libraries in Python. We can use a suitable library for our purposes.
As we write large-size programs in Python, we want to maintain the code’s modularity. For the
easy maintenance of the code, we split the code into different parts and we can use that code later whenever we
need it. In Python, modules play that part. Instead of using the same code in different programs and
making the code complex, we define mostly used functions in modules and we can just simply import
them in a program wherever there is a requirement. We don’t need to write that code but still, we can use
its functionality by importing its module. Multiple interrelated modules are stored in a library. And
whenever we need to use a module, we import it from its library. In Python, it's a very simple job to do so.
Guido van Rossum began working on Python in the late 1980s as a successor to the ABC
programming language and first released it in 1991 as Python 0.9.0. Python 2.0 was released in 2000.
Python 3.0, released in 2008, was a major revision not completely backward-compatible with earlier
versions. Python 2.7.18, released in 2020, was the last release of Python 2. Python consistently ranks as
one of the most popular programming languages.
Python was conceived in the late 1980s by Guido van Rossum at Centrum Wiskunde & Informatica (CWI)
in the Netherlands as a successor to the ABC programming language,
which was inspired by SETL, capable of exception handling and interfacing with the Amoeba operating
system. Its implementation began in December 1989. Van Rossum shouldered sole responsibility for the
project, as the lead developer, until 12 July 2018, when he announced his "permanent vacation" from his
responsibilities as Python's "benevolent dictator for life", a title the Python community bestowed upon him
to reflect his long-term commitment as the project's chief decision-maker. In January 2019, active Python
core developers elected a five-member Steering Council to lead the project.
Python 2.0 was released on 16 October 2000, with many major new features such as list
comprehensions, cycle-detecting garbage collection, and Unicode support.
Python 3.0 was released on 3 December 2008, with many of its major features back-ported to
Python 2.6.x and 2.7.x. Releases of Python 3 include the 2to3 utility, which automates the translation of
Python 2 code to Python 3.
Python 2.7's end-of-life was initially set for 2015, then postponed to 2020 out of concern that a
large body of existing code could not easily be forward-ported to Python 3. No further security patches or
other improvements will be released for it. Currently only 3.7 and later are supported. In 2021,
Python 3.9.2 and 3.8.8 were expedited as all versions of Python (including 2.7) had security issues leading
to possible remote code execution and web-cache poisoning.
In 2022, Python 3.10.4 and 3.9.12 were expedited, along with 3.8.13 and 3.7.13, because of many
security issues. When Python 3.9.13 was released in May 2022, it was announced that the 3.9 series
(joining the older series 3.8 and 3.7) would only receive security fixes in the future. On September 7,
2022, four new releases were made due to a potential denial-of-service attack: 3.10.7, 3.9.14, 3.8.14, and
3.7.14. As of November 2022, Python 3.11.0 is the current stable release. Notable changes from 3.10
include increased program execution speed and improved error reporting.
Python is a multi-paradigm programming language. Object-oriented programming and structured
programming are fully supported, and many of its features support functional programming and aspect-
oriented programming (including metaprogramming and metaobjects). Many other paradigms are
supported via extensions, including design by contract and logic programming.
Python uses dynamic typing and a combination of reference counting and a cycle-detecting
garbage collector for memory management. It uses dynamic name resolution (late binding), which binds
method and variable names during program execution.
Its design offers some support for functional programming in the Lisp tradition. The standard library has
two modules (itertools and functools) that implement functional tools borrowed from Haskell and Standard ML.
Its core philosophy is summarized in the document The Zen of Python (PEP 20), which includes aphorisms such as:
Beautiful is better than ugly.
Readability counts.
Rather than building all of its functionality into its core, Python was designed to be
highly extensible via modules. This compact modularity has made it particularly popular as a means of
adding programmable interfaces to existing applications. Van Rossum's vision of a small core language
with a large standard library and easily extensible interpreter stemmed from his frustrations with ABC,
which espoused the opposite approach.
Python strives for a simpler, less-cluttered syntax and grammar while giving developers a choice
in their coding methodology. In contrast to Perl's "there is more than one way to do it" motto, Python
embraces a "there should be one—and preferably only one—obvious way to do it" philosophy. Alex
Martelli, a Fellow at the Python Software Foundation and Python book author, wrote: "To describe
something as 'clever' is not considered a compliment in the Python culture."
Python's developers strive to avoid premature optimization and reject patches to non-critical
parts of the CPython reference implementation that would offer marginal increases in speed at the cost of
clarity. When speed is important, a Python programmer can move time-critical functions to extension
modules written in languages such as C; or use PyPy, a just-in-time compiler. Cython is also available,
which translates a Python script into C and makes direct C-level API calls into the Python interpreter.
Python's developers aim for it to be fun to use. This is reflected in its name—a tribute to the
British comedy group Monty Python, and in occasionally playful approaches to tutorials and reference
materials, such as the use of the terms "spam" and "eggs" (a reference to a Monty Python sketch) in examples.
A common neologism in the Python community is pythonic, which has a wide range of meanings
related to program style. "Pythonic" code may use Python idioms well, be natural or show fluency in the
language, or conform with Python's minimalist philosophy and emphasis on readability. Code that is
difficult to understand or reads like a rough transcription from another programming language is
called unpythonic.
Python is meant to be an easily readable language. Its formatting is visually uncluttered and often
uses English keywords where other languages use punctuation. Unlike many other languages, it does not
use curly brackets to delimit blocks, and semicolons after statements are allowed but rarely used. It has
fewer syntactic exceptions and special cases than C or Pascal.
Python uses whitespace indentation, rather than curly brackets or keywords, to delimit blocks.
An increase in indentation comes after certain statements; a decrease in indentation signifies the end of the
current block. Thus, the program's visual structure accurately represents its semantic structure. This
feature is sometimes termed the off-side rule. Some other languages use indentation this way; but in most,
indentation has no semantic meaning. The recommended indent size is four spaces.
The if statement, which conditionally executes a block of code, along with else and elif (a contraction of
else-if)
The for statement, which iterates over an iterable object, capturing each element to a local variable for
use by the attached block
The while statement, which executes a block of code as long as its condition is true
The try statement, which allows exceptions raised in its attached code block to be caught and handled
by except clauses (or the new syntax except* in Python 3.11 for exception groups); it also ensures that
clean-up code in a finally block is always run regardless of how the block exits
The raise statement, used to raise a specified exception or re-raise a caught exception
The class statement, which executes a block of code and attaches its local namespace to a class, for use
in object-oriented programming
The with statement, which encloses a code block within a context manager (for example, acquiring
a lock before it is run, then releasing the lock; or opening and closing a file), allowing resource-
acquisition-is-initialization (RAII)-like behavior
The continue statement, which skips the rest of the current iteration and continues with the next iteration
The del statement, which removes a variable, deleting the reference from the name to the value, and
producing an error if the variable is referred to before it is redefined
The pass statement, serving as a NOP, syntactically needed to create an empty code block
The assert statement, used in debugging to check for conditions that should apply
The yield statement, which returns a value from a generator function (and also an operator); used to
implement coroutines
The import and from statements, used to import modules whose functions or variables can be used in the
current program
An assignment statement binds a name as a reference to a separate, dynamically allocated object.
Variables may subsequently be rebound at any time to any object. In Python, a variable
name is a generic reference holder without a fixed data type; however, it always refers to some object with
a type. This is called dynamic typing, in contrast to statically typed languages, where each variable may
contain only a value of a certain type.
Python does not support tail call optimization or first-class continuations, and, according to Van
Rossum, it never will. However, better support for coroutine-like functionality is provided by extending
Python's generators. Before 2.5, generators were lazy iterators; data was passed unidirectionally out of the
generator. From Python 2.5 on, it is possible to pass data back into a generator function; and from version
3.3, it can be passed through multiple stack levels.
5.9 DEVELOPMENT
Python's development is conducted largely through the Python Enhancement Proposal (PEP)
process, the primary mechanism for proposing major new features, collecting community input on issues,
and documenting Python design decisions. Python coding style is covered in PEP 8. Outstanding PEPs are
reviewed and commented on by the Python community and the steering council.
Enhancement of the language corresponds with the development of the CPython reference
implementation. The mailing list python-dev is the primary forum for the language's development.
Specific issues were originally discussed in the Roundup bug tracker hosted by the foundation. In 2022,
all issues and discussions were migrated to GitHub. Development originally took place on a self-
hosted source-code repository running Mercurial, until Python moved to GitHub in January 2017.
CPython's public releases come in three types, distinguished by which part of the version number is
incremented:
Backward-incompatible versions, where code is expected to break and needs to be manually ported. The
first part of the version number is incremented. These releases happen infrequently—version 3.0 was
released 8 years after 2.0. According to Guido van Rossum, a version 4.0 is very unlikely to ever happen.
Major or "feature" releases are largely compatible with the previous version but introduce new features.
The second part of the version number is incremented. Starting with Python 3.9, these releases are
expected to happen annually. Each major version is supported by bug fixes for several years after its
release.
Bugfix releases, which introduce no new features, occur about every 3 months and are made when a
sufficient number of bugs have been fixed upstream since the last release. Security vulnerabilities are also
patched in these releases. The third and final part of the version number is incremented.
Many alpha, beta, and release-candidates are also released as previews and for testing before final
releases. Although there is a rough schedule for each release, they are often delayed if the code is not
ready. Python's development team monitors the state of the code by running the large unit test suite during
development.
The major academic conference on Python is PyCon. There are also special Python mentoring programs,
such as Pyladies.
Python 3.10 deprecated wstr (to be removed in Python 3.12, meaning Python extensions need to be
modified by then).
Tools that can generate documentation for Python API include pydoc (available as part of the
standard library), Sphinx, Pdoc and its forks, Doxygen and Graphviz, among others.
5.9.2 Naming
Python's name is derived from the British comedy group Monty Python, whom Python creator
Guido van Rossum enjoyed while developing the language. Monty Python references appear frequently in
Python code and culture; for example, the metasyntactic variables often used in Python literature
are spam and eggs instead of the traditional foo and bar. The official Python documentation also contains
various references to Monty Python routines. The prefix Py- is used to show that something is related to
Python. Examples of the use of this prefix in names of Python applications or libraries include Pygame,
a binding of SDL to Python (commonly used to create games); and PyQt and PyGTK, which bind Qt and GTK
to Python respectively.
5.9.3 Popularity
Since 2003, Python has consistently ranked in the top ten most popular programming
languages in the TIOBE Programming Community Index, where as of December 2022 it was the most
popular language (ahead of C, C++, and Java). It was selected Programming Language of the Year (for
"the highest rise in ratings in a year") in 2007, 2010, 2018, and 2020 (the only language to have done so
four times as of 2020). An empirical study found that scripting languages, such as Python, are more
productive than conventional languages, such as C and Java, for programming problems involving string
manipulation and search in a dictionary.
5.10 OPEN CV
OpenCV (Open Source Computer Vision Library) is an open source computer vision and
machine learning software library. OpenCV was built to provide a common infrastructure for computer
vision applications and to accelerate the use of machine perception in the commercial products. Being an
Apache 2 licensed product, OpenCV makes it easy for businesses to utilize and modify the code.
The library has more than 2500 optimized algorithms, which includes a comprehensive set of
both classic and state-of-the-art computer vision and machine learning algorithms. These algorithms can
be used to detect and recognize faces, identify objects, classify human actions in videos, track camera
movements, track moving objects, extract 3D models of objects, produce 3D point clouds from stereo
cameras, stitch images together to produce a high resolution image of an entire scene, find similar images
from an image database, remove red eyes from images taken using flash, follow eye movements,
recognize scenery and establish markers to overlay it with augmented reality, etc. OpenCV has more than
47 thousand people in its user community and an estimated number of downloads exceeding 18 million. The
library is used extensively by companies, research groups, and governmental bodies.
Along with well-established companies like Google, Yahoo, Microsoft, Intel, IBM, Sony,
Honda, Toyota that employ the library, there are many startups such as Applied Minds, VideoSurf, and
Zeitera, that make extensive use of OpenCV. OpenCV’s deployed uses span the range from stitching
streetview images together, detecting intrusions in surveillance video in Israel, monitoring mine
equipment in China, helping robots navigate and pick up objects at Willow Garage, detection of
swimming pool drowning accidents in Europe, running interactive art in Spain and New York, checking
runways for debris in Turkey, inspecting labels on products in factories around the world on to rapid face
detection in Japan.
It has C++, Python, Java and MATLAB interfaces and supports Windows, Linux, Android and
Mac OS. OpenCV leans mostly towards real-time vision applications and takes advantage of MMX and
SSE instructions when available. Full-featured CUDA and OpenCL interfaces are being actively
developed right now. There are over 500 algorithms and about 10 times as many functions that compose
or support those algorithms. OpenCV is written natively in C++ and has a templated interface that works
seamlessly with STL containers.
5.11.1 Core Functionality
This module covers the basic data structures such as Scalar, Point, Range, etc., that are used to
build OpenCV applications. In addition to these, it also includes the multidimensional array Mat, which is
used to store the images. In the Java library of OpenCV, this module is included as a package with the
name org.opencv.core.
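A minimal Python usage sketch (the image path is reused from this project's assets only as a placeholder):

import cv2
import numpy as np

# Read an image into a NumPy array (the Python counterpart of Mat)
img = cv2.imread("Images/upi-icon.png")
if img is not None:
    print("Image shape:", img.shape)

# Core types such as points and scalars map to plain tuples in Python
blank = np.zeros((100, 100, 3), dtype=np.uint8)
cv2.circle(blank, (50, 50), 20, (255, 255, 255), 2)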
5.11.2 Image Processing
This module covers various image processing operations such as image filtering, geometrical
image transformations, color space conversion, histograms, etc. In the Java library of OpenCV, this
module is included as a package with the name org.opencv.imgproc.
5.11.3 Video
This module covers the video analysis concepts such as motion estimation, background
subtraction, and object tracking. In the Java library of OpenCV, this module is included as a package with
the name org.opencv.video.
5.11.4 Video I/O
This module explains the video capturing and video codecs using the OpenCV library. In the Java
library of OpenCV, this module is included as a package with the name org.opencv.videoio.
5.11.5 calib3d
This module includes algorithms regarding basic multiple-view geometry algorithms, single and
stereo camera calibration, object pose estimation, stereo correspondence and elements of 3D
reconstruction. In the Java library of OpenCV, this module is included as a package with the name
org.opencv.calib3d.
5.11.6 features2d
This module includes the concepts of feature detection and description. In the Java library of OpenCV,
this module is included as a package with the name org.opencv.features2d.
5.11.7 Objdetect
This module includes the detection of objects and instances of the predefined classes such as
faces, eyes, mugs, people, cars, etc. In the Java library of OpenCV, this module is included as a package
with the name org.opencv.objdetect.
5.11.8 Highgui
This is an easy-to-use interface with simple UI capabilities. In the Java library of OpenCV, the
features of this module are included in two different packages, namely org.opencv.imgcodecs and
org.opencv.videoio.
5.12 PILLOW
Digital Image processing means processing the image digitally with the help of a computer.
Using image processing we can perform operations like enhancing the image, blurring the image,
extracting text from images, and many more operations. There are various ways to process images in Python.
Python Pillow is built on top of PIL (Python Image Library) and is considered a fork of PIL, which is no
longer actively maintained.
Pillow supports many image file formats including BMP, PNG, JPEG, and TIFF. The library
encourages adding support for newer formats in the library by creating new file decoders.
The Pillow module provides the open() and show() function to read and display the image
respectively. For displaying the image Pillow first converts the image to a .png format (on Windows OS)
and stores it in a temporary buffer and then displays it. Therefore, due to the conversion of the image
format to .png, some properties of the original image file format might be lost (like animation). Therefore,
the show() method is best used only for quick inspection and debugging.
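For example (the file name is a placeholder):

from PIL import Image

# open() reads the image; show() saves a temporary copy and displays it
img = Image.open("Images/upi.png")
print(img.format, img.size, img.mode)
img.show()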
5.13 TENSORFLOW
TensorFlow is an end-to-end open source platform for machine learning. It has a comprehensive,
flexible ecosystem of tools, libraries, and community resources that lets researchers push the state-of-the-art in
ML, and gives developers the ability to easily build and deploy ML-powered applications.
TensorFlow provides a collection of workflows with intuitive, high-level APIs for both
beginners and experts to create machine learning models in numerous languages. Developers have the option to
deploy models on a number of platforms such as on servers, in the cloud, on mobile and edge devices, in
browsers, and on many other JavaScript platforms. This enables developers to go from model building and
training to deployment with ease.
Build and train ML models easily using intuitive high-level APIs like Keras with eager execution, which
makes for immediate model iteration and easy debugging.
Easily train and deploy models in the cloud, on-prem, in the browser, or on-device no matter what
language you use.
A simple and flexible architecture to take new ideas from concept to code, to state-of-the-art models,
and to publication faster.
TensorFlow was originally developed by researchers and engineers working on the Google Brain
Team within Google’s Machine Intelligence research organization for the purposes of conducting
machine learning and deep neural networks research, but the system is general enough to be applicable
in a wide variety of other domains as well. Let us first try to understand what the
word TensorFlow actually means. TensorFlow is basically a software library for numerical
computation using data flow graphs, where nodes in the graph represent mathematical operations and
edges in the graph represent the multidimensional data arrays (called tensors) communicated between
them. (Please note that a tensor is the central unit of data in TensorFlow.)
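A small illustrative sketch of tensors and a Keras model is shown below; the layer sizes and input shape are arbitrary, not the project's model.

import tensorflow as tf

# Tensors are multidimensional arrays flowing through the computation graph
a = tf.constant([[1.0, 2.0], [3.0, 4.0]])
print(tf.reduce_sum(a))

# A tiny Keras binary classifier over 5 input features (assumed size)
model = tf.keras.Sequential([
    tf.keras.layers.Dense(8, activation="relu", input_shape=(5,)),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
model.summary()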
5.14 TORCH
Deep Learning is a branch of Machine Learning where algorithms are written which mimic the
functioning of a human brain. The most commonly used libraries in deep learning are Tensorflow and
PyTorch. As there are various deep learning frameworks available, one might wonder when to use PyTorch.
Here are some reasons why one might prefer using PyTorch for specific tasks.
PyTorch is an open-source deep learning framework available with a Python and C++ interface.
PyTorch resides inside the torch module. In PyTorch, the data that has to be processed is input in the form of a
tensor.
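For example:

import torch

# Data is represented as tensors; operations can run on CPU or GPU
x = torch.tensor([[120.0], [15000.0], [60.0]])
weights = torch.randn(1, 1, requires_grad=True)

# A single linear transformation with automatic differentiation
y = x @ weights
y.sum().backward()
print(weights.grad)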
CHAPTER 6
SYSTEM TESTING
Software testing is a critical element of software quality assurance and represents the ultimate review of
software specification, design and coding. The increasing visibility of software as a system element and the
attendant "costs" associated with a software failure are motivating forces for well-planned, thorough testing. It
is not unusual for a software development organization to expend 40 percent of total project effort on testing.
Hence the importance of software testing and its implications with respect to software quality cannot be
overemphasized. Different types of testing have been carried out for this system, and they are described below.
Testing is the process of trying to discover every conceivable fault or weakness in a work product.
Unit testing involves the design of test cases that validate that the internal program logic is functioning
properly, and that program inputs produce valid outputs. All decision branches and internal code flow should be
validated. It is the testing of individual software units of the application. It is done after the completion of an
individual unit before integration.
This is a structural testing, that relies on knowledge of its construction and is invasive. Unit tests perform basic
tests at component level and test a specific business process, application, and/or system configuration. Unit tests
ensure that each unique path of a business process performs accurately to the documented specifications and
contains clearly defined inputs and expected results.
Integration tests are designed to test integrated software components to determine if they actually run as
one program. Testing is event driven and is more concerned with the basic outcome of screens or fields.
Integration tests demonstrate that although the components were individually satisfactory, as shown by
successful unit testing, the combination of components is correct and consistent. Integration testing is
specifically aimed at exposing the problems that arise from the combination of components.
Tests were performed to find conformity with the requirements. Plans and procedures were designed to
ensure that all functional requirements are satisfied. The software was alpha-tested. There are two goals in
preparing test plans. Firstly, a properly detailed test plan demonstrates that the program specifications are
understood completely. Secondly, the test plan is used during program testing to prove the correctness of the
program.
CHAPTER 7
WORKING
import os
import tkinter as tk
from tkinter import *
from tkinter import messagebox, filedialog, PhotoImage
from PIL import Image, ImageTk
import pandas as pd
import cv2  # imported by the project, though unused in this excerpt

# Main application window
root = Tk()
root.title("UPI Data")
root.geometry("650x650")
root.config(bg="#827b7a")
root.resizable(False, False)

# Background image, icon and title banner
background_image = PhotoImage(file="Images/upi-icon.png")
background_label = Label(root, image=background_image)
background_label.place(x=1, y=1)

icon = PhotoImage(file='Images/upi.png')
icon_photo = tk.Label(root, image=icon)
icon_photo.place(x=45, y=44)

label = tk.Label(root, text="Fraud Detection for UPI Payments",
                 font=("Times New Roman", 23, "bold"), fg="black")
label.place(x=150, y=72)


def data_enter(e):
    # Clear the placeholder text when the entry box gains focus
    data_lbl.delete(0, 'end')


def data_leave(e):
    # Restore the placeholder text if the entry box is left empty
    p = data_lbl.get()
    if p == '':
        data_lbl.insert(0, "Enter UPI Phone Number")


def clear_lbl():
    # Hide the previous result by blending it into the background colour
    result_lbl.config(fg='#827b7a')


def upi_data():
    # Look up the entered UPI number in the dataset and report its fraud risk
    data = 0
    upi_no = int(data_lbl.get())
    df = pd.read_csv('upi_fraud_dataset.csv')
    for u in range(0, len(df['upi_number'])):
        if df['upi_number'][u] == upi_no:
            data = int(df['fraud_risk'][u])
    if data:
        print('Detected: Fraud Transaction')
        data_lbl.delete(0, 'end')
        result_lbl.config(text='Detected: Fraud Transaction', fg='black')
        result_lbl.place(x=150, y=490)
    else:
        print('Detected: Valid Transaction')
        data_lbl.delete(0, 'end')
        result_lbl.config(text='Detected: Valid Transaction', fg='black')
        result_lbl.place(x=150, y=490)


# Entry box, search button and result label (reconstructed here; the original
# listing referenced these widgets but did not include their definitions)
data_lbl = Entry(root, font=("Times New Roman", 16), width=28)
data_lbl.insert(0, "Enter UPI Phone Number")
data_lbl.bind('<FocusIn>', data_enter)
data_lbl.bind('<FocusOut>', data_leave)
data_lbl.place(x=170, y=300)

search_btn = Button(root, text="Verify", font=("Times New Roman", 14, "bold"),
                    command=upi_data)
search_btn.place(x=290, y=360)

result_lbl = Label(root, font=("Times New Roman", 18, "bold"), bg="#827b7a")

root.mainloop()
OUTPUT:
Figure 7.1.1
Figure 7.1.2
CHAPTER 8
CONCLUSION
In conclusion, the implementation of a fraud detection system for UPI payments using machine learning holds
significant promise in bolstering the security and integrity of the UPI ecosystem. Through the utilization of
advanced algorithms and techniques, such as feature engineering, model training, and real-time monitoring, our
project aims to effectively identify and prevent fraudulent activities, thereby mitigating financial risks for users.
By leveraging historical transaction data and continuously adapting to evolving fraud patterns, our system
demonstrates robustness and adaptability in detecting a wide range of fraudulent behaviors, including
unauthorized transactions, account takeovers, and social engineering attacks. The integration of feedback
mechanisms enables iterative improvement of the detection models, ensuring optimal performance over time.
Furthermore, the seamless integration of the fraud detection module with existing UPI payment platforms
facilitates frictionless user experiences while maintaining stringent security standards. By swiftly flagging
suspicious transactions for further review or intervention, our system minimizes the impact of fraudulent
activity on users.
As the landscape of digital payments continues to evolve, the need for robust fraud detection mechanisms
becomes increasingly paramount. Through our project, we contribute to the ongoing efforts to combat financial
fraud and promote a safer, more secure environment for digital transactions, ultimately fostering confidence and
trust in the UPI ecosystem.
8. FUTURE ENHANCEMENT:
Real-time Integration: Deploy the model in a way that allows real-time analysis of
transactions, enabling immediate intervention during suspicious activity.
Scalable Infrastructure: Ensure the system can handle the ever-growing volume of UPI
transactions efficiently. Cloud-based solutions can be a good option here.
Privacy Preservation: Develop methods to anonymize sensitive user data while still allowing
for effective fraud detection. This ensures user privacy is protected.
Regulatory Compliance: Stay updated on evolving regulations around data privacy and
financial security to ensure your system adheres to best practices
CHAPTER 9
REFERENCES
1. A Survey of Credit Card Fraud Detection Techniques: Data and Technique Oriented Perspective -
Samaneh Sorournejad, et al.
5. Solving the False Positives Problem in Fraud Prediction Using Automated Feature Engineering -
Wedge, et al.