Final Year Project

This document provides an overview of machine learning techniques for heart disease prediction and summarizes several related studies. 1. Several studies have applied machine learning algorithms such as naive Bayes, decision trees, random forest, logistic regression and SVM to heart disease prediction datasets. These studies found logistic regression and SVM performed best in terms of accuracy. 2. Other studies have specifically compared decision trees, neural networks, naive Bayes, KNN and SVM on heart disease datasets. These found SVM and decision trees had the highest accuracy scores, and SVM had the best sensitivity and specificity rates. 3. Additional studies introduced heart disease prediction functions using logistic regression, neural networks and random forest models in R language. They performed comparative analysis to

Uploaded by

Muhammad Faseeh

CHAPTER 1

Introduction
1.1 Introduction
Patients often need several laboratory tests before they are diagnosed correctly, which costs them additional physical effort, time, and money. Common causes of heart problems include unhealthy food, tobacco consumption, high sugar, and weight gain. Common symptoms include pain in the chest and arms, abnormal breathing, and abnormal blood pressure. These symptoms can occur independently of each other. A dataset that records these symptoms can be very helpful, and a proper study of such a dataset can improve the diagnostic process and support heart consultants.
Machine learning, a branch of Artificial Intelligence (AI), helps us make predictions by performing calculations with algorithms. It is a technique that gives computers the ability to learn from input data without being explicitly programmed for the task. An algorithm helps the machine build patterns from that data. When we provide new input, the computer makes a prediction by recognizing the previously built patterns. Machine learning can be categorized into two types: supervised machine learning and unsupervised machine learning.
In supervised learning, we give the algorithm a dataset in which the actual output for each input is provided as a label. In unsupervised learning, on the other hand, we are given a dataset and are not told what to do with it or what each data point means. Unsupervised learning is learning from data without knowing the correct answers [Andrew NG Stanford University].
As discussed earlier, there are many machine learning algorithms for solving problems. They behave differently on a given dataset, so they produce different results. Comparative analysis helps us evaluate different algorithms such as SVM, Random Forest, Decision Tree, K-means clustering, and KNN.

1.2 Problem statement:


The proper diagnosis of a heart problem is a difficult task that requires various laboratory tests and the use of medical equipment. These tests put a burden on patients and their families in the form of time and money. In medical science time is everything: diagnosing a disease in time can save a patient’s life. This study attempts to support this procedure with the help of technology.
The study aims to handle scenarios where doctors are unsure about a patient’s condition and overlook some evidence, and to support doctors in understanding the patient’s condition with the help of a computerized system.
1.3 Motivation:
The motivation is to find the best ML algorithm for disease diagnosis (prediction). My motivation is to help not only doctors and medical staff but also a person at home who has doubts about his or her condition. The findings are applied to help people with the diagnosis of heart disease while saving their money and time.
1.4 Scope:
The proposed system will help us find the most suitable machine learning algorithm for prediction. At the same time, its implementation will allow us to predict heart disease. The system will work as a decision support system for health departments and institutions. A person will be able to find out whether he/she could have the disease or not.
1.5 Purpose:
The purpose of this project is to serve heart consultants as a support system, from the perspective of both education and research and of society. It also aims to identify measures that would help the system work efficiently.
1.6 Goal:
The main goal of this project is a comparative study of machine learning algorithms, and building a system that helps consultants predict heart failure in patients who come to the hospital for a checkup. The project aims to serve people with advanced tools and techniques that are reliable and efficient.

Chapter 2
Literature Review and Related Work
This chapter reviews related work that has already been done. It emphasizes the techniques, workings, and achievements of that work.

[1] compares five models with a predictive technique to find the chances of heart failure. The dataset used in this research was collected from different locations and is available on kaggle.com. It contains 14 attributes, which are used together for the analysis of heart failure. The RapidMiner tool is used for this research and study. Five ML algorithms and their trained models are included: Naïve Bayes, Decision Tree, Random Forest, Logistic Regression, and SVM. In the first phase of operation, the training data is used to train the ML models. For this purpose, the dataset is retrieved with the Read_CSV method. Additionally, [1] copied the dataset five times to connect it to the ML models. To avoid under- and over-fitting, the K-Fold technique is used with a 1/10 ratio.
In this study, numerous preprocessing steps are applied to the dataset. [1] found the dataset size unsatisfactory for implementation. To overcome this, random values were derived for every column from its minimum and maximum values to increase the size of the dataset. This practice increased the volume of the data threefold, to 1013 records, and the performance of the classifiers improved.
A RapidMiner data cleansing method is applied to find missing values and noisy data in the dataset. Some null values were imputed with the K-Nearest-Neighbor method. An outlier detection technique based on distance calculation is used to detect noise in the data, and no outliers were found in the dataset.

[2] examines the performance of a number of classification functions in data mining for forecasting heart failure. The Logistic, Multilayer Perceptron (MLP), and Sequential Minimal Optimization (SMO) functions are applied in this study. The dataset used in [2] was collected from the Cleveland cardiovascular disease clinic and is available at the UCI ML repository. It contains thirteen columns and 303 records. The analysis is performed with the WEKA data mining tool.
True Positive rate, F-measure, Receiver Operating Characteristic, and Kappa statistic are used as accuracy measures.
The logistic classification function performed well and produced greater accuracy with the lowest errors compared to SMO and MLP.

[3] aims to draw a comparison between different algorithms used for heart disease prediction. Four classifiers are compared and observed with WEKA version 3.6. The dataset is taken from the UCI ML repository. It has 13 attribute columns, 1 label column, and 303 record rows. An algorithm is applied to the dataset for pre-processing. After selecting features, the authors deleted records with missing values, which left 296 records for further processing.
The classification models compared are Decision Tree, Artificial Neural Networks, RIPPER, and Support Vector Machine. Furthermore, the authors compared their work with the work of others.
A system built on three common data mining techniques (Decision Trees, Neural Network, and Naïve Bayes) is also discussed. Its dataset has 909 records and 15 attributes and is split into two datasets of nearly equal size: the training dataset had 455 records and the testing dataset had 454 records.
The paper also points out different classifiers that show good results, using classification techniques such as K-Nearest Neighbor, Neural Networks, Naïve Bayes, Decision Trees, Support Vector Machine, Multi-Layer Perceptron, and Artificial Neural Networks. However, NB, DT, and SVM have more accurate results than the other methods.
[4] In this study, researchers from SASTRA University applied the SVM, KNN, and Decision Tree algorithms to find the most beneficial algorithm for heart disease prediction. [4] proposed a study of the accuracy measures sensitivity and specificity for these methods. The R programming language and the predefined packages caret, rpart, and e1071 are used.
Preprocessing is not part of this work. The dataset is taken from the UCI machine learning repository. The data has 12 attributes and 270 records. The algorithms are applied and the accuracy is calculated. A bar graph is drawn from the accuracy and two more measures, sensitivity and specificity. A confusion matrix is used to calculate the accuracy, sensitivity, and specificity. KNN’s accuracy is nearly 72%, DT’s is around 89%, and SVM’s is 91%. The observed sensitivity rates are 83% for KNN, 83% for DT, and around 100% for SVM. The specificity rates are 100% for the Decision Tree, 83% for the Support Vector Machine, and 60% for K-Nearest Neighbor.
Compared to KNN and the Decision Tree, SVM did best in terms of accuracy and sensitivity; on the figures reported above, the Decision Tree had the highest specificity. These measures are derived from the confusion matrix. So it can be said that SVM would be a good option for heart disease prediction.
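The three measures discussed for [4] follow directly from the four cells of a confusion matrix. The sketch below shows that arithmetic only; the function name and the counts are hypothetical and are not taken from [4].

```python
def confusion_metrics(tp, fp, tn, fn):
    """Return (accuracy, sensitivity, specificity) from confusion-matrix counts."""
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total       # share of all predictions that are correct
    sensitivity = tp / (tp + fn)       # true positive rate
    specificity = tn / (tn + fp)       # true negative rate
    return accuracy, sensitivity, specificity


# Hypothetical counts, chosen only to illustrate the calculation.
acc, sens, spec = confusion_metrics(tp=40, fp=5, tn=45, fn=10)
print(acc, sens, spec)  # 0.85 0.8 0.9
```

A classifier can score well on one measure and poorly on another, which is why the study reports all three.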

[5] compares different machine learning algorithms to find the best-performing algorithm for heart disease prediction with the R language. A Heart Disease Prediction function is introduced that forecasts the probability. For building the predictive model, Logistic Regression, Neural Network, and Random Forest models are used, and their performance is analyzed through comparative analysis of important evaluation parameters. The main concern of this study is which method yields the highest accuracy. The dataset is taken from the UCI ML repository.
Of 76 columns only 13 are selected, with 303 records. The performance of the algorithms is evaluated by sensitivity rate, specificity rate, accuracy rate, and execution time. These measures are derived with the hmeasure package. A ROC curve is used to check the dependencies of the proposed model.
From the outcomes it can be inferred that the Neural Network was the best classifier for heart disease prediction. Its accuracy is greater than that of Logistic Regression and Random Forest. Random Forest is second and Logistic Regression is last in accuracy. The Neural Network also took the maximum execution time, while Logistic Regression took the minimum.

Chapter 3
Planning and Methodology
This chapter describes how the project will be completed, including project scheduling, deliverables, deliverable dates, and the methods for accomplishing the tasks.

3.1 Project Deliverables

Documentation up to first 5 chapters

The list of project deliverables is:

• Project Proposal
• Introduction
• Literature Review
• Planning and Methodology Documents
• Requirement Specification Documents
• System Design Documents
• Test Case Documents
• Bug Report
• Final Report
• Poster
• Final Submission

3.2 Process Gantt chart

Dates: 14-Nov, 24-Nov, 8-Dec, 15-Dec, 25-Dec, 24-Jan, 7-Feb, 21-Feb, 20-Mar, 17-Apr, 8-May, 11-May, 13-May, 14-May
Work: Project Proposal; Introduction; Literature Review; Planning Methodology; Requirements Specification; System Design; Revision; Coding Implementation (four rows); Test Case Development; Testing and Debugging; Final Report; Revise Report; Poster Preparation; Final Submission

Figure 3.1 Process Gantt Chart

3.3 Process Model
The V model is selected as the process model for developing the heart disease prediction system.

This model is selected because it incorporates quality assurance during development to ensure the quality of the system. It divides the fundamental process activities of requirements modeling, architectural design, component design, and code generation and represents them as separate process phases. The V model is a variant of the waterfall model.

This process model is adopted because it produces artifacts that are tested by following appropriate test procedures. The V model includes the phases shown in Figure 3.2.

Figure 3.2 Process Model

3.4 Methodology for comparative analysis

The dataset used for training and testing was collected from the Hungarian Institute of Cardiology, University of Zurich, University of Basel, and Cleveland Clinic medical center. It is available on the UCI Machine Learning Repository.

The dataset contains 13 attributes and 1 result column.

It has been used in more than 50 research projects. The data is already in processed form and does not need more processing.
For comparison, four machine learning algorithms are selected: Logistic Regression, KNN, NB, and SVM.

Logistic Regression:
Logistic regression is a supervised machine learning algorithm used for classification. It uses the sigmoid function to map a weighted sum of the inputs to a probability. LR is well suited to binary classification, where the output can be 0 or 1.
Hypothesis: $z = \sum_{i=0}^{n} x_i \theta_i$

Sigmoid: $\sigma(z) = \dfrac{1}{1 + e^{-z}}$

Figure 3.3 Sigmoid Function


For prediction we can set a threshold value, and the obtained probability is classified as 0 or 1 based on this threshold.
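The hypothesis, sigmoid, and threshold steps can be sketched in a few lines. This is a minimal illustration, not the project’s implementation: the weights in `theta` are hypothetical, and `x[0]` is assumed to be 1 so that `theta[0]` acts as the intercept.

```python
import math


def sigmoid(z):
    """Map any real number to the interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def predict(x, theta, threshold=0.5):
    """Return (probability, class label) for one input vector."""
    # Hypothesis: weighted sum of the inputs; x[0] is assumed to be 1.
    z = sum(xi * ti for xi, ti in zip(x, theta))
    p = sigmoid(z)
    # Classify by comparing the probability with the threshold.
    return p, (1 if p >= threshold else 0)


theta = [-1.0, 2.0]                        # hypothetical weights, not learned
p_high, label_high = predict([1.0, 0.5], theta)   # z = 0, so p = 0.5
p_low, label_low = predict([1.0, 0.0], theta)     # z = -1, so p < 0.5
```

Raising the threshold makes the classifier more conservative about predicting class 1; lowering it does the opposite.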

KNN:
KNN is a supervised machine learning classification algorithm that uses a lazy learning approach. It assumes that similar things exist near each other.
For a new input data point, KNN calculates the distance to the training points with a distance function. Manhattan and Euclidean distance can be used; Euclidean distance is used here for selecting the K neighbors. After calculating the distances and selecting the K nearest neighbors, KNN finds the majority class among those neighbors and predicts accordingly.
Euclidean distance:

$d(p, q) = \sqrt{\sum_{i=1}^{n} (p_i - q_i)^2}$

Figure 3.4 Euclidean Distance
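The distance calculation and majority vote described above can be sketched as follows. The training points and the helper names are hypothetical and serve only to illustrate the steps.

```python
import math
from collections import Counter


def euclidean(p, q):
    """Euclidean distance between two points of equal length."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def knn_predict(train, query, k=3):
    """Predict the label of `query` from (features, label) training pairs."""
    # Sort the training points by distance to the query and keep the K nearest.
    nearest = sorted(train, key=lambda row: euclidean(row[0], query))[:k]
    # The majority class among the K neighbors is the prediction.
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical two-feature training data with labels 0 and 1.
train = [((1.0, 1.0), 0), ((1.2, 0.8), 0),
         ((4.0, 4.2), 1), ((4.1, 3.9), 1), ((3.8, 4.0), 1)]
near_one = knn_predict(train, (4.0, 4.0), k=3)
near_zero = knn_predict(train, (1.0, 1.0), k=3)
```

Because KNN is a lazy learner, there is no training step: all the work happens when a prediction is requested.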
Naïve Bayes:
Naïve Bayes (NB) classifiers are a family of supervised machine learning classification algorithms based on Bayes’ theorem. They are called naïve because each attribute is assumed to be independent of the others. This project uses the Gaussian Naïve Bayes algorithm because the Gaussian function works with continuous values.
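Bayes probability function:

$P(y \mid x) = \dfrac{P(x \mid y)\, P(y)}{P(x)}$

Figure 3.5 Bayes Theorem

Gaussian function:

$P(x_i \mid y) = \dfrac{1}{\sqrt{2\pi\sigma_y^2}} \exp\left(-\dfrac{(x_i - \mu_y)^2}{2\sigma_y^2}\right)$

Figure 3.6 Gaussian Function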
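A Gaussian Naïve Bayes classifier combines the class prior with one Gaussian likelihood per attribute. The sketch below is a minimal from-scratch version under stated assumptions: the function names and the four sample rows are hypothetical, and log-probabilities are used to avoid numerical underflow.

```python
import math
from collections import defaultdict


def fit_gnb(X, y):
    """Estimate the prior, per-feature mean and variance for every class."""
    by_class = defaultdict(list)
    for row, label in zip(X, y):
        by_class[label].append(row)
    model = {}
    for label, rows in by_class.items():
        n = len(rows)
        columns = list(zip(*rows))
        means = [sum(col) / n for col in columns]
        # A small constant keeps the variance strictly positive.
        variances = [sum((v - m) ** 2 for v in col) / n + 1e-9
                     for col, m in zip(columns, means)]
        model[label] = (n / len(X), means, variances)
    return model


def log_gaussian(x, mean, var):
    """Logarithm of the Gaussian density at x."""
    return -0.5 * math.log(2 * math.pi * var) - (x - mean) ** 2 / (2 * var)


def predict_gnb(model, row):
    """Return the class with the highest log posterior (up to a constant)."""
    scores = {}
    for label, (prior, means, variances) in model.items():
        scores[label] = math.log(prior) + sum(
            log_gaussian(x, m, v) for x, m, v in zip(row, means, variances))
    return max(scores, key=scores.get)


# Hypothetical rows with two continuous attributes each.
X = [[120.0, 200.0], [125.0, 210.0], [160.0, 290.0], [165.0, 300.0]]
y = [0, 0, 1, 1]
model = fit_gnb(X, y)
```

The sum over attributes in `predict_gnb` is where the independence assumption enters: each attribute contributes its own likelihood term.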

SVM:
The support vector machine is a supervised machine learning algorithm used for classification. It uses a hyperplane as the boundary that separates two classes. The main objective of the algorithm is to place this boundary so that the gap (margin) between the closest data points of the two classes is as large as possible.
The points closest to the boundary are called support vectors.

Figure 3.7 Support Vector Machine

There are two approaches to classifying data with SVM: linear and kernel. For linearly separable data the linear approach is the natural choice, but when the data is not linearly separable the kernel approach is the better option.
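One way to picture the kernel approach is as a mapping into a higher-dimensional space where a linear boundary becomes possible. The sketch below illustrates only that idea with an explicit feature map on XOR-style data; it does not train an SVM, and the weights `w` and bias `b` are hand-picked assumptions rather than learned values.

```python
def feature_map(x):
    """Map a 2-D point to 3-D by adding the product of its coordinates."""
    x1, x2 = x
    return (x1, x2, x1 * x2)


def decision(x, w, b):
    """Linear decision rule applied in the mapped feature space."""
    z = feature_map(x)
    score = sum(wi * zi for wi, zi in zip(w, z)) + b
    return 1 if score >= 0 else -1


# XOR-style points: no straight line in 2-D separates the two classes.
points = [((0, 0), -1), ((1, 1), -1), ((0, 1), 1), ((1, 0), 1)]

# Hand-picked hyperplane in the mapped space (an assumption, not learned).
w, b = (1.0, 1.0, -2.0), -0.5
predictions = [decision(x, w, b) for x, _ in points]
print(predictions)  # [-1, -1, 1, 1]
```

A kernel function achieves the same effect without computing the mapped coordinates explicitly.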

Process Flow:
Data collection:
Data is collected from the UCI Machine Learning Repository.
Data Pre-processing:
This step converts raw data into a useful form. The data is already in a usable form, so it does not need pre-processing.
Data Filtering:
Data filtering is the process of selecting useful attributes. Initially there were 76 attributes in the data, but the data provider later filtered it, and it now consists of 13 attributes, which are enough for heart disease prediction.
Algorithm:
After completing all the procedures described above, the dataset can be fed to the machine learning algorithms.
Validation:
The k-fold method is used for training and testing the models. In k-fold validation the input data is divided into k equal parts. This project uses the 10-fold method: in each round, 1/10 of the data is used for testing the model and the rest is used as training data.
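The 10-fold split can be sketched as an index generator. This is an illustration under assumptions: the function name is hypothetical, the rows are not shuffled, and 303 is used only because it is the record count quoted for the UCI data in the literature review.

```python
def k_fold_indices(n_samples, k=10):
    """Yield (train_indices, test_indices) for each of the k folds."""
    indices = list(range(n_samples))
    # Spread any remainder over the first folds so sizes differ by at most 1.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0)
                  for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, test
        start += size


folds = list(k_fold_indices(303, k=10))
for train_idx, test_idx in folds[:2]:
    print(len(train_idx), len(test_idx))
```

Every record is used for testing exactly once, and the model’s score is usually reported as the average over the ten rounds.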

Process Flow Diagram:


Figure 3.8 Process Flow

Summary
The project is planned to be completed using the V process model. For implementation, a dataset from an internet repository will be used along with SVM, NB, KNN, and LR. This chapter gives a detailed description of the project plan.

Chapter 4
System Specification
As the name Software Requirement Specification (SRS) shows, this chapter describes the system requirements. What is the system? What shall the system do or not do? Who will use the system? A better understanding of the system results in a good-quality product. This chapter answers all of these questions about the Heart Disease Prediction System, which is a Decision Support System for consultants.

Creating the SRS before implementation helps the designer and developer understand the problem and achieve milestones without hurdles. All types of requirements, such as user requirements, functional requirements, non-functional requirements, all the constraints, and the information that will help during development, are described in this chapter.

The purpose of this SRS is to specify the requirements of this project that need to be delivered as functionalities.

This document contains:

1. Business Requirements

2. User Requirements

3. Functional Requirements

4. Non-Functional Requirements

5. Use Cases

4.1 Business Requirements

This project helps consultants in the diagnosis of heart problems. The result must be supportive, so accuracy is the most important requirement for this system.

4.1.1 User Requirements

• Database
• Registration form
• Login form
• Machine learning model
• Checkup system for non-admitted patients at main page
• Admit patient form
• View patient information
• Checkup system for admitted patients at patient info page
• Save diagnosis results
• Logout from system

4.2 Process Flow

Figure 4.1 Process Flow

4.3 Functional Requirements

FR-0: Database

FR-0-01: The system shall have an active connection with database
FR-0-02: The system shall have a table in DB for registration of new users with fields (username, email, contact, hospital_id, rank, password)
FR-0-03: The system shall have a table in DB for admission of new patients with fields (name, cnic, age, sex, doctor, total_checks, avg_results, positives, negatives, last_checkup)
FR-0-04: The system shall have a table in DB for storing each checkup
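The two tables whose fields are listed in FR-0-02 and FR-0-03 can be sketched as a schema. Several things here are assumptions: SQLite is used instead of the project’s MySQL so that the example is self-contained, the table names `registered_user` and `patient` follow the wording used elsewhere in this chapter, and the column types, keys, and defaults are guesses. The checkup table of FR-0-04 is left out because its fields are not enumerated.

```python
import sqlite3

# Column names follow FR-0-02 and FR-0-03; types and keys are assumptions.
SCHEMA = """
CREATE TABLE registered_user (
    hospital_id TEXT PRIMARY KEY,
    username    TEXT NOT NULL,
    email       TEXT NOT NULL,
    contact     TEXT,
    "rank"      TEXT,
    password    TEXT NOT NULL
);
CREATE TABLE patient (
    cnic         TEXT PRIMARY KEY,
    name         TEXT NOT NULL,
    age          INTEGER,
    sex          INTEGER,
    doctor       TEXT REFERENCES registered_user(hospital_id),
    total_checks INTEGER DEFAULT 0,
    avg_results  REAL    DEFAULT 0,
    positives    INTEGER DEFAULT 0,
    negatives    INTEGER DEFAULT 0,
    last_checkup TEXT
);
"""

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)

# Admit a hypothetical patient; the counters fall back to their defaults.
conn.execute(
    "INSERT INTO patient (cnic, name, age, sex) VALUES (?, ?, ?, ?)",
    ("00000-0000000-0", "Test Patient", 54, 1),
)
row = conn.execute(
    "SELECT total_checks, positives, negatives, avg_results, last_checkup "
    "FROM patient"
).fetchone()
```

The defaults mirror FR-4-09 to FR-4-13, where a newly admitted patient starts with zero counters and no last checkup.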

FR-1: Registration
FR-1-01: The system shall provide a form for registration
FR-1-02: The system shall ask user to enter username
FR-1-03: The system shall ask user to enter email
FR-1-04: The system shall ask user to enter contact
FR-1-05: The system shall ask user to enter hospital id
FR-1-06: The system shall ask user to enter his/her rank in hospital
FR-1-07: The system shall ask user to enter password
FR-1-08: The system shall ask user to enter password for confirmation
FR-1-09: The system shall have a submit button for saving biodata form record into registered
user table
FR-1-10: The system shall have a link button to login page for already registered users.

FR-2: Login
FR-2-01: The system shall provide a form for login
FR-2-02: The system shall ask the user to enter his/her hospital id.
FR-2-03: The system shall ask the user to enter his/her password.
FR-2-04: The system shall have a login button.
FR-2-05: The system shall verify the user id and password from registered user table in DB.
FR-2-06: The system shall redirect to the homepage when the credentials match.
FR-2-07: The system shall not allow login when the credentials do not match.
FR-2-08: The system shall have a button/link on the login page to reset the user password in case of a forgotten password.
FR-2-09: The system shall have a link button to registration page for registering new user.

FR-3: Checkup system for non-admitted patients at main page

FR-3-01: The system shall have checkup system with machine learning model at homepage.
FR-3-02: The system shall ask the user to enter age (age)
FR-3-03: The system shall ask the user to enter gender (sex)
FR-3-04: The system shall ask the user to enter chest pain (cp)
FR-3-05: The system shall ask the user to enter blood pressure (trestbps)
FR-3-06: The system shall ask the user to enter cholesterol (chol)
FR-3-07: The system shall ask the user to enter blood sugar (fbs)
FR-3-08: The system shall ask the user to enter ecg (restecg)
FR-3-09: The system shall ask the user to enter maximum heart rate (thalac)
FR-3-10: The system shall ask the user to enter exercise induced angina (exang)
FR-3-11: The system shall ask the user to enter depression of ST wave (oldpeak)
FR-3-12: The system shall ask the user to enter slope of ST wave (slope)
FR-3-13: The system shall ask the user to enter colored vessels (ca)
FR-3-14: The system shall ask the user to enter damage to muscles (thal)
FR-3-15: The system shall have a submit button for processing input parameters
FR-3-16: The system shall check result using predict function of model
FR-3-17: The system shall display result on screen

FR-3-18: The system shall store parameters and result into the table in database

FR-4: Admit new patient

FR-4-01: The system shall provide a form for admission of new patient
FR-4-02: The system shall ask the user to enter age of patient
FR-4-03: The system shall ask the user to enter name of patient
FR-4-04: The system shall ask the user to enter gender
FR-4-05: The system shall ask the user to enter cnic
FR-4-06: The system shall save inputs on submitting the form
FR-4-07: The system shall save admission date from system date into the table
FR-4-08: The system shall save doctor’s hospital id from logged in doctor into the table
FR-4-09: The system shall save value zero in total checks into the table
FR-4-10: The system shall save value zero in positives into the table
FR-4-11: The system shall save value zero in negatives into the table
FR-4-12: The system shall save value zero in avg_results into the table
FR-4-13: The system shall save value null in last checkup into the table

FR-5: Checkup system for admitted patients

FR-5-01: The system shall have checkup system with machine learning model at patient info
page
FR-5-02: The system shall get the age (age) from database using patient cnic provided in url
FR-5-03: The system shall get the gender (sex) from database using patient cnic provided in url
FR-5-04: The system shall ask the user to enter chest pain (cp)
FR-5-05: The system shall ask the user to enter blood pressure (trestbps)
FR-5-06: The system shall ask the user to enter cholesterol (chol)
FR-5-07: The system shall ask the user to enter blood sugar (fbs)
FR-5-08: The system shall ask the user to enter ecg (restecg)
FR-5-09: The system shall ask the user to enter maximum heart rate (thalac)
FR-5-10: The system shall ask the user to enter exercise induced angina (exang)
FR-5-11: The system shall ask the user to enter depression of ST wave (oldpeak)
FR-5-12: The system shall ask the user to enter slope of ST wave (slope)
FR-5-13: The system shall ask the user to enter colored vessels (ca)
FR-5-14: The system shall ask the user to enter damage to muscles (thal)
FR-5-15: The system shall have a submit button for processing input parameters
FR-5-16: The system shall check result using predict function of model
FR-5-17: The system shall display result on screen
FR-5-18: The system shall store parameters and result into the table in database
FR-5-19: The system shall increment total checks in the table
FR-5-20: The system shall increment positives on a positive result
FR-5-21: The system shall not increment positives on a negative result
FR-5-22: The system shall calculate the percentage with the updated values of the record
FR-5-23: The system shall store the percentage in the table
FR-5-24: The system shall update the last check date to the current system date
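The record update in FR-5-19 to FR-5-24 can be sketched as one function. Two points are assumptions that the requirements do not state: the percentage is taken to be the share of positive results among all checks, and `negatives` is incremented on a negative result. The function name and the dictionary layout are hypothetical.

```python
from datetime import date


def record_checkup(patient, positive, today=None):
    """Return an updated copy of a patient record after one checkup."""
    updated = dict(patient)
    updated["total_checks"] += 1                      # FR-5-19
    if positive:
        updated["positives"] += 1                     # FR-5-20
    else:
        updated["negatives"] += 1                     # assumption, not in FR-5
    # Assumption: the percentage is positives over total checks.
    updated["avg_results"] = 100.0 * updated["positives"] / updated["total_checks"]
    updated["last_checkup"] = today or date.today()   # FR-5-24
    return updated


# A newly admitted patient starts with zero counters (FR-4-09 to FR-4-13).
new_patient = {"total_checks": 0, "positives": 0, "negatives": 0,
               "avg_results": 0.0, "last_checkup": None}
after_first = record_checkup(new_patient, positive=True, today=date(2024, 1, 1))
after_second = record_checkup(after_first, positive=False, today=date(2024, 1, 2))
```

Returning a copy rather than mutating the input keeps the calculation easy to test before it is wired to the database.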

FR-6: Display admitted Patient’s records


FR-6-01: The system shall display record of admitted patients

FR-7: Display Patient’s checkup dataset
FR-7-01: The system shall display patient’s checkup dataset

FR-8: Logout
FR-8-01: The system shall allow the user to log out of the system

Non-Functional Requirements
4.3.1 NFR-1:

NFR-1-01: The system shall be available when needed
NFR-1-02: The system shall be capable of handling multiple users
NFR-1-03: The result given by the system must be authentic and understandable
NFR-1-04: The system must ensure security of the data
NFR-1-05: The system must respond quickly during checkup
NFR-1-06: The system shall have a maintainable database.
NFR-1-07: The system must ensure reliability

Use Cases:
Assumptions and Constraints
4.3.2 Development Languages and Tools

Server Side: Python, Django
DBMS: MySQL database
Client Side: Bootstrap, HTML, CSS, JavaScript, jQuery
Platform: VS Code, phpMyAdmin

4.3.3 Operating System


The system is built on Windows 10. It can be hosted on any web server.

4.4 Actors
The actor for this system:
• End Users (Doctor)

4.5 Use-Cases

UC-01 Register into the System


Use case name: Registration
Description: This use case describes how a user registers into the system.
Actor: User
Pre-Condition: User has opened the system in the browser
Post-Condition: User must be registered into the system
Flow:
1. User opens the registration page
2. User enters his/her username
3. User enters email
4. User enters contact
5. User enters hospital id
6. User enters rank
7. User enters password
8. User confirms password
9. User presses the submit button to register
10. System registers him/her into the system
Alternate Flow:

UC-02 Login into the system


Use case name: Login
Description: This use case shows how a user logs into the system.
Actor: User
Pre-Condition: User must have registered himself/herself
Post-Condition: On successful login the system redirects the user to the homepage
Flow:
1. User opens login page
2. User enters his/her hospital id
3. User enters his/her password
4. User presses the submit button
5. System checks whether the user is registered into the system
6. System redirects the user to the homepage when the user is registered
Alternate Flow:
6.1 System does not redirect the user to the homepage when the user is not registered

UC-03 Patient checkup on homepage


Use case name: Checking up the patient
Description: This use case describes how a user can use the checkup system for heart attack prediction.
Primary Actor: User
Secondary Actor: Machine Learning Model
Pre-Condition: User must have logged into the system
Post-Condition: System prompts the result
Flow:
1. User enters age into age input field
2. User enters gender of patient into gender radio input field
3. User enters chest pain type into select input field
4. User enters blood pressure into input field
5. User enters cholesterol level into input field
6. User enters blood sugar into fbs radio input field
7. User enters max heart rate
8. User enters ecg result
9. User enters exang
10. User enters oldpeak result
11. User enters slope result
12. User enters ca result
13. User enters thal result
14. User presses the submit button
15. System uses ML model to predict the result with user provided inputs
16. System saves the result
17. System prompts the result
Alternate Flow:
14.1 System asks the user to fill all input fields if the user leaves them empty
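The empty-field check of alternate flow 14.1 can be sketched as a small helper that runs before the model is called. The field names follow the abbreviations in FR-3-02 to FR-3-14; the function name and the dictionary-style form are hypothetical.

```python
# Field names as abbreviated in FR-3-02 to FR-3-14.
REQUIRED_FIELDS = ["age", "sex", "cp", "trestbps", "chol", "fbs", "restecg",
                   "thalac", "exang", "oldpeak", "slope", "ca", "thal"]


def missing_fields(form):
    """Return the names of required fields that are absent or blank."""
    return [name for name in REQUIRED_FIELDS
            if form.get(name) is None or str(form.get(name)).strip() == ""]


# Hypothetical submission in which the cholesterol field was left empty.
form = {name: "1" for name in REQUIRED_FIELDS}
form["chol"] = "  "
problems = missing_fields(form)
print(problems)  # ['chol']
```

If the returned list is empty the form can be passed on to the prediction step; otherwise the user is asked to fill the listed fields.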

UC-04 Admit new patient


Use case name: Admission of patient
Description: This use case describes how a user admits a new patient.
Actor: User
Pre-Condition: User must be logged in. User must have navigated to patient admission page.
Post-Condition: A patient record must be created
Flow:
1. User enters the patient’s name
2. User enters the patient’s age
3. User enters the patient’s gender
4. User enters the patient’s cnic
5. User presses the submit button
6. System creates a record in the patient table
Alternate Flow:
6.1 System asks the user to fill all input fields if the user leaves them empty

UC-05 Check up for admitted patient


Use case name: Checking up the admitted patient
Description: This use case describes how a user can check up the admitted patient.
Primary Actor: User
Secondary Actor: Machine Learning Model
Pre-Condition: User must be logged in. User must have navigated to patient’s table page. User must have selected the patient from table.
Post-Condition: System prompts the result
Flow:
1. User enters chest pain type into select input field
2. User enters blood pressure into input field
3. User enters cholesterol level into input field
4. User enters blood sugar into fbs radio input field
5. User enters max heart rate
6. User enters ecg result
7. User enters exang
8. User enters oldpeak result
9. User enters slope result
10. User enters ca result
11. User enters thal result
12. System retrieves the age of patient from database
13. System retrieves the gender of patient from database
14. User presses the submit button
15. System uses ML model to predict the result with user provided inputs
16. System saves the result
17. System prompts the result
Alternate Flow:
14.1 System asks the user to fill all input fields if the user leaves them empty

UC-06 Display admitted patient records


Use case name: Displaying records of admitted patients
Description: This use case describes how a user can view the records of his/her patients.
Actor: User
Pre-Condition: User must be logged in.
Post-Condition: User must be navigated to records page
Flow:
1. User clicks the patient management list button
2. User selects “My patients”
3. System navigates to patient records page
Alternate Flow:

UC-07 Display patient’s checkup data


Use case name: Displaying patient’s checkup data
Description: This use case describes how a user can view a patient’s checkup data.
Actor: User
Pre-Condition: User must be logged in.
Post-Condition: User must be navigated to patient’s checkup data page
Flow:
1. User clicks the patient management list button
2. User selects “Patients Data”
3. System navigates to patient checkup data page
Alternate Flow:

UC-08 Logout
Use case name: Logging out
This use case describes how a user can log out of the system.
Actor: User
Pre-Condition: User must be logged in.
Post-Condition: User must be logged out.
Page must be navigated to the login page.
Flow: 1. User clicks the “Profile” list button
2. User selects “Logout”
3. System logs out the user
4. System redirects to the login form
4.6 Use Case Modeling
4.6.1 Complete System Diagram

Figure 4.2 Use Case Diagram
4.7 Traceability Matrix
3.1
3.2
3.3
3.4
4.8 Behavioral Model
4.8.1 Sequence Diagram

Figure 4.3 Sequence Diagram

3.5 Summary
This chapter describes the architecture design, use cases, functional requirements (FRs),
non-functional requirements (NFRs) and their traceability. All system specifications are
given in this chapter in detail. The processes, their working and their flows are presented
graphically so that the system is easy for everyone to understand.

Chapter 5: System Design
This chapter consists of diagrams of the use case, behavioral and structural models.
These models are explained in detail in this chapter.

5.1 Software Design Goals


Performance

Performance is something every system needs to deliver. This is a website, and the
number of simultaneous users can rise over time. The system should be able to give its
best performance in any situation.

Usability

The most important thing for software is how easy it is to use. The system should be
easy to understand and use, and its interface should be clear. The user must not get
confused while navigating through it. Once a user has gone through it, he or she should
still remember its working and flows even after a long time.

Availability

As this system is web-based, it must be in service 24/7. It should not take a long time
to perform tasks such as login, registration and diagnosis.

Maintainability

If the client wants to make changes to the software while it is in operation, the software
must be flexible enough to accept them without any problem.

5.2 Data Model
Data Flow Diagram
5.2.1 Context Level

5.2.2 Level 1 DFD

Figure 5.3 Level 1 Data Flow Diagram


5.3 ER-Diagram

Figure 5.4 ER Diagram

5.4 Structural Model

Figure 5.5 Structural Model


5.5 Deployment Diagram

Figure 5.6 Deployment Diagram

Summary
System architecture design plays an important role in the acceptance or rejection of
software, and it is the most important part of development. All non-functional
requirements have to be addressed while designing the system. In this chapter, the flow
of the system’s working is described using sequence diagrams, the behavioral model and
deployment diagrams.

Chapter 6
Implementation

6.1 Block Diagrams
The following diagram describes the first part of the project, which is the comparative analysis. In
this part, four models are trained using logistic regression, KNN, NB and SVM, the algorithms
described earlier. Each algorithm is coded as a class. The result of every model is evaluated
using a confusion matrix.

Figure 6.1 Block Diagram

The following diagram describes the second part of the project, which is the web-based
application for heart disease prediction. This application allows users to register, log in to
their accounts and predict heart disease by providing data parameters as input. Users can also
add patients and manage them according to the history created by the system.

Figure 6.2 Block Diagram

6.2 Images
6.2.1 Login form

6.2.2 Registration form

6.2.3 Full Homepage

6.2.4 Add Patient Page

6.2.5 Patient History And New Checkup page

6.2.6 User’s Profile Setting Page

Chapter 7
Software Testing

Chapter 7: Software Tests Processing
This chapter presents the testing results of this project. The black-box testing technique
is used for this purpose. Testing is important to ensure quality. All input forms are
tested, and their results are attached in the form of screenshots.

7.1 Test Case 01: Login

This test case covers the login process and will be executed whenever a user tries to
access an account. It confirms that the user did not:

• Provide an invalid hospital_id or password.
• Leave a field empty.

Screenshots:

Test case:

| Test case ID | Pre condition | Actions | Expected Result | Result |
|---|---|---|---|---|
| Login-01 | User shall have registered himself. Must have valid hospital_id and password. | User enters valid hospital_id and password and clicks the login button | Login successful. Redirection to the homepage. | Pass |
| Login-02 | | Enter hospital_id and no password and press login button | Password field is required. “Please fill out this field” | Pass |
| Login-03 | | Invalid hospital_id and valid password and press login button | “Invalid hospital_id or password”. | Pass |
| Login-04 | | Both fields are empty and press login button | Hospital_id field is required. “Please fill out this field” | Pass |
| Login-05 | | Invalid hospital_id and password entered and press login button | “Invalid hospital_id or password”. | Pass |

Table 7.1 Login Test Case

7.1 Test Case 02: Register

This test case covers the user registration process. The user enters username,
email, hospital_id, password, confirm password, rank and contact. It checks the following:

7.1.1 The user has entered data in all the required fields.
7.1.2 Special characters and spaces are not allowed in the username.
7.1.3 Whether password and confirm password are the same.
7.1.4 Whether an account with the same hospital_id already exists.

Screenshots:

Test case:

| Test case ID | Pre condition | Actions | Expected Result | Test Result |
|---|---|---|---|---|
| Signup-01 | User shall have active internet connection. User must have valid hospital_id number. | Enter username in character form. | No issue found | Passed |
| Signup-02 | | Enter username in alphanumeric form. | “Please match the requested format” | Pass |
| Signup-03 | | Enter password 1234, Confirm password 123 | “Please Fill Both Password Fields with same password” | Pass |
| Signup-04 | | Enter hospital_id which already exists in database. | “A User Exists With Same Hospital Identification!” | Pass |
| Signup-05 | | Leave fields empty. | “Please fill out this field” | Pass |
| Signup-06 | | Click Save Button. | Account Registered. | Pass |

Table 7.2 Register Test Case
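
The registration checks exercised by these test cases can be sketched as a single Python function. This is only an illustration under assumptions: the function name `validate_registration`, the letters-only username rule and the dictionary keys are hypothetical, and the messages are copied from the expected results above. In the real application some of these messages may come from browser form validation rather than from server-side code.

```python
import re
from typing import Dict, List, Set

# Assumed rule: the username may contain letters only (no digits, spaces
# or special characters).
USERNAME_PATTERN = re.compile(r"[A-Za-z]+")

REQUIRED_FIELDS = ("username", "email", "hospital_id", "password",
                   "confirm_password", "rank", "contact")


def validate_registration(data: Dict[str, str],
                          existing_ids: Set[str]) -> List[str]:
    """Return the error messages for a registration form (empty if valid)."""
    # Signup-05: every field is required; stop early if any is empty.
    if any(not data.get(name, "").strip() for name in REQUIRED_FIELDS):
        return ["Please fill out this field"]

    errors = []
    # Signup-02: the username must match the assumed letters-only pattern.
    if not USERNAME_PATTERN.fullmatch(data["username"]):
        errors.append("Please match the requested format")
    # Signup-03: both password fields must hold the same value.
    if data["password"] != data["confirm_password"]:
        errors.append("Please Fill Both Password Fields with same password")
    # Signup-04: the hospital_id must not already be registered.
    if data["hospital_id"] in existing_ids:
        errors.append("A User Exists With Same Hospital Identification!")
    return errors
```

An empty list corresponds to Signup-06, where the account can be registered.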

7.2 Test Case 03: Admit Patient
This test case covers the patient admission process. The user enters cnic, name, age and sex. It checks the following:
7.2.1 The user has filled in all the required fields, and no mandatory field is left empty.
7.2.2 Whether a patient record with the same cnic already exists.
7.2.3 All input fields must check the input against the required pattern.

Screenshots:

Test Case:

| Test case ID | Pre condition | Action | Expected Result | Test Result |
|---|---|---|---|---|
| Admit-01 | User shall have logged-in into his/her account. | Leaves fields empty | No issue found | Passed |
| Admit-02 | | Enter cnic which exists already in database. | “A Patient Exists With CNIC” | Pass |
| Admit-03 | | Enter cnic in name field, Enter name in cnic field | Should not accept | Pass |
| Admit-04 | | Enter numbers in name field | Should not accept | Pass |
| Admit-05 | | Enter hospital_id which already exists in database. | “A User Exists With Same Hospital Identification!” | Pass |
| Admit-06 | | Enter 9 in age input | Should not accept | Pass |
| Admit-07 | | Click Save Button. | Patient Admitted. | Pass |

Table 7.3 Admit Patient Test Case


7.3 Test Case 04: Patient Check Up

This test case covers the patient’s checkup process. The user enters input parameters in
the form fields. It checks the following:
7.3.1 The user has filled in all the required fields.
7.3.2 Whether the patient checkup completed or not.
7.3.3 Whether inputs accept values below the minimum or above the maximum.
7.3.4 Character inputs in numeric input fields.

Screenshots:

Test case:

| Test case ID | Pre condition | Actions | Expected Result | Test Result |
|---|---|---|---|---|
| Checkup-01 | User must have logged in into his/her account. | User presses the Proceed button when fields are empty | Required fields. | Passed |
| Checkup-02 | | User entered 9 in age input. | “Please match the requested format”. | Passed |
| Checkup-03 | | User entered characters in age input. | “Please match the requested format”. | Passed |
| Checkup-04 | | User entered all inputs according to required format and pressed the Proceed button. | Check Up results must be displayed. | Passed |

Table 7.4 Patient Check Up Test Case
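
The age checks in the checkup test cases above can be sketched as follows. The function name `parse_age` is hypothetical, and the upper bound of 99 is taken from the user guide rather than from the test cases; the error message is the one listed as the expected result.

```python
def parse_age(raw: str, minimum: int = 10, maximum: int = 99) -> int:
    """Parse the age input, rejecting non-numeric or out-of-range values."""
    text = raw.strip()
    # Checkup-03: characters in a numeric field are rejected.
    if not (text.isascii() and text.isdigit()):
        raise ValueError("Please match the requested format")
    age = int(text)
    # Checkup-02: values outside the allowed range (for example 9) are rejected.
    if not minimum <= age <= maximum:
        raise ValueError("Please match the requested format")
    return age
```

For example, `parse_age("54")` returns the integer 54, while `parse_age("9")` and `parse_age("abc")` both raise `ValueError`.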

Chapter 8
Conclusion

Chapter 8: Conclusion

This thesis is a detailed description of the project “Comparative Analysis of
Machine Learning Algorithm for Heart Disease prediction”, which includes a web
application that provides the facility to predict heart problems. The thesis has seven
chapters that describe the system and process execution from start to end. The
introduction is given in the first chapter. The second chapter reviews related work. The
third chapter describes methodologies and process models. The fourth chapter is the main
part of the SRS. Chapters five, six and seven cover system design, implementation and
testing. The project has two parts. The first is the analysis part, in which four models
are trained using four machine learning algorithms: SVM, NB, KNN and Logistic
Regression. The model with the best results is used in the second part. The second part
is the application, a web-based system for heart disease prediction that aims to help
consultants save time and to save patients money. This system can be deployed in any
heart hospital or clinic where patients stay for a few days during their treatment. This
application is for adult patients only. A patient below ten years of age is considered a
child, and child patients are mostly referred to children’s hospitals.

The main objectives of this project were to compare the algorithms using statistical
measures and to build a system that helps doctors diagnose heart disease.

Results:

A confusion matrix is used for the comparison. The counts of true positives (TP), true
negatives (TN), false positives (FP) and false negatives (FN) are collected from the
confusion matrix. From these counts, the accuracy measures Specificity, Sensitivity and
F1-score are calculated for each fold. The formulas are:

Specificity = TN / (TN + FP)

Sensitivity = TP / (TP + FN)

Precision = TP / (TP + FP)

F1 Score = 2 * (Sensitivity * Precision) / (Sensitivity + Precision)
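
As a small worked illustration of these formulas, the sketch below counts true and false positives and negatives from two label lists and then applies the four formulas. The function names `confusion_counts` and `scores` are hypothetical, labels are assumed to be 1 for disease and 0 for no disease, and any labels passed in are the caller's own data, not the project's results.

```python
from typing import Dict, Sequence


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, int]:
    """Count TP, TN, FP and FN for binary labels (1 = disease, 0 = no disease)."""
    counts = {"TP": 0, "TN": 0, "FP": 0, "FN": 0}
    for actual, predicted in zip(y_true, y_pred):
        if actual == 1 and predicted == 1:
            counts["TP"] += 1
        elif actual == 0 and predicted == 0:
            counts["TN"] += 1
        elif actual == 0 and predicted == 1:
            counts["FP"] += 1
        else:
            counts["FN"] += 1
    return counts


def scores(counts: Dict[str, int]) -> Dict[str, float]:
    """Apply the formulas above; a ratio with a zero denominator is reported as 0.0."""
    tp, tn, fp, fn = counts["TP"], counts["TN"], counts["FP"], counts["FN"]
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    denominator = sensitivity + precision
    f1 = 2 * sensitivity * precision / denominator if denominator else 0.0
    return {"sensitivity": sensitivity, "specificity": specificity,
            "precision": precision, "f1": f1}
```

With 3 true positives, 1 false negative, 2 true negatives and 2 false positives, the formulas give a sensitivity of 0.75, a specificity of 0.5, a precision of 0.6 and an F1 score of about 0.67. For a per-fold evaluation, the same two functions would be applied to each fold's labels and predictions.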

| Model | Sensitivity | Specificity | F1-Score |
|---|---|---|---|
| Logistic Regression | 0.50 | 0.81 | 0.61 |
| KNN | 0.72 | 0.54 | 0.68 |
| NB | 0.84 | 0.76 | 0.82 |
| SVM | 0.88 | 0.74 | 0.84 |

The table above lists the accuracy measures for each model. SVM has the highest
sensitivity (0.88) and F1-score (0.84) of the four algorithms, although Logistic Regression
has the highest specificity (0.81). On this basis, SVM is concluded to be the best choice
for heart disease prediction.

User Guide
User Manual:
To use this application, the user needs to perform these tasks:

1. Register into the system.

1. Enter data in the form fields.

2. Username field accepts only text input.

3. Email must follow a valid email address format.

4. If the two passwords do not match, the system will highlight them.

5. If the server does not respond on form submission because of any kind of wrong input,
the user may have to fill in the form again for registration.

2. Login

1. Enter hospital id

2. Enter password

3. Press login button

3. The main body contains a top bar with navigation buttons. The user needs to press a
navigation link button to be redirected to the respective page.

4. The homepage contains the form for the checkup of outdoor patients.

5. The user has to fill in that form for diagnosis and decision support.

1. Age input accepts values from 10 to 99. A value lower than 10 or higher than 99
may cause an error in execution.

2. The user cannot leave any input field empty.

3. The Cholesterol, Max-Heart Rate, Blood pressure and old peak input fields have
minimum and maximum values, which are shown below each input field. Values
lower than the minimum or higher than the maximum will not be accepted or may
cause an error.

4. When all inputs are given properly, the user can press the Proceed button to get the result.

5. An alert message appears to show the result.

6. The user should know that all inputs and checkup results are stored in the
database along with the user’s hospital id and the date.

6. The user can add a patient to the system.

1. The user has to fill in the patient admission form for this purpose.

2. Name input field accepts only text input.

3. The minimum age allowed is 10 years. A patient below ten years of age cannot
be admitted or added to the system.

4. The system saves the record on pressing the admit button.

7. The user can use the diagnosis system for admitted patients.

1. The user does not need to enter the age and sex of the patient manually. These
values are automatically retrieved from the database.

2. The user has to fill in the other input fields for further processing.

3. On pressing the Proceed button, the system shows an alert message with the result.

4. The user should know that the result and input parameters are saved on pressing
the button, along with the current date, patient cnic and doctor hospital id.

5. The system calculates the averages of previous inputs and displays them on the
patient info page.
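
The averaging of previous inputs mentioned in the last step can be sketched as below. This is an assumption about how such a summary might be computed, not the application's actual code: the function name `average_inputs` and the choice of numeric fields (named after the UCI Heart Disease dataset attributes) are hypothetical.

```python
from statistics import mean
from typing import Dict, Sequence

# Assumed numeric inputs to average: blood pressure, cholesterol,
# max heart rate and old peak.
NUMERIC_FIELDS = ("trestbps", "chol", "thalach", "oldpeak")


def average_inputs(checkups: Sequence[Dict[str, float]],
                   fields: Sequence[str] = NUMERIC_FIELDS) -> Dict[str, float]:
    """Average each numeric field over a patient's stored checkups."""
    if not checkups:
        # No history yet, so there is nothing to average.
        return {}
    return {name: mean(checkup[name] for checkup in checkups) for name in fields}
```

Each element of `checkups` stands for one saved checkup record of the same patient; how those records are loaded from the database is outside the sketch.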

References

[1] Alotaibi, Fahd Saleh. "Implementation of machine learning model to predict heart failure
disease." (IJACSA) International Journal of Advanced Computer Science and Applications 10.6
(2019).
[2] Vijayarani, S., and S. Sudha. "Comparative analysis of classification function techniques for heart
disease prediction." International Journal of Innovative Research in Computer and Communication
Engineering 1.3 (2013): 735-741.
[3] Khan, Sundas Naqeeb, et al. "Comparative analysis for heart disease prediction." JOIV:
International Journal on Informatics Visualization 1.4-2 (2017): 227-231.
[4] Hariharan, K., et al. "A comparative study on heart disease analysis using classification
techniques." (2018).
[5] Sharma, Avni, Deeksha Tyagi, and Tarun Kumar Gupta. "Comparative Analysis of Machine
Learning Techniques in Heart Disease Prediction by R Language."
Machine learning course. “https://www.coursera.org/learn/machine-learning/home/welcome” (Started November-2019).
For SVM. “https://towardsdatascience.com/support-vector-machine-introduction-to-machine-learning-algorithms-934a444fca47”.
For Logistic Regression. “https://towardsdatascience.com/logistic-regression-detailed-overview-46c4da4303bc”, “https://machinelearningmastery.com/logistic-regression-for-machine-learning/”.
For KNN “https://www.youtube.com/watch?v=6kZ-OPLNcgE”.
For NB “https://www.youtube.com/watch?v=vz_xuxYS2PM”.
Django documentation “https://docs.djangoproject.com/en/3.0/”.
Bootstrap links “https://mdbootstrap.com/md-bootstrap-cdn/”.
Dataset “http://archive.ics.uci.edu/ml/datasets/Heart+Disease ”.
