Project Documentation
A PROJECT REPORT ON
BY
THE DEGREE OF
MARCH 2023-2024
S.I.E.S COLLEGE OF ARTS, SCIENCE AND COMMERCE (AUTONOMOUS),
SION(W)
MUMBAI – 400 022.
CERTIFICATE
This is to certify that the project entitled “Cardio Guardian Pro - Heart Disease
Prediction System” is a bona fide work of
Mr. Mahesh Velmaiel Dravidar, bearing Roll No. SMCS2324008, submitted to the
college in partial fulfilment of the requirements for the postgraduate degree of
Master of Science (Computer Science), academic year 2023-24.
College Seal
Acknowledgment
Abstract
Project Objectives
I. Technical Outcomes:
Software Requirements:
IDE : PyCharm
Front-end : HTML5, CSS3, JavaScript
Web Framework : Flask
Python Packages : NumPy, Pandas, Matplotlib, Seaborn,
Scikit-learn, Flask
Back-end : Flask
Database : SQLite
STAKEHOLDERS
11. Legal and Compliance Teams: Legal experts who may need to
ensure that the project complies with data protection laws and
regulations, especially if patient data is involved.
3. Model Implementation:
Select the algorithm with the highest predictive accuracy.
Implement the chosen algorithm in a user-friendly application that
allows users to input their health data and receive a heart disease risk
prediction.
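The selection step can be sketched in a few lines of Python. This is only an illustration: the dictionary holds the accuracy figures quoted in this report for four of the models, and the helper name select_best_model is hypothetical, not part of the project code.

```python
# Accuracy figures (in percent) as quoted in this report.
# Models whose figures are not listed in this section are left out.
reported_accuracy = {
    "Logistic Regression": 86.34,
    "Naive Bayes": 85.37,
    "Random Forest": 94.63,
    "K-Neighbors Classifier": 87.80,
}


def select_best_model(scores):
    """Return the (name, score) pair with the highest score."""
    return max(scores.items(), key=lambda item: item[1])


best_name, best_accuracy = select_best_model(reported_accuracy)
print(f"Selected model: {best_name} ({best_accuracy:.2f}% accuracy)")
```

Selecting on a single metric is the simplest rule. In practice the cross-validation mean and AUC-ROC score would normally be checked as well before committing to one model.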
7. Deployment:
Deploy the web-based heart disease prediction system to a secure and
scalable environment. Ensure that the deployment adheres to best
practices for web application hosting and maintenance.
Key Attributes:
2. Random Forest:
Random Forest is a versatile ensemble algorithm for heart disease
prediction. It builds a forest of decision trees, each trained on a
bootstrap sample of the dataset with a random subset of features
considered at each split, and combines their outputs by majority
vote. In this context, Random Forest assesses heart disease risk
from patient attributes such as age, gender, and medical history.
It handles complex interactions between features and provides
feature importance scores, helping identify key risk factors. It
works with both numerical and categorical attributes (the latter
typically encoded as numbers first), which is valuable for
comprehensive heart disease prediction.
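A minimal sketch of how feature importance scores can be read from a Random Forest, assuming scikit-learn. The data is synthetic and the column names (age, chol, noise) are illustrative assumptions, not the project's dataset.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Synthetic stand-in data: the label depends on age and chol only,
# while "noise" is an irrelevant column added for contrast.
rng = np.random.default_rng(0)
n_patients = 300
age = rng.integers(30, 80, size=n_patients)
chol = rng.normal(240, 40, size=n_patients)
noise = rng.normal(0, 1, size=n_patients)

X = np.column_stack([age, chol, noise])
y = ((age > 55) & (chol > 230)).astype(int)
feature_names = ["age", "chol", "noise"]

forest = RandomForestClassifier(n_estimators=100, random_state=0)
forest.fit(X, y)

# feature_importances_ sums to 1; larger values mean the feature was
# more useful for splitting across the trees in the forest.
importances = dict(zip(feature_names, forest.feature_importances_))
for name, score in sorted(importances.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name}: {score:.3f}")
```

On data like this, the irrelevant column should receive a much lower score than the two columns that drive the label.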
3. Logistic Regression:
Logistic Regression is a fundamental algorithm for binary
classification, making it well-suited for heart disease prediction. It
models the probability of a patient having heart disease based on
their health attributes. Logistic Regression's coefficients reveal the
impact of each feature on the likelihood of disease, aiding in risk
factor identification. It's interpretable and can provide insights into
which patient characteristics contribute most to the prediction.
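To illustrate how coefficients translate into a probability, the sketch below applies the logistic (sigmoid) function to a weighted sum of features. The coefficient values, intercept, and feature names are hypothetical and chosen only for the example. The features are assumed to be standardized.

```python
import math


def predict_probability(features, coefficients, intercept):
    """Logistic model: sigmoid of the intercept plus the weighted feature sum."""
    z = intercept + sum(coefficients[name] * value for name, value in features.items())
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical coefficients for standardized features (not fitted values).
coefficients = {"age": 0.8, "chol": 0.4, "max_heart_rate": -0.6}
intercept = -0.2

# Hypothetical patient, expressed in standard deviations from the mean.
patient = {"age": 1.0, "chol": 0.5, "max_heart_rate": -1.0}

probability = predict_probability(patient, coefficients, intercept)

# exp(coefficient) is the odds ratio for a one-unit increase in that feature:
# above 1 raises the odds of disease, below 1 lowers them.
odds_ratios = {name: math.exp(value) for name, value in coefficients.items()}

print(f"Predicted probability: {probability:.3f}")
for name, ratio in odds_ratios.items():
    print(f"{name}: odds ratio {ratio:.2f}")
```

A positive coefficient pushes the predicted probability up and a negative one pushes it down, which is what makes the model straightforward to interpret.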
4. Support Vector Machine (SVM):
SVM is another effective algorithm for heart disease prediction. It
finds a hyperplane that best separates patients with and without
heart disease, maximizing the margin between the two classes.
SVM is robust in handling high-dimensional data, making it
suitable for heart disease datasets with numerous features. With
kernel functions, it can also capture complex decision boundaries,
which is advantageous when dealing with non-linear relationships
between risk factors.
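The effect of the kernel on non-linear boundaries can be seen in a small sketch, assuming scikit-learn. The concentric-circles data is a standard synthetic example, not heart disease data, and the split settings are arbitrary choices for illustration.

```python
from sklearn.datasets import make_circles
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Two classes arranged as concentric circles: no straight line separates them.
X, y = make_circles(n_samples=400, noise=0.05, factor=0.5, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# A linear kernel can only draw a straight boundary.
linear_acc = SVC(kernel="linear").fit(X_train, y_train).score(X_test, y_test)

# The RBF kernel implicitly maps the data to a space where it is separable.
rbf_acc = SVC(kernel="rbf").fit(X_train, y_train).score(X_test, y_test)

print(f"Linear kernel accuracy: {linear_acc:.2f}")
print(f"RBF kernel accuracy:    {rbf_acc:.2f}")
```

On data of this shape the RBF kernel should clearly outperform the linear one, which is the situation the paragraph above describes.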
5. Decision Tree:
Decision Trees are interpretable models often used in heart disease
prediction. Each internal node in the tree tests a feature, each
branch corresponds to an outcome of that test, and each leaf assigns
a class. In this context, a Decision Tree creates a transparent
decision-making process for assessing heart disease risk.
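The transparency of a Decision Tree can be shown by printing its learned rules. The sketch below assumes scikit-learn and uses synthetic data in which the label depends only on a hypothetical chol column. The threshold of 240 is an arbitrary value for the example.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier, export_text

# Synthetic stand-in data: the label is 1 whenever chol exceeds 240.
rng = np.random.default_rng(0)
age = rng.integers(30, 80, size=200)
chol = rng.integers(150, 350, size=200)

X = np.column_stack([age, chol])
y = (chol > 240).astype(int)

tree = DecisionTreeClassifier(max_depth=2, random_state=0).fit(X, y)

# export_text renders the tree as nested if/else rules on the features.
rules = export_text(tree, feature_names=["age", "chol"])
print(rules)
```

The printed rules read as a sequence of threshold tests ending in a class, which is the form a clinician can check by hand.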
2. Naive Bayes:
• Accuracy: 85.37%
• Cross Validation Mean: 81.83%
• AUC-ROC Score: 0.9311
3. Random Forest:
• Accuracy: 94.63%
• Cross Validation Mean: 90.24%
• AUC-ROC Score: 0.9924
4. K-Neighbors Classifier:
• Accuracy: 87.80%
• Cross Validation Mean: 84.02%
• AUC-ROC Score: 0.9468
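The three figures reported for each model (accuracy, cross-validation mean, and AUC-ROC score) can be computed along the following lines. This is a sketch assuming scikit-learn, run on synthetic data. The split ratio, the number of folds, and the choice of Random Forest as the example model are assumptions, since the report does not state how its figures were produced.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, roc_auc_score
from sklearn.model_selection import cross_val_score, train_test_split

# Synthetic binary classification data standing in for the patient records.
X, y = make_classification(
    n_samples=500, n_features=10, n_informative=5, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0, stratify=y
)

model = RandomForestClassifier(n_estimators=100, random_state=0)
model.fit(X_train, y_train)

# Accuracy: share of held-out samples classified correctly.
accuracy = accuracy_score(y_test, model.predict(X_test))

# AUC-ROC: computed from predicted probabilities of the positive class.
auc = roc_auc_score(y_test, model.predict_proba(X_test)[:, 1])

# Cross-validation mean: average accuracy over 5 folds of the training data.
cv_mean = cross_val_score(model, X_train, y_train, cv=5, scoring="accuracy").mean()

print(f"Accuracy: {accuracy:.2%}")
print(f"Cross Validation Mean: {cv_mean:.2%}")
print(f"AUC-ROC Score: {auc:.4f}")
```

A cross-validation mean noticeably below the single-split accuracy, as in the figures above, suggests the single test split was somewhat favourable.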
➢ Algorithm Implementation:
3. Hyperparameter Tuning:
• To optimize the performance of the SVC, hyperparameter
tuning was conducted. This involved adjusting the kernel
type, the regularization parameter (C), and the kernel
coefficient (gamma).
• Grid search and cross-validation techniques were
employed to find the optimal combination of
hyperparameters.
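A grid search over the SVC hyperparameters named above might look like the sketch below, assuming scikit-learn. The candidate values in param_grid, the 5-fold setting, and the scaling step are illustrative assumptions. The report does not list the grid that was actually searched.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic data standing in for the patient records.
X, y = make_classification(
    n_samples=300, n_features=10, n_informative=5, random_state=0
)

# Scaling inside the pipeline keeps each CV fold free of test-fold leakage.
pipeline = make_pipeline(StandardScaler(), SVC())

# Hypothetical candidate values for kernel type, C, and gamma.
param_grid = {
    "svc__kernel": ["linear", "rbf"],
    "svc__C": [0.1, 1, 10],
    "svc__gamma": ["scale", 0.01, 0.1],
}

# Every combination is scored by 5-fold cross-validated accuracy.
search = GridSearchCV(pipeline, param_grid, cv=5, scoring="accuracy")
search.fit(X, y)

print("Best parameters:", search.best_params_)
print(f"Best cross-validated accuracy: {search.best_score_:.3f}")
```

The gamma value has no effect when the linear kernel is chosen, so some grid points are redundant. A list of two separate grids would avoid that at the cost of a slightly longer configuration.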
4. Model Evaluation:
• The performance of the SVC was assessed using various
metrics, including accuracy, sensitivity, specificity, and the
AUC-ROC score.
• Cross-validation was used to check that the model's
performance was robust and likely to generalize to new,
unseen data.
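Sensitivity and specificity follow directly from the confusion matrix counts. The sketch below computes them in plain Python. The function name and the example labels are hypothetical and serve only to show the arithmetic.

```python
def sensitivity_specificity(y_true, y_pred):
    """Return (sensitivity, specificity) for binary labels, where 1 = disease."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives

    # Sensitivity: share of actual disease cases that were detected.
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    # Specificity: share of healthy cases correctly cleared.
    specificity = tn / (tn + fp) if (tn + fp) else float("nan")
    return sensitivity, specificity


# Hypothetical labels: 4 patients with disease, 6 without.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 1, 1]

sensitivity, specificity = sensitivity_specificity(y_true, y_pred)
print(f"Sensitivity: {sensitivity:.2f}")
print(f"Specificity: {specificity:.2f}")
```

In a screening setting, sensitivity is usually the more critical of the two, because a missed case (false negative) is costlier than a false alarm.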
➢ User-Friendly Application Development:
1. Input Form:
• A clear and concise input form prompted users to enter
relevant health information, such as age, sex, blood
pressure, cholesterol levels, and exercise habits.
• Input fields were accompanied by helpful tooltips and
examples, ensuring users understood the type of
information required.
2. Prediction Results:
• Upon submitting the input form, users were presented with
a visually appealing and easy-to-interpret display of the
heart disease risk prediction.
• Results included a probability score and a binary
prediction (presence or absence of heart disease).
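The step from a model's probability output to the displayed result can be sketched as a small helper. The function name, the 0.5 threshold, and the label wording are assumptions made for illustration. They do not describe the application's actual code.

```python
def format_prediction(probability, threshold=0.5):
    """Turn a predicted probability into the values shown on the results page."""
    if not 0.0 <= probability <= 1.0:
        raise ValueError("probability must be between 0 and 1")

    # Binary prediction: 1 = heart disease present, 0 = absent.
    has_disease = probability >= threshold
    return {
        "probability_percent": round(probability * 100, 1),
        "prediction": 1 if has_disease else 0,
        "label": "Heart disease likely" if has_disease else "Heart disease unlikely",
    }


# Hypothetical probability, e.g. taken from a classifier's predict_proba output.
result = format_prediction(0.82)
print(result)
```

A helper of this kind could be called from a Flask view before the result template is rendered, which keeps the threshold logic in one place and easy to test.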
3. Accessibility and Responsiveness:
• The application was designed to be accessible across
various devices, ensuring a responsive layout that adapts to
different screen sizes.
• This consideration aimed at maximizing the application's
reach and usability.
Confusion Matrix:
Insights:
TRAINING DATASET
Sequence Diagram:
Data Flow Diagram:
Use Case:
Conclusion
1. Algorithmic Comparison:
Our research compared six machine learning algorithms: Logistic
Regression, Naive Bayes, Random Forest, K-Neighbors Classifier,
Decision Tree Classifier, and Support Vector Classifier. Each
algorithm was evaluated on the same metrics: Accuracy,
Sensitivity, Specificity, and the Area Under the Receiver
Operating Characteristic Curve (AUC-ROC).
Performance Metrics:
1. Logistic Regression:
• Accuracy: 86.34%
2. Naive Bayes:
• Accuracy: 85.37%
3. Random Forest:
• Accuracy: 94.63%
4. K-Neighbors Classifier:
• Accuracy: 87.80%
2. Personalized Healthcare:
• The web-based application facilitates personalized risk
predictions based on individual health data.
• Healthcare providers can leverage these predictions to
tailor interventions and treatment plans, moving towards a
more patient-centric healthcare approach.
3. Resource Optimization:
• Accurate predictive models assist healthcare systems in
optimizing resource allocation.
• By identifying individuals at higher risk, resources can be
directed towards targeted screenings, consultations, and
preventive measures, streamlining healthcare delivery.
4. Empowering Individuals:
• Providing individuals with accessible tools to assess their
heart disease risk empowers them to proactively manage
their health.
• The user-friendly interface enhances health literacy,
fostering a sense of responsibility and engagement in one's
well-being.
References
https://www.analyticsvidhya.com/blog/2022/02/heart-disease-prediction-using-machine-learning-2/
https://www.geeksforgeeks.org/ml-heart-disease-prediction-using-logistic-regression/
https://github.com/topics/heart-disease