
Univariate and Multivariate Analysis - Jupyter Notebook

This document discusses univariate and multivariate analysis of diabetes patient data. It computes summary statistics, calculates correlations between variables, fits a linear regression model to predict the diabetes outcome, and generates box and scatter plots. The analysis finds that higher BMI and glucose levels are associated with increased diabetes risk, and that age is also positively associated with the outcome.

Uploaded by AnuvidyaKarthi


3/3/22, 6:59 PM Univariate and Multivariate Analysis - Jupyter Notebook

In [18]:

options(repr.plot.width=8, repr.plot.height = 4, repr.plot.res = 200) #set plot width and height (inches) and resolution (ppi) for the Jupyter R kernel

In [19]:

library(tidyverse) #for data manipulation and visualization

In [20]:

data <- read.csv(url("https://datalifex.in/dataml/diabetes.csv"))

In [21]:

str(data)

'data.frame':  768 obs. of  9 variables:
 $ Pregnancies             : int  6 1 8 1 0 5 3 10 2 8 ...
 $ Glucose                 : int  148 85 183 89 137 116 78 115 197 125 ...
 $ BloodPressure           : int  72 66 64 66 40 74 50 0 70 96 ...
 $ SkinThickness           : int  35 29 0 23 35 0 32 0 45 0 ...
 $ Insulin                 : int  0 0 0 94 168 0 88 0 543 0 ...
 $ BMI                     : num  33.6 26.6 23.3 28.1 43.1 25.6 31 35.3 30.5 0 ...
 $ DiabetesPedigreeFunction: num  0.627 0.351 0.672 0.167 2.288 ...
 $ Age                     : int  50 31 32 21 33 30 26 29 53 54 ...
 $ Outcome                 : int  1 0 1 0 1 0 1 0 1 1 ...

In [22]:

head(data)

A data.frame: 6 × 9

  Pregnancies Glucose BloodPressure SkinThickness Insulin BMI   DiabetesPedigreeFuncti…
  <int>       <int>   <int>         <int>         <int>   <dbl> <db…
1 6           148     72            35            0       33.6  0.6…
2 1           85      66            29            0       26.6  0.3…
3 8           183     64            0             0       23.3  0.6…
4 1           89      66            23            94      28.1  0.1…
5 0           137     40            35            168     43.1  2.2…
6 5           116     74            0             0       25.6  0.2…

Checking for missing (NA) values

localhost:8888/notebooks/Univariate and Multivariate Analysis.ipynb# 1/5


In [23]:

colSums(is.na(data))

Pregnancies: 0 Glucose: 0 BloodPressure: 0 SkinThickness: 0 Insulin: 0 BMI: 0

DiabetesPedigreeFunction: 0 Age: 0 Outcome: 0
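colSums(is.na(data)) finds no NA values, but the summary in the next section reports a minimum of 0 for Glucose, BloodPressure, SkinThickness, Insulin and BMI. A glucose level, blood pressure or BMI of 0 is not physiologically possible, so these zeros most likely stand for missing measurements (a 0 in Pregnancies, by contrast, is a valid value). The sketch below counts such zeros and converts them to NA; the helper names count_zeros and zero_to_na and the toy values are hypothetical, chosen only for illustration.

```r
# Hypothetical helpers (not part of base R or the tidyverse).

# Count exact zeros in each of the given columns.
count_zeros <- function(df, cols) {
  colSums(df[cols] == 0, na.rm = TRUE)
}

# Replace zeros with NA in the given columns, leaving other columns untouched.
zero_to_na <- function(df, cols) {
  df[cols] <- lapply(df[cols], function(v) replace(v, v == 0, NA))
  df
}

# Made-up toy rows for illustration only (not taken from the diabetes data)
toy <- data.frame(
  Pregnancies = c(2, 0, 4),
  Glucose     = c(120, 0, 95),
  BMI         = c(0, 28.5, 31.0)
)
implausible_zero_cols <- c("Glucose", "BMI")

count_zeros(toy, implausible_zero_cols)                 # Glucose 1, BMI 1
colSums(is.na(zero_to_na(toy, implausible_zero_cols)))  # Pregnancies 0, Glucose 1, BMI 1
```

On the notebook's data frame, the corresponding call would be count_zeros(data, c("Glucose", "BloodPressure", "SkinThickness", "Insulin", "BMI")). Pregnancies is deliberately left out because 0 is a legitimate value there.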

Univariate Analysis
In [24]:

data$Outcome <- as.factor(data$Outcome) #treat Outcome as categorical: 0 = no diabetes, 1 = diabetes
summary(data)

  Pregnancies        Glucose      BloodPressure    SkinThickness
 Min.   : 0.000   Min.   :  0.0   Min.   :  0.00   Min.   : 0.00
 1st Qu.: 1.000   1st Qu.: 99.0   1st Qu.: 62.00   1st Qu.: 0.00
 Median : 3.000   Median :117.0   Median : 72.00   Median :23.00
 Mean   : 3.845   Mean   :120.9   Mean   : 69.11   Mean   :20.54
 3rd Qu.: 6.000   3rd Qu.:140.2   3rd Qu.: 80.00   3rd Qu.:32.00
 Max.   :17.000   Max.   :199.0   Max.   :122.00   Max.   :99.00

    Insulin           BMI        DiabetesPedigreeFunction      Age
 Min.   :  0.0   Min.   : 0.00   Min.   :0.0780           Min.   :21.00
 1st Qu.:  0.0   1st Qu.:27.30   1st Qu.:0.2437           1st Qu.:24.00
 Median : 30.5   Median :32.00   Median :0.3725           Median :29.00
 Mean   : 79.8   Mean   :31.99   Mean   :0.4719           Mean   :33.24
 3rd Qu.:127.2   3rd Qu.:36.60   3rd Qu.:0.6262           3rd Qu.:41.00
 Max.   :846.0   Max.   :67.10   Max.   :2.4200           Max.   :81.00

 Outcome
 0:500
 1:268
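Because Outcome was converted to a factor, summary() reports class counts rather than a mean: 500 patients without diabetes and 268 with it. As proportions, about 34.9% of the 768 patients have Outcome = 1, which is worth keeping in mind when judging any model: always predicting 0 would already be right about 65.1% of the time. A minimal sketch using the counts printed above:

```r
# Class counts copied from the summary() output above
outcome_counts <- c(`0` = 500, `1` = 268)

# prop.table() divides each count by the total (768)
outcome_props <- prop.table(outcome_counts)
round(100 * outcome_props, 1)   # 0: 65.1, 1: 34.9
```

On the data frame itself, the equivalent is prop.table(table(data$Outcome)).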

Bivariate Analysis
In [34]:

data2 <- read.csv(url("https://datalifex.in/dataml/diabetes.csv")) #re-read so Outcome stays numeric: cor() needs all-numeric columns


In [36]:

cor(data2)

A matrix: 9 × 9 of type dbl

                           Pregnancies       Glucose BloodPressure SkinThickness       Insulin
Pregnancies                 1.00000000    0.12945867    0.14128198   -0.08167177   -0.07353461
Glucose                     0.12945867    1.00000000    0.15258959    0.05732789    0.33135711
BloodPressure               0.14128198    0.15258959    1.00000000    0.20737054    0.08893338
SkinThickness              -0.08167177    0.05732789    0.20737054    1.00000000    0.43678257
Insulin                    -0.07353461    0.33135711    0.08893338    0.43678257    1.00000000
BMI                         0.01768309    0.22107107    0.28180529    0.39257320    0.19785906
DiabetesPedigreeFunction   -0.03352267    0.13733730    0.04126495    0.18392757    0.18507093
Age                         0.54434123    0.26351432    0.23952795   -0.11397026   -0.04216295
Outcome                     0.22189815    0.46658140    0.06506836    0.07475223    0.13054795
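The printed matrix is cut off after the Insulin column. Selecting a single column avoids that and makes it easy to rank the predictors. Since Outcome is coded 0/1 in data2, each of its correlations is a point-biserial correlation, i.e. Pearson's r between a binary and a numeric variable. The helper cor_with_target below is hypothetical, and the toy data frame is made up for illustration:

```r
# Hypothetical helper: take one column of the correlation matrix and rank
# the other variables by the absolute value of their correlation with it.
cor_with_target <- function(df, target) {
  r <- cor(df)[, target]                # Pearson correlations with `target`
  r <- r[names(r) != target]            # drop the trivial self-correlation (1)
  r[order(abs(r), decreasing = TRUE)]   # strongest relationships first
}

# Made-up toy data for illustration only (not the diabetes data set)
toy <- data.frame(
  a       = c(1, 2, 3, 4, 5, 6),
  b       = c(6, 5, 4, 3, 2, 1),
  c       = c(2, 1, 2, 1, 2, 1),
  outcome = c(0, 0, 0, 1, 1, 1)
)
cor_with_target(toy, "outcome")   # a and b about +/-0.878, c about -0.333
```

For the notebook, cor_with_target(data2, "Outcome") lists all eight predictors ordered by the strength of their linear association with Outcome.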

Correlation indicates:
-> the direction of the relationship between the two variables
-> the strength of the relationship between the two variables

Regarding the direction of the relationship: a negative correlation means that the two variables tend to vary in opposite directions, i.e., when one increases the other tends to decrease, and vice versa. A positive correlation means that the two variables tend to vary in the same direction, i.e., when one increases the other tends to increase, and when one decreases the other tends to decrease as well.

Regarding the strength of the relationship: the more extreme the correlation coefficient (the closer to -1 or 1), the stronger the linear relationship. A correlation close to 0 means there is no linear relationship, that is, as one variable increases there is no linear tendency for the other to increase or decrease. It does not by itself mean the two variables are independent: cor() computes Pearson's coefficient by default, which can be close to 0 even when one variable is a non-linear function of the other.
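Pearson's r, the default in cor(), can be written out directly from its definition, r = Σ(x - x̄)(y - ȳ) / √(Σ(x - x̄)² · Σ(y - ȳ)²). The sketch below (pearson_r is a hypothetical helper, and the vectors are toy values) checks it against cor() and illustrates the point above: on a symmetric range, y = x² is completely determined by x, yet r is 0 because the relationship is not linear.

```r
# Pearson correlation computed from its definition
pearson_r <- function(x, y) {
  dx <- x - mean(x)
  dy <- y - mean(y)
  sum(dx * dy) / sqrt(sum(dx^2) * sum(dy^2))
}

x <- -5:5
pearson_r(x, 2 * x + 1)   # 1: perfect positive linear relationship
pearson_r(x, -3 * x)      # -1: perfect negative linear relationship
pearson_r(x, x^2)         # 0: y depends on x, but not linearly

# Agrees with R's built-in implementation
all.equal(pearson_r(x, 2 * x + 1), cor(x, 2 * x + 1))
```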


In [41]:

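lm(Outcome ~ Pregnancies+Glucose+BloodPressure+SkinThickness+Insulin+BMI+DiabetesPedigreeFunction+Age, data = data) #Outcome is a factor at this point (see In [24]), which triggers the two warnings below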

Warning message in model.response(mf, "numeric"):
"using type = "numeric" with a factor response will be ignored"

Warning message in Ops.factor(y, z$residuals):
"'-' not meaningful for factors"

Call:
lm(formula = Outcome ~ Pregnancies + Glucose + BloodPressure +
    SkinThickness + Insulin + BMI + DiabetesPedigreeFunction +
    Age, data = data)

Coefficients:
             (Intercept)               Pregnancies                   Glucose
               0.1461057                 0.0205919                 0.0059203
           BloodPressure             SkinThickness                   Insulin
              -0.0023319                 0.0001545                -0.0001805
                     BMI  DiabetesPedigreeFunction                       Age
               0.0132440                 0.1472374                 0.0026214
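The two warnings come from In [24], where Outcome was converted to a factor. lm() does not turn a factor response back into its original 0/1 values; the least-squares fit is computed on the factor's internal level codes (1 for "0", 2 for "1"), and the second warning appears when lm() then tries to compute fitted values by subtraction. Adding 1 to every response value leaves all slopes unchanged and shifts the intercept up by exactly 1, so with a 0/1 response the intercept above would be 0.1461057 - 1 = -0.8538943. Refitting on data2, where Outcome is still numeric, avoids this. Since Outcome is binary, logistic regression with glm(..., family = binomial) is the more usual model, because a linear model can predict values outside the 0 to 1 range. The sketch below demonstrates both points on a made-up toy data frame, not on the diabetes data:

```r
# Made-up toy data for illustration only (not the diabetes data set)
toy <- data.frame(
  x = c(1, 2, 3, 4, 5, 6, 7, 8),
  y = c(0, 0, 1, 0, 1, 1, 0, 1)
)

# 1. Linear model on the numeric 0/1 response (as data2 would provide)
fit_num <- lm(y ~ x, data = toy)

# 2. Same model with the response stored as a factor: lm() warns and
#    regresses on the internal level codes (1 and 2) instead of 0/1
toy_fac <- transform(toy, y = factor(y))
fit_fac <- suppressWarnings(lm(y ~ x, data = toy_fac))
coef(fit_fac) - coef(fit_num)   # intercept shifted by 1, slope unchanged

# 3. Logistic regression, the usual model for a binary outcome;
#    glm() accepts the factor directly (first level "0" = no event)
fit_logit <- glm(y ~ x, data = toy_fac, family = binomial)
p_hat <- predict(fit_logit, type = "response")   # fitted probabilities
range(p_hat)                                      # stays strictly inside (0, 1)
```

For the notebook, the equivalent calls would be lm(Outcome ~ ., data = data2) and glm(Outcome ~ ., data = data, family = binomial), where . stands for all remaining columns, i.e. the same eight predictors.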


Box Plot
In [25]:

p1 <- ggplot(data, aes(x=Outcome, y=BMI, fill=Outcome)) + geom_boxplot()

print(p1)

INFERENCE: Here 0 means diabetes is absent and 1 means diabetes is present. The plot shows that people with lower BMI are less likely to have diabetes; in other words, obesity is associated with a higher prevalence of diabetes in this data set. The box plot shows an association, not that obesity causes diabetes.
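A box plot draws, for each group, a box from the first to the third quartile with a line at the median, so printing those numbers backs up the visual impression with figures. The sketch below uses a hypothetical helper, group_quartiles, and can optionally treat BMI = 0 as missing, since a BMI of 0 is not physiologically possible (the summary above shows a minimum BMI of 0.00). The toy values are made up:

```r
# Hypothetical helper: quantiles of a numeric variable within each group.
# The 25%, 50% and 75% columns correspond to the box and its median line;
# 0% and 100% are the raw min and max (ggplot2's whiskers stop at the most
# extreme points within 1.5 * IQR of the box, so they can differ).
group_quartiles <- function(x, group, zero_as_na = FALSE) {
  if (zero_as_na) x[x == 0] <- NA
  t(sapply(split(x, group), quantile,
           probs = c(0, 0.25, 0.5, 0.75, 1), na.rm = TRUE))
}

# Made-up toy values for illustration only
bmi     <- c(22.0, 25.0, 0, 30.0, 35.0, 40.0)
outcome <- factor(c(0, 0, 0, 1, 1, 1))
group_quartiles(bmi, outcome, zero_as_na = TRUE)   # medians: 23.5 (group 0), 35 (group 1)
```

On the notebook's data, the call would be group_quartiles(data$BMI, data$Outcome, zero_as_na = TRUE).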


Scatter Plot
In [29]:

p2 <- ggplot(data, aes(x=Age, y=Glucose, col=Outcome)) + geom_point()

#loess method:local regression fitting

p2 + geom_smooth(method="loess")

`geom_smooth()` using formula 'y ~ x'

INFERENCE: The blue line represents patients with diabetes (Outcome = 1). Its glucose level is clearly higher than that of the line for patients without diabetes (Outcome = 0), i.e., patients with diabetes tend to have higher glucose levels.
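geom_smooth(method = "loess") fits a separate loess curve for each colour group (here one per Outcome level) using stats::loess(), with span = 0.75 by default. Loess is locally weighted polynomial regression: to estimate the curve at a point, it fits a low-degree polynomial (degree 2 by default) to the nearest span fraction of the data, weighting each neighbour by a tricube function of its distance. A quick sanity check of that idea: when the data are exactly quadratic, every local quadratic fit is exact, so loess reproduces the curve. The noiseless toy data and the choice surface = "direct" (compute each local fit exactly instead of interpolating between fits) are assumptions made for this illustration:

```r
# Noiseless quadratic toy data (made up for illustration)
x <- seq(0, 10, by = 0.25)
y <- 3 + 2 * x - 0.5 * x^2

# span = 0.75 and degree = 2 are the loess() defaults and match geom_smooth()
fit <- loess(y ~ x, span = 0.75, degree = 2,
             control = loess.control(surface = "direct"))

new_x <- c(1, 5, 9)
pred  <- predict(fit, newdata = data.frame(x = new_x))
truth <- 3 + 2 * new_x - 0.5 * new_x^2
cbind(x = new_x, loess = pred, truth = truth)   # loess matches the true curve
```

With real, noisy data such as Glucose against Age, the same local fits average out the scatter instead of reproducing it exactly; a smaller span follows the data more closely, a larger one gives a smoother curve.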
