Exp 8_LM

The document outlines Experiment No 8, which focuses on data visualization using bivariate analysis in Python. It covers the aim, learning outcomes, prerequisites, and necessary materials, along with a detailed introduction to bivariate analysis, data cleaning, and visualization techniques. Additionally, it includes a case study on bank telemarketing campaign data, providing step-by-step instructions for data preparation, analysis, and interpretation of results.


Experiment No 8

8.1 Aim/Purpose of the Experiment

To familiarize the students with data visualization using two feature variables.

8.2 Learning Outcomes


Knowledge of data cleaning, data preparation, and data visualization using bivariate analysis in Python.

8.3 Prerequisites

Basic knowledge of programming, Python syntax, and libraries such as Matplotlib and Seaborn.

8.4 Materials/Equipment/Apparatus / Devices/Software required

Jupyter Notebook.

8.5 Introduction and Theory

Bivariate analysis is a statistical method used to examine the relationship between two
variables. In Python, you can perform bivariate analysis using libraries such as NumPy,
Pandas, and Matplotlib/Seaborn for data manipulation, analysis, and visualization. Here's a
brief outline of the process:

 Data Preparation: Load your dataset into a Pandas DataFrame and clean/preprocess
the data if necessary. Ensure that the two variables of interest are numeric or can be
appropriately converted into numeric format.

 Descriptive Analysis: Compute descriptive statistics for each variable separately using
methods like mean, median, standard deviation, etc. This provides initial insights into
the characteristics of the variables.

 Visualization: Create visualizations to explore the relationship between the two variables. Common plots for bivariate analysis include scatter plots, line plots, box plots, and correlation matrices. Seaborn is particularly useful for creating attractive statistical visualizations.

 Correlation Analysis: Calculate the correlation coefficient between the two variables to measure the strength and direction of the linear relationship. The Pearson correlation coefficient is commonly used for this purpose (see the sketch below).
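
The outline above can be condensed into a short script. The following is a minimal sketch, assuming a hypothetical CSV file data.csv with two numeric columns x and y; the file and column names are placeholders, not part of the case study below.

#import the libraries and load the data (hypothetical file and columns).
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
df = pd.read_csv("data.csv")
#descriptive analysis: summary statistics for each variable separately.
print(df[["x", "y"]].describe())
#visualization: scatter plot to inspect the relationship.
sns.scatterplot(data=df, x="x", y="y")
plt.show()
#correlation analysis: Pearson correlation coefficient between x and y.
print(df["x"].corr(df["y"]))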
Case Study:
Term deposits, also called fixed deposits, are cash investments made for a specific time period, ranging from 1 month to 5 years, at predetermined fixed interest rates. The fixed interest rates offered for term deposits are higher than the regular interest rates for savings accounts. The customer receives the total amount (investment plus interest) at the end of the maturity period, and the money can only be withdrawn at the end of that period. Withdrawing money before maturity results in a penalty, and the customer will not receive any interest returns.
Your target is to perform end-to-end EDA on this bank telemarketing campaign data set to determine where the bank should put more effort to improve its positive response rate.

Bivariate Analysis

#import the warnings.
import warnings
warnings.filterwarnings("ignore")

#import the useful libraries.
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline

Session- 2, Data Cleaning

Segment- 2, Data Types
There are multiple data types available in the data set; some are numerical and some are categorical. You are required to get an idea of the data types after reading the data frame.
Following are some of the types of variables (a sketch of inspecting these types appears after the corrected read in Segment- 3):
• Numeric data type: banking dataset: salary, balance, duration and age.
• Categorical data type: banking dataset: education, job, marital, poutcome, month, etc.
• Ordinal data type: banking dataset: age group.
• Time and date type
• Coordinates type of data: latitude and longitude.

#read the data set of "bank telemarketing campaign" in inp0.


inp0= pd.read_csv("bank_marketing_updated_v1.csv")
#Print the head of the data frame.
inp0.head()

Segment- 3, Fixing the Rows and Columns


Checklist for fixing rows:
• Delete summary rows: Total and Subtotal rows
• Delete incorrect rows: Header row and footer row
• Delete extra rows: Column number, indicators, Blank rows, Page No.
Checklist for fixing columns:
• Merge columns for creating unique identifiers, if needed; for example, merge the columns State and City into the column Full address (see the sketch after this checklist).
• Split columns to get more data: split the Address column to get State and City columns to analyse each separately.
• Add column names: add column names if missing.
• Rename columns consistently: abbreviations, encoded columns.
• Delete columns: delete unnecessary columns.
• Align misaligned columns: the data set may have shifted columns, which you need to align correctly.
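
The State/City example in the checklist can be sketched as follows; the DataFrame and column names here are hypothetical, purely to illustrate merging and splitting (the case-study split on the jobedu column appears later in this segment).

#hypothetical address data to illustrate merging and splitting columns.
import pandas as pd
df = pd.DataFrame({"City": ["Kolkata", "Mumbai"], "State": ["WB", "MH"]})
#merge columns to create a unique identifier.
df["Full address"] = df["City"] + ", " + df["State"]
#split a column to get more data: recover City and State again.
df[["City2", "State2"]] = df["Full address"].str.split(", ", expand=True)
print(df)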

#read the file in inp0 skipping the first two rows, as they are of no use.


inp0=pd.read_csv("bank_marketing_updated_v1.csv", skiprows= 2)
#print the head of the data frame.
inp0.head()
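
Returning to the type list in Segment- 2: with the file now read cleanly, here is a brief sketch of inspecting and adjusting the inferred types. The categorical conversion assumes the raw file has a 'marital' column (as listed in Segment- 2), and the ordered age-group conversion is purely hypothetical at this stage.

#inspect the data type pandas inferred for each column.
print(inp0.dtypes)
#label columns can be stored as pandas categoricals (assumes a 'marital' column).
inp0["marital"] = inp0["marital"].astype("category")
#ordinal data is an ordered categorical; a hypothetical age-group example:
# inp0["age group"] = pd.Categorical(inp0["age group"],
#                                    categories=["<30", "30-50", "50+"], ordered=True)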

#drop the customer id as it is of no use.


inp0.drop("customerid", axis=1, inplace=True)
inp0.head()

#Extract job in newly created 'job' column from "jobedu" column.


inp0['job']=inp0.jobedu.apply(lambda x: x.split(",")[0])
inp0.head()

#Extract education in newly created 'education' column from "jobedu" column.

inp0['education']=inp0.jobedu.apply(lambda x: x.split(",")[1])
inp0.head()

#drop the "jobedu" column from the dataframe.


inp0.drop('jobedu',axis= 1, inplace= True)
inp0.head()

#check the rows where month is missing (missing entries are read as float NaN).
inp0[inp0.month.apply(lambda x: isinstance(x, float))]

inp0.isnull().sum()

Segment- 4, Impute/Remove missing values


Take-aways from the lecture on missing values:
• Set values as missing values: identify values that indicate missing data; for example, treat blank strings, "NA", "XX", "999", etc., as missing (see the sketch below).
• Adding is good, exaggerating is bad: you should try to get information from reliable external sources as much as possible, but if you can't, then it is better to retain missing values rather than exaggerating the existing rows/columns.
• Delete rows and columns: rows can be deleted if the number of missing values is insignificant, as this would not impact the overall analysis results. Columns can be removed if the missing values are quite significant in number.
• Fill partial missing values using business judgement: such values include missing time zone, century, etc. These values can be identified easily.
Types of missing values:
• MCAR: It stands for Missing completely at random (the reason behind the missing value
is not dependent on any other feature).
• MAR: It stands for Missing at random (the reason behind the missing value may be
associated with some other features).
• MNAR: It stands for Missing not at random (there is a specific reason behind the missing
value).
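
As a brief sketch of the first take-away above, sentinel strings can be marked as missing before counting nulls; the sentinel list here is illustrative, not taken from the case-study data.

#treat common sentinel strings as missing values across the data frame.
inp0 = inp0.replace(["", "NA", "XX", "999"], np.nan)
#count missing values per column to decide between dropping and imputing.
inp0.isnull().sum()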

#count the missing values in age column.


inp0.age.isnull().sum()

#print the shape of dataframe inp0


inp0.shape

#calculate the percentage of missing values in age column (20 missing out of 45211 rows).

float(100.0*20/45211)

#drop the records with age missing in inp0 and copy in inp1 dataframe.
inp1=inp0[~inp0.age.isnull()].copy()
inp1.shape

#count the missing values in month column in inp1.


inp1.month.isnull().sum()

#calculate the percentage of missing values in month column (50 missing out of 45191 rows).

float(100.0*50/45191)

#find the mode of month in inp1


month_mode=inp1.month.mode()[0]
month_mode

# fill the missing values with mode value of month in inp1.


inp1.month.fillna(month_mode, inplace= True)
inp1.month.value_counts(normalize= True)

#let's see the null values in the month column.


inp1.month.isnull().sum()
0

#count the missing values in response column in inp1.


inp1.response.isnull().sum()
30
#calculate the percentage of missing values in response column.
float(100.0*30/45191)
0.06638489964816004

#drop the records with missing response values in inp1.


inp1= inp1[~inp1.response.isnull()]
#calculate the missing values in each column of data frame: inp1.
inp1.isnull().sum()

#describe the pdays column of inp1.


inp1.pdays.describe()

-1 indicates a missing value. A missing value is not always represented as null. How to handle it:
Objective:
• ignore the missing values in the calculations
• simply make it missing: replace -1 with NaN
• all summary statistics (mean, median, etc.) will then ignore the missing values of pdays

#describe the pdays column after replacing the -1 values with NaN.

inp1.loc[inp1.pdays<0, "pdays"] = np.nan
inp1.pdays.describe()

Session- 4, Bivariate and Multivariate Analysis

Segment- 2, Numeric-numeric analysis
There are three ways to analyse numeric-numeric data types simultaneously:
• Scatter plot: describes how one variable varies with the other.
• Correlation matrix: describes the linearity between two numeric variables.
• Pair plot: a grid of scatter plots of all numeric variables in the data frame.

#plot the scatter plot of balance and salary variable in inp1


plt.scatter(inp1.salary, inp1.balance)
plt.show()

#plot the scatter plot of balance and age variable in inp1


inp1.plot.scatter(x='age', y='balance')
plt.show()

#plot the pair plot of salary, balance and age in inp1 dataframe.
sns.pairplot(data=inp1, vars=["salary","balance", "age"])
plt.show()
#plot the correlation matrix of salary, balance and age in inp1 dataframe.
sns.heatmap(inp1[["salary","balance", "age"]].corr(), annot= True, cmap= "Reds")
plt.show()

Segment- 4, Numerical categorical variable


Salary vs response

#groupby the response to find the mean of the salary with response no & yes separately.
inp1.groupby("response")["salary"].mean()

#groupby the response to find the median of the salary with response no & yes separately.
inp1.groupby("response")["salary"].median()

#plot the box plot of salary for yes & no responses.


sns.boxplot(data=inp1,x="response", y="salary")
plt.show()

#plot the box plot of balance for yes & no responses.


sns.boxplot(data=inp1,x="response", y="balance")
plt.show()

#groupby the response to find the mean of the balance with response no & yes separately.
inp1.groupby("response")["balance"].mean()

#groupby the response to find the median of the balance with response no & yes separately.
inp1.groupby("response")["balance"].median()

#function to find the 75th percentile.

def p75(x):
    return np.quantile(x, 0.75)

#calculate the mean, median and 75th percentile of balance with response.
inp1.groupby("response")["balance"].aggregate(["mean","median",p75])

#plot the bar graph of balance's mean and median with response.

inp1.groupby("response")["balance"].aggregate(["mean","median"]).plot.bar()
plt.show()
Education vs salary

#groupby the education to find the mean of the salary for each education category.

inp1.groupby("education")["salary"].mean()

#groupby the education to find the median of the salary for each education category.
inp1.groupby("education")["salary"].median()

Job vs salary
#groupby the job to find the mean of the salary for each job category.
inp1.groupby('job')['salary'].mean()

#groupby the job to find the median of the salary for each job category.
inp1.groupby('job')['salary'].median()

Segment- 5, Categorical categorical variable


#create response_flag of numerical data type where response "yes"= 1, "no"= 0.
inp1["response_flag"]=np.where(inp1.response=="yes", 1, 0)
inp1.response.value_counts()

inp1.response.value_counts(normalize= True)

inp1.response_flag.mean()

Education vs response rate


#calculate the mean of response_flag with different education categories.
inp1.groupby("education")["response_flag"].mean()

Marital vs response rate


#calculate the mean of response_flag with different marital status categories.
inp1.groupby(["marital"])["response_flag"].mean()

#plot the bar graph of marital status with average value of response_flag.

inp1.groupby(["marital"])["response_flag"].mean().plot.barh()
plt.show()

Loans vs response rate


#plot the bar graph of personal loan status with average value of response_flag.
inp1.groupby(["loan"])["response_flag"].mean().plot.bar()
plt.show()
Housing loans vs response rate
#plot the bar graph of housing loan status with average value of response_flag.
inp1.groupby(["housing"])["response_flag"].mean().plot.bar()
plt.show()

Age vs response
#plot the boxplot of age with response_flag
sns.boxplot(data=inp1, x="response",y="age")
plt.show()

#plot the bar graph of job categories with response_flag mean value.
inp1.groupby(['job'])['response_flag'].mean().plot.barh()
plt.show()

8.6 Operating Procedure


 Open Jupyter Notebook
 Create a new Python notebook file
 Type the code
 Run it
 Take inputs from the user
 Observe the results
 Verify the results manually
 Save the notebook file

8.7 Precautions and/or Troubleshooting


Precautions:
 Save Your Work: Regularly save your Jupyter Notebook to avoid losing your
work. You can save your notebook by clicking on the save icon or using the
keyboard shortcut Ctrl + S (or Cmd + S on Mac).

 Restart Kernel: If you encounter unexpected behavior or errors, try restarting the
kernel. This clears all the variables and imported modules, essentially resetting
the notebook's state. You can restart the kernel by going to the "Kernel" menu
and selecting "Restart."

 Clear Outputs: To reduce clutter and confusion, consider clearing the outputs of
code cells that are no longer relevant. You can do this by selecting "Clear
Outputs" from the "Edit" menu.

 Readability: Keep your code and comments clear and well-organized to make it
easier to understand and maintain. Use markdown cells for explanations,
headings, and documentation.

 Check Dependencies: If you're using external libraries or packages, ensure they are properly installed in your Jupyter environment. You can check the installed packages by running !pip list or !conda list in a code cell.

 Kernel Selection: Make sure you're using the correct kernel for your notebook.
The kernel determines the programming language and environment in which
your code runs. You can change the kernel by clicking on "Kernel" > "Change
kernel" in the menu.

 Resource Usage: Be mindful of the resources your notebook is using, especially if you're working with large datasets or running intensive computations. Check system monitor tools to ensure you're not exhausting memory or CPU resources.

Troubleshooting:
 Syntax Errors: Check for syntax errors in your code. Python is sensitive to
indentation and syntax, so ensure your code is properly formatted.

 Variable Scope: Be aware of variable scope issues, especially if you're reusing variable names or working with nested functions.

 Library Installation: If you encounter a ModuleNotFoundError or similar error, ensure that the required libraries are installed in your Jupyter environment. You can install libraries using !pip install <library> or !conda install <library> in a code cell.

 Kernel Crashes: If the kernel crashes frequently, consider reducing the complexity of your code or optimizing resource usage. Large datasets or intensive computations can sometimes overwhelm the kernel.

 Browser Issues: If you experience rendering or responsiveness issues in the notebook interface, try clearing your browser cache or using a different browser.

 Documentation: Consult the official Jupyter documentation and community forums for additional troubleshooting tips and solutions to common problems.

8.8 Observations
Observe the results obtained in each operation.

8.9 Calculations & Analysis


Calculations should be given for each operation.

8.10 Result & Interpretation

Results should be printed from the Jupyter Notebook and pasted into the laboratory copy.

8.11 Follow-up Questions


 You need to check the relationship between the two variables. Which graph would you
use?
 You need to check if a variable has outliers. Which graph would you use?
 You need to perform a univariate analysis. Which graph will you use?
 What is a data cleaning step?
 What are the ways to handle missing data?
 What are some of the methods for univariate analysis?
 What problems can outliers cause?
8.12 Extension and Follow-up Activities (if applicable)
NA
8.13 Assessments
8.14 Suggested reading
NA
