Cs3361 Datascience Lab Record

manual

Uploaded by

814723104177
Copyright
© © All Rights Reserved

SRM TRP ENGINEERING COLLEGE

IRUNGALUR, TIRUCHIRAPALLI– 621 105

DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING

CS3361- DATA SCIENCE LABORATORY

Name

Roll No.

Reg. No.
SRM TRP ENGINEERING COLLEGE
IRUNGALUR, TIRUCHIRAPALLI-621 105.

RECORD NOTE BOOK

Register Number:

This is to certify that this is the record of practical work for CS3361 DATA SCIENCE LABORATORY
done by Mr./Ms. ____________________ of
III Semester, Department of Computer Science and Engineering, during the academic year
2024-2025.

Faculty in-charge Head of the Department

Submitted for the University Practical Examination on …………………..

Internal Examiner External Examiner


TABLE OF CONTENTS

Ex. No.  Date  Exercise                                                       Page No.  Marks  Sign

1              Download, Install and Explore the Features of NumPy, SciPy,
               Jupyter, Statsmodels and Pandas Packages
2              Working with NumPy Arrays
3              Working with Pandas Data Frames
4              Reading data from text files, Excel and the web and exploring
               various commands for doing descriptive analytics on the Iris
               data set
5              Use the diabetes data set from UCI and Pima Indians Diabetes
               data set for performing the following:
5(a)           Univariate analysis: Frequency, Mean, Median, Mode, Variance,
               Standard Deviation, Skewness and Kurtosis
5(b)           Bivariate analysis: Linear and logistic regression modeling
5(c)           Multiple Regression analysis
6              Apply and explore various plotting functions on UCI data sets
6(a)           Normal curves
6(b)           Density and contour plots
6(c)           Correlation and scatter plots
6(d)           Histograms
6(e)           Three dimensional plotting
7              Visualizing Geographic Data with Basemap


SRM TRP Engineering College, Trichy
Department of Computer Science and Engineering
Vision of the Institute

To carve the youth as dynamic, competent, valued and knowledgeable Technocrats through
research, innovation and entrepreneurial development for accomplishing the global expectations.

Mission of the Institute

M1: To inculcate academic excellence in engineering education to create talented professionals


M2: To promote research in basic sciences and applied engineering among faculty and students to
fulfill the societal expectations.

M3: To enhance the holistic development of students through meaningful interaction with industry
and academia.

M4: To foster the students on par with sustainable development goals thereby contributing to the
process of nation building

M5: To nurture and retain conducive lifelong learning environment towards professional
excellence.

Vision of the Department

To be recognized as Centre of Excellence for innovation and research in computer science and
engineering through the futuristic technologies by developing technocrats with ethical values to
serve the society at global level.

Mission of the Department

M1: To develop quality and technically competent computer professionals
through excellence in academics.

M2: To encourage the faculty and students towards research and
development with advanced tools and technologies.

M3: To enhance industry-institute interaction to build strong technical
expertise among the students.

M4: To inculcate leadership skills with ethical behavior and social
consciousness within the students.

M5: To nurture professional empowerment among students through continuous learning.

Program Educational Objectives (PEO's)

The graduate of Computer Science and Engineering will have

PEO1: Ability to analyze and get solutions in the field of Computer Science and Engineering
through application of fundamental knowledge of Mathematics, Science and Electronics
(Preparation).
PEO2: Innovative ideas, methods and techniques thereby rendering expertise to the industrial
and societal needs in an effective manner and will be a competent computer/software
engineer (Core Competency).

PEO3: Good and broad knowledge with interpersonal skills so as to comprehend, analyze, design
and create novel products and solutions for real-time applications (Breadth).

PEO4: Professional with ethical values to develop leadership, effective communication skills and
teamwork to excel in career. (Professionalism)

PEO5: Strive to learn continuously and update their knowledge in the specific fields of computer
science & engineering for the societal growth. (Learning environment).

Program Outcomes (PO'S)

PO1: Engineering knowledge: Apply the basic knowledge of science, mathematics and
engineering fundamentals in the field of Computer Science and Engineering to solve complex
engineering problems.

PO2: Problem analysis: Ability to use basic principles of mathematics, natural sciences, and
engineering sciences to Identify, formulate, review research literature and analyze Computer
Science and engineering problems.
PO3: Design/development of solutions: Ability to design solutions for complex Computer
Science and engineering problems and basic design system to meet the desired needs within
realistic constraints such as manufacturability, durability, reliability, sustainability and economy
with appropriate consideration for the public health, safety, cultural, societal, and environmental
considerations.
PO4: Conduct investigations of complex problems: Ability to execute experimental
activities using research-based knowledge and methods, including analysis and interpretation of
data and results with valid conclusions.

PO5: Modern tool usage: Ability to use state-of-the-art techniques, skills and modern
engineering tools necessary for engineering practice to satisfy the needs of the society with an
understanding of the limitations.

PO6: The Engineer and Society: Ability to apply reasoning informed by the contextual
knowledge to assess the impact of Computer Science and engineering solutions in legal, health,
cultural, safety and societal context and the consequent responsibilities relevant to the professional
engineering practice.

PO7: Environment and sustainability: Ability to understand the professional responsibility and
accountability to demonstrate the need for sustainable development globally in Computer
Science domain with consideration of environmental effect.

PO8: Ethics: Ability to understand and apply ethical principles and commitment to address the
professional ethical responsibilities of an engineer.

PO9: Individual and team work: Ability to function efficiently as an individual or as a group
member or leader in a team in multidisciplinary environment.

PO10: Communication: Ability to communicate, comprehend and present effectively with the
engineering community and the society at large on complex engineering activities by receiving
clear instructions for preparing effective reports, design documentation and presentations.

PO11: Project management and finance: Ability to acquire and demonstrate the knowledge of
contemporary issues related to finance and managerial skills in one’s own work, as a member and
leader in a team, to manage projects and in multidisciplinary environments.

PO12: Life-long learning: Ability to recognize and adapt to the emerging field of application in
engineering and technology by developing self-confidence for lifelong learning process.

Program Specific Outcome (PSO's)


The graduates of Bachelor of Engineering in Computer Science and Engineering Programme will
be able to:
PSO1: Use Data structures, Data management, Networking, System software, Data science with
high end programming skills to design and implement automation in various domains of emerging
technologies.
PSO2: Apply engineering knowledge in project development with the end products and services
in the field of hardware and software platform to accomplish the industry expectations.
COURSE OUTCOMES: At the end of this course, the students will be able to:
CO1: Make use of the python libraries for data science
CO2: Make use of the basic Statistical and Probability measures for data science.
CO3: Perform descriptive analytics on the benchmark data sets.
CO4: Perform correlation and regression analytics on standard data sets
CO5: Present and interpret data using visualization packages in Python.

CO’s-PO’s &PSO’s MAPPING


Ex. No: 1
Date:

DOWNLOAD, INSTALL AND EXPLORE THE FEATURES OF NUMPY, SCIPY, JUPYTER, STATSMODELS AND
PANDAS PACKAGES

INTRODUCTION – SOFTWARE INSTALLATION

Software required

Spyder IDE.

What is Spyder?

Spyder is an open-source, cross-platform IDE.
It is written completely in Python.
Its name stands for Scientific Python Development Environment.

Features of Spyder

Syntax highlighting
Breakpoints for debugging
Run configurations
Automatic colon insertion after if, while, etc.
Support for all IPython commands
Inline display of graphics produced using Matplotlib
Additional panes such as help, file explorer and find in files

SPYDER IDE Installation

Spyder is included by default in the Anaconda Python distribution.

Step 1: Go to the Anaconda website, https://www.anaconda.com.

Step 2: Click "Get Started" and then click the download option.

Step 3: Choose the version that is suitable for your OS and click Download.

Step 4: Complete the setup and click Finish.

Step 5: Launch Spyder from the Anaconda Navigator.
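
Once the installation finishes, it can help to confirm that the packages named in this exercise are importable. The following is a minimal sketch, not part of the original record; the helper name installed_versions is hypothetical, and it assumes each package exposes a __version__ attribute, which is the usual convention for these libraries.

```python
import importlib


def installed_versions(package_names):
    """Return {name: version string, or None if the package cannot be imported}."""
    versions = {}
    for name in package_names:
        try:
            module = importlib.import_module(name)
        except ImportError:
            versions[name] = None
            continue
        # Most scientific packages expose __version__; fall back to "unknown".
        versions[name] = getattr(module, "__version__", "unknown")
    return versions


report = installed_versions(["numpy", "scipy", "pandas", "statsmodels"])
for name, version in report.items():
    print(f"{name:12s} {version if version else 'not installed'}")
```

Run from the Spyder console or a Jupyter cell, this prints one line per package, so a missing package shows up before the later exercises depend on it.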
2) Working With Numpy Arrays

import numpy as np
a=np.array([[1,2,3],[4,5,6]])
b=np.array([[10,11,12],[13,14,15]])
c= a + b
print(c)
[[11 13 15]
[17 19 21]]

import numpy as np
a= np.array([[1,2,3],[4,5,6]])
b= 3 * a
print(b)
[[ 3 6 9]
[12 15 18]]

import numpy as np
i=np.eye(4)
print(i)

[[1. 0. 0. 0.]
[0. 1. 0. 0.]
[0. 0. 1. 0.]
[0. 0. 0. 1.]]

import numpy as np
a=np.array([[1,2,3],[4,5,6], [7,8,9]])
b=np.array([[2,3,4],[5,6,7],[8,9,10]])
c= a@b
print(c)

[[ 36 42 48]
[ 81 96 111]
[126 150 174]]

import numpy as np
a =np.array([[1,2,3],[4,5,6], [7,8,9]])
b = a.T
print(b)
[[1 4 7]
[2 5 8]
[3 6 9]]

import numpy as np
a =np.array([[2.5, 3.8, 1.5],[4.7, 2.9, 1.56]])
b = a.astype('int')
print(b)
[[2 3 1]
[4 2 1]]
import numpy as np
a1 =np.array([[1,2,3],[4,5,6]])
a2 = np.array([[7,8,9],[10,11,12]])
c = np.hstack((a1, a2))
print(c)

[[ 1 2 3 7 8 9]
[ 4 5 6 10 11 12]]

import numpy as np
a = np.array([[1,2],[3,4], [5,6]])
b = np.array([[7,8],[9,10], [10,11]])
c = np.vstack((a, b))
print(c)

[[ 1 2]
[ 3 4]
[ 5 6]
[ 7 8]
[ 9 10]
[10 11]]

import numpy as np
values = [x for x in range(0, 101, 2)]  # renamed so the built-in list is not shadowed
a=np.array(values)
print(a)
[ 0 2 4 6 8 10 12 14 16 18 20 22 24 26 28 30 32 34
36 38 40 42 44 46 48 50 52 54 56 58 60 62 64 66 68 70
72 74 76 78 80 82 84 86 88 90 92 94 96 98 100]

import numpy as np
a =np.full((2, 3), 5)
print(a)

[[5 5 5]
[5 5 5]]

import numpy as np
a = np.array ([[1, 4, 2], [3, 4, 6],[0, -1, 5]])
print (np.sort(a, axis = None))
print (np.sort(a, axis = 1))
print (np.sort(a, axis = 0))

[-1 0 1 2 3 4 4 5 6]
[[ 1 2 4]
[ 3 4 6]
[-1 0 5]]
[[ 0 -1 2]
[ 1 4 5]
[ 3 4 6]]
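
The programs above use both scalar multiplication with * and the matrix product with @. As a small illustrative sketch (the 2x2 arrays are made up for this example), the difference between element-wise multiplication and the matrix product can be checked directly:

```python
import numpy as np

a = np.array([[1, 2], [3, 4]])
b = np.array([[5, 6], [7, 8]])

elementwise = a * b   # multiplies matching entries of a and b
matrix_prod = a @ b   # rows of a combined with columns of b

print(elementwise)
print(matrix_prod)
```

The two results differ in general, so * should not be used where a true matrix product is intended.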
3) Working with pandas data frames

import pandas as pd
df =pd.DataFrame()
print(df)

Empty DataFrame
Columns: []
Index: []

import pandas as pd
data =[1,2,3,4,5]
df =pd.DataFrame(data)
print (df)

0
0 1
1 2
2 3
3 4
4 5

import pandas as pd
data = [['Alex',10],['Bob',12],['Clarke',13]]
df = pd.DataFrame(data,columns=['Name','Age'])
print (df)
Name Age
0 Alex 10
1 Bob 12
2 Clarke 13

import pandas as pd
data = {'Name':['Tom', 'Jack', 'Steve','Ricky'],'Age':[28,34,29,42]}
df = pd.DataFrame(data)
print (df)
Name Age
0 Tom 28
1 Jack 34
2 Steve 29
3 Ricky 42

import pandas as pd
d = { 'one' :pd.Series([1, 2, 3], index=['a', 'b', 'c']),
'two' :pd.Series([1, 2, 3, 4], index=['a', 'b', 'c', 'd'])}
df = pd.DataFrame(d)
print (df)
one two
a 1.0 1
b 2.0 2
c 3.0 3
d NaN 4
import pandas as pd
data ={ 'Name':['Tom', 'Jack', 'Steve','Ricky'],'Age':[28,34,29,42]}
df =pd.DataFrame(data)
print (df)
df_sorted = df.sort_values(by='Name')
print ("Sorted data frame…")
print(df_sorted)

Name Age
0 Tom 28
1 Jack 34
2 Steve 29
3 Ricky 42
Sorted data frame…
Name Age
1 Jack 34
3 Ricky 42
2 Steve 29
0 Tom 28

import pandas as pd
d = { 'one' :pd.Series([1, 2, 3],
index=['a', 'b', 'c']),
'two' :pd.Series([1, 2, 3, 4],
index=['a', 'b', 'c', 'd'])}
df =pd.DataFrame(d)
print(df [ 'one'])
a 1.0
b 2.0
c 3.0
d NaN
Name: one, dtype: float64

import pandas as pd
d = { 'one' :pd.Series([1, 2, 3], index=['a', 'b', 'c']),
'two' :pd.Series([1, 2, 3, 4], index=['a', 'b', 'c', 'd'])}
df = pd.DataFrame(d)
df['three']=pd.Series([10,20,30],index=['a','b','c'])
print(df)
one two three
a 1.0 1 10.0
b 2.0 2 20.0
c 3.0 3 30.0
d NaN 4 NaN
import pandas as pd
d = { 'one' :pd.Series([1, 2, 3], index=['a', 'b', 'c']),
'two' :pd.Series([1, 2, 3, 4], index=['a', 'b', 'c', 'd']),
'three' :pd.Series([10,20,30], index=['a','b','c'])}
df = pd.DataFrame(d)
print ("Deleting the first column using DEL function:")
del df['one']
print(df)

Deleting the first column using DEL function:


two three
a 1 10.0
b 2 20.0
c 3 30.0
d 4 NaN

import pandas as pd
d = { 'one' :pd.Series([1, 2, 3], index=['a', 'b', 'c']),
'two' :pd.Series([1, 2, 3, 4], index=['a', 'b', 'c', 'd'])}
df = pd.DataFrame(d)
print(df.loc['b'])
one 2.0
two 2.0
Name: b, dtype: float64

import pandas as pd
df =pd.DataFrame([[1, 2], [3, 4]], columns = ['a','b'])
df2 =pd.DataFrame([[5, 6], [7, 8]], columns = ['a','b'])
df = pd.concat([df, df2])  # DataFrame.append was removed in pandas 2.0
print (df)

a b
0 1 2
1 3 4
0 5 6
1 7 8

import pandas as pd
df =pd.DataFrame([[1, 2], [3, 4]], columns = ['a','b'])
df2 =pd.DataFrame([[5, 6], [7, 8]], columns = ['a','b'])
df = pd.concat([df, df2])  # DataFrame.append was removed in pandas 2.0
# Drop rows with label 0
df = df.drop(0)
print (df)
   a  b
1  3  4
1  7  8
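
Stacking two frames keeps their original row labels, which is why a label such as 0 can appear twice. The sketch below is an illustration rather than part of the record; it contrasts the default behaviour of pd.concat with ignore_index=True and shows how that changes what drop(0) removes.

```python
import pandas as pd

df1 = pd.DataFrame([[1, 2], [3, 4]], columns=["a", "b"])
df2 = pd.DataFrame([[5, 6], [7, 8]], columns=["a", "b"])

kept = pd.concat([df1, df2])                            # labels 0, 1, 0, 1
renumbered = pd.concat([df1, df2], ignore_index=True)   # labels 0, 1, 2, 3

print(kept.drop(0))        # removes both rows labelled 0
print(renumbered.drop(0))  # removes only the first row
```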
4) Reading data from text files, Excel and the web and
exploring various commands for doing descriptive
analytics on the Iris data set.

import pandas as pd
df=pd.read_csv("iris_csv.csv")
df.head()
df.shape
df.info()
df.describe()
df.isnull().sum()
df.value_counts("class")
Output:
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 150 entries, 0 to 149
Data columns (total 5 columns):
 #   Column       Non-Null Count  Dtype
---  ------       --------------  -----
 0   sepallength  150 non-null    float64
 1   sepalwidth   150 non-null    float64
 2   petallength  150 non-null    float64
 3   petalwidth   150 non-null    float64
 4   class        150 non-null    object
dtypes: float64(4), object(1)
memory usage: 6.0+ KB
class
Iris-setosa        50
Iris-versicolor    50
Iris-virginica     50
dtype: int64
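
The exercise title also mentions text files, Excel and the web, while the program above reads a CSV file only. A minimal sketch of how the other sources might be handled follows. The helper load_table is hypothetical, the three sample rows are illustrative values in the Iris column layout, and only the in-memory text branch is exercised here so that the sketch runs without extra files, an Excel engine or a network connection.

```python
import io
import pandas as pd


def load_table(source, kind):
    """Load a table with the pandas reader matching `kind` (hypothetical helper)."""
    if kind == "csv":      # comma-separated file, buffer, or http(s) URL
        return pd.read_csv(source)
    if kind == "text":     # tab-delimited text file or buffer
        return pd.read_csv(source, sep="\t")
    if kind == "excel":    # .xlsx workbook; needs an engine such as openpyxl
        return pd.read_excel(source)
    raise ValueError(f"unsupported kind: {kind}")


# Illustrative rows in the Iris column layout, used only for this demonstration.
sample = io.StringIO(
    "sepallength\tsepalwidth\tpetallength\tpetalwidth\tclass\n"
    "5.1\t3.5\t1.4\t0.2\tIris-setosa\n"
    "7.0\t3.2\t4.7\t1.4\tIris-versicolor\n"
    "6.3\t3.3\t6.0\t2.5\tIris-virginica\n"
)
iris_sample = load_table(sample, "text")

print(iris_sample.shape)
print(iris_sample.describe())
print(iris_sample["class"].value_counts())
```

Passing a URL string with kind "csv" would read the same format from the web, since pd.read_csv accepts URLs as well as local paths.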
5a) Univariate analysis: Frequency, Mean, Median, Mode,
Variance, Standard Deviation, Skewness and Kurtosis.

import pandas as pd
import numpy as np
import statistics as st
df=pd.read_csv("diabetes_csv.csv")
print(df.shape)
print(df.info())
# numeric_only=True skips the text column 'class'
print('MEAN:\n',df.mean(numeric_only=True))
print('MEDIAN:\n',df.median(numeric_only=True))
print('MODE:\n',df.mode())
print('STANDARD DEVIATION:\n',df.std(numeric_only=True))
print('VARIANCE:\n',df.var(numeric_only=True))
print('SKEWNESS:\n',df.skew(numeric_only=True))
print('KURTOSIS:\n',df.kurtosis(numeric_only=True))
print(df.describe())

Output:
(768, 9)
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 768 entries, 0 to 767
Data columns (total 9 columns):
 #   Column  Non-Null Count  Dtype
---  ------  --------------  -----
 0   preg    768 non-null    int64
 1   plas    768 non-null    int64
 2   pres    768 non-null    int64
 3   skin    768 non-null    int64
 4   insu    768 non-null    int64
 5   mass    768 non-null    float64
 6   pedi    768 non-null    float64
 7   age     768 non-null    int64
 8   class   768 non-null    object
dtypes: float64(2), int64(6), object(1)
memory usage: 54.1+ KB
None
MEAN:
preg      3.845052
plas    120.894531
pres     69.105469
skin     20.536458
insu     79.799479
mass     31.992578
pedi      0.471876
age      33.240885
dtype: float64
MEDIAN:
preg      3.0000
plas    117.0000
pres     72.0000
skin     23.0000
insu     30.5000
mass     32.0000
pedi      0.3725
age      29.0000
dtype: float64
MODE:
   preg  plas  pres  skin  insu  mass   pedi   age            class
0   1.0    99  70.0   0.0   0.0  32.0  0.254  22.0  tested_negative
1   NaN   100   NaN   NaN   NaN   NaN  0.258   NaN              NaN
STANDARD DEVIATION:
preg      3.369578
plas     31.972618
pres     19.355807
skin     15.952218
insu    115.244002
mass      7.884160
pedi      0.331329
age      11.760232
dtype: float64
VARIANCE:
preg       11.354056
plas     1022.248314
pres      374.647271
skin      254.473245
insu    13281.180078
mass       62.159984
pedi        0.109779
age       138.303046
dtype: float64
SKEWNESS:
preg    0.901674
plas    0.173754
pres   -1.843608
skin    0.109372
insu    2.272251
mass   -0.428982
pedi    1.919911
age     1.129597
dtype: float64
KURTOSIS:
preg    0.159220
plas    0.640780
pres    5.180157
skin   -0.520072
insu    7.214260
mass    3.290443
pedi    5.594954
age     0.643159
dtype: float64
(df.describe() table: count, mean, std, min, 25%, 50%, 75% and max for each of the eight numeric columns)
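
The exercise title lists frequency alongside the other measures. The following small sketch, which uses a made-up series of ages rather than the diabetes file, shows one way to obtain a frequency table with value_counts together with the same summary statistics on a single column.

```python
import pandas as pd

# Made-up ages, used only to illustrate the calls.
ages = pd.Series([21, 22, 22, 25, 29, 31, 33, 50], name="age")

frequency = ages.value_counts().sort_index()   # how often each value occurs
print(frequency)
print("mean:", ages.mean())
print("median:", ages.median())
print("mode:", ages.mode().tolist())
print("variance:", ages.var())       # sample variance (ddof=1)
print("std:", ages.std())
print("skewness:", ages.skew())      # positive: longer tail on the right
print("kurtosis:", ages.kurtosis())  # excess kurtosis: 0 for a normal distribution
```

The single large value (50) pulls the mean above the median and gives a positive skewness, which is the same pattern the insu and pedi columns show in the recorded output.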


5b) Bivariate analysis: Linear and logistic regression
modeling
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

from sklearn import datasets


from sklearn.model_selection import train_test_split
from sklearn import linear_model
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import cross_val_score

# Load the diabetes dataset


diabetes = datasets.load_diabetes()

# Put the data into a DataFrame


df = pd.DataFrame(diabetes['data'], columns=diabetes['feature_names'])
x = df
y = diabetes['target']

# Split the data into training and testing sets


x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.3,

# Initialize the Linear Regression model


model = linear_model.LinearRegression()

# Train the model


model.fit(x_train, y_train)

# Predict the test set results


y_pre = model.predict(x_test)

# Cross Validation Scores


scores = cross_val_score(model, x, y, scoring="neg_mean_squared_error", c
rmse_scores = np.sqrt(-scores).mean()
print('Cross validation RMSE:', rmse_scores)

# Checking predictions accuracy by r2 Score


r2 = r2_score(y_test, y_pre)
print('r^2:', r2)

# Calculating Root Mean Square Error
mse = mean_squared_error(y_test, y_pre)
rmse = np.sqrt(mse)
print('RMSE:', rmse)

# Getting Weights and Intercept of Model


print("Weights:", model.coef_)
print("\nIntercept:", model.intercept_)
Output:
Cross validation RMSE: 54.40468149952541
r^2: 0.45767579788519963
RMSE: 58.00932552866432
Weights: [  -8.02358048 -308.83941066  583.63743356  299.99074281 -360.66454462
   95.11692608  -93.03587104  118.15977759  662.11309186   26.07805489]

Intercept: 153.72032548545178
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

from sklearn import datasets


from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import r2_score, mean_squared_error

# Load the diabetes dataset


diabetes = datasets.load_diabetes()

# Print the keys to find the content of data


print(diabetes.keys())

# Put the data into a DataFrame


df = pd.DataFrame(diabetes['data'], columns=diabetes['feature_names'])
x = df
y = diabetes['target']

# Split the data into training and testing sets


x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.3,

# Build Logistic Regression Model


model = LogisticRegression()
model.fit(x_train, y_train)

# Prediction of test set result of the Prepared Model


y_pre = model.predict(x_test)

# Checking predictions accuracy by r2 Score


r2 = r2_score(y_test, y_pre)
print('r^2:', r2)

# Calculating Root Mean Square Error


mse = mean_squared_error(y_test, y_pre)
rmse = np.sqrt(mse)
print('RMSE:', rmse)

Output:
dict_keys(['data', 'target', 'frame', 'DESCR', 'feature_names', 'data_fil
ename', 'target_filename', 'data_module'])
r^2: -0.44401265478624397
RMSE: 94.65723681369009
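
LogisticRegression is a classifier, so it expects a categorical target, whereas the target of load_diabetes is a continuous progression score. As a hedged sketch of one way to set this up, the example below derives a binary label by thresholding the score at its median; this threshold, the random_state value and the added StandardScaler step are assumptions made for illustration and do not come from the record.

```python
import numpy as np
from sklearn import datasets
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

diabetes = datasets.load_diabetes()
x = diabetes["data"]

# Assumption for illustration: label 1 when the progression score is above
# the median, 0 otherwise. This threshold is not taken from the record.
y_binary = (diabetes["target"] > np.median(diabetes["target"])).astype(int)

x_train, x_test, y_train, y_test = train_test_split(
    x, y_binary, test_size=0.3, random_state=0, stratify=y_binary
)

# Standardise the features, then fit the classifier.
clf = make_pipeline(StandardScaler(), LogisticRegression())
clf.fit(x_train, y_train)

accuracy = accuracy_score(y_test, clf.predict(x_test))
print("Accuracy:", accuracy)
```

With a class label as the target, accuracy is a more natural measure than r^2 or RMSE, which are intended for continuous predictions.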
5c) Multiple Regression analysis

import matplotlib.pyplot as plt


import numpy as np
from sklearn import datasets, linear_model, metrics
import pandas as pd

# Read the CSV file


df = pd.read_csv('diabetes.csv')

# Extract relevant features and target

data = df[['Age', 'Glucose', 'BMI', 'BloodPressure', 'Pregnancies']] # C
target = df[['Outcome']]

print(data)
print(target)

# Define feature matrix (X) and response vector (y)


X = data
y = target

# Split X and y into training and testing sets


from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3,

# Create linear regression object


reg = linear_model.LinearRegression()

# Train the model using the training sets


reg.fit(X_train, y_train)

# Predictions on the test set


y_predict = reg.predict(X_test)

# Regression coefficients
print('Coefficients:', reg.coef_)

# Variance score: 1 means perfect prediction


print('Variance score: {}'.format(reg.score(X_test, y_test)))

# Checking predictions accuracy by r2 Score
from sklearn.metrics import r2_score
print('r^2:', r2_score(y_test, y_predict))

# Calculating Root Mean Square Error
from sklearn.metrics import mean_squared_error
mse = mean_squared_error(y_test, y_predict)
rmse = np.sqrt(mse)
print('RMSE:', rmse)
Output:
Age Glucose BMI BloodPressure Pregnancies
0 50 148 33.6 72 6
1 31 85 26.6 66 1
2 32 183 23.3 64 8
3 21 89 28.1 66 1
4 33 137 43.1 40 0
.. ... ... ... ... ...
763 63 101 32.9 76 10
764 27 122 36.8 70 2
765 30 121 26.2 72 5
766 47 126 30.1 60 1
767 23 93 30.4 70 1

[768 rows x 5 columns]


Outcome
0 1
1 0
2 1
3 0
4 1
.. ...
763 0
764 0
765 0
766 1
767 0

[768 rows x 1 columns]


Coefficients: [[ 0.00362921  0.0057603   0.01359201 -0.0022797   0.01903324]]
Variance score: 0.3119613858813981
r^2: 0.3119613858813981
RMSE: 0.3958061749043919
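
To see what the fitted coefficients, r^2 and RMSE mean, the same quantities can be computed by ordinary least squares directly in NumPy. This is a sketch on synthetic data with known weights, not on the diabetes file; the true intercept 2.0 and weights 1.5 and -0.5 are chosen only for the illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data (not from the diabetes file): y = 2.0 + 1.5*x1 - 0.5*x2 + noise
X = rng.normal(size=(200, 2))
y = 2.0 + 1.5 * X[:, 0] - 0.5 * X[:, 1] + rng.normal(scale=0.1, size=200)

# Prepend a column of ones so the first coefficient is the intercept.
design = np.column_stack([np.ones(len(X)), X])
coef, *_ = np.linalg.lstsq(design, y, rcond=None)

predictions = design @ coef
ss_res = np.sum((y - predictions) ** 2)      # unexplained variation
ss_tot = np.sum((y - y.mean()) ** 2)         # total variation
r2 = 1 - ss_res / ss_tot
rmse = np.sqrt(np.mean((y - predictions) ** 2))

print("intercept and weights:", coef)
print("r^2:", r2, "RMSE:", rmse)
```

Because the noise is small, the recovered coefficients land close to the values used to generate the data, and r^2 is close to 1.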
6a) Normal curves

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
df=pd.read_csv("Heart.csv")
f,ax=plt.subplots(figsize=(10,6))
x=df['Age']
ax=sns.histplot(x,bins=10,kde=True,stat="density")  # distplot is deprecated in recent seaborn releases
plt.show()

Output:
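
A normal curve can also be drawn explicitly from the sample mean and standard deviation. The sketch below is an illustration under stated assumptions: it uses synthetic ages (mean 54, standard deviation 9, both made up) in place of the Age column of Heart.csv, and it selects the off-screen Agg backend so that it runs without a display; in Spyder or Jupyter that line can be dropped and plt.show() called instead.

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # off-screen backend so the sketch runs without a display
import matplotlib.pyplot as plt

rng = np.random.default_rng(1)
ages = rng.normal(loc=54, scale=9, size=300)   # synthetic stand-in for df['Age']

mu, sigma = ages.mean(), ages.std(ddof=1)
grid = np.linspace(mu - 4 * sigma, mu + 4 * sigma, 400)
# Normal probability density: exp(-(x-mu)^2 / (2 sigma^2)) / (sigma sqrt(2 pi))
pdf = np.exp(-((grid - mu) ** 2) / (2 * sigma**2)) / (sigma * np.sqrt(2 * np.pi))

fig, ax = plt.subplots(figsize=(10, 6))
ax.hist(ages, bins=10, density=True, alpha=0.4, label="histogram")
ax.plot(grid, pdf, "r", label="fitted normal curve")
ax.set_xlabel("Age")
ax.legend()

# Trapezoidal area under the curve; it should be very close to 1.
area = np.sum((pdf[1:] + pdf[:-1]) / 2 * np.diff(grid))
print("area under curve:", area)
```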
6b) Density and contour plots

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
df=pd.read_csv("Heart.csv")
f,ax=plt.subplots(figsize=(10,6))
x=df['Age']
x=pd.Series(x,name="Age variable")
ax=sns.kdeplot(x,fill=True,color='r')  # fill replaces the deprecated shade argument
plt.show()

Output:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
df=pd.read_csv("Heart.csv")
f,ax=plt.subplots(figsize=(8,6))
ax=sns.countplot(x='ChestPain',data=df)
plt.show()

Output:
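
The programs in this part cover a density plot and a count plot. For the contour plot named in the heading, a minimal sketch with Matplotlib is given below. It evaluates a standard two-dimensional normal density on a grid, which is a synthetic surface chosen for illustration and not data from Heart.csv, and it uses the Agg backend so that it runs without a display.

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # off-screen backend so the sketch runs without a display
import matplotlib.pyplot as plt

x = np.linspace(-3, 3, 61)
y = np.linspace(-3, 3, 61)
X, Y = np.meshgrid(x, y)
# Standard 2D normal density evaluated at every grid point.
Z = np.exp(-(X**2 + Y**2) / 2) / (2 * np.pi)

fig, ax = plt.subplots(figsize=(8, 6))
filled = ax.contourf(X, Y, Z, levels=10, cmap="viridis")    # filled bands
ax.contour(X, Y, Z, levels=10, colors="k", linewidths=0.5)  # contour lines
fig.colorbar(filled, ax=ax, label="density")
ax.set_title("Contour plot of a 2D normal density")
```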
6c) Correlation and scatter plots

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
df=pd.read_csv("Heart.csv")
sns.pairplot(data=df)

Output:
<seaborn.axisgrid.PairGrid at 0x1a1534e2130>
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
df=pd.read_csv("Heart.csv")
plt.figure(figsize=(20,10))
# numeric_only=True leaves out the text columns, which have no correlation
sns.heatmap(df.corr(numeric_only=True),annot=True,cmap='terrain')

Output:

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
df=pd.read_csv("Heart.csv")
f,ax=plt.subplots(figsize=(8,6))
ax=sns.scatterplot(x="Age",y="RestBP",data=df)
6d) Histograms

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
df=pd.read_csv("Heart.csv")
df.hist(figsize=(12,12),layout=(5,3))

Output:
array([[<AxesSubplot:title={'center':'Unnamed: 0'}>,
<AxesSubplot:title={'center':'Age'}>,
<AxesSubplot:title={'center':'Sex'}>],
[<AxesSubplot:title={'center':'RestBP'}>,
<AxesSubplot:title={'center':'Chol'}>,
<AxesSubplot:title={'center':'Fbs'}>],
[<AxesSubplot:title={'center':'RestECG'}>,
<AxesSubplot:title={'center':'MaxHR'}>,
<AxesSubplot:title={'center':'ExAng'}>],
[<AxesSubplot:title={'center':'Oldpeak'}>,
<AxesSubplot:title={'center':'Slope'}>,
<AxesSubplot:title={'center':'Ca'}>],
[<AxesSubplot:>, <AxesSubplot:>, <AxesSubplot:>]], dtype=object)
6e) Three dimensional plotting

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
df=pd.read_csv("Heart.csv")
fig = plt.figure()
ax = plt.axes(projection='3d')
x=df["Age"]
x=pd.Series(x,name="Age variable")
y=df["Sex"]
y=pd.Series(y,name="Sex variable")
z=df["Chol"]
z=pd.Series(z,name="Cholesterol Variable")
ax.plot3D(x,y,z,'green')
ax.set_title('3D line plot Heart disease dataset')
plt.show()

Output:

7) Visualizing Geographic Data with Basemap

%matplotlib inline
import numpy as np
import matplotlib.pyplot as plt
import cartopy.crs as ccrs
from cartopy.feature import NaturalEarthFeature

plt.figure(figsize=(8, 8))

# Create a Cartopy Orthographic projection


ax = plt.axes(projection=ccrs.Orthographic(central_latitude=50, central_l

# Add natural features such as coastlines, land, and ocean


ax.add_feature(NaturalEarthFeature('physical', 'land', '50m', edgecolor='
ax.add_feature(NaturalEarthFeature('physical', 'ocean', '50m', edgecolor=

# Set the title


plt.title('Orthographic Projection')

# Show the plot


plt.show()

Output:
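
An orthographic projection shows the globe as seen from far away, so only the hemisphere facing the viewer is visible. The sketch below works through the standard forward formulas in NumPy. The function name orthographic is hypothetical, the centre latitude of 50 follows the program above, the centre longitude of -100 is an assumed value, and the test point is Seattle at longitude -122.3 and latitude 47.6.

```python
import numpy as np


def orthographic(lon_deg, lat_deg, lon0_deg, lat0_deg, radius=1.0):
    """Project lon/lat (degrees) onto the plane; also report visibility."""
    lon, lat = np.radians(lon_deg), np.radians(lat_deg)
    lon0, lat0 = np.radians(lon0_deg), np.radians(lat0_deg)
    x = radius * np.cos(lat) * np.sin(lon - lon0)
    y = radius * (np.cos(lat0) * np.sin(lat)
                  - np.sin(lat0) * np.cos(lat) * np.cos(lon - lon0))
    # Cosine of the angular distance from the centre; negative means far side.
    cos_c = (np.sin(lat0) * np.sin(lat)
             + np.cos(lat0) * np.cos(lat) * np.cos(lon - lon0))
    return x, y, cos_c >= 0


# Centre latitude 50 as in the program above; the centre longitude of -100
# is an assumed value for this sketch.
x, y, visible = orthographic(-122.3, 47.6, lon0_deg=-100.0, lat0_deg=50.0)
print(x, y, visible)
```

The centre of the projection maps to the origin, points west of the centre get a negative x, and any point more than 90 degrees away from the centre is reported as not visible.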
%matplotlib inline
import numpy as np
import matplotlib.pyplot as plt
import cartopy.crs as ccrs
from cartopy.feature import NaturalEarthFeature

# Create a Cartopy Plate Carrée projection


fig, ax = plt.subplots(figsize=(6, 6), subplot_kw={'projection': ccrs.Pla

# Add natural features such as coastlines, land, and ocean


ax.add_feature(NaturalEarthFeature('physical', 'land', '50m', edgecolor='
ax.add_feature(NaturalEarthFeature('physical', 'ocean', '50m', edgecolor=

# Plot Seattle
seattle_lon, seattle_lat = -122.3, 47.6
ax.plot(seattle_lon, seattle_lat, 'ok', markersize=5)
ax.text(seattle_lon, seattle_lat, ' Seattle', fontsize=12, transform=ccrs

# Set the title


plt.title('Plate Carrée Projection')

# Show the plot


plt.show()

Output:
%matplotlib inline
import numpy as np
import matplotlib.pyplot as plt
import cartopy.crs as ccrs
import cartopy.feature as cfeature
from itertools import chain

def draw_map(ax, scale=0.2):


ax.set_global()
ax.add_feature(cfeature.LAND, edgecolor='black', facecolor='lightgray
ax.add_feature(cfeature.COASTLINE, edgecolor='black', linewidth=0.5)
ax.add_feature(cfeature.BORDERS, linestyle='-', linewidth=0.5)

# Draw parallels and meridians


ax.gridlines(linestyle='-', alpha=0.3, color='white', draw_labels=Fal

# Create a figure and axes with PlateCarree projection


fig, ax = plt.subplots(subplot_kw={'projection': ccrs.PlateCarree()}, fig

# Draw the map


draw_map(ax)

# Show the plot


plt.show()

Output:
%matplotlib inline
import numpy as np
import matplotlib.pyplot as plt
import cartopy.crs as ccrs
import cartopy.feature as cfeature
from itertools import chain

def draw_map(ax, scale=0.2):


ax.set_global()
ax.add_feature(cfeature.LAND, edgecolor='black', facecolor='lightgray
ax.add_feature(cfeature.COASTLINE, edgecolor='black', linewidth=0.5)
ax.add_feature(cfeature.BORDERS, linestyle='-', linewidth=0.5)

# Draw parallels and meridians


ax.gridlines(linestyle='-', alpha=0.3, color='white', draw_labels=Fal

# Create a figure and axes with Mollweide projection


fig, ax = plt.subplots(subplot_kw={'projection': ccrs.Mollweide()}, figsi

# Draw the map


draw_map(ax)

# Show the plot


plt.show()

Output:
Content Beyond Syllabus

8. Use NumPy to implement a simple image processing algorithm.

Aim:
To use NumPy to implement a simple image processing algorithm.

Procedure:
Step 1: Install Required Libraries
Make sure you have NumPy, Matplotlib, and Pillow installed. If not, you can install them
using:
pip install numpy matplotlib Pillow

Step 2: Import Libraries

Step 3: Load the Image

Step 4: Convert Image to NumPy Array

Convert the image to a NumPy array for processing:

Step 5: Display the Original Image

Use Matplotlib to display the original image:

Step 6: Grayscale Conversion

Convert the RGB image to grayscale using NumPy

Program:
import numpy as np
import matplotlib.pyplot as plt
from PIL import Image

# Load the image


image_path = "example_image.jpg"
image = Image.open(image_path).convert("RGB")  # ensure three colour channels

# Convert the image to a NumPy array


image_array = np.array(image)

# Display the original image


plt.figure(figsize=(8, 4))
plt.subplot(1, 3, 1)
plt.imshow(image_array)
plt.title('Original Image')

# Grayscale conversion
gray_image = np.mean(image_array, axis=-1, keepdims=True)
# Display the grayscale image
plt.subplot(1, 3, 2)
plt.imshow(np.squeeze(gray_image), cmap='gray')
plt.title('Grayscale Image')

# Contrast adjustment (increase contrast)


adjusted_image = np.clip((gray_image - 100) * 1.5 + 100, 0, 255).astype(np.uint8)

# Display the adjusted image


plt.subplot(1, 3, 3)
plt.imshow(np.squeeze(adjusted_image), cmap='gray')
plt.title('Adjusted Image')

# Show the plots


plt.tight_layout()
plt.show()

Output:

Result:
Thus, a simple image processing algorithm was implemented using NumPy and executed
successfully.
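
The grayscale and contrast steps of the program can be checked on a tiny array without needing example_image.jpg. This is a sketch on a synthetic 4x4 image made up for the purpose; the pivot 100 and factor 1.5 are the same values the program uses.

```python
import numpy as np

# Synthetic 4x4 RGB image (values 0-255) standing in for example_image.jpg.
ramp = np.linspace(0, 255, 16).reshape(4, 4)
image_array = np.stack([ramp, ramp * 0.5, ramp * 0.25], axis=-1).astype(np.uint8)

# Grayscale: average the three colour channels for each pixel.
gray = image_array.mean(axis=-1)

# Contrast: stretch values away from the pivot 100 by a factor of 1.5,
# then clip back into the valid 0-255 range.
adjusted = np.clip((gray - 100) * 1.5 + 100, 0, 255).astype(np.uint8)

print(gray.round(1))
print(adjusted)
```

Pixels darker than the pivot become darker and brighter ones become brighter, so the spread between the smallest and largest value grows; np.clip prevents the result from leaving the 0 to 255 range before the cast to uint8.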
9. Use NumPy to implement a simple algorithm for image classification.

Aim:
To use NumPy to implement a simple algorithm for image classification.

Procedure:
Step 1: Install Required Libraries
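Make sure you have NumPy and scikit-learn installed (the program uses scikit-learn for the
train/test split and the accuracy score):
pip install numpy scikit-learn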

Step 2: Import Libraries

Step 3: Generate or Load a Dataset


For simplicity, let's generate a synthetic dataset. Replace this step with loading your dataset:

Step 4: Split the Dataset


Split the dataset into training and testing sets:

Step 5: Define the Model


Define a simple linear classifier:

Step 6: Training
Train the model using simple gradient descent. Replace this with a more sophisticated
training algorithm for a real-world scenario:

Step 7: Make Predictions


Make predictions on the test set:

Step 8: Evaluate Accuracy


Evaluate the accuracy of the model:

Program:
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

# Generate synthetic dataset


np.random.seed(42)
num_samples = 1000
image_size = 28 * 28 # Assuming 28x28 pixel images
num_classes = 2

X = np.random.rand(num_samples, image_size)
y = np.random.randint(0, num_classes, size=num_samples)

# Split the dataset


X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Define the model


class SimpleClassifier:
    def __init__(self, input_size, output_size):
        self.weights = np.random.rand(input_size, output_size)
        self.bias = np.zeros(output_size)

    def predict(self, X):
        return np.dot(X, self.weights) + self.bias

# Training
def train(model, X, y, learning_rate=0.001, epochs=100):
    # One-hot encode the labels so that each output column has its own target
    targets = np.eye(model.bias.shape[0])[y]
    for epoch in range(epochs):
        predictions = model.predict(X)
        error = predictions - targets
        loss = np.mean(np.sum(error**2, axis=1))

        gradient_weights = 2 * np.dot(X.T, error) / len(y)
        gradient_bias = 2 * np.mean(error, axis=0)

        model.weights -= learning_rate * gradient_weights
        model.bias -= learning_rate * gradient_bias

        if epoch % 10 == 0:
            print(f"Epoch {epoch}, Loss: {loss}")

# Initialize the model


model = SimpleClassifier(input_size=image_size, output_size=num_classes)

# Train the model


train(model, X_train, y_train)

# Make predictions
test_predictions = np.argmax(model.predict(X_test), axis=1)

# Evaluate accuracy
accuracy = accuracy_score(y_test, test_predictions)
print(f"Accuracy: {accuracy}")
Output:

Result:
Thus, a simple algorithm for image classification was implemented using NumPy and executed
successfully.
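
A linear classifier with one output per class, decoded with argmax as in the program above, is commonly trained against one-hot targets. The short sketch below illustrates the encoding and the decoding step; the labels and scores are made up for the example.

```python
import numpy as np

labels = np.array([0, 1, 1, 0, 2])
num_classes = 3

# Row i of the identity matrix is the one-hot vector for class i.
one_hot = np.eye(num_classes, dtype=int)[labels]

# Decoding: the index of the largest score in each row is the predicted class.
scores = np.array([[0.9, 0.1, 0.0],
                   [0.2, 0.7, 0.1],
                   [0.1, 0.6, 0.3],
                   [0.5, 0.4, 0.1],
                   [0.2, 0.2, 0.6]])
predicted = np.argmax(scores, axis=1)
accuracy = np.mean(predicted == labels)

print(one_hot)
print(predicted, accuracy)
```

Since the synthetic data set in the program assigns labels at random, an accuracy near 0.5 is what should be expected there regardless of the encoding.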