Cs3361 Datascience Lab Record
Cs3361 Datascience Lab Record
Name
Roll No.
Reg. No.
SRM TRP ENGINEERING COLLEGE
IRUNGALUR, TIRUCHIRAPALLI-621 105.
Register Number:
This is to certify that this Practical work titled CS3361 DATA SCIENCE LABORATORY record of
work done by Mr./Ms. of
III Semester, Department of Computer Science and Engineering during the academic year
2024-2025.
Ex. Page
Date Exercise No Marks Sign
No.
Download, Install and Explore The Features Of Numpy,
1 Scipy, Jupyter, Statsmodels and Pandas Packages
Reading data from text files, Excel and the web and
4 exploring various commands for doing descriptive
analytics on the Iris data set.
5 Use the diabetes data set from UCI and Pima Indians
Diabetes data set for performing the following:
6(d) Histograms
To carve the youth as dynamic, competent, valued and knowledgeable Technocrats through
research, innovation and entrepreneurial development for accomplishing the global expectations.
M3: To enhance the holistic development of students through meaningful interaction with industry
and academia.
M4: To foster the students on par with sustainable development goals thereby contributing to the
process of nation building
M5: To nurture and retain conducive lifelong learning environment towards professional
excellence.
To be recognized as Centre of Excellence for innovation and research in computer science and
engineering through the futuristic technologies by developing technocrats with ethical values to
serve the society at global level.
PEO1: Ability to analyze and get solutions in the field of Computer Science and Engineering
through application of fundamental knowledge of Mathematics, Science and Electronics
(Preparation).
PEO2: Innovative ideas, methods and techniques thereby rendering expertise to the industrial
and societal needs in an effective manner and will be a competent computer/software
engineer (Core Competency).
PEO3: Good and broad knowledge with interpersonal skills so as to comprehend, analyze, design
and create novel products and solutions for real-time applications (Breadth).
PEO4: Professional with ethical values to develop leadership, effective communication skills and
teamwork to excel in career. (Professionalism)
PEO5: Strive to learn continuously and update their knowledge in the specific fields of computer
science & engineering for the societal growth. (Learning environment).
PO1: Engineering knowledge: Apply the basic knowledge of science, mathematics and
engineering fundamentals in the field of Computer Science and Engineering to solve complex
engineering problems.
PO2: Problem analysis: Ability to use basic principles of mathematics, natural sciences, and
engineering sciences to Identify, formulate, review research literature and analyze Computer
Science and engineering problems.
PO3: Design/development of solutions: Ability to design solutions for complex Computer
Science and engineering problems and basic design system to meet the desired needs within
realistic constraints such as manufacturability, durability, reliability, sustainability and economy
with appropriate consideration for the public health, safety, cultural, societal, and environmental
considerations.
PO4: Conduct investigations of complex problems: Ability to execute the experimental
activities using research-based knowledge and methods including analyze, interpret the data and
results with valid conclusion.
PO5: Modern tool usage: Ability to use state of the art of techniques, skills and modern
engineering tools necessary for engineering practice to satisfy the needs of the society with an
understanding of the limitations.
PO6: The Engineer and Society: Ability to apply reasoning informed by the contextual
knowledge to assess the impact of Computer Science and engineering solutions in legal, health,
cultural, safety and societal context and the consequent responsibilities relevant to the professional
engineering practice.
PO7: Environment and sustainability: Ability to understand the professional responsibility and
accountability to demonstrate the need for sustainable development globally in Computer
Science domain with consideration of environmental effect.
PO8: Ethics: Ability to understand and apply ethical principles and commitment to address the
professional ethical responsibilities of an engineer.
PO9: Individual and team work: Ability to function efficiently as an individual or as a group
member or leader in a team in multidisciplinary environment.
PO11: Project management and finance: Ability to acquire and demonstrate the knowledge of
contemporary issues related to finance and managerial skills in one’s own work, as a member and
leader in a team, to manage projects and in multidisciplinary environments.
PO12: Life-long learning: Ability to recognize and adapt to the emerging field of application in
engineering and technology by developing self-confidence for lifelong learning process.
COURSE OUTCOMES: At the end of this course, the students will be able to:
CO1: Make use of the python libraries for data science
CO2: Make use of the basic Statistical and Probability measures for data science.
CO3: Perform descriptive analytics on the benchmark data sets.
CO4: Perform correlation and regression analytics on standard data sets
CO5: Present and interpret data using visualization packages in Python.
Software required
Spyder IDE.
What is Spyder?
Features of Spyder
Syntax highlight
Availability of breakponts
Run configuration
Automatic colon insertion after if, while, etc..
Support all ipython commands.
Inline display for graphics produced using Matplotlib.
Also provides features such as help, file, explorer, find files and so on.
Step 1:
Go to Anaconda website https://www.anaconda.com.
Step 3: Choose the version that is suitable for your OS and click on
download.
Step 5:
Launch Sypder from the Anaconda Navigator.
2) Working With Numpy Arrays
import numpy as np
a=np.array([[1,2,3],[4,5,6]])
b=np.array([[10,11,12],[13,14,15]])
c= a + b
print(c)
[[11 13 15]
[17 19 21]]
import numpy as np
a= np.array([[1,2,3],[4,5,6]])
b= 3 * a
print(b)
[[ 3 6 9]
[12 15 18]]
import numpy as np
i=np.eye(4)
print(i)
[[1. 0. 0. 0.]
[0. 1. 0. 0.]
[0. 0. 1. 0.]
[0. 0. 0. 1.]]
import numpy as np
a=np.array([[1,2,3],[4,5,6], [7,8,9]])
b=np.array([[2,3,4],[5,6,7],[8,9,10]])
c= a@b
print(c)
[[ 36 42 48]
[ 81 96 111]
[126 150 174]]
import numpy as np
a =np.array([[1,2,3],[4,5,6], [7,8,9]])
b = a.T
print(b)
[[1 4 7]
[2 5 8]
[3 6 9]]
import numpy as np
a =np.array([[2.5, 3.8, 1.5],[4.7, 2.9, 1.56]])
b = a.astype('int')
print(b)
[[2 3 1]
[4 2 1]]
import numpy as np
a1 =np.array([[1,2,3],[4,5,6]])
a2 = np.array([[7,8,9],[10,11,12]])
c = np.hstack((a1, a2))
print(c)
[[ 1 2 3 7 8 9]
[ 4 5 6 10 11 12]]
import numpy as np
a = np.array([[1,2],[3,4], [5,6]])
b = np.array([[7,8],[9,10], [10,11]])
c = np.vstack((a, b))
print(c)
[[ 1 2]
[ 3 4]
[ 5 6]
[ 7 8]
[ 9 10]
[10 11]]
import numpy as np
list = [x for x in range(0, 101, 2)]
a=np.array(list)
print(a)
[ 0 2 4 6 8 10 12 14 16 18 20 22 24 26 28 30 32 34
36 38 40 42 44 46 48 50 52 54 56 58 60 62 64 66 68 70
72 74 76 78 80 82 84 86 88 90 92 94 96 98 100]
import numpy as np
a =np.full((2, 3), 5)
print(a)
[[5 5 5]
[5 5 5]]
import numpy as np
a = np.array ([[1, 4, 2], [3, 4, 6],[0, -1, 5]])
print (np.sort(a, axis = None))
print (np.sort(a, axis = 1))
print (np.sort(a, axis = 0))
[-1 0 1 2 3 4 4 5 6]
[[ 1 2 4]
[ 3 4 6]
[-1 0 5]]
[[ 0 -1 2]
[ 1 4 5]
[ 3 4 6]]
3) Working With Numpy Arrays
import pandas as pd
df =pd.DataFrame()
print(df)
Empty DataFrame
Columns: []
Index: []
import pandas as pd
data =[1,2,3,4,5]
df =pd.DataFrame(data)
print (df)
0
0 1
1 2
2 3
3 4
4 5
import pandas as pd
data = [['Alex',10],['Bob',12],['Clarke',13]]
df = pd.DataFrame(data,columns=['Name','Age'])
print (df)
Name Age
0 Alex 10
1 Bob 12
2 Clarke 13
import pandas as pd
data = {'Name':['Tom', 'Jack', 'Steve','Ricky'],'Age':[28,34,29,42]}
df = pd.DataFrame(data)
print (df)
Name Age
0 Tom 28
1 Jack 34
2 Steve 29
3 Ricky 42
import pandas as pd
d = { 'one' :pd.Series([1, 2, 3], index=['a', 'b', 'c']),
'two' :pd.Series([1, 2, 3, 4], index=['a', 'b', 'c', 'd'])}
df = pd.DataFrame(d)
print (df)
one two
a 1.0 1
b 2.0 2
c 3.0 3
d NaN 4
import pandas as pd
data ={ 'Name':['Tom', 'Jack', 'Steve','Ricky'],'Age':[28,34,29,42]}
df =pd.DataFrame(data)
print (df)
df_sorted = df.sort_values(by='Name')
print ("Sorted data frame…")
print(df_sorted)
Name Age
0 Tom 28
1 Jack 34
2 Steve 29
3 Ricky 42
Sorted data frame…
Name Age
1 Jack 34
3 Ricky 42
2 Steve 29
0 Tom 28
import pandas as pd
d = { 'one' :pd.Series([1, 2, 3],
index=['a', 'b', 'c']),
'two' :pd.Series([1, 2, 3, 4],
index=['a', 'b', 'c', 'd'])}
df =pd.DataFrame(d)
print(df [ 'one'])
a 1.0
b 2.0
c 3.0
d NaN
Name: one, dtype: float64
import pandas as pd
d = { 'one' :pd.Series([1, 2, 3], index=['a', 'b', 'c']),
'two' :pd.Series([1, 2, 3, 4], index=['a', 'b', 'c', 'd'])}
df = pd.DataFrame(d)
df['three']=pd.Series([10,20,30],index=['a','b','c'])
print(df)
one two three
a 1.0 1 10.0
b 2.0 2 20.0
c 3.0 3 30.0
d NaN 4 NaN
import pandas as pd
d = { 'one' :pd.Series([1, 2, 3], index=['a', 'b', 'c']),
'two' :pd.Series([1, 2, 3, 4], index=['a', 'b', 'c', 'd']),
'three' :pd.Series([10,20,30], index=['a','b','c'])}
df = pd.DataFrame(d)
print ("Deleting the first column using DEL function:")
del df['one']
print(df)
import pandas as pd
d = { 'one' :pd.Series([1, 2, 3], index=['a', 'b', 'c']),
'two' :pd.Series([1, 2, 3, 4], index=['a', 'b', 'c', 'd'])}
df = pd.DataFrame(d)
print
df.loc['b']
one 2.0
two 2.0
Name: b, dtype: float64
import pandas as pd
df =pd.DataFrame([[1, 2], [3, 4]], columns = ['a','b'])
df2 =pd.DataFrame([[5, 6], [7, 8]], columns = ['a','b'])
df =df.append(df2)
print (df)
a b
0 1 2
1 3 4
0 5 6
1 7 8
import pandas as pd
df =pd.DataFrame([[1, 2], [3, 4]], columns = ['a','b'])
df2 =pd.DataFrame([[5, 6], [7, 8]], columns = ['a','b'])
df =df.append(df2) # Drop rows with label 0 df =
df.drop(0)
print (df)
a b
0 1 2
1 3 4
0 5 6
1 7 8
4) Reading data from text files, Excel and the web and
exploring various commands for doing descriptive
analytics on the Iris data set.
import pandas as pd
df=pd.read_csv("iris_csv.csv")
df.head()
df.shape
df.info()
df.describe()
df.isnull().sum()
df.value_counts("class")
Output:
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 150 entries, 0 to 149
Data columns (total 5 columns):
# Column Non-Null Count Dtype
import pandas as pd
import numpy as np
import statistics as st
df=pd.read_csv("diabetes_csv.csv")
print(df.shape)
print(df.info())
print('MEAN:\n',df.mean())
print('MEDIAN:\n',df.median())
print('MODE:\n',df.mode())
print('STANDARD DEVIATION\n',df.std())
print('VARIANCE:\n:',df.var())
print('SKEWNESS:\n:',df.skew())
print('KURTOSIS|n',df.kurtosis())
df.describe
Output:
(768, 9)
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 768 entries, 0 to 767
Data columns (total 9 columns):
# Column Non-Null Count Dtype
plas 31.972618
pres 19.355807
skin 15.952218
insu 115.244002
mass 7.884160
pedi 0.331329
age 11.760232
dtype: float64
VARIANCE:
: preg 11.354056
plas 1022.248314
pres 374.647271
skin 254.473245
insu 13281.180078
mass 62.159984
pedi 0.109779
age 138.303046
dtype: float64
SKEWNESS:
: preg 0.901674
plas 0.173754
pres -1.843608
skin 0.109372
insu 2.272251
mass -0.428982
pedi 1.919911
age 1.129597
dtype: float64
KURTOSIS|n preg 0.159220
plas 0.640780
pres 5.180157
skin -0.520072
insu 7.214260
mass 3.290443
pedi 5.594954
age 0.643159
dtype: float64
pedi age class
0 6 148 72 35 0 33.6 0.62 50 tested_positiv
7 e
1 1 85 66 29 0 26.6 0.35 31 tested_negativ
1 e
2 8 183 64 0 0 23.3 0.67 32 tested_positiv
2 e
3 1 89 66 23 94 28.1 0.16 21 tested_negativ
7 e
4 0 137 40 35 168 43.1 2.28 33 tested_positiv
8 e
.. ... ... ... ... ... ... ... ... ...
763 10 101 76 48 180 32.9 0.17 63 tested_negativ
1 e
764 2 122 70 27 0 36.8 0.34 27 tested_negativ
0 e
765 5 121 72 23 112 26.2 0.24 30 tested_negativ
5 e
766 1 126 60 0 0 30.1 0.34 47 tested_positiv
9 e
767 1 93 70 31 0 30.4 0.31 23 tested_negativ
5 e
Intercept: 153.72032548545178
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
Output:
dict_keys(['data', 'target', 'frame', 'DESCR', 'feature_names', 'data_fil
ename', 'target_filename', 'data_module'])
r^2: -0.44401265478624397
RMSE: 94.65723681369009
5c) Multiple Regression analysi
# Extract relevant
df = pd.read_csv('diabetes.csv')
print(data) print(target)
# Regression coefficients
print('Coefficients:', reg.coef_)
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
df=pd.read_csv("Heart.csv")
f,ax=plt.subplots(figsize=(10,6))
x=df['Age']
ax=sns.distplot(x,bins=10)
plt.show()
Output:
6b) Density and contour plots
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
df=pd.read_csv("Heart.csv")
f,ax=plt.subplots(figsize=(10,6))
x=df['Age']
X=pd.Series(x,name="Age variable")
ax=sns.kdeplot(x,shade=True,color='r')
plt.show()
Output:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
df=pd.read_csv("Heart.csv")
f,ax=plt.subplots(figsize=(8,6))
ax=sns.countplot(x='ChestPain',data=df)
plt.show()
Output:
6c) Correlation and scatter plots
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
df=pd.read_csv("Heart.csv")
sns.pairplot(data=df)
Output:
<seaborn.axisgrid.PairGrid at 0x1a1534e2130>
6d) Histograms
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
df=pd.read_csv("Heart.csv")
plt.figure(figsize=(20,10))
sns.heatmap(df.corr(),annot=True,cmap='terrain')
Output:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
df=pd.read_csv("Heart.csv")
f,ax=plt.subplots(figsize=(8,6))
ax=sns.scatterplot(x="Age",y="RestBP",data=df)
6e) Three dimensional plotting
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
df=pd.read_csv("Heart.csv")
df.hist(figsize=(12,12),layout=(5,3))
Output:
array([[<AxesSubplot:title={'center':'Unnamed: 0'}>,
<AxesSubplot:title={'center':'Age'}>,
<AxesSubplot:title={'center':'Sex'}>],
[<AxesSubplot:title={'center':'RestBP'}>,
<AxesSubplot:title={'center':'Chol'}>,
<AxesSubplot:title={'center':'Fbs'}>],
[<AxesSubplot:title={'center':'RestECG'}>,
<AxesSubplot:title={'center':'MaxHR'}>,
<AxesSubplot:title={'center':'ExAng'}>],
[<AxesSubplot:title={'center':'Oldpeak'}>,
<AxesSubplot:title={'center':'Slope'}>,
<AxesSubplot:title={'center':'Ca'}>],
[<AxesSubplot:>, <AxesSubplot:>, <AxesSubplot:>]], dtype=object)
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
df=pd.read_csv("Heart.csv")
fig = plt.figure()
ax = plt.axes(projection='3d')
x=df["Age"]
x=pd.Series(x,name="Age variable")
y=df["Sex"]
y=pd.Series(y,name="Sex variable")
z=df["Chol"]
z=pd.Series(z,name="Cholestrol Variable")
ax.plot3D(x,y,z,'green')
ax.set_title('3D line plot Heart disease dataset')
plt.show()
Output:
I
7) Visualizing Geographic Data with Basemap
%matplotlib inline
import numpy as np
import matplotlib.pyplot as plt
import cartopy.crs as ccrs
from cartopy.feature import NaturalEarthFeature
plt.figure(figsize=(8, 8))
Output:
%matplotlib inline
import numpy as np
import matplotlib.pyplot as plt
import cartopy.crs as ccrs
from cartopy.feature import NaturalEarthFeature
# Plot Seattle
seattle_lon, seattle_lat = -122.3, 47.6
ax.plot(seattle_lon, seattle_lat, 'ok', markersize=5)
ax.text(seattle_lon, seattle_lat, ' Seattle', fontsize=12, transform=ccrs
Output:
In [16]: %matplotlib inline
import numpy as np
import matplotlib.pyplot as plt
import cartopy.crs as ccrs
import cartopy.feature as cfeature
from itertools import chain
Output:
%matplotlib inline
import numpy as np
import matplotlib.pyplot as plt
import cartopy.crs as ccrs
import cartopy.feature as cfeature
from itertools import chain
Output:
Content Beyond Syllabus
Aim:
To Use NumPy to implement a simple image processing algorithm.
Procedure:
Step 1: Install Required Libraries
Make sure you have NumPy, Matplotlib, and Pillow installed. If not, you can install them
using:
pip install numpy matplotlib Pillow
Program:
import numpy as np
import matplotlib.pyplot as plt
from PIL import Image
# Grayscale conversion
gray_image = np.mean(image_array, axis=-1, keepdims=True)
# Display the grayscale image
plt.subplot(1, 3, 2)
plt.imshow(np.squeeze(gray_image), cmap='gray')
plt.title('Grayscale Image')
Output:
Result:
Thus NumPy is used to implement a simple image processing algorithm and executed
Successfully.
9. Use NumPy to implement a simple algorithm for image classification.
Aim:
To Use NumPy to implement a simple algorithm for image classification.
Procedure:
Step 1: Install Required Libraries
Make sure you have NumPy installed:
Step 6: Training
Train the model using simple gradient descent. Replace this with a more sophisticated
training algorithm for a real-world scenario:
Program:
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
X = np.random.rand(num_samples, image_size)
y = np.random.randint(0, num_classes, size=num_samples)
# Training
def train(model, X, y, learning_rate=0.001, epochs=100):
for epoch in range(epochs):
predictions = model.predict(X)
loss = np.mean((predictions - y.reshape(-1, 1))**2)
if epoch % 10 == 0:
print(f"Epoch {epoch}, Loss: {loss}")
# Make predictions
test_predictions = np.argmax(model.predict(X_test), axis=1)
# Evaluate accuracy
accuracy = accuracy_score(y_test, test_predictions)
print(f"Accuracy: {accuracy}")
Output:
Result:
Thus NumPy is used to implement a simple algorithm for image classification and executed
Successfully.