Data Science Lab Manual
Laboratory Manual
AD3411 - Data Science and Analytics Laboratory
UG Regulation 2021
Degree/Branch: B.Tech – AI&DS
Year/Semester: II / IV
Academic Year: 2023-2024
Revision No: 01
Prepared By: Dr. P. Senthil Pandian
Approved By: HOD/CSE
DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING
To be a premier institute for higher education, nurturing youth and other stakeholders as global,
socially responsible citizens through academic, technical & innovative excellence and inclusivity.
M1: To create an ecosystem of academic, research & innovation and skilled learning
M2: To mentor and facilitate inclusive learners to acquire academic excellence and application
M4: To upskill learners and encourage their transformation into lifelong learners and responsible
change makers
M5: To promote the above through social immersion, community engagement, technical
upgradation and industry-institute interaction
To provide quality education and research in the field of Computer Science and Engineering
with emerging technologies, and to develop self-motivated, employable individuals for society.
M1: To provide an environment for teaching, learning and research in the theory and
applications of Computer Science and Engineering.
M2: To prepare the students for productive careers with multidisciplinary and leadership skills.
M3: To inculcate the values of ethics and social responsibilities for their future endeavors to
serve the society.
PROGRAM OUTCOMES
2. Problem analysis: Identify, formulate, review research literature, and analyze complex
engineering problems reaching substantiated conclusions using first principles of
mathematics, natural sciences, and engineering sciences.
5. Modern tool usage: Create, select, and apply appropriate techniques, resources, and
8. Ethics: Apply ethical principles and commit to professional ethics and responsibilities
and norms of the engineering practice.
PEO 2: Possess the ability to work and adapt with a strong focus on problem solving and
interpersonal skills in multidisciplinary teams, leadership roles and other diverse career
paths (Professionalism)
4. Frequency distributions
5. Averages
6. Variability
7. Normal curves
9. Correlation coefficient
10. Regression
CONTENTS
Sl. No. | Name of the Experiment | Page No. | Marks (100) | Staff Signature
AIM
ALGORITHM
Step1: Start
Step2: Import numpy module
Step3: Print the basic characteristics and operations of an array
Step4: Stop
PROGRAM
import numpy as np
# Creating array object
arr = np.array([[1, 2, 3],
                [4, 2, 5]])
# Printing type of arr object
print("Array is of type: ", type(arr))
# Printing array dimensions (axes)
print("No. of dimensions: ", arr.ndim)
# Printing shape of array
print("Shape of array: ", arr.shape)
# Printing size (total number of elements) of array
print("Size of array: ", arr.size)
# Printing type of elements in array
print("Array stores elements of type: ", arr.dtype)
OUTPUT
import numpy as np
a = np.array([[1,2,3],[3,4,5],[4,5,6]])
print(a)
print("After slicing")
# rows from index 1 onwards
print(a[1:])
Output
[[1 2 3]
[3 4 5]
[4 5 6]]
After slicing
[[3 4 5]
[4 5 6]]
Output:
Our array is:
[[1 2 3]
[3 4 5]
[4 5 6]]
The items in the second column are:
[2 4 5]
The items in the second row are:
[3 4 5]
The items column 1 onwards are:
[[2 3]
[4 5]
[5 6]]
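The program that produced the output above is not listed in this copy of the manual. A minimal sketch that reproduces it, assuming it used NumPy's Ellipsis (`...`) indexing on the same 3 x 3 array as the slicing example:

```python
import numpy as np

a = np.array([[1, 2, 3], [3, 4, 5], [4, 5, 6]])
print("Our array is:")
print(a)

# ... (Ellipsis) stands for "all remaining axes", so a[..., 1] takes column index 1 of every row
print("The items in the second column are:")
print(a[..., 1])

# a[1, ...] takes row index 1 across all columns
print("The items in the second row are:")
print(a[1, ...])

# a[..., 1:] keeps every row but only the columns from index 1 onwards
print("The items column 1 onwards are:")
print(a[..., 1:])
```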
Result:
Thus, working with NumPy arrays was successfully completed.
Ex. No.: 2 Create a DataFrame using a list of elements
Aim:
ALGORITHM
Step1: Start
Step2: Import the numpy and pandas modules
Step3: Create a dataframe using the dictionary
Step4: Print the output
Step5: Stop
PROGRAM
import numpy as np
import pandas as pd
data = np.array([['','Col1','Col2'],
['Row1',1,2],
['Row2',3,4]])
print(pd.DataFrame(data=data[1:,1:],
index = data[1:,0],
columns=data[0,1:]))
# Take a 2D array as input to your DataFrame
my_2darray = np.array([[1, 2, 3], [4, 5, 6]])
print(pd.DataFrame(my_2darray))
Output:
      Col1  Col2
Row1     1     2
Row2     3     4

   0  1  2
0  1  2  3
1  4  5  6

   1  2  3
0  1  1  2
1  3  2  4

   A
0  4
1  5
2  6
3  7

                         0
United Kingdom      London
India            New Delhi
United States   Washington
Belgium           Brussels

(2, 3)
2
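The output above also shows several DataFrames whose programs are not listed. A sketch that reproduces them, assuming they were built from a dictionary, from a list with an explicit index and column name, from a Series, and then inspected with the `shape` attribute and `len` of the index (the variable names `my_dict`, `my_df`, `my_series` and `df` are illustrative):

```python
import numpy as np
import pandas as pd

# A dictionary: keys become column labels, lists become column values
my_dict = {1: ['1', '3'], 2: ['1', '2'], 3: ['2', '4']}
print(pd.DataFrame(my_dict))

# A list of values with an explicit row index and a single column named 'A'
my_df = pd.DataFrame(data=[4, 5, 6, 7], index=range(0, 4), columns=['A'])
print(my_df)

# A Series converted to a one-column DataFrame (the column label defaults to 0)
my_series = pd.Series({"United Kingdom": "London", "India": "New Delhi",
                       "United States": "Washington", "Belgium": "Brussels"})
print(pd.DataFrame(my_series))

# Shape (rows, columns) and number of rows of a 2 x 3 DataFrame
df = pd.DataFrame(np.array([[1, 2, 3], [4, 5, 6]]))
print(df.shape)
print(len(df.index))
```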
Result:
Thus, working with pandas DataFrames was successfully completed.
Ex. No.:3 Basic plots using Matplotlib
Aim:
ALGORITHM
Step1: Start
Step2: Import the Matplotlib module
Step3: Create basic plots using Matplotlib
Step4: Display the output
Step5: Stop
Program:3a
import matplotlib.pyplot as plt
# x axis values
x = [1,2,3]
# corresponding y axis values
y = [2,4,1]
# plotting the points as a line
plt.plot(x, y)
plt.show()
Program:3b
import matplotlib.pyplot as plt
a = [1, 2, 3, 4, 5]
b = [0, 0.6, 0.2, 15, 10, 8, 16, 21]
# plotting a as a line and b as red circles ('or')
plt.plot(a)
plt.plot(b, 'or')
# naming the x-axis
plt.xlabel('Day ->')
plt.show()
Output:
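Program:3c
import matplotlib.pyplot as plt
a = [1, 2, 3, 4, 5]
b = [0, 0.6, 0.2, 15, 10, 8, 16, 21]
c = [4, 2, 6, 8, 3, 20, 13, 15]
# use fig whenever you want the output in a new window;
# also specify the window size you want it to be displayed in
fig = plt.figure(figsize=(10, 10))
# create subplots in a 2 x 2 grid (positions 1, 2 and 4)
sub1 = fig.add_subplot(2, 2, 1)
sub2 = fig.add_subplot(2, 2, 2)
sub4 = fig.add_subplot(2, 2, 4)
# 'sb' = blue squares, 'or' = red circles, 'Dm' = magenta diamonds
sub1.plot(a, 'sb')
sub2.plot(b, 'or')
sub4.plot(c, 'Dm')
plt.show()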
Output:
Result:
Thus, the Python programs for basic plots using Matplotlib were successfully completed.
Ex. No.:4 Frequency distributions
Aim:
To count the frequency of occurrence of words in a body of text, a task often needed during
text processing.
ALGORITHM
Program:
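import nltk
from nltk.tokenize import word_tokenize
from nltk.corpus import gutenberg
# download the corpus and tokenizer models (needed only once;
# newer NLTK releases may require 'punkt_tab' instead of 'punkt')
nltk.download('gutenberg')
nltk.download('punkt')
sample = gutenberg.raw("blake-poems.txt")
token = word_tokenize(sample)
# take the first 50 tokens
wlist = []
for i in range(50):
    wlist.append(token[i])
# count how often each token occurs in the list
wordfreq = [wlist.count(w) for w in wlist]
print("Pairs\n" + str(list(zip(wlist, wordfreq))))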
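The result also mentions a Conditional Frequency Distribution, which the program above does not show. A minimal standard-library sketch on a short illustrative sentence (not the Blake corpus): `collections.Counter` counts each word, and a dictionary of Counters keyed by a condition (here, word length, chosen only for illustration) gives a conditional frequency distribution. NLTK's `nltk.FreqDist` (a `Counter` subclass) and `nltk.ConditionalFreqDist` provide the same kind of counts on real token lists.

```python
from collections import Counter, defaultdict

# Illustrative text (not the Blake corpus used above)
text = "the cat sat on the mat and the dog sat on the log"
tokens = text.split()

# Frequency distribution: how often each word occurs
freq = Counter(tokens)
print(freq.most_common(3))

# Conditional frequency distribution: word counts grouped by a condition,
# here the length of each word
cfd = defaultdict(Counter)
for word in tokens:
    cfd[len(word)][word] += 1
print(dict(cfd[2]))
print(dict(cfd[3]))
```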
Result:
Thus, the Python programs to count the frequency of occurrence of words in a body of text
and to compute a Conditional Frequency Distribution were successfully completed.
Ex. No.:5 Averages
Aim:
To compute weighted averages in Python, either by defining your own functions or by using NumPy.
ALGORITHM
Program:5c
df['employees_number']),2) weighted_avg_m3
Output:
44225.35
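The listed program is only a fragment, and the data frame it uses is not shown. The sketch below illustrates both approaches named in the aim on small hypothetical values (not the data behind the result above): a hand-written function (the name `weighted_average` is illustrative) and `numpy.average` with its `weights` parameter.

```python
import numpy as np

# Hypothetical data: salaries and the number of employees earning each one
values = [30000, 45000, 60000]
weights = [5, 3, 2]

def weighted_average(values, weights):
    # sum of value * weight divided by the sum of the weights
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)

# Method 1: own function; Method 2: NumPy's built-in weighted average
m1 = round(weighted_average(values, weights), 2)
m2 = round(np.average(values, weights=weights), 2)
print(m1, m2)
```

Both methods compute the same quantity, (30000*5 + 45000*3 + 60000*2) / (5 + 3 + 2).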
Result:
Thus, weighted averages were computed in Python, both by defining our own function and by
using NumPy, and the program was successfully completed.
Ex. No.: 6 Variability
Aim:
To write a python program to calculate the variance.
ALGORITHM
Program:
# Python code to demonstrate variance()
# function on varying range of data-types
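The program body is missing from this copy. A minimal sketch, assuming it used Python's standard `statistics` module on integer, float, `Fraction` and `Decimal` data (the sample values below are illustrative, not from the manual):

```python
from statistics import variance, pvariance
from fractions import Fraction
from decimal import Decimal

# Integers
ints = [2, 4, 4, 4, 5, 5, 7, 9]
# Floats
floats = [1.5, 2.5, 3.5]
# Fractions
fracs = [Fraction(1, 2), Fraction(3, 2), Fraction(5, 2)]
# Decimals
decs = [Decimal("1.5"), Decimal("2.5"), Decimal("3.5")]

# variance() is the sample variance (divides by n - 1);
# pvariance() is the population variance (divides by n)
print("Sample variance of ints:", variance(ints))
print("Population variance of ints:", pvariance(ints))
print("Variance of floats:", variance(floats))
print("Variance of fractions:", variance(fracs))
print("Variance of decimals:", variance(decs))
```

The result keeps the input's type where it can, so the Fraction sample gives a `Fraction` and the Decimal sample gives a `Decimal`.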
Output :
Result:
Thus the computation for variance was successfully completed.
Ex. No.:7 Normal Curve
Aim:
To create a normal curve using python program.
ALGORITHM
Program:
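import matplotlib.pyplot as plt
import seaborn as sb
# assumes `data` (height values) and `pdf` (their normal densities) were computed beforehand
sb.set_style('whitegrid')
sb.lineplot(x=data, y=pdf, color='black')
plt.xlabel('Heights')
plt.ylabel('Probability Density')
plt.show()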
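The program above plots `data` and `pdf`, but the lines that create them are not shown. A minimal sketch of that step, assuming heights that are normally distributed with mean 170 and standard deviation 8 (illustrative values, not from the manual), evaluates the normal density f(x) = exp(-(x - mu)^2 / (2 sigma^2)) / (sigma * sqrt(2 pi)) directly with NumPy; `scipy.stats.norm.pdf(data, loc=mu, scale=sigma)` would give the same values.

```python
import numpy as np

# Assumed parameters for illustration: mean 170 cm, standard deviation 8 cm
mu, sigma = 170.0, 8.0

# 201 evenly spaced heights covering mean +/- 4 standard deviations
data = np.linspace(mu - 4 * sigma, mu + 4 * sigma, 201)

# Normal probability density evaluated at every height
pdf = np.exp(-((data - mu) ** 2) / (2 * sigma ** 2)) / (sigma * np.sqrt(2 * np.pi))

# The curve peaks at the mean, and the area under it is close to 1
print("Peak at:", data[pdf.argmax()])
print("Approximate area:", pdf.sum() * (data[1] - data[0]))
```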
Output:
Result:
Thus, the Python program to create a normal curve was successfully completed.
Ex. No.: 8 Correlation and scatter plots
Aim:
To write a Python program to compute correlation and draw a scatter plot.
ALGORITHM
Program:
# Data
#Plot
# Plot
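Only the section comments of this program survive. A minimal sketch, assuming NumPy and Matplotlib and reusing the X and Y values from the correlation coefficient experiment as sample data, computes Pearson's r with `np.corrcoef` and draws the scatter plot:

```python
import numpy as np
import matplotlib.pyplot as plt

# Data: the same X, Y values used in the correlation coefficient experiment
x = np.array([15, 18, 21, 24, 27])
y = np.array([25, 25, 27, 31, 32])

# Pearson correlation coefficient: off-diagonal entry of the 2 x 2 correlation matrix
r = np.corrcoef(x, y)[0, 1]
print("Correlation coefficient:", round(r, 6))

# Plot: scatter of the points, with r shown in the title
plt.scatter(x, y, color="blue")
plt.title(f"Scatter plot (r = {r:.3f})")
plt.xlabel("X")
plt.ylabel("Y")
plt.show()
```

A value of r close to +1, as here, means the points lie close to a rising straight line.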
Result:
Thus, the Python program for correlation and scatter plots was successfully completed.
Ex. No.: 9 Correlation coefficient
Aim:
To write a python program to compute correlation coefficient.
ALGORITHM
Program:
i = 0
while i < n:
    # sum of elements of array X
    sum_X = sum_X + X[i]
# Driver function
X = [15, 18, 21, 24, 27]
Y = [25, 25, 27, 31, 32]
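The listing above keeps only part of the summation loop. A complete sketch, assuming the program uses the computational formula r = (n*Sxy - Sx*Sy) / sqrt((n*Sxx - Sx^2) * (n*Syy - Sy^2)), where Sx, Sy, Sxy, Sxx and Syy are the sums of X, Y, X*Y, X^2 and Y^2. The function name `correlationCoefficient` and its `n` parameter are illustrative.

```python
import math

def correlationCoefficient(X, Y, n):
    # Running sums needed by the computational formula for Pearson's r
    sum_X = sum_Y = sum_XY = 0
    squareSum_X = squareSum_Y = 0
    i = 0
    while i < n:
        sum_X = sum_X + X[i]            # sum of elements of array X
        sum_Y = sum_Y + Y[i]            # sum of elements of array Y
        sum_XY = sum_XY + X[i] * Y[i]   # sum of X[i] * Y[i]
        squareSum_X = squareSum_X + X[i] * X[i]
        squareSum_Y = squareSum_Y + Y[i] * Y[i]
        i = i + 1
    numerator = n * sum_XY - sum_X * sum_Y
    denominator = math.sqrt((n * squareSum_X - sum_X * sum_X) *
                            (n * squareSum_Y - sum_Y * sum_Y))
    return numerator / denominator

# Driver function
X = [15, 18, 21, 24, 27]
Y = [25, 25, 27, 31, 32]
n = len(X)
print('{0:.6f}'.format(correlationCoefficient(X, Y, n)))
```

For this data the sums are Sx = 105, Sy = 140, Sxy = 3000, Sxx = 2295 and Syy = 3964, so r = 300 / sqrt(450 * 220), which is about 0.953463.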
Output :
0.953463
Result:
Thus the computation for correlation coefficient was successfully completed.
Ex. No.: 10 Simple Linear Regression
Aim:
To write a python program for Simple Linear Regression
ALGORITHM
Program:
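import numpy as np
import matplotlib.pyplot as plt

def estimate_coef(x, y):
    # number of observations/points
    n = np.size(x)
    # mean of x and y vector
    m_x = np.mean(x)
    m_y = np.mean(y)
    # calculating cross-deviation and deviation about x
    SS_xy = np.sum(y * x) - n * m_y * m_x
    SS_xx = np.sum(x * x) - n * m_x * m_x
    # calculating regression coefficients
    b_1 = SS_xy / SS_xx
    b_0 = m_y - b_1 * m_x
    return (b_0, b_1)

def plot_regression_line(x, y, b):
    # plotting the actual points as a scatter plot
    plt.scatter(x, y, color="m", marker="o", s=30)
    # predicted response vector
    y_pred = b[0] + b[1] * x
    # plotting the regression line
    plt.plot(x, y_pred, color="g")
    # putting labels
    plt.xlabel('x')
    plt.ylabel('y')
    # function to show plot
    plt.show()

def main():
    # observations / data
    x = np.array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
    y = np.array([1, 3, 2, 5, 7, 8, 8, 9, 10, 12])
    # estimating coefficients
    b = estimate_coef(x, y)
    print("Estimated coefficients:\nb_0 = {}\nb_1 = {}".format(b[0], b[1]))
    # plotting regression line
    plot_regression_line(x, y, b)

if __name__ == "__main__":
    main()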
Estimated coefficients:
b_0 = 1.2363636363636363
b_1 = 1.1696969696969697
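As an independent cross-check of `estimate_coef`, NumPy's least-squares polynomial fit of degree 1 returns the slope and intercept of the same regression line. The sketch also recomputes them from the closed form b_1 = SS_xy / SS_xx and b_0 = mean(y) - b_1 * mean(x), using deviations from the means:

```python
import numpy as np

x = np.array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
y = np.array([1, 3, 2, 5, 7, 8, 8, 9, 10, 12])

# np.polyfit with degree 1 returns [slope, intercept] of the least-squares line
b_1, b_0 = np.polyfit(x, y, 1)
print("polyfit:     b_0 =", round(b_0, 4), " b_1 =", round(b_1, 4))

# Closed form using deviations from the means
SS_xy = np.sum((x - x.mean()) * (y - y.mean()))
SS_xx = np.sum((x - x.mean()) ** 2)
slope = SS_xy / SS_xx
intercept = y.mean() - slope * x.mean()
print("closed form: b_0 =", round(intercept, 4), " b_1 =", round(slope, 4))
```

For this data SS_xy = 96.5 and SS_xx = 82.5, so b_1 = 96.5 / 82.5 (about 1.1697) and b_0 = 6.5 - 4.5 * b_1 (about 1.2364).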
Graph:
Result:
Thus the computation for Simple Linear Regression was successfully completed.