DAV Guidelines

DISCIPLINE SPECIFIC ELECTIVE COURSE: Data Analysis and Visualization

Sem-III (B.Sc. (H) CS) UGCF

TOPICS/UNITS and Chapter Ref (reference numbers in square brackets)

Week 1 to 2
Unit 1 Introduction to basic statistics and analysis: Fundamentals of Data Analysis, Statistical foundations for Data Analysis, Types of data, Descriptive Statistics, Correlation and covariance, Linear Regression, Statistical Hypothesis Generation and Testing
Python Libraries: NumPy, Pandas, Matplotlib
Chapter Ref: Ch 1: pg 6-29, pg 32-33 [2]
Chapter Ref: Ch 1: 1.3 (pg 4-6) [1]

Week 3 to 5
Unit 2 Array manipulation using NumPy: NumPy array: Creating NumPy arrays, various data types of NumPy arrays, Indexing and slicing, swapping axes, transposing arrays, data processing using NumPy arrays
Chapter Ref: Ch 4: 4.1-4.2, Usage of rand(), randn() and randint() functions of NumPy [1]

Week 6 to 10
Unit 3 Data Manipulation using Pandas: Data Structures in Pandas: Series, Data Frame, Index objects, loading data into a Pandas data frame, Working with Data Frames: Arithmetics, Statistics, Binning, Indexing, Reindexing, Filtering, Handling missing data, Hierarchical indexing, Data wrangling: Data cleaning, transforming, merging and reshaping
Chapter Ref: Ch 5: 5.1, 5.2 excluding Arithmetic and data alignment, axis indexes with duplicate labels, 5.3 [1]
Chapter Ref: Ch 6: 6.1 excluding JSON data and XML data, 6.2 Reading Microsoft Excel files only [1]
Chapter Ref: Chapter 7: 7.1, 7.2 till Detection and Filtering Outliers, 7.3 till String object methods [1]
Chapter Ref: Chapter 8: 8.1, 8.2 exclude combining data with overlap, 8.3 till Reshaping with Hierarchical Indexing [1]

Week 11 to 13
Unit 4 Plotting and Visualization: Using Matplotlib to plot data: figures, subplots, markings, color and line styles, labels and legends, Plotting functions in Pandas: Lines, bar, Scatter plots, histograms, stacked bars, Heatmap
Chapter Ref: Chapter 9: 9.1, 9.2 excluding Facet Grids and Categorical Data [1]
Chapter Ref: Ch 5: pg 281-282 [2]

Week 14 to 15
Data Aggregation and Group operations: Group by mechanics, Data aggregation, General split-apply-combine, Pivot tables and cross tabulation
Chapter Ref: Chapter 10: 10.1, 10.2, 10.3 excluding example Group wise Linear Regression, 10.4 [1]
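
As an illustration of the last unit's topics (group by, pivot tables and cross tabulation), a minimal sketch is given below. The data frame `sales` and its column names are made up for this example and are not part of the syllabus.

```python
import pandas as pd

# Hypothetical data, invented only to illustrate the three operations
sales = pd.DataFrame({
    "region":  ["N", "N", "S", "S", "S"],
    "product": ["pen", "ink", "pen", "pen", "ink"],
    "units":   [10, 4, 7, 3, 5],
})

# Split-apply-combine: split by region, sum the units, combine into a Series
totals = sales.groupby("region")["units"].sum()

# Pivot table: regions as rows, products as columns, summed units as cells
pivot = sales.pivot_table(index="region", columns="product",
                          values="units", aggfunc="sum", fill_value=0)

# Cross tabulation: how many records fall in each (region, product) pair
counts = pd.crosstab(sales["region"], sales["product"])

print(totals)
print(pivot)
print(counts)
```

The pivot table aggregates a value column, while the cross tabulation only counts occurrences; the two results differ whenever a cell holds more than one record.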

References
1. McKinney W. Python for Data Analysis: Data Wrangling with Pandas, NumPy and IPython. 2nd edition. O’Reilly
Media, 2018.
2. Molin S. Hands-On Data Analysis with Pandas, Packt Publishing, Second Edition, 2021.
3. Gupta S.C., Kapoor V.K., Fundamentals of Mathematical Statistics, Sultan Chand & Sons, 2020.

Suggested Practical List For Data Analysis and Visualization DSE Sem III
Note:
● Any platform for Python can be used for lab exercises.
● Use a data set of your choice from an open data portal (https://data.gov.in/, UCI repository), or load one from the
scikit-learn or seaborn library, to practice the concepts learnt in the following exercises.

1. Write programs in Python using NumPy library to do the following:

a. Create a two dimensional array, ARR1 having random values from 0 to 1. Compute the mean, standard
deviation, and variance of ARR1 along the second axis.
b. Create a 2-dimensional array of size m x n integer elements, also print the shape, type and data type of
the array and then reshape it into an n x m array, where n and m are user inputs given at the run time.
c. Test whether the elements of a given 1D array are zero, non-zero and NaN. Record the indices of these
elements in three separate arrays.
d. Create three random arrays of the same size: Array1, Array2 and Array3. Subtract Array2 from Array3
and store the result in Array4. Create another array Array5 having two times the values in Array1. Find the
covariance and correlation of Array1 with Array4 and Array5 respectively.
e. Create two random arrays of the same size 10: Array1, and Array2. Find the sum of the first half of both
the arrays and product of the second half of both the arrays.
f. Create an array with random values. Determine the size of the memory occupied by the array.
g. Create a 2-dimensional array of size m x n having integer elements in the range (10,100). Write
statements to swap any two rows, reverse a specified column and store the updated array in another
variable.
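
A possible starting point for parts (a), (c), (d) and (g) is sketched below. The array sizes, the seed and the sample values are arbitrary choices for illustration; the sketch treats NaN as a separate group from non-zero values, which is one reading of part (c).

```python
import numpy as np

np.random.seed(0)  # arbitrary seed so that repeated runs give the same numbers

# (a) 2-D array of values in [0, 1); statistics along the second axis (axis=1)
arr1 = np.random.rand(3, 4)
row_mean = arr1.mean(axis=1)
row_std = arr1.std(axis=1)
row_var = arr1.var(axis=1)

# (c) indices of zero, non-zero and NaN elements, stored in three arrays
vals = np.array([0.0, 2.5, np.nan, 0.0, -1.0, np.nan])
nan_idx = np.flatnonzero(np.isnan(vals))
zero_idx = np.flatnonzero(vals == 0)
nonzero_idx = np.flatnonzero((vals != 0) & ~np.isnan(vals))

# (d) covariance and correlation of Array1 with Array5 (twice Array1)
array1 = np.random.rand(10)
array5 = 2 * array1
cov_1_5 = np.cov(array1, array5)[0, 1]
corr_1_5 = np.corrcoef(array1, array5)[0, 1]

# (g) swap rows 0 and 1, reverse column 1, keep the original array unchanged
arr2 = np.random.randint(10, 100, size=(3, 4))  # upper bound 100 is excluded
updated = arr2.copy()
updated[[0, 1]] = updated[[1, 0]]
updated[:, 1] = updated[::-1, 1].copy()

print(row_mean, row_std, row_var)
print(zero_idx, nonzero_idx, nan_idx)
print(cov_1_5, corr_1_5)
print(arr2, updated, sep="\n")
```

Since Array5 is an exact multiple of Array1, the correlation comes out as 1 (up to rounding), which makes a convenient sanity check.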

2. Do the following using PANDAS Series:

a. Create a series with 5 elements. Display the series sorted on index and also sorted on values separately
b. Create a series with N elements with some duplicate values. Find the minimum and maximum ranks
assigned to the values using ‘first’ and ‘max’ methods
c. Display the index value of the minimum and maximum element of a Series
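
One way to approach the three parts is sketched below. The values and index labels are arbitrary; duplicates are included so that the two ranking methods give different results.

```python
import pandas as pd

# Arbitrary values with duplicates and an unsorted index
s = pd.Series([30, 10, 20, 10, 30], index=["e", "b", "d", "a", "c"])

# (a) sorted on index and sorted on values
by_index = s.sort_index()
by_value = s.sort_values()

# (b) 'first' breaks ties by order of appearance, 'max' gives every tied
#     value the highest rank of its group
rank_first = s.rank(method="first")
rank_max = s.rank(method="max")

# (c) index labels of the minimum and maximum elements (first occurrence)
label_of_min = s.idxmin()
label_of_max = s.idxmax()

print(by_index, by_value, sep="\n")
print(rank_first, rank_max, sep="\n")
print(label_of_min, label_of_max)
```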

3. Create a data frame having at least 3 columns and 50 rows to store numeric data generated using a random
function. Replace 10% of the values by null values whose index positions are generated using random function.
Do the following:
a. Identify and count missing values in a data frame.
b. Drop the column having more than 5 null values.
c. Identify the row label having maximum of the sum of all values in a row and drop that row.
d. Sort the data frame on the basis of the first column.
e. Remove all duplicates from the first column.
f. Find the correlation between first and second column and covariance between second and third
column.
g. Discretize the second column and create 5 bins.
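
The steps above can be sketched as follows. The column names A, B and C and the seed are assumptions made for the example; each part works on the same data frame with nulls so that the parts can be tried independently.

```python
import numpy as np
import pandas as pd

np.random.seed(1)  # arbitrary seed for repeatable output

# 50 x 3 random values; 10% of the 150 cells are nulled at random positions
values = np.random.rand(50, 3)
n_null = values.size // 10
null_positions = np.random.choice(values.size, size=n_null, replace=False)
values.flat[null_positions] = np.nan
df = pd.DataFrame(values, columns=["A", "B", "C"])

# (a) missing values per column and in total
missing_per_column = df.isna().sum()
total_missing = int(missing_per_column.sum())

# (b) keep only the columns with at most 5 nulls
kept = df.loc[:, df.isna().sum() <= 5]

# (c) label of the row with the largest row sum (NaN is skipped), then drop it
row_to_drop = df.sum(axis=1).idxmax()
without_max_row = df.drop(index=row_to_drop)

# (d) sort on the first column, (e) remove duplicates in the first column
sorted_df = df.sort_values(by="A")
deduped = df.drop_duplicates(subset="A")

# (f) correlation of columns 1 and 2, covariance of columns 2 and 3
corr_ab = df["A"].corr(df["B"])
cov_bc = df["B"].cov(df["C"])

# (g) discretize the second column into 5 equal-width bins
binned_b = pd.cut(df["B"], bins=5)

print(missing_per_column, total_missing)
print(binned_b.value_counts().sort_index())
```

Note that `drop_duplicates` treats all NaN entries of a column as equal, so at most one row with a null in column A survives part (e).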

4. Consider two Excel files having attendance of two workshops, each of duration 5 days. Each file has three
fields ‘Name’, ‘Date’ and ‘Duration’ (in minutes), where names may be repetitive within a file. Note that duration may
take one of three values (30, 40, 50) only. Import the data into two data frames and do the following:
a. Perform merging of the two data frames to find the names of students who had attended both
workshops.
b. Find names of all students who have attended a single workshop only.
c. Merge two data frames row-wise and find the total number of records in the data frame.
d. Merge two data frames row-wise and use two columns viz. names and dates as multi-row indexes.
Generate descriptive statistics for this hierarchical data frame.
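
A minimal sketch of the four parts follows. To keep it self-contained, the two data frames are built in memory with invented names and dates; in the actual exercise they would be loaded from the Excel files, for example with pd.read_excel on file names of your choice.

```python
import pandas as pd

# Invented attendance records standing in for the two Excel files
w1 = pd.DataFrame({
    "Name": ["Asha", "Ravi", "Asha", "Meena"],
    "Date": ["2024-01-01", "2024-01-01", "2024-01-02", "2024-01-02"],
    "Duration": [30, 40, 50, 30],
})
w2 = pd.DataFrame({
    "Name": ["Ravi", "Kiran", "Asha"],
    "Date": ["2024-02-01", "2024-02-01", "2024-02-02"],
    "Duration": [40, 50, 30],
})

# Names repeat within a file, so reduce each file to its distinct names first
names1 = w1[["Name"]].drop_duplicates()
names2 = w2[["Name"]].drop_duplicates()

# (a) inner merge keeps the names present in both workshops
both = names1.merge(names2, on="Name", how="inner")

# (b) outer merge with an indicator column marks names found on one side only
flagged = names1.merge(names2, on="Name", how="outer", indicator=True)
single = flagged.loc[flagged["_merge"] != "both", "Name"]

# (c) row-wise merge (concatenation) and total number of records
stacked = pd.concat([w1, w2], ignore_index=True)
total_records = len(stacked)

# (d) names and dates as a two-level row index, then descriptive statistics
hier = stacked.set_index(["Name", "Date"]).sort_index()
summary = hier.describe()

print(both, single, total_records, sep="\n")
print(summary)
```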

5. Using Iris data, plot the following with proper legend and axis labels: (Download IRIS data from:
https://archive.ics.uci.edu/ml/datasets/iris or import it from sklearn datasets)
a. Load data into a pandas data frame. Use the data frame's info() method to look at the data types in the
dataset.
b. Find the number of missing values in each column (Check number of null values in a column using
df.isnull().sum())
c. Plot bar chart to show the frequency of each class label in the data.
d. Draw a scatter plot for Petal Length vs Sepal Length and fit a regression line
e. Plot density distribution for feature Petal width.
f. Use a pair plot to show pairwise bivariate distribution in the Iris Dataset.
g. Draw heatmap for any two numeric attributes
h. Compute mean, mode, median, standard deviation, confidence interval and standard error for each
numeric feature
i. Compute correlation coefficients between each pair of features and plot heatmap
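
Parts (d), (h) and (i) could be approached along the lines below. To avoid depending on a download, the sketch uses a few made-up measurements in place of the real Iris columns; the column names are also assumptions. The regression line is an ordinary least-squares fit from np.polyfit, and the confidence interval uses the normal approximation (mean ± 1.96 × standard error), which is only a rough choice for small samples.

```python
import matplotlib
matplotlib.use("Agg")  # draw off-screen so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Made-up values standing in for two Iris features (not the real data)
df = pd.DataFrame({
    "sepal_length": [5.0, 5.5, 6.0, 6.5, 7.0, 7.5],
    "petal_length": [1.5, 2.5, 4.0, 4.5, 5.5, 6.5],
})

# (d) scatter plot with a least-squares regression line
slope, intercept = np.polyfit(df["sepal_length"], df["petal_length"], deg=1)
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 4))
ax1.scatter(df["sepal_length"], df["petal_length"], label="observations")
ax1.plot(df["sepal_length"], slope * df["sepal_length"] + intercept,
         color="red", label="regression line")
ax1.set_xlabel("Sepal length")
ax1.set_ylabel("Petal length")
ax1.legend()

# (h) summary statistics with an approximate 95% confidence interval
stats = pd.DataFrame({
    "mean": df.mean(),
    "median": df.median(),
    "std": df.std(),
    "sem": df.sem(),
})
stats["ci_low"] = stats["mean"] - 1.96 * stats["sem"]
stats["ci_high"] = stats["mean"] + 1.96 * stats["sem"]

# (i) correlation matrix drawn as a heatmap
corr = df.corr()
image = ax2.imshow(corr.to_numpy(), vmin=-1, vmax=1, cmap="coolwarm")
ax2.set_xticks(range(len(corr.columns)))
ax2.set_xticklabels(corr.columns)
ax2.set_yticks(range(len(corr.columns)))
ax2.set_yticklabels(corr.columns)
fig.colorbar(image, ax=ax2)

print(stats)
print(corr)
plt.close(fig)
```

In an interactive session, drop the `matplotlib.use("Agg")` line and call `plt.show()` before closing the figure to see the plots.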

6. Using the Titanic dataset, do the following:

a. Clean the data by dropping the column which has the largest number of missing values.
b. Find total number of passengers with age more than 30
c. Find total fare paid by passengers of second class
d. Compare number of survivors of each passenger class
e. Compute descriptive statistics for age attribute gender wise
f. Draw a scatter plot for passenger fare paid by Female and Male passengers separately
g. Compare density distribution for features age and passenger fare
h. Draw the pie chart for three groups labelled as class 1, class 2, class 3 respectively displayed in different
colours. The occurrence of each group converted into percentage should be displayed in the pie chart.
Label the chart appropriately.
i. Find % of survived passengers for each class and answer the question “Did class play a role in survival?”.
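
The non-plotting parts (a) to (e) and (i) can be sketched as below. The eight rows are invented for illustration, and the column names (Pclass, Sex, Age, Fare, Survived, Cabin) are an assumption based on the commonly distributed Titanic CSV; adjust them to match the copy of the dataset you load.

```python
import numpy as np
import pandas as pd

# Invented rows; the column names are assumed, not taken from this document
df = pd.DataFrame({
    "Pclass":   [1, 2, 3, 3, 2, 1, 3, 2],
    "Sex":      ["female", "male", "male", "female",
                 "female", "male", "male", "male"],
    "Age":      [38.0, 35.0, np.nan, 27.0, 14.0, 54.0, 2.0, 31.0],
    "Fare":     [71.28, 13.00, 8.05, 11.13, 30.07, 51.86, 21.08, 26.00],
    "Survived": [1, 0, 0, 1, 1, 0, 0, 1],
    "Cabin":    ["C85", None, None, None, None, "E46", None, None],
})

# (a) drop the column with the largest number of missing values
worst_column = df.isna().sum().idxmax()
cleaned = df.drop(columns=worst_column)

# (b) number of passengers older than 30 (missing ages are not counted)
older_than_30 = int((cleaned["Age"] > 30).sum())

# (c) total fare paid by second-class passengers
second_class_fare = cleaned.loc[cleaned["Pclass"] == 2, "Fare"].sum()

# (d) survivors per class and (i) survival percentage per class
survivors_by_class = cleaned.groupby("Pclass")["Survived"].sum()
survival_pct = cleaned.groupby("Pclass")["Survived"].mean() * 100

# (e) descriptive statistics of age, gender wise
age_by_sex = cleaned.groupby("Sex")["Age"].describe()

print(worst_column, older_than_30, second_class_fare)
print(survivors_by_class, survival_pct, sep="\n")
print(age_by_sex)
```

Because Survived is coded 0/1, its mean per class is the survival rate, which is what part (i) asks to compare across classes.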

7. Consider the following data frame containing a family name, gender of the family member and her/his monthly
income in each record.
FamilyName   Gender   MonthlyIncome (Rs.)
Shah         Male       44000.00
Vats         Male       65000.00
Vats         Female     43150.00
Kumar        Female     66500.00
Vats         Female    255000.00
Kumar        Male      103000.00
Shah         Male       55000.00
Shah         Female    112400.00
Kumar        Female     81030.00
Vats         Male       71900.00
Write a program in Python using Pandas to perform the following:
a. Calculate and display familywise gross monthly income.
b. Display the highest and lowest monthly income for each family name
c. Calculate and display monthly income of all members earning income less than Rs. 80000.00.
d. Display total number of females along with their average monthly income
e. Delete rows with Monthly income less than the average income of all members
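
A possible solution outline using the data frame given above is sketched below. The column name MonthlyIncome (without the unit) is a naming choice made for the example.

```python
import pandas as pd

# Records copied from the table in the question
df = pd.DataFrame({
    "FamilyName": ["Shah", "Vats", "Vats", "Kumar", "Vats",
                   "Kumar", "Shah", "Shah", "Kumar", "Vats"],
    "Gender": ["Male", "Male", "Female", "Female", "Female",
               "Male", "Male", "Female", "Female", "Male"],
    "MonthlyIncome": [44000.00, 65000.00, 43150.00, 66500.00, 255000.00,
                      103000.00, 55000.00, 112400.00, 81030.00, 71900.00],
})

# (a) familywise gross monthly income
family_income = df.groupby("FamilyName")["MonthlyIncome"].sum()

# (b) highest and lowest monthly income for each family name
family_range = df.groupby("FamilyName")["MonthlyIncome"].agg(["max", "min"])

# (c) members earning less than Rs. 80000.00
below_80k = df[df["MonthlyIncome"] < 80000.00]

# (d) number of females and their average monthly income
females = df[df["Gender"] == "Female"]
female_count = len(females)
female_mean = females["MonthlyIncome"].mean()

# (e) delete rows below the overall average by keeping the remaining rows
overall_mean = df["MonthlyIncome"].mean()
at_or_above_mean = df[df["MonthlyIncome"] >= overall_mean]

print(family_income)
print(family_range)
print(below_80k)
print(female_count, female_mean)
print(at_or_above_mean)
```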

Project: Students are required to work on a good dataset in consultation with their faculty and apply the concepts
learned in the course. Each project must include the following:

i. Download a dataset (either web-scrape the data from various data sources like Twitter, Amazon, news sites or
download a dataset from Kaggle/UCI/data.gov.in etc.). Select a dataset which requires at least two steps of data
cleaning and two steps of data pre-processing.
ii. Make an objective of the data analysis for that dataset. Depending on the dataset, perform the data cleaning, data
pre-processing steps.
iii. The data cleaning steps may include handling of missing values, handling duplicate data, handling inconsistent
data (e.g. height is given in feet for some objects and in inches for some other objects), removing redundant data
(e.g. in some datasets, age and date of birth are given as two columns while the analysis needs only the age),
handling incomplete data (e.g. email address doesn’t have @ symbol).
iv. The pre-processing steps may include transforming the data to some other format (e.g. text comments converted to
term vectors, image files converted to various types of features), discretization and binning,
standardization/normalization and outlier detection. Some string manipulations may also be required.
v. Prepare at least eight analysis questions for exploration of the data and at least two questions for the
visualization of the data.
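
To make the cleaning and pre-processing steps listed above concrete, a minimal sketch on a tiny invented data frame follows. The column names, the choice of the median for filling missing values, min-max scaling and three equal-width bins are all assumptions for illustration; a real project should justify its own choices.

```python
import pandas as pd

# Invented records showing duplicates, mixed units, a missing value
# and an email address without the @ symbol
raw = pd.DataFrame({
    "name": ["A", "B", "B", "C", "D"],
    "height": [5.5, 66.0, 66.0, None, 6.0],
    "height_unit": ["ft", "in", "in", "in", "ft"],
    "email": ["a@x.org", "b@x.org", "b@x.org", "c.x.org", "d@x.org"],
})

# Cleaning: remove exact duplicate rows
clean = raw.drop_duplicates().copy()

# Cleaning: make units consistent by converting feet to inches
in_feet = clean["height_unit"] == "ft"
clean.loc[in_feet, "height"] = clean.loc[in_feet, "height"] * 12
clean["height_unit"] = "in"

# Cleaning: fill the missing height with the median of the known heights
clean["height"] = clean["height"].fillna(clean["height"].median())

# Cleaning: flag incomplete email addresses (no @ symbol)
clean["email_valid"] = clean["email"].str.contains("@", regex=False)

# Pre-processing: min-max normalization and binning of the height column
h = clean["height"]
clean["height_scaled"] = (h - h.min()) / (h.max() - h.min())
clean["height_bin"] = pd.cut(h, bins=3, labels=["short", "medium", "tall"])

print(clean)
```

The unit conversion is done before filling the missing value so that the median is computed on comparable numbers.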

Prepared By:

1. Dr. Anamika Gupta (SSCBS) 2. Prof Arpita Sharma (DDUC) 3. Prof Hema Banati (DSC) 4. Prof. Sharanjit Kaur (ANDC)
