DMV Lab 12

Aids

Uploaded by

sahilmukund.awasarkar


Department of AI & DS Engineering Computer Lab I

*************************************************************************************
Part II
Assignment 12
*************************************************************************************
Title: Data Aggregation
Problem Statement: Analyzing Sales Performance by Region in a Retail Company.
Dataset: "Retail_Sales_Data.csv"
Description: The dataset contains information about sales transactions in a retail
company. It includes attributes such as transaction date, product category, quantity sold,
and sales amount. The goal is to perform data aggregation to analyze the sales performance
by region and identify the top-performing regions.
Tasks to Perform:
1. Import the "Retail_Sales_Data.csv" dataset.
2. Explore the dataset to understand its structure and content.
3. Identify the relevant variables for aggregating sales data, such as region, sales amount,
and product category.
4. Group the sales data by region and calculate the total sales amount for each region.
5. Create bar plots or pie charts to visualize the sales distribution by region.
6. Identify the top-performing regions based on the highest sales amount.
7. Group the sales data by region and product category to calculate the total sales amount
for each combination.
8. Create stacked bar plots or grouped bar plots to compare the sales amounts across
different regions and product categories.

Theory:

Data aggregation is the process of combining data from multiple sources, or summarizing
data from a single source, to produce a more concise and meaningful representation of the
data. It can reveal trends, patterns, and relationships that would not be apparent if each
record were examined individually.

Data aggregation can be performed on a variety of data types, including numerical data,
categorical data, and text data. Some common aggregation operations include:

• Sum: Calculates the sum of all values in a column or group of columns.
• Mean: Calculates the average of all values in a column or group of columns.
• Median: Calculates the middle value in a sorted list of values.
• Mode: Calculates the most frequently occurring value in a column or group of columns.
• Count: Counts the number of non-null values in a column or group of columns.
Examples of how data aggregation is used:

Matoshri College of Engineering & Research Centre, Nashik

• Identifying the top-performing sales regions: A retail company can aggregate sales
data by region to identify the regions that generate the most revenue.
• Tracking website traffic: A website owner can aggregate traffic data by source to
identify the most effective marketing channels.
Data aggregation is a powerful tool for gaining insights from data and making better decisions.
Benefits of data aggregation:
• Improved efficiency: Data aggregation can help businesses improve their efficiency by
automating tasks such as report generation and data analysis.
• Increased accuracy: Data aggregation can help businesses increase the accuracy of their
data by reducing the number of errors that occur when data is processed manually.
• Enhanced decision-making: Data aggregation can help businesses make better decisions by
giving them a more complete and accurate view of their data.

1. Import the "Retail_Sales_Data.csv" dataset.

Since the dataset is a CSV file, we can load it with pandas: read_csv() reads it into a DataFrame.
In [5]:
import pandas as pd
import matplotlib.pyplot as plt  # used for the plots in later steps

# Import the dataset
data = pd.read_csv("Retail_Sales_Data.csv")
data.head()
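read_csv() accepts any file-like object as well as a path. You can therefore try it without the real file by wrapping text in io.StringIO. The rows below are hypothetical; they only reuse a subset of the column names reported by info() for this dataset.

```python
import io
import pandas as pd

# Hypothetical rows in the same CSV layout (subset of the real column names)
csv_text = """invoice_no,shopping_mall,category,quantity,price
I100,Kanyon,Clothing,2,300.08
I101,Metrocity,Shoes,1,600.17
I102,Kanyon,Books,3,45.45
"""

# StringIO stands in for the file on disk
data = pd.read_csv(io.StringIO(csv_text))
print(data.head())
print(data.dtypes)  # quantity is parsed as an integer, price as a float
```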

2. Explore the dataset to understand its structure and content.

• data.head(): Assuming data is a pandas DataFrame, head() displays its first few rows,
which gives a quick way to inspect the structure and content of the dataset. By default it
shows the first 5 rows; pass a number to show a different count, e.g. data.head(10) shows
the first 10 rows.
• data.info(): Prints a concise summary of the DataFrame, including its columns, data
types, non-null counts, and memory usage. When you call data.info(), the summary report is
printed to the console.

# Explore the dataset: preview the first five rows
data.head()

# Explore the dataset: columns, dtypes, non-null counts, memory usage
data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 99457 entries, 0 to 99456
Data columns (total 10 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 invoice_no 99457 non-null object
1 customer_id 99457 non-null object
2 gender 99457 non-null object
3 age 99457 non-null int64
4 category 99457 non-null object
5 quantity 99457 non-null int64
6 price 99457 non-null float64
7 payment_method 99457 non-null object
8 invoice_date 99457 non-null object
9 shopping_mall 99457 non-null object
dtypes: float64(1), int64(2), object(7)
memory usage: 7.6+ MB
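The next sketch shows what head(n) and info() return. It uses a hypothetical 8-row DataFrame with one missing price, not the lab's data. info() prints its report and returns None, so the example passes the buf= parameter to capture the text.

```python
import io
import pandas as pd

# Hypothetical 8-row frame with one missing price
data = pd.DataFrame({
    "category": ["Books", "Shoes"] * 4,
    "price": [10.0, 20.0, 30.0, None, 50.0, 60.0, 70.0, 80.0],
})

first_five = data.head()    # default n=5
first_three = data.head(3)  # explicit n

# info() writes to stdout by default; buf= redirects the report
buffer = io.StringIO()
data.info(buf=buffer)
report = buffer.getvalue()
print(report)
```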

3. Identify the relevant variables for aggregating sales data, such as region, sales amount, and product category.
Aggregating sales data involves summarizing and grouping information to derive
meaningful insights. The choice of relevant variables depends on the specific goals of the
analysis and the nature of the business. Some common variables for aggregating sales data
include:
1. Shopping Mall (used as the region in this dataset):
• Mall ID or Name: Identifies the shopping mall where each transaction occurred, so sales
can be grouped by location.
2. Price (used as the sales amount in this lab):
• Unit Price: The price of each unit of the product sold.
• Total Sales: The total revenue generated from the sale of products.

3. Category:
• Product Category: Categorization of products based on their type (e.g., electronics,
clothing, food).
• Sales by Category: Aggregating sales data based on the product categories.

# Identify the relevant variables and store them in a list
relevant_columns = ["shopping_mall", "price", "category"]
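The lab sums the price column directly as the sales amount. The lab does not say how price is defined. If price is a per-unit price, which is an assumption, each transaction's revenue would instead be quantity × price. The minimal sketch below uses hypothetical rows that reuse the dataset's column names. The sales_amount column is a name introduced here for illustration.

```python
import pandas as pd

# Hypothetical rows reusing the dataset's column names; values are invented
data = pd.DataFrame({
    "shopping_mall": ["Kanyon", "Kanyon", "Metrocity"],
    "category": ["Clothing", "Shoes", "Clothing"],
    "quantity": [2, 1, 3],
    "price": [100.0, 250.0, 80.0],
})

# ASSUMPTION: price is per unit, so revenue per row = quantity * price
# (sales_amount is a hypothetical column name)
data["sales_amount"] = data["quantity"] * data["price"]

relevant_columns = ["shopping_mall", "category", "sales_amount"]
print(data[relevant_columns])
```

Under that assumption, sales_amount could replace price in the later groupby calls.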

4. Group the sales data by region and calculate the total sales amount for each region.

The dataset has no separate region column, so each shopping mall is treated as a region.
Grouping the transactions by shopping_mall and summing the price column gives the total
sales amount for each region. These totals can be compared directly to see where revenue
is concentrated.
In[14]:
#Group by shopping mall and calculate total sales amount
sales_by_region = data.groupby("shopping_mall")["price"].sum()
sales_by_region

Out[14]:
shopping_mall
Cevahir AVM 3433671.84
Emaar Square Mall 3390408.31
Forum Istanbul 3336073.82
Istinye Park 6717077.54
Kanyon 13710755.24
Mall of Istanbul 13851737.62
Metrocity 10249980.07
Metropol AVM 6937992.99
Viaport Outlet 3414019.46
Zorlu Center 3509649.02
Name: price, dtype: float64
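groupby() follows a split-apply-combine pattern. It splits the rows by key, applies an aggregation to each group, and combines the results. For illustration only, here is the same idea in plain Python on a few hypothetical (shopping_mall, price) pairs. The keys are sorted at the end to match groupby's default sort=True.

```python
from collections import defaultdict

# Hypothetical transactions: (shopping_mall, price)
rows = [("Kanyon", 120.0), ("Metrocity", 80.0), ("Kanyon", 30.0), ("Zorlu Center", 50.0)]

# Split by mall and apply a running sum in one pass
totals = defaultdict(float)
for mall, price in rows:
    totals[mall] += price

# Combine into one result ordered by key, like groupby(...).sum()
sales_by_region = dict(sorted(totals.items()))
print(sales_by_region)
```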

5. Create bar plots or pie charts to visualize the sales distribution by region.

Pie charts represent data in a circular graph, where each slice (or sector) of the pie
corresponds to a particular category or group. In the context of sales distribution by region,
each slice represents a different region, and the size of each slice indicates the proportion
of sales attributed to that region.
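Each slice's proportion is that region's total divided by the grand total. The minimal sketch below uses hypothetical totals; the labels A, B and C are invented. It reproduces the percentage labels that autopct="%1.1f%%" would print on the chart.

```python
import pandas as pd

# Hypothetical totals per region (not the lab's figures)
sales_by_region = pd.Series({"A": 300.0, "B": 100.0, "C": 600.0})

# Share of the whole for each slice, in percent
share_pct = sales_by_region / sales_by_region.sum() * 100

# Same formatting as autopct="%1.1f%%"
labels = [f"{p:.1f}%" for p in share_pct]
print(labels)
```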
In[15]:
#Create a pie plot to visualize sales distribution by region
plt.figure(figsize=(6, 6))
plt.pie(sales_by_region, labels=sales_by_region.index,
autopct="%1.1f%%", startangle=140)
plt.title("Sales Distribution by Region")
plt.axis("equal")
plt.show()

Out[15]:

In[16]:
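plt.figure(figsize=(10, 5))
plt.bar(sales_by_region.index, sales_by_region.values)
plt.xlabel("Region")
plt.ylabel("Total Sales Amount")
plt.title("Total Sales Amount by Region")
plt.xticks(rotation=45, ha="right")  # keep the ten mall names from overlapping
plt.tight_layout()
plt.show()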
Out[16]:

6. Identify the top-performing regions based on the highest sales amount.

In[17]:

#Identify top-performing regions

top_regions = sales_by_region.sort_values(ascending=False).head(5)
print("Top-performing regions:")
print(top_regions)

Out[17]:
Top-performing regions:
shopping_mall
Mall of Istanbul 13851737.62
Kanyon 13710755.24
Metrocity 10249980.07
Metropol AVM 6937992.99
Istinye Park 6717077.54
Name: price, dtype: float64
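Series.nlargest() gives the same ranking as sort_values(ascending=False).head(n) in a single call. idxmax() returns the label of the single best region. The sketch below uses hypothetical per-region totals standing in for sales_by_region.

```python
import pandas as pd

# Hypothetical per-region totals standing in for sales_by_region
sales_by_region = pd.Series(
    {"Mall A": 120.0, "Mall B": 450.0, "Mall C": 300.0, "Mall D": 80.0}
)

# Top two regions by total sales
top_regions = sales_by_region.nlargest(2)

# Label of the highest-selling region
best_region = sales_by_region.idxmax()
print(top_regions)
print(best_region)
```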

7. Group the sales data by region and product category to calculate the total sales amount for
each combination.
In[18]:
# Group by shopping mall (region) and product category, then calculate total sales
sales_by_region_category = data.groupby(["shopping_mall", "category"])["price"].sum()
sales_by_region_category

Out[18]:
shopping_mall category
Cevahir AVM Books 11998.80
Clothing 1554414.40
Cosmetics 88394.84
Food & Beverage 11992.39
Shoes 884050.41
...
Zorlu Center Food & Beverage 11589.68
Shoes 953670.13
Souvenir 8398.68
Technology 803250.00
Toys 54691.84
Name: price, Length: 80, dtype: float64
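The result above is in "long" form, with one row per (mall, category) pair. The stacked bar plot in the next step calls unstack() to turn it into a wide table, with malls as rows and categories as columns. The sketch below uses hypothetical rows and shows that step. fill_value=0 covers pairs that never occur, and pivot_table() builds the same table in one call.

```python
import pandas as pd

# Hypothetical transactions reusing the dataset's column names
data = pd.DataFrame({
    "shopping_mall": ["Kanyon", "Kanyon", "Kanyon", "Metrocity"],
    "category": ["Books", "Shoes", "Books", "Shoes"],
    "price": [10.0, 200.0, 15.0, 180.0],
})

# Long form: one value per (mall, category) pair
long_form = data.groupby(["shopping_mall", "category"])["price"].sum()

# Wide form: malls as rows, categories as columns; missing pairs become 0
wide_form = long_form.unstack(fill_value=0)

# pivot_table builds the same wide table directly
pivot = data.pivot_table(index="shopping_mall", columns="category",
                         values="price", aggfunc="sum", fill_value=0)
print(wide_form)
```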

8. Create stacked bar plots or grouped bar plots to compare the sales amounts across
different regions and product categories.

A stacked bar plot is a type of bar chart that represents individual data values as bars, with
each bar divided into segments that represent different categories or groups. In the context
of comparing sales amounts across different regions and product categories, a stacked bar
plot can be a useful visualization tool.
Building a stacked bar plot to compare sales amounts:
1. Plotting:
• Use a plotting library (such as Matplotlib in Python or ggplot2 in R) to create a stacked
bar plot.
• The x-axis represents the regions, and the y-axis represents the total sales amount.
• Each bar is divided into segments, with each segment representing a different product
category.
• The height of each segment corresponds to the sales amount for that product category in
the respective region.
2. Color Coding:
• Assign a different color to each product category so it is clear which part of the bar
corresponds to which category.
• Stacking lets you compare total sales across regions while also seeing how much each
product category contributes within a region.
3. Legend and Labels:
• Include a legend to explain the color coding of the product categories.
• Label the axes and provide a title to make the plot more informative.
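In Matplotlib, the stacking described in these steps comes from the bottom= argument of Axes.bar. Each category's segment starts where the previous segments ended. The minimal sketch below uses a hypothetical region × category table with invented values. It uses the Agg backend only so that it renders off-screen.

```python
import matplotlib
matplotlib.use("Agg")  # off-screen rendering; omit this line in a notebook
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical region x category totals (rows: regions, columns: categories)
table = pd.DataFrame(
    {"Clothing": [500.0, 300.0], "Shoes": [200.0, 400.0]},
    index=["Region A", "Region B"],
)

fig, ax = plt.subplots(figsize=(6, 4))
regions = list(table.index)
bottom = [0.0] * len(table)  # current top of each bar
for category in table.columns:
    heights = table[category].tolist()
    ax.bar(regions, heights, bottom=bottom, label=category)  # one colour per category
    bottom = [b + h for b, h in zip(bottom, heights)]  # next segment starts here

ax.set_xlabel("Region")
ax.set_ylabel("Total Sales Amount")
ax.set_title("Sales by Region and Category (toy data)")
ax.legend(title="Category")
fig.tight_layout()
```

After the loop, each entry of bottom equals that region's total. That is why the full bar height can be compared across regions.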

Use the following code to draw a stacked bar plot comparing sales across regions and
categories:
In[19]:
# Create a stacked bar plot to compare sales across regions and categories
sales_by_region_category.unstack().plot(kind="bar", stacked=True, figsize=(12, 8))
plt.title("Sales Comparison by Region and Product Category")
plt.xlabel("Region")
plt.ylabel("Total Sales Amount")
plt.legend(title="Category")
plt.show()

Out[19]:
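Task 8 also allows grouped bar plots. With pandas, leaving out stacked=True draws the categories side by side instead of on top of each other. The sketch below is minimal and uses a hypothetical long-form Series with the same index names as sales_by_region_category. The values are invented.

```python
import matplotlib
matplotlib.use("Agg")  # off-screen rendering; omit this line in a notebook
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical long-form totals indexed by (shopping_mall, category)
index = pd.MultiIndex.from_tuples(
    [("Mall A", "Clothing"), ("Mall A", "Shoes"),
     ("Mall B", "Clothing"), ("Mall B", "Shoes")],
    names=["shopping_mall", "category"],
)
sales_by_region_category = pd.Series([500.0, 200.0, 300.0, 400.0], index=index)

# No stacked=True: one bar per category, grouped by region
ax = sales_by_region_category.unstack().plot(kind="bar", figsize=(8, 5))
ax.set_title("Sales by Region and Product Category (grouped)")
ax.set_xlabel("Region")
ax.set_ylabel("Total Sales Amount")
ax.legend(title="Category")
plt.tight_layout()
```

Grouped bars make it easier to compare one category across regions. Stacked bars make it easier to compare region totals.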

Conclusion:

By implementing data aggregation, we analyzed the sales performance of the retail company
by region and identified the top-performing regions. We also identified the top-selling
product categories within each region.

Performance | Innovation | Completion | Total | Dated Sign of Subject Teacher
3           | 1          | 1          | 5     |
