
SalesDataAnalysis__1693296057

Uploaded by

SubhransuSekharSahoo


SalesDataAnalysis

August 27, 2023

[1]: import numpy as np

import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

[2]: df = pd.read_csv('sales_data.csv')

[3]: df.head()

[3]: Order Date Order ID Product Product_ean \

0 2019-01-22 21:25:00 141234 iPhone 5.638009e+12
1 2019-01-28 14:15:00 141235 Lightning Charging Cable 5.563320e+12
2 2019-01-17 13:33:00 141236 Wired Headphones 2.113973e+12
3 2019-01-05 20:33:00 141237 27in FHD Monitor 3.069157e+12
4 2019-01-25 11:59:00 141238 Wired Headphones 9.692681e+12

catégorie Purchase Address Quantity Ordered \

0 Vêtements 944 Walnut St, Boston, MA 02215 1
1 Alimentation 185 Maple St, Portland, OR 97035 1
2 Vêtements 538 Adams St, San Francisco, CA 94016 2
3 Sports 738 10th St, Los Angeles, CA 90001 1
4 Électronique 387 10th St, Austin, TX 73301 1

Price Each Cost price turnover margin

0 700.00 231.0000 700.00 469.0000
1 14.95 7.4750 14.95 7.4750
2 11.99 5.9950 23.98 11.9900
3 149.99 97.4935 149.99 52.4965
4 11.99 5.9950 11.99 5.9950

[4]: df.isnull().sum()

[4]: Order Date 0

Order ID 0
Product 0
Product_ean 0
catégorie 0
Purchase Address 0
Quantity Ordered 0
Price Each 0
Cost price 0
turnover 0
margin 0
dtype: int64

[5]: df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 185950 entries, 0 to 185949
Data columns (total 11 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 Order Date 185950 non-null object
1 Order ID 185950 non-null int64
2 Product 185950 non-null object
3 Product_ean 185950 non-null float64
4 catégorie 185950 non-null object
5 Purchase Address 185950 non-null object
6 Quantity Ordered 185950 non-null int64
7 Price Each 185950 non-null float64
8 Cost price 185950 non-null float64
9 turnover 185950 non-null float64
10 margin 185950 non-null float64
dtypes: float64(5), int64(2), object(4)
memory usage: 15.6+ MB

[6]: df['Order Date'].head(1)

[6]: 0 2019-01-22 21:25:00

Name: Order Date, dtype: object

[7]: df['Order Year']=df['Order Date'].str.split(' ').str[0].str.split('-').str[0]

[8]: df['Order Month']=df['Order Date'].str.split(' ').str[0].str.split('-').str[1]
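A sketch of an alternative to the string splitting in cells [7] and [8]: parse the column once with `pd.to_datetime` and read the parts through the `.dt` accessor, which returns integer columns directly. It assumes every `Order Date` value uses the `YYYY-MM-DD HH:MM:SS` layout seen in `df.head()`; the frame below holds only the five timestamps printed there, as a stand-in for `sales_data.csv`.

```python
import pandas as pd

# Stand-in for sales_data.csv: the five 'Order Date' values printed by df.head()
df = pd.DataFrame({
    "Order Date": [
        "2019-01-22 21:25:00",
        "2019-01-28 14:15:00",
        "2019-01-17 13:33:00",
        "2019-01-05 20:33:00",
        "2019-01-25 11:59:00",
    ],
})

# Parse once; an explicit format raises on any value that does not match it
order_ts = pd.to_datetime(df["Order Date"], format="%Y-%m-%d %H:%M:%S")

# .dt.year / .dt.month are integer Series, so no later astype(int) is needed
df["Order Year"] = order_ts.dt.year
df["Order Month"] = order_ts.dt.month
print(df)
```

An explicit `format` also makes a malformed row fail loudly instead of silently producing a wrong substring.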

[9]: df.head(1)

[9]: Order Date Order ID Product Product_ean catégorie \

0 2019-01-22 21:25:00 141234 iPhone 5.638009e+12 Vêtements

Purchase Address Quantity Ordered Price Each Cost price \

0 944 Walnut St, Boston, MA 02215 1 700.0 231.0

turnover margin Order Year Order Month

0 700.0 469.0 2019 01

[10]: df['Purchase Address'].head(1)

[10]: 0 944 Walnut St, Boston, MA 02215

Name: Purchase Address, dtype: object

[11]: df['Purchase City']=df['Purchase Address'].str.split(',').str[1]
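Splitting on `','` keeps the space that follows each comma, which is why the city labels printed later in the notebook start with a blank (`' Boston'`). A minimal sketch that strips that whitespace and, as an optional assumption, appends the state code so that cities sharing a name across states are not merged into one group. The third address is hypothetical, included only to show that case.

```python
import pandas as pd

# Addresses in the notebook's "street, city, STATE ZIP" layout
df = pd.DataFrame({
    "Purchase Address": [
        "944 Walnut St, Boston, MA 02215",
        "185 Maple St, Portland, OR 97035",
        "12 Hypothetical Rd, Portland, ME 04101",  # hypothetical row for illustration
    ],
})

parts = df["Purchase Address"].str.split(",")
# str.strip() removes the space left after the comma (" Boston" -> "Boston")
city = parts.str[1].str.strip()
# The state code is the first token of the last part ("MA 02215" -> "MA")
state = parts.str[2].str.strip().str.split(" ").str[0]

df["Purchase City"] = city + " (" + state + ")"
print(df["Purchase City"].tolist())
```

Keeping only `.str.strip()` on the city part is enough if the state qualifier is not wanted.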

[12]: df.head(1)

[12]: Order Date Order ID Product Product_ean catégorie \

0 2019-01-22 21:25:00 141234 iPhone 5.638009e+12 Vêtements

Purchase Address Quantity Ordered Price Each Cost price \

0 944 Walnut St, Boston, MA 02215 1 700.0 231.0

turnover margin Order Year Order Month Purchase City

0 700.0 469.0 2019 01 Boston

[13]: df.drop(columns=['Order Date', 'Order ID', 'Product_ean', 'Purchase Address'], inplace=True)

[14]: df.head(1)

[14]: Product catégorie Quantity Ordered Price Each Cost price turnover \
0 iPhone Vêtements 1 700.0 231.0 700.0

margin Order Year Order Month Purchase City

0 469.0 2019 01 Boston

[15]: df['Product'].value_counts()

[15]: USB-C Charging Cable 21903

Lightning Charging Cable 21658
AAA Batteries (4-pack) 20641
AA Batteries (4-pack) 20577
Wired Headphones 18882
Apple Airpods Headphones 15549
Bose SoundSport Headphones 13325
27in FHD Monitor 7507
iPhone 6842
27in 4K Gaming Monitor 6230
34in Ultrawide Monitor 6181
Google Phone 5525
Flatscreen TV 4800
Macbook Pro Laptop 4724
ThinkPad Laptop 4128
20in Monitor 4101
Vareebadd Phone 2065
LG Washing Machine 666
LG Dryer 646
Name: Product, dtype: int64

[16]: def change(x):
    if x in ['USB-C Charging Cable', 'Lightning Charging Cable']:
        return 'Charging Cables'
    elif x in ['AAA Batteries (4-pack)', 'AA Batteries (4-pack)']:
        return 'Batteries'
    elif x in ['Wired Headphones', 'Apple Airpods Headphones', 'Bose SoundSport Headphones']:
        return 'Headphones'
    elif x in ['27in FHD Monitor', '27in 4K Gaming Monitor', '34in Ultrawide Monitor',
               'Flatscreen TV', '20in Monitor']:
        return 'Smart Tv'
    elif x in ['iPhone', 'Google Phone', 'Vareebadd Phone']:
        return 'Smart Phones'
    elif x in ['Macbook Pro Laptop', 'ThinkPad Laptop']:
        return 'Laptops'
    elif x in ['LG Washing Machine', 'LG Dryer']:
        return 'Cleaning Machines'
    else:
        return 'Others'

[17]: df['Product'] = df['Product'].apply(change)
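`apply(change)` runs the `if`/`elif` chain once per row in Python. A sketch of the same grouping as a lookup table used with the vectorized `Series.map`; `PRODUCT_GROUPS` and `product_to_group` are illustrative names, and `'Unknown Gadget'` is a hypothetical product used only to exercise the `'Others'` fallback.

```python
import pandas as pd

# Group -> products, mirroring the branches of change()
PRODUCT_GROUPS = {
    "Charging Cables": ["USB-C Charging Cable", "Lightning Charging Cable"],
    "Batteries": ["AAA Batteries (4-pack)", "AA Batteries (4-pack)"],
    "Headphones": ["Wired Headphones", "Apple Airpods Headphones", "Bose SoundSport Headphones"],
    "Smart Tv": ["27in FHD Monitor", "27in 4K Gaming Monitor", "34in Ultrawide Monitor",
                 "Flatscreen TV", "20in Monitor"],
    "Smart Phones": ["iPhone", "Google Phone", "Vareebadd Phone"],
    "Laptops": ["Macbook Pro Laptop", "ThinkPad Laptop"],
    "Cleaning Machines": ["LG Washing Machine", "LG Dryer"],
}
# Invert to a flat {product: group} dict, which is what Series.map expects
product_to_group = {p: g for g, products in PRODUCT_GROUPS.items() for p in products}

products = pd.Series(["iPhone", "LG Dryer", "Wired Headphones", "Unknown Gadget"])
# Products missing from the dict map to NaN; fillna reproduces the 'Others' branch
grouped = products.map(product_to_group).fillna("Others")
print(grouped.tolist())
```

Keeping the groups in one dict also makes it easy to check that all 19 products listed by `value_counts()` are covered.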

[18]: df['Product'].value_counts()

[18]: Headphones 47756

Charging Cables 43561
Batteries 41218
Smart Tv 28819
Smart Phones 14432
Laptops 8852
Cleaning Machines 1312
Name: Product, dtype: int64

[19]: df.head(1)

[19]: Product catégorie Quantity Ordered Price Each Cost price \

0 Smart Phones Vêtements 1 700.0 231.0

turnover margin Order Year Order Month Purchase City

0 700.0 469.0 2019 01 Boston

[20]: df.rename(columns={'catégorie':'Category'},inplace=True)

[21]: df.head(1)

[21]: Product Category Quantity Ordered Price Each Cost price \

0 Smart Phones Vêtements 1 700.0 231.0

turnover margin Order Year Order Month Purchase City

0 700.0 469.0 2019 01 Boston

[22]: df['Category'].value_counts()

[22]: Sports 46925

Vêtements 46405
Alimentation 46342
Électronique 46278
Name: Category, dtype: int64

[23]: data_mapping = {
'Vêtements': 'Clothes',
'Électronique': 'Electronics'
}
df['Category'] = df['Category'].map(data_mapping).fillna(df['Category'])
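The mapping above translates two of the four French labels; `Alimentation` (French for "food") is left unchanged, as the following `value_counts()` output shows. A sketch with a complete mapping, assuming `'Food'` is the intended English label. `Series.replace` leaves values missing from the dict (here `'Sports'`) untouched, so the `map(...).fillna(...)` pair becomes a single call.

```python
import pandas as pd

# French labels found in the data, with English equivalents.
# 'Alimentation' -> 'Food' is an assumption; the notebook leaves it untranslated.
category_en = {
    "Vêtements": "Clothes",
    "Électronique": "Electronics",
    "Alimentation": "Food",
}

categories = pd.Series(["Vêtements", "Alimentation", "Sports", "Électronique"])
# Labels without a dict entry ('Sports') pass through unchanged
translated = categories.replace(category_en)
print(translated.tolist())
```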

[24]: df['Category'].value_counts()

[24]: Sports 46925

Clothes 46405
Alimentation 46342
Electronics 46278
Name: Category, dtype: int64

[25]: df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 185950 entries, 0 to 185949
Data columns (total 10 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 Product 185950 non-null object
1 Category 185950 non-null object
2 Quantity Ordered 185950 non-null int64
3 Price Each 185950 non-null float64
4 Cost price 185950 non-null float64
5 turnover 185950 non-null float64
6 margin 185950 non-null float64
7 Order Year 185950 non-null object
8 Order Month 185950 non-null object
9 Purchase City 185950 non-null object
dtypes: float64(4), int64(1), object(5)
memory usage: 14.2+ MB

[26]: df['Order Month'] = df['Order Month'].astype(int)

df['Order Year'] = df['Order Year'].astype(int)

[27]: df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 185950 entries, 0 to 185949
Data columns (total 10 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 Product 185950 non-null object
1 Category 185950 non-null object
2 Quantity Ordered 185950 non-null int64
3 Price Each 185950 non-null float64
4 Cost price 185950 non-null float64
5 turnover 185950 non-null float64
6 margin 185950 non-null float64
7 Order Year 185950 non-null int32
8 Order Month 185950 non-null int32
9 Purchase City 185950 non-null object
dtypes: float64(4), int32(2), int64(1), object(3)
memory usage: 12.8+ MB

[28]: df.head()

[28]: Product Category Quantity Ordered Price Each Cost price \

0 Smart Phones Clothes 1 700.00 231.0000
1 Charging Cables Alimentation 1 14.95 7.4750
2 Headphones Clothes 2 11.99 5.9950
3 Smart Tv Sports 1 149.99 97.4935
4 Headphones Electronics 1 11.99 5.9950

turnover margin Order Year Order Month Purchase City

0 700.00 469.0000 2019 1 Boston
1 14.95 7.4750 2019 1 Portland
2 23.98 11.9900 2019 1 San Francisco
3 149.99 52.4965 2019 1 Los Angeles
4 11.99 5.9950 2019 1 Austin

[29]: cat = df.select_dtypes(include='object').columns.tolist()
col = len(cat)

fig, axs = plt.subplots(nrows=col, ncols=2, figsize=(15, 20))
axs = axs.flatten()

for i, var in enumerate(cat):
    sns.countplot(y=var, data=df, ax=axs[i])
    axs[i].set_title(var)

if col < len(axs):
    for i in range(col, len(axs)):
        fig.delaxes(axs[i])

fig.tight_layout()
plt.show()
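This cell and the plotting cells after it build an `nrows=col, ncols=2` grid, which creates twice as many axes as there are plots and then deletes the surplus. A hypothetical helper, `make_grid`, not part of the original notebook, sizes the grid to `ceil(n / ncols)` rows instead. The `Agg` backend line only lets the sketch run without a display and can be dropped inside Jupyter.

```python
import math

import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs outside a notebook
import matplotlib.pyplot as plt


def make_grid(n_plots, ncols=2, height_per_row=4, width=15):
    """Create just enough rows for n_plots axes and delete the unused ones.

    Hypothetical helper for illustration; returns (fig, flat array of n_plots axes).
    """
    nrows = math.ceil(n_plots / ncols)
    # squeeze=False keeps a 2-D array even when there is a single row
    fig, axs = plt.subplots(nrows=nrows, ncols=ncols,
                            figsize=(width, height_per_row * nrows), squeeze=False)
    axs = axs.flatten()
    # At most ncols - 1 surplus axes remain, all in the last row
    for ax in axs[n_plots:]:
        fig.delaxes(ax)
    return fig, axs[:n_plots]


fig, axs = make_grid(3)
for ax, name in zip(axs, ["Product", "Category", "Purchase City"]):
    ax.set_title(name)
fig.tight_layout()
print(len(fig.axes))
```

In the notebook the loop body would stay as written, for example `sns.countplot(y=var, data=df, ax=axs[i])`.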

[30]: num = df.select_dtypes(include=['int','float']).columns.tolist()
col = len(num)

fig, axs = plt.subplots(nrows=col, ncols=2, figsize=(15, 20))
axs = axs.flatten()

for i, var in enumerate(num):
    df[var].plot.hist(ax=axs[i])
    axs[i].set_title(var)

if col < len(axs):
    for i in range(col, len(axs)):
        fig.delaxes(axs[i])

fig.tight_layout()
plt.show()

[31]: num = df.select_dtypes(include=['int','float']).columns.tolist()
col = len(num)

fig, axs = plt.subplots(nrows=col, ncols=2, figsize=(15, 20))
axs = axs.flatten()

for i, var in enumerate(num):
    sns.histplot(data=df, x=var, kde=True, ax=axs[i])
    axs[i].set_title(var)

if col < len(axs):
    for i in range(col, len(axs)):
        fig.delaxes(axs[i])

fig.tight_layout()
plt.show()

[32]: num = ['Price Each', 'Cost price']
col = len(num)

fig, axs = plt.subplots(nrows=col, ncols=2, figsize=(15, 15))
axs = axs.flatten()

for i, var in enumerate(num):
    sns.scatterplot(x=var, y='turnover', data=df, ax=axs[i])
    axs[i].set_title(var)

if col < len(axs):
    for i in range(col, len(axs)):
        fig.delaxes(axs[i])

fig.tight_layout()
plt.show()

[33]: cat = ['Product', 'Category', 'Purchase City']
col = len(cat)

fig, axs = plt.subplots(nrows=col, ncols=2, figsize=(15, 15))
axs = axs.flatten()

for i, var in enumerate(cat):
    sns.barplot(x='Cost price', y=var, data=df, ax=axs[i])
    axs[i].set_title(var)

if col < len(axs):
    for i in range(col, len(axs)):
        fig.delaxes(axs[i])

fig.tight_layout()
plt.show()

[34]: num = ['Quantity Ordered', 'Price Each', 'Cost price', 'turnover', 'margin']
col = len(num)

fig, axs = plt.subplots(nrows=col, ncols=2, figsize=(15, 20))
axs = axs.flatten()

for i, var in enumerate(num):
    sns.boxplot(x=var, data=df, ax=axs[i])
    axs[i].set_title(var)

if col < len(axs):
    for i in range(col, len(axs)):
        fig.delaxes(axs[i])

fig.tight_layout()
plt.show()
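The box plots draw points beyond the whiskers, which seaborn places by default at 1.5 × IQR beyond the first and third quartiles, as outliers. A small hypothetical helper, not part of the notebook, counts those points per column so the plots can be read alongside numbers. It assumes linear quartile interpolation (the pandas default), and the sample data is synthetic.

```python
import pandas as pd


def iqr_outlier_counts(frame, columns, whis=1.5):
    """Count values outside [Q1 - whis*IQR, Q3 + whis*IQR] for each column."""
    counts = {}
    for col in columns:
        q1, q3 = frame[col].quantile([0.25, 0.75])
        iqr = q3 - q1
        low, high = q1 - whis * iqr, q3 + whis * iqr
        counts[col] = int(((frame[col] < low) | (frame[col] > high)).sum())
    return pd.Series(counts, name="outliers")


# Synthetic sample data
sample = pd.DataFrame({
    "a": [1, 2, 2, 3, 3, 3, 4, 50],          # 50 lies far above the upper whisker
    "b": [10, 10, 10, 10, 10, 10, 10, 10],   # constant column: IQR = 0, nothing flagged
})
counts = iqr_outlier_counts(sample, ["a", "b"])
print(counts)
```

On the notebook's frame the call would be `iqr_outlier_counts(df, num)` with the `num` list from this cell.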

[35]: for col in df.select_dtypes(include=['object']).columns:
    print(f'{col}: {df[col].unique()}')

Product: ['Smart Phones' 'Charging Cables' 'Headphones' 'Smart Tv' 'Batteries'
 'Laptops' 'Cleaning Machines']
Category: ['Clothes' 'Alimentation' 'Sports' 'Electronics']
Purchase City: [' Boston' ' Portland' ' San Francisco' ' Los Angeles' ' Austin'
 ' Atlanta' ' Seattle' ' New York City' ' Dallas']

[39]: from sklearn import preprocessing

for col in df.select_dtypes(include=['object']).columns:
    label_encoder = preprocessing.LabelEncoder()
    label_encoder.fit(df[col].unique())
    df[col] = label_encoder.transform(df[col])
    print(f'{col}: {df[col].unique()}')

Product: [5 1 3 6 0 4 2]
Category: [1 0 3 2]
Purchase City: [2 6 7 4 1 0 8 5 3]
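`LabelEncoder` numbers the labels alphabetically (for `Product`, `Batteries` → 0 through `Smart Tv` → 6), so the correlation heatmap that follows treats nominal groups as if they were ordered quantities. One common alternative, sketched here on the five rows printed by `df.head()` before encoding, is one-hot encoding with `pd.get_dummies`, which gives each label its own 0/1 column.

```python
import pandas as pd

# The five rows shown by df.head() after the Product/Category clean-up
df = pd.DataFrame({
    "Product": ["Smart Phones", "Charging Cables", "Headphones", "Smart Tv", "Headphones"],
    "Category": ["Clothes", "Alimentation", "Clothes", "Sports", "Electronics"],
    "Price Each": [700.00, 14.95, 11.99, 149.99, 11.99],
})

# One indicator column per label; dtype=int yields 0/1 instead of booleans
encoded = pd.get_dummies(df, columns=["Product", "Category"], dtype=int)
print(encoded.columns.tolist())
```

With nine cities, four categories and seven product groups this adds about twenty columns, so the resulting heatmap is larger but each cell has a direct reading.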

[41]: plt.figure(figsize=(25,20))
sns.heatmap(df.corr(), fmt='.2g', annot=True)
plt.show()

[47]: def correlation(df, threshold):
    # For each pair (i, j) with j < i, collect column i when |corr| exceeds threshold
    col_corr = set()
    corr_matrix = df.corr()
    for i in range(len(corr_matrix.columns)):
        for j in range(i):
            if abs(corr_matrix.iloc[i, j]) > threshold:
                col_name = corr_matrix.columns[i]
                col_corr.add(col_name)
    return col_corr
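The nested loop in `correlation` is fine for ten columns. A vectorized sketch of the same rule (flag column `i` when its absolute correlation with any earlier column `j < i` exceeds the threshold) uses a lower-triangle mask instead; `correlated_columns` is an illustrative name and the demo data is synthetic.

```python
import numpy as np
import pandas as pd


def correlated_columns(frame, threshold):
    """Columns whose |r| with some earlier column exceeds threshold.

    Same rule as correlation(): for each pair (i, j) with j < i, flag column i.
    """
    corr = frame.corr().abs()
    # True only strictly below the diagonal, i.e. where j < i
    lower_mask = np.tril(np.ones(corr.shape, dtype=bool), k=-1)
    # Cells outside the mask become NaN, and NaN > threshold evaluates to False
    exceeds = corr.where(lower_mask) > threshold
    flagged = exceeds.any(axis=1)
    return set(flagged[flagged].index)


# Synthetic demo: b is a linear function of a, c is independent noise
rng = np.random.default_rng(0)
a = rng.normal(size=200)
demo = pd.DataFrame({"a": a, "b": 2 * a + 1, "c": rng.normal(size=200)})
print(correlated_columns(demo, 0.7))  # prints {'b'}
```

Like the loop version, it keeps the first column of each correlated pair and flags the later one, so the result depends on column order.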

[48]: correlation(df,0.7)

[48]: {'Cost price', 'margin', 'turnover'}

[49]: df.drop(columns=['Cost price', 'margin', 'turnover'], inplace=True)
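The three dropped columns look derived rather than independently measured: in the five rows printed by `df.head()`, `turnover` equals `Quantity Ordered × Price Each` and `margin` equals `turnover − Quantity Ordered × Cost price`, which would explain why they correlate strongly with `Price Each`. These identities are inferred from those rows only, not documented; the sketch checks them on the printed values, and running the same two `np.allclose` calls on the full frame would confirm or refute them.

```python
import numpy as np
import pandas as pd

# The five rows printed by df.head() earlier in the notebook
head = pd.DataFrame({
    "Quantity Ordered": [1, 1, 2, 1, 1],
    "Price Each": [700.00, 14.95, 11.99, 149.99, 11.99],
    "Cost price": [231.0000, 7.4750, 5.9950, 97.4935, 5.9950],
    "turnover": [700.00, 14.95, 23.98, 149.99, 11.99],
    "margin": [469.0000, 7.4750, 11.9900, 52.4965, 5.9950],
})

# Hypothesised definitions, compared with a floating-point tolerance
turnover_ok = np.allclose(head["turnover"], head["Quantity Ordered"] * head["Price Each"])
margin_ok = np.allclose(
    head["margin"], head["turnover"] - head["Quantity Ordered"] * head["Cost price"]
)
print(turnover_ok, margin_ok)  # prints True True
```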

[52]: plt.figure(figsize=(15,10))
sns.heatmap(df.corr(), fmt='.2g', annot=True)
plt.show()

