0% found this document useful (0 votes)
41 views

Amazon Sales Reports - Jupyter Notebook

Uploaded by

Ankit Mishra
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
41 views

Amazon Sales Reports - Jupyter Notebook

Uploaded by

Ankit Mishra
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
You are on page 1/ 29

3/23/24, 9:59 AM Amazon Sales Reports - Jupyter Notebook

Amazon Sales Report


In [1]: 1 import numpy as np
2 import pandas as pd
3 import matplotlib.pyplot as plt
4 %matplotlib inline
5 import seaborn as sns

In [2]: 1 df = pd.read_csv('Amazon Sale Report.csv',encoding = 'unicode_escape')

In [3]: 1 df.shape

Out[3]: (128976, 21)

localhost:8888/notebooks/03_05_new_Python/Amazon Sales Reports.ipynb 1/29


3/23/24, 9:59 AM Amazon Sales Reports - Jupyter Notebook

In [4]: 1 df.head(10)

Out[4]: ship-
Sales Courier
index Order ID Date Status Fulfilment service- Category Size ... currency Amount ship-city ship-
Channel Status
level

405- 04-
On the
0 0 8078784- 30- Cancelled Merchant Amazon.in Standard T-shirt S ... INR 647.62 MUMBAI MAHARASH
Way
5731545 22

171- 04- Shipped -


1 1 9198151- 30- Delivered Merchant Amazon.in Standard Shirt 3XL Shipped ... INR 406.00 BENGALURU KARNAT
1101146 22 to Buyer

404- 04-
2 2 0687676- 30- Shipped Amazon Amazon.in Expedited Shirt XL Shipped ... INR 329.00 NAVI MUMBAI MAHARASH
7273146 22

403- 04-
On the
3 3 9615377- 30- Cancelled Merchant Amazon.in Standard Blazzer L ... INR 753.33 PUDUCHERRY PUDUCHE
Way
8133951 22

407- 04-
4 4 1069790- 30- Shipped Amazon Amazon.in Expedited Trousers 3XL Shipped ... INR 574.00 CHENNAI TAMIL N
7240320 22

404- 04-
UT
5 5 1490984- 30- Shipped Amazon Amazon.in Expedited T-shirt XL Shipped ... INR 824.00 GHAZIABAD
PRAD
4578765 22

408- 04-
6 6 5748499- 30- Shipped Amazon Amazon.in Expedited T-shirt L Shipped ... INR 653.00 CHANDIGARH CHANDIG
6859555 22

406- 04- Shipped -


7 7 7807733- 30- Delivered Merchant Amazon.in Standard Shirt S Shipped ... INR 399.00 HYDERABAD TELANG
3785945 22 to Buyer

407- 04-
8 8 5443024- 30- Cancelled Amazon Amazon.in Expedited T-shirt 3XL Cancelled ... NaN NaN HYDERABAD TELANG
5233168 22

402- 04-
9 9 4393761- 30- Shipped Amazon Amazon.in Expedited Shirt XXL Shipped ... INR 363.00 Chennai TAMIL N
0311520 22

10 rows × 21 columns

localhost:8888/notebooks/03_05_new_Python/Amazon Sales Reports.ipynb 2/29


3/23/24, 9:59 AM Amazon Sales Reports - Jupyter Notebook

In [5]: 1 df.tail()

Out[5]: ship-
Sales Courier
index Order ID Date Status Fulfilment service- Category Size ... currency Amount ship-city sh
Channel Status
level

406- 05-
128971 128970 6001380- 31- Shipped Amazon Amazon.in Expedited Shirt XL Shipped ... INR 517.0 HYDERABAD TELA
7673107 22

402- 05-
128972 128971 9551604- 31- Shipped Amazon Amazon.in Expedited T-shirt M Shipped ... INR 999.0 GURUGRAM HA
7544318 22

407- 05-
128973 128972 9547469- 31- Shipped Amazon Amazon.in Expedited Blazzer XXL Shipped ... INR 690.0 HYDERABAD TELA
3152358 22

402- 05-
128974 128973 6184140- 31- Shipped Amazon Amazon.in Expedited T-shirt XS Shipped ... INR 1199.0 Halol
0545956 22

408- 05-
128975 128974 7436540- 31- Shipped Amazon Amazon.in Expedited T-shirt S Shipped ... INR 696.0 Raipur CHHATT
8728312 22

5 rows × 21 columns

localhost:8888/notebooks/03_05_new_Python/Amazon Sales Reports.ipynb 3/29


3/23/24, 9:59 AM Amazon Sales Reports - Jupyter Notebook

In [6]: 1 df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 128976 entries, 0 to 128975
Data columns (total 21 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 index 128976 non-null int64
1 Order ID 128976 non-null object
2 Date 128976 non-null object
3 Status 128976 non-null object
4 Fulfilment 128976 non-null object
5 Sales Channel 128976 non-null object
6 ship-service-level 128976 non-null object
7 Category 128976 non-null object
8 Size 128976 non-null object
9 Courier Status 128976 non-null object
10 Qty 128976 non-null int64
11 currency 121176 non-null object
12 Amount 121176 non-null float64
13 ship-city 128941 non-null object
14 ship-state 128941 non-null object
15 ship-postal-code 128941 non-null float64
16 ship-country 128941 non-null object
17 B2B 128976 non-null bool
18 fulfilled-by 39263 non-null object
19 New 0 non-null float64
20 PendingS 0 non-null float64
dtypes: bool(1), float64(4), int64(2), object(14)
memory usage: 19.8+ MB

In [7]: 1 #drop unrelated/blank columns


2 df.drop(['New','PendingS'], axis=1, inplace=True)

localhost:8888/notebooks/03_05_new_Python/Amazon Sales Reports.ipynb 4/29


3/23/24, 9:59 AM Amazon Sales Reports - Jupyter Notebook

In [8]: 1 df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 128976 entries, 0 to 128975
Data columns (total 19 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 index 128976 non-null int64
1 Order ID 128976 non-null object
2 Date 128976 non-null object
3 Status 128976 non-null object
4 Fulfilment 128976 non-null object
5 Sales Channel 128976 non-null object
6 ship-service-level 128976 non-null object
7 Category 128976 non-null object
8 Size 128976 non-null object
9 Courier Status 128976 non-null object
10 Qty 128976 non-null int64
11 currency 121176 non-null object
12 Amount 121176 non-null float64
13 ship-city 128941 non-null object
14 ship-state 128941 non-null object
15 ship-postal-code 128941 non-null float64
16 ship-country 128941 non-null object
17 B2B 128976 non-null bool
18 fulfilled-by 39263 non-null object
dtypes: bool(1), float64(2), int64(2), object(14)
memory usage: 17.8+ MB

localhost:8888/notebooks/03_05_new_Python/Amazon Sales Reports.ipynb 5/29


3/23/24, 9:59 AM Amazon Sales Reports - Jupyter Notebook

In [9]: 1 pd.isnull(df)
2 # checking null values

Out[9]: ship- ship-


Order Sales Courier ship- ship- sh
index Date Status Fulfilment service- Category Size Qty currency Amount postal-
ID Channel Status city state coun
level code

0 False False False False False False False False False False False False False False False False Fa

1 False False False False False False False False False False False False False False False False Fa

2 False False False False False False False False False False False False False False False False Fa

3 False False False False False False False False False False False False False False False False Fa

4 False False False False False False False False False False False False False False False False Fa

... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...

128971 False False False False False False False False False False False False False False False False Fa

128972 False False False False False False False False False False False False False False False False Fa

128973 False False False False False False False False False False False False False False False False Fa

128974 False False False False False False False False False False False False False False False False Fa

128975 False False False False False False False False False False False False False False False False Fa

128976 rows × 19 columns

localhost:8888/notebooks/03_05_new_Python/Amazon Sales Reports.ipynb 6/29


3/23/24, 9:59 AM Amazon Sales Reports - Jupyter Notebook

In [10]: 1 pd.isnull(df).sum()
2 # sum will give total values of null values

Out[10]: index 0
Order ID 0
Date 0
Status 0
Fulfilment 0
Sales Channel 0
ship-service-level 0
Category 0
Size 0
Courier Status 0
Qty 0
currency 7800
Amount 7800
ship-city 35
ship-state 35
ship-postal-code 35
ship-country 35
B2B 0
fulfilled-by 89713
dtype: int64

In [11]: 1 df.shape

Out[11]: (128976, 19)

In [12]: 1 #drop null values


2 df.dropna(inplace=True)

In [13]: 1 df.shape

Out[13]: (37514, 19)

localhost:8888/notebooks/03_05_new_Python/Amazon Sales Reports.ipynb 7/29


3/23/24, 9:59 AM Amazon Sales Reports - Jupyter Notebook

In [14]: 1 df.columns

Out[14]: Index(['index', 'Order ID', 'Date', 'Status', 'Fulfilment', 'Sales Channel',


'ship-service-level', 'Category', 'Size', 'Courier Status', 'Qty',
'currency', 'Amount', 'ship-city', 'ship-state', 'ship-postal-code',
'ship-country', 'B2B', 'fulfilled-by'],
dtype='object')

In [15]: 1 #change data type


2 df['ship-postal-code']=df['ship-postal-code'].astype('int')

In [16]: 1 #checking whelther the data type change or not


2 df['ship-postal-code'].dtype

Out[16]: dtype('int32')

In [17]: 1 df['Date']=pd.to_datetime(df['Date'])

localhost:8888/notebooks/03_05_new_Python/Amazon Sales Reports.ipynb 8/29


3/23/24, 9:59 AM Amazon Sales Reports - Jupyter Notebook

In [18]: 1 df.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 37514 entries, 0 to 128892
Data columns (total 19 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 index 37514 non-null int64
1 Order ID 37514 non-null object
2 Date 37514 non-null datetime64[ns]
3 Status 37514 non-null object
4 Fulfilment 37514 non-null object
5 Sales Channel 37514 non-null object
6 ship-service-level 37514 non-null object
7 Category 37514 non-null object
8 Size 37514 non-null object
9 Courier Status 37514 non-null object
10 Qty 37514 non-null int64
11 currency 37514 non-null object
12 Amount 37514 non-null float64
13 ship-city 37514 non-null object
14 ship-state 37514 non-null object
15 ship-postal-code 37514 non-null int32
16 ship-country 37514 non-null object
17 B2B 37514 non-null bool
18 fulfilled-by 37514 non-null object
dtypes: bool(1), datetime64[ns](1), float64(1), int32(1), int64(2), object(13)
memory usage: 5.3+ MB

In [19]: 1 df.columns

Out[19]: Index(['index', 'Order ID', 'Date', 'Status', 'Fulfilment', 'Sales Channel',


'ship-service-level', 'Category', 'Size', 'Courier Status', 'Qty',
'currency', 'Amount', 'ship-city', 'ship-state', 'ship-postal-code',
'ship-country', 'B2B', 'fulfilled-by'],
dtype='object')

localhost:8888/notebooks/03_05_new_Python/Amazon Sales Reports.ipynb 9/29


3/23/24, 9:59 AM Amazon Sales Reports - Jupyter Notebook

In [20]: 1 #rename Columns


2 df.rename(columns={'Qty':'Quantity'})

localhost:8888/notebooks/03_05_new_Python/Amazon Sales Reports.ipynb 10/29


3/23/24, 9:59 AM Amazon Sales Reports - Jupyter Notebook

Out[20]: ship-
Sales Courier
index Order ID Date Status Fulfilment service- Category Size Quantity currency Amount ship-city
Channel Status
level

405-
2022- On the
0 0 8078784- Cancelled Merchant Amazon.in Standard T-shirt S 0 INR 647.62 MUMBAI
04-30 Way
5731545

171- Shipped -
2022-
1 1 9198151- Delivered Merchant Amazon.in Standard Shirt 3XL Shipped 1 INR 406.00 BENGALURU
04-30
1101146 to Buyer

403-
2022- On the
3 3 9615377- Cancelled Merchant Amazon.in Standard Blazzer L 0 INR 753.33 PUDUCHERRY
04-30 Way
8133951

406- Shipped -
2022-
7 7 7807733- Delivered Merchant Amazon.in Standard Shirt S Shipped 1 INR 399.00 HYDERABAD
04-30
3785945 to Buyer

405- Shipped -
2022-
12 12 5513694- Delivered Merchant Amazon.in Standard Shirt XS Shipped 1 INR 399.00 Amravati.
04-30
8146768 to Buyer

... ... ... ... ... ... ... ... ... ... ... ... ... ... ...

405- Shipped -
2022-
128875 128874 4724097- Delivered Merchant Amazon.in Standard T-shirt S Shipped 1 INR 854.00 ALLUR
06-01
1016369 to Buyer

403-
2022- On the
128876 128875 9524128- Cancelled Merchant Amazon.in Standard Blazzer XL 0 INR 734.29 Barabanki
06-01 Way
9243508

405- Shipped -
2022-
128888 128887 6493630- Delivered Merchant Amazon.in Standard Trousers M Shipped 1 INR 518.00 NOIDA
05-31
8542756 to Buyer

407-
2022- On the
128891 128890 0116398- Cancelled Merchant Amazon.in Standard Wallet Free 0 INR 398.10 MADURAI
05-31 Way
1810752

403- Shipped -
2022- UTTAR
128892 128891 0317423- Delivered Merchant Amazon.in Standard Blazzer M Shipped 1 INR 721.00
05-31 BAGDOGRA
9322704 to Buyer

37514 rows × 19 columns

localhost:8888/notebooks/03_05_new_Python/Amazon Sales Reports.ipynb 11/29


3/23/24, 9:59 AM Amazon Sales Reports - Jupyter Notebook

In [21]: 1 #describe() method return description of the data in the Dataframe(i.e count,mean,std,min.etc)
2 df.describe()

Out[21]: index Qty Amount ship-postal-code

count 37514.000000 37514.000000 37514.000000 37514.000000

mean 60953.809858 0.867383 646.553960 463291.552754

std 36844.853039 0.354160 279.952414 194550.425637

min 0.000000 0.000000 0.000000 110001.000000

25% 27235.250000 1.000000 458.000000 370465.000000

50% 63470.500000 1.000000 629.000000 500019.000000

75% 91790.750000 1.000000 771.000000 600042.000000

max 128891.000000 5.000000 5495.000000 989898.000000

In [22]: 1 df.describe(include='object')

Out[22]: ship-
Sales Courier ship- fulfilled-
Order ID Status Fulfilment service- Category Size currency ship-city ship-state
Channel Status country by
level

count 37514 37514 37514 37514 37514 37514 37514 37514 37514 37514 37514 37514 37514

unique 34664 11 1 1 1 8 11 3 1 4698 58 1 1

Shipped
171-
- Easy
top 5057375- Merchant Amazon.in Standard T-shirt M Shipped INR BENGALURU MAHARASHTRA IN
Delivered Ship
2831560
to Buyer

freq 12 28741 37514 37514 37514 14062 6806 31859 37514 2839 6236 37514 37514

localhost:8888/notebooks/03_05_new_Python/Amazon Sales Reports.ipynb 12/29


3/23/24, 9:59 AM Amazon Sales Reports - Jupyter Notebook

In [23]: 1 #use describe() for specific columns


2 df[['Qty','Amount']].describe()

Out[23]: Qty Amount

count 37514.000000 37514.000000

mean 0.867383 646.553960

std 0.354160 279.952414

min 0.000000 0.000000

25% 1.000000 458.000000

50% 1.000000 629.000000

75% 1.000000 771.000000

max 5.000000 5495.000000

Exploratory Data Analysis


In [24]: 1 df.columns

Out[24]: Index(['index', 'Order ID', 'Date', 'Status', 'Fulfilment', 'Sales Channel',


'ship-service-level', 'Category', 'Size', 'Courier Status', 'Qty',
'currency', 'Amount', 'ship-city', 'ship-state', 'ship-postal-code',
'ship-country', 'B2B', 'fulfilled-by'],
dtype='object')

Size

localhost:8888/notebooks/03_05_new_Python/Amazon Sales Reports.ipynb 13/29


3/23/24, 9:59 AM Amazon Sales Reports - Jupyter Notebook

In [25]: 1 ax=sns.countplot(x='Size' , data=df)

localhost:8888/notebooks/03_05_new_Python/Amazon Sales Reports.ipynb 14/29


3/23/24, 9:59 AM Amazon Sales Reports - Jupyter Notebook

In [26]: 1 ax=sns.countplot(x='Size' , data=df)


2 ​
3 for bars in ax.containers:
4 ax.bar_label(bars)

Note: From above Graph you can see that most of the people buys M-size

Graph By:-
The groupby() function in pandas is used to group data based on one or more columns in a DataFrame

localhost:8888/notebooks/03_05_new_Python/Amazon Sales Reports.ipynb 15/29


3/23/24, 9:59 AM Amazon Sales Reports - Jupyter Notebook

In [27]: 1 df.groupby(['Size'], as_index=False)['Qty'].sum().sort_values(by='Qty',ascending=False)

Out[27]: Size Qty

6 M 5905

5 L 5795

8 XL 5481

10 XXL 4465

0 3XL 3972

7 S 3896

9 XS 2191

4 Free 467

3 6XL 170

2 5XL 104

1 4XL 93

localhost:8888/notebooks/03_05_new_Python/Amazon Sales Reports.ipynb 16/29


3/23/24, 9:59 AM Amazon Sales Reports - Jupyter Notebook

In [28]: 1 S_Qty=df.groupby(['Size'], as_index=False)['Qty'].sum().sort_values(by='Qty',ascending=False)


2 ​
3 sns.barplot(x='Size',y='Qty', data=S_Qty)

Out[28]: <Axes: xlabel='Size', ylabel='Qty'>

Note: From above Graph you can see that most of the Qty buys M-Size in the sales

localhost:8888/notebooks/03_05_new_Python/Amazon Sales Reports.ipynb 17/29


3/23/24, 9:59 AM Amazon Sales Reports - Jupyter Notebook

In [29]: 1 sns.countplot(data=df, x='Courier Status', hue='Status')

Out[29]: <Axes: xlabel='Courier Status', ylabel='count'>

localhost:8888/notebooks/03_05_new_Python/Amazon Sales Reports.ipynb 18/29


3/23/24, 9:59 AM Amazon Sales Reports - Jupyter Notebook

In [30]: 1 plt.figure(figsize=(10,5))
2 ​
3 ax=sns.countplot(data=df, x='Courier Status',hue= 'Status')
4 ​
5 plt.show()

Note: From above Graph the majority of the orders are shipped through the courier

localhost:8888/notebooks/03_05_new_Python/Amazon Sales Reports.ipynb 19/29


3/23/24, 9:59 AM Amazon Sales Reports - Jupyter Notebook

In [31]: 1 #histogram
2 df['Size'].hist()

Out[31]: <Axes: >

localhost:8888/notebooks/03_05_new_Python/Amazon Sales Reports.ipynb 20/29


3/23/24, 9:59 AM Amazon Sales Reports - Jupyter Notebook

In [32]: 1 df.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 37514 entries, 0 to 128892
Data columns (total 19 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 index 37514 non-null int64
1 Order ID 37514 non-null object
2 Date 37514 non-null datetime64[ns]
3 Status 37514 non-null object
4 Fulfilment 37514 non-null object
5 Sales Channel 37514 non-null object
6 ship-service-level 37514 non-null object
7 Category 37514 non-null object
8 Size 37514 non-null object
9 Courier Status 37514 non-null object
10 Qty 37514 non-null int64
11 currency 37514 non-null object
12 Amount 37514 non-null float64
13 ship-city 37514 non-null object
14 ship-state 37514 non-null object
15 ship-postal-code 37514 non-null int32
16 ship-country 37514 non-null object
17 B2B 37514 non-null bool
18 fulfilled-by 37514 non-null object
dtypes: bool(1), datetime64[ns](1), float64(1), int32(1), int64(2), object(13)
memory usage: 5.3+ MB

localhost:8888/notebooks/03_05_new_Python/Amazon Sales Reports.ipynb 21/29


3/23/24, 9:59 AM Amazon Sales Reports - Jupyter Notebook

In [33]: 1 df['Category'] = df['Category'].astype(str)


2 column_data = df['Category']
3 plt.figure(figsize=(10,5))
4 plt.hist(column_data, bins=10, edgecolor='Black')
5 plt.xticks(rotation=90)
6 plt.show()

Note: From above Graph you can see that most of the buyers are T-shirt

localhost:8888/notebooks/03_05_new_Python/Amazon Sales Reports.ipynb 22/29


3/23/24, 9:59 AM Amazon Sales Reports - Jupyter Notebook

In [34]: 1 #Checking B2B Data by using pie chart


2 B2B_Check = df['B2B'].value_counts()
3 ​
4 # Plot the Pie chart
5 plt.pie(B2B_Check, labels=B2B_Check.index ,autopct='%1.1f%%')
6 #plt.axis('equals')
7 plt.show()

Note: From above chart we can see that maximum i.e. 99.3% of buyers are retailers and 0.8% are B2B buyers

localhost:8888/notebooks/03_05_new_Python/Amazon Sales Reports.ipynb 23/29


3/23/24, 9:59 AM Amazon Sales Reports - Jupyter Notebook

In [35]: 1 # Prepare data for scatter plot


2 x_data = df['Category']
3 y_data = df['Size']
4 ​
5 # Plot the scatter plot
6 plt.scatter(x_data, y_data)
7 plt.xlabel('Category')
8 plt.ylabel('Size')
9 plt.title('Scatter Plot')
10 plt.show()

localhost:8888/notebooks/03_05_new_Python/Amazon Sales Reports.ipynb 24/29


3/23/24, 9:59 AM Amazon Sales Reports - Jupyter Notebook

In [36]: 1 #plot count of cities by state


2 plt.figure(figsize=(12,6))
3 sns.countplot(data=df,x='ship-state')
4 plt.xlabel('ship-state')
5 plt.ylabel('count')
6 plt.title('Distribution of State')
7 plt.xticks(rotation=90)
8 plt.show()

localhost:8888/notebooks/03_05_new_Python/Amazon Sales Reports.ipynb 25/29


3/23/24, 9:59 AM Amazon Sales Reports - Jupyter Notebook

localhost:8888/notebooks/03_05_new_Python/Amazon Sales Reports.ipynb 26/29


3/23/24, 9:59 AM Amazon Sales Reports - Jupyter Notebook

In [37]: 1 #top_10_States
2 top_10_state = df['ship-state'].value_counts().head(10)
3 # Plot count of cities by State
4 plt.figure(figsize=(12,6))
5 sns.countplot(data=df[df['ship-state'].isin(top_10_state.index)],x='ship-state')
6 plt.xlabel('ship-state')
7 plt.ylabel('count')
8 plt.title('Distribution of State')
9 plt.xticks(rotation=45)
10 plt.show()

localhost:8888/notebooks/03_05_new_Python/Amazon Sales Reports.ipynb 27/29


3/23/24, 9:59 AM Amazon Sales Reports - Jupyter Notebook

Note: From above you can see that most of the buyers are Maharashtra state

Conclusion:-

localhost:8888/notebooks/03_05_new_Python/Amazon Sales Reports.ipynb 28/29


3/23/24, 9:59 AM Amazon Sales Reports - Jupyter Notebook

The Data Analysis reveals that the business has a significant customer base in Maharashtra state,mainly serves retailers, fulfills orders through
Amazon, experience high demand for T-shirts, and sees M-Size as the preferred choice among buyers.

Thank you

localhost:8888/notebooks/03_05_new_Python/Amazon Sales Reports.ipynb 29/29

You might also like