Amazon Sales Reports - Jupyter Notebook
Amazon Sales Reports - Jupyter Notebook
In [3]: 1 df.shape
In [4]: 1 df.head(10)
Out[4]: ship-
Sales Courier
index Order ID Date Status Fulfilment service- Category Size ... currency Amount ship-city ship-
Channel Status
level
405- 04-
On the
0 0 8078784- 30- Cancelled Merchant Amazon.in Standard T-shirt S ... INR 647.62 MUMBAI MAHARASH
Way
5731545 22
404- 04-
2 2 0687676- 30- Shipped Amazon Amazon.in Expedited Shirt XL Shipped ... INR 329.00 NAVI MUMBAI MAHARASH
7273146 22
403- 04-
On the
3 3 9615377- 30- Cancelled Merchant Amazon.in Standard Blazzer L ... INR 753.33 PUDUCHERRY PUDUCHE
Way
8133951 22
407- 04-
4 4 1069790- 30- Shipped Amazon Amazon.in Expedited Trousers 3XL Shipped ... INR 574.00 CHENNAI TAMIL N
7240320 22
404- 04-
UT
5 5 1490984- 30- Shipped Amazon Amazon.in Expedited T-shirt XL Shipped ... INR 824.00 GHAZIABAD
PRAD
4578765 22
408- 04-
6 6 5748499- 30- Shipped Amazon Amazon.in Expedited T-shirt L Shipped ... INR 653.00 CHANDIGARH CHANDIG
6859555 22
407- 04-
8 8 5443024- 30- Cancelled Amazon Amazon.in Expedited T-shirt 3XL Cancelled ... NaN NaN HYDERABAD TELANG
5233168 22
402- 04-
9 9 4393761- 30- Shipped Amazon Amazon.in Expedited Shirt XXL Shipped ... INR 363.00 Chennai TAMIL N
0311520 22
10 rows × 21 columns
In [5]: 1 df.tail()
Out[5]: ship-
Sales Courier
index Order ID Date Status Fulfilment service- Category Size ... currency Amount ship-city sh
Channel Status
level
406- 05-
128971 128970 6001380- 31- Shipped Amazon Amazon.in Expedited Shirt XL Shipped ... INR 517.0 HYDERABAD TELA
7673107 22
402- 05-
128972 128971 9551604- 31- Shipped Amazon Amazon.in Expedited T-shirt M Shipped ... INR 999.0 GURUGRAM HA
7544318 22
407- 05-
128973 128972 9547469- 31- Shipped Amazon Amazon.in Expedited Blazzer XXL Shipped ... INR 690.0 HYDERABAD TELA
3152358 22
402- 05-
128974 128973 6184140- 31- Shipped Amazon Amazon.in Expedited T-shirt XS Shipped ... INR 1199.0 Halol
0545956 22
408- 05-
128975 128974 7436540- 31- Shipped Amazon Amazon.in Expedited T-shirt S Shipped ... INR 696.0 Raipur CHHATT
8728312 22
5 rows × 21 columns
In [6]: 1 df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 128976 entries, 0 to 128975
Data columns (total 21 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 index 128976 non-null int64
1 Order ID 128976 non-null object
2 Date 128976 non-null object
3 Status 128976 non-null object
4 Fulfilment 128976 non-null object
5 Sales Channel 128976 non-null object
6 ship-service-level 128976 non-null object
7 Category 128976 non-null object
8 Size 128976 non-null object
9 Courier Status 128976 non-null object
10 Qty 128976 non-null int64
11 currency 121176 non-null object
12 Amount 121176 non-null float64
13 ship-city 128941 non-null object
14 ship-state 128941 non-null object
15 ship-postal-code 128941 non-null float64
16 ship-country 128941 non-null object
17 B2B 128976 non-null bool
18 fulfilled-by 39263 non-null object
19 New 0 non-null float64
20 PendingS 0 non-null float64
dtypes: bool(1), float64(4), int64(2), object(14)
memory usage: 19.8+ MB
In [8]: 1 df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 128976 entries, 0 to 128975
Data columns (total 19 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 index 128976 non-null int64
1 Order ID 128976 non-null object
2 Date 128976 non-null object
3 Status 128976 non-null object
4 Fulfilment 128976 non-null object
5 Sales Channel 128976 non-null object
6 ship-service-level 128976 non-null object
7 Category 128976 non-null object
8 Size 128976 non-null object
9 Courier Status 128976 non-null object
10 Qty 128976 non-null int64
11 currency 121176 non-null object
12 Amount 121176 non-null float64
13 ship-city 128941 non-null object
14 ship-state 128941 non-null object
15 ship-postal-code 128941 non-null float64
16 ship-country 128941 non-null object
17 B2B 128976 non-null bool
18 fulfilled-by 39263 non-null object
dtypes: bool(1), float64(2), int64(2), object(14)
memory usage: 17.8+ MB
In [9]: 1 pd.isnull(df)
2 # checking null values
0 False False False False False False False False False False False False False False False False Fa
1 False False False False False False False False False False False False False False False False Fa
2 False False False False False False False False False False False False False False False False Fa
3 False False False False False False False False False False False False False False False False Fa
4 False False False False False False False False False False False False False False False False Fa
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
128971 False False False False False False False False False False False False False False False False Fa
128972 False False False False False False False False False False False False False False False False Fa
128973 False False False False False False False False False False False False False False False False Fa
128974 False False False False False False False False False False False False False False False False Fa
128975 False False False False False False False False False False False False False False False False Fa
In [10]: 1 pd.isnull(df).sum()
2 # sum will give total values of null values
Out[10]: index 0
Order ID 0
Date 0
Status 0
Fulfilment 0
Sales Channel 0
ship-service-level 0
Category 0
Size 0
Courier Status 0
Qty 0
currency 7800
Amount 7800
ship-city 35
ship-state 35
ship-postal-code 35
ship-country 35
B2B 0
fulfilled-by 89713
dtype: int64
In [11]: 1 df.shape
In [13]: 1 df.shape
In [14]: 1 df.columns
Out[16]: dtype('int32')
In [17]: 1 df['Date']=pd.to_datetime(df['Date'])
In [18]: 1 df.info()
<class 'pandas.core.frame.DataFrame'>
Int64Index: 37514 entries, 0 to 128892
Data columns (total 19 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 index 37514 non-null int64
1 Order ID 37514 non-null object
2 Date 37514 non-null datetime64[ns]
3 Status 37514 non-null object
4 Fulfilment 37514 non-null object
5 Sales Channel 37514 non-null object
6 ship-service-level 37514 non-null object
7 Category 37514 non-null object
8 Size 37514 non-null object
9 Courier Status 37514 non-null object
10 Qty 37514 non-null int64
11 currency 37514 non-null object
12 Amount 37514 non-null float64
13 ship-city 37514 non-null object
14 ship-state 37514 non-null object
15 ship-postal-code 37514 non-null int32
16 ship-country 37514 non-null object
17 B2B 37514 non-null bool
18 fulfilled-by 37514 non-null object
dtypes: bool(1), datetime64[ns](1), float64(1), int32(1), int64(2), object(13)
memory usage: 5.3+ MB
In [19]: 1 df.columns
Out[20]: ship-
Sales Courier
index Order ID Date Status Fulfilment service- Category Size Quantity currency Amount ship-city
Channel Status
level
405-
2022- On the
0 0 8078784- Cancelled Merchant Amazon.in Standard T-shirt S 0 INR 647.62 MUMBAI
04-30 Way
5731545
171- Shipped -
2022-
1 1 9198151- Delivered Merchant Amazon.in Standard Shirt 3XL Shipped 1 INR 406.00 BENGALURU
04-30
1101146 to Buyer
403-
2022- On the
3 3 9615377- Cancelled Merchant Amazon.in Standard Blazzer L 0 INR 753.33 PUDUCHERRY
04-30 Way
8133951
406- Shipped -
2022-
7 7 7807733- Delivered Merchant Amazon.in Standard Shirt S Shipped 1 INR 399.00 HYDERABAD
04-30
3785945 to Buyer
405- Shipped -
2022-
12 12 5513694- Delivered Merchant Amazon.in Standard Shirt XS Shipped 1 INR 399.00 Amravati.
04-30
8146768 to Buyer
... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
405- Shipped -
2022-
128875 128874 4724097- Delivered Merchant Amazon.in Standard T-shirt S Shipped 1 INR 854.00 ALLUR
06-01
1016369 to Buyer
403-
2022- On the
128876 128875 9524128- Cancelled Merchant Amazon.in Standard Blazzer XL 0 INR 734.29 Barabanki
06-01 Way
9243508
405- Shipped -
2022-
128888 128887 6493630- Delivered Merchant Amazon.in Standard Trousers M Shipped 1 INR 518.00 NOIDA
05-31
8542756 to Buyer
407-
2022- On the
128891 128890 0116398- Cancelled Merchant Amazon.in Standard Wallet Free 0 INR 398.10 MADURAI
05-31 Way
1810752
403- Shipped -
2022- UTTAR
128892 128891 0317423- Delivered Merchant Amazon.in Standard Blazzer M Shipped 1 INR 721.00
05-31 BAGDOGRA
9322704 to Buyer
In [21]: 1 #describe() method return description of the data in the Dataframe(i.e count,mean,std,min.etc)
2 df.describe()
In [22]: 1 df.describe(include='object')
Out[22]: ship-
Sales Courier ship- fulfilled-
Order ID Status Fulfilment service- Category Size currency ship-city ship-state
Channel Status country by
level
count 37514 37514 37514 37514 37514 37514 37514 37514 37514 37514 37514 37514 37514
Shipped
171-
- Easy
top 5057375- Merchant Amazon.in Standard T-shirt M Shipped INR BENGALURU MAHARASHTRA IN
Delivered Ship
2831560
to Buyer
freq 12 28741 37514 37514 37514 14062 6806 31859 37514 2839 6236 37514 37514
Size
Note: From above Graph you can see that most of the people buys M-size
Graph By:-
The groupby() function in pandas is used to group data based on one or more columns in a DataFrame
6 M 5905
5 L 5795
8 XL 5481
10 XXL 4465
0 3XL 3972
7 S 3896
9 XS 2191
4 Free 467
3 6XL 170
2 5XL 104
1 4XL 93
Note: From above Graph you can see that most of the Qty buys M-Size in the sales
In [30]: 1 plt.figure(figsize=(10,5))
2
3 ax=sns.countplot(data=df, x='Courier Status',hue= 'Status')
4
5 plt.show()
Note: From above Graph the majority of the orders are shipped through the courier
In [31]: 1 #histogram
2 df['Size'].hist()
In [32]: 1 df.info()
<class 'pandas.core.frame.DataFrame'>
Int64Index: 37514 entries, 0 to 128892
Data columns (total 19 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 index 37514 non-null int64
1 Order ID 37514 non-null object
2 Date 37514 non-null datetime64[ns]
3 Status 37514 non-null object
4 Fulfilment 37514 non-null object
5 Sales Channel 37514 non-null object
6 ship-service-level 37514 non-null object
7 Category 37514 non-null object
8 Size 37514 non-null object
9 Courier Status 37514 non-null object
10 Qty 37514 non-null int64
11 currency 37514 non-null object
12 Amount 37514 non-null float64
13 ship-city 37514 non-null object
14 ship-state 37514 non-null object
15 ship-postal-code 37514 non-null int32
16 ship-country 37514 non-null object
17 B2B 37514 non-null bool
18 fulfilled-by 37514 non-null object
dtypes: bool(1), datetime64[ns](1), float64(1), int32(1), int64(2), object(13)
memory usage: 5.3+ MB
Note: From above Graph you can see that most of the buyers are T-shirt
Note: From above chart we can see that maximum i.e. 99.3% of buyers are retailers and 0.8% are B2B buyers
In [37]: 1 #top_10_States
2 top_10_state = df['ship-state'].value_counts().head(10)
3 # Plot count of cities by State
4 plt.figure(figsize=(12,6))
5 sns.countplot(data=df[df['ship-state'].isin(top_10_state.index)],x='ship-state')
6 plt.xlabel('ship-state')
7 plt.ylabel('count')
8 plt.title('Distribution of State')
9 plt.xticks(rotation=45)
10 plt.show()
Note: From above you can see that most of the buyers are Maharashtra state
Conclusion:-
The Data Analysis reveals that the business has a significant customer base in Maharashtra state,mainly serves retailers, fulfills orders through
Amazon, experience high demand for T-shirts, and sees M-Size as the preferred choice among buyers.
Thank you