Extracted Notebook Content
```python
#Installing packages
!pip install squarify
!pip install statsmodels
!pip install seaborn
!pip install xgboost
```
## Code:
```python
#importing libraries
import numpy as np
import pandas as pd
import os
from statsmodels import api as sm
import pylab as py
import matplotlib.pyplot as plt
import matplotlib.dates as dates
from datetime import datetime
import matplotlib.ticker as ticker
import matplotlib.cm as cm
import matplotlib as mpl
from matplotlib.gridspec import GridSpec
import seaborn as sns
import squarify
from scipy.stats import kstest, norm
from sklearn.preprocessing import LabelEncoder
from sklearn.preprocessing import MinMaxScaler
from sklearn.model_selection import train_test_split
from xgboost import plot_importance
from sklearn.utils import resample
from sklearn import metrics
from scipy.stats import chi2_contingency
```
## Markdown:
## **Dataset**
## Code:
```python
def reduce_mem_usage(df, verbose=True):
    # Downcast each numeric column to the smallest dtype that fits its value range
    numerics = ['int16', 'int32', 'int64', 'float16', 'float32', 'float64']
    start_mem = df.memory_usage().sum() / 1024**2
    for col in df.columns:
        col_type = df[col].dtype
        if col_type in numerics:
            c_min = df[col].min()
            c_max = df[col].max()
            if str(col_type)[:3] == 'int':
                if c_min > np.iinfo(np.int8).min and c_max < np.iinfo(np.int8).max:
                    df[col] = df[col].astype(np.int8)
                elif c_min > np.iinfo(np.int16).min and c_max < np.iinfo(np.int16).max:
                    df[col] = df[col].astype(np.int16)
                elif c_min > np.iinfo(np.int32).min and c_max < np.iinfo(np.int32).max:
                    df[col] = df[col].astype(np.int32)
                elif c_min > np.iinfo(np.int64).min and c_max < np.iinfo(np.int64).max:
                    df[col] = df[col].astype(np.int64)
            else:
                if c_min > np.finfo(np.float16).min and c_max < np.finfo(np.float16).max:
                    df[col] = df[col].astype(np.float16)
                elif c_min > np.finfo(np.float32).min and c_max < np.finfo(np.float32).max:
                    df[col] = df[col].astype(np.float32)
                else:
                    df[col] = df[col].astype(np.float64)
    end_mem = df.memory_usage().sum() / 1024**2
    if verbose:
        print('Memory usage reduced from {:.2f} MB to {:.2f} MB'.format(start_mem, end_mem))
    return df
```
## Code:
```python
df = pd.read_csv('2019-Nov.csv')
```
## Code:
```python
df=reduce_mem_usage(df)
```
## Code:
```python
df.head()
```
## Code:
```python
df.info()
```
## Markdown:
# Rows and columns
The df.info() output above summarizes the number of rows and columns in the dataset.
## Code:
```python
#no of rows with null values
print("category_code ",df['category_code'].isnull().sum())
print("brand ",df['brand'].isnull().sum())
print("Both ",(df['category_code'].isnull() & df['brand'].isnull()).sum())
```
## Markdown:
Since we have ample data, we drop the rows with null values; the dataset is reduced to about 40 million (4 crore) rows.
## Code:
```python
df = df.dropna()
```
## Code:
```python
df.shape
```
## Markdown:
# Number of visitors by date
To analyze the number of visitors by date, we group the dataset by the event_time and user_id columns. The number of unique visitors on each date is extracted and shown in the graph below.
## Code:
```python
# Number of visitors by date
data = df.loc[:, ['event_time', 'user_id']]
# Extract only the date part of the timestamp
data['event_time'] = data['event_time'].apply(lambda s: str(s)[0:10])
visitor_by_date = data.drop_duplicates().groupby(['event_time'])['user_id'].agg(['count']).sort_values(by=['event_time'], ascending=True)
x = pd.Series(visitor_by_date.index.values).apply(lambda s: datetime.strptime(s, '%Y-%m-%d').date())
y = visitor_by_date['count']
plt.rcParams['figure.figsize'] = (20, 8)
plt.plot(x, y)
plt.show()
```
## Markdown:
# Most bought brand
## Code:
```python
print(df['brand'].value_counts())
print(df['event_type'].value_counts())
```
## Code:
```python
title_type = df.groupby('brand').agg('count')
print(title_type)
type_labels = title_type.user_id.sort_values().index
type_counts = title_type.user_id.sort_values()
plt.figure(1, figsize=(20, 10))
the_grid = GridSpec(2, 2)
cmap = plt.get_cmap('Spectral')
colors = [cmap(i) for i in np.linspace(0, 1, 8)]
plt.subplot(the_grid[0, 1], aspect=1, title='Brand titles')
type_show_ids = plt.pie(type_counts, labels=type_labels, autopct='%1.1f%%', shadow=True, colors=colors)
plt.show()
```
## Markdown:
The above *pie chart* shows the popularity of brands in the market, with *Samsung* being the top brand.
## Markdown:
# Popular product categories
A squarify (treemap) plot is used to visualize which product categories have drawn the most demand from customers. Most items carry a two-level category code separated by a period: the first word is the item's main category and the second is its subcategory.
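For example, splitting one such code in Python (an illustrative sketch; the category value shown is a plausible sample, not taken from the output below):
## Code:
```python
# Hypothetical category_code value used purely for illustration
code = "electronics.smartphone"
main_category, sub_category = code.split(".")[:2]
print(main_category)  # electronics
print(sub_category)   # smartphone
```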
## Code:
```python
top_category_n = 30
top_category = df.loc[:, 'category_code'].value_counts()[:top_category_n].sort_values(ascending=False)
squarify.plot(sizes=top_category, label=top_category.index.array,
              color=["red", "cyan", "green", "orange", "blue", "grey"], alpha=.7)
plt.axis('off')
plt.show()
```
## Markdown:
"Smartphones" which comes under electronics goods are more popular.
A huge fraction of items bought are electronics which concludes there have been
major discounts and price deals available on ecommerce platform.
## Code:
```python
labels = ['view', 'cart', 'purchase']
size = df['event_type'].value_counts()
colors = ['yellowgreen', 'lightskyblue', 'lightcoral']
explode = [0, 0.1, 0.1]
plt.rcParams['figure.figsize'] = (8, 8)
plt.pie(size, colors=colors, explode=explode, labels=labels, shadow=True, autopct='%.2f%%')
plt.title('Event_Type', fontsize = 20)
plt.axis('off')
plt.legend()
plt.show()
```
## Markdown:
# Conversion Rates
We have three types of events: view, add to cart, and purchase. Not every user who views a product adds it to the cart and purchases it; most users only look at a product and its price. Conversion rates give us an idea of how many users actually purchased a product as opposed to how many times products were viewed or added to the cart, and how many products were purchased as opposed to the number added to the cart. For example, the view-to-purchase rate is (purchase events / view events) × 100. We compute these rates below.
## Code:
```python
# Index value_counts by label rather than position, which is more robust
event_counts = df['event_type'].value_counts()
view_count = event_counts['view']
cart_count = event_counts['cart']
purchase_count = event_counts['purchase']
print("Rate of conversion between view and purchase events: " + str((purchase_count / view_count) * 100) + '%')
print("Rate of conversion between view and add-to-cart events: " + str((cart_count / view_count) * 100) + '%')
print("Rate of conversion between add-to-cart and purchase events: " + str((purchase_count / cart_count) * 100) + '%')
```
## Markdown:
The rate of conversion from view to purchase is 1.67%, and the rate of conversion to buying an item once it is added to the cart is 31.16%. There are some cases where no cart event is recorded before a purchase, which indicates that some customers buy a product directly without adding it to the cart.
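One way to spot such direct purchases (an illustrative sketch, not part of the original analysis, assuming user_session groups the events of a single visit):
## Code:
```python
# For each session, collect the set of event types and flag sessions that
# contain a purchase without any cart event.
session_events = df.groupby('user_session')['event_type'].agg(set)
direct_purchases = session_events.apply(lambda s: 'purchase' in s and 'cart' not in s)
print("Sessions with a purchase but no cart event:", direct_purchases.sum())
```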
## Markdown:
# Brandwise sales of all event types
## Code:
```python
#Brandwise sales of all event types
df['brand'].value_counts().head(50).plot.bar(figsize = (18,7))
plt.title('Top brand',fontsize = 20)
plt.xlabel('Names of brand')
plt.ylabel('Count')
plt.show()
```
## Markdown:
From the above plot we infer that "Samsung" is the top brand when all event types (view, cart, and purchase) are considered.
## Markdown:
Next we consider only purchase events, which tells us which brand leads the market in actual sales.
## Code:
```python
d = df.loc[df['event_type'].isin(['purchase'])].drop_duplicates()
print(d['brand'].value_counts())
d['brand'].value_counts().head(70).plot.bar(figsize =(18,7))
plt.xlabel('Names of brand')
plt.ylabel('Count')
plt.show()
```
## Markdown:
* As seen in the graph, Samsung is again the market leader, closely followed by Apple.
* A number of brands with only one product sale each, including Cameo, Imetec, and Zapco, occupy the last positions.
## Code:
```python
brand_counts = df['brand'].value_counts()
top_player = brand_counts.iloc[0]
second_player = brand_counts.iloc[1]
last_player = brand_counts.iloc[-1]
# Subtract 1 before scaling so the figure matches the "% more sales" claim
print("Top brand saw " + str(((top_player / second_player) - 1) * 100) + "% more sales than the second player in the market")
print("Top brand saw " + str(((top_player / last_player) - 1) * 100) + "% more sales than the bottom player in the market")
```
## Markdown:
# Purchase path
The standard idea is that most people first view an item, compare it with other items, and add it to the cart before buying a specific item. In practice, not many people follow this path.
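As a rough check (an illustrative sketch, not part of the original notebook), one could count how many purchasing sessions actually contain all three event types:
## Code:
```python
# Among sessions containing a purchase, measure the share that also contain
# both a view and a cart event (the full view -> cart -> purchase path).
events_per_session = df.groupby('user_session')['event_type'].agg(set)
purchase_sessions = events_per_session[events_per_session.apply(lambda s: 'purchase' in s)]
full_path_share = purchase_sessions.apply(lambda s: {'view', 'cart', 'purchase'} <= s).mean()
print("Share of purchase sessions following the full path:", full_path_share)
```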
## Code:
```python
df.loc[df.user_session == "ef3daa59-4936-43e5-a530-32902f64b2f4"].sort_values(by="event_time")
```
## Markdown:
# User's journey
The code below shows a user who purchased an Apple product and afterwards views other products manufactured by the same company, Apple.
## Code:
```python
user_ID = 518267348
df.loc[df['user_id'] == user_ID]
```
## Markdown:
The user below views an Android phone, purchases it, and then goes on to view other Apple products (a clock and a phone), eventually buying the Apple clock. The inference is that Apple customers show brand loyalty: Apple customers tend to view only other Apple products, whereas Android customers also view products from other companies.
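One way to probe this brand-loyalty claim (an illustrative sketch, not from the original notebook; 'apple' and 'samsung' are example brand values) is to compare, for buyers of a given brand, the share of their view events that belong to that same brand:
## Code:
```python
def brand_view_share(df, brand):
    # Users who purchased the given brand
    buyers = df.loc[(df['event_type'] == 'purchase') & (df['brand'] == brand), 'user_id'].unique()
    # Of those users' view events, the fraction that are on the same brand
    views = df.loc[(df['event_type'] == 'view') & (df['user_id'].isin(buyers))]
    return (views['brand'] == brand).mean()

print("apple:", brand_view_share(df, 'apple'))
print("samsung:", brand_view_share(df, 'samsung'))
```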
## Code:
```python
user_ID = 513351129
df.loc[df['user_id'] == user_ID]
```
## Markdown:
# Preparing data
We want to predict whether a product added to the cart is actually purchased by the customer, based on factors such as its category, the weekday of the event, and the user's activity in that session:
* category_code_level1 - category
* category_code_level2 - subcategory
* event_weekday - weekday of the event
* activity_count - number of activities in that session
The training dataset contains every non-duplicated cart transaction with the above-mentioned features. We will use these features, together with the original price and brand, to predict whether the customer will eventually purchase the item in the cart.
## Markdown:
# List of users who have bought or added products to the cart
## Code:
```python
# List of users who have bought or added products to the cart
cart_purchase_users = df.loc[df["event_type"].isin(["cart", "purchase"])].drop_duplicates(subset=['user_id'])
cart_purchase_users.dropna(how='any', inplace=True)
print(cart_purchase_users)
```
## Markdown:
# All activities of the above users, including view events
## Code:
```python
cart_purchase_users_all_activity = df.loc[df['user_id'].isin(cart_purchase_users['user_id'])]
print(cart_purchase_users_all_activity)
```
## Markdown:
# Counting the number of activities in each session
## Code:
```python
activity_in_session = cart_purchase_users_all_activity.groupby(['user_session'])['event_type'].count().reset_index()
activity_in_session = activity_in_session.rename(columns={"event_type": "activity_count"})
print(activity_in_session)
```
## Markdown:
Extract the event date from the event_time column to find on which date each activity occurs.
## Code:
```python
def convert_time_to_date(utc_timestamp):
    # Keep only the date part (YYYY-MM-DD) of the UTC timestamp string
    utc_date = datetime.strptime(utc_timestamp[0:10], '%Y-%m-%d').date()
    return utc_date
```
## Code:
```python
df['event_date'] = df['event_time'].apply(lambda s:convert_time_to_date(s))
```
## Markdown:
The category and subcategory are split apart by string handling on the period separator.
## Code:
```python
df_targets = df.loc[df["event_type"].isin(["cart", "purchase"])].drop_duplicates(subset=['event_type', 'product_id', 'price', 'user_id', 'user_session'])
df_targets["is_purchased"] = np.where(df_targets["event_type"] == "purchase", 1, 0)
df_targets["is_purchased"] = df_targets.groupby(["user_session", "product_id"])["is_purchased"].transform("max")
df_targets = df_targets.loc[df_targets["event_type"] == 'cart'].drop_duplicates(["user_session", "product_id", "is_purchased"])
df_targets['event_weekday'] = df_targets['event_date'].apply(lambda s: s.weekday())
df_targets.dropna(how='any', inplace=True)
# category_code levels are separated by a period (e.g. electronics.smartphone)
df_targets["category_code_level1"] = df_targets["category_code"].str.split(".", expand=True)[0].astype('category')
df_targets["category_code_level2"] = df_targets["category_code"].str.split(".", expand=True)[1].astype('category')
```
## Code:
```python
df_targets = df_targets.merge(activity_in_session,on = 'user_session',how ='left')
df_targets['activity_count'] = df_targets['activity_count'].fillna(0)
df_targets.head()
```
## Code:
```python
df_targets.info()
```
## Code:
```python
# Saving a copy of the preprocessed data (index=False avoids writing an extra index column)
df_targets.to_csv('training_data.csv', index=False)
```
## Code:
```python
df_targets = pd.read_csv('training_data.csv')
```
## Code:
```python
df_targets.head()
```
## Markdown:
# Resampling the data to have an equal number of purchased and not-purchased items
The number of rows where the item was purchased is around 500,000 (5 lakh), while the number of not-purchased rows is around 800,000 (8 lakh).
## Code:
```python
is_purchase_set = df_targets[df_targets['is_purchased'] == 1]
is_purchase_set.shape[0]
```
## Code:
```python
not_purchase_set = df_targets[df_targets['is_purchased'] == 0]
not_purchase_set.shape[0]
```
## Code:
```python
n_samples = 500000
is_purchase_downsampled = resample(is_purchase_set, replace=False, n_samples=n_samples, random_state=27)
not_purchase_set_downsampled = resample(not_purchase_set, replace=False, n_samples=n_samples, random_state=27)
```
## Code:
```python
downsampled = pd.concat([is_purchase_downsampled,not_purchase_set_downsampled])
downsampled['is_purchased'].value_counts()
```
## Code:
```python
features = downsampled[['brand', 'price', 'event_weekday', 'category_code_level1', 'category_code_level2', 'activity_count']]
```
## Markdown:
# Encoding categorical attributes
## Code:
```python
features.loc[:, 'brand'] = LabelEncoder().fit_transform(downsampled.loc[:, 'brand'].copy())
features.loc[:, 'event_weekday'] = LabelEncoder().fit_transform(downsampled.loc[:, 'event_weekday'].copy())
features.loc[:, 'category_code_level1'] = LabelEncoder().fit_transform(downsampled.loc[:, 'category_code_level1'].copy())
features.loc[:, 'category_code_level2'] = LabelEncoder().fit_transform(downsampled.loc[:, 'category_code_level2'].copy())
is_purchased = LabelEncoder().fit_transform(downsampled['is_purchased'])
features.head()
```
## Code:
```python
features.info()
```
## Markdown:
# Hypothesis testing - attribute dependence
## Code:
```python
df.head()
```
## Markdown:
# Chi-square test - association between two attributes
Weekday vs price
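As a reminder (the standard definition, not from the original notebook), for a contingency table with observed counts $O_{ij}$ and expected counts $E_{ij}$ under independence, the test statistic is

$$\chi^2 = \sum_{i,j} \frac{(O_{ij} - E_{ij})^2}{E_{ij}},$$

and a small p-value leads us to reject the null hypothesis H0 that the two attributes are independent.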
## Code:
```python
print()
print("Chi square test")
#Event weekday and price
table = pd.crosstab(features['event_weekday'],features['price'],margins = False)
stat,p,dof,expected = chi2_contigency(table)
alpha = 0.05
print("For weekday of the event and price")
print("p value is " +str(p))
if p >= alpha:
print("No significant association between these attributes -H0 holds true")
else:
print("significant association between these attributes -H0 is rejected")
```
## Markdown:
user_id vs category_id
## Code:
```python
d1 = df[:100000]
table1 = pd.crosstab(d1['category_id'], d1['user_id'], margins=False)
stat, p, dof, expected = chi2_contingency(table1)
alpha = 0.05
print("For category id and user id")
print("p value is " + str(p))
if p >= alpha:
    print("No significant association between these attributes - H0 holds true")
else:
    print("Significant association between these attributes - H0 is rejected")
```
## Markdown:
# Time series analysis
## Code:
```python
# Inspect the data before plotting; the event_date column derived earlier provides the date of each event
df.head()
```
## Code:
```python
# Plot the number of events per day, using the event_date column derived earlier
timeseries_df = df.groupby('event_date')['event_type'].count()
plt.plot(timeseries_df.index, timeseries_df.values)
plt.xlabel('Date')
plt.ylabel('Number of events')
plt.show()
```
## Markdown:
# Covariance matrix
## Code:
```python
matrix = downsampled[['brand', 'price', 'event_weekday', 'category_code_level1', 'category_code_level2', 'activity_count', 'is_purchased']]
matrix.loc[:, 'brand'] = LabelEncoder().fit_transform(downsampled.loc[:, 'brand'].copy())
matrix.loc[:, 'event_weekday'] = LabelEncoder().fit_transform(downsampled.loc[:, 'event_weekday'].copy())
matrix.loc[:, 'category_code_level1'] = LabelEncoder().fit_transform(downsampled.loc[:, 'category_code_level1'].copy())
matrix.loc[:, 'category_code_level2'] = LabelEncoder().fit_transform(downsampled.loc[:, 'category_code_level2'].copy())
matrix.head()
```
## Code:
```python
cov_matrix = matrix.cov()
sns.heatmap(cov_matrix,annot = True)
plt.show()
```
## Markdown:
# Correlation matrix
## Code:
```python
corr_matrix = matrix.corr()
sns.heatmap(corr_matrix,annot = True)
plt.show()
```
## Markdown:
# ML models
## Code:
```python
# The original train/test split code was lost in extraction; this is a minimal
# reconstruction (test_size and random_state are assumed values).
X_train, X_test, y_train, y_test = train_test_split(
    features, is_purchased, test_size=0.3, random_state=42)
```
## Markdown:
# Decision tree classification
## Code:
```python
# The original classifier code was lost in extraction; a standard
# DecisionTreeClassifier sketch is assumed here.
from sklearn.tree import DecisionTreeClassifier

model = DecisionTreeClassifier()
model.fit(X_train, y_train)
y_pred = model.predict(X_test)
```
## Code:
```python
print("Accuracy",metrics.accuracy_score(y_test,y_pred))
print("Precision",metrics.precision_score(y_test,y_pred))
print("Recall",metrics.recall_score(y_test,y_pred))
print("fbeta",metrics.fbeta_score(y_test,y_pred,average = 'weighted',beta=0.5))
```
## Markdown:
# XGBoost classification
## Code:
```python
# The original XGBoost training code was lost in extraction; a standard
# XGBClassifier sketch is assumed here.
from xgboost import XGBClassifier

model = XGBClassifier()
model.fit(X_train, y_train)
y_pred = model.predict(X_test)
```
## Code:
```python
print("Accuracy",metrics.accuracy_score(y_test,y_pred))
print("Precision",metrics.precision_score(y_test,y_pred))
print("Recall",metrics.recall_score(y_test,y_pred))
print("fbeta",metrics.fbeta_score(y_test,y_pred,average = 'weighted',beta=0.5))
```
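## Markdown:
Since plot_importance is imported at the top of the notebook, the fitted model's feature importances can also be visualized. A minimal sketch, assuming `model` is the fitted XGBClassifier from the block above:
## Code:
```python
# Plot feature importances of the fitted XGBoost model
plot_importance(model)
plt.show()
```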
## Markdown:
# Logistic regression
## Code:
```python
# The original logistic regression code was lost in extraction; a standard
# scikit-learn sketch is assumed here (scaling with the imported MinMaxScaler
# is an assumption, not confirmed by the source).
from sklearn.linear_model import LogisticRegression

scaler = MinMaxScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)
model = LogisticRegression(max_iter=1000)
model.fit(X_train_scaled, y_train)
y_pred = model.predict(X_test_scaled)
```
## Code:
```python
print("Accuracy",metrics.accuracy_score(y_test,y_pred))
print("Precision",metrics.precision_score(y_test,y_pred))
print("Recall",metrics.recall_score(y_test,y_pred))
print("fbeta",metrics.fbeta_score(y_test,y_pred,average = 'weighted',beta=0.5))
```