100% found this document useful (2 votes)

918 views44 pages

Data Mini Proj

The document describes analyzing customer data from a bank using clustering techniques. It first performs exploratory data analysis on the data, finding it has 7 variables and 210 records with no missing values. Univariate analysis shows the variables are numeric and right skewed. Bivariate analysis finds strong positive correlations between variables like spending and advance payments. The document applies hierarchical and k-means clustering with 3 clusters identified as optimal. Customer profiles are described for high, medium, and low spending groups with promotional strategy recommendations.

Uploaded by

Zohaib Imam

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

100% found this document useful (2 votes)

918 views44 pages

Data Mini Proj

Uploaded by

Zohaib Imam

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

You are on page 1/ 44

BUSINESS REPORT

On Data Mining (Clustering, CART-RF-ANN)

By - Zohaib Imam.

Problem 1: Clustering
A leading bank wants to develop a customer segmentation to give promotional offers to its
customers. They collected a sample that summarizes the activities of users during the past few
months. You are given the task to identify the segments based on credit card usage.

1.1 Read the data, do the necessary initial steps, and exploratory data analysis (Univariate,
Bi-variate, and multivariate analysis).

Initial Observation (EDA):

• There are 7 variables and 210 records.
• No missing record based on initial analysis.
• All the variables numeric type.
• There are no missing values.
• There are no duplicate rows.

Univariate Analysis:
Observation:
• Based on summary descriptive, the data looks good.
• We see for most of the variable, mean/medium are nearly equal.
• Include a 90% to see variations and it looks distributely evenly.
• Std Deviation is high for spending variable.

• Minimum spending is 10.59

• Maximum spending is 21.18
• Mean value is 14.847523809523818
• Median value: 14.355
• Standard deviation is 2.90969943068736
• Spending seems to be right skewed
• There are no outliers present in spending
• Minimum advance_payments is 12.41
• Maximum advance_payments is 17.25
• Mean value is 14.559285714285727
• Median value is 14.32
• Standard deviation is 1.305958726564022
• advance_payments seem to be right skewed
• There are no outliers present in advance_payments

• Minimum probability_of_full_payment is 0.8081

• Maximum probability_of_full_payment is 0.9183
• Mean value is 0.8709985714285714
• Median value is 0.8734500000000001
• Standard deviation is 0.023629416583846496
• probability_of_full_payment seem to be slightly left skewed
• There are 3 outliers in probability_of_full_payment lower
• Mean value of current_balance is 5.628533333333334
• Median value of current_balance is 5.5235
• Standard deviation of current_balance is 0.4430634777264493
• current_balance seem to be right skewed
• There are no outliers present

• Minimum credit_limit: 2.63

• Maximum credit_limit: 4.033
• Mean value: 3.258604761904763
• Median value: 3.237
• Standard deviation: 0.3777144449065874
• credit_limit seem to right skewed
• There are no outliers present
• Minimum max_spent_in_single_shopping is 4.519
• Maximum max_spent_in_single_shoppings is 6.55
• Mean value is 5.408071428571429
• Median value is 5.223000000000001
• Standard deviation is 0.4914804991024054
• max_spent_in_single_shopping seems to be left skewed
• There are no outliers present

• Minimum min_payment_amt: 0.7651

• Maximum min_payment_amt: 8.456
• Mean value: 3.7002009523809507
• Median value: 3.599
• Standard deviation: 1.5035571308217792
• min_payment_amt seems to be very slight right skewed
• There are 2 outliers present in min_payment_amt upper

Bi-Variate Analysis:
Observations
- Credit limit average is around $3.258(10000s).
- Distribution is skewed to right tail for all the variable except probability_of_full_payment variable,
which has left tail.
Multivariate analysis:
Observation:
- Strong positive correlation between:
• spending & advance_payments
• advance_payments & current_balance
• credit_limit & spending
• spending & current_balance
• credit_limit & advance_payments
• max_spent_in_single_shopping current_balance
1.2 Do you think scaling is necessary for clustering in this case? Justify.
• Scaling needs to be done as the values of the variables are different.
• spending, advance_payments are in different values and this may get more weightage.
• Scaling will have all the values in the relative same range.
• I have used zscore to standarised the data to relative same scale -3 to +3.

1.3 Apply hierarchical clustering to scaled data. Identify the number of optimum clusters
using Dendrogram and briefly describe them.
Dendrogram (Ward link):
Observation:
• Both the method is almost similar means, minor variation, which we know it occurs.
• We for cluster grouping based on the dendrogram, 3 or 4 looks good. Did the further analysis,
and based on the dataset had gone for 3 group cluster solution based on the hierarchical
clustering.
• Also in real time, there could have been more variables value captured - tenure,
BALANCE_FREQUENCY, balance, purchase, installment of purchase, others.
• And three group cluster solution gives a pattern based on high/medium/low spending with
max_spent_in_single_shopping (high value item) and probability_of_full_payment(payment
made).
1.4 Apply K-Means clustering on scaled data and determine optimum clusters. Apply elbow
curve and silhouette score.

From SC Score, the number of optimal clusters could be 3 or 4.

K-Means Clustering & Cluster Information:
3-Cluster:

4-Cluster:

1.5 Describe cluster profiles for the clusters defined. Recommend different promotional
strategies for different clusters.
3 group cluster via Kmeans:
3 group cluster via hierarchical clustering:

Cluster Group Profiles:

Group 1: High Spending
Group 3: Medium Spending
Group 2: Low Spending

Promotional strategies for each cluster:

Group 1: High Spending Group
• Giving any reward points might increase their purchases.
• maximum max_spent_in_single_shopping is high for this group, so can be offered discount/offer
on next transactions upon full payment.
• Increase their credit limit and
• Increase spending habits.
• Give loan against the credit card, as they are customers with good repayment record.
• Tie up with luxury brands, which will drive more one_time_maximun spending.

Group 3: Medium Spending Group

• They are potential target customers who are paying bills and doing purchases and maintaining
comparatively good credit score. So, we can increase credit limit or can lower down interest
rate.
• Promote premium cards/loyalty cars to increase transactions.
• Increase spending habits by trying with premium ecommerce sites, travel portal, travel
airlines/hotel, as this will encourage them to spend more.

Group 2: Low Spending Group

• customers should be given remainders for payments. Offers can be provided on early payments
to improve their payment rate.
• Increase their spending habits by tying up with grocery stores, utilities (electricity, phone, gas,
others).
Problem 2: CART-RF-ANN
An Insurance firm providing tour insurance is facing higher claim frequency. The management
decides to collect data from the past few years. You are assigned the task to make a model
which predicts the claim status and provide recommendations to management. Use CART, RF &
ANN and compare the models' performances in train and test sets.

2.1 Read the data, do the necessary initial steps, and exploratory data analysis (Univariate, Bi-variate,
and multivariate analysis).

Initial Observation (EDA):

• 10 variables
• Age, Commision, Duration, Sales are numeric variable
• Rest are categorial variable
• 3000 records and no missing one
• 9 independent variable and one target variable – Claimed
• No missing value
• duration has negative value; it is not possible. Wrong entry
• Commision & Sales- mean and median varies significantly
• Categorial code variable maximum unique count is 5
• Data looks good at first glance
• Though it shows there are 139 records, but it can be of different customers, there is no
customer ID or any unique identifier, so not going to drop them off
Univariate Analysis:

• Minimum Age: 8
• Maximum Age: 84
• Mean value: 38.091
• Median value: 36.0
• Standard deviation: 10.463518245377944
• It seems to be right skewed
• Too many outliers present in lower as well as upper
• Minimum Commision: 0.0
• Maximum Commision: 210.21
• Mean value: 14.529203333333266
• Median value: 4.63
• Standard deviation: 25.48145450662553
• It is right skewed
• Too many outliers are present in upper
• Minimum Duration: -1
• Maximum Duration: 4580
• Mean value: 70.00133333333333
• Median value: 26.5
• Standard deviation: 134.05331313253495
• It is right skewed
• Many outliers are present in upper
• Minimum Sales: 0.0
• Maximum Sales: 539.0
• Mean value: 60.24991333333344
• Median value: 33.0
• Standard deviation: 70.73395353143047
• It is right skewed
• Too many outliers present in upper

There are outliers in all the variables, but the sales and commission can be a genuine business value.
Random Forest and CART can handle the outliers. Hence, Outliers are not treated for now, we will
keep the data as it is.
Bi-Variate Analysis:

Agency_Code
Type
Channel
Product Name
Destination
Multivariate Analysis:
There is not much collinearity between any variable. But among them Sales & Commision has the
highest collinearity of 0.77.
2.2 Data Split: Split the data into test and train, build classification model CART, Random
Forest, Artificial Neural Network.

Decision Tree Classifier

• {'criterion': 'gini', 'max_depth': 10, 'min_samples_leaf': 50, 'min_samples_split': 450}

DecisionTreeClassifier(max_depth=10, min_samples_leaf=50, min_samples_split=450,

random_state=1)

• {'criterion': 'gini', 'max_depth': 5, 'min_samples_leaf': 20, 'min_samples_split': 150}

DecisionTreeClassifier(max_depth=5, min_samples_leaf=20, min_samples_split=150,

random_state=1)

• {'criterion': 'gini', 'max_depth': 3.5, 'min_samples_leaf': 44, 'min_samples_split': 250}

DecisionTreeClassifier(max_depth=3.5, min_samples_leaf=44, min_samples_split=250,

random_state=1)

• {'criterion': 'gini', 'max_depth': 4.85, 'min_samples_leaf': 44, 'min_samples_split': 260}

DecisionTreeClassifier(max_depth=4.85, min_samples_leaf=44, min_samples_split=260,

random_state=1)

Variable Importance-Decision Tree Classifier

Predicted Classes and Probs

Random Forest Classifier

• {'max_depth': 6, 'max_features': 3, 'min_samples_leaf': 8, 'min_samples_split': 46,
'n_estimators': 350}

RandomForestClassifier(max_depth=6, max_features=3, min_samples_leaf=8,

min_samples_split=46, n_estimators=350, random_state=1)

Predicted Classes and Probs

Variable Importance via Random Forest

Neural Network Classifier

MLPClassifier(hidden_layer_sizes=200, max_iter=2500, random_state=1, tol=0.01)

Predicted Classes and Probs

2.3 Performance Metrics: Comment and Check the performance of Predictions on Train and
Test sets using Accuracy, Confusion Matrix, Plot ROC curve and get ROC_AUC score,
classification reports for each model.

Cart Model:
Train Data (ROC curve)

Test Data (ROC curve)

Confusion Matrix (Train data)

array ([[1309, 144],

[ 307, 340]], dtype=int64)

Confusion Matrix (Test data)

array ([[553, 70],

[136, 141]], dtype=int64)

Train Data:
• AUC: 82%
• Accuracy: 79%
• Precision: 70%
• f1-Score: 60%

Test Data:
• AUC: 80%
• Accuracy: 77%
• Precision: 80%
• f1-Score: 84%

Random Forest Model:

Train Data (ROC curve)

Test Data (ROC curve)

Confusion Matrix (Train data)

array ([[1297, 156],

[ 255, 392]], dtype=int64

Confusion Matrix (Test data)

array ([[550, 73],

[121, 156]], dtype=int64)

Train Data
• AUC: 86%
• Accuracy: 80%
• Precision: 72%
• f1-Score: 66%

Test Data
• AUC: 82%
• Accuracy: 78%
• Precision: 68%
• f1-Score: 62
Training and Test set results are almost similar, and with the overall measures high, the model is a
good model.
Change is again the most important variable for predicting diabetes.

NN Model:
Train Data (ROC curve)

Test Data (ROC curve)

Confusion Matrix (Train data)
array ([[1298, 155],

[ 315, 332]], dtype=int64)

Confusion Matrix (Test data)

array ([[553, 70],

[138, 139]], dtype=int64)

Train Data:
• AUC: 82%
• Accuracy: 78%
• Precision: 68%
• f1-Score: 59

Test Data:
• AUC: 80%
• Accuracy: 77%
• Precision: 67%
• f1-Score: 57%

Training and Test set results are almost similar, and with the overall measures high, the model is a
good model.

2.4 Final Model: Compare all the models and write an inference which model is
best/optimized.

Comparison of the performance metrics from the 3 models.

ROC Curve for the 3 models on the Training data.

ROC Curve for the 3 models on the Test data.

CONCLUSION:
Selecting the RF model, as it has better accuracy, precision, recall, f1 score better than other two CART &
NN.
2.5 Inference: Based on the whole Analysis, what are the business insights and
recommendations.

I strongly recommended we collect more real time unstructured data and past data if possible.

This is understood by looking at the insurance data by drawing relations between different variables such as day of
the incident, time, age group, and associating it with other external information such as location, behavior
patterns, weather information, airline/vehicle types, etc.

• Streamlining online experiences benefitted customers, leading to an increase in conversions, which

subsequently raised profits.
• As per the data 90% of insurance is done by online channel.
• Other interesting fact, is almost all the offline business has a claimed associated, need to find why?
• Need to train the JZI agency resources to pick up sales as they are in bottom, need to run promotional
marketing campaign or evaluate if we need to tie up with alternate agency.
• Also based on the model we are getting 80%accuracy, so we need customer books airline tickets or plans,
cross sell the insurance based on the claim data pattern.
• Other interesting fact is more sales happen via Agency than Airlines and the trend shows the claim are
processed more at Airline. So, we may need to deep dive into the process to understand the workflow
and why?

Key performance indicators (KPI) The KPI’s of insurance claims are:

• Reduce claims cycle time
• Increase customer satisfaction
• Combat fraud
• Optimize claims recovery
• Reduce claim handling costs Insights gained from data and AI-powered analytics could expand the
boundaries of insurability, extend existing products, and give rise to new risk transfer solutions in areas
like a non-damage business interruption and reputational damage.

X-Air Hanuman Flight & Maintenance Manual OCT07
No ratings yet
X-Air Hanuman Flight & Maintenance Manual OCT07
55 pages
Wincap Pricelist 26.11.2020
No ratings yet
Wincap Pricelist 26.11.2020
6 pages
orient-pumps
No ratings yet
orient-pumps
32 pages
Bis 13 1294 Uk Cyber Security Standards Research Report
No ratings yet
Bis 13 1294 Uk Cyber Security Standards Research Report
105 pages
Linear Regression: Prepared by Muralidharan N
77% (13)
Linear Regression: Prepared by Muralidharan N
34 pages
MUST FM REPORT
No ratings yet
MUST FM REPORT
9 pages
A Research in Proxy Chain
No ratings yet
A Research in Proxy Chain
58 pages
Enhanced Routing Technique For Military Dog Robot
No ratings yet
Enhanced Routing Technique For Military Dog Robot
12 pages
CM Forrester Total Economic Impact Ansible Analyst Paper f13019 201806 en
No ratings yet
CM Forrester Total Economic Impact Ansible Analyst Paper f13019 201806 en
20 pages
MAGNUM-HW User Manual 2v2
No ratings yet
MAGNUM-HW User Manual 2v2
34 pages
Resolving Inconsistency and Incompleteness Issues in Software Requirements
No ratings yet
Resolving Inconsistency and Incompleteness Issues in Software Requirements
19 pages
Research Proposal 2
No ratings yet
Research Proposal 2
3 pages
Hollywood at It's Best
No ratings yet
Hollywood at It's Best
11 pages
ENTS Assignment Questions
No ratings yet
ENTS Assignment Questions
3 pages
Truck Owner Operator Business Plan Example
No ratings yet
Truck Owner Operator Business Plan Example
52 pages
Pipelineman No.2882
No ratings yet
Pipelineman No.2882
3 pages
Dissertation Front Page Example Uk
100% (2)
Dissertation Front Page Example Uk
5 pages
TOS and TQ
No ratings yet
TOS and TQ
3 pages
Adv Stats Proj
95% (38)
Adv Stats Proj
25 pages
Cisco Telepresence SX20 and Lightware 7 Compressed
No ratings yet
Cisco Telepresence SX20 and Lightware 7 Compressed
8 pages
Agenda_AFE_Summit_21st_Jan_2025
No ratings yet
Agenda_AFE_Summit_21st_Jan_2025
4 pages
Weapons Worksheet
No ratings yet
Weapons Worksheet
1 page
Post Graduate Graduate Graduate Professional 12th Pass 10th Pass 8th Pass Doctorate Others Literate 5th Pass Illiterate
No ratings yet
Post Graduate Graduate Graduate Professional 12th Pass 10th Pass 8th Pass Doctorate Others Literate 5th Pass Illiterate
4 pages
Education - Post 12th Standard - CSV
88% (16)
Education - Post 12th Standard - CSV
11 pages
Education - Post 12th Standard - CSV
88% (16)
Education - Post 12th Standard - CSV
11 pages
Predictive Modelling
67% (3)
Predictive Modelling
64 pages
Data Privacy - NPC Case Digests (2019-2022) by Atty. Paolo Javier
100% (14)
Data Privacy - NPC Case Digests (2019-2022) by Atty. Paolo Javier
39 pages
BESM 4e Cyberpunk Book II Technomagic v1 4, OEF, 2022 10 04
No ratings yet
BESM 4e Cyberpunk Book II Technomagic v1 4, OEF, 2022 10 04
106 pages
Cold Storage Assignment Solution Ankur Jain
75% (8)
Cold Storage Assignment Solution Ankur Jain
6 pages
Artificial Intelligence Class 9 Q-A
No ratings yet
Artificial Intelligence Class 9 Q-A
6 pages
Brochure Balio Ox-360 en
No ratings yet
Brochure Balio Ox-360 en
4 pages
Data Visualization in Tableau - Car Insurance Claim Project
50% (2)
Data Visualization in Tableau - Car Insurance Claim Project
51 pages
How To - Oscam Compile Tutorial
No ratings yet
How To - Oscam Compile Tutorial
5 pages
Shivani Pandey TSF
100% (1)
Shivani Pandey TSF
32 pages
FRA Business Report
100% (1)
FRA Business Report
21 pages
Advanced Statistics Project
17% (6)
Advanced Statistics Project
2 pages
MRA Milestone-1 Graded Project
100% (2)
MRA Milestone-1 Graded Project
41 pages
Handout: Course Information: CS 229 Machine Learning
No ratings yet
Handout: Course Information: CS 229 Machine Learning
4 pages
Data Mining Graded Assignment: Problem 1: Clustering Analysis
100% (3)
Data Mining Graded Assignment: Problem 1: Clustering Analysis
39 pages
Marketing & Retail Analytics - Report - Part A
100% (2)
Marketing & Retail Analytics - Report - Part A
18 pages
Machine Learning Project: Raghul Harish
100% (2)
Machine Learning Project: Raghul Harish
46 pages
Advanced Statistics: Business Report Ranvijay Sharma
No ratings yet
Advanced Statistics: Business Report Ranvijay Sharma
16 pages
RACHIT MITTAL Capstone Project. Notes 2 PDF
No ratings yet
RACHIT MITTAL Capstone Project. Notes 2 PDF
39 pages
MRA Project ML 1: Abhishek Kapoor Dsba Aug A20
100% (1)
MRA Project ML 1: Abhishek Kapoor Dsba Aug A20
47 pages
Detail Project Report SMDM
100% (1)
Detail Project Report SMDM
25 pages
DataMining Aug2021
100% (2)
DataMining Aug2021
49 pages
Predictive Model: Submitted by
100% (3)
Predictive Model: Submitted by
27 pages
System Unit
No ratings yet
System Unit
27 pages
ML Project Report
100% (2)
ML Project Report
35 pages
Java Aws Devops Resume Template
0% (1)
Java Aws Devops Resume Template
4 pages
ML Ts Proj
100% (9)
ML Ts Proj
58 pages
Data Mining Project Report
100% (1)
Data Mining Project Report
98 pages
MRA - Project - Puvya - Ravi
100% (3)
MRA - Project - Puvya - Ravi
46 pages
Project Report
100% (3)
Project Report
36 pages
DVT Group Assignment PDF
100% (1)
DVT Group Assignment PDF
14 pages
Business Report: Predictive Modelling
100% (2)
Business Report: Predictive Modelling
37 pages
Project Time Series Forecasting
100% (1)
Project Time Series Forecasting
53 pages
Linear - Regression - Assignment: Problem Statement
100% (3)
Linear - Regression - Assignment: Problem Statement
24 pages
Predictive Modeling
100% (1)
Predictive Modeling
22 pages
Tableau Questions
No ratings yet
Tableau Questions
2 pages
Report - Project8 - FRA - Surabhi - Report
100% (2)
Report - Project8 - FRA - Surabhi - Report
15 pages
Sunira - Predictive Modeling
100% (1)
Sunira - Predictive Modeling
65 pages
Advance Statistics Business Report
No ratings yet
Advance Statistics Business Report
15 pages
Machine Learning Business Report - Compress (AutoRecovered)
100% (3)
Machine Learning Business Report - Compress (AutoRecovered)
69 pages
Social Media Tourism - Capstone Project
No ratings yet
Social Media Tourism - Capstone Project
13 pages
Practice Questions for Tableau Desktop Specialist Certification Case Based
From Everand
Practice Questions for Tableau Desktop Specialist Certification Case Based
Exam OG
5/5 (1)
Drill Stem Test Part 1
No ratings yet
Drill Stem Test Part 1
31 pages
Anamit Deb Gupta Mra - Project Milestone - 1
100% (1)
Anamit Deb Gupta Mra - Project Milestone - 1
30 pages
Lifi
100% (1)
Lifi
16 pages
Predictive Modelling Project 1 PDF
50% (2)
Predictive Modelling Project 1 PDF
38 pages
AS Project Report
No ratings yet
AS Project Report
22 pages
DVT Alternate Project
50% (2)
DVT Alternate Project
1 page
MRA Project Milestone2 PDF
100% (1)
MRA Project Milestone2 PDF
1 page
FRA Report
100% (1)
FRA Report
30 pages
QUIZ Week 2 CART Practice PDF
No ratings yet
QUIZ Week 2 CART Practice PDF
10 pages
Business Report Machine Learning-1
100% (7)
Business Report Machine Learning-1
60 pages
Creo 11 MBD Ebook Interactive V9
No ratings yet
Creo 11 MBD Ebook Interactive V9
15 pages
Data Mining Quiz 1 Clustering
100% (2)
Data Mining Quiz 1 Clustering
4 pages
SMDM - Project Report - Lakshmi
No ratings yet
SMDM - Project Report - Lakshmi
26 pages
Cart-Rf-ANN: Prepared by Muralidharan N
0% (1)
Cart-Rf-ANN: Prepared by Muralidharan N
16 pages
Business Report Problem 2
No ratings yet
Business Report Problem 2
10 pages
As Quiz 3 PCA Solution PDF
100% (1)
As Quiz 3 PCA Solution PDF
1 page
Predective Modellig Project
100% (1)
Predective Modellig Project
18 pages
Problem Statement1
No ratings yet
Problem Statement1
1 page
Data Mining Business Report
No ratings yet
Data Mining Business Report
38 pages
Capstone Project
100% (1)
Capstone Project
7 pages
Caterpillar EMCP4 Installation Guide: Data / Power Cable
No ratings yet
Caterpillar EMCP4 Installation Guide: Data / Power Cable
3 pages
DM Gopala Satish Kumar Business Report G8 DSBA
100% (2)
DM Gopala Satish Kumar Business Report G8 DSBA
26 pages
Predictive Modelling Project Gloria Susan Raju 11 APR 2021 PDF
No ratings yet
Predictive Modelling Project Gloria Susan Raju 11 APR 2021 PDF
56 pages
SMDM Business-Report Arvind Soni-2
0% (1)
SMDM Business-Report Arvind Soni-2
15 pages
Project Report - Advanced - Stats - Final PDF
No ratings yet
Project Report - Advanced - Stats - Final PDF
25 pages
Executive Sumary - Rajarshi Das (Data Visualization Using Tableau Project)
100% (1)
Executive Sumary - Rajarshi Das (Data Visualization Using Tableau Project)
11 pages
Arnab Chowdhury As1
No ratings yet
Arnab Chowdhury As1
12 pages
Problem 1:: Readingcsv PD Read - Excel (Readingcsv) Readingcsv Head
No ratings yet
Problem 1:: Readingcsv PD Read - Excel (Readingcsv) Readingcsv Head
18 pages
Harshini Week 8 Doc PDF
No ratings yet
Harshini Week 8 Doc PDF
10 pages
Milestone 1
No ratings yet
Milestone 1
2 pages

Uploaded by

Uploaded by

BUSINESS REPORT

On Data Mining (Clustering, CART-RF-ANN)

Initial Observation (EDA):

• Minimum spending is 10.59

• Minimum probability_of_full_payment is 0.8081

• Minimum credit_limit: 2.63

• Minimum min_payment_amt: 0.7651

From SC Score, the number of optimal clusters could be 3 or 4.

Cluster Group Profiles:

Promotional strategies for each cluster:

Group 3: Medium Spending Group

Group 2: Low Spending Group

Initial Observation (EDA):

Decision Tree Classifier

DecisionTreeClassifier(max_depth=10, min_samples_leaf=50, min_samples_split=450,

• {'criterion': 'gini', 'max_depth': 5, 'min_samples_leaf': 20, 'min_samples_split': 150}

DecisionTreeClassifier(max_depth=5, min_samples_leaf=20, min_samples_split=150,

• {'criterion': 'gini', 'max_depth': 3.5, 'min_samples_leaf': 44, 'min_samples_split': 250}

DecisionTreeClassifier(max_depth=3.5, min_samples_leaf=44, min_samples_split=250,

• {'criterion': 'gini', 'max_depth': 4.85, 'min_samples_leaf': 44, 'min_samples_split': 260}

DecisionTreeClassifier(max_depth=4.85, min_samples_leaf=44, min_samples_split=260,

Variable Importance-Decision Tree Classifier

Random Forest Classifier

RandomForestClassifier(max_depth=6, max_features=3, min_samples_leaf=8,

Predicted Classes and Probs

Neural Network Classifier

Predicted Classes and Probs

Test Data (ROC curve)

array ([[1309, 144],

[ 307, 340]], dtype=int64)

Confusion Matrix (Test data)

array ([[553, 70],

[136, 141]], dtype=int64)

Random Forest Model:

Train Data (ROC curve)

Confusion Matrix (Train data)

array ([[1297, 156],

[ 255, 392]], dtype=int64

Confusion Matrix (Test data)

array ([[550, 73],

[121, 156]], dtype=int64)

Test Data (ROC curve)

[ 315, 332]], dtype=int64)

Confusion Matrix (Test data)

[138, 139]], dtype=int64)

Comparison of the performance metrics from the 3 models.

ROC Curve for the 3 models on the Test data.

• Streamlining online experiences benefitted customers, leading to an increase in conversions, which

Key performance indicators (KPI) The KPI’s of insurance claims are:

You might also like