Author: Olga Seymour
Date: May 2025
GitHub: https://github.com/AI-Data-Space/BankingRetentionOptimizer
Customer segmentation analysis showing distribution across digital transformation readiness and churn risk dimensions
Note: Visualizations and charts are generated when running the notebooks and will appear in the `reports/figures/` directory.
Atlantic Regional Bank (fictional case study) is undergoing a major digital transformation, shifting resources from physical branches to digital channels. The bank must reduce branch operating costs while preventing customer attrition during this transition. With a limited retention budget of $1.5 million, the bank needs to identify which customers are both at high risk of leaving and most likely to respond positively to retention efforts. Traditional retention approaches that target all high-risk customers are not financially viable during this transformation period, so the bank needs a more selective, ROI-focused approach.
This project uses the Churn Modelling Dataset originally from Kaggle, which contains customer information from a European bank including demographics, account details, and churn indicators.
Original Source: Churn Modelling Dataset on Kaggle
Note: The dataset is included in this repository (`data/Churn_Modelling.csv`) for convenience and reproducibility.
The dataset includes 10,000 customer records with features such as:
- Customer demographics (age, geography, gender)
- Account information (balance, tenure, number of products)
- Engagement metrics (active membership status, credit score)
- Target variable (customer exit/churn indicator)
While the original dataset provides standard banking customer attributes, this project transforms it into a digital transformation context through custom feature engineering and business application development.
While customer churn prediction is a well-established field, this project introduces several novel approaches specifically designed for banking digital transformation scenarios:
Rather than relying on standard demographic and transactional features, we developed banking-specific indicators that directly address digital transformation challenges:
- `Digital_Readiness`: Quantifies customer adaptation potential to digital channels using age-based technology comfort combined with current engagement patterns
- `Retention_Score`: Weighted loyalty index combining relationship depth, product diversification, and active usage patterns
- `High_Risk_Age`: Identifies the 40-60 age bracket, which shows pronounced digital hesitancy during transformation
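
The three indicators above can be sketched in a few lines. This is a minimal illustration, not the notebook's implementation: the README does not state the formulas, so the weights (0.6/0.4 and 0.4/0.3/0.3), the linear age-comfort curve, and the `Customer` class are all assumptions. Only the 40-60 bracket comes from the text. The field comments refer to columns of the Kaggle dataset.

```python
from dataclasses import dataclass


@dataclass
class Customer:
    age: int            # Age column
    tenure: int         # Tenure column (years with the bank)
    num_products: int   # NumOfProducts column
    is_active: bool     # IsActiveMember column


def high_risk_age(c: Customer) -> bool:
    """Flag the 40-60 bracket named in the README."""
    return 40 <= c.age <= 60


def digital_readiness(c: Customer) -> float:
    """Score in [0, 1]; higher means easier digital adoption."""
    # ASSUMPTION: comfort falls linearly from 1.0 at age 18 to 0.0 at age 80.
    age_comfort = min(1.0, max(0.0, (80 - c.age) / 62))
    engagement = 1.0 if c.is_active else 0.0
    # ASSUMPTION: 60/40 weighting between age comfort and engagement.
    return 0.6 * age_comfort + 0.4 * engagement


def retention_score(c: Customer) -> float:
    """Weighted loyalty index in [0, 1]."""
    depth = min(c.tenure, 10) / 10                 # relationship depth
    diversification = min(c.num_products, 4) / 4   # product diversification
    usage = 1.0 if c.is_active else 0.0            # active usage
    # ASSUMPTION: 40/30/30 weighting.
    return 0.4 * depth + 0.3 * diversification + 0.3 * usage


customer = Customer(age=45, tenure=5, num_products=2, is_active=True)
print(high_risk_age(customer), round(retention_score(customer), 2))
```

Keeping every score on a 0-1 scale makes the later segment cut-offs easier to reason about, whatever weights are finally chosen.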
Innovation: Combines departure risk with digital adoption capability in a matrix approach, moving beyond traditional risk-only segmentation.
Business Impact: Enables targeted retention strategies that address both customer departure prevention and successful digital transformation—solving two business challenges simultaneously.
Applied Innovation: Transforms probability thresholds into a "financial dial" that directly controls retention budget allocation.
Strategic Insight: Analysis shows that a blanket retention approach returns only 0.87x overall and loses money on Branch-Dependent segments (-0.29x ROI), while selective targeting of profitable segments can achieve 1.25x-1.45x returns. This shifts retention strategy from broad-based to precision-focused.
Business Impact: Provides marketing teams with precise control over spending levels while maintaining optimal precision-recall balance for different budget scenarios.
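
One way to picture the "financial dial" is to work backwards from a budget to the lowest probability threshold that the budget can fund. The sketch below assumes a flat cost per contacted customer; the function name, the sample probabilities, and the $100 cost are illustrative and not taken from the notebooks.

```python
def threshold_for_budget(probabilities, cost_per_contact, budget):
    """Return (threshold, n_targeted) for the lowest affordable threshold.

    Customers with churn probability >= threshold are targeted.
    Returns (None, 0) when the budget cannot fund a single contact.
    """
    max_contacts = int(budget // cost_per_contact)
    ranked = sorted(probabilities, reverse=True)
    n = min(max_contacts, len(ranked))
    # If the cut-off splits a group of tied probabilities, move the cut-off
    # up so that "p >= threshold" never selects more customers than funded.
    while 0 < n < len(ranked) and ranked[n] == ranked[n - 1]:
        n -= 1
    if n == 0:
        return None, 0
    return ranked[n - 1], n


churn_probabilities = [0.9, 0.8, 0.8, 0.4, 0.2]   # hypothetical model output
print(threshold_for_budget(churn_probabilities, cost_per_contact=100, budget=300))
# (0.8, 3)
```

Raising the budget lowers the threshold and widens reach; cutting it raises the threshold and concentrates spending on the most likely churners.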
Business Innovation: Segment-specific intervention costs reflect the varying complexity of digital adoption support, from automated outreach for digital-ready customers to high-touch support for branch-dependent customers.
Business Impact: Realistic ROI calculations that account for the operational complexity of helping different customer types successfully navigate digital transformation.
Traditional Churn Prediction | Our Transformation-Focused Approach |
---|---|
Generic risk segmentation | Risk + digital readiness matrix |
Fixed 0.5 probability threshold | Dynamic threshold for budget optimization |
Standard demographic features | Custom digital transformation indicators |
Simple cost-per-customer ROI | Complex intervention economics by segment |
One-size-fits-all retention | Transformation-specific strategies |
This approach directly addresses the strategic challenge facing banks today: how to retain customers while transforming operations. The solution framework is immediately applicable to any financial institution undergoing digital transformation with constrained retention budgets.
- Machine Learning Model: Random Forest optimized to 59.4% precision and 68.1% recall
- Advanced Feature Engineering: Custom ML features for digital transformation context
- Predictive Analytics: 68.1% of at-risk customers identified through ML-driven segmentation
- Strategic Segmentation: $219,702 net benefit with 0.87x overall ROI, identifying profitable segments (1.45x ROI) vs unprofitable segments (-0.29x ROI)
- Implementation Ready: ML pipeline with A/B testing framework
This project delivers a complete customer retention solution that enables banks to:
- Predict which customers are likely to leave with 59.4% precision
- Identify different customer segments requiring tailored retention approaches
- Allocate limited retention budgets strategically (0.87x overall ROI with profitable segments reaching 1.45x ROI)
- Implement a phased transition strategy supporting the shift to digital channels
- Continuously refine retention strategies through A/B testing
- Banking Executives: Gain a strategic framework for managing digital transformation with reduced customer attrition and optimized spending.
- Marketing Teams: Receive actionable customer segmentation with targeted retention strategies rather than one-size-fits-all approaches.
- Digital Transformation Leaders: Obtain a data-driven roadmap for transitioning customers from branches to digital channels with minimal attrition.
- Customer Experience Teams: Receive insights to develop segment-specific experiences that address the unique needs of different customer groups.
- Finance Departments: Benefit from quantifiable ROI projections and budget optimization for retention initiatives.
- Predictive model identifying customers at risk of leaving (59.4% precision, 68.1% recall)
- Customer segmentation with tailored retention strategies for digital-ready vs. branch-dependent customers
- ROI analysis revealing 0.87x overall return with profitable segments (1.45x) and unprofitable segments (-0.29x)
- Phased implementation plan for retention programs
The implementation of this solution provides:
- Financial Impact: $219,702 net benefit with strategic insights on profitable vs unprofitable customer segments
- Operational Efficiency: 59.4% targeting precision means reduced wasted resources
- Customer Retention: 68.1% of at-risk customers identified for proactive intervention
- Digital Adoption: Strategies for accelerating digital engagement across all customer segments
- Risk Mitigation: Early identification of high-value customers at risk during transformation
- Strategic Intelligence: Identification of profitable segments (1.45x ROI) versus unprofitable segments (-0.29x ROI)
This project followed a systematic approach:
- Data Analysis & Feature Engineering: Understanding churn patterns and creating banking-specific predictive features
- Model Development & Selection: Testing and optimizing machine learning models with focus on business metrics
- Business Application: Developing targeted retention strategies with ROI assessment
- Implementation Framework: Creating a practical roadmap with A/B testing approach
banking-retention-optimizer/
├── data/
│ └── Churn_Modelling.csv
├── notebooks/
│ ├── 1_data-analysis.ipynb # Data exploration & visualization
│ ├── 2_model-development.ipynb # Model selection and optimization
│ └── 3_business-application.ipynb # Segmentation, strategies & ROI
├── reports/
│ ├── figures/ # Visualizations
│ └── model_comparison.csv # Model performance metrics
├── .gitignore
├── requirements.txt # Dependencies
└── README.md # Project documentation
This notebook explores the customer dataset to understand key factors influencing churn during digital transformation:
- Identified significant age difference between churning (older) and retained customers
- Discovered inactive members are 3x more likely to leave
- Found geography plays a meaningful role in retention
- Revealed that customers with multiple products have lower churn rates
- Analyzed balance patterns showing zero-balance accounts have higher churn risk
This notebook focuses on developing predictive models for customer churn:
- Created custom features for digital transformation context (Digital_Readiness, Retention_Score)
- Implemented feature engineering pipeline with appropriate scaling and encoding
- Evaluated multiple algorithms (Random Forest, AdaBoost, XGBoost)
- Conducted hyperparameter tuning with F1 score optimization
- Performed detailed threshold analysis to balance precision and recall for budget constraints
- Selected Random Forest (Optimized) as best model with F1 score of 0.635
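
The modelling steps listed above (scaling and encoding, hyperparameter tuning on F1, stratified 5-fold cross-validation) can be sketched with scikit-learn as follows. This is an illustration under stated assumptions, not the notebook's code: the data is synthetic (only the column names mirror the Kaggle file), the label rule is made up so the example has some signal, and the parameter grid is hypothetical.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, StratifiedKFold
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

rng = np.random.default_rng(42)
n = 400
# Synthetic stand-in for data/Churn_Modelling.csv (made-up values).
df = pd.DataFrame({
    "Age": rng.integers(18, 80, n),
    "Balance": rng.uniform(0, 200_000, n),
    "NumOfProducts": rng.integers(1, 5, n),
    "IsActiveMember": rng.integers(0, 2, n),
    "Geography": rng.choice(["France", "Germany", "Spain"], n),
})
# ASSUMPTION: toy label rule, older and inactive customers churn more often.
churn_rate = 0.1 + 0.4 * (df["Age"] > 45) + 0.3 * (df["IsActiveMember"] == 0)
y = (rng.uniform(size=n) < churn_rate).astype(int)

numeric = ["Age", "Balance", "NumOfProducts", "IsActiveMember"]
categorical = ["Geography"]
preprocess = ColumnTransformer([
    ("num", StandardScaler(), numeric),
    ("cat", OneHotEncoder(handle_unknown="ignore"), categorical),
])
pipeline = Pipeline([
    ("preprocess", preprocess),
    ("model", RandomForestClassifier(random_state=42)),
])

# Hypothetical grid; the notebook's actual search space is not given here.
param_grid = {"model__n_estimators": [25, 50], "model__max_depth": [4, 8]}
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
search = GridSearchCV(pipeline, param_grid, scoring="f1", cv=cv)
search.fit(df, y)

# Churn probabilities feed the threshold analysis described above.
churn_probability = search.predict_proba(df)[:, 1]
print(search.best_params_, round(search.best_score_, 3))
```

Wrapping preprocessing and the classifier in one `Pipeline` keeps the scaler and encoder inside each cross-validation fold, which avoids leaking validation data into the fitted transformers.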
This notebook applies the model to create actionable business strategies:
- Developed customer segmentation combining churn risk and digital readiness
- Created tailored retention strategies for each segment
- Calculated ROI for retention investments by segment
- Designed a three-phase implementation roadmap
- Established an A/B testing framework for ongoing optimization
The project evaluated multiple algorithms, with the following performance on the test set:
Algorithm | F1 Score | Precision | Recall | AUC |
---|---|---|---|---|
Random Forest (Optimized) | 0.635 | 0.594 | 0.681 | 0.861 |
XGBoost (Optimized) | 0.618 | 0.556 | 0.695 | 0.855 |
XGBoost | 0.573 | 0.687 | 0.491 | 0.839 |
AdaBoost | 0.571 | 0.692 | 0.487 | 0.849 |
AdaBoost (Optimized) | 0.568 | 0.764 | 0.452 | 0.848 |
Random Forest | 0.554 | 0.763 | 0.435 | 0.846 |
Random Forest (Optimized) provided the best balance of precision and recall, which is critical for our limited budget scenario. The model achieves 59.4% precision (efficiency of retention spending) while capturing 68.1% of customers who would churn (reach of retention efforts).
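
As a quick consistency check, F1 is the harmonic mean of precision and recall, so the F1 column can be recomputed from the other two. The snippet below does this for the reported figures; small differences in the last digit are expected because the table's precision and recall are themselves rounded.

```python
def f1_from_precision_recall(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# (precision, recall, reported F1) copied from the comparison table.
reported = {
    "Random Forest (Optimized)": (0.594, 0.681, 0.635),
    "XGBoost (Optimized)": (0.556, 0.695, 0.618),
    "XGBoost": (0.687, 0.491, 0.573),
    "AdaBoost": (0.692, 0.487, 0.571),
    "AdaBoost (Optimized)": (0.764, 0.452, 0.568),
    "Random Forest": (0.763, 0.435, 0.554),
}

for name, (precision, recall, f1) in reported.items():
    computed = f1_from_precision_recall(precision, recall)
    print(f"{name}: computed {computed:.3f}, reported {f1:.3f}")
```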
The choice of machine learning algorithm directly determines retention budget efficiency and revenue protection:
Why Random Forest (Optimized) Solves Our Business Problem:
- AdaBoost (Optimized) Alternative: 76.4% precision would minimize wasted spending, but 45.2% recall means missing about 55% of departing customers
  - Business Impact: Saves budget but loses significant customer value
- High Recall Alternative: Would catch more departing customers but exhaust the budget on false positives
  - Business Impact: Protects more customers but wastes retention resources
- Our Random Forest Solution: 59.4% precision with 68.1% recall, the best balance among the models tested
  - Business Impact: Identifies 68% of at-risk customers while maintaining budget efficiency
Result: Our systematic algorithm comparison identified the approach that maximizes net business value (revenue protected minus budget spent) rather than optimizing for statistical accuracy alone.
- Cross-validation: 5-fold CV with stratified sampling to ensure robust performance estimates
- Business Metric Optimization: Model selection based on F1 score to balance precision-recall for budget constraints
- Threshold Analysis: Systematic evaluation of decision boundaries for optimal business outcomes
- Age is a critical factor: Customers 40-60 years old show significantly higher churn probability
- Digital readiness varies: Customers with low digital readiness need specialized transition support
- Value-based segmentation is essential: High-value customers require different retention approaches
- Product holding matters: Multiple products significantly reduce churn risk
- Targeted strategies outperform: ROI varies significantly across segments
Segment | Characteristics | Strategy | Actual ROI Performance |
---|---|---|---|
High-Value Digital-Ready At-Risk | Balance >$100k, High churn probability, High digital readiness | Premium Digital VIP Program | 1.36x ROI Profitable |
Digital-Ready Watch | Medium churn probability, High digital readiness | Digital Engagement Program | 1.45x ROI Most Profitable |
High-Value Branch-Dependent At-Risk | Balance >$100k, High churn probability, Low digital readiness | Executive Transition Program | 0.96x ROI Nearly Profitable |
Digital-Ready At-Risk | High churn probability, High digital readiness | Digital-Focused Retention Program | 0.41x ROI Low Return |
Branch-Dependent At-Risk | High churn probability, Low digital readiness | Guided Transition Program | -0.29x ROI Unprofitable |
Branch-Dependent Watch | Medium churn probability, Low digital readiness | Relationship Review Program | -0.29x ROI Unprofitable |
Stable Customers | Low churn probability | Value Growth Program | -0.20x ROI Unprofitable |
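
The segment table can be read as a small decision rule over churn probability, digital readiness, and balance. The sketch below reproduces the seven segment names. The $100k balance cut-off comes from the table, while the probability and readiness cut-offs (0.6, 0.3, 0.5) are placeholders, since the README does not state the values used in the notebook.

```python
HIGH_VALUE_BALANCE = 100_000   # from the table: Balance > $100k
HIGH_RISK = 0.6                # ASSUMPTION: "high" churn probability cut-off
MEDIUM_RISK = 0.3              # ASSUMPTION: "medium" churn probability cut-off
DIGITAL_READY = 0.5            # ASSUMPTION: "high" digital readiness cut-off


def assign_segment(churn_probability: float,
                   digital_readiness: float,
                   balance: float) -> str:
    """Map one customer to a segment name from the table above."""
    if churn_probability < MEDIUM_RISK:
        return "Stable Customers"
    channel = ("Digital-Ready" if digital_readiness >= DIGITAL_READY
               else "Branch-Dependent")
    if churn_probability < HIGH_RISK:
        return f"{channel} Watch"
    if balance > HIGH_VALUE_BALANCE:
        return f"High-Value {channel} At-Risk"
    return f"{channel} At-Risk"


print(assign_segment(0.8, 0.9, 150_000))  # High-Value Digital-Ready At-Risk
print(assign_segment(0.4, 0.1, 20_000))   # Branch-Dependent Watch
```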
The retention program analysis reveals strategic insights:
- Total Program Cost: $251,950
- Value Protected: $471,652
- Net Benefit: $219,702
- Return on Investment: 0.87x overall (1.45x for best-performing segments)
Key Strategic Value: Rather than generating uniform high returns, this analysis identifies which customer segments are profitable to target (Digital-Ready customers: 1.25x-1.45x ROI) versus which segments consistently lose money (Branch-Dependent customers: -0.29x ROI). This selective targeting approach prevents costly blanket retention spending while maximizing returns on profitable segments.
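
The overall figures are consistent with ROI defined as net benefit divided by program cost. That definition is inferred from the numbers above rather than stated by the notebooks, so treat the helper below as a sketch of the arithmetic only.

```python
def roi(value_protected: float, program_cost: float) -> float:
    """Net benefit per dollar spent: (value - cost) / cost."""
    return (value_protected - program_cost) / program_cost


program_cost = 251_950
value_protected = 471_652

net_benefit = value_protected - program_cost
print(net_benefit)                                    # 219702
print(f"{roi(value_protected, program_cost):.2f}x")   # 0.87x
```

Under this definition a negative value such as -0.29x means a segment returns about 71 cents of protected value for every dollar spent on it.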
The retention strategy is designed for phased implementation:
- Phase 1: High-Risk Intervention (Weeks 1-2)
  - Immediate focus on high-value at-risk customers
  - Personal outreach and premium offers
- Phase 2: Proactive Engagement (Weeks 3-8)
  - Medium-risk customer engagement
  - Relationship reviews and cross-selling
- Phase 3: Value Growth (Month 3+)
  - Long-term loyalty development
  - Referral programs and satisfaction monitoring
To continuously optimize retention efforts, the project includes an A/B testing framework:
Segment | Test Variants | Primary Metric |
---|---|---|
Digital-Ready At-Risk | Standard email vs. Personalized tutorial vs. Incentive program | Digital engagement rate |
Branch-Dependent At-Risk | Branch notification vs. Guided transition vs. Hybrid model | Digital adoption rate |
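
To compare two variants on a rate metric such as digital engagement rate, one common choice is a two-proportion z-test. The README does not say which test the framework uses, so the sketch below is an assumption, and the counts in the example are invented for illustration.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_z_test(success_a: int, n_a: int,
                          success_b: int, n_b: int):
    """Return (z, two-sided p-value) for rate B versus rate A."""
    rate_a = success_a / n_a
    rate_b = success_b / n_b
    # Pooled rate under the null hypothesis that both variants perform equally.
    pooled = (success_a + success_b) / (n_a + n_b)
    std_error = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (rate_b - rate_a) / std_error
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


# Hypothetical counts: standard email (A) vs. personalized tutorial (B).
z, p_value = two_proportion_z_test(120, 1000, 150, 1000)
print(f"z = {z:.2f}, p = {p_value:.3f}")
```

With three variants per segment, pairwise comparisons multiply the chance of a false positive, so a correction for multiple comparisons is worth considering before acting on a single result.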
- Advanced Machine Learning: Custom Random Forest optimization with hyperparameter tuning across multiple algorithms (Random Forest, XGBoost, AdaBoost)
- ML Feature Engineering: Created domain-specific features (`Digital_Readiness`, `Retention_Score`) using advanced feature construction techniques
- Ensemble Model Comparison: Systematic evaluation of multiple ML algorithms with cross-validation and business metric optimization
- ML Pipeline Development: End-to-end scikit-learn pipeline with custom transformers for preprocessing and feature engineering
- Predictive Model Deployment: Threshold optimization and model serialization for business application
- Python: Primary programming language
- Pandas/NumPy: Data manipulation and analysis
- Scikit-learn: Machine learning models and evaluation
- XGBoost: Gradient boosting implementation
- Matplotlib/Seaborn: Data visualization
- Jupyter Notebooks: Development environment
- Joblib: Model serialization and deployment
- Python 3.8+
- Jupyter Notebook
git clone https://github.com/AI-Data-Space/BankingRetentionOptimizer
cd BankingRetentionOptimizer
pip install -r requirements.txt
- Start with `notebooks/1_data-analysis.ipynb` for data exploration
- Continue with `notebooks/2_model-development.ipynb` for model training
- Finish with `notebooks/3_business-application.ipynb` for business strategy implementation
- Training Data Limitation: Model trained on pre-transformation data; customer behavior may evolve as digital adoption increases
- Static Feature Engineering: `Digital_Readiness` metric may need recalibration as digital services expand
- External Factor Exclusion: Economic conditions, competitor actions, and market changes not captured
- Performance Tracking: Monitor precision/recall monthly during transformation
- Feature Drift Detection: Track changes in `Digital_Readiness` patterns as transformation progresses
- Dynamic Threshold Adjustment: Adapt probability thresholds based on budget availability and observed effectiveness
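
One widely used way to track drift in a score such as `Digital_Readiness` is the population stability index (PSI), which compares the binned distribution of a baseline sample with a recent one. The README does not name a drift metric, so PSI is a suggestion here, and the sketch assumes the score lies in [0, 1].

```python
from math import log


def population_stability_index(baseline, current, n_bins=10, eps=1e-6):
    """PSI between two samples of a score assumed to lie in [0, 1]."""

    def bin_shares(values):
        counts = [0] * n_bins
        for value in values:
            index = min(int(value * n_bins), n_bins - 1)  # 1.0 goes in last bin
            counts[index] += 1
        # Floor empty bins at eps so the logarithm below stays finite.
        return [max(count / len(values), eps) for count in counts]

    base = bin_shares(baseline)
    cur = bin_shares(current)
    return sum((c - b) * log(c / b) for b, c in zip(base, cur))


baseline_scores = [i / 100 for i in range(100)]                # hypothetical
shifted_scores = [min(s + 0.3, 1.0) for s in baseline_scores]  # hypothetical
print(population_stability_index(baseline_scores, baseline_scores))  # 0.0
print(population_stability_index(baseline_scores, shifted_scores) > 0.25)  # True
```

A frequently quoted rule of thumb treats PSI below 0.1 as stable and above 0.25 as a significant shift, but suitable alert levels depend on sample size and binning and should be checked against the bank's own data.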
- Real-time digital engagement monitoring integration
- Customer lifetime value (CLV) incorporation for enhanced ROI calculations
- Conversational AI for personalized branch-to-digital transition support
- Economic indicator integration for external factor consideration
- Advanced A/B testing framework with multi-armed bandit optimization