AUTO INSURANCE CHURN
Churn modeling on 92,849 policyholders: SMOTE + Gradient Boosting at ROC-AUC 0.70.
BRIEFING
Five classifiers were compared on 92,849 policyholder records to flag retention risk early. The hard part wasn't the model; it was the severe class imbalance (only 11.5% of policyholders churn). A SMOTE resampling pipeline rebalanced the training data so that the models did not collapse into predicting the majority class, and precision and recall on churners held up. A data-leakage catch, dropping a column that is only populated after a customer churns, kept the score honest. Gradient Boosting came out on top at ROC-AUC 0.70, with its decision threshold tuned to favor recall so that more at-risk policyholders get flagged.
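A minimal sketch of the two mechanics named above, not the project's actual code: how SMOTE-style oversampling builds synthetic minority rows by interpolating between neighbors, and how a decision threshold can be picked to hit a recall target. It uses only the Python standard library so it stays self-contained. The function names smote_sketch and threshold_for_recall, and the parameters k, seed, and target_recall, are hypothetical and chosen for illustration. A real pipeline would more likely pair imbalanced-learn's SMOTE with scikit-learn's GradientBoostingClassifier.

```python
import math
import random


def smote_sketch(minority, n_new, k=3, seed=0):
    """Create n_new synthetic minority rows (illustrative, not the library SMOTE).

    Each synthetic row lies on the line segment between a randomly chosen
    minority row and one of its k nearest minority neighbors.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority rows to interpolate")
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Nearest minority neighbors of the base row, excluding the row itself.
        others = [row for row in minority if row is not base]
        neighbors = sorted(others, key=lambda row: math.dist(base, row))[:k]
        neighbor = rng.choice(neighbors)
        gap = rng.random()  # position along the segment base -> neighbor
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbor)])
    return synthetic


def threshold_for_recall(y_true, scores, target_recall):
    """Return the highest score cutoff whose recall meets target_recall.

    A row is flagged as churn when its score is >= the cutoff, so a lower
    cutoff flags more policyholders and raises recall.
    """
    positives = sum(y_true)
    if positives == 0:
        raise ValueError("y_true contains no positive (churn) labels")
    for cutoff in sorted(set(scores), reverse=True):
        hits = sum(1 for y, s in zip(y_true, scores) if y == 1 and s >= cutoff)
        if hits / positives >= target_recall:
            return cutoff
    # Only reached when target_recall is above 1.0: flag everything.
    return min(scores)
```

As in the briefing, the oversampling would be applied to the training split only, and the cutoff would be chosen on held-out data that contains no synthetic rows, so the reported metrics reflect the real 11.5% churn rate.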
ROLE / METHOD / OUTCOME
STACK: Python · scikit-learn · Gradient Boosting · SMOTE