Autobin: A predictive approach towards automatic binning using data splitting

  • Tanja Verster, Centre for Business Mathematics and Informatics, North-West University, Potchefstroom, South Africa
Keywords: Binning, Credit scoring, Data splitting, Predictive models

Abstract

The concept of binning is known by many names: discretisation, classing, grouping and quantisation. It entails mapping continuous or categorical data into discrete bins. Binning is an important pre-processing step in most predictive models and is considered a basic data preparation step in building a credit scorecard. Credit scorecards are mathematical models that attempt to provide a quantitative estimate of the probability that a customer will display a defined behaviour (e.g. default) with respect to their current credit position with a lender. Among the practical advantages of binning are that it removes the effects of outliers and provides a way to handle missing values. Many binning methods exist, but they are often time-consuming to carry out. We propose a new method, Autobin, that is based on data splitting and on maximising a cross-validation form of the predicted log-likelihood. Autobin has the advantage of being nearly automatic and requires very few tuning parameters. In a limited simulation study, Autobin was found to outperform its competitors.
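To make the idea of choosing bins by a cross-validated predicted log-likelihood concrete, the following is a minimal, generic sketch. It is not the Autobin procedure from the article: it assumes a binary target (1 = default), equal-frequency (quantile) bins, additive smoothing of per-bin default rates, and K-fold data splitting to pick only the number of bins. All function names (`quantile_edges`, `heldout_loglik`, `cv_loglik`, `choose_bins`) and the smoothing parameter `alpha` are hypothetical illustrations.

```python
import math
import random
from bisect import bisect_right

# Illustrative sketch only: NOT the published Autobin algorithm.
# Assumptions: binary target y in {0, 1}, quantile bins, smoothed bin rates.

def quantile_edges(x, n_bins):
    """Interior cut points that split x into roughly equal-count bins."""
    xs = sorted(x)
    n = len(xs)
    # A set removes duplicate cut points caused by tied values
    return sorted({xs[(i * n) // n_bins] for i in range(1, n_bins)})

def assign(edges, value):
    # Bin index = number of cut points <= value
    return bisect_right(edges, value)

def heldout_loglik(train, test, n_bins, alpha=0.5):
    """Fit per-bin default rates on train; return Bernoulli log-likelihood on test."""
    edges = quantile_edges([x for x, _ in train], n_bins)
    k = len(edges) + 1
    events, counts = [0] * k, [0] * k
    for x, y in train:
        b = assign(edges, x)
        counts[b] += 1
        events[b] += y
    ll = 0.0
    for x, y in test:
        b = assign(edges, x)
        # Additive smoothing (hypothetical alpha) keeps p strictly in (0, 1)
        p = (events[b] + alpha) / (counts[b] + 2 * alpha)
        ll += math.log(p) if y else math.log(1 - p)
    return ll

def cv_loglik(data, n_bins, folds=5, seed=0):
    """Sum of held-out log-likelihoods over K random data splits."""
    idx = list(range(len(data)))
    random.Random(seed).shuffle(idx)
    total = 0.0
    for f in range(folds):
        test_ids = set(idx[f::folds])
        train = [data[i] for i in idx if i not in test_ids]
        test = [data[i] for i in idx if i in test_ids]
        total += heldout_loglik(train, test, n_bins)
    return total

def choose_bins(data, candidates=range(1, 11), folds=5):
    """Pick the bin count with the highest cross-validated log-likelihood."""
    return max(candidates, key=lambda k: cv_loglik(data, k, folds))
```

As a usage example, `choose_bins(data)` takes a list of `(x, y)` pairs and returns a bin count; on synthetic data whose default rate jumps partway through the range of `x`, a single bin scores a clearly lower held-out log-likelihood than a split. Scoring on held-out folds rather than on the training data is what stops the criterion from always preferring more bins.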

Published: 2018-09-30
Section: Research Articles