Autobin: A predictive approach towards automatic binning using data splitting
Abstract
Binning goes by many names, including discretisation, classing, grouping and quantisation. It entails mapping continuous or categorical data into discrete bins. Binning is an important pre-processing step in most predictive models and is considered a basic data preparation step in building a credit scorecard. Credit scorecards are mathematical models that attempt to provide a quantitative estimate of the probability that a customer will display a defined behaviour (e.g. default) with respect to their current credit position with a lender. Among the practical advantages of binning are the removal of the effects of outliers and a way to handle missing values. Many binning methods exist, but they are often time-consuming to carry out. We propose a new method, Autobin, that is based on data splitting and on maximising a cross-validation form of the predicted log-likelihood. Autobin has the advantage of being nearly automatic and requires very little tuning. In a limited simulation study, Autobin outperformed its competitors.
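To make the idea of choosing bins by a cross-validated predicted log-likelihood concrete, the following Python sketch may help. It is not the Autobin algorithm, whose details are not given in this abstract; it is only a generic illustration under stated assumptions: a single continuous predictor, a binary outcome (e.g. default or no default), equal-frequency bins, a smoothing constant `alpha`, and interleaved folds. All function names are hypothetical.

```python
import math


def quantile_edges(xs, k):
    """Return the k-1 interior cut points of k roughly equal-frequency bins."""
    s = sorted(xs)
    return [s[(i * len(s)) // k] for i in range(1, k)]


def bin_index(x, edges):
    """Index of the bin holding x: the number of cut points at or below x."""
    return sum(x >= e for e in edges)


def fit_rates(xs, ys, edges, alpha=0.5):
    """Smoothed event rate per bin; alpha (an assumption) keeps rates in (0, 1)."""
    k = len(edges) + 1
    events, counts = [0.0] * k, [0.0] * k
    for x, y in zip(xs, ys):
        j = bin_index(x, edges)
        events[j] += y
        counts[j] += 1
    return [(events[j] + alpha) / (counts[j] + 2 * alpha) for j in range(k)]


def log_lik(xs, ys, edges, rates):
    """Bernoulli log-likelihood of the outcomes under the binned event rates."""
    total = 0.0
    for x, y in zip(xs, ys):
        p = rates[bin_index(x, edges)]
        total += math.log(p if y == 1 else 1.0 - p)
    return total


def cv_log_lik(xs, ys, k, folds=5):
    """Held-out log-likelihood for k bins, summed over interleaved folds."""
    n = len(xs)
    total = 0.0
    for f in range(folds):
        train = [i for i in range(n) if i % folds != f]
        test = [i for i in range(n) if i % folds == f]
        x_tr = [xs[i] for i in train]
        y_tr = [ys[i] for i in train]
        # Cut points and rates come from the training part only.
        edges = quantile_edges(x_tr, k)
        rates = fit_rates(x_tr, y_tr, edges)
        total += log_lik([xs[i] for i in test], [ys[i] for i in test], edges, rates)
    return total


def choose_bins(xs, ys, candidates=(1, 2, 3, 4, 5), folds=5):
    """Pick the candidate number of bins with the largest held-out log-likelihood."""
    scores = {k: cv_log_lik(xs, ys, k, folds) for k in candidates}
    best = max(scores, key=scores.get)
    return best, scores
```

Calling `choose_bins(xs, ys)` on a list of predictor values and a matching list of 0/1 outcomes returns the selected number of bins together with the score of every candidate. The interleaved fold assignment assumes the row order carries no information, so in practice the data would be shuffled first.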