Variable selection in logistic regression models through the application of exact mathematical programming

  • J. V. Venter Centre for Business Mathematics & Informatics, North-West University, Potchefstroom, South Africa
  • S. E. Terblanche School of Industrial Engineering, North-West University, Potchefstroom, South Africa
Keywords: Best subset selection, Linearisation, Logistic regression, Mixed integer linear programming

Abstract

A linearised approximation of the log-likelihood objective function is presented as a potential alternative to the iterative fitting methods employed by logistic regression. The linearised log-likelihood problem is solved using linear programming, and a modified version of the linearised logistic regression model is presented which facilitates best subset variable selection. The resulting model is a mixed integer linear programming problem that incorporates a cardinality constraint on the number of variables. The suggested approach has several attractive properties: it quantifies the quality of the resulting variable selection solution, it does not depend on the subjective choice of p-values inherent to typical stepwise variable selection approaches, and, when the correct settings are applied, it edges closer to optimality within reduced computing times, even for large input datasets. Computational results are presented to demonstrate the advantages of employing an exact mathematical programming approach to variable selection in logistic regression applications.
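
The abstract does not state which linearisation the authors use, so the following is only a minimal sketch of one common way such an approximation can be built, not the paper's method. It assumes the nonlinear term of the negative log-likelihood, log(1 + exp(z)) with z the linear predictor, is replaced by the maximum of a finite set of tangent lines. Because that term is convex, each tangent is a valid lower bound, and the conditions t >= a_k * z + b_k are linear constraints that a linear programme can carry. All names (softplus, tangent_cuts, piecewise_linear) and the choice of 25 tangent points on [-6, 6] are illustrative assumptions.

```python
import math


def softplus(z):
    """Numerically stable log(1 + exp(z))."""
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))


def sigmoid(z):
    """Derivative of softplus, evaluated without overflow."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def tangent_cuts(points):
    """Return (slope, intercept) of the tangent to softplus at each point."""
    return [(sigmoid(p), softplus(p) - sigmoid(p) * p) for p in points]


def piecewise_linear(z, cuts):
    """Maximum over the tangent lines: a piecewise-linear lower bound."""
    return max(a * z + b for a, b in cuts)


# Assumed setup: 25 equally spaced tangent points on [-6, 6].
points = [-6 + 12 * k / 24 for k in range(25)]
cuts = tangent_cuts(points)

# Measure the worst-case approximation gap on a fine grid over the same range.
grid = [-6 + 12 * i / 1000 for i in range(1001)]
max_gap = max(softplus(z) - piecewise_linear(z, cuts) for z in grid)
print(f"largest gap between softplus and its linearisation: {max_gap:.5f}")
```

In a linear programme, each observation would receive one auxiliary variable t_i bounded below by every tangent, and the objective would minimise the sum of t_i - y_i * z_i. Under the same assumptions, best subset selection could then be added with binary indicators that switch coefficients on or off and a constraint limiting how many indicators equal one.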

Published: 2020-03-31
Section: Research Articles