Prediction error estimation of the survey-weighted least squares model under complex sampling

  • Retha Luus Department of Statistics and Population Studies, University of the Western Cape, Cape Town, South Africa
  • Ariane Neethling Department of Statistics and Actuarial Science, University of the Free State, Bloemfontein, South Africa
  • Tertius de Wet Department of Statistics and Actuarial Science, Stellenbosch University, Stellenbosch, South Africa
Keywords: Bootstrap, Calibration and integrated weighting, Cross-validation, Prediction error, Survey-weighted least squares, Trimming

Abstract

Linear modelling with the objective of predicting a future response is ubiquitous in statistical analysis. Methods such as cross-validation and the bootstrap are well known for estimating the predictive performance of a model fitted to i.i.d. data. However, many large-scale surveys use a complex sampling design under which the data are no longer i.i.d., and sampling weights are assigned to each observation to account for this. This paper shows how the cross-validation and bootstrap methods need to be adapted to evaluate the predictive performance of the survey-weighted least squares model. The performance of the different prediction error estimation methods is evaluated through a simulation study based on the Income and Expenditure Survey 2005/2006 of Statistics South Africa. The simulation study also investigates whether the model’s predictive performance is improved by truncating outlying sampling weights. For this purpose, two new thresholds, viz. the 1.5IQR and Hill thresholds, are introduced. The bootstrap estimator of prediction error was found to achieve lower mean squared error, while the K-fold cross-validation estimator achieved lower bias. Further improvement was observed when using the 1.5IQR and Hill truncated sampling weights.
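
To make the ideas in the abstract concrete, the following is a minimal sketch, not the authors' implementation. It assumes that survey-weighted least squares minimises the weighted residual sum of squares, that the K-fold prediction error is a weight-averaged squared error over held-out observations, and that the 1.5IQR threshold is the usual upper fence Q3 + 1.5 × IQR of the sampling weights; the abstract does not give the paper's exact definitions, and the Hill threshold is not sketched here. The function names are hypothetical, and the folds are drawn by simple random assignment, which ignores the strata and clusters of a real complex design.

```python
import numpy as np

def swls_fit(X, y, w):
    """Survey-weighted least squares: minimise sum_i w_i * (y_i - x_i' b)^2.

    Scaling each row by sqrt(w_i) turns the weighted problem into an
    ordinary least squares problem.
    """
    sw = np.sqrt(w)
    beta, *_ = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)
    return beta

def truncate_weights_iqr(w, k=1.5):
    """Cap weights at Q3 + k * IQR (assumed form of the 1.5IQR threshold)."""
    q1, q3 = np.percentile(w, [25, 75])
    cap = q3 + k * (q3 - q1)
    return np.minimum(w, cap)

def weighted_kfold_error(X, y, w, n_folds=5, seed=0):
    """Weight-averaged squared prediction error from K-fold cross-validation.

    Assumption: folds are assigned at random per observation; a design-based
    version would assign whole clusters within strata to folds.
    """
    rng = np.random.default_rng(seed)
    folds = rng.permutation(len(y)) % n_folds  # balanced random fold labels
    sq_err = np.empty(len(y))
    for k in range(n_folds):
        test = folds == k
        beta = swls_fit(X[~test], y[~test], w[~test])
        sq_err[test] = (y[test] - X[test] @ beta) ** 2
    return np.sum(w * sq_err) / np.sum(w)
```

Under these assumptions, comparing `weighted_kfold_error(X, y, w)` with `weighted_kfold_error(X, y, truncate_weights_iqr(w))` gives a rough indication of whether truncating extreme weights changes the estimated prediction error.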

Published
2020-09-30
Section
Research Articles