Friday February 9
Research Center for Statistics, Geneva School of Economics and Management, University of Geneva
A prediction divergence criterion for model selection and classification in high dimensional settings
We propose a new class of model selection criteria that is suited to stepwise approaches and can also serve as a selection criterion in penalized estimation methods. This class, called the d-class of error measures, generalizes Efron's q-class. It contains classical criteria such as Mallows' Cp and the AIC, and it also allows more general criteria to be defined. Within this class, we propose a model selection criterion based on the divergence between the predictions of two nested models, which we call the Prediction Divergence Criterion (PDC). Classical criteria attach a prediction error measure to each candidate model in a sequence and base the selection decision on the sign of the differences between these values. The PDC instead directly measures the prediction error divergence between two nested models. As examples, we consider linear regression models and (supervised) classification. We show that a selection procedure based on the PDC has a smaller probability of overfitting than one based on the Cp (in the linear case), and hence leads to more parsimonious models for the same out-of-sample prediction error. The PDC is particularly well suited to high-dimensional and sparse settings, and it remains so under (small) model misspecifications. Examples from a malnutrition study and from acute leukemia classification will be presented.
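For readers unfamiliar with the classical approach that the abstract contrasts with the PDC, the following is a minimal sketch of it in the linear case: Mallows' Cp is computed for each model in a nested sequence, and the decision is taken from the sign of successive differences. It does not implement the PDC, whose definition is not given in this abstract. The function names, the ordering of the columns, the noise-variance estimate from the full model, and the stopping rule at the first non-negative difference are all assumptions made for illustration.

```python
import numpy as np


def mallows_cp_path(X, y, sigma2=None):
    """Mallows' Cp for the nested models built from the first k columns of X.

    Hypothetical helper: uses Cp_k = RSS_k / sigma2 - n + 2k, k = 1..p.
    """
    n, p = X.shape
    if sigma2 is None:
        # Assumption: noise variance estimated from the full model (needs n > p).
        beta_full, *_ = np.linalg.lstsq(X, y, rcond=None)
        sigma2 = np.sum((y - X @ beta_full) ** 2) / (n - p)
    cps = []
    for k in range(1, p + 1):
        Xk = X[:, :k]
        beta, *_ = np.linalg.lstsq(Xk, y, rcond=None)
        rss = np.sum((y - Xk @ beta) ** 2)
        cps.append(rss / sigma2 - n + 2 * k)
    return np.array(cps)


def select_by_sign(cps):
    """Return the size of the first model whose successor does not lower Cp.

    Assumed stopping rule: walk the nested sequence and stop at the first
    non-negative difference Cp_{k+1} - Cp_k.
    """
    for k, diff in enumerate(np.diff(cps), start=1):
        if diff >= 0:
            return k
    return len(cps)


# Simulated data (illustrative only): two active predictors out of five.
rng = np.random.default_rng(0)
X_demo = rng.standard_normal((50, 5))
y_demo = X_demo[:, :2] @ np.array([3.0, -2.0]) + rng.standard_normal(50)

cps_demo = mallows_cp_path(X_demo, y_demo)
k_demo = select_by_sign(cps_demo)
print("Cp path:", np.round(cps_demo, 2))
print("selected model size:", k_demo)
```

Because each Cp value carries its own estimation noise, the sign of a difference between two of them can flip by chance. That is the setting in which the abstract's claim about overfitting probability is made.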