Asymptotics of K-Fold Cross Validation
Abstract
This paper investigates the asymptotic distribution of the K-fold cross validation error in an i.i.d. setting. As the number of observations n goes to infinity while keeping the number of folds K fixed, the K-fold cross validation error is √n-consistent for the expected out-of-sample error and has an asymptotically normal distribution. A consistent estimate of the asymptotic variance is derived and used to construct asymptotically valid confidence intervals for the expected out-of-sample error. A hypothesis test is developed for comparing two estimators' expected out-of-sample errors, and a subsampling procedure is used to obtain critical values. Monte Carlo simulations demonstrate the asymptotic validity of our confidence intervals for the expected out-of-sample error and investigate the size and power properties of our test. In our empirical application, we use our estimator selection test to compare the out-of-sample predictive performance of OLS, Neural Networks, and Random Forests for predicting the sale price of a domain name in a GoDaddy expiry auction.
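To fix ideas, here is a rough sketch of the quantities the abstract refers to; the notation below is illustrative and not necessarily the paper's. Partition the n observations into folds B_1, …, B_K, fit the estimator on all folds except B_k, and average the resulting out-of-fold losses; a consistent variance estimate then gives a normal-approximation confidence interval:

\[
\widehat{\mathrm{CV}}_{n,K} \;=\; \frac{1}{n}\sum_{k=1}^{K}\sum_{i \in B_k} L\!\big(y_i,\,\hat f^{(-k)}(x_i)\big),
\qquad
\widehat{\mathrm{CV}}_{n,K} \;\pm\; z_{1-\alpha/2}\,\frac{\hat\sigma_n}{\sqrt{n}},
\]

where L is the loss function, \(\hat f^{(-k)}\) is the estimator fit without fold B_k, z_{1-α/2} is a standard normal quantile, and \(\hat\sigma_n^2\) is a consistent estimate of the asymptotic variance of \(\sqrt{n}\big(\widehat{\mathrm{CV}}_{n,K} - \text{expected out-of-sample error}\big)\).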