Neural Computation, Vol 9, 1143-1161, Copyright © 1997 by The MIT Press
LETTERS
Michael Kearns
We give a theoretical and experimental analysis of the generalization error
of cross validation using two natural measures of the problem under
consideration. The approximation rate measures the
accuracy to which the target function can be ideally approximated as a
function of the number of parameters, and thus captures the complexity of
the target function with respect to the hypothesis model. The
estimation rate measures the deviation between the
training and generalization errors as a function of the number of
parameters, and thus captures the extent to which the hypothesis model
suffers from overfitting. Using these two measures, we give a rigorous and
general bound on the error of the simplest form of cross validation. The
bound clearly shows the dangers of making γ -- the fraction of
data saved for testing -- too large or too small. By optimizing the bound
with respect to γ, we then argue that the following qualitative
properties of cross-validation behavior should be quite robust to
significant changes in the underlying model selection problem:
- When the target function complexity is small compared to the sample
size, the performance of cross validation is relatively insensitive to
the choice of γ.
- The importance of choosing γ optimally increases, and the optimal value
for γ decreases, as the target function becomes more complex relative
to the sample size.
- There is nevertheless a single fixed value for γ that works nearly
optimally for a wide range of target function complexity.
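
As an illustration only, the simplest form of cross validation described above (hold out a fraction of the data for testing, train on the rest, and pick the model with the lowest test error) might be sketched as follows. The polynomial-regression setting, the squared-error loss, the function name holdout_select, and the parameter name gamma for the held-out fraction are all assumptions made for this sketch and are not taken from the paper.

```python
import numpy as np

def holdout_select(x, y, gamma, max_degree, rng):
    """Choose a polynomial degree by simple hold-out cross validation.

    gamma is the fraction of the sample saved for testing (hypothetical
    parameter name); the degree plays the role of the number of parameters.
    """
    m = len(x)
    n_test = int(round(gamma * m))
    if not 0 < n_test < m:
        raise ValueError("gamma must leave both training and test points")

    # Random split: the first n_test shuffled indices are held out.
    perm = rng.permutation(m)
    test, train = perm[:n_test], perm[n_test:]

    best_degree, best_err = None, np.inf
    for degree in range(max_degree + 1):
        # Fit on the training portion only.
        coeffs = np.polyfit(x[train], y[train], degree)
        # Estimate generalization error on the held-out portion.
        err = np.mean((np.polyval(coeffs, x[test]) - y[test]) ** 2)
        if err < best_err:
            best_degree, best_err = degree, err
    return best_degree, best_err

# Synthetic data (an assumption for illustration): a noisy smooth target.
rng = np.random.default_rng(0)
x = rng.uniform(-1.0, 1.0, 200)
y = np.sin(3.0 * x) + 0.1 * rng.normal(size=200)

# Compare a few choices of the held-out fraction.
for gamma in (0.1, 0.3, 0.5):
    degree, err = holdout_select(x, y, gamma, 8, rng)
    print(f"gamma={gamma}: degree={degree}, held-out error={err:.4f}")
```

With a small gamma the held-out error estimate is noisy, and with a large gamma little data is left for training; this is the tradeoff the bound is said to quantify. The sketch makes no claim about which value is best.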