
Neural Computation, Vol 9, 1-42, Copyright © 1997 by The MIT Press


ARTICLES

Flat Minima

Sepp Hochreiter and Jürgen Schmidhuber

We present a new algorithm for finding low-complexity neural networks with high generalization capability. The algorithm searches for a 'flat' minimum of the error function. A flat minimum is a large connected region in weight space where the error remains approximately constant. An MDL-based, Bayesian argument suggests that flat minima correspond to 'simple' networks and low expected overfitting. The argument is based on a Gibbs algorithm variant and a novel way of splitting generalization error into underfitting and overfitting error. Unlike many previous approaches, ours does not require gaussian assumptions and does not depend on a 'good' weight prior. Instead we have a prior over input-output functions, thus taking into account net architecture and training set. Although our algorithm requires the computation of second-order derivatives, it has backpropagation's order of complexity. Automatically, it effectively prunes units, weights, and input lines. Various experiments with feedforward and recurrent nets are described. In an application to stock market prediction, flat minimum search outperforms conventional backprop, weight decay, and 'optimal brain surgeon/optimal brain damage.'
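
To make the notion of a flat minimum concrete, the following minimal sketch compares a sharp and a flat minimum on a toy one-dimensional error surface. It is not the flat minimum search algorithm of the article. The functions `loss` and `flat_halfwidth`, the tolerance `tol`, and the toy surface itself are assumptions introduced purely for illustration. The sketch measures how far a weight can move away from a minimum, staying within one connected interval, before the error rises by more than the tolerance.

```python
import numpy as np


def loss(w):
    """Hypothetical toy error surface (not from the article).

    It has a sharp minimum at w = -2 and a flat minimum at w = 2,
    both with error exactly 0.
    """
    sharp = 1.0 - np.exp(-((w + 2.0) ** 2) / (2 * 0.1 ** 2))
    flat = 1.0 - np.exp(-((w - 2.0) ** 2) / (2 * 1.0 ** 2))
    return sharp * flat


def flat_halfwidth(loss_fn, w0, tol=0.01, max_radius=3.0, steps=3001):
    """Half-width of the connected interval around w0 on which the
    error stays within `tol` of loss_fn(w0), scanned on a grid."""
    radii = np.linspace(0.0, max_radius, steps)
    base = loss_fn(w0)
    # A radius is acceptable only if both sides stay within tolerance.
    ok = (loss_fn(w0 + radii) - base <= tol) & (loss_fn(w0 - radii) - base <= tol)
    bad = np.flatnonzero(~ok)
    # Stop at the first violation so the region remains connected.
    return radii[bad[0] - 1] if bad.size else max_radius


sharp_w = flat_halfwidth(loss, -2.0)
flat_w = flat_halfwidth(loss, 2.0)
print(f"sharp minimum half-width: {sharp_w:.3f}")
print(f"flat minimum half-width:  {flat_w:.3f}")
```

Under these assumptions the two minima have the same error, yet the region of nearly constant error around w = 2 is much wider than the one around w = -2. In the abstract's terms, the wider region is the flatter minimum, whose weights can be specified with less precision.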


This article has been cited by other articles:


Z. Chen and S. Haykin
On Different Facets of Regularization Theory
Neural Comput., December 1, 2002; 14(12): 2791 - 2846.


J. L. Bernier, J. Ortega, E. Ros, I. Rojas, and A. Prieto
A Quantitative Study of Fault Tolerance, Noise Immunity, and Generalization Ability of MLPs
Neural Comput., December 1, 2000; 12(12): 2941 - 2964.