(Neural Computation. 2001;13:2409-2463.)
© 2001 The MIT Press

Predictability, Complexity, and Learning

William Bialek

NEC Research Institute, Princeton, NJ 08540, U.S.A.

Ilya Nemenman

NEC Research Institute, Princeton, NJ 08540, U.S.A., and Department of Physics, Princeton University, Princeton, NJ 08544, U.S.A.

Naftali Tishby

NEC Research Institute, Princeton, NJ 08540, U.S.A., and School of Computer Science and Engineering and Center for Neural Computation, Hebrew University, Jerusalem 91904, Israel

We define predictive information I_pred(T) as the mutual information between the past and the future of a time series. Three qualitatively different behaviors are found in the limit of large observation times T: I_pred(T) can remain finite, grow logarithmically, or grow as a fractional power law. If the time series allows us to learn a model with a finite number of parameters, then I_pred(T) grows logarithmically with a coefficient that counts the dimensionality of the model space. In contrast, power-law growth is associated, for example, with the learning of infinite-parameter (or nonparametric) models such as continuous functions with smoothness constraints. There are connections between the predictive information and measures of complexity that have been defined both in learning theory and in the analysis of physical systems through statistical mechanics and dynamical systems theory. Furthermore, in the same way that entropy provides the unique measure of available information consistent with some simple and plausible conditions, we argue that the divergent part of I_pred(T) provides the unique measure for the complexity of dynamics underlying a time series. Finally, we discuss how these ideas may be useful in problems in physics, statistics, and biology.
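
As an illustration of the first of the three behaviors (predictive information that remains finite), the following is a minimal sketch and not the authors' code. It assumes a stationary binary symmetric Markov chain with a hypothetical flip probability `p_flip`, and it uses the standard identity I(X;Y) = H(X) + H(Y) - H(X,Y) to obtain the mutual information between a past window and a future window from exact block entropies. The function names are chosen here for illustration only.

```python
import itertools
import math


def block_entropy(n, p_flip):
    """Exact entropy, in bits, of length-n words emitted by a stationary
    binary symmetric Markov chain (illustrative toy process)."""
    total = 0.0
    for word in itertools.product((0, 1), repeat=n):
        prob = 0.5  # the stationary distribution of a symmetric chain is uniform
        for a, b in zip(word, word[1:]):
            prob *= p_flip if a != b else 1.0 - p_flip
        if prob > 0.0:
            total -= prob * math.log2(prob)
    return total


def predictive_information(t_past, t_future, p_flip):
    """Mutual information between adjacent past and future windows,
    computed as H(past) + H(future) - H(past, future)."""
    return (
        block_entropy(t_past, p_flip)
        + block_entropy(t_future, p_flip)
        - block_entropy(t_past + t_future, p_flip)
    )


def binary_entropy(p):
    """Entropy, in bits, of a biased coin with 0 < p < 1."""
    return -p * math.log2(p) - (1.0 - p) * math.log2(1.0 - p)


if __name__ == "__main__":
    p = 0.1  # assumed flip probability, for illustration only
    for t in range(1, 6):
        print(t, predictive_information(t, t, p))
    print("1 - h(p) =", 1.0 - binary_entropy(p))
```

For this toy chain the value does not grow with the window length: with `p_flip = 0.1` it stays at 1 - h(0.1), about 0.53 bits, because a Markov chain carries information about its future only through its most recent symbol. The logarithmic and power-law regimes described in the abstract arise for processes whose parameters must be learned from the data, which this sketch does not attempt to model.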
