Neural Comp. NEW Faster Access
HOME HELP FEEDBACK SUBSCRIPTIONS ARCHIVE SEARCH TABLE OF CONTENTS
 QUICK SEARCH:   [advanced]


     


This Article
Right arrow Full Text
Right arrow Full Text (PDF)
Right arrow Alert me when this article is cited
Right arrow Alert me if a correction is posted
Services
Right arrow Similar articles in this journal
Right arrow Alert me to new issues of the journal
Right arrow Download to citation manager
Right arrow reprints & permissions
Citing Articles
Right arrow Citing Articles via Google Scholar
Google Scholar
Right arrow Articles by Sakai, Y.
Right arrow Articles by Fukai, T.
Right arrow Search for Related Content
PubMed
Right arrow Articles by Sakai, Y.
Right arrow Articles by Fukai, T.
(Neural Computation. 2007;20:227-251.)
© 2007 The MIT Press


Letter

The Actor-Critic Learning Is Behind the Matching Law: Matching Versus Optimal Behaviors

Yutaka Sakai

sakai{at}eng.tamagawa.ac.jp Department of Intelligent Information Systems, Tamagawa University, Machida, Tokyo 194-8610, Japan

Tomoki Fukai

tfukai{at}brain.riken.jp Laboratory for Neural Circuit Theory, Brain Science Institute, RIKEN, Wako, Saitama 351-0198, Japan

The ability to make a correct choice of behavior from various options is crucial for animals' survival. The neural basis for the choice of behavior has been attracting growing attention in research on biological and artificial neural systems. Alternative choice tasks with variable ratio (VR) and variable interval (VI) schedules of reinforcement have often been employed in studying decision making by animals and humans. In the VR schedule task, alternative choices are reinforced with different probabilities, and subjects learn to select the behavioral response rewarded more frequently. In the VI schedule task, alternative choices are reinforced at different average intervals independent of the choice frequencies, and the choice behavior follows the so-called matching law. The two policies appear robustly in subjects' choice of behavior, but the underlying neural mechanisms remain unknown. Here, we show that these seemingly different policies can appear from a common computational algorithm known as actor-critic learning. We present experimentally testable variations of the VI schedule in which the matching behavior gives only a suboptimal solution to decision making and show that the actor-critic system exhibits the matching behavior in the steady state of the learning even when the matching behavior is suboptimal. However, it is found that the matching behavior can earn approximately the same reward as the optimal one in many practical situations.







HOME HELP FEEDBACK SUBSCRIPTIONS ARCHIVE SEARCH TABLE OF CONTENTS
J COGNITIVE NEUROSCIENCE NEURAL COMPUTATION MIT PRESS JOURNALS
Copyright © 2007 by The MIT Press.