Review
worgott{at}cn.stir.ac.uk, Department of Psychology, University of Stirling, Stirling FK9 4LA, Scotland
B.Porr{at}elec.gla.ac.uk, Department of Psychology, University of Stirling, Stirling FK9 4LA, Scotland
In this review, we compare methods for temporal sequence learning (TSL) across the disciplines of machine control, classical conditioning, neuronal models for TSL, and spike-timing-dependent plasticity (STDP). This review introduces the most influential models and focuses on two questions: To what degree are reward-based (e.g., temporal difference, or TD, learning) and correlation-based (Hebbian) learning related? And how do the different models correspond to possibly underlying biological mechanisms of synaptic plasticity? We first compare the different models in an open-loop condition, where behavioral feedback does not alter the learning. Here we observe that reward-based and correlation-based learning are indeed very similar. Machine control is then used to introduce the problem of closed-loop control (e.g., actor-critic architectures). Here the problem of evaluative (rewards) versus nonevaluative (correlations) feedback from the environment is discussed, showing that the two learning approaches differ fundamentally in the closed-loop condition. In trying to answer the second question, we compare neuronal versions of the different learning architectures to the anatomy of the involved brain structures (basal ganglia, thalamus, and cortex) and the molecular biophysics of glutamatergic and dopaminergic synapses. Finally, we discuss the different algorithms used to model STDP and compare them to reward-based learning rules. Certain similarities are found in spite of the very different timescales. Here we focus on the biophysics of the different calcium-release mechanisms known to be involved in STDP.
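As a rough illustration of the open-loop similarity mentioned above, the following minimal sketch places a reward-based TD(0) weight update next to a correlation-based differential Hebbian update. It is not taken from the review: the function names, the linear value estimate, and the parameters `alpha`, `gamma`, and `mu` are all assumptions chosen for illustration. Both rules multiply an input by a temporal difference of the output; only the TD rule includes an explicit reward term.

```python
import numpy as np


def td0_update(w, x_t, x_next, reward, alpha=0.1, gamma=0.9):
    """One TD(0) step for an assumed linear value estimate v(x) = w . x.

    Returns the updated weights and the TD error.
    """
    # TD error: reward plus discounted next value, minus current value.
    delta = reward + gamma * np.dot(w, x_next) - np.dot(w, x_t)
    return w + alpha * delta * x_t, delta


def differential_hebb_update(w, x_t, v_t, v_prev, mu=0.1):
    """One differential Hebbian step (illustrative form).

    The weight change is the input multiplied by the change in output,
    with no reward signal involved.
    """
    return w + mu * x_t * (v_t - v_prev)


if __name__ == "__main__":
    w0 = np.zeros(2)
    x_t = np.array([1.0, 0.0])
    x_next = np.array([0.0, 1.0])

    w_td, delta = td0_update(w0, x_t, x_next, reward=1.0)
    w_hebb = differential_hebb_update(w0, x_t, v_t=1.0, v_prev=0.5)

    print("TD(0) weights:", w_td, "TD error:", delta)
    print("Differential Hebbian weights:", w_hebb)
```

In this sketch the only structural difference between the two updates is the reward term inside the TD error, which is one way to read the evaluative versus nonevaluative distinction drawn in the abstract.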