Neural Comp. Sign up for ETOCS
HOME HELP FEEDBACK SUBSCRIPTIONS ARCHIVE SEARCH TABLE OF CONTENTS
 QUICK SEARCH:   [advanced]


     


This Article
Right arrow Full Text
Right arrow Full Text (PDF)
Right arrow An erratum has been published
Right arrow Alert me when this article is cited
Right arrow Alert me if a correction is posted
Services
Right arrow Similar articles in this journal
Right arrow Alert me to new issues of the journal
Right arrow Download to citation manager
Right arrow reprints & permissions
Citing Articles
Right arrow Citing Articles via Google Scholar
Google Scholar
Right arrow Articles by Morimoto, J.
Right arrow Articles by Doya, K.
Right arrow Search for Related Content
PubMed
Right arrow Articles by Morimoto, J.
Right arrow Articles by Doya, K.
(Neural Computation. 2007;19:730-756.)
© 2007 The MIT Press


Letter

Reinforcement Learning State Estimator

Jun Morimoto

xmorimo{at}atr.jp JST, ICORP, Computational Brain Project, 4-1-8 Honcho, Kawaguchi, Saitama, 332-0012, Japan

Kenji Doya

doya{at}irp.oist.jp Initial Research Project, Okinawa Institute of Science and Technology, Uruma, Okinawa 904-2234, Japan

In this study, we propose a novel use of reinforcement learning for estimating hidden variables and parameters of nonlinear dynamical systems. A critical issue in hidden-state estimation is that we cannot directly observe estimation errors. However, by defining errors of observable variables as a delayed penalty, we can apply a reinforcement learning framework to state estimation problems. Specifically, we derive a method to construct a nonlinear state estimator by finding an appropriate feedback input gain using the policy gradient method. We tested the proposed method on single pendulum dynamics and show that the joint angle variable could be successfully estimated by observing only the angular velocity, and vice versa. In addition, we show that we could acquire a state estimator for the pendulum swing-up task in which a swing-up controller is also acquired by reinforcement learning simultaneously. Furthermore, we demonstrate that it is possible to estimate the dynamics of the pendulum itself while the hidden variables are estimated in the pendulum swing-up task. Application of the proposed method to a two-linked biped model is also presented.







HOME HELP FEEDBACK SUBSCRIPTIONS ARCHIVE SEARCH TABLE OF CONTENTS
J COGNITIVE NEUROSCIENCE NEURAL COMPUTATION MIT PRESS JOURNALS
Copyright © 2007 by The MIT Press.