Neural Comp. NEW Faster Access
HOME HELP FEEDBACK SUBSCRIPTIONS ARCHIVE SEARCH TABLE OF CONTENTS
 QUICK SEARCH:   [advanced]


     


This Article
Right arrow Full Text
Right arrow Full Text (PDF)
Right arrow Alert me when this article is cited
Right arrow Alert me if a correction is posted
Services
Right arrow Similar articles in this journal
Right arrow Alert me to new issues of the journal
Right arrow Download to citation manager
Right arrow reprints & permissions
Citing Articles
Right arrow Citing Articles via Google Scholar
Google Scholar
Right arrow Articles by Fujita, H.
Right arrow Articles by Ishii, S.
Right arrow Search for Related Content
PubMed
Right arrow Articles by Fujita, H.
Right arrow Articles by Ishii, S.
(Neural Computation. 2007;19:3051-3087.)
© 2007 The MIT Press


Letter

Model-Based Reinforcement Learning for Partially Observable Games with Sampling-Based State Estimation

Hajime Fujita

hajime-f{at}is.naist.jp Nara Institute of Science and Technology, Graduate School of Information Science, Ikoma, Nara 630-0192, Japan

Shin Ishii

ishii{at}is.naist.jp Nara Institute of Science and Technology, Graduate School of Information Science, Ikoma, Nara 630-0192, Japan

Games constitute a challenging domain of reinforcement learning (RL) for acquiring strategies because many of them include multiple players and many unobservable variables in a large state space. The difficulty of solving such realistic multiagent problems with partial observability arises mainly from the fact that the computational cost for the estimation and prediction in the whole state space, including unobservable variables, is too heavy. To overcome this intractability and enable an agent to learn in an unknown environment, an effective approximation method is required with explicit learning of the environmental model. We present a model-based RL scheme for large-scale multiagent problems with partial observability and apply it to a card game, hearts. This game is a well-defined example of an imperfect information game and can be approximately formulated as a partially observable Markov decision process (POMDP) for a single learning agent. To reduce the computational cost, we use a sampling technique in which the heavy integration required for the estimation and prediction can be approximated by a plausible number of samples. Computer simulation results show that our method is effective in solving such a difficult, partially observable multiagent problem.







HOME HELP FEEDBACK SUBSCRIPTIONS ARCHIVE SEARCH TABLE OF CONTENTS
J COGNITIVE NEUROSCIENCE NEURAL COMPUTATION MIT PRESS JOURNALS
Copyright © 2007 by The MIT Press.