(Neural Computation. 2005;17:1508-1530.)
© 2005 The MIT Press


Letter

Maximum Likelihood Set for Estimating a Probability Mass Function

Bruno M. Jedynak

bruno.jedynak{at}jhu.edu, Département de Mathématiques, Université des Sciences et Technologies de Lille, France, and Department of Applied Mathematics, and Center for Imaging Science, Johns Hopkins University, Baltimore, MD 21218, U.S.A.

Sanjeev Khudanpur

khudanpur{at}jhu.edu, Department of Electrical and Computer Engineering, Johns Hopkins University, Baltimore, MD 21218, U.S.A.

We propose a new method for estimating the probability mass function (pmf) of a discrete and finite random variable from a small sample. We focus on the observed counts (the number of times each value appears in the sample) and define the maximum likelihood set (MLS) as the set of pmfs that put more mass on the observed counts than on any other set of counts possible for the same sample size. We characterize the MLS in detail in this article. We show that the MLS is a diamond-shaped subset of the probability simplex [0,1]^k bounded by at most k × (k − 1) hyperplanes, where k is the number of possible values of the random variable. The MLS always contains the empirical distribution, as well as a family of Bayesian estimators based on a Dirichlet prior, particularly the well-known Laplace estimator. We propose to select from the MLS the pmf that is closest to a fixed pmf that encodes prior knowledge. When Kullback-Leibler distance is used for this selection, the optimization problem amounts to minimizing a convex function over a domain defined by linear inequalities, for which standard numerical procedures are available. We apply this estimate to language modeling, using Zipf's law to encode prior knowledge, and show that the method obtains state-of-the-art results while being conceptually simpler than most competing methods.
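
The definition of the MLS given above can be checked directly for very small problems. The following is a minimal sketch, not the authors' implementation: it assumes the sample follows a multinomial model, enumerates every count vector with the same sample size, and tests whether the observed counts are at least as probable as all the others (ties are accepted, which is an assumption about how the boundary is treated). The function names `log_multinomial_pmf`, `count_vectors`, and `in_mls` are hypothetical, and the brute-force enumeration is only practical for small k and small samples.

```python
from math import lgamma, log, inf


def log_multinomial_pmf(counts, p):
    """Log-probability of a count vector under a multinomial with pmf p."""
    n = sum(counts)
    total = lgamma(n + 1)  # log(n!)
    for c, q in zip(counts, p):
        if c > 0 and q == 0.0:
            return -inf  # an observed value with zero mass is impossible
        total -= lgamma(c + 1)  # minus log(c!)
        if c > 0:
            total += c * log(q)
    return total


def count_vectors(n, k):
    """Yield every length-k tuple of non-negative integers summing to n."""
    if k == 1:
        yield (n,)
        return
    for first in range(n + 1):
        for rest in count_vectors(n - first, k - 1):
            yield (first,) + rest


def in_mls(p, observed, tol=1e-12):
    """Brute-force test of the MLS definition (assumption: ties count as members)."""
    n, k = sum(observed), len(observed)
    target = log_multinomial_pmf(observed, p)
    return all(
        log_multinomial_pmf(m, p) <= target + tol for m in count_vectors(n, k)
    )


observed = (3, 1, 0)  # illustrative counts for k = 3 values, sample size 4
n, k = sum(observed), len(observed)
empirical = tuple(c / n for c in observed)
laplace = tuple((c + 1) / (n + k) for c in observed)
uniform = tuple(1 / k for _ in observed)

print(in_mls(empirical, observed))
print(in_mls(laplace, observed))
print(in_mls(uniform, observed))
```

For these illustrative counts, the empirical distribution and the Laplace estimator pass the test, consistent with the statement that the MLS always contains them, while the uniform pmf does not, because it makes the count vector (2, 1, 1) more probable than the observed one.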






