Volume 37, Issue 4, pp 263–293

Reinforcement Learning with Immediate Rewards and Linear Hypotheses



We consider the design and analysis of algorithms that learn from the consequences of their actions with the goal of maximizing their cumulative reward, when the consequence of a given action is felt immediately, and a linear function, which is unknown a priori, (approximately) relates a feature vector for each action/state pair to the (expected) associated reward. We focus on two cases, one in which a continuous-valued reward is (approximately) given by applying the unknown linear function, and another in which the probability of receiving the larger of binary-valued rewards is obtained. For these cases we provide bounds on the per-trial regret for our algorithms that go to zero as the number of trials approaches infinity. We also provide lower bounds that show that the rate of convergence is nearly optimal.
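To make the setting concrete, the following is a minimal simulation sketch. It is not the algorithm analyzed in the paper. It assumes a simple epsilon-greedy learner with a Widrow-Hoff (least-mean-squares) update; the function name `simulate` and the parameters `epsilon` and `lr` are hypothetical choices for illustration. It covers only the continuous-valued reward case, where the reward is the hidden linear function applied to the chosen action's feature vector, plus noise.

```python
import random


def simulate(trials=2000, n_actions=5, dim=3, epsilon=0.1, lr=0.1, seed=0):
    """Simulate immediate rewards generated by a hidden linear function.

    Returns the per-trial regret: the average gap between the expected
    reward of the best available action and that of the chosen action.
    All names and defaults here are illustrative assumptions.
    """
    rng = random.Random(seed)
    w_true = [rng.uniform(-1, 1) for _ in range(dim)]  # unknown to the learner
    w_hat = [0.0] * dim                                # learner's current estimate
    total_regret = 0.0

    def dot(u, v):
        return sum(a * b for a, b in zip(u, v))

    for _ in range(trials):
        # One feature vector per available action in the current state.
        feats = [[rng.uniform(-1, 1) for _ in range(dim)]
                 for _ in range(n_actions)]
        expected = [dot(w_true, x) for x in feats]

        # Explore with probability epsilon, otherwise act greedily
        # with respect to the current estimate.
        if rng.random() < epsilon:
            a = rng.randrange(n_actions)
        else:
            a = max(range(n_actions), key=lambda i: dot(w_hat, feats[i]))

        # The consequence is felt immediately: linear reward plus noise.
        reward = expected[a] + rng.gauss(0, 0.1)

        # Widrow-Hoff update using only the chosen action's features.
        err = reward - dot(w_hat, feats[a])
        w_hat = [w + lr * err * x for w, x in zip(w_hat, feats[a])]

        total_regret += max(expected) - expected[a]

    return total_regret / trials
```

Calling `simulate()` and comparing it with `simulate(epsilon=1.0)` (purely random action choice) shows the learner's per-trial regret falling well below the random baseline. With a constant `epsilon` the regret levels off rather than going to zero, so this sketch illustrates the quantity being bounded, not the convergence rates stated in the abstract.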

Computational learning theory; Reinforcement learning; Immediate rewards; Online learning; Online algorithms; Decision theory; Dialogue systems



Copyright information

© Springer-Verlag 2003

Authors and Affiliations

  1. IBM T. J. Watson Research Center, Yorktown Heights, NY 10598, USA
  2. Department of Computer Science, Duke University, P.O. Box 90129, Durham, NC 27708, USA
  3. Genome Institute of Singapore, 1 Science Park Road, #05-01, Singapore 117528, Republic of Singapore
