Abstract
This chapter applies the “absolutely expedient” learning algorithms developed in Chapter 3 to the problem of controlling a finite-state Markov chain whose transition probabilities, as functions of a finite number of control actions, are unknown. At each instant of time a reward is received that depends on the current state of the Markov chain and the control action chosen. This reward is assumed to be a two-valued (binary) random variable whose distribution, as a function of the state and the control action, is unknown; the sequence of states actually visited by the Markov chain is, however, available. In other words, we consider a Markov chain whose dynamics and reward structure are unknown but whose state is exactly observable.
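The setting described in the abstract can be illustrated with a minimal sketch. It is not the chapter's algorithm: it assumes one learning automaton per state, each updated with the linear reward-inaction rule (a standard example of an absolutely expedient scheme) using only the immediate binary reward, which is a simplification. The names `lri_update` and `simulate`, the step size, and the transition and reward tables passed in are all hypothetical; the controller never reads the tables directly, it only observes the visited states and the rewards.

```python
import random


def lri_update(p, action, reward, step=0.05):
    """Linear reward-inaction update of one action-probability vector.

    On reward 1 the chosen action's probability moves toward 1 and the
    others shrink proportionally; on reward 0 nothing changes.
    """
    if reward == 1:
        return [
            pj + step * (1.0 - pj) if j == action else pj * (1.0 - step)
            for j, pj in enumerate(p)
        ]
    return list(p)


def simulate(trans, reward_prob, steps=5000, step=0.05, seed=0):
    """Run one automaton per state on a chain with hidden dynamics.

    trans[s][a] is the (hidden) next-state distribution and
    reward_prob[s][a] the (hidden) probability of a reward of 1.
    Both are illustrative inputs used only to generate observations.
    """
    rng = random.Random(seed)
    n_states = len(trans)
    n_actions = len(trans[0])
    # Start every state's automaton from the uniform action distribution.
    probs = [[1.0 / n_actions] * n_actions for _ in range(n_states)]
    state = 0
    for _ in range(steps):
        # Choose an action with the automaton attached to the observed state.
        action = rng.choices(range(n_actions), weights=probs[state])[0]
        # Observe a binary reward drawn from the unknown distribution.
        reward = 1 if rng.random() < reward_prob[state][action] else 0
        probs[state] = lri_update(probs[state], action, reward, step)
        # The chain moves on; only the new state is revealed.
        state = rng.choices(range(n_states), weights=trans[state][action])[0]
    return probs


if __name__ == "__main__":
    # Hypothetical two-state, two-action chain.
    trans = [[[0.9, 0.1], [0.2, 0.8]], [[0.5, 0.5], [0.1, 0.9]]]
    reward_prob = [[0.8, 0.3], [0.2, 0.7]]
    for s, p in enumerate(simulate(trans, reward_prob)):
        print(s, [round(x, 3) for x in p])
```

Because each automaton here reacts only to the immediate reward, the sketch ignores the effect of an action on future states; it is meant to show the information available to the controller, not a policy that is optimal for the chain as a whole.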
Copyright information
© 1981 Springer-Verlag New York Inc.
About this chapter
Cite this chapter
Lakshmivarahan, S. (1981). Control of a Markov Chain with Unknown Dynamics and Cost Structure. In: Learning Algorithms Theory and Applications. Springer, New York, NY. https://doi.org/10.1007/978-1-4612-5975-6_8
DOI: https://doi.org/10.1007/978-1-4612-5975-6_8
Publisher Name: Springer, New York, NY
Print ISBN: 978-0-387-90640-9
Online ISBN: 978-1-4612-5975-6
eBook Packages: Springer Book Archive