Mathematical Methods of Operations Research, Volume 54, Issue 3, pp 387–393

Dynamic productivity improvement in a model with multiple processes

  • Michael Brock
  • Jørgen Tind


We study a setting with a number of ongoing production processes, each yielding a state-dependent standard reward in discrete time. At each time step one may select at most one of these processes for improvement; the selected process then yields a state-dependent non-standard reward (or cost) at that time step and changes its state according to a Markov chain. We show that this model can be cast as a bandit problem with suitably constructed rewards, and we characterize the optimal policy. Finally, we present a numerical example.
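To make the model concrete, the following is a small hedged sketch (not the authors' algorithm or example): finite-horizon backward induction on a toy instance with two two-state processes. All rewards, costs, and transition probabilities here are invented for illustration; the bandit reformulation in the paper is not reproduced, only the underlying decision model.

```python
from itertools import product

# Toy data (invented): each of two processes has states {0, 1}.
R = [[1.0, 3.0], [0.5, 4.0]]          # standard per-period reward R[i][s]
C = [[-1.0, 2.0], [-0.5, 3.0]]        # non-standard reward when improving
P = [[[0.3, 0.7], [0.1, 0.9]],        # P[i][s][s']: transitions under improvement
     [[0.4, 0.6], [0.2, 0.8]]]

def solve(horizon):
    """Backward induction over the joint state (s0, s1).

    Action -1 means improve nothing (all processes earn their standard
    rewards and keep their states); action i means improve process i,
    which earns C[i][s] instead of R[i][s] and moves by P[i]."""
    states = list(product(range(2), repeat=2))
    V = {s: 0.0 for s in states}       # terminal value 0
    policy = {}
    for t in range(horizon):
        V_new, pol = {}, {}
        for s in states:
            # Option: improve nothing; state is unchanged.
            best = sum(R[i][s[i]] for i in range(2)) + V[s]
            best_a = -1
            for i in range(2):
                others = sum(R[j][s[j]] for j in range(2) if j != i)
                expect = sum(P[i][s[i]][ns] *
                             V[tuple(ns if j == i else s[j] for j in range(2))]
                             for ns in range(2))
                val = others + C[i][s[i]] + expect
                if val > best:
                    best, best_a = val, i
            V_new[s], pol[s] = best, best_a
        V, policy = V_new, pol
    return V, policy

V, policy = solve(horizon=10)
print(V[(0, 0)], policy[(0, 0)])
```

In this toy instance, improvement trades immediate reward for a chance of reaching the better state; the joint-state dynamic program is exponential in the number of processes, which is exactly what an index-based bandit reformulation avoids.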

Key words: dynamic programming, production, bandit models



Copyright information

© Springer-Verlag Berlin Heidelberg 2001

Authors and Affiliations

  1. Michael Brock, Maersk Data AS, Lyngbyvej 2, DK-2100 Copenhagen, Denmark
  2. Jørgen Tind, Department of Statistics and Operations Research, University of Copenhagen, Universitetsparken 5, DK-2100 Copenhagen, Denmark
