Discrete Event Dynamic Systems, Volume 14, Issue 3, pp 309–341

Parallel Rollout for Online Solution of Partially Observable Markov Decision Processes

  • Hyeong Soo Chang
  • Robert Givan
  • Edwin K. P. Chong


We propose a novel approach, called parallel rollout, to solving (partially observable) Markov decision processes. Our approach generalizes the rollout algorithm of Bertsekas and Castanon (1999) by rolling out a set of multiple heuristic policies rather than a single policy. In particular, parallel rollout targets the class of problems in which multiple heuristic policies are available and each policy performs near-optimally on a different set of system paths. Parallel rollout automatically combines the given policies to create a new policy that adapts to the different system paths and improves on the performance of each policy in the set. We formally prove this claim for two criteria: total expected reward and infinite-horizon discounted reward. The parallel rollout approach also resolves the key issue of selecting which policy to roll out among multiple heuristic policies whose performance cannot be predicted in advance. We present two example problems to illustrate the effectiveness of the parallel rollout approach: a buffer management problem and a multiclass scheduling problem.
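The idea of rolling out several heuristic policies at once can be illustrated with a small sketch. It is not the authors' implementation: it assumes a fully observable toy problem, a one-step lookahead in which each candidate action is scored by its immediate reward plus the discounted value of the best heuristic from the resulting state, and Monte Carlo estimates of those values. The names `parallel_rollout_action`, `policy_value`, `rollout_return`, and the corridor example are all hypothetical and chosen only for illustration.

```python
import random
from typing import Callable, Sequence, Tuple

State = int
Action = int
Policy = Callable[[State], Action]
# Hypothetical simulator interface: (state, action, rng) -> (next_state, reward)
Step = Callable[[State, Action, random.Random], Tuple[State, float]]


def rollout_return(state: State, policy: Policy, step: Step,
                   horizon: int, gamma: float, rng: random.Random) -> float:
    """Discounted return of following `policy` for `horizon` steps on one sampled path."""
    total, discount = 0.0, 1.0
    for _ in range(horizon):
        state, reward = step(state, policy(state), rng)
        total += discount * reward
        discount *= gamma
    return total


def policy_value(state: State, policy: Policy, step: Step, horizon: int,
                 gamma: float, rng: random.Random, n_paths: int) -> float:
    """Monte Carlo estimate of a policy's value: average over `n_paths` sampled paths."""
    returns = (rollout_return(state, policy, step, horizon, gamma, rng)
               for _ in range(n_paths))
    return sum(returns) / n_paths


def parallel_rollout_action(state: State, actions: Sequence[Action],
                            policies: Sequence[Policy], step: Step,
                            horizon: int, gamma: float = 0.9,
                            n_next: int = 1, n_paths: int = 1,
                            seed: int = 0) -> Action:
    """Pick the action whose sampled successors look best under the best heuristic.

    Assumed scoring rule (a sketch, not taken from the paper):
    Q(s, a) ~ mean over sampled next states of
              reward + gamma * max over policies of estimated value at next state.
    """
    rng = random.Random(seed)
    best_action, best_q = actions[0], float("-inf")
    for action in actions:
        q = 0.0
        for _ in range(n_next):
            next_state, reward = step(state, action, rng)
            best_value = max(
                policy_value(next_state, p, step, horizon - 1, gamma, rng, n_paths)
                for p in policies
            )
            q += reward + gamma * best_value
        q /= n_next
        if q > best_q:
            best_action, best_q = action, q
    return best_action


# Toy deterministic corridor (illustrative only): states 0..10, both ends absorbing,
# reward 1 on first arrival at either end.
def corridor_step(state: State, action: Action, rng: random.Random) -> Tuple[State, float]:
    if state in (0, 10):
        return state, 0.0
    nxt = state + action
    return nxt, 1.0 if nxt in (0, 10) else 0.0


def always_left(state: State) -> Action:
    return -1


def always_right(state: State) -> Action:
    return 1
```

In this toy corridor each heuristic is good on one half of the state space only. Starting from state 7, rolling out `always_left` alone selects the move to the left, whereas rolling out both heuristics together selects the move to the right, which reaches the nearer end. With a stochastic simulator, `n_next` and `n_paths` would need to be larger than 1 for the averages to be meaningful.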

partially observable Markov decision process, rollout, simulation, multiclass scheduling, buffer management




  1. Anderson, A. T., Jensen, A., and Nielsen, B. F. 1995. Modelling and performance study of packet-traffic with self-similar characteristics over several timescales with Markovian arrival processes (MAP). Proc. 12th Nordic Teletraffic Seminar, 269–283.
  2. Anderson, A. T., and Nielsen, B. F. 1997. An application of superpositions of two state Markovian sources to the modelling of self-similar behaviour. Proc. IEEE INFOCOM, 196–204.
  3. Asmussen, S., Nerman, O., and Olsson, M. 1996. Fitting phase-type distributions via the EM algorithm. Scand. J. Statist. 23: 419–441.
  4. Bertsekas, D. P. 1995. Dynamic Programming and Optimal Control. Athena Scientific.
  5. Bertsekas, D. P. 1997. Differential training of rollout policies. Proc. 35th Allerton Conference on Communication, Control, and Computing, Allerton Park, IL.
  6. Bertsekas, D. P., and Castanon, D. A. 1999. Rollout algorithms for stochastic scheduling problems. J. of Heuristics 5: 89–108.
  7. Bertsekas, D. P., and Tsitsiklis, J. 1996. Neuro-Dynamic Programming. Nashua, NH: Athena Scientific.
  8. Blondia, C. 1993. A discrete-time batch Markovian arrival process as B-ISDN traffic model. Belgian J. of Operations Research, Statistics and Computer Science 32.
  9. Bonald, T., May, M., and Bolot, M. 2000. Analytic evaluation of RED performance. Proc. IEEE INFOCOM, 1415–1424.
  10. Chang, H. S. 2001. On-line sampling-based control for network queueing problems. Ph.D. thesis, Department of Electrical and Computer Engineering, Purdue University, West Lafayette, IN.
  11. Chang, H. S., Givan, R., and Chong, E. K. P. 2000. On-line scheduling via sampling. Proc. 5th Int. Conf. on Artificial Intelligence Planning and Scheduling, 62–71.
  12. Chen, D. T., and Rieders, M. 1996. Cyclic Markov modulated Poisson processes in traffic characterization. Stochastic Models 12(4): 585–610.
  13. Duffield, N. G., and Whitt, W. 1998. A source traffic model and its transient analysis for network control. Stochastic Models 14: 51–78.
  14. Firoiu, V., and Borden, M. 2000. A study of active queue management for congestion control. Proc. IEEE INFOCOM, 1435–1445.
  15. Fischer, W., and Meier-Hellstern, K. 1992. The Markov-modulated Poisson process (MMPP) cookbook. Performance Evaluation 18: 149–171.
  16. Floyd, S., and Jacobson, V. 1993. Random early detection gateways for congestion avoidance. IEEE/ACM Trans. Net. 1(4): 397–413.
  17. Givan, R., Chong, E. K. P., and Chang, H. S. 2002. Scheduling multiclass packet streams to minimize weighted loss. Queueing Systems 41(3): 241–270.
  18. Hashem, E. 1989. Analysis of random drop for gateway congestion control. Tech. Rep. LCS/TR-465, Massachusetts Institute of Technology.
  19. Ho, Y. C., and Cao, X. R. 1991. Perturbation Analysis of Discrete Event Dynamic Systems. Norwell, Massachusetts: Kluwer Academic Publishers.
  20. Kearns, M., Mansour, Y., and Ng, A. Y. 1999. A sparse sampling algorithm for near-optimal planning in large Markov decision processes. Machine Learning 49: 193–208.
  21. Kearns, M., Mansour, Y., and Ng, A. Y. 2000. Approximate planning in large POMDPs via reusable trajectories. Advances in Neural Information Processing Systems 12, S. A. Solla, T. K. Leen, and K.-R. Müller (eds), Cambridge, MA: MIT Press.
  22. Kitaev, M. Y., and Rykov, V. V. 1995. Controlled Queueing Systems. CRC Press.
  23. Kulkarni, V. G., and Tedijanto, T. E. 1998. Optimal admission control of Markov-modulated batch arrivals to a finite-capacity buffer. Stochastic Models 14(1): 95–122.
  24. Mayne, D. Q., and Michalska, H. 1990. Receding horizon control of nonlinear system. IEEE Trans. Auto. Contr. 38(7): 814–824.
  25. Misra, V. M., and Gong, W. B. 1998. A hierarchical model for teletraffic. Proc. IEEE CDC 2: 1674–1679.
  26. Neuts, M. F. 1979. A versatile Markovian point process. J. Appl. Prob. 16: 764–779.
  27. Peha, J. M., and Tobagi, F. A. 1990. Evaluating scheduling algorithms for traffic with heterogeneous performance objectives. Proc. IEEE GLOBECOM, 21–27.
  28. Puterman, M. L. 1994. Markov Decision Processes: Discrete Stochastic Dynamic Programming. New York: Wiley.
  29. Romanov, A., and Floyd, S. 1995. Dynamics of TCP traffic over ATM network. IEEE J. of Select. Areas Commun. 13(4): 633–641.
  30. Ross, S. M. 1997. Simulation. San Diego, CA: Academic Press.
  31. Sen, P., Maglaris, B., Rikli, N., and Anastassiou, D. 1989. Models for packet switching of variable-bit-rate video sources. IEEE J. of Select. Areas Commun. 7: 865–869.
  32. Turin, J. W. 1996. Fitting stochastic automata via the EM algorithm. Stochastic Models 12: 405–424.

Copyright information

© Kluwer Academic Publishers 2004

Authors and Affiliations

  • Hyeong Soo Chang (1)
  • Robert Givan (2)
  • Edwin K. P. Chong (3)
  1. Department of Computer Science and Engineering, Sogang University, Seoul, Korea
  2. School of Electrical and Computer Engineering, Purdue University, West Lafayette, USA
  3. Department of Electrical and Computer Engineering, Colorado State University, Fort Collins, USA
