Machine Learning, Volume 33, Issue 2–3, pp 235–262

Elevator Group Control Using Multiple Reinforcement Learning Agents

  • Robert H. Crites
  • Andrew G. Barto


Recent algorithmic and theoretical advances in reinforcement learning (RL) have attracted widespread interest. RL algorithms have appeared that approximate dynamic programming on an incremental basis. They can be trained using real or simulated experiences, focusing their computation on the areas of state space that are actually visited during control, which makes them computationally tractable on very large problems. If each member of a team of agents employs one of these algorithms, a new collective learning algorithm emerges for the team as a whole. In this paper we demonstrate that such collective RL algorithms can be powerful heuristic methods for addressing large-scale control problems.

Elevator group control serves as our testbed. It is a difficult domain posing a combination of challenges not seen in most multi-agent learning research to date. We use a team of RL agents, each of which is responsible for controlling one elevator car. The team receives a global reward signal that appears noisy to each agent due to the effects of the other agents' actions, the random nature of the arrivals, and the incomplete observation of the state. In spite of these complications, we show results that, in simulation, surpass the best heuristic elevator control algorithms of which we are aware. These results demonstrate the power of multi-agent RL on a very large-scale stochastic dynamic optimization problem of practical utility.
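The architecture described above, in which each car's agent runs its own incremental RL update while all agents receive the same global reward signal, can be sketched as follows. This is a minimal illustration, not the paper's actual formulation: the state encoding, the action set, the tabular representation, and all hyperparameter values here are simplifying assumptions.

```python
import random
from collections import defaultdict


class QAgent:
    """One independent Q-learning agent, e.g. one per elevator car."""

    def __init__(self, actions, alpha=0.1, gamma=0.99, epsilon=0.1):
        self.q = defaultdict(float)  # (state, action) -> estimated value
        self.actions = actions
        self.alpha, self.gamma, self.epsilon = alpha, gamma, epsilon

    def act(self, state):
        """Epsilon-greedy action selection."""
        if random.random() < self.epsilon:
            return random.choice(self.actions)  # explore
        return max(self.actions, key=lambda a: self.q[(state, a)])

    def update(self, state, action, reward, next_state):
        """One-step Q-learning backup. The shared global reward looks
        noisy to this agent because it also reflects the other cars'
        actions and the random passenger arrivals."""
        best_next = max(self.q[(next_state, a)] for a in self.actions)
        td_target = reward + self.gamma * best_next
        self.q[(state, action)] += self.alpha * (td_target - self.q[(state, action)])


# A team of four agents; each learns independently, but in a training loop
# every agent's update() would be driven by the same global reward signal.
agents = [QAgent(actions=["stop", "continue"]) for _ in range(4)]
```

Because each agent treats the other agents as part of its environment, no explicit coordination mechanism is needed; cooperation emerges from the shared reward.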

Keywords: reinforcement learning, multiple agents, teams, elevator group control, discrete event dynamic systems



Copyright information

© Kluwer Academic Publishers 1998

Authors and Affiliations

  • Robert H. Crites (Unica Technologies, Inc., Lincoln)
  • Andrew G. Barto (Department of Computer Science, University of Massachusetts, Amherst)
