Machine Learning, Volume 49, Issue 2–3, pp 141–160

Building a Basic Block Instruction Scheduler with Reinforcement Learning and Rollouts

  • Amy McGovern
  • Eliot Moss
  • Andrew G. Barto


The execution order of a block of computer instructions on a pipelined machine can make a difference in running time by a factor of two or more. Compilers use heuristic schedulers appropriate to each specific architecture implementation to achieve the best possible program speed. However, these heuristic schedulers are time-consuming and expensive to build. We present empirical results using both rollouts and reinforcement learning to construct heuristics for scheduling basic blocks. In simulation, the rollout scheduler outperformed a commercial scheduler on all benchmarks tested, and the reinforcement learning scheduler outperformed the commercial scheduler on several benchmarks and performed well on the others. The combined reinforcement learning and rollout approach was also very successful. We present results of running the schedules on Compaq Alpha machines and show that the results from the simulator correspond well to the actual run-time results.
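To make the rollout idea concrete, the following is a minimal sketch and not the authors' scheduler. It assumes a toy single-issue pipeline cost model in place of the simulator mentioned above, and every name in it (`ready`, `cost`, `random_completion`, `rollout_schedule`, the example instructions and latencies) is hypothetical. At each step it tries every instruction whose dependences are satisfied, completes the schedule at random several times, and commits to the candidate with the lowest average cost.

```python
import random

def ready(deps, scheduled, remaining):
    """Instructions in `remaining` whose predecessors are all scheduled."""
    return [i for i in remaining if deps[i] <= scheduled]

def cost(order, deps, latency):
    """Toy cost model (an assumption, not the paper's simulator): one
    instruction issues per cycle, and an instruction stalls until the
    results of all its predecessors are available."""
    finish = {}
    cycle = 0
    for i in order:
        start = max([cycle] + [finish[p] for p in deps[i]])
        finish[i] = start + latency[i]
        cycle = start + 1
    return max(finish.values())

def random_completion(prefix, deps, rng):
    """Extend a partial schedule to a full one with random legal choices."""
    order = list(prefix)
    scheduled = set(order)
    remaining = set(deps) - scheduled
    while remaining:
        choice = rng.choice(sorted(ready(deps, scheduled, remaining)))
        order.append(choice)
        scheduled.add(choice)
        remaining.remove(choice)
    return order

def rollout_schedule(deps, latency, n_rollouts=20, seed=0):
    """Pick each next instruction by averaging the cost of random rollouts."""
    rng = random.Random(seed)
    order = []
    while len(order) < len(deps):
        scheduled = set(order)
        candidates = sorted(ready(deps, scheduled, set(deps) - scheduled))

        def average_cost(candidate):
            total = 0
            for _ in range(n_rollouts):
                full = random_completion(order + [candidate], deps, rng)
                total += cost(full, deps, latency)
            return total / n_rollouts

        order.append(min(candidates, key=average_cost))
    return order

# Hypothetical basic block: each instruction maps to the set it depends on.
deps = {
    "load_a": set(),
    "load_b": set(),
    "add": {"load_a", "load_b"},
    "mul": set(),
    "store": {"add"},
}
latency = {"load_a": 3, "load_b": 3, "add": 1, "mul": 1, "store": 1}

schedule = rollout_schedule(deps, latency)
print(schedule, cost(schedule, deps, latency))
```

Under this toy model, the order `load_a, load_b, add, mul, store` takes 7 cycles, while moving `mul` ahead of `add` hides part of the load latency and takes 6, which is the kind of difference a scheduler is trying to find.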

Keywords: reinforcement learning, instruction scheduling, rollouts



Copyright information

© Kluwer Academic Publishers 2002

Authors and Affiliations

  • Amy McGovern
  • Eliot Moss
  • Andrew G. Barto

Department of Computer Science, University of Massachusetts, Amherst, USA
