MLP-Aware Runahead Threads in a Simultaneous Multithreading Processor

  • Kenzo Van Craeynest
  • Stijn Eyerman
  • Lieven Eeckhout
Part of the Lecture Notes in Computer Science book series (LNCS, volume 5409)

Abstract

Threads experiencing long-latency loads on a simultaneous multithreading (SMT) processor may clog shared processor resources without making forward progress, thereby starving other threads and reducing overall system throughput. An elegant solution to the long-latency load problem in SMT processors is to employ runahead execution. Runahead threads do not block commit on a long-latency load but instead execute subsequent instructions in a speculative execution mode to expose memory-level parallelism (MLP) through prefetching. The key benefit of runahead SMT threads is twofold: (i) runahead threads do not clog resources on a long-latency load, and (ii) runahead threads exploit far-distance MLP.

This paper proposes MLP-aware runahead threads: runahead execution is only initiated in case there is far-distance MLP to be exploited. By doing so, useless runahead executions are eliminated, thereby reducing the number of speculatively executed instructions (and thus energy consumption) while preserving the performance of the runahead thread and potentially improving the performance of the co-executing thread(s). Our experimental results show that MLP-aware runahead threads reduce the number of speculatively executed instructions by 13.9% and 10.1% for two-program and four-program workloads, respectively, compared to MLP-agnostic runahead threads while achieving comparable system throughput and job turnaround time.
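
The decision described above can be illustrated with a minimal sketch. The abstract does not specify the mechanism, so everything below is an assumption made for illustration: the per-load predictor table, the last-value update rule, the `ROB_SIZE` threshold, and the reading of "far-distance MLP" as an independent long-latency load lying further ahead than the reorder buffer can reach. None of the names come from the paper.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional

ROB_SIZE = 64  # hypothetical reorder buffer capacity, in instructions


@dataclass
class MlpDistancePredictor:
    """Hypothetical last-value predictor indexed by the PC of a long-latency load.

    Each entry holds the most recently observed distance (in instructions)
    from that load to the furthest independent long-latency load behind it.
    """
    table: Dict[int, int] = field(default_factory=dict)

    def predict(self, load_pc: int) -> Optional[int]:
        # None means this load has not been observed yet.
        return self.table.get(load_pc)

    def update(self, load_pc: int, observed_distance: int) -> None:
        # Last-value update: simply overwrite the previous observation.
        self.table[load_pc] = observed_distance


def should_enter_runahead(predictor: MlpDistancePredictor,
                          load_pc: int,
                          rob_size: int = ROB_SIZE) -> bool:
    """Start runahead only when far-distance MLP is predicted (assumed policy)."""
    distance = predictor.predict(load_pc)
    if distance is None:
        # Assumption: be optimistic for unseen loads so a distance can be learned.
        return True
    # MLP within the reorder buffer's reach is exposed without runahead,
    # so only a predicted distance beyond it justifies speculative execution.
    return distance > rob_size


predictor = MlpDistancePredictor()
predictor.update(0x400, 10)    # nearby MLP only: runahead would be useless
predictor.update(0x800, 200)   # MLP beyond the reorder buffer: runahead helps
print(should_enter_runahead(predictor, 0x400))  # False
print(should_enter_runahead(predictor, 0x800))  # True
```

In this sketch, skipping runahead for the load at `0x400` is what would save speculatively executed instructions, while the load at `0x800` still benefits from prefetching.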

Keywords

System Throughput, Dynamic Partitioning, Reorder Buffer, Execution Resource, Predictor Table
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

Copyright information

© Springer-Verlag Berlin Heidelberg 2009

Authors and Affiliations

  • Kenzo Van Craeynest (1)
  • Stijn Eyerman (1)
  • Lieven Eeckhout (1)
  1. Department of Electronics and Information Systems (ELIS), Ghent University, Belgium
