Loop Selection for Thread-Level Speculation

  • Shengyue Wang
  • Xiaoru Dai
  • Kiran S. Yellajyosula
  • Antonia Zhai
  • Pen-Chung Yew
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4339)

Abstract

Thread-level speculation (TLS) allows potentially dependent threads to speculatively execute in parallel, thus making it easier for the compiler to extract parallel threads. However, the high cost associated with unbalanced load, failed speculation, and inter-thread value communication makes it difficult to obtain the desired performance unless the speculative threads are carefully chosen.

In this paper, we focus on extracting parallel threads from loops in general-purpose applications because loops, with their regular structures and significant coverage of execution time, are ideal candidates for extracting parallel threads. General-purpose applications, however, usually contain a large number of nested loops with unpredictable parallel performance and dynamic behavior, which makes it difficult to decide which set of loops should be parallelized to improve overall program performance. Our proposed loop selection algorithm addresses these difficulties. We have found that (i) with the aid of profiling information, compiler analyses can achieve a reasonably accurate estimation of the performance of parallel execution, and that (ii) different invocations of a loop may behave differently, and exploiting this dynamic behavior can further improve performance. With a judicious choice of loops, we can improve the overall program performance of SPEC2000 integer benchmarks by as much as 20%.
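
To make the selection problem concrete, the following is a minimal sketch and not the algorithm of the paper. It assumes that the loop nest forms a tree, that at most one loop on any nesting path may be parallelized, and that each loop carries a profile-based estimate of the cycles saved by parallelizing it. The names `Loop`, `saved_cycles`, and `select_loops`, and all numbers, are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Loop:
    """A node in a hypothetical loop-nest tree."""
    name: str
    saved_cycles: float  # assumed estimate of cycles saved if this loop is parallelized
    inner: List["Loop"] = field(default_factory=list)


def select_loops(loop: Loop) -> Tuple[float, List[str]]:
    """Return (estimated total saving, names of chosen loops) for this subtree.

    Assumption: selecting a loop excludes every loop nested inside it, so the
    choice is between this loop alone and the best selections of its children.
    """
    inner_saving = 0.0
    inner_choice: List[str] = []
    for child in loop.inner:
        saving, choice = select_loops(child)
        inner_saving += saving
        inner_choice += choice
    # Pick this loop only if it beats the combined best of its inner loops.
    # A leaf with a non-positive estimate is never chosen (saving stays 0).
    if loop.saved_cycles > inner_saving:
        return loop.saved_cycles, [loop.name]
    return inner_saving, inner_choice


# Made-up estimates: the two inner loops together beat the outer loop.
nest = Loop("outer", 30.0, [
    Loop("inner_a", 25.0),
    Loop("inner_b", 10.0, [Loop("innermost", -5.0)]),
])
saving, chosen = select_loops(nest)
```

Under these made-up estimates the sketch selects the two inner loops rather than the outer one, because their combined saving is larger. A real program's loops may be reached through several call paths, so a tree is a simplification.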

Keywords

Program Performance, Parallel Execution, Parallel Thread, Loop Graph, Data Dependence Graph
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

Copyright information

© Springer-Verlag Berlin Heidelberg 2006

Authors and Affiliations

  • Shengyue Wang (1)
  • Xiaoru Dai (1)
  • Kiran S. Yellajyosula (1)
  • Antonia Zhai (1)
  • Pen-Chung Yew (1)
  1. Department of Computer Science and Engineering, University of Minnesota, Minneapolis, USA
