Speculative Parallelization – Eliminating the Overhead of Failure

  • Mikel Luján
  • Phyllis Gustafson
  • Michael Paleczny
  • Christopher A. Vick
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4782)


Existing runtime parallelization techniques impose severe performance penalties when a speculative parallelization is attempted and fails. Some techniques require a sequential restart of the speculative execution, while others discard only the work performed after the first point of failure. This paper introduces a new technique that reduces the performance overhead of failure to less than 1% on standard processors through a combination of hoisting the failure path and partitioning work to a Coinspector Thread.
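To make the second kind of existing technique concrete (discarding only the work after the first point of failure, in the spirit of the R-LRPD test), here is a minimal sketch. It is not the paper's hoisting or Coinspector Thread technique, whose details the abstract does not give. It assumes a hypothetical loop `a[writes[i]] = body(a[reads[i]])`; the names `body`, `sequential` and `speculative` are illustrative only. Every remaining iteration runs speculatively against a frozen snapshot. A check then finds the first iteration that read a location written by an earlier iteration. Results before that point are committed in order, and speculation restarts from the failure point. Python threads only express the structure here; they give no real speedup.

```python
from concurrent.futures import ThreadPoolExecutor


def body(value):
    # Hypothetical loop body: the new value depends on the value read.
    return value + 1


def sequential(a, reads, writes):
    # Reference semantics: run the loop in order.
    a = list(a)
    for r, w in zip(reads, writes):
        a[w] = body(a[r])
    return a


def speculative(a, reads, writes, workers=4):
    a = list(a)
    n = len(reads)
    start, rounds = 0, 0
    while start < n:
        rounds += 1
        snapshot = list(a)  # all speculative iterations read this frozen copy
        idx = range(start, n)
        with ThreadPoolExecutor(max_workers=workers) as pool:
            # Speculative results are kept in a private buffer, not written to a.
            results = list(pool.map(lambda i: body(snapshot[reads[i]]), idx))
        # First point of failure: an iteration that read a location written
        # by an earlier iteration of this round (a flow dependence).
        written, fail = set(), n
        for i in idx:
            if reads[i] in written:
                fail = i
                break
            written.add(writes[i])
        # Commit, in iteration order, only the work before the failure point.
        for i in range(start, fail):
            a[writes[i]] = results[i - start]
        start = fail  # discard the rest and re-speculate from here
    return a, rounds
```

Each round is guaranteed to make progress. At the start of a round, every earlier write has been committed, so the first iteration of the round always validates.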

Copyright information

© Springer-Verlag Berlin Heidelberg 2007

Authors and Affiliations

  • Mikel Luján, The University of Manchester
  • Phyllis Gustafson, Sun Microsystems Laboratories
  • Michael Paleczny, Sun Microsystems Laboratories
  • Christopher A. Vick, Sun Microsystems Laboratories
