Double Inspection for Run-Time Loop Parallelization

  • Michael Philippsen
  • Nikolai Tillmann
  • Daniel Brinkers
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 7146)


The inspector/executor approach is well known for parallelizing loops with irregular access patterns that cannot be analyzed statically. Existing inspectors have three downsides: their high run-time overhead is hard to amortize by actually executing the loop in parallel; they can only be applied to loops whose dependencies do not change during execution; and they are often designed specifically for array codes and are in general not applicable in object-oriented just-in-time compilation.

In this paper we present an inspector that inspects a loop twice to detect whether it is fully parallelizable. It works for arbitrary memory access patterns; it is conservative in that it notices when changing data dependencies would cause errors in a potential parallel execution; and, most importantly, because it is designed for current multicore architectures it is fast despite its double-inspection effort: it pays off at its first use.

On benchmarks we amortize the inspection overhead and outperform the sequential version from 2 or 3 cores onward.
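The core idea can be illustrated in miniature. The following sketch is not the authors' implementation; it is a hedged model of a double-inspection check, assuming an oracle `accesses(i)` that reports the abstract addresses each iteration reads and writes. The loop is declared fully parallelizable only if (a) two inspection passes record the same access pattern, which conservatively rules out dependencies that change during execution, and (b) no address written by one iteration is touched by any other iteration.

```python
# Hedged sketch of double inspection for loop parallelizability.
# `accesses` and the address tuples below are illustrative, not
# taken from the paper.
from collections import defaultdict

def inspect(n_iters, accesses):
    """Record one inspection pass: per-iteration (reads, writes) sets."""
    return [accesses(i) for i in range(n_iters)]

def fully_parallelizable(n_iters, accesses):
    first = inspect(n_iters, accesses)
    second = inspect(n_iters, accesses)
    # Double inspection: if the two passes disagree, the loop's
    # dependencies change while it runs -- be conservative.
    if first != second:
        return False
    # Cross-iteration conflict check: an address written by one
    # iteration and read or written by another is a dependence.
    writers = defaultdict(set)
    for i, (_, writes) in enumerate(first):
        for addr in writes:
            writers[addr].add(i)
    for i, (reads, writes) in enumerate(first):
        for addr in reads | writes:
            if any(j != i for j in writers.get(addr, ())):
                return False
    return True

# a[i] = b[i] + 1: independent iterations -> parallelizable.
par = fully_parallelizable(4, lambda i: ({('b', i)}, {('a', i)}))
# a[i] = a[i-1]: loop-carried dependence -> not parallelizable.
seq = fully_parallelizable(4, lambda i: ({('a', i - 1)}, {('a', i)}))
```

In the real setting the inspection passes instrument actual memory addresses at run time, so the check must itself be cheap enough to amortize, which is the point of the paper's multicore-oriented design.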


Keywords: Wave Front · Iteration Number · Parallel Execution · Memory Address · Transactional Memory





Copyright information

© Springer-Verlag Berlin Heidelberg 2013

Authors and Affiliations

  • Michael Philippsen (1)
  • Nikolai Tillmann (2)
  • Daniel Brinkers (1)
  1. Computer Science Dept., Programming Systems Group, University of Erlangen-Nuremberg, Erlangen, Germany
  2. Microsoft Research, One Microsoft Way, Redmond, USA
