Simultaneous Inspection: Hiding the Overhead of Inspector-Executor Style Dynamic Parallelization

  • Daniel Brinkers
  • Ronald Veldema
  • Michael Philippsen
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 8967)

Abstract

A common approach for dynamic parallelization of loops at runtime is the inspector-executor pattern. The inspector first runs the loop without any side effects to analyze whether there are data dependences that would prevent parallel execution. Only if no such dependences are found does the executor phase actually run the loop iterations in parallel. In previous work, the overhead of the inspection must either be amortized by the parallel execution or is completely wasted if the loop turns out to be non-parallelizable.
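The classic pattern can be illustrated with a minimal Python sketch. It assumes a hypothetical loop of the form a[write_idx[i]] = f(a[read_idx[i]]); the names inspect, run_loop, read_idx and write_idx are illustrative and not taken from the paper. The inspector only looks at the index arrays (no side effects), and the executor runs in parallel only if no cross-iteration dependence is found. In CPython the thread pool mainly illustrates the control flow rather than a real speedup.

```python
from concurrent.futures import ThreadPoolExecutor


def inspect(read_idx, write_idx):
    """Inspector: return True if no element is written by one iteration
    and read or written by a different iteration."""
    writer = {}
    for i, w in enumerate(write_idx):
        if w in writer:                  # two iterations write the same element
            return False
        writer[w] = i
    for i, r in enumerate(read_idx):
        j = writer.get(r)
        if j is not None and j != i:     # element read here is written elsewhere
            return False
    return True


def run_loop(a, read_idx, write_idx, f, workers=4):
    """Run the inspector first, then execute in parallel or sequentially."""
    n = len(write_idx)

    def body(i):
        a[write_idx[i]] = f(a[read_idx[i]])

    if inspect(read_idx, write_idx):
        # Executor phase: iterations are independent, any order is safe.
        with ThreadPoolExecutor(max_workers=workers) as pool:
            list(pool.map(body, range(n)))
        return "parallel"

    # Dependences found: the inspection time was pure overhead.
    for i in range(n):
        body(i)
    return "sequential"
```

In the sequential fallback the whole inspection cost is added on top of the original loop, which is the overhead the abstract refers to.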

In this paper we propose to run the inspection phase simultaneously with an instrumented sequential version of the loop. This way we can reduce and hide the overhead in the case of a non-parallelizable loop. We discuss what needs to be done so that the sequentially executed iterations do not invalidate the inspector’s concurrent work (in which case sequential execution is needed for the whole loop).
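The idea of overlapping inspection with sequential execution might be sketched as follows. This is a simplified illustration under stated assumptions, not the authors' implementation: the loop is assumed to have the hypothetical form a[write_idx[i]] = f(a[read_idx[i]]), the inspector is assumed to read only the index arrays (which the loop body never modifies, so sequential iterations cannot invalidate its work here), and the "instrumentation" is reduced to a check of the inspector's verdict before each iteration. All names are made up for this sketch.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


def simultaneous_loop(a, read_idx, write_idx, f, workers=4):
    """Return how many iterations were executed sequentially."""
    n = len(write_idx)
    done = threading.Event()
    verdict = {"parallel": False}

    def inspector():
        # Side-effect free dependence check on the index arrays only.
        writer = {}
        ok = True
        for i, w in enumerate(write_idx):
            if w in writer:              # same element written twice
                ok = False
                break
            writer[w] = i
        if ok:
            for i, r in enumerate(read_idx):
                j = writer.get(r)
                if j is not None and j != i:
                    ok = False           # read of an element written elsewhere
                    break
        verdict["parallel"] = ok         # publish the result, then signal
        done.set()

    def body(i):
        a[write_idx[i]] = f(a[read_idx[i]])

    t = threading.Thread(target=inspector)
    t.start()

    # Sequential loop that checks the inspector's verdict between iterations.
    i = 0
    while i < n:
        if done.is_set() and verdict["parallel"]:
            break                        # remaining iterations are independent
        body(i)
        i += 1
    t.join()

    # Only reached with i < n if the inspector reported no dependences.
    if i < n:
        with ThreadPoolExecutor(max_workers=workers) as pool:
            list(pool.map(body, range(i, n)))
    return i
```

If the inspector reports a dependence, the sequential loop simply runs to completion, so the only extra cost in this sketch is the per-iteration check. Handling loops whose body can change the data the inspector relies on is the harder case the abstract mentions, and it is not covered here.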

Our measurements show that if a loop cannot be executed in parallel, the overhead is below 1.6% compared to the runtime of the original sequential loop. If the loop is parallelizable, we see speedups of up to a factor of 3.6 on a quad-core processor.

Copyright information

© Springer International Publishing Switzerland 2015

Authors and Affiliations

  • Daniel Brinkers (1)
  • Ronald Veldema (1)
  • Michael Philippsen (1)
  1. Programming Systems Group, Friedrich-Alexander University Erlangen-Nürnberg (FAU), Erlangen, Germany