Estimating and Exploiting Potential Parallelism by Source-Level Dependence Profiling

  • Jonathan Mak
  • Karl-Filip Faxén
  • Sverker Janson
  • Alan Mycroft
Part of the Lecture Notes in Computer Science book series (LNCS, volume 6271)


Manual parallelization of programs is known to be difficult and error-prone, and there are currently few ways to measure the amount of potential parallelism in the original sequential code.

We present an extension of Embla, a Valgrind-based dependence profiler that links dynamic dependences back to source code. The new tool estimates the potential task-level parallelism in a sequential program and helps programmers exploit it at the source level. Using the popular fork-join model, it provides a realistic estimate of the potential speed-up from parallelization with frameworks such as Cilk, TBB or OpenMP 3.0. Estimates can be given for several different parallelization models, which vary in the programmer effort and in the capabilities required of the underlying implementation. The tool also outputs source-level dependence information to aid the parallelization of programs with abundant inherent parallelism, as well as critical paths to suggest algorithmic rewrites of programs with little of it.

We validate our claims by running our tool over serial elisions of sample Cilk programs, finding additional inherent parallelism not exploited by the Cilk code, as well as over serial C benchmarks where the profiling results suggest parallelism-enhancing algorithmic rewrites.







Copyright information

© Springer-Verlag Berlin Heidelberg 2010

Authors and Affiliations

  • Jonathan Mak¹
  • Karl-Filip Faxén²
  • Sverker Janson²
  • Alan Mycroft¹
  1. Computer Laboratory, University of Cambridge, Cambridge, United Kingdom
  2. Swedish Institute of Computer Science, Kista, Sweden
