Accelerating Data-Dependence Profiling with Static Hints

  • Mohammad Norouzi (email author)
  • Qamar Ilias
  • Ali Jannesari
  • Felix Wolf
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11725)


Data-dependence profiling is a program-analysis technique for discovering potential parallelism in sequential programs. In contrast to purely static dependence analysis, profiling has the advantage that it captures only those dependences that actually occur during execution. Lacking critical runtime information such as the values of pointers and array indices, purely static analysis may overestimate the number of dependences. On the downside, dependence profiling significantly slows down the program, not infrequently prolonging execution by a factor of 100. In this paper, we propose a hybrid approach that substantially reduces this overhead. First, we statically identify persistent data dependences that will appear in any execution. We then exclude the affected source-code locations from instrumentation, allowing the profiler to skip them at runtime and avoid the associated overhead. Finally, we merge the static and dynamic dependences. We evaluated our approach with 38 benchmarks from two benchmark suites and obtained a median reduction in profiling time of 62% across all benchmarks.
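The three steps named in the abstract (static identification, selective instrumentation, merging) can be illustrated with a toy model. The Python sketch below is not the authors' implementation: the names `Access`, `Dep`, `profile`, `hybrid_profile`, and `STATIC_DEPS` are hypothetical, the "profiler" is a minimal shadow-memory loop over a hand-written access trace, and the static dependences are written out by hand rather than derived by a compiler pass. It also assumes that every access to a statically analyzed variable is excluded from instrumentation, so skipping them cannot hide dependences on other variables.

```python
from typing import NamedTuple

class Access(NamedTuple):
    line: int       # source line of the memory operation
    addr: str       # symbolic address, e.g. "total" or "a[2]"
    is_write: bool

class Dep(NamedTuple):
    kind: str       # "RAW", "WAR", or "WAW"
    sink: int       # line of the later access
    source: int     # line of the earlier access
    var: str        # variable name the dependence is carried on

def var_of(addr):
    """Strip an array subscript: 'a[2]' -> 'a'."""
    return addr.split("[")[0]

def profile(trace, skip_vars=frozenset()):
    """Toy shadow-memory profiler.

    Returns (dependences, number of instrumented accesses). Accesses to
    variables in skip_vars are treated as not instrumented.
    """
    last_write = {}   # addr -> line of the most recent write
    last_reads = {}   # addr -> lines that read since the most recent write
    deps, instrumented = set(), 0
    for acc in trace:
        v = var_of(acc.addr)
        if v in skip_vars:
            continue                      # excluded from instrumentation
        instrumented += 1
        if acc.is_write:
            if acc.addr in last_write:
                deps.add(Dep("WAW", acc.line, last_write[acc.addr], v))
            for r in last_reads.get(acc.addr, ()):
                deps.add(Dep("WAR", acc.line, r, v))
            last_write[acc.addr] = acc.line
            last_reads[acc.addr] = set()
        else:
            if acc.addr in last_write:
                deps.add(Dep("RAW", acc.line, last_write[acc.addr], v))
            last_reads.setdefault(acc.addr, set()).add(acc.line)
    return deps, instrumented

def make_trace(idx):
    """Access trace of this hypothetical program:
         1: total = 0
         2: for i in range(len(idx)):
         3:     total = total + a[idx[i]]
         4:     a[i] = 0
    """
    trace = [Access(1, "total", True)]
    for i, j in enumerate(idx):
        trace += [Access(3, f"a[{j}]", False), Access(3, "total", False),
                  Access(3, "total", True), Access(4, f"a[{i}]", True)]
    return trace

# Assumed output of a static analysis: dependences on the scalar `total`
# that hold whenever the loop runs at least twice. The accesses to `a`
# depend on the runtime contents of idx and are left to the profiler.
STATIC_DEPS = frozenset({
    Dep("RAW", 3, 1, "total"), Dep("WAW", 3, 1, "total"),
    Dep("RAW", 3, 3, "total"), Dep("WAR", 3, 3, "total"),
    Dep("WAW", 3, 3, "total"),
})

def hybrid_profile(trace, static_deps):
    """Skip statically covered variables, then merge both dependence sets."""
    skip = frozenset(d.var for d in static_deps)
    dynamic_deps, instrumented = profile(trace, skip)
    return set(static_deps) | dynamic_deps, instrumented

if __name__ == "__main__":
    trace = make_trace([2, 0, 1])
    full, n_full = profile(trace)
    hybrid, n_hybrid = hybrid_profile(trace, STATIC_DEPS)
    print("same dependences:", hybrid == full)
    print("instrumented accesses:", n_full, "->", n_hybrid)
```

In this toy setting the merged result matches what full profiling reports while fewer accesses are instrumented. The reduction seen here is a property of the hand-built trace and says nothing about the overheads measured in the paper.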



This work has been funded by the Hessian LOEWE initiative within the Software-Factory 4.0 project. Additional support has been provided by the German Research Foundation (DFG) through the Program Performance Engineering for Scientific Software and the US Department of Energy under Grant No. DE-SC0015524.



Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  • Mohammad Norouzi (1), email author
  • Qamar Ilias (1)
  • Ali Jannesari (2)
  • Felix Wolf (1)

  1. Technische Universitaet Darmstadt, Darmstadt, Germany
  2. Iowa State University, Ames, USA
