Polyhedral Optimizations for a Data-Flow Graph Language

  • Alina Sbîrlea
  • Jun Shirako
  • Louis-Noël Pouchet
  • Vivek Sarkar
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 9519)


This paper proposes a novel optimization framework for the Data-Flow Graph Language (DFGL), a dependence-based notation for a macro-dataflow model that can be used as an embedded domain-specific language. Our optimization framework follows a “dependence-first” approach in capturing the semantics of DFGL programs in polyhedral representations, as opposed to the standard polyhedral approach of deriving dependences from access functions and schedules. As a first step, our proposed framework performs two important legality checks on an input DFGL program — checking for potential violations of the single-assignment rule, and checking for potential deadlocks. After these legality checks are performed, the DFGL dependence information is used in lieu of standard polyhedral dependences to enable polyhedral transformations and code generation, which include automatic loop transformations, tiling, and code generation of parallel loops with coarse-grain (fork-join) and fine-grain (doacross) synchronizations. Our performance experiments with nine benchmarks on Intel Xeon and IBM Power7 multicore processors show that the DFGL versions optimized by our proposed framework can deliver up to 6.9× performance improvement relative to standard OpenMP versions of these benchmarks. To the best of our knowledge, this is the first system to encode explicit macro-dataflow parallelism in polyhedral representations so as to provide programmers with an easy-to-use DSL notation with legality checks, while taking full advantage of the optimization functionality in state-of-the-art polyhedral frameworks.
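As a minimal illustration of the two legality checks described above — this is our own sketch over explicitly enumerated step instances, not the paper's actual formulation, which reasons symbolically over polyhedral iteration domains — a single-assignment check flags any data item written by more than one step instance, and a deadlock check looks for a cycle in the dependence graph among step instances:

```python
from collections import defaultdict

def single_assignment_violations(writes):
    """writes: iterable of (step_instance, item_index) pairs.
    Returns the item indices written by more than one step instance,
    i.e., dynamic single-assignment violations."""
    writers = defaultdict(set)
    for step, item in writes:
        writers[item].add(step)
    return {item for item, steps in writers.items() if len(steps) > 1}

def has_potential_deadlock(deps):
    """deps: dict mapping each step instance to the step instances it
    waits on. Returns True if the dependence graph contains a cycle
    (a potential deadlock), detected by an iterative DFS."""
    WHITE, GRAY, BLACK = 0, 1, 2
    color = defaultdict(int)  # all nodes start WHITE
    for root in deps:
        if color[root] != WHITE:
            continue
        color[root] = GRAY
        stack = [(root, iter(deps.get(root, ())))]
        while stack:
            node, succs = stack[-1]
            advanced = False
            for succ in succs:
                if color[succ] == GRAY:
                    return True  # back edge: node reachable from itself
                if color[succ] == WHITE:
                    color[succ] = GRAY
                    stack.append((succ, iter(deps.get(succ, ()))))
                    advanced = True
                    break
            if not advanced:  # all successors explored
                color[node] = BLACK
                stack.pop()
    return False
```

A real polyhedral implementation would perform these checks on symbolic (possibly unbounded) iteration domains using integer-set operations rather than enumeration; the sketch only conveys what the checks decide.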


Keywords: Iteration Domain · Tile Size · Dataflow Graph · Polyhedral Model · Dataflow Model



This work was supported in part by the National Science Foundation through awards 0926127 and 1321147.



Copyright information

© Springer International Publishing Switzerland 2016

Authors and Affiliations

  • Alina Sbîrlea¹ (corresponding author)
  • Jun Shirako¹
  • Louis-Noël Pouchet²
  • Vivek Sarkar¹

  1. Rice University, Houston, USA
  2. Ohio State University, Columbus, USA
