On the Scalability of an Automatically Parallelized Irregular Application

  • Martin Burtscher
  • Milind Kulkarni
  • Dimitrios Prountzos
  • Keshav Pingali
Part of the Lecture Notes in Computer Science book series (LNCS, volume 5335)


Irregular applications, i.e., programs that manipulate pointer-based data structures such as graphs and trees, constitute a challenging target for parallelization because the amount of parallelism is input dependent and changes dynamically. Traditional dependence analysis techniques are too conservative to expose this parallelism. Even manual parallelization is difficult, time consuming, and error prone. The Galois system parallelizes such applications using an optimistic approach that exploits higher-level semantics of abstract data types.
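The optimistic approach can be illustrated with a small simulation. This is a minimal sketch under stated assumptions, not the Galois implementation: the function name `optimistic_rounds`, the round-based scheduling, and the representation of each iteration's "neighborhood" as a set of element ids are all hypothetical choices made for illustration. Iterations in the same round run speculatively; one commits only if the elements it touches are disjoint from those already claimed in that round, and otherwise aborts and is retried later.

```python
from collections import deque


def optimistic_rounds(tasks, width):
    """Simulate optimistic execution in rounds of up to `width` iterations.

    `tasks` is a list of (name, neighborhood) pairs, where `neighborhood`
    is the set of element ids the iteration would touch (an assumption made
    for this sketch). Returns the committed names per round and the number
    of aborted attempts.
    """
    worklist = deque(tasks)
    schedule = []  # one list of committed task names per round
    aborts = 0
    while worklist:
        # Take up to `width` iterations to run speculatively in this round.
        batch = [worklist.popleft() for _ in range(min(width, len(worklist)))]
        claimed = set()
        committed = []
        for name, hood in batch:
            if claimed.isdisjoint(hood):
                # No conflict: the iteration commits and claims its elements.
                claimed |= hood
                committed.append(name)
            else:
                # Conflict: abort and put the iteration back on the worklist.
                aborts += 1
                worklist.append((name, hood))
        schedule.append(committed)
    return schedule, aborts


# Hypothetical input: "a"/"b" overlap on element 2, "c"/"d" on element 5.
tasks = [("a", {1, 2}), ("b", {2, 3}), ("c", {4, 5}), ("d", {5, 6})]
schedule, aborts = optimistic_rounds(tasks, width=4)
print(schedule, aborts)
```

The first iteration in each round always commits, so the loop makes progress. How many iterations can commit together depends entirely on how the neighborhoods overlap, which is one way to see why the available parallelism is input dependent.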

In this paper, we study the performance and scalability of a Galoised, that is, automatically parallelized, version of Delaunay mesh refinement (DR) on a shared-memory system with 128 CPUs. DR is an important irregular application that is used, e.g., in graphics and finite-element codes. The parallelized program scales to 64 threads, where it reaches a speedup of 25.8. For large numbers of threads, the performance is hampered by the load imbalance and the nonuniform memory latency, both of which grow as the number of threads increases. While these two issues will have to be addressed in future work, we believe our results already show the Galois approach to be very promising.
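To put the reported speedup in perspective, it can be converted to parallel efficiency, defined in the usual way as speedup divided by thread count. The abstract does not state this metric; the short calculation below uses only the two figures it gives (a speedup of 25.8 at 64 threads), and the helper name `parallel_efficiency` is hypothetical.

```python
def parallel_efficiency(speedup, threads):
    """Return speedup per thread, the standard parallel-efficiency metric."""
    return speedup / threads


# Figures taken from the abstract: speedup of 25.8 at 64 threads.
efficiency = parallel_efficiency(25.8, 64)
print(f"{efficiency:.3f}")  # prints 0.403
```

An efficiency of roughly 40% at 64 threads is consistent with the overheads the abstract names, load imbalance and nonuniform memory latency.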


parallel programming, multicore processors, sparse graph algorithm, amorphous data-parallelism, optimistic execution, mesh refinement





Copyright information

© Springer-Verlag Berlin Heidelberg 2008

Authors and Affiliations

  • Martin Burtscher (1)
  • Milind Kulkarni (1)
  • Dimitrios Prountzos (1)
  • Keshav Pingali (1)

  1. Center for Grid and Distributed Computing, Institute for Computational Engineering and Sciences, The University of Texas at Austin, Austin
