Evaluation of Hierarchical Mesh Reorderings

  • Michelle Mills Strout
  • Nissa Osheim
  • Dave Rostron
  • Paul D. Hovland
  • Alex Pothen
Part of the Lecture Notes in Computer Science book series (LNCS, volume 5544)

Abstract

Irregular and sparse scientific computing programs frequently experience performance losses due to inefficient use of the memory system in most machines. Previous work has shown that, for a graph model, performing a partitioning and then reordering within each partition improves performance. More recent work has shown that reordering heuristics based on a hypergraph model result in better reorderings than those based on a graph model. This paper studies the effects of hierarchical reordering strategies within the hypergraph model. In our experiments, the reorderings are applied to the nodes and elements of tetrahedral meshes, which are inputs to a mesh optimization application. We show that cache performance degrades over time with consecutive packing, but not with breadth-first ordering, and that hierarchical reorderings involving hypergraph partitioning followed by consecutive packing or breadth-first orderings in each partition improve overall execution time.
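The abstract names the orderings but does not spell them out, so the following is only a minimal sketch of how they are commonly described, not the authors' implementation. It assumes a mesh given as a list of elements, each a tuple of node indices, and treats two nodes as neighbours when they share an element. The function names and the `part_of_node` array are hypothetical; in particular, the hypergraph partitioning step is not implemented here and its result is simply taken as input.

```python
from collections import deque


def consecutive_packing(elements, num_nodes):
    """Order nodes by first touch while visiting elements in their given order."""
    order, seen = [], [False] * num_nodes
    for elem in elements:
        for node in elem:
            if not seen[node]:
                seen[node] = True
                order.append(node)
    # Nodes that no element touches are appended in their original order.
    order.extend(n for n in range(num_nodes) if not seen[n])
    return order


def breadth_first(elements, num_nodes):
    """Order nodes by breadth-first search over shared-element adjacency."""
    node_to_elems = [[] for _ in range(num_nodes)]
    for e, elem in enumerate(elements):
        for node in elem:
            node_to_elems[node].append(e)
    order, seen = [], [False] * num_nodes
    for root in range(num_nodes):  # restart once per connected component
        if seen[root]:
            continue
        seen[root] = True
        queue = deque([root])
        while queue:
            node = queue.popleft()
            order.append(node)
            for e in node_to_elems[node]:
                for nbr in elements[e]:
                    if not seen[nbr]:
                        seen[nbr] = True
                        queue.append(nbr)
    return order


def hierarchical(elements, num_nodes, part_of_node, local_order):
    """Group nodes by partition, then apply local_order inside each partition.

    part_of_node[n] is the partition of node n (assumed to come from an
    external hypergraph partitioner, which this sketch does not provide).
    """
    order = []
    for p in range(max(part_of_node) + 1):
        nodes = [n for n in range(num_nodes) if part_of_node[n] == p]
        local_id = {n: i for i, n in enumerate(nodes)}
        # Restrict every element to the nodes that live in partition p.
        local_elems = [
            tuple(local_id[n] for n in elem if n in local_id) for elem in elements
        ]
        local_elems = [e for e in local_elems if e]
        order.extend(nodes[i] for i in local_order(local_elems, len(nodes)))
    return order
```

Each function returns a list in which position `i` holds the old index of the node that should be stored at new index `i`. For example, `hierarchical(elements, num_nodes, part_of_node, breadth_first)` would correspond to partitioning followed by a breadth-first ordering within each partition; element reordering, which the paper also applies, is left out of this sketch.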

Copyright information

© Springer-Verlag Berlin Heidelberg 2009

Authors and Affiliations

  • Michelle Mills Strout (1)
  • Nissa Osheim (1)
  • Dave Rostron (1)
  • Paul D. Hovland (2)
  • Alex Pothen (3)

  1. Colorado State University, Fort Collins, USA
  2. Argonne National Laboratory, Argonne, USA
  3. Purdue University, West Lafayette, USA
