A Comparison of Locality Transformations for Irregular Codes

  • Hwansoo Han
  • Chau-Wen Tseng
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 1915)

Abstract

Researchers have proposed several data and computation transformations to improve locality in irregular scientific codes. We experimentally compare their performance and present gpart, a new technique based on hierarchical clustering. Quality partitions are constructed quickly by clustering multiple neighboring nodes, with priority given to nodes of high degree, and repeating for a few passes. Overhead is kept low by clustering multiple nodes in each pass and considering only edges between partitions. Experimental results show gpart matches the performance of more sophisticated partitioning algorithms to within 6%-8%, with a small fraction of the overhead. It is thus useful for optimizing programs whose running times are not known.

This research was supported in part by NSF CAREER Development Award
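The clustering idea summarized in the abstract can be illustrated with a small sketch. This is not the authors' gpart implementation: the function names (`cluster_pass`, `coarsen`, `gpart_like_order`), the size cap `limit`, and the default of three passes are all assumptions made for illustration. The sketch visits nodes in order of decreasing degree, merges each unclustered node with its unclustered neighbors up to the size cap, keeps only edges between clusters, repeats for a few passes, and finally lists nodes cluster by cluster as a candidate data layout.

```python
from collections import defaultdict


def cluster_pass(adj, size, limit):
    """One pass: high-degree nodes first, each absorbing free neighbors.

    `limit` is a hypothetical cap on the number of original nodes per cluster.
    Returns a dict mapping every node to the representative of its cluster.
    """
    label = {}
    for u in sorted(adj, key=lambda n: (-len(adj[n]), n)):
        if u in label:
            continue
        label[u] = u
        total = size[u]
        for v in sorted(adj[u]):
            if v not in label and total + size[v] <= limit:
                label[v] = u
                total += size[v]
    return label


def coarsen(adj, size, label):
    """Build the cluster graph, keeping only edges between different clusters."""
    new_adj = defaultdict(set)
    new_size = defaultdict(int)
    for u in adj:
        cu = label[u]
        new_size[cu] += size[u]
        new_adj[cu]  # make sure clusters without external edges still appear
        for v in adj[u]:
            cv = label[v]
            if cu != cv:
                new_adj[cu].add(cv)
    return dict(new_adj), dict(new_size)


def gpart_like_order(adj, limit, passes=3):
    """Repeat a few clustering passes and return nodes grouped by cluster.

    `adj` must be a symmetric adjacency map: node -> set of neighbors.
    """
    cur = {u: set(vs) for u, vs in adj.items()}
    size = {u: 1 for u in adj}
    members = {u: [u] for u in adj}
    for _ in range(passes):
        label = cluster_pass(cur, size, limit)
        merged = defaultdict(list)
        for u in cur:
            merged[label[u]].extend(members[u])
        members = dict(merged)
        cur, size = coarsen(cur, size, label)
    return [n for cluster in members.values() for n in cluster]


if __name__ == "__main__":
    # A six-node path graph, clusters capped at three nodes each.
    path = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2, 4}, 4: {3, 5}, 5: {4}}
    print(gpart_like_order(path, limit=3))
```

The returned list is a permutation of the node identifiers; renumbering data in that order places the members of each cluster next to one another in memory. A fixed cap across passes is the simplest choice for a sketch; the cap could instead grow from pass to pass to build a deeper hierarchy.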

Copyright information

© Springer-Verlag Berlin Heidelberg 2000

Authors and Affiliations

  • Hwansoo Han (1)
  • Chau-Wen Tseng (1)
  1. Department of Computer Science, University of Maryland, College Park
