Empirical Optimization for a Sparse Linear Solver: A Case Study

  • Yoon-Ju Lee
  • Pedro C. Diniz
  • Mary W. Hall
  • Robert Lucas

Abstract

This paper describes initial experiences with semi-automated performance tuning of a sparse linear solver in LS-DYNA, a large, widely used engineering application. Through a collection of tools supporting empirical optimization, we alleviate the burden of performance tuning for mapping today’s sophisticated engineering software to increasingly complex hardware platforms. We describe a tool that automatically isolates code segments to create benchmark subsets for the purposes of performance tuning. We present a collection of automatically generated empirical results that demonstrate the sensitivity of the application’s performance to optimization parameters. Through this case study, we demonstrate the importance of developing automatic performance tuning support for performance-sensitive applications.

Keywords

Memory hierarchy optimization, performance tuning

Copyright information

© Springer Science+Business Media, Inc. 2005

Authors and Affiliations

  • Yoon-Ju Lee (1)
  • Pedro C. Diniz (1)
  • Mary W. Hall (1)
  • Robert Lucas (1)

  1. Information Sciences Institute, University of Southern California, Marina del Rey, USA
