Performance Optimization of RK Methods Using Block-Based Pipelining

Korch, Matthias; Rauber, Thomas; Rünger, Gudula

doi:10.1007/978-1-4615-0361-3_3

Abstract

The efficiency of modern microprocessors is extremely sensitive towards the structure and memory access pattern of programs to be executed. This is caused by memory hierarchies which were introduced to reduce average memory access times. In this paper, we consider embedded Runge-Kutta (RK) methods for the solution of ordinary differential equations arising from space discretization problems for partial differential equations and study their efficient implementation on modern microprocessors. Different program variants with different execution orders and storage schemes are investigated. In particular, we explore how the potential parallelism in the stage vector computation can be exploited in a pipelining approach in order to improve the locality behavior of the RK implementations. Experiments show that this results in efficiency improvements on several recent processors.
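
As a rough illustration of the stage vector computation mentioned in the abstract, the following minimal sketch performs one step of an explicit embedded RK pair. It is not the authors' implementation: the function name `embedded_rk_step`, the plain-list storage, and the choice of the two-stage Heun-Euler pair are assumptions made for illustration, and the sketch applies no blocking or pipelining.

```python
def embedded_rk_step(f, t, y, h, A, b, b_hat, c):
    """One step of an explicit embedded RK pair (hypothetical helper).

    f      -- right-hand side, f(t, y) -> list of floats
    A, c   -- strictly lower-triangular coefficient matrix and nodes
    b      -- weights of the solution that is propagated
    b_hat  -- weights of the embedded solution used for error estimation
    """
    n = len(y)
    s = len(b)
    stages = []  # stage vectors v_1 .. v_s, each of length n
    for l in range(s):
        # Argument vector for stage l: y + h * sum_{i<l} A[l][i] * v_i
        w = [y[j] + h * sum(A[l][i] * stages[i][j] for i in range(l))
             for j in range(n)]
        stages.append(f(t + c[l] * h, w))
    # New approximation and the difference to the embedded solution
    y_new = [y[j] + h * sum(b[l] * stages[l][j] for l in range(s))
             for j in range(n)]
    err = [h * sum((b[l] - b_hat[l]) * stages[l][j] for l in range(s))
           for j in range(n)]
    return y_new, err


# Heun-Euler 2(1) pair, chosen here only because it is small
A = [[0.0, 0.0], [1.0, 0.0]]
c = [0.0, 1.0]
b = [0.5, 0.5]
b_hat = [1.0, 0.0]

# Example: y' = -y, y(0) = 1, one step of size 0.1
y_new, err = embedded_rk_step(lambda t, y: [-y[0]], 0.0, [1.0], 0.1, A, b, b_hat, c)
```

Each stage vector depends on all earlier stages, so the order in which the loops over stages and over vector components are nested, and how the stage vectors are stored, determines which data is reused while it is still in cache.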

Author information

Authors and Affiliations

Department of Mathematics and Physics, University of Bayreuth, Bayreuth, Germany
Matthias Korch & Thomas Rauber
Department of Computer Science, Technical University of Chemnitz, Chemnitz, Germany
Gudula Rünger

Editor information

Editors and Affiliations

University of Westminster, UK
Vladimir Getov
Technical University Munich, Germany
Michael Gerndt
Los Alamos National Laboratory, USA
Adolfy Hoisie
University of Oregon-Eugene, USA
Allen Malony
University of Wisconsin-Madison, USA
Barton Miller

About this chapter

Cite this chapter

Korch, M., Rauber, T., Rünger, G. (2004). Performance Optimization of RK Methods Using Block-Based Pipelining. In: Getov, V., Gerndt, M., Hoisie, A., Malony, A., Miller, B. (eds) Performance Analysis and Grid Computing. Springer, Boston, MA. https://doi.org/10.1007/978-1-4615-0361-3_3

DOI: https://doi.org/10.1007/978-1-4615-0361-3_3
Publisher Name: Springer, Boston, MA
Print ISBN: 978-1-4613-5038-5
Online ISBN: 978-1-4615-0361-3
eBook Packages: Springer Book Archive
