Performance of High-Accuracy PDE Solvers on a Self-Optimizing NUMA Architecture

  • Sverker Holmgren
  • Dan Wallin
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 2150)


High-accuracy PDE solvers use multi-dimensional fast Fourier transforms (FFTs). The FFTs exhibit a static and structured memory access pattern, which results in a large amount of communication. A performance analysis of a non-trivial kernel representing a PDE solution algorithm has been carried out on a Sun WildFire computer, a platform on which different architecture, system, and programming models can be studied. The WildFire system uses self-optimization techniques, such as data migration and replication, to change the placement of data at runtime. If the initial data placement is not optimal, performance is degraded at first. After a few iterations, however, the page migration daemon has modified the placement of data, and performance improves until it equals what is achieved when the data is placed optimally by hand at the start of execution. The speedup for the PDE solution kernel is surprisingly good.
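The interplay between the FFT access pattern and page migration described above can be illustrated with a toy model. The sketch below is not the authors' kernel and does not reproduce the WildFire migration policy. It assumes an n x n array with one memory page per row, rows block-distributed over the nodes, a row-column 2D FFT (row transforms followed by column transforms), and a hypothetical daemon that moves at most `max_moves` pages per iteration to the node that accesses each page most. All function names and parameters are invented for illustration.

```python
from collections import Counter


def count_accesses(n, n_nodes):
    """Element accesses per (node, page) for one row-column 2D FFT pass.

    Assumptions: n x n array, one page per row, rows and columns
    block-distributed over the nodes, n divisible by n_nodes.
    """
    rows_per_node = n // n_nodes
    cols_per_node = n // n_nodes
    counts = Counter()
    for row in range(n):
        owner = row // rows_per_node
        # Phase 1: the owning node transforms the whole row (n elements).
        counts[(owner, row)] += n
        # Phase 2: column transforms; every node reads its own columns
        # from every row, so each page is touched by all nodes.
        for node in range(n_nodes):
            counts[(node, row)] += cols_per_node
    return counts


def remote_accesses(counts, placement):
    """Accesses made by a node other than the one holding the page."""
    return sum(c for (node, page), c in counts.items()
               if placement[page] != node)


def migrate(counts, placement, n_nodes, max_moves):
    """Hypothetical daemon: move up to max_moves pages to their top accessor."""
    new_placement = list(placement)
    moves = 0
    for page in range(len(placement)):
        if moves == max_moves:
            break
        best = max(range(n_nodes), key=lambda node: counts[(node, page)])
        if best != new_placement[page]:
            new_placement[page] = best
            moves += 1
    return new_placement


def simulate(n, n_nodes, iterations, max_moves):
    """Remote accesses per iteration, starting with every page on node 0."""
    counts = count_accesses(n, n_nodes)
    placement = [0] * n  # poor initial placement, e.g. serial initialization
    history = []
    for _ in range(iterations):
        history.append(remote_accesses(counts, placement))
        placement = migrate(counts, placement, n_nodes, max_moves)
    return history


if __name__ == "__main__":
    tuned = [row // 4 for row in range(8)]  # hand-tuned block placement
    print("hand-tuned:", remote_accesses(count_accesses(8, 2), tuned))
    print("migrating: ", simulate(8, 2, iterations=4, max_moves=2))
```

In this model the remote access count starts high, drops as pages migrate, and settles at the same value as the hand-tuned placement. The remaining remote accesses come from the column phase, which touches every page from every node regardless of placement.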


Keywords (added by machine, not by the authors): Remote Access, Data Placement, Remote Memory, Speedup Curve, Cache Coherence Protocol





Copyright information

© Springer-Verlag Berlin Heidelberg 2001

Authors and Affiliations

  • Sverker Holmgren (1)
  • Dan Wallin (1)
  1. Information Technology, Department of Scientific Computing, Uppsala University, Uppsala, Sweden
