Autotuning of Adaptive Mesh Refinement PDE Solvers on Shared Memory Architectures

  • Svetlana Nogina
  • Kristof Unterweger
  • Tobias Weinzierl
Part of the Lecture Notes in Computer Science book series (LNCS, volume 7203)

Abstract

Many multithreaded, grid-based, dynamically adaptive solvers for partial differential equations have to traverse subgrids (patches) of different and changing sizes over and over again. The parallel efficiency of this traversal depends on the interplay of the patch size, the architecture used, the operations triggered throughout the traversal, and the grain size, i.e. the size of the subtasks the patch is broken into. We propose an oracle mechanism that delivers grain sizes on the fly. It takes historical runtime measurements for different patch and grain sizes as well as the traversal's operations into account, and it yields reasonable speedups. Neither magic configuration settings nor an expensive pre-tuning phase is necessary; in this sense, it is an autotuning approach.
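
To make the oracle idea concrete, the following is a minimal Python sketch and not the authors' implementation. It assumes that the oracle keeps one table of runtime measurements per combination of traversal operation and patch size, tries every candidate grain size once, and afterwards returns the candidate with the lowest mean runtime. The names `GrainSizeOracle`, `candidates`, `suggest`, `record`, and the synthetic cost model `fake_traversal_time` are hypothetical, and patch sizes are assumed to be at least 1.

```python
from collections import defaultdict


class GrainSizeOracle:
    """Hypothetical oracle: picks a grain size from past runtime measurements."""

    def __init__(self):
        # (operation, patch_size) -> {grain_size: [total_seconds, sample_count]}
        self._history = defaultdict(dict)

    @staticmethod
    def candidates(patch_size):
        """Candidate grain sizes: patch_size, patch_size // 2, ..., down to 1."""
        sizes = []
        grain = patch_size
        while grain >= 1:
            sizes.append(grain)
            grain //= 2
        return sizes

    def suggest(self, operation, patch_size):
        """Return an unmeasured candidate if one exists, else the fastest on record."""
        stats = self._history[(operation, patch_size)]
        for grain in self.candidates(patch_size):
            if grain not in stats:
                return grain  # still exploring this configuration
        return min(stats, key=lambda g: stats[g][0] / stats[g][1])

    def record(self, operation, patch_size, grain, seconds):
        """Store one runtime measurement for the given configuration."""
        entry = self._history[(operation, patch_size)].setdefault(grain, [0.0, 0])
        entry[0] += seconds
        entry[1] += 1


def fake_traversal_time(patch_size, grain):
    """Synthetic cost model (assumption): per-task overhead plus a grain-size penalty."""
    tasks = -(-patch_size // grain)  # ceiling division
    return 0.01 * tasks + 0.002 * grain
```

In a solver, each patch traversal would call `suggest` before the parallel loop and `record` with the measured wall-clock time afterwards, so that tuning happens during the simulation itself instead of in a separate pre-tuning phase. The exploration strategy shown here (try each halved grain size once) is a simplification chosen for brevity.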

Keywords

Patch Size, Parallel Section, Chunk Size, Simulation Time Step, Parallel Loop
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

Copyright information

© Springer-Verlag Berlin Heidelberg 2012

Authors and Affiliations

  • Svetlana Nogina (1)
  • Kristof Unterweger (1)
  • Tobias Weinzierl (1)
  1. Technische Universität München, Garching, Germany
