Computer Science - Research and Development, Volume 31, Issue 4, pp 225–234

Energy efficiency of the simulation of three-dimensional coastal ocean circulation on modern commodity and mobile processors

A case study based on the Haswell and Cortex-A15 microarchitectures
  • Markus Geveler
  • Balthasar Reuter
  • Vadym Aizinger
  • Dominik Göddeke
  • Stefan Turek
Special Issue Paper


We analyze the energy efficiency of a 3D coastal ocean simulator on the Haswell and Cortex-A15 microarchitectures and propose a simple yet effective way to model energy-to-solution across different hardware platforms. The work also demonstrates that using processors from the field of embedded/mobile computing can increase energy efficiency by 50 %.


Energy efficiency in HPC; Coastal ocean simulation; Embedded processors; ARM cluster; Performance modeling



This work has been supported in part by the German Research Foundation (DFG) through the Priority Program 1648 ‘Software for Exascale Computing’ (Grants TU 102/50-1, GO 1758/2-1), and through the individual Grant AI 117/1. ICARUS hardware is financed by MIWF NRW under the lead of MERCUR.



Copyright information

© Springer-Verlag Berlin Heidelberg 2016

Authors and Affiliations

  • Markus Geveler (2)
  • Balthasar Reuter (1)
  • Vadym Aizinger (1)
  • Dominik Göddeke (3)
  • Stefan Turek (2)
  1. Applied Mathematics I, Friedrich-Alexander-Universität Erlangen-Nürnberg, Erlangen, Germany
  2. Chair for Applied Mathematics, Technische Universität Dortmund, Dortmund, Germany
  3. Institute for Applied Analysis and Numerical Simulation (IANS), University of Stuttgart, Stuttgart, Germany
