Characterizing the Performance-Energy Tradeoff of Small ARM Cores in HPC Computation

  • Michael A. Laurenzano
  • Ananta Tiwari
  • Adam Jundt
  • Joshua Peraza
  • William A. Ward Jr.
  • Roy Campbell
  • Laura Carrington
Part of the Lecture Notes in Computer Science book series (LNCS, volume 8632)

Abstract

Deploying large numbers of small, low-power cores has recently been gaining traction as a system design strategy in high performance computing (HPC). The ARM platform, which dominates the embedded and mobile computing segments, is now being considered as an alternative to the high-end x86 processors that largely dominate HPC, because peak performance per watt may be substantially improved using off-the-shelf commodity processors.

In this work we methodically characterize the performance and energy of HPC computations drawn from a number of problem domains on current ARM and x86 processors. Unsurprisingly, we find that the performance, energy and energy-delay product of applications running on these platforms vary significantly across problem types and inputs. Using static program analysis, we further show that this variation can be explained largely in terms of the capabilities of two processor subsystems: single instruction multiple data (SIMD)/floating point and the cache/memory hierarchy. We also show that static analysis of this kind is sufficient to predict which platform is best for a particular application/input pair. In the context of these findings, we evaluate how some of the key architectural changes being made for upcoming 64-bit ARM platforms may impact HPC application performance.
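The three metrics named in the abstract can rank the same pair of platforms differently. The following minimal sketch illustrates this, assuming the usual definitions of energy as average power times runtime and energy-delay product (EDP) as energy times runtime. The `Run` class, the `best_platform` helper and all numbers are hypothetical and chosen only for illustration; they are not measurements or code from the paper.

```python
from dataclasses import dataclass


@dataclass
class Run:
    """One application/input pair measured on one platform (hypothetical data)."""
    platform: str
    runtime_s: float    # wall-clock time (the "delay"), in seconds
    avg_power_w: float  # mean power draw over the run, in watts

    @property
    def energy_j(self) -> float:
        # Energy in joules = average power * time.
        return self.avg_power_w * self.runtime_s

    @property
    def edp(self) -> float:
        # Energy-delay product in joule-seconds = energy * time.
        return self.energy_j * self.runtime_s


def best_platform(runs, metric):
    """Return the platform with the lowest value of the named metric."""
    return min(runs, key=lambda r: getattr(r, metric)).platform


# Made-up numbers: a slow, low-power core versus a fast, high-power core.
runs = [
    Run("arm", runtime_s=40.0, avg_power_w=5.0),
    Run("x86", runtime_s=10.0, avg_power_w=60.0),
]

for metric in ("runtime_s", "energy_j", "edp"):
    print(metric, "->", best_platform(runs, metric))
```

With these illustrative inputs the low-power core uses less energy, while the faster core has the shorter runtime and the lower EDP, which is why the choice of metric matters when comparing platforms.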

Copyright information

© Springer International Publishing Switzerland 2014

Authors and Affiliations

  • Michael A. Laurenzano (1, 2)
  • Ananta Tiwari (1, 3)
  • Adam Jundt (1)
  • Joshua Peraza (1)
  • William A. Ward Jr. (4)
  • Roy Campbell (4)
  • Laura Carrington (1, 3)

  1. EP Analytics, USA
  2. Dept. of Computer Science and Engineering, University of Michigan, USA
  3. Performance Modeling and Characterization Lab., San Diego Supercomputer Center, USA
  4. U.S. Dept. of Defense, High Performance Computing Modernization Program, USA
