SPEC ACCEL: A Standard Application Suite for Measuring Hardware Accelerator Performance

  • Guido Juckeland
  • William Brantley
  • Sunita Chandrasekaran
  • Barbara Chapman
  • Shuai Che
  • Mathew Colgrove
  • Huiyu Feng
  • Alexander Grund
  • Robert Henschel
  • Wen-Mei W. Hwu
  • Huian Li
  • Matthias S. Müller
  • Wolfgang E. Nagel
  • Maxim Perminov
  • Pavel Shelepugin
  • Kevin Skadron
  • John Stratton
  • Alexey Titov
  • Ke Wang
  • Matthijs van Waveren
  • Brian Whitney
  • Sandra Wienke
  • Rengan Xu
  • Kalyan Kumaran
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 8966)

Abstract

Hybrid nodes with hardware accelerators are becoming very common in systems today. Users often find it difficult to characterize and understand the performance advantage of such accelerators for their applications. The SPEC High Performance Group (HPG) has developed a set of performance metrics to evaluate the performance and power consumption of accelerators for various science applications. The new benchmark comprises two suites of applications, written in OpenCL and OpenACC, and measures the performance of accelerators with respect to a reference platform. The first set of published results demonstrates the viability and relevance of the new metrics in comparing accelerator performance. This paper discusses the benchmark suites and selected published results in detail.

Keywords

SPEC, SPEC ACCEL, OpenCL, OpenACC, Energy measurements

Notes

Acknowledgments

The authors thank Cloyce Spradling for his work on the SPEC harness as well as the SPEC POWER group for their work on enabling the integration of power measurements into other SPEC suites.

SPEC®, SPEC ACCEL™, SPEC CPU™, SPEC MPI®, and SPEC OMP® are registered trademarks of the Standard Performance Evaluation Corporation (SPEC). AMD is a trademark of Advanced Micro Devices, Inc. OpenCL is a trademark of Apple, Inc., used by permission by Khronos. Other names used in this presentation are for identification purposes only and may be trademarks of their respective owners.

Contributions by the University of Houston were supported in part by NVIDIA and the Department of Energy under Award Agreement No. DE-FC02-12ER26099.

Copyright information

© Springer International Publishing Switzerland 2015

Authors and Affiliations

  • Guido Juckeland (1, 2), Email author
  • William Brantley (1, 3)
  • Sunita Chandrasekaran (1, 4)
  • Barbara Chapman (1, 4)
  • Shuai Che (1, 3)
  • Mathew Colgrove (1, 5)
  • Huiyu Feng (1, 6)
  • Alexander Grund (1, 2)
  • Robert Henschel (1, 7)
  • Wen-Mei W. Hwu (1, 8)
  • Huian Li (1, 7)
  • Matthias S. Müller (1, 9)
  • Wolfgang E. Nagel (1, 2)
  • Maxim Perminov (1, 10)
  • Pavel Shelepugin (1, 10)
  • Kevin Skadron (1, 11)
  • John Stratton (1, 8, 12)
  • Alexey Titov (1, 3)
  • Ke Wang (1, 11)
  • Matthijs van Waveren (1, 13)
  • Brian Whitney (1, 14)
  • Sandra Wienke (1, 9)
  • Rengan Xu (1, 4)
  • Kalyan Kumaran (1, 15)

  1. SPEC High Performance Group, Gainesville, USA
  2. Center for Information Services and High Performance Computing (ZIH), Technische Universität Dresden, Dresden, Germany
  3. Advanced Micro Devices, Inc., Sunnyvale, USA
  4. University of Houston, Houston, USA
  5. NVIDIA, Santa Clara, USA
  6. Silicon Graphics International Corp., Milpitas, USA
  7. Indiana University, Bloomington, USA
  8. University of Illinois (UIUC), Champaign, USA
  9. RWTH Aachen University, Aachen, Germany
  10. Intel, Nizhny Novgorod, Russia
  11. University of Virginia, Charlottesville, USA
  12. Colgate University, Hamilton, USA
  13. Compilaflows, Toulouse, France
  14. Oracle, Redwood Shores, USA
  15. Argonne National Laboratory, Lemont, USA
