Computing, Volume 97, Issue 11, pp 1077–1100

ParVec: vectorizing the PARSEC benchmark suite


Abstract

Energy efficiency has recently replaced performance as the main design goal for microprocessors across all market segments. Vectorization, parallelization, specialization and heterogeneity are the key approaches that both academia and industry embrace to make energy efficiency a reality. New architectural proposals are validated against real applications to ensure correctness and to evaluate performance and energy consumption. However, keeping up with architectural changes while maintaining similar workloads and algorithms (for comparative purposes) is a real challenge. If benchmarks are optimized for certain features and not for others, architects may end up overestimating the impact of some techniques and underestimating that of others. The main contribution of this work is a detailed description and evaluation of ParVec, a vectorized version of the PARSEC benchmark suite, which serves as a case study of a commonly used application set. ParVec can target SSE, AVX and NEON™ SIMD architectures by means of custom vectorization and math libraries. The performance and energy efficiency improvements from vectorization depend greatly on the fraction of code that can be vectorized. Vectorization-friendly benchmarks obtain up to 10× energy improvements per thread. The ParVec benchmark suite is available to the research community as a new baseline for the evaluation of future computer systems.
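The dependence on the vectorizable fraction of code can be illustrated with a first-order, Amdahl-style bound. The sketch below is not taken from the article: the function name `vector_speedup`, the sample fractions, and the assumption that vectorized code speeds up by exactly the number of SIMD lanes (4 single-precision lanes for 128-bit SSE or NEON, 8 for 256-bit AVX) are illustrative only. It models execution time, not energy, and ignores memory bandwidth, instruction overheads and frequency effects.

```python
def vector_speedup(vector_fraction: float, vector_width: int) -> float:
    """Amdahl-style upper bound on per-thread speedup from vectorization.

    vector_fraction: share of scalar execution time that can be vectorized (0..1).
    vector_width:    assumed ideal speedup of the vectorized part, taken here
                     to be the number of SIMD lanes (an assumption, not a
                     measured value).
    """
    if not 0.0 <= vector_fraction <= 1.0:
        raise ValueError("vector_fraction must be in [0, 1]")
    if vector_width < 1:
        raise ValueError("vector_width must be >= 1")
    scalar_part = 1.0 - vector_fraction           # time that stays scalar
    vector_part = vector_fraction / vector_width  # time left after vectorizing
    return 1.0 / (scalar_part + vector_part)


if __name__ == "__main__":
    # Hypothetical fractions, chosen only to show the trend.
    for fraction in (0.25, 0.50, 0.90, 0.99):
        for lanes in (4, 8):
            bound = vector_speedup(fraction, lanes)
            print(f"fraction={fraction:.2f} lanes={lanes} bound={bound:.2f}x")
```

Under this simple model, widening the vector unit helps little unless most of the execution time is vectorizable, which is consistent with the observation that only vectorization-friendly benchmarks see large gains.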

Keywords

Benchmarking, Vectorization, SIMD

Mathematics Subject Classification

68-02 Computer science (Research exposition)

Copyright information

© Springer-Verlag Wien 2015

Authors and Affiliations

  1. Department of Computer and Information Science (IDI), NTNU, Trondheim, Norway
