Advertisement

Performance Characterization and Optimization for Intel Xeon Phi Coprocessor

  • Cheng ZhangEmail author
  • Li Liu
  • Ruizhe Li
  • Guangwen YangEmail author
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 9528)

Abstract

The Intel Xeon Phi is a many-core accelerator which focuses on the high performance applications. To characterize the performance of the Intel Xeon Phi, a system of dual 8-core Intel Xeon E5-2670 processors is employed as a control platform, and a subset of the PARSEC benchmark suite is selected as the benchmark applications. The first evaluation in this paper shows that the applications on the Intel Xeon Phi is averagely 2.06x slower than on the dual Intel Xeon E5-2670. The further detailed performance characterization quantifies the performance impact of various architecture parameters on the Intel Xeon Phi. To set an example for how to improve the architecture of the Intel Xeon Phi for better performance, the hardware optimization with an additional set of vector processing units is discussed and a simple emulator is developed accordingly. The evaluation results show that this optimization can provide an average speedup of 1.10.

Keywords

Intel Xeon Phi Parallel architecture Performance characterization Architecture parameters Hardware optimization 

Notes

Acknowledgments

This work is supported in part by the Natural Science Foundation of China (no. 41275098), the National Grand Fundamental Research 973 Program of China (no. 2013CB956603), and the Tsinghua University Initiative Scientific Research Program (no. 20131089356).

References

  1. 1.
    Schmidl, D., Cramer, T., Wienke, S., Terboven, C., Muller, M.S.: Assessing the performance of Openmp programs on the Intel Xeon Phi. In: Euro-Par 2013 Parallel Processing, pp. 547–558 (2013)Google Scholar
  2. 2.
    Semelyanskiy, M., Sewall, J., Kalamkar, D.D., Satish, N., Dubey, P., Astafiev, N., Burylov, I., Nikolaev, A., Maidanov, S., Li, S., Kulkarni, S., Finan, C.H.: Analysis and optimization of financial analytics benchmark on modern multi- and many-core ia-based architectures. In: SC Companion: High Performance Computing, Networking, Storage and Analysis (2012)Google Scholar
  3. 3.
    Williams, S., Kalamkar, D.D., Singh, A., Deshpande, A.M., Van Straalen, B., Smelyanskiy, M., Almgren, A., Dubey, P., Shalf, J., Oliker, L.: Optimization of geometric multigrid for emerging multi-and manycore processors. In: Conference on High Performance Computing, Networking, Storage and Analysis (2012)Google Scholar
  4. 4.
    Park, J., Tang, P.T.P., Smelyanskiy, M., Kim, D., Benson, T.: Efficient backprojection-based synthetic aperture radar computation with many-core processors. In: Conference on High Performance Computing, Networking, Storage and Analysis (2012)Google Scholar
  5. 5.
    Cramer, T., Schmidl, D., Klemm, M., an Mey, D.: Openmp programming on Intel Xeon Phi coprocessors: an early performance comparison. In: Proceedings of the Many-core Applications Research Community (MARC) Symposium at RWTH Aachen University, pp. 38–44 (2012)Google Scholar
  6. 6.
    Liu, X., Smelyanskiy, M., Chow, E., Dubey, P.: Efficient sparse matrix-vector multiplication on X86-based many-core processors. In: Proceedings of the 27th International ACM Conference on International Conference on Supercomputing (ICS), Eugene, Oregon, USA (2013)Google Scholar
  7. 7.
    Pennycook, S.J., Hughes, C.J., Smelyanskiy, M., Jarvis, S.A.: Exploring Simd for molecular dynamics, using Intel Xeon processors and Intel Xeon Phi coprocessors. In: 2013 IEEE 27th International Symposium on Parallel & Distributed Processing (IPDPS). Boston, MA, USA (2013)Google Scholar
  8. 8.
    Saule, E., Kaya, K., Catalyurek, U.V.: Performance evaluation of sparse matrix multiplication kernels on Intel Xeon Phi. In: Parallel Processing and Applied Mathematics (2013)Google Scholar
  9. 9.
    Gao, T., Lu, Y., Zhang, B., Suo, G.: Using the intel many integrated core to accelerate graph traversal. Int. J. High Perform. Comput. Appl. 28(3), 255–266 (2014)CrossRefGoogle Scholar
  10. 10.
    Ravi, N., Yang, Y., Bao, T., Chakradhar, S.: Semi-automatic restructuring of offloadable tasks for many-core accelerators. In: Conference on High Performance Computing, Networking, Storage and Analysis (SC). Denver, USA (2013)Google Scholar
  11. 11.
    Reinders, J.: An overview of programming for Intel Xeon processors and Intel Xeon Phi coprocessors. Intel (2012)Google Scholar
  12. 12.
    Bienia, C., Kumar, S., Singh, J.P., Li, K.: The parsec benchmark suite: characterization and architectural implications. In: Proceedings of the 17th International Conference on Parallel Architectures and Compilation Techniques, pp. 72–81 (2008)Google Scholar
  13. 13.
    Bienia, C., Li, K.: Parsec 2.0: a new benchmark suite for chip-multiprocessors. In: Proceedings of the 5th Annual Workshop on Modeling, Benchmarking and Simulation (2009)Google Scholar
  14. 14.
    Molka, D., Hackenberg, D., Schone, R., Mller, M.S.: Memory performance and cache coherency effects on an Intel Nehalem multiprocessor system. In: Proceedings of the 18th International Conference on Parallel Architectures and Compilation Techniques (PACT), pp. 261–270 (2009)Google Scholar
  15. 15.
    Iyer, R., Bhuyan, L.N.: Switch cache: a framework for improving the remote memory access latency of CC-NUMA multiprocessors. In: Proceedings of 5th International Symposium on High-Performance Computer Architecture (1999)Google Scholar
  16. 16.
    Koesterke, L., Boisseau, J., Cazes, J., Milfeld, K., Stanzione, D.: Early experiences with the intel many integrated cores accelerated computing technology. In: Proceedings of the 2011 TeraGrid Conference: Extreme Digital Discovery. Salt Lake City, Utah, USA (2011)Google Scholar
  17. 17.
    Schulz, K.W., Ulerich, R., Malaya, N., Bauman, P.T., Stogner, R.H., Simmons, C.: Early experiences porting scientic applications to the many integrated core (Mic) platform. In: TACC-Intel Highly Parallel Computing Symposium. Austin, TX (2012)Google Scholar
  18. 18.
    Saini, S., Jin, H., Jespersen, D., Feng, H., Djomehri, J., Arasin, W., Hood, R., Mehrotra, P., Biswas, R.: An early performance evaluation of many integrated core architecture based SGI rackable computing system. In: Conference on High Performance Computing, Networking, Storage and Analysis (2013)Google Scholar
  19. 19.
    Rahman, R.: Intel Xeon Phi Coprocessor Architecture and Tools: the Guide for Application Developers (Experts Voice in Microprocessors). Springer, Berlin (2013)Google Scholar
  20. 20.
    Thiagarajan, S.U., Congdon, C., Naik, S., Nguyen, L.Q.: Intel Xeon Phi Coprocessor Developer’s Quick Start Guide". https://software.intel.com/enus/articles/intel-xeon-phi-coprocessor-developers-quick-start-guide
  21. 21.
    Pentium Processor Family Developers Manual Volume 3: Architecture and Programming Manual. vol. 3, no. 241430 (1995)Google Scholar
  22. 22.
    Fang, J., Varbanescu, A.L., Sips, H., Zhang, L., Che, Y., Xu, C.: An Empirical Study of Intel Xeon Phi. arXiv preprint (2013)Google Scholar
  23. 23.
    Luk, C.K., Cohn, R., Muth, R., Patil, H., Klauser, A., Lowney, G., Wallace, S., Reddi, V.J., Hazelwood, K.: Pin: building customized program analysis tools with dynamic instrumentation. In: ACM SIGPLAN Symposium on Programming Language Design & Implementation (PLDI). Chicago, Illinois, USA (2005)Google Scholar
  24. 24.
    Hazelwood, K., Lueck, G., Cohn, R.: Scalable support for multithreaded applications on dynamic binary instrumentation systems. In: Proceedings of the 2009 International Symposium on Memory Management (ISMM), Dublin, Ireland (2009)Google Scholar
  25. 25.
    Shao, Y.S., Brooks, D.: Energy characterization and instructionlevel energy model of Intels Xeon Phi processor. In: 2013 IEEE International Symposium on Low Power Electronics and Design (2013)Google Scholar
  26. 26.
    Czechowski, K., Lee, V.M., Grochowski, E., Ronen, R., Singhal, R., Vuduc, R., Dubey, P.: Improving the energy efficiency of big cores. In: Proceedings of the 41st Annual International Symposium on Computer Architecture (2014)Google Scholar

Copyright information

© Springer International Publishing Switzerland 2015

Authors and Affiliations

  1. 1.Department of Computer Science and TechnologyTsinghua UniversityBeijingChina
  2. 2.Center for Earth System ScienceTsinghua UniversityBeijingChina

Personalised recommendations