Abstract
The Intel® Xeon Phi™ coprocessor platform enables offload of computation from a host processor to a coprocessor that is a fully functional Intel® Architecture CPU. This paper presents the C/C++ and Fortran compiler offload runtime for that coprocessor. The paper addresses why offload to a coprocessor is useful, how it is specified, and under what conditions offload is profitable. It also serves as a guide for potential third-party developers of offload runtimes, such as a gcc-based offload compiler, ports of existing commercial offloading compilers such as CAPS® to the Intel® Xeon Phi™ coprocessor, and third-party offload library vendors that Intel is working with, such as NAG® and MAGMA®. It describes the software architecture and design of the offload compiler runtime. It enumerates the key performance features of this heterogeneous computing stack, related to initialization, data movement, and invocation. Finally, it evaluates the performance impact of those features on a set of directed micro-benchmarks and larger workloads.
For more complete information about compiler optimizations, see Intel’s Optimization Notice at http://software.intel.com/en-us/articles/optimization-notice
Copyright information
© 2013 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Newburn, C.J. et al. (2013). Offload Compiler Runtime for the Intel® Xeon Phi™ Coprocessor. In: Kunkel, J.M., Ludwig, T., Meuer, H.W. (eds) Supercomputing. ISC 2013. Lecture Notes in Computer Science, vol 7905. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-38750-0_18
DOI: https://doi.org/10.1007/978-3-642-38750-0_18
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-38749-4
Online ISBN: 978-3-642-38750-0
eBook Packages: Computer Science (R0)