Abstract
Intel Xeon Phi accelerators are among the newest devices used in parallel computing. However, comparatively few studies examine their performance with most existing parallelization techniques. One such technique is thread-level speculation, which optimistically tries to extract parallelism from loops without requiring a compile-time analysis that guarantees the loop can be executed in parallel. In this article we evaluate the performance delivered by an Intel Xeon Phi coprocessor when a state-of-the-art software thread-level speculative parallelization library is used to execute well-known benchmarks. We describe both the internal characteristics of the Xeon Phi platform and the particularities of the thread-level speculation library used in the evaluation. Our results show that the Xeon Phi scales relatively well in comparison with a shared-memory architecture. However, when vectorization and SIMD instructions are not fully exploited, the relatively low computing power of its computational units makes this first generation of Xeon Phi architectures uncompetitive, in terms of absolute performance, with conventional multicore systems for the execution of speculatively parallelized code.
Notes
This issue can be addressed by the programmer, or by the use of specific compilers such as [4].
A thread is the simplest unit of execution, intended to process a specific piece of code. A block is a group of threads that can be executed concurrently or sequentially, in no particular order. Within a block, threads can be coordinated through barriers. A grid is a group of blocks, with no possible synchronization among them.
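The hierarchy in this note can be illustrated with a minimal sketch. It is a plain-Python analogy built on the standard `threading` module, not GPU code, and the names `run_grid`, `run_block`, and `run_thread` are hypothetical. Each emulated block owns a private barrier and scratch array, so its threads can coordinate with each other, while blocks share no synchronization primitive at all.

```python
import threading


def run_grid(data, num_blocks, threads_per_block):
    """Emulate a grid: every block doubles and sums its own slice of `data`."""
    block_sums = [0] * num_blocks  # one result slot per block

    def run_block(block_id):
        # Block-local state: visible only to the threads of this block.
        shared = [0] * threads_per_block
        barrier = threading.Barrier(threads_per_block)

        def run_thread(thread_id):
            global_id = block_id * threads_per_block + thread_id
            if global_id < len(data):
                shared[thread_id] = data[global_id] * 2
            # Intra-block barrier: wait until every thread has written its slot.
            barrier.wait()
            if thread_id == 0:
                # Safe only because of the barrier above.
                block_sums[block_id] = sum(shared)

        threads = [
            threading.Thread(target=run_thread, args=(t,))
            for t in range(threads_per_block)
        ]
        for t in threads:
            t.start()
        for t in threads:
            t.join()

    # Blocks run independently, in no particular order, with no barrier
    # or other synchronization among them.
    blocks = [
        threading.Thread(target=run_block, args=(b,))
        for b in range(num_blocks)
    ]
    for b in blocks:
        b.start()
    for b in blocks:
        b.join()
    return block_sums
```

For example, `run_grid(list(range(10)), 3, 4)` launches three blocks of four threads each. The last block has only two elements to process, so its remaining threads contribute zero.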
References
AMD Opteron™ 6300 Series processor - quick reference guide. https://www.amd.com/Documents/Opteron_6300_QRG.pdf. Accessed June 2015
Intel® Xeon Phi™ product family: Product brief. https://www-ssl.intel.com/content/dam/www/public/us/en/documents/product-briefs/high-performance-xeon-phi-coprocessor-brief.pdf. Accessed June 2015
Intel® Xeon Phi™ coprocessor instruction set architecture reference manual. https://software.intel.com/sites/default/files/forum/278102/327364001en.pdf. Accessed June 2015
Aldea, S., Estebanez, A., Llanos, D., Gonzalez-Escribano, A.: An OpenMP extension that supports thread-level speculation. IEEE Trans. Parallel Distrib. Syst. PP(99), 1–1 (2015). doi:10.1109/TPDS.2015.2393870
Barnes, J.E.: TREE. Institute for Astronomy. University of Hawaii (1997). ftp://hubble.ifa.hawaii.edu/pub/barnes/treecode/
Cadambi, S., Coviello, G., Li, C.H., Phull, R., Rao, K., Sankaradass, M., Chakradhar, S.: Cosmic: middleware for high performance and reliable multiprocessing on Xeon Phi coprocessors. In: Proceedings of the 22nd International Symposium on High-Performance Parallel and Distributed Computing, HPDC ’13, pp. 215–226. ACM, New York (2013). doi:10.1145/2462902.2462921
Cai, P., Cai, Y., Chandrasekaran, I., Zheng, J.: A GPU-enabled parallel genetic algorithm for path planning of robotic operators. In: Cai, Y., See, S. (eds.) GPU Comput. Appl., pp. 1–13. Springer, Singapore (2015). doi:10.1007/978-981-287-134-3_1
Cintra, M., Llanos, D.R.: Toward efficient and robust software speculative parallelization on multiprocessors. In: Proceedings of the SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP) (2003)
Cintra, M., Llanos, D.R.: Design space exploration of a software speculative parallelization scheme. IEEE Trans. Parallel Distrib. Syst. 16(6), 562–576 (2005)
Clarkson, K.L., Mehlhorn, K., Seidel, R.: Four results on randomized incremental constructions. Comput. Geom. Theory Appl. 3(4), 185–212 (1993)
Cramer, T., Schmidl, D., Klemm, M., an Mey, D.: OpenMP programming on Intel Xeon Phi coprocessors: An early performance comparison. In: Proceedings of the Many-core Applications Research Community (MARC) Symposium (2012)
Dagum, L., Menon, R.: OpenMP: an industry standard API for shared-memory programming. IEEE Comput. Sci. Eng. 5(1), 46–55 (1998). doi:10.1109/99.660313
Devroye, L., Mücke, E.P., Zhu, B.: A note on point location in Delaunay triangulations of random points. Algorithmica 22, 477–482 (1998)
Dou, J., Cintra, M.: Compiler estimation of load imbalance overhead in speculative parallelization. In: Proceedings of the 13th International Conference on Parallel Architectures and Compilation Techniques, PACT ’04. IEEE Computer Society, Washington, DC (2004)
Estebanez, A., Llanos, D., Gonzalez-Escribano, A.: New data structures to handle speculative parallelization at runtime. Int. J. Parallel Program. 1–20 (2015). doi:10.1007/s10766-014-0347-0
Fang, J., Sips, H., Zhang, L., Xu, C., Che, Y., Varbanescu, A.L.: Test-driving Intel Xeon Phi. In: Proceedings of the 5th ACM/SPEC International Conference on Performance Engineering, ICPE ’14, pp. 137–148. ACM, New York (2014). doi:10.1145/2568088.2576799
Franklin, M., Sohi, G.S.: ARB: a hardware mechanism for dynamic reordering of memory references. IEEE Trans. Comput. 45(5), 552–571 (1996). doi:10.1109/12.509907
Gao, L., Li, L., Xue, J., Yew, P.C.: SEED: a statically-greedy and dynamically-adaptive approach for speculative loop execution. IEEE Trans. Comput. 62(5), 1004–1016 (2013)
Gopal, S., Vijaykumar, T.N., Smith, J., Sohi, G.: Speculative versioning cache. In: High-Performance Computer Architecture, 1998. Proceedings, 1998 Fourth International Symposium on, pp. 195–205 (1998). doi:10.1109/HPCA.1998.650559
Jeffers, J., Reinders, J.: Intel Xeon Phi Coprocessor High-Performance Programming. Newnes, Boston (2013)
Jimborean, A., Clauss, P., Dollinger, J.F., Loechner, V., Martinez Caamaño, J.: Dynamic and speculative polyhedral parallelization using compiler-generated skeletons. Int. J. Parallel Program. 42(4), 529–545 (2014)
Kelsey, K., Bai, T., Ding, C., Zhang, C.: Fast track: a software system for speculative program optimization. In: Proceedings of the 7th Annual IEEE/ACM International Symposium on Code Generation and Optimization, CGO ’09, pp. 157–168. IEEE Computer Society, Washington, DC (2009). doi:10.1109/CGO.2009.18
Khronos: Open Computing Language (OpenCL) (2010). http://www.khronos.org/opencl/. Accessed 2 Dec 2013
Krishnan, V., Torrellas, J.: A chip-multiprocessor architecture with speculative multithreading. IEEE Trans. Comput. 48(9), 866–880 (1999)
Kulkarni, M., Pingali, K., Walter, B., Ramanarayanan, G., Bala, K., Chew, L.P.: Optimistic parallelism requires abstractions. In: PLDI 2007 Proceedings. ACM (2007)
Kulkarni, M., Pingali, K., Walter, B., Ramanarayanan, G., Bala, K., Chew, L.P.: Optimistic parallelism requires abstractions. Commun. ACM 52(9), 89–97 (2009)
Liu, X., Smelyanskiy, M., Chow, E., Dubey, P.: Efficient sparse matrix-vector multiplication on x86-based many-core processors. In: Proceedings of the 27th International ACM Conference on International Conference on Supercomputing, ICS ’13, pp. 273–282. ACM, New York (2013). doi:10.1145/2464996.2465013
Marcuello, P., Gonzalez, A., Tubella, J.: Speculative multithreaded processors. In: Proceedings of the 12th International Conference on Supercomputing, ICS ’98. ACM, New York (1998)
Mücke, E.P., Saias, I., Zhu, B.: Fast randomized point location without preprocessing in two- and three-dimensional Delaunay triangulations. In: SoCG ’96 Proceedings, pp. 274–283 (1996)
NVIDIA: NVIDIA CUDA Architecture Introduction and Overview Version 1.1 (2009)
Oancea, C.E., Mycroft, A., Harris, T.: A lightweight in-place implementation for software thread-level speculation. In: Proceedings of the Twenty-First Annual Symposium on Parallelism in Algorithms and Architectures, SPAA ’09. ACM, New York (2009)
Olsen, S., Romoser, B., Zong, Z.: SQLPhi: A SQL-based database engine for Intel Xeon Phi coprocessors. In: Proceedings of the 2014 International Conference on Big Data Science and Computing, BigDataScience ’14, pp. 17:1–17:6. ACM, New York (2014). doi:10.1145/2640087.2644172
Park, J., Bikshandi, G., Vaidyanathan, K., Tang, P.T.P., Dubey, P., Kim, D.: Tera-scale 1D FFT with low-communication algorithm and Intel Xeon Phi coprocessors. In: Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis, SC ’13, pp. 34:1–34:12. ACM, New York (2013). doi:10.1145/2503210.2503242
Raman, E., Vachharajani, N., Rangan, R., August, D.I.: Spice: speculative parallel iteration chunk execution. In: Proceedings of the 6th Annual IEEE/ACM International Symposium on Code Generation and Optimization, CGO ’08. ACM, New York (2008)
Rauchwerger, L., Padua, D.: The LRPD test: speculative run-time parallelization of loops with privatization and reduction parallelization (1995). doi:10.1145/207110.207148
Rezaei, A., Coviello, G., Li, C.H., Chakradhar, S., Mueller, F.: Snapify: capturing snapshots of offload applications on Xeon Phi manycore processors. In: Proceedings of the 23rd International Symposium on High-Performance Parallel and Distributed Computing, HPDC ’14, pp. 1–12. ACM, New York (2014). doi:10.1145/2600212.2600215
Rotenberg, E., Bennett, S., Smith, J.E.: Trace cache: a low latency approach to high bandwidth instruction fetching. In: Proceedings of the 29th Annual ACM/IEEE International Symposium on Microarchitecture. MICRO 29, pp. 24–35. IEEE Computer Society, Washington, DC (1996)
Satish, N., Kim, C., Chhugani, J., Saito, H., Krishnaiyer, R., Smelyanskiy, M., Girkar, M., Dubey, P.: Can traditional programming bridge the ninja performance gap for parallel computing applications? In: Proceedings of the 39th Annual International Symposium on Computer Architecture, ISCA ’12, pp. 440–451. IEEE Computer Society, Washington, DC (2012). http://dl.acm.org/citation.cfm?id=2337159.2337210
Schmidl, D., Cramer, T., Wienke, S., Terboven, C., Müller, M.: Assessing the performance of OpenMP programs on the Intel Xeon Phi. In: Wolf, F., Mohr, B., an Mey, D. (eds.) Euro-Par 2013 Parallel Processing, Lecture Notes in Computer Science, vol. 8097, pp. 547–558. Springer, Heidelberg (2013). doi:10.1007/978-3-642-40047-6_56
Sohi, G.S., Breach, S.E., Vijaykumar, T.N.: Multiscalar processors. In: Proceedings of the 22nd Annual International Symposium on Computer Architecture, ISCA ’95, pp. 414–425. ACM, New York (1995). doi:10.1145/223982.224451
Tian, C., Feng, M., Gupta, R.: Supporting speculative parallelization in the presence of dynamic data structures. In: Proceedings of the 2010 ACM SIGPLAN Conference on Programming Language Design and Implementation, PLDI ’10. ACM, New York (2010)
Tian, C., Feng, M., Nagarajan, V., Gupta, R.: Copy or discard execution model for speculative parallelization on multicores. In: Proceedings of the 41st Annual IEEE/ACM International Symposium on Microarchitecture, MICRO ’41. Washington, DC (2008)
Walker, D.W.: The design of a standard message passing interface for distributed memory concurrent computers. Parallel Comput. 20(4), 657–673 (1994). http://portal.acm.org/citation.cfm?id=180103
Wallace, S., Calder, B., Tullsen, D.M.: Threaded multiple path execution. In: Proceedings of the 25th Annual International Symposium on Computer Architecture, ISCA ’98, pp. 238–249. IEEE Computer Society, Washington, DC (1998). doi:10.1145/279358.279392
Yiapanis, P., Rosas-Ham, D., Brown, G., Luján, M.: Optimizing software runtime systems for speculative parallelization. ACM Trans. Archit. Code Optim. 9(4), 39:1–39:27 (2013)
Zhao, Z., Wu, B., Shen, X.: Speculative parallelization needs rigor: probabilistic analysis for optimal speculation of finite-state machine applications. In: Proceedings of the 21st International Conference on Parallel Architectures and Compilation Techniques, PACT ’12. New York (2012)
Acknowledgments
This research is partly supported by the Castilla-Leon Regional Government (VA172A12-2); MICINN (Spain) and the European Union FEDER (MOGECOPP project TIN2011-25639, HomProg-HetSys project TIN2014-58876-P, CAPAP-H5 network TIN2014-53522-REDT).
Cite this article
Estebanez, A., Llanos, D.R. & Gonzalez-Escribano, A. Using the Xeon Phi Platform to Run Speculatively-Parallelized Codes. Int J Parallel Prog 45, 225–241 (2017). https://doi.org/10.1007/s10766-016-0421-x
DOI: https://doi.org/10.1007/s10766-016-0421-x