Skip to main content

Part of the book series: Advances in Intelligent Systems and Computing ((AISC,volume 431))

Abstract

The paper proposes an approach for efficient and flexible parallelization of divide-and-conquer computations using the modern Intel Xeon Phi accelerators. Many real-life problems follow the divide-and-conquer paradigm and consequently generate either balanced or imbalanced trees. The paper proposes an OpenMP multi-threaded implementation of a general framework that requires coding basic divide-and-conquer constructs such as data partitioning, computations and result integration. Mapping computations onto threads is handled by the underlying runtime layer. The paper presents performance results for a parallel adaptive quadrature integration resulting in an irregular and imbalanced tree. It is shown that speed-ups obtained reach around 90 for parallelization of an irregular adaptive integration code compared to maximum speed-ups of 98 for code without thread management at various levels of the divide-and-conquer tree. Results for various thread affinities are shown. The framework, for which the source code is presented, can be easily reused for any other divide-and-conquer application.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Chapter
USD 29.95
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
eBook
USD 84.99
Price excludes VAT (USA)
  • Available as EPUB and PDF
  • Read on any device
  • Instant download
  • Own it forever
Softcover Book
USD 109.99
Price excludes VAT (USA)
  • Compact, lightweight edition
  • Dispatched in 3 to 5 business days
  • Free shipping worldwide - see info

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

Notes

  1. 1.

    http://top500.org.

References

  1. Czarnul, P., Rosciszewski, P.: Optimization of execution time under power consumption constraints in a heterogeneous parallel system with gpus and cpus. In: Distributed Computing and Networking, pp. 66–80. Springer Berlin (2014), Volume 8314 of LNCS

    Google Scholar 

  2. Jeffers, J., Reinders, J.: Intel Xeon Phi Coprocessor High Performance Programming. Newnes, New South Wales (2013)

    Google Scholar 

  3. Rugina, R., Rinard, M.: Automatic parallelization of divide and conquer algorithms. In: Proceedings of the Seventh ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, pp. 72–83. PPoPP ’99, New York, NY, USA, ACM (1999)

    Google Scholar 

  4. Freisleben, B., Kielmann, T.: Automatic parallelization of divide-and-conquer algorithms. In: Parallel Processing: CONPAR 92—VAPP V, pp. 849–850. Springer, Berlin (1992), Volume 634 of Lecture Notes in Computer Science

    Google Scholar 

  5. Czarnul, P.: Programming, tuning and automatic parallelization of irregular divide-and-conquer applications in Dampvm/DAC. Int. J. High Perform. Comput. Appl. 17, 77–93 (2003)

    Google Scholar 

  6. Eriksson, M.V., Keßler, C.W., Chalabine, M.: Load balancing of irregular parallel divideand-conquer algorithms in group-spmd programming environments. In: ARCS Workshops, pp. 313–322. GI (2006), Volume 81 of LNI

    Google Scholar 

  7. Intel: Intel cilk plus language specification (2010) ver. 0.9. http://www.cilkplus.org/sites/default/files/open_specifications/cilk_plus_language_specification_0_9.pdf

  8. Michaela, M., Byckling, M., Ilieva, N., Saarinen, S., Schliephake, M., Weinberg, V.: Best practice guide intel xeon phi v1.1, PRACE, 7 Capacities (2014)

    Google Scholar 

  9. Saule, E., Kaya, K., Çatalyürek, Ü.V.: Performance evaluation of sparse matrix multiplication kernels on intel xeon phi. CoRR abs/1302.1078 (2013)

    Google Scholar 

  10. Ramachandran, A., Vienne, J., der Wijngaart, R.A., Koesterke, L., Sharapov, I.: Performance evaluation of nas parallel benchmarks on intel xeon phi. In: IEEE ICPP, pp. 736–743 (2013)

    Google Scholar 

  11. Lima, J.V., Broquedis, F., Gautier, T., Raffin, B.: Preliminary experiments with xkaapi on Intel xeon phi coprocessor. In: Symposium on Computer Architecture and High Performance Computing

    Google Scholar 

  12. Eisenlor, J., Hudak, D., Tomko, K., Prince, T.: Dense linear algebra factorization in OpenMP and Cilk Plus on Intel MIC: development experiences and performance analysis. In: TACCIntel Highly Parallel Computing Symposium (2012)

    Google Scholar 

  13. Wu, Q., Yang, C., Tang, T., Xiao, L.: Mic acceleration of short-range molecular dynamics simulations. In: Proceedings of the First International Workshop on Code OptimiSation for MultI and Many Cores. COSMIC ’13, New York, NY, USA, ACM (2013) 2:1–2:8

    Google Scholar 

  14. Cramer, T., Schmidl, D., Klemm, M., Mey, D.: Openmp programming on intel xeon phi coprocessors: An early performance comparison. In: Proceedings of the Many-core Applications Research Community Symposium at RWTH Aachen University, pp. 38–44 (2012)

    Google Scholar 

  15. Schmidl, D., Cramer, T., Wienke, S., Terboven, C., Müller, M.: Assessing the performance of openmp programs on the intel xeon phi. In: Euro-Par 2013 Parallel Processing, pp. 547–559. Springer, Berlin (2013), Volume 8097 of LNCS

    Google Scholar 

  16. Reinders, J.: An overview of programming for intel xeon processors and intel xeon phi coprocessors, Intel. https://software.intel.com/sites/default/files/article/330164/an-overview-of-programming-for-intel-xeon-processors-and-intel-xeon-phi-coprocessors_1.pdf (2012)

  17. Green, R.W.: Openmp* thread affinity control. compiler methodology for intelR mic architecture. https://software.intel.com/en-us/articles/openmp-thread-affinity-control (2012)

Download references

Acknowledgments

The work was supported by Intel within grant “Intel Phi Parallel Processing Lab”. The author wishes to thank Intel, especially Marek Zmuda, Marek Piosik and Robert Bard from Intel for provision of Xeon Phi equipment, literature and needed support.

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Paweł Czarnul .

Editor information

Editors and Affiliations

Rights and permissions

Reprints and permissions

Copyright information

© 2016 Springer International Publishing Switzerland

About this paper

Cite this paper

Czarnul, P. (2016). Parallelization of Divide-and-Conquer Applications on Intel Xeon Phi with an OpenMP Based Framework. In: Świątek, J., Borzemski, L., Grzech, A., Wilimowska, Z. (eds) Information Systems Architecture and Technology: Proceedings of 36th International Conference on Information Systems Architecture and Technology – ISAT 2015 – Part III. Advances in Intelligent Systems and Computing, vol 431. Springer, Cham. https://doi.org/10.1007/978-3-319-28564-1_9

Download citation

  • DOI: https://doi.org/10.1007/978-3-319-28564-1_9

  • Published:

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-28562-7

  • Online ISBN: 978-3-319-28564-1

  • eBook Packages: EngineeringEngineering (R0)

Publish with us

Policies and ethics