
Generic Algorithmic Scheme for 2D Stencil Applications on Hybrid Machines

  • Conference paper
Architecture of Computing Systems – ARCS 2016 (ARCS 2016)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 9637)

Abstract

Hardware accelerators are classic scientific coprocessors in HPC machines. However, the number of CPU cores on the motherboard keeps increasing and constitutes a non-negligible part of the machine's total computing power. Running an application both on an accelerator (such as a GPU or a Xeon Phi device) and on the CPU cores can therefore provide the highest performance. Moreover, it is now possible to include different accelerators in a single machine in order to support and speed up a larger set of applications. Running each part of an application on its most suitable device yields high performance, but also using the otherwise idle devices of the machine should improve the performance of that part even further. However, overlapping computations with inter-device data transfers is mandatory to limit the overhead of this approach, which leads to complex asynchronous algorithms and multi-paradigm optimized codes. This article introduces our research and experiments on cooperation between several CPUs, a GPU, and a Xeon Phi accelerator, all included in the same machine.
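The overlap of computations with inter-device transfers can be illustrated on a 2D stencil. What follows is a minimal sketch under stated assumptions, not the authors' algorithm: a 5-point Jacobi stencil stands in for the application, each device receives a horizontal band of rows proportional to a made-up relative speed, and threads of a `ThreadPoolExecutor` only simulate asynchronous inter-device copies. The names `split_rows`, `relax_rows`, `send`, `hybrid_jacobi` and any speed values are hypothetical. Each simulated device computes its two border rows first, launches the halo sends, and then computes its interior rows while those sends are in flight.

```python
"""Hypothetical sketch: a 2D 5-point Jacobi stencil split into row bands over
simulated devices, with halo sends overlapped with interior computation."""
from concurrent.futures import ThreadPoolExecutor


def split_rows(first, last, speeds):
    """Split rows first..last-1 into bands proportional to assumed device speeds."""
    n, total, cum, bounds = last - first, sum(speeds), 0, []
    for s in speeds:
        lo = first + n * cum // total
        cum += s
        bounds.append((lo, first + n * cum // total))
    assert all(hi > lo for lo, hi in bounds), "each device needs at least one row"
    return bounds


def relax_rows(old, new, rows):
    """Jacobi update of the given local rows; columns 0 and w-1 stay fixed."""
    for i in rows:
        for j in range(1, len(old[i]) - 1):
            new[i][j] = 0.25 * (old[i - 1][j] + old[i + 1][j]
                                + old[i][j - 1] + old[i][j + 1])


def send(row, dst_buf, dst_index):
    """Simulated inter-device transfer: copy a border row into a neighbor's ghost row."""
    dst_buf[dst_index] = row[:]


def hybrid_jacobi(grid, speeds, iters):
    """Run `iters` Jacobi sweeps, one row band per simulated device."""
    bands = split_rows(1, len(grid) - 1, speeds)  # rows 0 and n-1 are fixed
    # Each local buffer holds the device's own rows plus one ghost row above/below.
    old = [[r[:] for r in grid[lo - 1:hi + 1]] for lo, hi in bands]
    new = [[r[:] for r in buf] for buf in old]
    with ThreadPoolExecutor(max_workers=2) as pool:
        for _ in range(iters):
            pending = []
            for k, (cur, nxt) in enumerate(zip(old, new)):
                last = len(cur) - 2                 # local index of last own row
                relax_rows(cur, nxt, {1, last})     # 1) border rows first
                # 2) start halo sends into the neighbors' *next* buffers,
                #    whose ghost rows nobody reads during this sweep
                if k > 0:
                    pending.append(pool.submit(send, nxt[1], new[k - 1], -1))
                if k < len(old) - 1:
                    pending.append(pool.submit(send, nxt[last], new[k + 1], 0))
                relax_rows(cur, nxt, range(2, last))  # 3) interior overlaps sends
            for f in pending:                       # 4) wait before swapping
                f.result()
            old, new = new, old
    # Reassemble the global grid from each device's own rows.
    out = [grid[0][:]]
    for buf in old:
        out.extend(r[:] for r in buf[1:-1])
    out.append(grid[-1][:])
    return out
```

With hypothetical speeds such as `[1, 3, 2]` for a CPU, a GPU, and a Xeon Phi, `hybrid_jacobi(grid, [1, 3, 2], 20)` should match a plain single-domain Jacobi sweep, because every halo lands in the neighbor's next-iteration buffer. Under CPython's GIL the threads bring no real speedup; the sketch only shows the ordering (borders, sends, interior, wait) that lets real device transfers hide behind interior computation.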

Author information

Corresponding author

Correspondence to Stephane Vialle.

Copyright information

© 2016 Springer International Publishing Switzerland

About this paper

Cite this paper

Vialle, S., Contassot-Vivier, S., Mercier, P. (2016). Generic Algorithmic Scheme for 2D Stencil Applications on Hybrid Machines. In: Hannig, F., Cardoso, J.M.P., Pionteck, T., Fey, D., Schröder-Preikschat, W., Teich, J. (eds) Architecture of Computing Systems – ARCS 2016. ARCS 2016. Lecture Notes in Computer Science, vol 9637. Springer, Cham. https://doi.org/10.1007/978-3-319-30695-7_9

  • DOI: https://doi.org/10.1007/978-3-319-30695-7_9

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-30694-0

  • Online ISBN: 978-3-319-30695-7

  • eBook Packages: Computer Science, Computer Science (R0)