
GPU-Accelerated Language and Communication Support by FPGA

Chapter in: Advanced Software Technologies for Post-Peta Scale Computing

Abstract

Although the GPU is one of the most successful accelerator devices for HPC, several issues arise when it is used in large-scale parallel systems. To describe real applications on GPU-ready parallel systems, programmers must combine different programming paradigms, such as CUDA/OpenCL, MPI, and OpenMP, on advanced platforms. On the hardware side, inter-GPU communication must pass through the PCIe channel with CPU assistance, which incurs a large overhead and becomes a bottleneck for overall parallel performance. In the project described in this chapter, we developed an FPGA-based platform that reduces the latency of inter-GPU communication, as well as a PGAS language for distributed-memory programming with accelerator devices such as GPUs. This work provides a new approach to compensating for the hardware and software weaknesses of parallel GPU computing. Moreover, we describe FPGA technology for accelerating both computation and communication in an astrophysical problem where GPU or CPU computation alone does not deliver sufficient performance.
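To illustrate the kind of PGAS programming the chapter describes, the following is a minimal sketch in the style of XcalableACC (XcalableMP data-distribution directives combined with OpenACC offload directives). It is not taken from the chapter itself: the array names, sizes, and the `axpy` routine are invented for illustration, and the code requires an XcalableACC-capable compiler such as Omni, not a plain C compiler. The intent is only to show how one set of directives partitions data across nodes while another offloads the local loop to the GPU, replacing the explicit CUDA + MPI combination the abstract mentions.

```c
/* Illustrative XcalableACC-style sketch (hypothetical example, not from
 * the chapter). Requires an XcalableACC compiler such as Omni. */
#define N 1024

#pragma xmp nodes p[4]              /* 4 nodes */
#pragma xmp template t[N]           /* global index template */
#pragma xmp distribute t[block] onto p

double a[N], b[N];
#pragma xmp align a[i] with t[i]    /* block-distribute a and b */
#pragma xmp align b[i] with t[i]

void axpy(double alpha)
{
    /* Each node iterates only over its own block of t; the OpenACC
     * directive offloads that local loop to the node's GPU. */
#pragma xmp loop on t[i]
#pragma acc parallel loop
    for (int i = 0; i < N; i++)
        a[i] += alpha * b[i];
}
```

Under this model, inter-node data movement for distributed arrays is generated by the compiler and runtime rather than hand-written MPI calls, which is where the FPGA-based communication support described in the chapter can be plugged in beneath the language layer.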


Notes

  1. Before starting this research, we had developed an earlier PCIe-based communication system; the system described here is therefore named the second version.


Author information

Correspondence to Taisuke Boku.


Copyright information

© 2019 Springer Nature Singapore Pte Ltd.

About this chapter


Cite this chapter

Boku, T. et al. (2019). GPU-Accelerated Language and Communication Support by FPGA. In: Sato, M. (ed.) Advanced Software Technologies for Post-Peta Scale Computing. Springer, Singapore. https://doi.org/10.1007/978-981-13-1924-2_15


  • DOI: https://doi.org/10.1007/978-981-13-1924-2_15

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-13-1923-5

  • Online ISBN: 978-981-13-1924-2

  • eBook Packages: Computer Science (R0)
