
High-Throughput Sockets over RDMA for the Intel Xeon Phi Coprocessor

Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 10524)

Abstract

In this paper we describe the design, implementation and performance of Trans4SCIF, a user-level socket-like transport library for the Intel Xeon Phi coprocessor. The Trans4SCIF library is primarily intended for high-throughput applications. It performs RDMA transfers over the native SCIF support in a way that is transparent to the application, which has the illusion of using conventional stream sockets. We also discuss the integration of Trans4SCIF with the ZeroMQ messaging library, which is used extensively by several applications running at CERN, and show that this can lead to a substantial increase in application throughput, up to 3x, compared to the default TCP/IP transport option.
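
To make the transport model concrete, the sketch below illustrates the general stream-over-RDMA pattern on top of the SCIF API: the sender registers a page-aligned local buffer, RDMA-writes its contents into a window the peer is assumed to have already registered, and then sends a small two-sided message announcing how many bytes arrived. This is a minimal, hypothetical C example of the technique, not the Trans4SCIF implementation; the port number (2000), the remote offset (0) and the notification scheme are assumptions made purely for illustration.

    /* Minimal sketch (not the Trans4SCIF code): sender side of a
       stream-style transfer over SCIF RDMA. Assumes the peer listens
       on SCIF port 2000 and has registered a receive window at remote
       offset 0 that is at least one page long. */
    #include <scif.h>
    #include <stdint.h>
    #include <string.h>

    #define PAGE 0x1000

    int send_chunk(uint16_t peer_node, const void *data, size_t len)
    {
        scif_epd_t epd = scif_open();
        if (epd == SCIF_OPEN_FAILED)
            return -1;

        struct scif_portID dst = { peer_node, 2000 };   /* assumed port */
        if (scif_connect(epd, &dst) < 0) {
            scif_close(epd);
            return -1;
        }

        /* SCIF registration requires page-aligned, page-sized memory. */
        static char buf[PAGE] __attribute__((aligned(PAGE)));
        size_t n = len < sizeof(buf) ? len : sizeof(buf);
        memcpy(buf, data, n);

        off_t loff = scif_register(epd, buf, sizeof(buf), 0,
                                   SCIF_PROT_READ | SCIF_PROT_WRITE, 0);
        if (loff == SCIF_REGISTER_FAILED) {
            scif_close(epd);
            return -1;
        }

        /* RDMA-write the payload into the peer's window (remote offset 0
           is an assumption); SCIF_RMA_SYNC makes the call return only
           after the data has reached the destination. Then notify the
           peer with a small two-sided message carrying the byte count. */
        int rc = 0;
        if (scif_writeto(epd, loff, sizeof(buf), 0, SCIF_RMA_SYNC) < 0 ||
            scif_send(epd, &n, sizeof(n), SCIF_SEND_BLOCK) < 0)
            rc = -1;

        scif_unregister(epd, loff, sizeof(buf));
        scif_close(epd);
        return rc == 0 ? (int)n : -1;
    }

A real stream layer would keep the endpoint and registered windows open across calls and recycle buffers instead of registering per transfer, since registration and connection setup dominate the cost of small messages; the sketch above only shows the data path.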

Keywords

RDMA · Fast data transfer · Stream sockets · Manycore processors · Intel Xeon Phi · ZeroMQ · High performance computing

Acknowledgments

Many thanks for the great support we received from Kristina Gunne, Omar Awile and Luca Atzori from CERN openlab and the CERN IT department.


Copyright information

© Springer International Publishing AG 2017

Authors and Affiliations

  1. Maynooth University, Maynooth, Ireland
  2. CERN, Geneva, Switzerland
  3. University of Thessaly, Volos, Greece