An RDMA Middleware for Asynchronous Multi-stage Shuffling in Analytical Processing

  • Rui C. Gonçalves
  • José Pereira
  • Ricardo Jiménez-Peris
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 9687)


A key component in large-scale distributed analytical processing is shuffling: the distribution of data to multiple nodes so that the computation can be done in parallel. In this paper we describe the design and implementation of a communication middleware that supports data shuffling for executing multi-stage analytical processing operations in parallel. The middleware relies on RDMA (Remote Direct Memory Access) to provide basic operations for asynchronously exchanging data among multiple machines. Experimental results show that the RDMA-based middleware can reduce the cost of communication operations in parallel analytical processing tasks by 75% when compared with a sockets-based middleware.
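As a rough illustration of what shuffling means here, the following minimal sketch hash-partitions rows so that all rows sharing a key are assigned to the same node. It is not the middleware described in the paper: it runs in a single process, involves no RDMA or networking, and the function name `shuffle_by_key`, the row layout, and the choice of CRC32 as the hash are assumptions made only for this example.

```python
import zlib
from collections import defaultdict


def shuffle_by_key(rows, num_nodes):
    """Assign each (key, value) row to a node index by hashing its key.

    Hypothetical helper for illustration only; a real shuffle would send
    each partition to a remote machine instead of keeping it in a dict.
    """
    partitions = defaultdict(list)
    for key, value in rows:
        # CRC32 is used because it is deterministic across runs,
        # unlike Python's built-in hash() for strings.
        node = zlib.crc32(str(key).encode("utf-8")) % num_nodes
        partitions[node].append((key, value))
    return dict(partitions)


rows = [("a", 1), ("b", 2), ("a", 3), ("c", 4)]
parts = shuffle_by_key(rows, num_nodes=3)
for node, assigned in sorted(parts.items()):
    print(node, assigned)
```

Because every row with the same key lands in the same partition, each node could then process its keys (for example, a join or an aggregation) without consulting the others.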


Keywords: Distributed databases, OLAP, Middleware, RDMA



This research has been partially funded by the European Commission under projects CoherentPaaS and LeanBigData (grants FP7-611068, FP7-619606), the Madrid Regional Council, FSE and FEDER, project Cloud4BigData (grant S2013TIC-2894), the Spanish Research Agency MICIN project BigDataPaaS (grant TIN2013-46883), and the ERDF – European Regional Development Fund through the Operational Programme for Competitiveness and Internationalisation – COMPETE 2020 Programme and by National Funds through the FCT – Fundação para a Ciência e a Tecnologia (Portuguese Foundation for Science and Technology) within project POCI-01-0145-FEDER-006961.



Copyright information

© IFIP International Federation for Information Processing 2016

Authors and Affiliations

  • Rui C. Gonçalves (1)
  • José Pereira (1)
  • Ricardo Jiménez-Peris (2)
  1. HASLab, INESC TEC & U. Minho, Braga, Portugal
  2. Univ. Politécnica de Madrid & LeanXcale, Madrid, Spain
