MapReduce over Lustre: Can RDMA-Based Approach Benefit?

  • Md. Wasi-ur Rahman
  • Xiaoyi Lu
  • Nusrat Sharmin Islam
  • Raghunath Rajachandrasekar
  • Dhabaleswar K. (DK) Panda
Part of the Lecture Notes in Computer Science book series (LNCS, volume 8632)

Abstract

MapReduce is increasingly being deployed on High Performance Computing (HPC) clusters. Several studies have shown that MapReduce jobs can run faster in these clusters by leveraging high-performance interconnects such as InfiniBand together with additional performance-enhancing features. RDMA-enhanced MapReduce has been shown to run faster over the Hadoop Distributed File System (HDFS). However, its efficiency over the parallel file systems used in HPC clusters has yet to be explored. In this paper, we present a complete methodology for evaluating MapReduce over the Lustre file system, which provides insights into how the different system components of an HPC cluster interact. Our performance evaluation shows that, compared to the default architecture, RDMA-enhanced MapReduce achieves significant benefits in execution time (49% on a 128-node HPC cluster) and in resource utilization. To the best of our knowledge, this is the first attempt to evaluate RDMA-enhanced MapReduce over the Lustre file system on HPC clusters.

Keywords

MapReduce, RDMA, Lustre, HPC Clusters

Copyright information

© Springer International Publishing Switzerland 2014

Authors and Affiliations

  • Md. Wasi-ur Rahman (1)
  • Xiaoyi Lu (1)
  • Nusrat Sharmin Islam (1)
  • Raghunath Rajachandrasekar (1)
  • Dhabaleswar K. (DK) Panda (1)

  1. Department of Computer Science and Engineering, The Ohio State University, USA