MRBS: Towards Dependability Benchmarking for Hadoop MapReduce

  • Amit Sangroya
  • Damián Serrano
  • Sara Bouchenak
Part of the Lecture Notes in Computer Science book series (LNCS, volume 7640)

Abstract

MapReduce is a popular programming model for distributed data processing. Extensive research has been conducted on the reliability of MapReduce, ranging from adaptive and on-demand fault-tolerance to new fault-tolerance models. However, realistic benchmarks are still missing to analyze and compare the effectiveness of these proposals. To date, most MapReduce fault-tolerance solutions have been evaluated using microbenchmarks in an ad-hoc and overly simplified setting, which may not be representative of real-world applications. This paper presents MRBS, a comprehensive benchmark suite for evaluating the dependability of MapReduce systems. MRBS includes five benchmarks covering several application domains and a wide range of execution scenarios such as data-intensive vs. compute-intensive applications, or batch applications vs. online interactive applications. MRBS allows to inject various types of faults at different rates and produces extensive reliability, availability and performance statistics. The paper illustrates the use of MRBS with Hadoop clusters.

Keywords

Benchmark Dependability MapReduce Hadoop 

Preview

Unable to display preview. Download preview PDF.

Unable to display preview. Download preview PDF.

References

  1. 1.
    Dean, J., Ghemawat, S.: MapReduce: Simplified Data Processing on Large Clusters. In: OSDI (2004)Google Scholar
  2. 2.
    Apache Hadoop, http://hadoop.apache.org
  3. 3.
    Fadika, Z., Govindaraju, M.: LEMO-MR: Low Overhead and Elastic MapReduce Implementation Optimized for Memory and CPU-Intensive Applications. In: IEEE CloudCom (2010)Google Scholar
  4. 4.
    Ananthanarayanan, G., Agarwal, S., Kandula, S., Greenberg, A., Stoica, I., Harlan, D., Harris, E.: Scarlett: Coping with Skewed Content Popularity in MapReduce Clusters. In: European Conf. on Computer Systems (EuroSys) (2011)Google Scholar
  5. 5.
    Eltabakh, M., Tian, Y., Ozcan, F., Gemulla, R., Krettek, A., McPherson, J.: CoHadoop: Flexible Data Placement and Its Exploitation in Hadoop. In: VLDB (2011)Google Scholar
  6. 6.
    Jin, H., Yang, X., Sun, X.H., Raicu, I.: ADAPT: Availability-Aware MapReduce Data Placement in Non-Dedicated Distributed Computing Environment. In: ICDCS (2012)Google Scholar
  7. 7.
    Lin, H., Ma, X., Archuleta, J., Feng, W.C., Gardner, M., Zhang, Z.: MOON: MapReduce On Opportunistic eNvironments. In: HPDC (2010)Google Scholar
  8. 8.
    Bessani, A.N., Cogo, V.V., Correia, M., Costa, P., Pasin, M., Silva, F., Arantes, L., Marin, O., Sens, P., Sopena, J.: Making Hadoop MapReduce Byzantine Fault-Tolerant. In: DSN, Fast abstract (2010)Google Scholar
  9. 9.
    Ko, S.Y., Hoque, I., Cho, B., Gupta, I.: Making Cloud Intermediate Data Fault-Tolerant. In: ACM Symp. on Cloud Computing (SoCC) (2010)Google Scholar
  10. 10.
    Liu, H., Orban, D.: Cloud MapReduce: A MapReduce Implementation on Top of a Cloud Operating System. In: CCGRID (2011)Google Scholar
  11. 11.
    Sangroya, A., Serrano, D., Bouchenak, S.: MRBS: A Comprehensive MapReduce Benchmark Suite. Research Report RR-LIG-024, LIG, Grenoble, France (February 2012)Google Scholar

Copyright information

© Springer-Verlag Berlin Heidelberg 2013

Authors and Affiliations

  • Amit Sangroya
    • 1
  • Damián Serrano
    • 1
  • Sara Bouchenak
    • 1
  1. 1.University of Grenoble - LIG - INRIAGrenobleFrance

Personalised recommendations