MapReduce Applications in the Cloud: A Cost Evaluation of Computation and Storage

  • Diana Moise
  • Alexandra Carpen-Amarie
Part of the Lecture Notes in Computer Science book series (LNCS, volume 7450)

Abstract

MapReduce is a powerful paradigm that enables rapid implementation of a wide range of distributed data-intensive applications. The Hadoop project, its main open source implementation, has recently been widely adopted by the Cloud computing community. This paper aims to evaluate the cost of moving MapReduce applications to the Cloud, in order to find a proper trade-off between cost and performance for this class of applications. We provide a cost evaluation of running MapReduce applications in the Cloud by looking into two aspects: the overhead implied by the execution of MapReduce jobs in the Cloud, compared to an execution on a Grid, and the actual costs of renting the corresponding Cloud resources. For our evaluation, we compared the runtime of three MapReduce applications executed with the Hadoop framework in two environments: 1) on clusters belonging to the Grid’5000 experimental grid testbed and 2) in a Nimbus Cloud deployed on top of Grid’5000 nodes.
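The two aspects of the evaluation (runtime overhead relative to the Grid, and the cost of renting Cloud resources) can be illustrated with a minimal sketch. It assumes a simple pay-per-instance-hour billing model in which every started hour is charged in full; this model, the function names `overhead_pct` and `rental_cost`, and all numbers below are hypothetical illustrations, not the paper's method, measurements, or any provider's actual prices.

```python
import math


def overhead_pct(cloud_runtime_s: float, grid_runtime_s: float) -> float:
    """Relative slowdown of a Cloud run versus the same job on the Grid, in percent."""
    return (cloud_runtime_s - grid_runtime_s) / grid_runtime_s * 100.0


def rental_cost(runtime_s: float, n_instances: int, price_per_instance_hour: float) -> float:
    """Rental cost under an ASSUMED billing model: each started instance-hour is billed in full."""
    billed_hours = math.ceil(runtime_s / 3600.0)  # partial hours are rounded up
    return n_instances * billed_hours * price_per_instance_hour


if __name__ == "__main__":
    # Hypothetical runtimes for one job: 30 min on the Grid, 37.5 min in the Cloud.
    grid_s, cloud_s = 1800.0, 2250.0
    print(overhead_pct(cloud_s, grid_s))  # 25.0
    # Hypothetical deployment: 10 virtual machines at 0.34 (currency units) per hour.
    print(round(rental_cost(cloud_s, 10, 0.34), 2))  # 3.4
```

Under this billing assumption, a job that finishes just after an hour boundary doubles its rental cost, which is why the cost side of the trade-off does not scale linearly with the runtime overhead.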



Copyright information

© Springer-Verlag Berlin Heidelberg 2012

Authors and Affiliations

  • Diana Moise (1)
  • Alexandra Carpen-Amarie (1)
  1. INRIA Rennes - Bretagne Atlantique / IRISA, France
