MapReduce Applications in the Cloud: A Cost Evaluation of Computation and Storage

  • Conference paper
Data Management in Cloud, Grid and P2P Systems (Globe 2012)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 7450)

Abstract

MapReduce is a powerful paradigm that enables rapid implementation of a wide range of distributed data-intensive applications. The Hadoop project, its main open source implementation, has recently been widely adopted by the Cloud computing community. This paper aims to evaluate the cost of moving MapReduce applications to the Cloud, in order to find a proper trade-off between cost and performance for this class of applications. We provide a cost evaluation of running MapReduce applications in the Cloud by looking into two aspects: the overhead implied by the execution of MapReduce jobs in the Cloud, compared to an execution on a Grid, and the actual costs of renting the corresponding Cloud resources. For our evaluation, we compared the runtime of three MapReduce applications executed with the Hadoop framework in two environments: 1) on clusters belonging to the Grid’5000 experimental grid testbed and 2) in a Nimbus Cloud deployed on top of Grid’5000 nodes.
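
The two quantities named in the abstract, the runtime overhead of a Cloud execution relative to a Grid execution and the cost of renting the corresponding resources, can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the function names, the per-started-hour billing model, and the numeric values are hypothetical and are not taken from the paper or from any provider's price list.

```python
import math


def relative_overhead(grid_runtime_s: float, cloud_runtime_s: float) -> float:
    """Fractional slowdown of the Cloud run relative to the Grid run."""
    return (cloud_runtime_s - grid_runtime_s) / grid_runtime_s


def rental_cost(runtime_s: float, num_instances: int, price_per_hour: float) -> float:
    """Rental cost, assuming each instance is billed per started hour."""
    billed_hours = math.ceil(runtime_s / 3600)
    return billed_hours * num_instances * price_per_hour


if __name__ == "__main__":
    # Hypothetical runtimes and prices, chosen only to exercise the functions.
    grid_s, cloud_s = 1000.0, 1150.0
    print(f"overhead: {relative_overhead(grid_s, cloud_s):.0%}")
    print(f"cost: {rental_cost(cloud_s, num_instances=10, price_per_hour=0.5):.2f} USD")
```

Under a different billing granularity (for example per second), only `rental_cost` would change; the overhead computation stays the same.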

Copyright information

© 2012 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Moise, D., Carpen-Amarie, A. (2012). MapReduce Applications in the Cloud: A Cost Evaluation of Computation and Storage. In: Hameurlain, A., Hussain, F.K., Morvan, F., Tjoa, A.M. (eds) Data Management in Cloud, Grid and P2P Systems. Globe 2012. Lecture Notes in Computer Science, vol 7450. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-32344-7_4

  • DOI: https://doi.org/10.1007/978-3-642-32344-7_4

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-32343-0

  • Online ISBN: 978-3-642-32344-7

  • eBook Packages: Computer Science, Computer Science (R0)
