Abstract
MapReduce is a powerful paradigm that enables rapid implementation of a wide range of distributed data-intensive applications. Hadoop, its main open-source implementation, has recently been widely adopted by the Cloud computing community. This paper evaluates the cost of moving MapReduce applications to the Cloud, with the goal of finding a proper trade-off between cost and performance for this class of applications. We examine two aspects: the overhead incurred by executing MapReduce jobs in the Cloud rather than on a Grid, and the actual cost of renting the corresponding Cloud resources. For our evaluation, we compared the runtime of three MapReduce applications executed with the Hadoop framework in two environments: (1) on clusters belonging to the Grid’5000 experimental grid testbed and (2) in a Nimbus Cloud deployed on top of Grid’5000 nodes.
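The two aspects named in the abstract, runtime overhead and rental cost, can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration rather than the authors' method: the function names, the runtimes, the instance count, the hourly price, and the rule that each started hour is billed in full are all hypothetical and do not come from the paper.

```python
import math


def relative_overhead(grid_seconds: float, cloud_seconds: float) -> float:
    """Fraction by which the Cloud run is slower than the Grid run."""
    return (cloud_seconds - grid_seconds) / grid_seconds


def rental_cost(cloud_seconds: float, instances: int,
                price_per_instance_hour: float) -> float:
    """Rental cost, ASSUMING every started hour is billed in full."""
    billed_hours = math.ceil(cloud_seconds / 3600)
    return billed_hours * instances * price_per_instance_hour


# Hypothetical numbers, not measurements from the paper.
grid_s = 1000.0   # runtime of a job on the Grid, in seconds
cloud_s = 1150.0  # runtime of the same job in the Cloud, in seconds

overhead = relative_overhead(grid_s, cloud_s)
cost = rental_cost(cloud_s, instances=10, price_per_instance_hour=0.10)
print(f"overhead: {overhead:.1%}, rental cost: ${cost:.2f}")
```

Keeping the two quantities separate mirrors the abstract's framing: the overhead is a property of the execution environment, while the rental cost also depends on the provider's pricing and billing granularity.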
Copyright information
© 2012 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Moise, D., Carpen-Amarie, A. (2012). MapReduce Applications in the Cloud: A Cost Evaluation of Computation and Storage. In: Hameurlain, A., Hussain, F.K., Morvan, F., Tjoa, A.M. (eds) Data Management in Cloud, Grid and P2P Systems. Globe 2012. Lecture Notes in Computer Science, vol 7450. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-32344-7_4
DOI: https://doi.org/10.1007/978-3-642-32344-7_4
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-32343-0
Online ISBN: 978-3-642-32344-7
eBook Packages: Computer Science (R0)