MapReduce Performance in Federated Cloud Computing Environments

Kondikoppa, Praveenkumar; Chiu, Chui-Hui; Park, Seung-Jong

doi:10.1007/978-1-4614-3296-8_12

Abstract

Large-scale scientific and engineering applications, as well as cloud auditing, generate huge amounts of data. The MapReduce framework coupled with cloud computing is emerging as a viable solution for distributed big data processing. In particular, when data is generated at distributed sources and computation is also distributed, multiple clouds need to be set up to minimize data transfer, which gives rise to federated distributed or multi-domain clouds. In addition to the security concerns of general clouds, distributed clouds pose new challenges to the performance of cloud-based applications, including cloud auditing and analysis. This chapter presents a method for deploying distributed clouds and evaluates the performance of various cloud-based applications running over them. It also proposes a method for optimizing the performance of cloud-based applications over high-speed networks.

Acknowledgements

This material is based upon work partially supported by the National Science Foundation (NSF) GENI-grant and grant MRI-0821741 (CRON project), the Department of Defense Experimental Program to Stimulate Competitive Research (DEPSCoR) N0014-08-1-0856, and the Air Force Research Laboratory (AFRL) Visiting Faculty Research Program (VFRP) extension grant LRIR 11RI01COR. Any opinions, findings, and conclusions or recommendations expressed in this publication are those of the authors and do not necessarily reflect the views of the funding agencies.

Author information

Authors and Affiliations

Louisiana State University, Baton Rouge, LA, USA
Praveenkumar Kondikoppa, Chui-Hui Chiu & Seung-Jong Park

Corresponding author

Correspondence to Praveenkumar Kondikoppa.

Editor information

Editors and Affiliations

Air Force Research Laboratory, Rome, New York, USA
Keesook J. Han
School of Computing and Engineering, University of Missouri - Kansas City, Kansas City, Missouri, USA
Baek-Young Choi
Department of Engineering Technology, The Dwight Look College of Engineering, Texas A&M University, College Station, Texas, USA
Sejun Song

About this chapter

Cite this chapter

Kondikoppa, P., Chiu, CH., Park, SJ. (2014). MapReduce Performance in Federated Cloud Computing Environments. In: Han, K., Choi, BY., Song, S. (eds) High Performance Cloud Auditing and Applications. Springer, New York, NY. https://doi.org/10.1007/978-1-4614-3296-8_12

DOI: https://doi.org/10.1007/978-1-4614-3296-8_12
Published: 01 August 2013
Publisher Name: Springer, New York, NY
Print ISBN: 978-1-4614-3295-1
Online ISBN: 978-1-4614-3296-8
eBook Packages: Engineering, Engineering (R0)
