Abstract
Large-scale scientific and engineering applications, as well as cloud auditing, generate huge amounts of data. The MapReduce framework, coupled with cloud computing, is emerging as a viable solution for distributed big data processing. Specifically, when data is generated by distributed sources and computation is also distributed, multiple clouds need to be set up to minimize data transfer, which gives rise to federated, distributed, or multi-domain clouds. In addition to the security concerns of general clouds, distributed clouds pose new challenges to the performance of cloud-based applications, including cloud auditing and analysis. This book chapter focuses on a method to deploy distributed clouds and evaluates the performance of various cloud-based applications over them. It also proposes a method to optimize the performance of cloud-based applications over high-speed networks.
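To make the data-transfer argument concrete, here is a minimal Python sketch of the MapReduce programming model (map, combine, reduce) applied to word counting over data that already lives in two separate clouds. It is an illustration under stated assumptions, not the chapter's deployment or optimization method: the site names `cloud-a` and `cloud-b`, all function names, and the use of a local combiner are hypothetical. The sketch runs the map phase where each site's records live, then counts how many intermediate (key, value) pairs the map phase emits versus how many remain after each site pre-aggregates its own output. The second number is an upper bound on the records that would have to cross the inter-cloud link during the shuffle.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, List, Tuple


def map_phase(record: str) -> Iterable[Tuple[str, int]]:
    """Map: emit (word, 1) for every whitespace-separated token in a record."""
    for word in record.lower().split():
        yield word, 1


def combine(pairs: Iterable[Tuple[str, int]]) -> Dict[str, int]:
    """Combine: pre-aggregate map output locally, at the site holding the data."""
    partial: Counter = Counter()
    for key, value in pairs:
        partial[key] += value
    return dict(partial)


def reduce_phase(partials: List[Dict[str, int]]) -> Dict[str, int]:
    """Reduce: merge the per-site partial counts received in the shuffle."""
    totals: Dict[str, int] = defaultdict(int)
    for partial in partials:
        for key, value in partial.items():
            totals[key] += value
    return dict(totals)


def run_federated_wordcount(
    sites: Dict[str, List[str]],
) -> Tuple[Dict[str, int], int, int]:
    """Run word count over hypothetical sites (cloud name -> local records).

    Returns (final counts, pairs emitted by the map phase,
    pairs left after local combining, i.e. handed to the shuffle).
    """
    partials: List[Dict[str, int]] = []
    emitted = 0
    shipped = 0
    for records in sites.values():
        # Map runs where the data lives, so raw records never leave the site.
        pairs = [pair for record in records for pair in map_phase(record)]
        emitted += len(pairs)
        partial = combine(pairs)
        shipped += len(partial)
        partials.append(partial)
    return reduce_phase(partials), emitted, shipped


if __name__ == "__main__":
    # Hypothetical input: two clouds, each holding its own locally generated records.
    sample = {
        "cloud-a": ["big data big clouds", "big data"],
        "cloud-b": ["data clouds", "clouds clouds"],
    }
    counts, emitted, shipped = run_federated_wordcount(sample)
    print(counts, emitted, shipped)
```

With the sample input, the map phase emits 10 pairs, but only 5 partial counts leave the two sites, and the final totals are big=3, data=3, clouds=4. Hadoop offers the same kind of local pre-aggregation through a combiner. How much inter-cloud traffic it saves depends on how often keys repeat within each site's data.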
Acknowledgements
This material is based upon work partially supported by the National Science Foundation (NSF) GENI grant and grant MRI-0821741 (CRON project), the Department of Defense Experimental Program to Stimulate Competitive Research (DEPSCoR) grant N0014-08-1-0856, and the Air Force Research Laboratory (AFRL) Visiting Faculty Research Program (VFRP) extension grant LRIR 11RI01COR. Any opinions, findings, and conclusions or recommendations expressed in this publication are those of the authors and do not necessarily reflect the views of the funding agencies.
Copyright information
© 2014 Springer Science+Business Media New York
About this chapter
Cite this chapter
Kondikoppa, P., Chiu, CH., Park, SJ. (2014). MapReduce Performance in Federated Cloud Computing Environments. In: Han, K., Choi, BY., Song, S. (eds) High Performance Cloud Auditing and Applications. Springer, New York, NY. https://doi.org/10.1007/978-1-4614-3296-8_12
DOI: https://doi.org/10.1007/978-1-4614-3296-8_12
Publisher Name: Springer, New York, NY
Print ISBN: 978-1-4614-3295-1
Online ISBN: 978-1-4614-3296-8
eBook Packages: Engineering, Engineering (R0)