Abstract
Faster data analytics is the ability to generate the desired report in near real time. Any application that looks at an aggregated view of a stream of data can be considered an analytic application. The ability to process vast amounts of data to reveal market trends, user behavior, fraudulent behavior, and similar patterns has become not just useful but critical to the success of a business. In the past few years, fast data, i.e., high-speed data streams, has also exploded in volume and availability. Prime examples include sensor data streams, real-time stock market data, and social-media feeds such as Twitter and Facebook. New models for distributed stream processing have evolved over time. This research investigates the suitability of Google’s MapReduce (MR) parallel programming framework for faster data processing. MapReduce systems were originally geared towards batch processing. This paper proposes optimizations to the original MR framework for faster distributed data processing: using distributed shared memory to store intermediate data, and using Remote Direct Memory Access (RDMA) technology for faster data transfer across the network.
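The central idea above, keeping intermediate map output in shared memory instead of writing it to local disk for reducers to fetch, can be illustrated with a minimal single-process sketch. It assumes a word-count-style aggregation over a stream that arrives as small batches. The names `map_into_store`, `reduce_from_store`, and `process_stream` are hypothetical, the in-process dict only stands in for distributed shared memory, and RDMA transfer is not modeled at all; this is not the authors' implementation.

```python
from collections import defaultdict
from typing import DefaultDict, Dict, Iterable, List

# Hypothetical stand-in for a distributed shared-memory region that holds
# intermediate (key, value) pairs. Here it is only an in-process dict.
SharedStore = DefaultDict[str, List[int]]


def map_into_store(records: Iterable[str], store: SharedStore) -> None:
    """Map step: emit (word, 1) pairs directly into shared memory,
    instead of spilling them to a mapper's local disk."""
    for record in records:
        for word in record.split():
            store[word].append(1)


def reduce_from_store(store: SharedStore) -> Dict[str, int]:
    """Reduce step: read intermediate values straight from shared memory
    and aggregate them per key."""
    return {key: sum(values) for key, values in store.items()}


def process_stream(batches: Iterable[List[str]]) -> List[Dict[str, int]]:
    """Treat a stream as a sequence of small batches and produce an
    updated aggregate report after each one (a near-real-time view)."""
    store: SharedStore = defaultdict(list)
    reports = []
    for batch in batches:
        map_into_store(batch, store)
        reports.append(reduce_from_store(store))
    return reports


if __name__ == "__main__":
    stream = [["buy sell buy"], ["sell hold"], ["buy"]]
    for report in process_stream(stream):
        print(report)
```

Each report reflects all data seen so far, so the aggregated view refreshes after every batch rather than only at the end of one long batch job; after the third batch above the counts are buy 3, sell 2, hold 1. Re-reducing the whole store on every batch is kept for simplicity; an incremental design would update only the keys touched by the new batch.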
Copyright information
© 2013 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Mavani, M., Ragha, L. (2013). MapReduce Frame Work: Investigating Suitability for Faster Data Analytics. In: Unnikrishnan, S., Surve, S., Bhoir, D. (eds) Advances in Computing, Communication, and Control. ICAC3 2013. Communications in Computer and Information Science, vol 361. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-36321-4_11
DOI: https://doi.org/10.1007/978-3-642-36321-4_11
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-36320-7
Online ISBN: 978-3-642-36321-4
eBook Packages: Computer Science, Computer Science (R0)