MapReduce Frame Work: Investigating Suitability for Faster Data Analytics

  • Conference paper
Advances in Computing, Communication, and Control (ICAC3 2013)

Part of the book series: Communications in Computer and Information Science (CCIS, volume 361)

Abstract

Faster data analytics is the ability to generate the desired report in near real time. Any application that computes an aggregated view over a stream of data can be considered an analytic application. Processing vast amounts of data to reveal market trends, user behavior, fraudulent behavior, and similar patterns has become not just useful but critical to the success of a business. In the past few years, fast data, i.e., high-speed data streams, has also exploded in volume and availability. Prime examples include sensor data streams, real-time stock market data, and social-media feeds such as Twitter and Facebook. New models for distributed stream processing have evolved over time. This research investigates the suitability of Google's MapReduce (MR) parallel programming framework for faster data processing. MapReduce systems were originally geared towards batch processing. This paper proposes optimizations to the original MR framework for faster distributed data processing: using distributed shared memory to store intermediate data, and using Remote Direct Memory Access (RDMA) technology for faster data transfer across the network.
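To make the first proposed optimization more concrete, here is a minimal Python sketch of a word-count MapReduce pass run over micro-batches of a stream. It is an illustration under stated assumptions, not the authors' implementation. The class `SharedIntermediateStore` and the functions `map_fn`, `reduce_fn`, and `process_batch` are hypothetical. The store is an in-process stand-in for distributed shared memory: map outputs stay in memory and go directly to reducers rather than being spilled to local disk and shuffled. RDMA transfer is not modeled.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


class SharedIntermediateStore:
    """Hypothetical in-memory stand-in for a distributed shared memory.

    Map outputs are written here instead of being spilled to local disk,
    so reducers read them without a disk-based shuffle. In this sketch it
    is a single-process dict; a real system would span cluster nodes.
    """

    def __init__(self, num_partitions: int) -> None:
        self.num_partitions = num_partitions
        self.partitions: List[Dict[str, List[int]]] = [
            defaultdict(list) for _ in range(num_partitions)
        ]

    def put(self, key: str, value: int) -> None:
        # Hash partitioning, analogous to Hadoop's default hash partitioner.
        self.partitions[hash(key) % self.num_partitions][key].append(value)

    def drain(self, p: int) -> Dict[str, List[int]]:
        # Hand partition p to its reducer and reset it for the next batch.
        data = self.partitions[p]
        self.partitions[p] = defaultdict(list)
        return data


def map_fn(record: str) -> Iterator[Tuple[str, int]]:
    # Word-count mapper: emit (word, 1) for every token.
    for word in record.lower().split():
        yield word, 1


def reduce_fn(key: str, values: List[int]) -> Tuple[str, int]:
    # Word-count reducer: sum the partial counts for one key.
    return key, sum(values)


def process_batch(records: Iterable[str], store: SharedIntermediateStore,
                  totals: Dict[str, int]) -> Dict[str, int]:
    """Run one MapReduce pass over a micro-batch and fold it into running totals."""
    for record in records:                  # map phase
        for k, v in map_fn(record):
            store.put(k, v)
    for p in range(store.num_partitions):   # reduce phase, one reducer per partition
        for k, vs in store.drain(p).items():
            _, count = reduce_fn(k, vs)
            totals[k] = totals.get(k, 0) + count
    return totals


if __name__ == "__main__":
    store = SharedIntermediateStore(num_partitions=4)
    totals: Dict[str, int] = {}
    stream = [["buy AAPL", "sell MSFT"], ["buy AAPL", "buy GOOG"]]
    for batch in stream:
        process_batch(batch, store, totals)
        print(sorted(totals.items()))  # report refreshed after each micro-batch
```

Each micro-batch updates the running totals, so an aggregated report is available after every batch instead of only at the end of one long batch job. After the second batch above, the report is `[('aapl', 2), ('buy', 3), ('goog', 1), ('msft', 1), ('sell', 1)]`.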

Copyright information

© 2013 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Mavani, M., Ragha, L. (2013). MapReduce Frame Work: Investigating Suitability for Faster Data Analytics. In: Unnikrishnan, S., Surve, S., Bhoir, D. (eds) Advances in Computing, Communication, and Control. ICAC3 2013. Communications in Computer and Information Science, vol 361. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-36321-4_11

  • DOI: https://doi.org/10.1007/978-3-642-36321-4_11

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-36320-7

  • Online ISBN: 978-3-642-36321-4

  • eBook Packages: Computer Science, Computer Science (R0)