Big Data Processing Algorithms

Big Data

Part of the book series: Studies in Big Data ((SBD,volume 11))

Abstract

Data volumes have grown to the point where traditional algorithms must be extended to scale. Because the data cannot fit in memory and is distributed across machines, the algorithms must also work with distributed storage. This chapter introduces some of the algorithms designed to operate on such distributed storage and to scale to massive data. These algorithms, called Big Data Processing Algorithms, comprise random walks, distributed hash tables, streaming, bulk synchronous processing (BSP), and MapReduce paradigms. Each takes a distinct approach and suits particular classes of problems. Their goals are to reduce network communication in the distributed system, minimize data movement, cut synchronization delays, and make efficient use of computational resources. To achieve these goals, they rely on techniques such as processing data where it resides, peer-to-peer network communication, and computational and aggregation components for synchronization. MapReduce has been widely adopted for Big Data problems. This chapter demonstrates how MapReduce enables analytics to process massive data with ease, and it provides example applications and a codebase so that readers can get hands-on experience with the algorithms.
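As a rough illustration of the MapReduce paradigm named in the abstract, the following single-process Python sketch counts words across data partitions. It is not taken from the chapter's codebase: the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) and the sample partitions are hypothetical, and a real framework such as Hadoop would run the map step on the machine holding each partition and move only the intermediate key-value pairs over the network.

```python
from collections import defaultdict


def map_phase(partition):
    """Map step: emit a (word, 1) pair for every word in one partition."""
    for line in partition:
        for word in line.lower().split():
            yield word, 1


def shuffle(mapped_pairs):
    """Shuffle step: group all emitted values by key (simulated in memory)."""
    groups = defaultdict(list)
    for key, value in mapped_pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce step: aggregate the values collected for one key."""
    return key, sum(values)


def word_count(partitions):
    """Run map, shuffle, and reduce over a list of partitions."""
    mapped = [pair for part in partitions for pair in map_phase(part)]
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


# Hypothetical input: each inner list stands in for data stored on one machine.
partitions = [["big data big"], ["data moves less"]]
counts = word_count(partitions)
print(counts)
```

Each partition is mapped independently, so the map work could in principle run where the data resides; only the shuffle step requires grouping values across partitions.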



Author information

Corresponding author

Correspondence to VenkataSwamy Martha.

Copyright information

© 2015 Springer India

About this chapter

Cite this chapter

Martha, V. (2015). Big Data Processing Algorithms. In: Mohanty, H., Bhuyan, P., Chenthati, D. (eds) Big Data. Studies in Big Data, vol 11. Springer, New Delhi. https://doi.org/10.1007/978-81-322-2494-5_3

  • DOI: https://doi.org/10.1007/978-81-322-2494-5_3

  • Publisher Name: Springer, New Delhi

  • Print ISBN: 978-81-322-2493-8

  • Online ISBN: 978-81-322-2494-5

  • eBook Packages: Engineering, Engineering (R0)
