Big Data Computing with Distributed Computing Frameworks

  • Gurjit Singh Bhathal
  • Amardeep Singh
Conference paper
Part of the Lecture Notes in Networks and Systems book series (LNNS, volume 65)


The volume, velocity, and veracity characteristics of Big Data are both advantageous and disadvantageous when handling large amounts of data. Such data is difficult to process, store, and analyze using traditional approaches. To process data in a very short span of time, we require a modified or new technology that can extract value from data that becomes obsolete over time. Distributed computing frameworks come into the picture when a single system cannot analyze a huge volume of data in a short timeframe. Distributed Computing can handle such situations because it is the foundational technology for cluster computing and cloud computing. Distributed Computing processes large datasets by dividing them into small pieces across nodes. A number of nodes are connected through a communication network, work as a single computing environment, and compute in parallel to solve a specific problem. Hadoop is an open-source framework that takes advantage of Distributed Computing. It uses the MapReduce programming model for distributed processing and the Hadoop Distributed File System (HDFS) for distributed storage. Over time, other fast processing programming models such as Spark, Storm, and Flink have evolved for stream and real-time processing; these also use Distributed Computing concepts.
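
The divide-and-combine idea behind MapReduce, as described above, can be illustrated with a minimal single-machine sketch. It is not Hadoop code and is not taken from the paper: worker threads stand in for cluster nodes, and the function names (split_into_chunks, map_phase, reduce_phase, word_count) are hypothetical names chosen for this illustration.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from functools import reduce


def split_into_chunks(records, num_nodes):
    """Divide the dataset into roughly equal pieces, one per simulated node."""
    size = max(1, -(-len(records) // num_nodes))  # ceiling division
    return [records[i:i + size] for i in range(0, len(records), size)]


def map_phase(chunk):
    """Map: each node counts the words in its own chunk independently."""
    counts = Counter()
    for line in chunk:
        counts.update(line.lower().split())
    return counts


def reduce_phase(left, right):
    """Reduce: merge two partial word counts into one."""
    return left + right


def word_count(records, num_nodes=3):
    """Count words by splitting the input, mapping in parallel, then reducing."""
    chunks = split_into_chunks(records, num_nodes)
    # Assumption of this sketch: threads play the role of cluster nodes.
    with ThreadPoolExecutor(max_workers=num_nodes) as pool:
        partials = list(pool.map(map_phase, chunks))
    return reduce(reduce_phase, partials, Counter())
```

Calling word_count(["big data big", "data hadoop"], num_nodes=2) splits the two lines across two simulated nodes and merges their partial counts. In a real cluster the chunks would be stored on a distributed file system such as HDFS and the map tasks would run on separate machines; the sketch only shows the shape of the computation.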


Distributed computing, Big data, Hadoop, Stream processing, Batch processing



Copyright information

© Springer Nature Singapore Pte Ltd. 2019

Authors and Affiliations

  1. Department of Computer Science and Engineering, Punjabi University, Patiala, India
