Encyclopedia of Big Data Technologies

2019 Edition
Editors: Sherif Sakr, Albert Y. Zomaya

Parallel Processing with Big Data

  • Behrooz Parhami
Reference work entry
DOI: https://doi.org/10.1007/978-3-319-77525-8_165

Definitions

The discrepancy between the explosive growth rate of data volumes and the slower improvement trends in processing and memory-access speeds necessitates that parallel processing be applied to the handling of extremely large data sets.

Overview

Both data volumes and processing speeds have been on exponentially rising trajectories since the onset of the digital age (Denning and Lewis 2016), but the former has risen at a much higher rate than the latter. It follows that parallel processing is needed to bridge the gap. In addition to providing the higher processing capability that large data sets demand, parallel processing has the potential to ease the “von Neumann bottleneck” (Markgraf 2007), sometimes referred to as “the memory wall” because it hinders the smooth progress of a computation whenever operands cannot be supplied to the processor at the required rate (McKee 2004; Wulf and McKee 1995).
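To make the idea of applying parallel processing to large data sets concrete, the following minimal Python sketch illustrates data-parallel processing in the MapReduce style (Dean and Ghemawat 2008, cited in the references below): a large collection of text lines is split into chunks, worker processes count words in each chunk independently, and the partial counts are merged. This is an illustrative example only, not code from any of the referenced systems; the worker count, chunk sizes, and sample input are placeholder assumptions.

```python
# Minimal sketch of MapReduce-style data parallelism (illustrative only).
from collections import Counter
from multiprocessing import Pool

def count_words(chunk):
    """Map step: count word occurrences within one chunk of lines."""
    counts = Counter()
    for line in chunk:
        counts.update(line.split())
    return counts

def parallel_word_count(lines, workers=4):
    """Split the input into chunks, map over them in parallel worker
    processes, then reduce by merging the per-chunk counts."""
    size = max(1, len(lines) // workers)
    chunks = [lines[i:i + size] for i in range(0, len(lines), size)]
    with Pool(workers) as pool:
        partials = pool.map(count_words, chunks)  # "map" phase, in parallel
    total = Counter()
    for partial in partials:                      # "reduce" phase, sequential
        total.update(partial)
    return total

if __name__ == "__main__":
    sample = ["big data needs parallel processing"] * 100_000  # placeholder input
    print(parallel_word_count(sample).most_common(3))
```

The map phase scales with the number of workers because the chunks are independent, while the merge step remains a sequential bottleneck; production systems such as Hadoop and Spark (Shvachko et al. 2010; Zaharia et al. 2016) distribute both phases across cluster nodes.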

References

  1. Abadi DJ, Carney D, Cetintemel U, Cherniack M, Convey C, Lee S, Stonebraker M, Tatbul N, Zdonik S (2003) Aurora: a new model and architecture for data stream management. Int J Very Large Data Bases 12(2):120–139
  2. Agrawal D, Das S, El Abbadi A (2011) Big data and cloud computing: current state and future opportunities. In: Proceedings of 14th international conference on extending database technology, Uppsala, pp 530–533
  3. Benini L, De Micheli G (2002) Networks on chips: a new SoC paradigm. IEEE Comput 35(1):70–78
  4. Brock DC, Moore GE (eds) (2006) Understanding Moore’s law: four decades of innovation. Chemical Heritage Foundation, Philadelphia
  5. Bu Y, Howe B, Balazinska M, Ernst MD (2010) HaLoop: efficient iterative data processing on large clusters. Proc VLDB Endowment 3(1–2):285–296
  6. Caulfield AM et al (2016) A cloud-scale acceleration architecture. In: Proceedings of 49th IEEE/ACM international symposium on microarchitecture, Orlando, pp 1–13
  7. Ceze L, Hill MD, Wenisch TE (2016) Arch2030: a vision of computer architecture research over the next 15 years. Computing Community Consortium, on-line document. http://cra.org/ccc/wp-content/uploads/sites/2/2016/12/15447-CCC-ARCH-2030-report-v3-1-1.pdf
  8. Condie T, Conway N, Alvaro P, Hellerstein JM, Elmeleegy K, Sears R (2010) MapReduce online. Proc USENIX Symp Networked Syst Des Implement 10(4):20
  9. Dally WJ, Towles BP (2004) Principles and practices of interconnection networks. Elsevier, Amsterdam
  10. Danowitz A, Kelley K, Mao J, Stevenson JP, Horowitz M (2012) CPU DB: recording microprocessor history. Commun ACM 55(4):55–63
  11. Darema F (2001) The SPMD model: past, present and future. In: Proceedings of European parallel virtual machine/message passing interface users’ group meeting, Springer
  12. Dean J, Ghemawat S (2008) MapReduce: simplified data processing on large clusters. Commun ACM 51(1):107–113
  13. Dean J, Ghemawat S (2010) MapReduce: a flexible data processing tool. Commun ACM 53(1):72–77
  14. Denning PJ, Lewis TG (2016) Exponential laws of computing growth. Commun ACM 60(1):54–65
  15. Duato J, Yalamanchili S, Ni LM (2003) Interconnection networks: an engineering approach. Morgan Kaufmann, San Francisco
  16. Eggers SJ, Emer JS, Levy HM, Lo JL, Stamm RL, Tullsen DM (1997) Simultaneous multithreading: a platform for next-generation processors. IEEE Micro 17(5):12–19
  17. Eugster PT, Felber PA, Guerraoui R, Kermarrec A-M (2003) The many faces of publish/subscribe. ACM Comput Surv 35(2):114–131
  18. Flynn MJ, Rudd KW (1996) Parallel architectures. ACM Comput Surv 28(1):67–70
  19. Gautschi M (2017) Design of energy-efficient processing elements for near-threshold parallel computing. Doctoral thesis, ETH Zurich
  20. Gepner P, Kowalik MF (2006) Multi-core processors: new way to achieve high system performance. In: Proceedings of IEEE international symposium on parallel computing in electrical engineering, Bialystok, pp 9–13
  21. Hord RM (2013) The Illiac IV: the first supercomputer. Springer, Berlin
  22. Koomey JG, Berard S, Sanchez M, Wong H (2011) Implications of historical trends in the electrical efficiency of computing. IEEE Ann Hist Comput 33(3):46–54
  23. Kuon I, Tessier R, Rose J (2008) FPGA architecture: survey and challenges. Found Trends Electron Des Autom 2(2):135–253
  24. Lee RB (1997) Multimedia extensions for general-purpose processors. In: Proceedings of IEEE workshop on signal processing systems, design and implementation, Leicester, pp 9–23
  25. Mack CA (2011) Fifty years of Moore’s law. IEEE Trans Semicond Manuf 24(2):202–207
  26. Malewicz G, Austern MH, Bik AJC, Dehnert JC, Horn I, Leiser N, Czajkowski G (2010) Pregel: a system for large-scale graph processing. In: Proceedings of ACM SIGMOD international conference on management of data, Indianapolis, pp 135–146
  27. Markgraf JD (2007) The von Neumann bottleneck. On-line source, no longer accessible
  28. McKee SA (2004) Reflections on the memory wall. In: Proceedings of the conference on computing frontiers, Ischia, pp 162–167
  29. Mueller R, Teubner J, Alonso G (2012) Sorting networks on FPGAs. Int J Very Large Data Bases 21(1):1–23
  30. Nanda S, Chiueh TC (2005) A survey on virtualization technologies. Technical report TR179, Department of Computer Science, SUNY at Stony Brook
  31. NRC (2011) The future of computing performance: game over or next level? Report of the US National Research Council, National Academies Press
  32. Nvidia (2016) Nvidia Tesla P100: infinite compute power for the modern data center – technical overview. http://images.nvidia.com/content/tesla/pdf/nvidia-teslap100-techoverview.pdf. Accessed 14 Dec 2017
  33. Owens JD et al (2008) GPU computing. Proc IEEE 96(5):879–899
  34. Parhami B (1999) Chapter 7: Sorting networks. In: Introduction to parallel processing: algorithms and architectures. Plenum Press, New York, pp 129–147
  35. Rau BR, Fisher JA (1993) Instruction-level parallel processing: history, overview, and perspective. J Supercomput 7(1–2):9–50
  36. Rixner S (2001) Stream processor architecture. Kluwer, Boston
  37. Rosenblum M, Garfinkel T (2005) Virtual machine monitors: current technology and future trends. IEEE Comput 38(5):39–47
  38. Sakai S, Hiraki K, Kodama Y, Yuba T (1989) An architecture of a dataflow single chip processor. ACM SIGARCH Comput Archit News 17(3):46–53
  39. Schaller RR (1997) Moore’s law: past, present and future. IEEE Spectr 34(6):52–59
  40. Shafer J, Rixner S, Cox AL (2010) The Hadoop distributed filesystem: balancing portability and performance. In: Proceedings of IEEE international symposium on performance analysis of systems & software, White Plains, pp 122–133
  41. Shvachko K, Kuang H, Radia S, Chansler R (2010) The Hadoop distributed file system. In: Proceedings of 26th symposium on mass storage systems and technologies, Incline Village, pp 1–10
  42. Singer G (2013) The history of the modern graphics processor. TechSpot on-line article. http://www.techspot.com/article/650-history-of-the-gpu/. Accessed 14 Dec 2017
  43. Sinnen O (2007) Task scheduling for parallel systems. Wiley, Hoboken
  44. Sklyarov V et al (2015) Hardware accelerators for information retrieval and data mining. In: Proceedings of IEEE conference on information and communication technology research, Bali, pp 202–205
  45. Stanford University (2012) 21st century computer architecture: a community white paper. http://csl.stanford.edu/~christos/publications/2012.21stcenturyarchitecture.whitepaper.pdf
  46. Top-500 Organization (2017) November 2017 list of the world’s top 500 supercomputers. http://www.top500.org/lists/2017/11/
  47. Valiant LG (1990) A bridging model for parallel computation. Commun ACM 33(8):103–111
  48. Vavilapalli VK et al (2013) Apache Hadoop YARN: yet another resource negotiator. In: Proceedings of fourth symposium on cloud computing, Santa Clara, p 5
  49. Wilkes MV (1972) Time-sharing computer systems. Elsevier, New York
  50. Wulf W, McKee S (1995) Hitting the wall: implications of the obvious. ACM Comput Archit News 23(1):20–24
  51. Zaharia M et al (2016) Apache Spark: a unified engine for big data processing. Commun ACM 59(11):56–65

Copyright information

© Springer International Publishing AG, part of Springer Nature 2019

Authors and Affiliations

  1. Department of Electrical and Computer Engineering, University of California, Santa Barbara, USA

Section editors and affiliations

  • Bingsheng He
  • Behrooz Parhami
  1. Department of Electrical and Computer Engineering, University of California, Santa Barbara, USA