Definition
The discrepancy between the explosive growth of data volumes and the slower improvement in processing and memory-access speeds makes it necessary to apply parallel processing to the handling of extremely large data sets.
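The core idea of the definition is to split one large data set so that several workers process parts of it at once and then combine their partial results. The following is a minimal Python sketch of that idea. It is an illustration only, not a method given in this entry. The names `partition` and `parallel_map_reduce` are hypothetical, and the split/map/combine structure only loosely resembles the MapReduce style (Dean and Ghemawat 2008).

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def partition(data: Sequence[T], num_parts: int) -> List[Sequence[T]]:
    """Split data into num_parts contiguous chunks of near-equal size (hypothetical helper)."""
    if num_parts < 1:
        raise ValueError("num_parts must be at least 1")
    size, extra = divmod(len(data), num_parts)
    chunks = []
    start = 0
    for i in range(num_parts):
        # The first `extra` chunks receive one additional element each.
        end = start + size + (1 if i < extra else 0)
        chunks.append(data[start:end])
        start = end
    return chunks


def parallel_map_reduce(
    data: Sequence[T],
    map_chunk: Callable[[Sequence[T]], R],
    combine: Callable[[R, R], R],
    initial: R,
    num_workers: int = 4,
) -> R:
    """Process each partition concurrently, then fold the partial results together.

    Assumption: threads are used to keep the sketch self-contained. In CPython
    the global interpreter lock prevents threads from speeding up CPU-bound
    pure-Python work, so a real deployment would use ProcessPoolExecutor
    (with picklable, module-level functions) or a cluster framework instead.
    """
    chunks = partition(data, num_workers)
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        # Map phase: each worker handles one chunk independently.
        partials = list(pool.map(map_chunk, chunks))
    # Reduce phase: combine partial results sequentially.
    result = initial
    for p in partials:
        result = combine(result, p)
    return result
```

For example, `parallel_map_reduce(list(range(10)), lambda c: sum(x * x for x in c), lambda a, b: a + b, 0)` computes the sum of squares 0 to 9, which is 285. The design choice here is that each worker touches only its own chunk, so no coordination is needed until the final combine step.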
Overview
Both data volumes and processing speeds have been on exponentially rising trajectories since the onset of the digital age (Denning and Lewis 2016), but the former has risen at a much higher rate than the latter. Parallel processing is therefore needed to bridge the gap. Besides providing the higher processing capability that large data sets require, parallel processing can ease the “von Neumann bottleneck” (Markgraf 2007), sometimes referred to as “the memory wall” because it tends to hinder the smooth progress of a computation when operands cannot be supplied to the processor at the required rate (McKee 2004; Wulf and McKee 1995...
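The gap argument can be made concrete with a simple model. Assume, purely for illustration, that data volume and single-processor speed grow at constant annual rates $\alpha$ and $\beta$ with $\alpha > \beta$, and that speedup is ideal (linear in the number of processors $p$). The symbols $D_0$, $S_0$, $\alpha$, $\beta$, $T_1$, $T_p$, and $p$ are introduced here and do not appear in the entry.

```latex
\begin{aligned}
D(t) &= D_0 (1+\alpha)^t, \qquad S(t) = S_0 (1+\beta)^t,\\
T_1(t) &= \frac{D(t)}{S(t)} = \frac{D_0}{S_0}\left(\frac{1+\alpha}{1+\beta}\right)^t,\\
T_p(t) &= \frac{T_1(t)}{p} \le T_1(0) = \frac{D_0}{S_0}
\;\Longrightarrow\; p \ge \left(\frac{1+\alpha}{1+\beta}\right)^t .
\end{aligned}
```

Because $(1+\alpha)/(1+\beta) > 1$ whenever $\alpha > \beta$, the number of processors needed to keep processing time constant itself grows exponentially with $t$ under this model; faster processors alone cannot close the gap.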
References
Abadi DJ, Carney D, Cetintemel U, Cherniack M, Convey C, Lee S, Stonebraker M, Tatbul N, Zdonik S (2003) Aurora: a new model and architecture for data stream management. Int J Very Large Data Bases 12(2):120–139
Agrawal D, Das S, El Abbadi A (2011) Big data and cloud computing: current state and future opportunities. In: Proceedings of 14th international conference on extending database technology, Uppsala, pp 530–533
Benini L, De Micheli G (2002) Networks on chips: a new SoC paradigm. IEEE Comput 35(1):70–78
Brock DC, Moore GE (eds) (2006) Understanding Moore’s law: four decades of innovation. Chemical Heritage Foundation, Philadelphia
Bu Y, Howe B, Balazinska M, Ernst MD (2010) HaLoop: efficient iterative data processing on large clusters. Proc VLDB Endowment 3(1–2):285–296
Caulfield AM et al (2016) A cloud-scale acceleration architecture. In: Proceedings of 49th IEEE/ACM international symposium on microarchitecture, Orlando, pp 1–13
Ceze L, Hill MD, Wenisch TE (2016) Arch2030: a vision of computer architecture research over the next 15 years, Computing Community Consortium. On-line document. http://cra.org/ccc/wp-content/uploads/sites/2/2016/12/15447-CCC-ARCH-2030-report-v3-1-1.pdf
Condie T, Conway N, Alvaro P, Hellerstein JM, Elmeleegy K, Sears R (2010) MapReduce online. Proc USENIX Symp Networked Syst Des Implement 10(4):20
Dally WJ, Towles BP (2004) Principles and practices of interconnection networks. Elsevier, Amsterdam
Darema F (2001) The SPMD model: past, present and future. In: Proceedings of European parallel virtual machine/message passing interface users’ group meeting, Springer
Dean J, Ghemawat S (2008) MapReduce: simplified data processing on large clusters. Commun ACM 51(1):107–113
Dean J, Ghemawat S (2010) MapReduce: a flexible data processing tool. Commun ACM 53(1):72–77
Denning PJ, Lewis TG (2016) Exponential laws of computing growth. Commun ACM 60(1):54–65
Duato J, Yalamanchili S, Ni LM (2003) Interconnection networks: an engineering approach. Morgan Kaufmann, San Francisco
Eggers SJ, Emer JS, Levy HM, Lo JL, Stamm RL, Tullsen DM (1997) Simultaneous multithreading: a platform for next-generation processors. IEEE Micro 17(5):12–19
Eugster PT, Felber PA, Guerraoui R, Kermarrec A-M (2003) The many faces of publish/subscribe. ACM Comput Surv 35(2):114–131
Flynn MJ, Rudd KW (1996) Parallel architectures. ACM Comput Surv 28(1):67–70
Gautschi M (2017) Design of energy-efficient processing elements for near-threshold parallel computing. Doctoral thesis, ETH Zurich
Gepner P, Kowalik MF (2006) Multi-core processors: new way to achieve high system performance. In: Proceedings of IEEE international symposium on parallel computing in electrical engineering, Bialystok, pp 9–13
Hord RM (2013) The Illiac IV: the first supercomputer. Springer, Berlin
Koomey JG, Berard S, Sanchez M, Wong H (2011) Implications of historical trends in the electrical efficiency of computing. IEEE Ann Hist Comput 33(3):46–54
Kuon I, Tessier R, Rose J (2008) FPGA architecture: survey and challenges. Found Trends Electron Des Autom 2(2):135–253
Lee RB (1997) Multimedia extensions for general-purpose processors. In: Proceedings of IEEE workshop signal processing systems, design and implementation, Leicester, pp 9–23
Mack CA (2011) Fifty years of Moore’s law. IEEE Trans Semicond Manuf 24(2):202–207
Malewicz G, Austern MH, Bik AJC, Dehnert JC, Horn I, Leiser N, Czajkowski G (2010) Pregel: a system for large-scale graph processing. In: Proceedings of ACM SIGMOD international conference on management of data, Indianapolis, pp 135–146
Markgraf JD (2007) The von Neumann bottleneck. On-line source that is no longer accessible (will find a replacement for this reference during revisions)
McKee SA (2004) Reflections on the memory wall. In: Proceedings of the conference on computing frontiers, Ischia, pp 162–167
Mueller R, Teubner J, Alonso G (2012) Sorting networks on FPGAs. Int J Very Large Data Bases 21(1):1–23
Nanda S, Chiueh TC (2005) A survey on virtualization technologies, technical report TR179, Department of Computer Science, SUNY at Stony Brook
NRC (2011) The future of computing performance: game over or next level? Report of the US National Research Council, National Academies Press
Nvidia (2016) Nvidia Tesla P100: infinite compute power for the modern data center – technical overview. http://images.nvidia.com/content/tesla/pdf/nvidia-teslap100-techoverview.pdf. Accessed 14 Dec 2017
Owens JD et al (2008) GPU computing. Proc IEEE 96(5):879–899
Parhami B (1999) Chapter 7: Sorting networks. In: Introduction to parallel processing: algorithms and architectures. Plenum Press, New York, pp 129–147
Rau BR, Fisher JA (1993) Instruction-level parallel processing: history, overview, and perspective. J Supercomput 7(1–2):9–50
Rixner S (2001) Stream processor architecture. Kluwer, Boston
Rosenblum M, Garfinkel T (2005) Virtual machine monitors: current technology and future trends. IEEE Comput 38(5):39–47
Sakai S, Hiraki K, Kodama Y, Yuba T (1989) An architecture of a dataflow single chip processor. ACM SIGARCH Comput Archit News 17(3):46–53
Schaller RR (1997) Moore’s law: past, present and future. IEEE Spectr 34(6):52–59
Shafer J, Rixner S, Cox AL (2010) The Hadoop distributed filesystem: balancing portability and performance. In: Proceedings of IEEE international symposium on performance analysis of systems & software, White Plains, pp 122–133
Shvachko K, Kuang H, Radia S, Chansler R (2010) The Hadoop distributed file system. In: Proceedings of 26th symposium on mass storage systems and technologies, Incline Village, pp 1–10
Singer G (2013) The history of the modern graphics processor, TechSpot on-line article. http://www.techspot.com/article/650-history-of-the-gpu/. Accessed 14 Dec 2017
Sinnen O (2007) Task scheduling for parallel systems. Wiley, Hoboken
Sklyarov V et al (2015) Hardware accelerators for information retrieval and data mining. In: Proceedings of IEEE conference on information and communication technology research, Bali, pp 202–205
Stanford University (2012) 21st century computer architecture: a community white paper. http://csl.stanford.edu/~christos/publications/2012.21stcenturyarchitecture.whitepaper.pdf
Top-500 Organization (2017) November 2017 list of the world’s top 500 supercomputers. http://www.top500.org/lists/2017/11/
Valiant LG (1990) A bridging model for parallel computation. Commun ACM 33(8):103–111
Vavilapalli VK et al (2013) Apache Hadoop YARN: yet another resource negotiator. In: Proceedings of fourth symposium on cloud computing, Santa Clara, p 5
Wilkes MV (1972) Time-sharing computer systems. Elsevier, New York
Wulf W, McKee S (1995) Hitting the wall: implications of the obvious. ACM SIGARCH Comput Archit News 23(1):20–24
Zaharia M et al (2016) Apache Spark: a unified engine for big data processing. Commun ACM 59(11):56–65
Copyright information
© 2018 Springer International Publishing AG, part of Springer Nature
About this entry
Cite this entry
Parhami, B. (2018). Parallel Processing with Big Data. In: Sakr, S., Zomaya, A. (eds) Encyclopedia of Big Data Technologies. Springer, Cham. https://doi.org/10.1007/978-3-319-63962-8_165-1
DOI: https://doi.org/10.1007/978-3-319-63962-8_165-1
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-63962-8
Online ISBN: 978-3-319-63962-8
eBook Packages: Springer Reference Mathematics, Reference Module Computer Science and Engineering