Definition
Big data refers to data sets that are large in size and complex in structure. Their size can range from dozens of terabytes to a few petabytes and is still growing. Big data is usually characterized by three Vs: the large Volume of data, the high Variety of data formats, and the fast Velocity of data generation.
The importance of big data comes from the value contained in the data. However, because of the three Vs, it is challenging to extract that value from big data quickly and efficiently. It is therefore important to design big data systems that analyze big data according to the characteristics of the data.
Overview
In the past few decades, clusters, grids, and cloud computing systems have been used to address the challenges of big data processing. These systems use large numbers of commodity machines interconnected by commodity Ethernet networks so that they can scale horizontally. Various programming models have also been proposed to efficiently distribute big data...
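Programming models for distributing big data typically split the input across many commodity machines and merge partial results. As an illustration only, the following Python sketch mimics the map, shuffle, and reduce steps of a MapReduce-style word count inside a single process. The function names, the hash partitioning scheme, and the `num_reducers` parameter are hypothetical choices for this example, not the API of Hadoop or any other framework.

```python
from collections import defaultdict
from itertools import chain
from typing import Dict, Iterable, List, Tuple


def map_phase(record: str) -> List[Tuple[str, int]]:
    # Emit a (word, 1) pair for every word in one input record.
    return [(word, 1) for word in record.lower().split()]


def shuffle_phase(
    pairs: Iterable[Tuple[str, int]], num_reducers: int
) -> List[Dict[str, List[int]]]:
    # Hash-partition the intermediate pairs so that every key lands on
    # exactly one (simulated) reducer.
    partitions = [defaultdict(list) for _ in range(num_reducers)]
    for key, value in pairs:
        partitions[hash(key) % num_reducers][key].append(value)
    return partitions


def reduce_phase(partition: Dict[str, List[int]]) -> Dict[str, int]:
    # Combine all values collected for each key in this partition.
    return {key: sum(values) for key, values in partition.items()}


def word_count(splits: List[List[str]], num_reducers: int = 2) -> Dict[str, int]:
    # Each split stands in for the data a separate machine would process;
    # here all "machines" run sequentially in one process.
    mapped = chain.from_iterable(
        map_phase(record) for split in splits for record in split
    )
    result: Dict[str, int] = {}
    for partition in shuffle_phase(mapped, num_reducers):
        result.update(reduce_phase(partition))
    return result


if __name__ == "__main__":
    splits = [["big data is big", "data grows"], ["exascale data"]]
    print(word_count(splits))
```

Because each key is routed to a single reducer, the reducers never need to coordinate, which is what lets such models add capacity simply by adding machines. A real framework would also handle data locality, spilling to disk, and fault tolerance, all of which this sketch omits.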
Copyright information
© 2018 Springer International Publishing AG
About this entry
Cite this entry
Zhou, A.C., He, B. (2018). Big Data and Exascale Computing. In: Sakr, S., Zomaya, A. (eds) Encyclopedia of Big Data Technologies. Springer, Cham. https://doi.org/10.1007/978-3-319-63962-8_167-1
DOI: https://doi.org/10.1007/978-3-319-63962-8_167-1
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-63962-8
Online ISBN: 978-3-319-63962-8
eBook Packages: Springer Reference Mathematics, Reference Module Computer Science and Engineering