Big Data and Exascale Computing

Zhou, Amelie Chi; He, Bingsheng

doi:10.1007/978-3-319-63962-8_167-1

Definition

Big data refers to data sets that are both large and structurally complex. Data sizes can range from dozens of terabytes to a few petabytes and continue to grow. Big data is usually characterized by three Vs: the large Volume of data, the high Variety of data formats, and the fast Velocity of data generation.

The importance of big data lies in the value the data contains. However, because of the three Vs, extracting that value quickly and efficiently is challenging. It is therefore important to design big data systems that analyze the data according to its characteristics.

Overview

In the past few decades, clusters, grids, and cloud computing systems have been used to address the challenges of big data processing. These systems scale horizontally by using large numbers of commodity machines interconnected by commodity Ethernet networks. Various programming models have also been proposed to efficiently distribute big data...

Author information

Authors and Affiliations

Shenzhen University, Shenzhen, China
Amelie Chi Zhou
National University of Singapore, Singapore, Singapore
Bingsheng He

Corresponding author

Correspondence to Amelie Chi Zhou.

Editor information

Editors and Affiliations

School of Comp. Sci. and Engineering, University of New South Wales, Eveleigh, New South Wales, Australia
Sherif Sakr
Sch of Info Techno, Building J12, University of Sydney, Sydney, Australia
Albert Zomaya

Section Editor information

No affiliation provided
Bingsheng He
No affiliation provided
Behrooz Parhami

About this entry

Cite this entry

Zhou, A.C., He, B. (2018). Big Data and Exascale Computing. In: Sakr, S., Zomaya, A. (eds) Encyclopedia of Big Data Technologies. Springer, Cham. https://doi.org/10.1007/978-3-319-63962-8_167-1

DOI: https://doi.org/10.1007/978-3-319-63962-8_167-1
Published: 27 February 2018
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-63962-8
Online ISBN: 978-3-319-63962-8
eBook Packages: Springer Reference Mathematics; Reference Module Computer Science and Engineering
