Big Data and Exascale Computing

  • Living reference work entry
Encyclopedia of Big Data Technologies

Definition

Big data refers to data sets that are large in size and complex in structure. Their size can range from dozens of terabytes to a few petabytes and is still growing. Big data is usually characterized by three Vs: the large Volume of the data, the high Variety of data formats, and the fast Velocity of data generation.

The importance of big data comes from the value contained in the data. However, because of the three Vs, it is challenging to extract that value quickly and efficiently. It is therefore important to design big data systems that tailor their analysis to these characteristics of the data.
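To make the Volume challenge concrete, the sketch below estimates how long it takes simply to read a data set once. It is a back-of-the-envelope calculation, not a model from this entry: the per-machine read throughput of 1 GB/s is an assumed, illustrative figure, and the function `scan_seconds` is hypothetical. It also assumes perfectly linear scaling with no network, coordination, or skew overheads.

```python
# Back-of-the-envelope scan time for a big data set.
# Assumptions (illustrative only): decimal units, an assumed per-machine
# read throughput, and perfectly linear scaling across machines.

TB = 10**12  # decimal terabyte, in bytes
PB = 10**15  # decimal petabyte, in bytes


def scan_seconds(data_bytes: int, bytes_per_sec_per_machine: float, machines: int) -> float:
    """Time to read data_bytes once when the work is split evenly across machines."""
    if machines < 1:
        raise ValueError("machines must be >= 1")
    return data_bytes / (bytes_per_sec_per_machine * machines)


if __name__ == "__main__":
    throughput = 1e9  # assumed 1 GB/s sequential read per machine
    for size_name, size in [("50 TB", 50 * TB), ("1 PB", PB)]:
        for n in (1, 100, 1000):
            hours = scan_seconds(size, throughput, n) / 3600
            print(f"{size_name} on {n:>4} machine(s): {hours:8.2f} h")
```

Under these assumptions a single machine needs about 10^6 seconds (roughly 11.6 days) just to read 1 PB, whereas 1,000 machines would need about 1,000 seconds. Real systems fall short of this ideal, but the gap explains why big data processing is spread across many machines.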

Overview

In the past few decades, clusters, grids, and cloud computing systems have been used to address the challenges of big data processing. These systems interconnect large numbers of commodity machines over commodity Ethernet networks so that they can scale out horizontally. Various programming models have also been proposed to efficiently distribute big data...
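One widely used programming model of this kind is MapReduce, in which a map function runs independently on each partition of the input, the framework groups the intermediate results by key (the shuffle), and a reduce function combines the values for each key. The following is a minimal single-process sketch of that pattern for word counting, not the implementation of any particular framework: each inner list stands in for a block of data held on a different machine, the map tasks run one after another rather than in parallel, and all function names are hypothetical.

```python
from collections import defaultdict
from itertools import chain
from typing import Dict, Iterable, List, Tuple


def map_phase(partition: Iterable[str]) -> List[Tuple[str, int]]:
    """Map: emit (word, 1) for every word in one input partition."""
    return [(word.lower(), 1) for line in partition for word in line.split()]


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle: group intermediate values by key (done across machines in a real system)."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Reduce: combine all intermediate values for one key."""
    return key, sum(values)


def word_count(partitions: List[List[str]]) -> Dict[str, int]:
    # Each partition would be processed by a separate map task on its own
    # machine; here the tasks simply run sequentially in one process.
    intermediate = chain.from_iterable(map_phase(p) for p in partitions)
    grouped = shuffle(intermediate)
    return dict(reduce_phase(k, v) for k, v in grouped.items())


if __name__ == "__main__":
    partitions = [
        ["big data needs big systems"],
        ["data grows and systems scale out"],
    ]
    print(word_count(partitions))
```

Because map tasks never share state and reduce tasks only see the values grouped under their own key, both phases can be spread across many commodity machines, which is what lets such models scale out horizontally.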

Author information

Correspondence to Amelie Chi Zhou.

Copyright information

© 2018 Springer International Publishing AG

About this entry

Cite this entry

Zhou, A.C., He, B. (2018). Big Data and Exascale Computing. In: Sakr, S., Zomaya, A. (eds) Encyclopedia of Big Data Technologies. Springer, Cham. https://doi.org/10.1007/978-3-319-63962-8_167-1

  • DOI: https://doi.org/10.1007/978-3-319-63962-8_167-1

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-63962-8

  • Online ISBN: 978-3-319-63962-8

  • eBook Packages: Springer Reference Mathematics, Reference Module Computer Science and Engineering