Encyclopedia of Big Data Technologies

Living Edition
Editors: Sherif Sakr, Albert Zomaya

Big Data and Exascale Computing

Living reference work entry
DOI: https://doi.org/10.1007/978-3-319-63962-8_167-1


Big data refers to data sets that are both large and structurally complex. Data sizes range from dozens of terabytes to a few petabytes and are still growing. Big data is usually characterized by three Vs: the large Volume of data, the high Variety of data formats, and the fast Velocity of data generation.

The importance of big data comes from the value contained in the data. However, because of the three Vs, extracting that value quickly and efficiently is challenging. It is therefore important to design big data systems that analyze data according to its characteristics.


In the past few decades, clusters, grids, and cloud computing systems have been used to address the challenges of big data processing. These systems interconnect large numbers of commodity machines over commodity Ethernet networks so that they scale horizontally. Various programming models have also been proposed to efficiently distribute big data...




Copyright information

© Springer International Publishing AG 2018

Authors and Affiliations

  1. Shenzhen University, Shenzhen, China
  2. National University of Singapore, Singapore, Singapore

Section editors and affiliations

  • Bingsheng He
  • Behrooz Parhami (1)
  1. Dept. of Electrical and Computer Engineering, University of California, Santa Barbara, Santa Barbara, USA