Big Data and Exascale Computing
Big data refers to data sets that are very large and complex in structure. Their size can range from dozens of terabytes to a few petabytes, and it is still growing. Big data is usually characterized by three Vs: the large Volume of data, the high Variety of data formats, and the fast Velocity of data generation.
The importance of big data comes from the value contained in the data. Because of the three Vs, however, extracting that value quickly and efficiently is challenging. It is therefore important to design big data systems that analyze the data in a way suited to these characteristics.
Over the past few decades, clusters, grids, and cloud computing systems have been used to address the challenges of big data processing. These systems interconnect large numbers of commodity machines over commodity Ethernet networks so that they can scale horizontally. Various programming models have also been proposed to efficiently distribute big data...
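To make the idea of spreading work across many commodity machines concrete, here is a minimal single-process sketch of the partition, map, shuffle, and reduce pattern popularized by MapReduce. It uses word counting as the example task. The sketch is illustrative and not taken from any particular framework. The function names (`partition`, `map_shard`, `shuffle`, `reduce_bucket`, `word_count`) are hypothetical, and each "worker" is simulated by an ordinary function call rather than by a separate machine.

```python
from collections import Counter
from typing import Dict, List

# Hypothetical helpers: each function stands in for work that a real
# scale-out system would run on a separate commodity machine.

def partition(records: List[str], num_workers: int) -> List[List[str]]:
    """Split the input round-robin into one shard per simulated worker."""
    shards: List[List[str]] = [[] for _ in range(num_workers)]
    for i, record in enumerate(records):
        shards[i % num_workers].append(record)
    return shards

def map_shard(shard: List[str]) -> Counter:
    """Map phase: a worker counts words in its own shard only."""
    counts: Counter = Counter()
    for line in shard:
        counts.update(line.split())
    return counts

def shuffle(partials: List[Counter], num_reducers: int) -> List[Dict[str, List[int]]]:
    """Shuffle phase: route each word to one reducer by hashing the word,
    so all partial counts for the same word end up in the same bucket."""
    buckets: List[Dict[str, List[int]]] = [{} for _ in range(num_reducers)]
    for partial in partials:
        for word, n in partial.items():
            buckets[hash(word) % num_reducers].setdefault(word, []).append(n)
    return buckets

def reduce_bucket(bucket: Dict[str, List[int]]) -> Dict[str, int]:
    """Reduce phase: sum the partial counts collected for each word."""
    return {word: sum(ns) for word, ns in bucket.items()}

def word_count(records: List[str], num_workers: int = 4) -> Dict[str, int]:
    """Run the whole pipeline; buckets hold disjoint keys, so merging is safe."""
    partials = [map_shard(s) for s in partition(records, num_workers)]
    result: Dict[str, int] = {}
    for bucket in shuffle(partials, num_workers):
        result.update(reduce_bucket(bucket))
    return result

print(word_count(["big data big", "data velocity"]))
```

In a real deployment, each shard would reside on a different node, and the shuffle would move intermediate data over the network. Hashing keys to reducers means no single node has to see all the data, which is what allows the reduce phase to scale out as nodes are added.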