Big Data, Simulations and HPC Convergence

  • Geoffrey Fox
  • Judy Qiu
  • Shantenu Jha
  • Saliya Ekanayake
  • Supun Kamburugamuve
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 10044)


Two major trends in computing systems are the growth of high performance computing (HPC), in particular through an international exascale initiative, and the growth of big data, with an accompanying cloud infrastructure of dramatic and increasing size and sophistication. In this paper, we study an approach to convergence for software and applications/algorithms and show what hardware architectures it suggests. We start by dividing applications into data plus model components and classifying each component (whether from Big Data or Big Compute) in the same way. This leads to 64 properties divided into 4 views: Problem Architecture (Macro pattern); Execution Features (Micro patterns); Data Source and Style; and finally the Processing (runtime) View. We discuss convergence software built around HPC-ABDS (High Performance Computing enhanced Apache Big Data Stack), show how one can merge Big Data and HPC (Big Simulation) concepts into a single stack, and discuss appropriate hardware.


Keywords: Big Data, HPC, Simulations



This work was partially supported by NSF CIF21 DIBBS 1443054, NSF OCI 1149432 CAREER, and AFOSR FA9550-13-1-0225 awards. We thank Dennis Gannon for comments on an early draft.



Copyright information

© Springer International Publishing AG 2016

Authors and Affiliations

  • Geoffrey Fox (1)
  • Judy Qiu (1)
  • Shantenu Jha (2)
  • Saliya Ekanayake (1)
  • Supun Kamburugamuve (1)

  1. School of Informatics and Computing, Indiana University, Bloomington, USA
  2. ECE, Rutgers University, Piscataway, USA
