Towards High Performance Data Analytics for Climate Change

  • Sandro Fiore
  • Donatello Elia
  • Cosimo Palazzo
  • Fabrizio Antonio
  • Alessandro D’Anca
  • Ian Foster
  • Giovanni Aloisio
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11887)

Abstract

The continuous increase in data produced by simulations, experiments and edge components in recent years has forced a shift in the scientific research process, leading to the definition of a fourth paradigm in science: data-intensive computing. This data deluge introduces challenges related to data volume, format heterogeneity, and the speed of data production and gathering, all of which must be handled to effectively support scientific discovery. To this end, High Performance Computing (HPC) and data analytics are both considered fundamental and complementary aspects of the scientific process, and together they contribute to a new paradigm that combines efforts from the two fields, called High Performance Data Analytics (HPDA). In this context, the Ophidia project provides an HPDA framework that joins the HPC paradigm with scientific data analytics. This contribution presents some aspects of the Ophidia HPDA framework, namely its multidimensional storage model and its distributed and hierarchical implementation, along with a benchmark of a parallel in-memory time series reduction operator.

Keywords

HPDA, Climate change, Scientific data analysis, Storage model, Multidimensional data

Notes

Acknowledgments

This work was supported in part by the EU H2020 Excellence in SImulation of Weather and Climate in Europe (ESiWACE) project (Grant Agreement 675191). Moreover, the authors would like to acknowledge Antonio Aloisio for his editing and proofreading work on this paper.

Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  1. Euro-Mediterranean Center on Climate Change Foundation, Lecce, Italy
  2. University of Salento, Lecce, Italy
  3. University of Chicago & Argonne National Laboratory, Chicago, USA
