Skip to main content

Towards High Performance Data Analytics for Climate Change

  • Conference paper
  • First Online:

Part of the book series: Lecture Notes in Computer Science ((LNTCS,volume 11887))

Abstract

The continuous increase in the data produced by simulations, experiments and edge components in the last few years has forced a shift in the scientific research process, leading to the definition of a fourth paradigm in Science, concerning data-intensive computing. This data deluge, in fact, introduces various challenges related to big data volumes, formats heterogeneity and the speed in the data production and gathering that must be handled to effectively support scientific discovery. To this end, High Performance Computing (HPC) and data analytics are both considered as fundamental and complementary aspects of the scientific process and together contribute to a new paradigm encompassing the efforts from the two fields called High Performance Data Analytics (HPDA). In this context, the Ophidia project provides a HPDA framework which joins the HPC paradigm with scientific data analytics. This contribution presents some aspects regarding the Ophidia HPDA framework, such as the multidimensional storage model, its distributed and hierarchical implementation along with a benchmark of a parallel in-memory time series reduction operator.

This is a preview of subscription content, log in via an institution.

Buying options

Chapter
USD   29.95
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
eBook
USD   79.99
Price excludes VAT (USA)
  • Available as EPUB and PDF
  • Read on any device
  • Instant download
  • Own it forever
Softcover Book
USD   99.99
Price excludes VAT (USA)
  • Compact, lightweight edition
  • Dispatched in 3 to 5 business days
  • Free shipping worldwide - see info

Tax calculation will be finalised at checkout

Purchases are for personal use only

Learn about institutional subscriptions

Notes

  1. 1.

    OPH_REDUCE2 documentation http://ophidia.cmcc.it/documentation/users/operators/OPH_REDUCE2.html.

  2. 2.

    GlusterFS documentation https://docs.gluster.org/en/latest/.

  3. 3.

    ICCLIM (Indice Calculation CLIMate) https://icclim.readthedocs.io/en/latest/ intro.html.

  4. 4.

    NCAR command language https://www.ncl.ucar.edu/.

  5. 5.

    PyOphidia - Conda Forge https://anaconda.org/conda-forge/pyophidia.

  6. 6.

    Dask, library for dynamic task scheduling https://dask.org.

  7. 7.

    Pangeo. A community platform for big data geoscience. https://pangeo.io/.

  8. 8.

    The ESiWACE Center of Excellence on Weather and Climate Simulations in Europe project https://www.esiwace.eu/.

  9. 9.

    ESiWACE Earth System Data Middleware https://github.com/ESiWACE/esdm.

References

  1. Aloisio, G., Fiore, S.: Towards exascale distributed data management. Int. J. High Perform. Comput. Appl. 23(4), 398–400 (2009). https://doi.org/10.1177/1094342009347702

    Article  Google Scholar 

  2. Aloisio, G., Fiore, S., Foster, I., Williams, D.: Scientific big data analytics challenges at large scale. Proceedings of Big Data and Extreme-scale Computing (BDEC) (2013)

    Google Scholar 

  3. Asch, M., et al.: Big data and extreme-scale computing: pathways to convergence-toward a shaping strategy for a future software and data ecosystem for scientific inquiry. Int. J. High Perform. Comput. Appl. 32(4), 435–479 (2018). https://doi.org/10.1177/1094342018778123

    Article  Google Scholar 

  4. Baumann, P., Dehmel, A., Furtado, P., Ritsch, R., Widmann, N.: The multidimensional database system RasDaMan. SIGMOD Rec. 27(2), 575–577 (1998). https://doi.org/10.1145/276305.276386

    Article  Google Scholar 

  5. Baumann, P., Dehmel, A., Furtado, P., Ritsch, R., Widmann, N.: Spatio-temporal retrieval with RasDaMan. In: Proceedings of the 25th International Conference on Very Large Data Bases, VLDB 1999 pp. 746–749. Morgan Kaufmann Publishers Inc., San Francisco (1999). http://dl.acm.org/citation.cfm?id=645925.671513

  6. Baumann, P., Furtado, P., Ritsch, R., Widmann, N.: The RasDaMan approach to multidimensional database management. In: Proceedings of the 1997 ACM Symposium on Applied Computing, SAC 1997, pp. 166–173. ACM, New York (1997). https://doi.org/10.1145/331697.331732

  7. Bell, G., Hey, T., Szalay, A.: Beyond the data deluge. Science 323(5919), 1297–1298 (2009). https://doi.org/10.1126/science.1170411

    Article  Google Scholar 

  8. Brown, P.G.: Overview of sciDB: large scale array storage, processing and analysis. In: Proceedings of the 2010 ACM SIGMOD International Conference on Management of Data, SIGMOD 2010, pp. 963–968. ACM, New York (2010). https://doi.org/10.1145/1807167.1807271

  9. D’Anca, A., et al.: On the use of in-memory analytics workflows to computer science indicators from large climate datasets. In: 2017 17th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGRID), pp. 1035–1043, May 2017. https://doi.org/10.1109/CCGRID.2017.132

  10. Dongarra, J., et al.: The international exascale software project roadmap. Int. J. High Perform. Comput. Appl. 25(1), 3–60 (2011). https://doi.org/10.1177/1094342010391989

    Article  Google Scholar 

  11. Elia, D., et al.: An in-memory based framework for scientific data analytics. In: Proceedings of the ACM International Conference on Computing Frontiers, CF 2016, pp. 424–429. ACM, New York (2016). https://doi.org/10.1145/2903150.2911719

  12. Fiore, S., et al.: Ophidia: a full software stack for scientific data analytics. In: 2014 International Conference on High Performance Computing Simulation (HPCS), pp. 343–350, July 2014. https://doi.org/10.1109/HPCSim.2014.6903706

  13. Fiore, S., et al.: Distributed and cloud-based multi-model analytics experiments on large volumes of climate change data in the earth system grid federation eco-system. In: 2016 IEEE International Conference on Big Data (Big Data), pp. 2911–2918, December 2016. https://doi.org/10.1109/BigData.2016.7840941

  14. Fiore, S., D’Anca, A., Palazzo, C., Foster, I.T., Williams, D.N., Aloisio, G.: Ophidia: toward big data analytics for escience. In: Proceedings of the International Conference on Computational Science, ICCS 2013, Barcelona, Spain, 5–7 June 2013, pp. 2376–2385 (2013). https://doi.org/10.1016/j.procs.2013.05.409

  15. Fiore, S., et al.: Big data analytics on large-scale scientific datasets in the INDIGO-DataCloud project. In: Proceedings of the Computing Frontiers Conference, CF 2017, pp. 343–348. ACM, New York (2017). https://doi.org/10.1145/3075564.3078884

  16. Folk, M., Heber, G., Koziol, Q., Pourmal, E., Robinson, D.: An overview of the HDF5 technology suite and its applications. In: Proceedings of the EDBT/ICDT 2011 Workshop on Array Databases. AD 2011, pp. 36–47. ACM, New York (2011). https://doi.org/10.1145/1966895.1966900

  17. Golfarelli, M., Rizzi, S.: Data Warehouse Design: Modern Principles and Methodologies, 1st edn. McGraw-Hill Inc., New York (2009)

    Google Scholar 

  18. Gray, J., Liu, D.T., Nieto-Santisteban, M., Szalay, A., DeWitt, D.J., Heber, G.: Scientific data management in the coming decade. SIGMOD Rec. 34(4), 34–41 (2005). https://doi.org/10.1145/1107499.1107503

    Article  Google Scholar 

  19. Hu, F., et al.: ClimateSpark: an in-memory distributed computing framework for big climate data analytics. Comput. Geosci. 115, 154–166 (2018). https://doi.org/10.1016/j.cageo.2018.03.011

    Article  Google Scholar 

  20. Palamuttam, R., et al.: SciSpark: applying in-memory distributed computing to weather event detection and tracking. In: 2015 IEEE International Conference on Big Data (Big Data), pp. 2020–2026, October 2015. https://doi.org/10.1109/BigData.2015.7363983

  21. Reed, D.A., Dongarra, J.: Exascale computing and big data. Commun. ACM 58(7), 56–68 (2015). https://doi.org/10.1145/2699414

    Article  Google Scholar 

  22. Schulzweida, U.: CDO user guide - version 1.9.6 (2019). https://code.mpimet.mpg.de/projects/cdo/embedded/cdo.pdf

  23. Stonebraker, M., Brown, P., Becla, J., Zhang, D.: SciDB: a database management system for applications with complex analytics. Comput. Sci. Eng. 15(3), 54–62 (2013). https://doi.org/10.1109/MCSE.2013.19

    Article  Google Scholar 

  24. Stonebraker, M., Brown, P., Poliakov, A., Raman, S.: The Architecture of SciDB. In: Bayard Cushing, J., French, J., Bowers, S. (eds.) SSDBM 2011. LNCS, vol. 6809, pp. 1–16. Springer, Heidelberg (2011). https://doi.org/10.1007/978-3-642-22351-8_1

    Chapter  Google Scholar 

  25. Wilson, B., et al.: SciSpark: highlyinteractive in-memory science data analytics. In: 2016 IEEE InternationalConference on Big Data (Big Data), pp. 2964–2973, December 2016. https://doi.org/10.1109/BigData.2016.7840948

  26. Zender, C.S.: Analysis of self-describing gridded geoscience data with netCDF Operators (NCO). Environ. Model. Softw. 23(10), 1338–1342 (2008). https://doi.org/10.1016/j.envsoft.2008.03.004

    Article  Google Scholar 

Download references

Acknowledgments

This work was supported in part by the EU H2020 Excellence in SImulation of Weather and Climate in Europe (ESiWACE) project (Grant Agreement 675191). Moreover, the authors would like to acknowledge Antonio Aloisio for his editing and proofreading work on this paper.

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Sandro Fiore .

Editor information

Editors and Affiliations

Rights and permissions

Reprints and permissions

Copyright information

© 2019 Springer Nature Switzerland AG

About this paper

Check for updates. Verify currency and authenticity via CrossMark

Cite this paper

Fiore, S. et al. (2019). Towards High Performance Data Analytics for Climate Change. In: Weiland, M., Juckeland, G., Alam, S., Jagode, H. (eds) High Performance Computing. ISC High Performance 2019. Lecture Notes in Computer Science(), vol 11887. Springer, Cham. https://doi.org/10.1007/978-3-030-34356-9_20

Download citation

  • DOI: https://doi.org/10.1007/978-3-030-34356-9_20

  • Published:

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-34355-2

  • Online ISBN: 978-3-030-34356-9

  • eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics