Reducing the HPC-datastorage footprint with MAFISC—Multidimensional Adaptive Filtering Improved Scientific data Compression

  • Special Issue Paper
Computer Science - Research and Development

Abstract

Large HPC installations today also include large data storage installations. Data compression can significantly reduce the amount of data, and one of our goals was to find out how much compression can do for climate data. The price of compression is, of course, the need for additional computational resources, so our second goal was to relate the savings from compression to the costs it incurs.

In this paper we present the results of our analysis of typical climate data. A lossless algorithm based on these insights is developed and its compression ratio is compared to that of standard compression tools. As it turns out, this algorithm is general enough to be useful for a large class of scientific data, which is the reason we speak of MAFISC as a method for scientific data compression. A numeric problem for lossless compression of scientific data is identified and a possible solution is given. Finally, we discuss the economics of data compression in HPC environments using the example of the German Climate Computing Center.
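To make the idea of filtering before compression concrete, the following minimal sketch compares a standard compressor applied directly to a smooth signal with the same compressor applied after a simple delta filter. It is an illustration only: the fixed one-dimensional delta filter, the synthetic sine signal, and the helper names (`delta_encode`, `delta_decode`, `pack_u32`, `unpack_u32`) are assumptions made for this example and are not the adaptive multidimensional filters of MAFISC, whose details are not given in this abstract.

```python
import math
import struct
import zlib


def pack_u32(values):
    """Serialize a list of unsigned 32-bit integers as little-endian bytes."""
    return struct.pack(f"<{len(values)}I", *values)


def unpack_u32(data):
    """Inverse of pack_u32."""
    return list(struct.unpack(f"<{len(data) // 4}I", data))


def delta_encode(values):
    """Replace each value by its difference to the predecessor (mod 2**32)."""
    out, prev = [], 0
    for v in values:
        out.append((v - prev) & 0xFFFFFFFF)
        prev = v
    return out


def delta_decode(deltas):
    """Undo delta_encode by accumulating the differences (mod 2**32)."""
    out, prev = [], 0
    for d in deltas:
        prev = (prev + d) & 0xFFFFFFFF
        out.append(prev)
    return out


# Hypothetical smooth test signal standing in for a slowly varying variable.
samples = [int(1_000_000 + 20_000 * math.sin(i / 2000.0)) for i in range(20_000)]

raw = pack_u32(samples)
plain = zlib.compress(raw, 9)                              # compressor alone
filtered = zlib.compress(pack_u32(delta_encode(samples)), 9)  # filter, then compress

# Lossless round trip: decompress, then invert the filter.
restored = delta_decode(unpack_u32(zlib.decompress(filtered)))

ratio_plain = len(raw) / len(plain)
ratio_filtered = len(raw) / len(filtered)
print(f"zlib only:        ratio {ratio_plain:.2f}")
print(f"delta + zlib:     ratio {ratio_filtered:.2f}")
print(f"lossless restore: {restored == samples}")
```

The compression ratio here is taken as uncompressed size divided by compressed size, so larger is better. Because the filter works on integers modulo 2**32, it is exactly invertible, which is what keeps the scheme lossless.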


Acknowledgements

We thank Wolfgang Stahl for his measurements of the tape drive compression ratio at the DKRZ and for the fruitful discussion about the impact of compression on HPC installations.

Corresponding author

Correspondence to Nathanael Hübbe.

About this article

Cite this article

Hübbe, N., Kunkel, J. Reducing the HPC-datastorage footprint with MAFISC—Multidimensional Adaptive Filtering Improved Scientific data Compression. Comput Sci Res Dev 28, 231–239 (2013). https://doi.org/10.1007/s00450-012-0222-4
