Toward Decoupling the Selection of Compression Algorithms from Quality Constraints

Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 10524)


Data-intensive scientific domains use data compression to reduce their storage requirements. Lossless data compression preserves the original information exactly but, for climate data, typically yields a compression factor of only about 2:1. Lossy data compression can achieve much higher compression ratios, depending on the tolerable error or the precision needed. The field of lossy compression therefore remains subject to active research. From the perspective of a scientist, the particular compression algorithm does not matter; what matters is the qualitative information about the implied loss of precision of the data.
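The gap between lossless compression and error-bounded lossy compression can be illustrated with a small sketch. This is an illustration only, using simple quantization plus deflate on synthetic smooth data; it is not one of the algorithms evaluated in the paper:

```python
import math
import struct
import zlib

# A smooth 1-D signal stands in for a climate field.
values = [math.sin(0.01 * i) for i in range(10000)]
raw = struct.pack(f"{len(values)}d", *values)

# Lossless: deflate the raw IEEE-754 doubles directly.
lossless = zlib.compress(raw, 9)

# Lossy with an absolute error bound: quantize each value to a multiple
# of 2*tol (so the reconstruction error is at most tol), then deflate
# the far more redundant integer codes.
tol = 1e-4
quantized = [round(v / (2 * tol)) for v in values]
lossy = zlib.compress(struct.pack(f"{len(quantized)}q", *quantized), 9)

# Verify that the error bound actually holds after reconstruction.
restored = [q * 2 * tol for q in quantized]
max_err = max(abs(a - b) for a, b in zip(values, restored))
assert max_err <= tol

print(f"lossless ratio: {len(raw) / len(lossless):.2f}:1")
print(f"lossy ratio:    {len(raw) / len(lossy):.2f}:1")
```

The lossy variant compresses far better because the quantized codes expose redundancy that the raw floating-point mantissas hide, while the absolute error bound (the "AbsTol" quantity) is guaranteed by construction.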

With the Scientific Compression Library (SCIL), we are developing a meta-compressor that allows users to set various quantities defining the acceptable error and the expected performance behavior. The ongoing work is a preliminary stage for the design of an automatic compression algorithm selector. The task of this missing key component is to construct appropriate chains of algorithms that meet the user's requirements. This approach is a crucial step towards a scientifically safe use of much-needed lossy data compression, because it disentangles the task of determining scientifically grounded characteristics of tolerable noise from the task of determining an optimal compression strategy given target noise levels and constraints. Once integrated into SCIL, future algorithms can be used without any change to the application code.
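The idea of a selector driven by user-set quantities can be sketched as follows. This is a minimal illustration with made-up algorithm names, characteristics, and hint fields; it is not SCIL's actual interface:

```python
from dataclasses import dataclass

@dataclass
class Hints:
    # User-facing quantities (hypothetical): acceptable error and required speed.
    abs_tolerance: float           # absolute error bound; 0.0 means lossless required
    min_throughput_mib_s: float    # minimum acceptable compression throughput

@dataclass
class Algorithm:
    # Hypothetical per-algorithm characteristics the selector consults.
    name: str
    lossless: bool
    throughput_mib_s: float
    typical_ratio: float

# Made-up candidate pool standing in for a library's registered algorithms.
CANDIDATES = [
    Algorithm("lz-style",        lossless=True,  throughput_mib_s=400, typical_ratio=2.0),
    Algorithm("quantize+lz",     lossless=False, throughput_mib_s=250, typical_ratio=10.0),
    Algorithm("transform-coder", lossless=False, throughput_mib_s=80,  typical_ratio=20.0),
]

def select(hints: Hints) -> Algorithm:
    """Pick the admissible candidate with the best expected compression ratio."""
    admissible = [
        a for a in CANDIDATES
        if (a.lossless or hints.abs_tolerance > 0)             # lossy needs a tolerance
        and a.throughput_mib_s >= hints.min_throughput_mib_s   # meet the speed requirement
    ]
    if not admissible:
        raise ValueError("no algorithm satisfies the constraints")
    return max(admissible, key=lambda a: a.typical_ratio)

# A tight throughput requirement rules out the slow transform coder;
# a zero tolerance rules out all lossy candidates.
fast = select(Hints(abs_tolerance=1e-3, min_throughput_mib_s=200))
strict = select(Hints(abs_tolerance=0.0, min_throughput_mib_s=100))
print(fast.name, strict.name)
```

The point of the sketch is the decoupling the title refers to: the user states tolerable noise and performance needs, and the selector, not the user, maps those constraints onto a concrete algorithm (or chain of algorithms).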

In this paper, we describe the user interfaces and quantities, present two compression algorithms, and evaluate SCIL's ability to compress climate data. The evaluation shows that the novel algorithms are competitive with the state-of-the-art compressors ZFP and SZ, and illustrates that the best algorithm depends on the user settings and the data properties.


Keywords: Compression Algorithm · Lossy Compression · Optimal Compression Scheme · Target Noise Level · AbsTol



This work was supported in part by the German Research Foundation (DFG) through the Priority Programme 1648 “Software for Exascale Computing” (SPPEXA) (GZ: LU 1353/11-1).


References

  1. Hübbe, N., Kunkel, J.: Reducing the HPC-data storage footprint with MAFISC - Multidimensional Adaptive Filtering Improved Scientific data Compression. Computer Science - Research and Development, pp. 231–239 (2013)
  2. Kunkel, J.: Analyzing data properties using statistical sampling techniques - illustrated on scientific file formats and compression features. In: Taufer, M., Mohr, B., Kunkel, J. (eds.) High Performance Computing: ISC High Performance 2016 International Workshops, ExaComm, E-MuCoCoS, HPC-IODC, IXPUG, IWOPH, P3MA, VHPC, WOPSSS. LNCS, vol. 9945, pp. 130–141. Springer, Heidelberg (2016)
  3.
  4. DEFLATE algorithm. Accessed 04 Oct 2016
  5. Huffman coding: a method for the construction of minimum-redundancy codes. Accessed 04 Oct 2016
  6. GZIP algorithm. Accessed 04 Oct 2016
  7. Lindstrom, P., Isenburg, M.: Fast and efficient compression of floating-point data. IEEE Trans. Visual. Comput. Graphics 12(5), 1245–1250 (2006)
  8. Di, S., Cappello, F.: Fast error-bounded lossy HPC data compression with SZ (2015)
  9. Hübbe, N., Wegener, A., Kunkel, J.M., Ling, Y., Ludwig, T.: Evaluating lossy compression on climate data. In: Kunkel, J.M., Ludwig, T., Meuer, H.W. (eds.) ISC 2013. LNCS, vol. 7905, pp. 343–356. Springer, Heidelberg (2013). doi:10.1007/978-3-642-38750-0_26
  10. Bicer, T., Agrawal, G.: A compression framework for multidimensional scientific datasets. In: 2013 IEEE 27th International Parallel and Distributed Processing Symposium Workshops and PhD Forum (IPDPSW), pp. 2250–2253 (2013)
  11. Laney, D., Langer, S., Weber, C., Lindstrom, P., Wegener, A.: Assessing the effects of data compression in simulations using physically motivated metrics. In: Supercomputing (2013)
  12. Lakshminarasimhan, S., Shah, N., Ethier, S., Klasky, S., Latham, R., Ross, R., Samatova, N.F.: Compressing the incompressible with ISABELA: in-situ reduction of spatio-temporal data. In: Jeannot, E., Namyst, R., Roman, J. (eds.) Euro-Par 2011. LNCS, vol. 6852, pp. 366–379. Springer, Heidelberg (2011). doi:10.1007/978-3-642-23400-2_34
  13. Iverson, J., Kamath, C., Karypis, G.: Fast and effective lossy compression algorithms for scientific datasets. In: Kaklamanis, C., Papatheodorou, T., Spirakis, P.G. (eds.) Euro-Par 2012. LNCS, vol. 7484, pp. 843–856. Springer, Heidelberg (2012). doi:10.1007/978-3-642-32820-6_83
  14. Gomez, L.A.B., Cappello, F.: Improving floating point compression through binary masks. In: 2013 IEEE International Conference on Big Data (2013)
  15. Lindstrom, P.: Fixed-rate compressed floating-point arrays. IEEE Trans. Visual. Comput. Graphics (2014)
  16. Baker, A.H., et al.: Evaluating lossy data compression on climate simulation data within a large ensemble. Geosci. Model Dev. 9, 4381–4403 (2016)
  17. OpenSimplex Noise in Java. Accessed 05 Feb 2017
  18. Roeckner, E., Bäuml, G., Bonaventura, L., Brokopf, R., Esch, M., Giorgetta, M., Hagemann, S., Kirchner, I., Kornblueh, L., Manzini, E., et al.: The Atmospheric General Circulation Model ECHAM 5. Model description, PART I (2003)

Copyright information

© Springer International Publishing AG 2017

Authors and Affiliations

  1. Deutsches Klimarechenzentrum, Hamburg, Germany
  2. Universität Hamburg, Hamburg, Germany
