Exploration of Pattern-Matching Techniques for Lossy Compression on Cosmology Simulation Data Sets

Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 10524)


Because of the vast volume of data produced by today’s scientific simulations, lossy compression with user-controlled information loss can significantly reduce data size and I/O burden. However, for large-scale cosmology simulations such as the Hardware/Hybrid Accelerated Cosmology Code (HACC), where memory overhead constraints restrict compression to one snapshot at a time, the achievable lossy compression ratio is severely limited by the low spatial coherence and high irregularity of the data. In this work, we propose a pattern-matching (similarity searching) technique to improve the prediction accuracy and compression ratio of the SZ lossy compressor on HACC data sets. We evaluate the proposed method under different configurations and compare it with state-of-the-art lossy compressors. Experiments show that the proposed optimization improves prediction accuracy and reduces the compressed size of the quantization codes compared with SZ. We also present several lessons useful for future research on pattern-matching techniques for lossy compression.
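
To make the terms "user-controlled information loss" and "quantization codes" concrete, the following is a minimal sketch of prediction-based, error-bounded quantization on a 1D array. It is not the authors' method and not the SZ API: the function names `quantize` and `dequantize`, the previous-value predictor, and the sample values are all assumptions chosen for illustration. Each value is predicted from the previously reconstructed value, and the prediction error is stored as an integer number of bins of width twice the error bound, so the reconstruction stays within the bound up to floating-point rounding.

```python
from itertools import accumulate


def quantize(values, error_bound):
    """Turn floats into integer quantization codes (illustrative sketch).

    Assumption: a simple previous-value predictor, starting from 0.0.
    """
    step = 2.0 * error_bound      # bin width; nearest bin center is within error_bound
    codes = []
    level = 0                     # integer grid index of the last reconstructed value
    for x in values:
        predicted = level * step  # prediction = previous reconstructed value
        code = round((x - predicted) / step)  # prediction error measured in bins
        codes.append(code)
        level += code             # the decoder can rebuild this from the codes alone
    return codes


def dequantize(codes, error_bound):
    """Rebuild the approximate values from the integer codes."""
    step = 2.0 * error_bound
    return [level * step for level in accumulate(codes)]


if __name__ == "__main__":
    sample = [0.0, 0.013, 0.021, 0.5, 0.49, -1.2]  # hypothetical data
    eb = 0.01
    codes = quantize(sample, eb)
    restored = dequantize(codes, eb)
    print(codes)
    print(max(abs(a - b) for a, b in zip(sample, restored)))
```

In a scheme of this shape, a more accurate predictor produces codes that cluster around zero, and such codes are cheaper to entropy-code. That is the general reason prediction accuracy and the compressed size of the quantization codes are linked.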


Lossy Compression Methods; Cosmology Simulations; Quantization Codes; Search Buffer; Look-ahead Buffer
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.



Copyright information

© Springer International Publishing AG 2017

Authors and Affiliations

  1. University of California, Riverside, USA
  2. Argonne National Laboratory, Lemont, USA
  3. University of Illinois at Urbana-Champaign, Champaign, USA
