Compressing unstructured mesh data from simulations using machine learning

  • Applications
International Journal of Data Science and Analytics

Abstract

The amount of data output from computer simulations has grown to terabytes and petabytes as increasingly complex simulations are run on massively parallel systems. As we approach exaflop computing in the next decade, the I/O subsystem is expected to be unable to write out these large volumes of data. In this paper, we explore the use of machine learning to compress the data before it is written out. Despite computational constraints that limit us to very simple learning algorithms, our results show that machine learning is a viable option for compressing unstructured data. We demonstrate that simply using a better sampling algorithm to generate the training set yields more accurate results than random sampling, at no extra cost. Further, by carefully selecting and incorporating points with high prediction error, we can improve reconstruction accuracy without sacrificing the compression rate.
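The idea in the abstract can be illustrated with a minimal Python sketch. It is not the paper's implementation: the learner (an inverse-distance-weighted average of the k nearest retained points), the synthetic data, the names `reconstruct` and `select_points`, and the reading of "without sacrificing the compression rate" as a fixed storage budget are all assumptions made for illustration. Random sampling is used for the initial training set, which is the baseline the abstract compares against.

```python
import numpy as np

def reconstruct(kept_xy, kept_val, query_xy, k=5, eps=1e-12):
    """Predict values at query_xy as an inverse-distance-weighted average
    of the k nearest retained points (a very simple, assumed learner)."""
    # Dense squared-distance matrix; acceptable only for a small demo.
    d2 = ((query_xy[:, None, :] - kept_xy[None, :, :]) ** 2).sum(axis=2)
    k = min(k, kept_xy.shape[0])
    idx = np.argpartition(d2, k - 1, axis=1)[:, :k]  # k nearest, unordered
    w = 1.0 / (np.take_along_axis(d2, idx, axis=1) + eps)
    return (w * kept_val[idx]).sum(axis=1) / w.sum(axis=1)

def select_points(xy, val, budget, frac_error, rng):
    """Choose `budget` points to store: a random base sample plus the
    points that the base sample predicts worst (hypothetical helper)."""
    n_extra = int(round(budget * frac_error))
    n_base = budget - n_extra
    base = rng.choice(xy.shape[0], size=n_base, replace=False)
    err = np.abs(reconstruct(xy[base], val[base], xy) - val)
    err[base] = -1.0  # retained points are never selected a second time
    extra = np.argsort(err)[::-1][:n_extra]  # largest prediction errors
    return base, extra

# Synthetic stand-in for unstructured mesh data: scattered 2-D points
# carrying a smooth scalar field (assumed, not from the paper).
rng = np.random.default_rng(0)
xy = rng.random((2000, 2))
val = np.sin(4.0 * xy[:, 0]) * np.cos(4.0 * xy[:, 1])

base, extra = select_points(xy, val, budget=200, frac_error=0.25, rng=rng)
stored = np.concatenate([base, extra])  # only these points are written out

coarse = reconstruct(xy[base], val[base], xy)
refined = reconstruct(xy[stored], val[stored], xy)
print("max error, base sample only:", np.abs(coarse - val).max())
print("max error, with error points:", np.abs(refined - val).max())
```

In this sketch the number of stored points is fixed at `budget`, so any change in reconstruction accuracy comes from which points are kept rather than from how many.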

References

  1. Atkeson, C., Schaal, S.A., Moore, A.W.: Locally weighted learning. AI Rev. 11, 75–133 (1997)

  2. Bridson, R.: Fast Poisson disk sampling in arbitrary dimensions. In: ACM SIGGRAPH 2007 Sketches, SIGGRAPH ’07. ACM, New York (2007). https://doi.org/10.1145/1278780.1278807

  3. Chen, Z., Son, S.W., Hendrix, W., Agrawal, A., Liao, W.k., Choudhary, A.: NUMARCK: machine learning algorithm for resiliency and checkpointing. In: Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, SC ’14, pp. 733–744. IEEE Press, Piscataway (2014). https://doi.org/10.1109/SC.2014.65

  4. Cheng, L., Vishwanathan, S.V.N.: Learning to compress images and videos. In: Proceedings of the 24th International Conference on Machine Learning, ICML ’07, pp. 161–168. ACM, New York (2007). https://doi.org/10.1145/1273496.1273517

  5. Childs, H., et al.: In situ processing. In: Bethel, E.W., Childs, H., Hansen, C. (eds.) High Performance Visualization: Enabling Extreme-Scale Scientific Insight, pp. 171–198. CRC Press/Taylor & Francis Group, Boca Raton (2012)

  6. Di, S., Cappello, F.: Fast error-bounded lossy HPC data compression with SZ. In: Proceedings of the International Parallel and Distributed Processing Symposium, pp. 730–739. IEEE (2016)

  7. Fan, Y.J., Kamath, C.: A comparison of compressed sensing and sparse recovery algorithms applied to simulation data. Stat. Optim. Inf. Comput. 4(3), 194–213 (2016)

  8. Iverson, J., Kamath, C., Karypis, G.: Fast and effective lossy compression algorithms for scientific datasets. In: Proceedings of the 18th International Conference on Parallel Processing, Euro-Par’12, Berlin, pp. 843–856 (2012)

  9. Kamath, C.: Learning to compress unstructured mesh data from simulations. In: 2017 IEEE International Conference on Data Science and Advanced Analytics, DSAA 2017, Tokyo, Japan, October 19–21, 2017, pp. 621–630 (2017)

  10. Lakshminarasimhan, S., Shah, N., Ethier, S., Ku, S.H., Chang, C.S., Klasky, S., Latham, R., Ross, R., Samatova, N.F.: ISABELA for effective in situ compression of scientific data. Concurr. Comput. Pract. Exp. 25(4), 524–540 (2013). https://doi.org/10.1002/cpe.2887

  11. Lin, Z., Hahm, T.S., Lee, W.W., Tang, W.M., White, R.B.: Turbulent transport reduction by zonal flows: massively parallel simulations. Science 281, 1835 (1998)

  12. Lindstrom, P.: Fixed-rate compressed floating-point arrays. IEEE Trans. Vis. Comput. Graph. 20(12), 2674–2683 (2014). https://doi.org/10.1109/TVCG.2014.2346458

  13. Lindstrom, P., Isenburg, M.: Fast and efficient compression of floating-point data. IEEE Trans. Vis. Comput. Graph. 12(5), 1245–1250 (2006)

  14. Mitchell, D.P.: Spectrally optimal sampling for distribution ray tracing. Comput. Graph. 25(4), 157–164 (1991)

  15. Rasmussen, C.E., Williams, C.K.I.: Gaussian Processes for Machine Learning. MIT Press, Cambridge (2006)

  16. Sakai, R., Sasaki, D., Obayashi, S., Nakahashi, K.: Wavelet-based data compression for flow simulation on block-structured Cartesian mesh. Int. J. Numer. Methods Fluids 73(5), 462–476 (2013). https://doi.org/10.1002/fld.3808

  17. Salloum, M., Fabian, N., Hensinger, D.M., Templeton, J.A.: Compressed sensing and reconstruction of unstructured mesh datasets. arXiv:1508.06314 (2015)

  18. Shiflet, A.B., Shiflet, G.W.: Introduction to Computational Science: Modeling and Simulation for the Sciences. Princeton University Press, Princeton (2006)

Acknowledgements

I thank the reviewers of both the original DSAA’2017 paper and this extended version for their careful review and thoughtful suggestions for improvements. I also thank Prof. Zhihong Lin, from UC Irvine, for providing access to the data generated as part of the GSEP SciDAC project. This work was funded by the ASCR Program (Dr. Lucille Nowell, Program Manager) at the Office of Science, US Department of Energy. This work was performed under the auspices of the U.S. Department of Energy by Lawrence Livermore National Laboratory under Contract DE-AC52-07NA27344. LLNL-JRNL-750460.

Author information

Corresponding author

Correspondence to Chandrika Kamath.

Ethics declarations

Conflict of interest

The author states that there is no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

This version is an extended version of the DSAA’2017 Application Track paper titled “Learning to Compress Unstructured Mesh Data From Simulations” [9].

About this article

Cite this article

Kamath, C. Compressing unstructured mesh data from simulations using machine learning. Int J Data Sci Anal 9, 113–130 (2020). https://doi.org/10.1007/s41060-019-00180-6

  • DOI: https://doi.org/10.1007/s41060-019-00180-6
