Abstract
The volume of data output by computer simulations has grown to terabytes and petabytes as increasingly complex simulations are run on massively parallel systems. As we approach exaflop computing in the next decade, the I/O subsystem is not expected to be able to write out these large volumes of data. In this paper, we explore the use of machine learning to compress the data before it is written out. Despite the computational constraints that limit us to very simple learning algorithms, our results show that machine learning is a viable option for compressing unstructured data. We demonstrate that simply using a better sampling algorithm to generate the training set yields more accurate results than random sampling, at no extra cost. Further, by carefully selecting and incorporating points with high prediction error, we can improve reconstruction accuracy without sacrificing the compression rate.
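The workflow summarized above can be illustrated with a minimal sketch. The abstract does not name the specific algorithms, so everything here is an assumption made for illustration: a greedy minimum-distance subsample stands in for the "better sampling algorithm", an inverse-distance-weighted average of the k nearest retained points stands in for the "very simple learning algorithm", and the function names, the `radius` and `n_extra` parameters, and the choice to keep the mesh coordinates available at reconstruction time are all hypothetical.

```python
import numpy as np

def dart_throwing_sample(points, radius, rng):
    """Greedy subsample in which retained points are at least `radius` apart.
    Assumed stand-in for a Poisson-disk-style sampler."""
    kept = []
    for i in rng.permutation(len(points)):
        if all(np.linalg.norm(points[i] - points[j]) >= radius for j in kept):
            kept.append(i)
    return np.array(kept)

def idw_predict(train_xy, train_val, query_xy, k=5, eps=1e-12):
    """Predict each query value as the inverse-distance-weighted mean of its
    k nearest training points (an assumed, deliberately simple learner)."""
    d = np.linalg.norm(query_xy[:, None, :] - train_xy[None, :, :], axis=2)
    k = min(k, len(train_xy))
    nn = np.argsort(d, axis=1)[:, :k]          # indices of k nearest neighbors
    nd = np.take_along_axis(d, nn, axis=1)     # their distances
    w = 1.0 / (nd + eps)
    return (w * train_val[nn]).sum(axis=1) / w.sum(axis=1)

def compress(points, values, radius, n_extra, rng):
    """Return indices of points to store: the spatial sample plus the
    n_extra discarded points that the learner predicts worst."""
    keep = dart_throwing_sample(points, radius, rng)
    rest = np.setdiff1d(np.arange(len(points)), keep)
    pred = idw_predict(points[keep], values[keep], points[rest])
    err = np.abs(pred - values[rest])
    worst = rest[np.argsort(err)[::-1][:n_extra]]
    return np.concatenate([keep, worst])

def reconstruct(points, kept_idx, kept_values):
    """Rebuild the full field from the stored points; stored values are exact."""
    out = idw_predict(points[kept_idx], kept_values, points)
    out[kept_idx] = kept_values
    return out
```

In this sketch only the values at the returned indices would be written out, and `reconstruct` fills in the remaining values afterwards. Whether adding high-error points lowers the overall error in a given case depends on the data and the learner, which is the question the paper studies.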
References
Atkeson, C., Schaal, S.A., Moore, A.W.: Locally weighted learning. AI Rev. 11, 75–133 (1997)
Bridson, R.: Fast Poisson disk sampling in arbitrary dimensions. In: ACM SIGGRAPH 2007 Sketches, SIGGRAPH ’07. ACM, New York (2007). https://doi.org/10.1145/1278780.1278807
Chen, Z., Son, S.W., Hendrix, W., Agrawal, A., Liao, W.k., Choudhary, A.: NUMARCK: machine learning algorithm for resiliency and checkpointing. In: Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, SC ’14, pp. 733–744. IEEE Press, Piscataway (2014). https://doi.org/10.1109/SC.2014.65
Cheng, L., Vishwanathan, S.V.N.: Learning to compress images and videos. In: Proceedings of the 24th International Conference on Machine Learning, ICML ’07, pp. 161–168. ACM, New York (2007). https://doi.org/10.1145/1273496.1273517
Childs, H., et al.: In situ processing. In: Bethel, E.W., Childs, H., Hansen, C. (eds.) High Performance Visualization: Enabling Extreme-Scale Scientific Insight, pp. 171–198. CRC Press/Taylor & Francis Group, Boca Raton (2012)
Di, S., Cappello, F.: Fast error-bounded lossy HPC data compression with SZ. In: Proceedings of the International Parallel and Distributed Processing Symposium, pp. 730–739. IEEE (2016)
Fan, Y.J., Kamath, C.: A comparison of compressed sensing and sparse recovery algorithms applied to simulation data. Stat. Optim. Inf. Comput. 4(3), 194–213 (2016)
Iverson, J., Kamath, C., Karypis, G.: Fast and effective lossy compression algorithms for scientific datasets. In: Proceedings of the 18th International Conference on Parallel Processing, Euro-Par’12, Berlin, pp. 843–856 (2012)
Kamath, C.: Learning to compress unstructured mesh data from simulations. In: 2017 IEEE International Conference on Data Science and Advanced Analytics, DSAA 2017, Tokyo, Japan, October 19–21, 2017, pp. 621–630 (2017)
Lakshminarasimhan, S., Shah, N., Ethier, S., Ku, S.H., Chang, C.S., Klasky, S., Latham, R., Ross, R., Samatova, N.F.: Isabela for effective in situ compression of scientific data. Concurr. Comput. Pract. Exp. 25(4), 524–540 (2013). https://doi.org/10.1002/cpe.2887
Lin, Z., Hahm, T.S., Lee, W.W., Tang, W.M., White, R.B.: Turbulent transport reduction by zonal flows: massively parallel simulations. Science 281, 1835 (1998)
Lindstrom, P.: Fixed-rate compressed floating-point arrays. IEEE Trans. Vis. Comput. Graph. 20(12), 2674–2683 (2014). https://doi.org/10.1109/TVCG.2014.2346458
Lindstrom, P., Isenburg, M.: Fast and efficient compression of floating-point data. IEEE Trans. Vis. Comput. Graph. 12(5), 1245–1250 (2006)
Mitchell, D.P.: Spectrally optimal sampling for distribution ray tracing. Comput. Graph. 25(4), 157–164 (1991)
Rasmussen, C.E., Williams, C.K.I.: Gaussian Processes for Machine Learning. MIT Press, Cambridge (2006)
Sakai, R., Sasaki, D., Obayashi, S., Nakahashi, K.: Wavelet-based data compression for flow simulation on block-structured cartesian mesh. Int. J. Numer. Methods Fluids 73(5), 462–476 (2013). https://doi.org/10.1002/fld.3808
Salloum, M., Fabian, N., Hensinger, D.M., Templeton, J.A.: Compressed sensing and reconstruction of unstructured mesh datasets. arXiv:1508.06314 (2015)
Shiflet, A.B., Shiflet, G.W.: Introduction to Computational Science: Modeling and Simulation for the Sciences. Princeton University Press, Princeton (2006)
Acknowledgements
I thank the reviewers of both the original DSAA’2017 paper and this extended version for their careful review and thoughtful suggestions for improvement. I also thank Prof. Zhihong Lin of UC Irvine for providing access to the data generated as part of the GSEP SciDAC project. This work was funded by the ASCR Program (Dr. Lucille Nowell, Program Manager) at the Office of Science, US Department of Energy. LLNL-JRNL-750460. This work was performed under the auspices of the U.S. Department of Energy by Lawrence Livermore National Laboratory under Contract DE-AC52-07NA27344.
Ethics declarations
Conflict of interest
The author states that there is no conflict of interest.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
This version is an extended version of the DSAA’2017 Application Track paper titled “Learning to Compress Unstructured Mesh Data From Simulations” [9].
About this article
Cite this article
Kamath, C. Compressing unstructured mesh data from simulations using machine learning. Int J Data Sci Anal 9, 113–130 (2020). https://doi.org/10.1007/s41060-019-00180-6
DOI: https://doi.org/10.1007/s41060-019-00180-6