Abstract
We propose a correlation-aware probabilistic data summarization technique for efficiently analyzing and visualizing large-scale multi-block volume data generated by massively parallel scientific simulations. The core of our technique is twofold: copula functions model the correlation between the distribution representations of adjacent data blocks, and Bayes’ rule combines numerical information, spatial location, and the correlation distribution to estimate data values accurately. This preserves statistical properties without merging and repartitioning data blocks that reside on different parallel computing nodes, which significantly reduces the computational cost. It also enables more accurate reconstruction of the original data than existing methods. We demonstrate the effectiveness of our technique on six datasets, the largest of which has one billion grid points. The experimental results show that our approach reduces the data storage cost by approximately one order of magnitude compared with state-of-the-art methods while providing higher reconstruction accuracy at a lower computational cost.
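The two ingredients named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract says only "copula functions", so the choice of a Gaussian copula is an assumption, as are the function names `normal_scores`, `gaussian_copula_rho`, and `bayes_posterior` and the toy sample values. The first part estimates the dependence between samples from two adjacent blocks independently of their marginal distributions. The second part applies Bayes' rule over discrete value bins.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal, used for the Gaussian copula


def normal_scores(values):
    """Map samples to standard-normal scores through their empirical ranks.

    Ties are ranked in input order, which is good enough for a sketch.
    """
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    scores = [0.0] * n
    for rank, i in enumerate(order):
        # (rank + 0.5) / n keeps the pseudo-observation strictly inside (0, 1).
        scores[i] = STD_NORMAL.inv_cdf((rank + 0.5) / n)
    return scores


def gaussian_copula_rho(block_a, block_b):
    """Pearson correlation of the normal scores (assumed Gaussian copula parameter)."""
    za, zb = normal_scores(block_a), normal_scores(block_b)
    n = len(za)
    mean_a, mean_b = sum(za) / n, sum(zb) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(za, zb))
    var_a = sum((x - mean_a) ** 2 for x in za)
    var_b = sum((y - mean_b) ** 2 for y in zb)
    return cov / (var_a * var_b) ** 0.5


def bayes_posterior(prior, likelihood):
    """Bayes' rule over value bins: posterior is proportional to likelihood * prior."""
    unnormalized = [p * l for p, l in zip(prior, likelihood)]
    total = sum(unnormalized)
    if total == 0:
        raise ValueError("prior and likelihood have no overlapping support")
    return [u / total for u in unnormalized]


# Hypothetical paired samples taken along the shared face of two adjacent blocks.
block_a = [0.1, 0.4, 0.5, 0.9, 1.3, 1.7]
block_b = [0.2, 0.3, 0.7, 1.0, 1.2, 2.0]
rho = gaussian_copula_rho(block_a, block_b)

# Hypothetical prior (block histogram) and likelihood (spatial evidence) per bin.
posterior = bayes_posterior([0.5, 0.3, 0.2], [0.1, 0.6, 0.3])
```

Because the copula parameter is computed from ranks, it depends only on how the two blocks vary together and not on the shape of either block's value distribution, which is what allows each block to keep its own marginal summary.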
Acknowledgements
This work was supported by the Chinese Postdoctoral Science Foundation (2021M700016).
Ethics declarations
The authors have no competing interests to declare that are relevant to the content of this article.
Additional information
Yang Yang is a postdoctoral researcher at the Institute of Applied Physics and Computational Mathematics. She received her Ph.D. degree from the University of Science and Technology of China in 2020. Her research interests include large-scale scientific data visualization and geometric processing.
Kecheng Lu is working towards a Ph.D. degree in the School of Computer Science and Technology at Shandong University. He received his bachelor's degree from Shandong University in 2017. His research interests include data visualization and visual analysis.
Yu Wu is working towards a master's degree at the Institute of Applied Physics and Computational Mathematics. He received his bachelor's degree from Shandong University in 2020. His research interests include large-scale scientific data visualization and scientific data compression.
Yunhai Wang is a professor in the School of Computer Science and Technology at Shandong University, Qingdao Campus. He leads the Interactive Data Exploration System Laboratory, which aims to enhance people’s ability to understand and communicate data through the design of automated visualization and visual analytics systems.
Yi Cao is a professor in the High Performance Computing Center at the Institute of Applied Physics and Computational Mathematics. His research interests include large-scale scientific data visualization, computer graphics, and parallel computing.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made.
The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.
To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
Other papers from this open access journal are available free of charge from http://www.springer.com/journal/41095. To submit a manuscript, please go to https://www.editorialmanager.com/cvmj.
About this article
Cite this article
Yang, Y., Lu, K., Wu, Y. et al. Correlation-aware probabilistic data summarization for large-scale multi-block scientific data visualization. Comp. Visual Media 9, 513–529 (2023). https://doi.org/10.1007/s41095-022-0304-6
DOI: https://doi.org/10.1007/s41095-022-0304-6