Abstract
Mathematical modeling, probability estimation, and statistical inference are core elements of modern artificial intelligence (AI) approaches to data-driven prediction, forecasting, classification, risk estimation, and prognosis. Many existing tools help calculate and visualize univariate probability distributions. However, very few resources extend to multivariate distributions, which are commonly used in advanced statistical inference and AI decision-making. This article presents a new web calculator that supports calculation and visualization of bivariate and trivariate probability distributions. We explore several methods for computing joint bivariate and trivariate probability densities, including optimal multivariate modeling using a Gaussian copula. We developed an interactive webapp that visually illustrates the parallels between the mathematical formulation, computational implementation, and graphical depiction of multivariate probability density and cumulative distribution functions. To keep the interface and functionality hardware-platform independent and scalable, the app and its component widgets are implemented in HTML5 and JavaScript. We validated the webapp by testing the multivariate copula models under different experimental conditions and assessing the accuracy and reliability of the estimated multivariate probability density and distribution function values. This article demonstrates the construction, implementation, and use of multivariate probability calculators. The webapp is freely available online (https://socr.umich.edu/HTML5/BivariateNormal/BVN2/) and can support the education and research of a diverse array of data scientists, STEM instructors, and AI learners.
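To make the Gaussian copula construction mentioned in the abstract concrete, the following is a minimal JavaScript sketch of a bivariate joint density built from two arbitrary marginals. It is not the webapp's code: all function names (`normInv`, `gaussianCopulaDensity`, `jointDensity`, and the marginal helpers) are hypothetical, and the error-function approximation (Abramowitz-Stegun 7.1.26) and the bisection-based normal quantile are simplifying assumptions chosen for brevity, adequate near the bulk of the distribution but not in the far tails.

```javascript
// Standard normal PDF.
function normPdf(z) {
  return Math.exp(-0.5 * z * z) / Math.sqrt(2 * Math.PI);
}

// Error function, Abramowitz-Stegun formula 7.1.26 (absolute error about 1.5e-7).
function erf(x) {
  const sign = x < 0 ? -1 : 1;
  const a = Math.abs(x);
  const t = 1 / (1 + 0.3275911 * a);
  const poly = ((((1.061405429 * t - 1.453152027) * t + 1.421413741) * t
                 - 0.284496736) * t + 0.254829592) * t;
  return sign * (1 - poly * Math.exp(-a * a));
}

// Standard normal CDF.
function normCdf(z) {
  return 0.5 * (1 + erf(z / Math.SQRT2));
}

// Standard normal quantile by bisection on normCdf (assumption: simple, not fast).
function normInv(p) {
  const q = Math.min(Math.max(p, 1e-9), 1 - 1e-9); // keep away from 0 and 1
  let lo = -8.5;
  let hi = 8.5;
  for (let i = 0; i < 100; i++) {
    const mid = 0.5 * (lo + hi);
    if (normCdf(mid) < q) lo = mid; else hi = mid;
  }
  return 0.5 * (lo + hi);
}

// Bivariate Gaussian copula density c(u, v; rho), with |rho| < 1.
function gaussianCopulaDensity(u, v, rho) {
  const x = normInv(u);
  const y = normInv(v);
  const d = 1 - rho * rho;
  const expo = -(rho * rho * (x * x + y * y) - 2 * rho * x * y) / (2 * d);
  return Math.exp(expo) / Math.sqrt(d);
}

// Joint density f(x1, x2) = c(F1(x1), F2(x2); rho) * f1(x1) * f2(x2).
// Each marginal is an object { pdf, cdf } (hypothetical interface).
function jointDensity(x1, x2, m1, m2, rho) {
  const c = gaussianCopulaDensity(m1.cdf(x1), m2.cdf(x2), rho);
  return c * m1.pdf(x1) * m2.pdf(x2);
}

// Example marginals (hypothetical helpers).
function normalMarginal(mu, sigma) {
  return {
    pdf: (x) => normPdf((x - mu) / sigma) / sigma,
    cdf: (x) => normCdf((x - mu) / sigma),
  };
}

function exponentialMarginal(rate) {
  return {
    pdf: (x) => (x < 0 ? 0 : rate * Math.exp(-rate * x)),
    cdf: (x) => (x < 0 ? 0 : 1 - Math.exp(-rate * x)),
  };
}

// Usage: two exponential marginals coupled with correlation parameter 0.6.
const density = jointDensity(1, 2, exponentialMarginal(1), exponentialMarginal(0.5), 0.6);
console.log(density);
```

With two normal marginals, this construction reduces to the familiar bivariate normal density, which gives a convenient sanity check; with `rho = 0`, the copula density equals 1 and the joint density is simply the product of the marginals.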
Availability of data and material
NA.
Code availability
All code is publicly available via the Project GitHub repository.
Acknowledgements
Partial support for this work was provided by NSF grants 1916425, 1734853, 1636840, 1416953, 0716055, and 1023115, and by NIH grants P20 NR015331, UL1 TR002240, R01 CA233487, R01 MH121079, R01 MH126137, and T32 GM141746. The funders played no role in the study design, data collection and analysis, decision to publish, or preparation of the manuscript. Colleagues at the University of Michigan Statistics Online Computational Resource (SOCR) and the Michigan Institute for Data Science (MIDAS) contributed ideas, infrastructure, and support for the project.
Ethics declarations
Conflicts of interest/Competing interests
Not applicable.
Ethics approval
Not applicable.
Consent to participate
Not applicable.
Consent for publication
Not applicable.
Research involving human participants and/or animals
Not applicable.
Informed consent
Not applicable.
Cite this article
Bobrovnikov, M., Chai, J.T. & Dinov, I.D. Interactive Visualization and Computation of 2D and 3D Probability Distributions. SN COMPUT. SCI. 3, 327 (2022). https://doi.org/10.1007/s42979-022-01206-w
DOI: https://doi.org/10.1007/s42979-022-01206-w