Computational Statistics, Volume 31, Issue 2, pp 559–577

Probability Distributome: a web computational infrastructure for exploring the properties, interrelations, and applications of probability distributions

  • Ivo D. Dinov
  • Kyle Siegrist
  • Dennis K. Pearl
  • Alexandr Kalinin
  • Nicolas Christou
Original Paper

Abstract

Probability distributions are useful for modeling, simulation, analysis, and inference on a variety of natural processes and physical phenomena. There are uncountably many probability distributions. However, a few dozen families of distributions are commonly defined and frequently used in practice for problem solving, experimental applications, and theoretical studies. In this paper, we present a new computational and graphical infrastructure, the Distributome, which facilitates the discovery, exploration, and application of a diverse spectrum of probability distributions. The extensible Distributome infrastructure provides interfaces for (human and machine) traversal, search, and navigation of all common probability distributions. It also enables distribution modeling, applications, and investigation of inter-distribution relations, as well as their analytical representations and computational utilization. The entire Distributome framework is designed and implemented as an open-source, community-built, and Internet-accessible infrastructure. It is portable, extensible, and compatible with HTML5 and Web 2.0 standards (http://Distributome.org). We demonstrate two types of applications of the probability Distributome resources: computational research and science education. The Distributome tools may be employed to address five complementary computational modeling applications (simulation, data analysis and inference, model-fitting, examination of the analytical, mathematical, and computational properties of specific probability distributions, and exploration of the inter-distributional relations). Many high school and college science, technology, engineering, and mathematics (STEM) courses may be enriched by the use of modern pedagogical approaches and technology-enhanced methods. The Distributome resources provide enhancements for blended STEM education by improving student motivation, augmenting the classical curriculum with interactive webapps, and overhauling the learning assessment protocols.

Keywords

Probability distributions; Models; Graphical user interface; Transformations; Applications; Inference; Distributome

Copyright information

© Springer-Verlag Berlin Heidelberg 2015

Authors and Affiliations

  • Ivo D. Dinov (1, 2, 3, 4)
  • Kyle Siegrist (5)
  • Dennis K. Pearl (6)
  • Alexandr Kalinin (1, 2)
  • Nicolas Christou (3)

  1. Statistics Online Computational Resource (SOCR), University of Michigan, UMSN, Ann Arbor, USA
  2. Michigan Institute for Data Science (MIDAS), DCM&B, University of Michigan, Ann Arbor, USA
  3. SOCR Resource, Department of Statistics, University of California, Los Angeles, Los Angeles, USA
  4. Center for Computational Biology, University of California, Los Angeles, Los Angeles, USA
  5. Department of Mathematical Sciences, University of Alabama, Huntsville, USA
  6. Department of Statistics, Pennsylvania State University, State College, USA
