Behavior Genetics, Volume 46, Issue 2, pp 252–268

Applying Multivariate Discrete Distributions to Genetically Informative Count Data

  • Robert M. Kirkpatrick
  • Michael C. Neale
Original Research

Abstract

We present a novel method of conducting biometric analysis of twin data when the phenotypes are integer-valued counts, which often show an L-shaped distribution. Monte Carlo simulation is used to compare five likelihood-based approaches to modeling: our multivariate discrete method, when its distributional assumptions are correct, when they are incorrect, and three other methods in common use. With data simulated from a skewed discrete distribution, recovery of twin correlations and proportions of additive genetic and common environment variance was generally poor for the Normal, Lognormal and Ordinal models, but good for the two discrete models. Sex-separate applications to substance-use data from twins in the Minnesota Twin Family Study showed superior performance of two discrete models. The new methods are implemented using R and OpenMx and are freely available.
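To make the setting concrete, the following is a minimal sketch of how correlated, L-shaped twin counts can arise. It is not the authors' method or their R/OpenMx implementation. It assumes a simple trivariate-reduction construction (each twin's count is a shared Poisson component plus an independent one), and every function name and parameter value here is hypothetical, chosen only for illustration. Under this assumption the marginal counts are Poisson with a mode at zero, and the expected twin correlation is lam_shared / (lam_shared + lam_unique).

```python
import math
import random


def poisson(rng, lam):
    """Draw one Poisson(lam) variate using Knuth's multiplication method."""
    limit = math.exp(-lam)
    k, prod = 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k


def simulate_twin_counts(n_pairs, lam_shared, lam_unique, seed=1):
    """Simulate twin pairs by trivariate reduction (an illustrative assumption).

    Each twin's count is a component shared by the pair plus a component
    unique to that twin, so the two counts are positively correlated.
    """
    rng = random.Random(seed)
    pairs = []
    for _ in range(n_pairs):
        shared = poisson(rng, lam_shared)
        twin1 = shared + poisson(rng, lam_unique)
        twin2 = shared + poisson(rng, lam_unique)
        pairs.append((twin1, twin2))
    return pairs


def pearson(pairs):
    """Sample Pearson correlation between twin 1 and twin 2 counts."""
    n = len(pairs)
    mean1 = sum(x for x, _ in pairs) / n
    mean2 = sum(y for _, y in pairs) / n
    sxy = sum((x - mean1) * (y - mean2) for x, y in pairs)
    sxx = sum((x - mean1) ** 2 for x, _ in pairs)
    syy = sum((y - mean2) ** 2 for _, y in pairs)
    return sxy / math.sqrt(sxx * syy)


def proportion_zero(pairs):
    """Share of twin 1 counts equal to zero (the tall bar of the 'L')."""
    return sum(1 for x, _ in pairs if x == 0) / len(pairs)


if __name__ == "__main__":
    # Hypothetical settings: same marginal mean (0.6) in both groups,
    # but a larger shared component in the "MZ-like" group.
    mz_like = simulate_twin_counts(20000, lam_shared=0.4, lam_unique=0.2)
    dz_like = simulate_twin_counts(20000, lam_shared=0.2, lam_unique=0.4)
    print("MZ-like correlation:", round(pearson(mz_like), 3))
    print("DZ-like correlation:", round(pearson(dz_like), 3))
    print("Proportion of zeros:", round(proportion_zero(mz_like), 3))
```

With these settings more than half of the simulated counts are zero, which is the kind of skew that a Normal-theory model handles poorly. The sketch only generates data; it does not fit any biometric model.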

Keywords

Count variables; Twin study; Biometric variance components; Multivariate discrete distributions; Substance use; Lagrangian probability distributions

Notes

Acknowledgments

The authors were supported by U.S. Public Health Service grant DA026119. William G. Iacono and Matt McGue provided the MTFS dataset, which was supported by U.S. Public Health Service Grants DA05147, AA009367, and DA013240. The first author gives his special thanks to Matt McGue, Niels G. Waller, and Hermine H. Maes for their comments on drafts of the paper.

Compliance with Ethical Standards

Conflict of Interest

Robert M. Kirkpatrick and Michael C. Neale declare that they have no conflict of interest.

Human and animal rights and informed consent

The MTFS was reviewed and approved by the Institutional Review Board at the University of Minnesota. Written informed assent or consent was obtained from all participants, with parents providing written consent for their minor children.

Supplementary material

10519_2015_9757_MOESM1_ESM.pdf (150 kb)
Online Resource 1: Supplementary Appendices. Supplementary material 1 (pdf 150 kb)
10519_2015_9757_MOESM2_ESM.zip (16 kb)
Online Resource 2: 3 text files: a README file for the other two, an example R script from the Monte Carlo simulation, and an R script for producing graphs and summary statistics from the raw simulation data (read in over the web). Supplementary material 2 (zip 15 kb)

Copyright information

© Springer Science+Business Media New York 2015

Authors and Affiliations

  1. Virginia Institute for Psychiatric & Behavioral Genetics, Virginia Commonwealth University, Richmond, USA
