Applying Multivariate Discrete Distributions to Genetically Informative Count Data

  • Original Research
  • Behavior Genetics

Abstract

We present a novel method of conducting biometric analysis of twin data when the phenotypes are integer-valued counts, which often show an L-shaped distribution. Monte Carlo simulation is used to compare five likelihood-based modeling approaches: our multivariate discrete method when its distributional assumptions are correct, the same method when they are incorrect, and three other methods in common use. With data simulated from a skewed discrete distribution, recovery of twin correlations and of the proportions of additive genetic and common environment variance was generally poor for the Normal, Lognormal and Ordinal models, but good for the two discrete models. Sex-separate applications to substance-use data from twins in the Minnesota Twin Family Study showed superior performance of the two discrete models. The new methods are implemented using R and OpenMx and are freely available.

Notes

  1. This is not the same as the bivariate negative binomial distribution of Kocherlakota and Kocherlakota (1992), which is actually a special case of the negative multinomial distribution (Johnson et al. 1997).

  2. We use “monophenotype twin model” to refer to what behavior geneticists commonly refer to as a “univariate twin model.” The latter terminology is rather unfortunate. The independent unit of analysis is the twin pair, and thus, a sample of twin data on a single phenotype is a sample of realizations of random 2-vectors, that is, from a bivariate distribution.

  3. If, say, people’s latent tendency to drink were normally distributed in the population, and number of drinks each day were conditionally Poisson, then daily number of drinks would follow a Hermite distribution in the population (Kemp and Kemp 1966). The present paper does not consider the Hermite distribution, as it would be most appropriate for variables with multimodal distributions (Johnson et al. 2005; Giles 2010).

  4. We considered basing our quadratic loss metric on \({\mathbf{y}}_{i}\)’s Mahalanobis distance from \({\hat{\mathbf{y}}}_{i}\). But, if the predicted variances are systematically too large, the sum of squared Mahalanobis distances will be smaller than if the variances were accurately predicted—not a desirable property for a loss metric.
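
The property described in Note 4 can be checked numerically. The following is a minimal sketch, not the authors' code: the twin-pair counts, the predicted mean vector, and the covariance matrices are hypothetical values chosen for illustration, and a single predicted mean is shared by all pairs for simplicity. It shows that doubling the predicted covariance matrix halves the sum of squared Mahalanobis distances, so overpredicted variances are rewarded rather than penalized.

```python
def sq_mahalanobis(y, y_hat, cov):
    """Squared Mahalanobis distance of the 2-vector y from y_hat under a 2x2 covariance."""
    (a, b), (c, d) = cov
    det = a * d - b * c
    r0, r1 = y[0] - y_hat[0], y[1] - y_hat[1]
    # The inverse of [[a, b], [c, d]] is [[d, -b], [-c, a]] / det.
    return (d * r0 * r0 - (b + c) * r0 * r1 + a * r1 * r1) / det


def total_loss(pairs, y_hat, cov):
    """Sum of squared Mahalanobis distances over all twin pairs."""
    return sum(sq_mahalanobis(y, y_hat, cov) for y in pairs)


# Hypothetical twin-pair counts and model predictions (illustrative only).
pairs = [(0, 1), (2, 0), (5, 3), (1, 1)]
y_hat = (2.0, 1.25)
cov_accurate = [[4.0, 1.5], [1.5, 3.0]]
# Same model, but with every predicted variance and covariance doubled.
cov_inflated = [[2 * v for v in row] for row in cov_accurate]

loss_accurate = total_loss(pairs, y_hat, cov_accurate)
loss_inflated = total_loss(pairs, y_hat, cov_inflated)
print(loss_accurate, loss_inflated)
```

Because the inverse covariance scales by 1/k when the covariance is multiplied by k, the inflated model always scores a lower loss here, whatever the data.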

References

  • Atkins DC, Gallop RJ (2007) Rethinking how family researchers model infrequent outcomes: a tutorial on count regression and zero-inflated models. J Fam Psychol 21(4):726–735

  • Balakrishnan N, Lai C-D (2009) Continuous bivariate distributions, 2nd edn. Springer, New York

  • Barton DE (1957) The modality of Neyman’s contagious distribution of Type A. Trabajos de Estadística 8:13–22

  • Boker S, Neale M, Maes H, Wilde M, Spiegel M, Brick T et al. (2011) OpenMx: an open source extended structural equation modeling framework. Psychometrika 76(2):306–317. doi:10.1007/S11336-010-9200-6. Software and documentation available at http://openmx.psyc.virginia.edu/

  • Cameron AC, Trivedi PK (1986) Econometric models based on count data: comparisons and applications of some estimators and tests. J Appl Econom 1(1):29–53

  • Consul PC (1989) Generalized poisson distributions: properties and applications. Marcel Dekker Inc., New York

  • Consul PC, Famoye F (2006) Lagrangian probability distributions. Birkhäuser, Boston

  • Famoye F (2010) A new bivariate generalized Poisson distribution. Stat Neerl 64(1):112–124. doi:10.1111/j.1467-9574.2009.00446.x

  • Famoye F, Consul PC (1995) Bivariate generalized Poisson distribution with some applications. Metrika 42:127–138

  • Forbes C, Evans M, Hastings N, Peacock B (2011) Statistical distributions, 4th edn. Wiley, Hoboken

  • Genest C, Favre A-C (2007) Everything you always wanted to know about copula modeling but were afraid to ask. J Hydrol Eng 12(4):347–368

  • Genz A, Bretz F (2009) Computation of multivariate normal and t probabilities. Springer, Heidelberg. Software and documentation available at http://cran.r-project.org/web/packages/mvtnorm/index.html

  • Genest C, Nešlehová J (2007) A primer on copulas for count data. Astin Bulletin 37(2):475–515

  • Giles DE (2010) Hermite regression analysis of multi-modal count data. Econ Bull 30(4):2936–2945

  • Good IJ (1960) Generalizations to several variables of Lagrange’s expansion, with applications to stochastic processes. Math Proc Cambridge Philos Soc 56:367–380. doi:10.1017/S0305004100034666

  • Hastie T, Tibshirani R, Friedman J (2009) The elements of statistical learning: data mining, inference, and prediction, 2nd edn. Springer, New York. doi:10.1007/b94608

  • Holgate P (1964) Estimation for the bivariate Poisson distribution. Biometrika 51:241–245

  • Iacono WG, Carlson SR, Taylor J, Elkins IJ, McGue M (1999) Behavioral disinhibition and the development of substance-use disorders: findings from the Minnesota Twin Family Study. Dev Psychopathol 11:869–900

  • Iacono WG, McGue M (2002) Minnesota Twin Family Study. Twin Res 5(5):482–487

  • Johnson NL, Kemp AW, Kotz S (2005) Univariate discrete distributions, 3rd edn. Wiley, Hoboken

  • Johnson NL, Kotz S, Balakrishnan N (1997) Discrete multivariate distributions. Wiley, New York

  • Kemp AW, Kemp CD (1966) An alternative derivation of the Hermite distribution. Biometrika 53:627–628

  • Keyes MA, Malone SM, Elkins IJ, Legrand LN, McGue M, Iacono WG (2009) The Enrichment Study of the Minnesota Twin Family Study: increasing the yield of twin families at high risk for externalizing psychopathology. Twin Res Human Gen 12(5):489–501

  • Kirkpatrick RM (2014) RMKdiscrete (Version 0.1). Software and documentation available at http://cran.r-project.org/web/packages/RMKdiscrete/

  • Kocherlakota S, Kocherlakota K (1992) Bivariate discrete distributions. Marcel Dekker Inc, New York

  • Kocherlakota S, Kocherlakota K (2001) Regression in the bivariate Poisson distribution. Commun Stat 30(5):815–825. doi:10.1081/STA-100002259

  • Kullback S, Leibler RA (1951) On information and sufficiency. Ann Math Stat 22(1):79–86

  • Lakshminarayana J, Pandit SNN, Rao KS (1999) On a bivariate Poisson distribution. Commun Stat 28(2):267–276. doi:10.1080/03610929908832297

  • Lee A (1999) Modelling rugby league data via bivariate negative binomial regression. Aust NZ J Stat 41(2):141–152

  • Lehmann EL (1999) Elements of large-sample theory. Springer, New York

  • Li C-S, Lu J-C, Park J, Kim K, Brinkley PA, Peterson JP (1999) Multivariate zero-inflated Poisson models and their applications. Technometrics 41(1):29–38

  • McGue M, Bouchard TJ (1984) Adjustment of twin data for the effects of age and sex. Behav Genet 14(4):325–343

  • Nelsen RB (2006) An introduction to copulas, 2nd edn. Springer, New York

  • Nikoloulopoulos AK, Karlis D (2009) Finite normal mixture copulas for multivariate discrete data modeling. J Stat Plan Inference 139:3878–3890. doi:10.1016/j.jspi.2009.05.034

  • R Core Team (2013) R: a language and environment for statistical computing [computer software]. R Foundation for Statistical Computing, Vienna. http://www.R-project.org/

  • Teicher H (1954) On the multivariate Poisson distribution. Scand Actuar J 37:1–9

  • Warton DI (2005) Many zeros does not mean zero inflation: comparing the goodness-of-fit of parametric models to multivariate abundance data. Environmetrics 16:275–289. doi:10.1002/env.702

  • Wu H, Neale MC (2012) Adjusted confidence intervals for a bounded parameter. Behav Genet 42:886–898

Acknowledgments

The authors were supported by U.S. Public Health Service grant DA026119. William G. Iacono and Matt McGue provided the MTFS dataset, which was supported by U.S. Public Health Service Grants DA05147, AA009367, and DA013240. The first author gives his special thanks to Matt McGue, Niels G. Waller, and Hermine H. Maes for their comments on drafts of the paper.

Author information

Corresponding author

Correspondence to Robert M. Kirkpatrick.

Ethics declarations

Conflict of Interest

Robert M. Kirkpatrick and Michael C. Neale declare that they have no conflict of interest.

Human and animal rights and informed consent

The MTFS was reviewed and approved by the Institutional Review Board at the University of Minnesota. Written informed assent or consent was obtained from all participants, with parents providing written consent for their minor children.

Additional information

Edited by Gitta Lubke.

Electronic supplementary material

Below is the link to the electronic supplementary material.

Online Resource 1: Supplementary Appendices. Supplementary material 1 (pdf 150 kb)

10519_2015_9757_MOESM2_ESM.zip

Online Resource 2: 3 text files: a README file for the other two, an example R script from the Monte Carlo simulation, and an R script for producing graphs and summary statistics from the raw simulation data (read in over the web). Supplementary material 2 (zip 15 kb)

About this article

Cite this article

Kirkpatrick, R.M., Neale, M.C. Applying Multivariate Discrete Distributions to Genetically Informative Count Data. Behav Genet 46, 252–268 (2016). https://doi.org/10.1007/s10519-015-9757-z

  • DOI: https://doi.org/10.1007/s10519-015-9757-z
