Applying Multivariate Discrete Distributions to Genetically Informative Count Data

Kirkpatrick, Robert M.; Neale, Michael C.

doi:10.1007/s10519-015-9757-z

Original Research
Published: 24 October 2015

Volume 46, pages 252–268 (2016)

Behavior Genetics

Robert M. Kirkpatrick¹ &
Michael C. Neale¹

Abstract

We present a novel method of conducting biometric analysis of twin data when the phenotypes are integer-valued counts, which often show an L-shaped distribution. Monte Carlo simulation is used to compare five likelihood-based approaches to modeling: our multivariate discrete method, when its distributional assumptions are correct, when they are incorrect, and three other methods in common use. With data simulated from a skewed discrete distribution, recovery of twin correlations and proportions of additive genetic and common environment variance was generally poor for the Normal, Lognormal and Ordinal models, but good for the two discrete models. Sex-separate applications to substance-use data from twins in the Minnesota Twin Family Study showed superior performance of the two discrete models. The new methods are implemented using R and OpenMx and are freely available.

Notes

This is not the same as the bivariate negative binomial distribution of Kocherlakota and Kocherlakota (1992), which is actually a special case of the negative multinomial distribution (Johnson et al. 1997).
We use “monophenotype twin model” to refer to what behavior geneticists commonly refer to as a “univariate twin model.” The latter terminology is rather unfortunate. The independent unit of analysis is the twin pair, and thus, a sample of twin data on a single phenotype is a sample of realizations of random 2-vectors, that is, from a bivariate distribution.
If, say, people’s latent tendency to drink were normally distributed in the population, and number of drinks each day were conditionally Poisson, then daily number of drinks would follow a Hermite distribution in the population (Kemp and Kemp, 1966). The present paper does not consider the Hermite distribution, as it would be most appropriate for variables with multimodal distributions (Johnson et al. 2005; Giles 2010).
We considered basing our quadratic loss metric on \({\mathbf{y}}_{i}\)’s Mahalanobis distance from \({\hat{\mathbf{y}}}_{i}\). But, if the predicted variances are systematically too large, the sum of squared Mahalanobis distances will be smaller than if the variances were accurately predicted—not a desirable property for a loss metric.
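
The argument in the last note can be checked with a small numerical sketch. This is an illustration only, not the authors' loss metric or code: the twin-pair counts, the predicted mean vector, and the covariance matrices below are made-up values, and the helper `mahalanobis_sq` is a hypothetical name. Scaling every predicted variance and covariance by a factor of 4 divides each squared Mahalanobis distance by 4, so the overpredicting model would appear to have the smaller loss.

```python
def mahalanobis_sq(y, y_hat, cov):
    """Squared Mahalanobis distance of a 2-vector y from y_hat under a 2x2 covariance."""
    (a, b), (c, d) = cov
    det = a * d - b * c
    r0, r1 = y[0] - y_hat[0], y[1] - y_hat[1]
    # The inverse of [[a, b], [c, d]] is (1 / det) * [[d, -b], [-c, a]].
    return (d * r0 * r0 - (b + c) * r0 * r1 + a * r1 * r1) / det


# Hypothetical twin-pair counts (twin 1, twin 2) and a hypothetical predicted mean.
pairs = [(0, 1), (3, 2), (1, 0), (5, 7)]
y_hat = (2.0, 2.0)

# Assumed "accurate" predicted covariance, and a version inflated by a factor of 4.
cov_accurate = ((4.0, 2.0), (2.0, 4.0))
cov_inflated = tuple(tuple(4.0 * v for v in row) for row in cov_accurate)

loss_accurate = sum(mahalanobis_sq(y, y_hat, cov_accurate) for y in pairs)
loss_inflated = sum(mahalanobis_sq(y, y_hat, cov_inflated) for y in pairs)

# The inflated-variance model gets the smaller summed distance.
print(loss_accurate, loss_inflated)
```

Because the distance is a quadratic form in the inverse covariance, the same shrinkage occurs for any inflation factor greater than 1, which is why a loss built on it would reward systematically overpredicted variances.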

References

Atkins DC, Gallop RJ (2007) Rethinking how family researchers model infrequent outcomes: a tutorial on count regression and zero-inflated models. J Fam Psychol 21(4):726–735
Balakrishnan N, Lai C-D (2009) Continuous bivariate distributions, 2nd edn. Springer, New York
Barton DE (1957) The modality of Neyman’s contagious distribution of Type A. Trabajos de Estadística 8:13–22
Boker S, Neale M, Maes H, Wilde M, Spiegel M, Brick T et al (2011) OpenMx: an open source extended structural equation modeling framework. Psychometrika 76(2):306–317. doi:10.1007/S11336-010-9200-6. Software and documentation available at http://openmx.psyc.virginia.edu/
Cameron AC, Trivedi PK (1986) Econometric models based on count data: comparisons and applications of some estimators and tests. J Appl Econom 1(1):29–53
Consul PC (1989) Generalized poisson distributions: properties and applications. Marcel Dekker Inc., New York
Consul PC, Famoye F (2006) Lagrangian probability distributions. Birkhäuser, Boston
Famoye F (2010) A new bivariate generalized Poisson distribution. Stat Neerl 64(1):112–124. doi:10.1111/j.1467-9574.2009.00446.x
Famoye F, Consul PC (1995) Bivariate generalized Poisson distribution with some applications. Metrika 42:127–138
Forbes C, Evans M, Hastings N, Peacock B (2011) Statistical distributions, 4th edn. Wiley, Hoboken
Genest C, Favre A-C (2007) Everything you always wanted to know about copula modeling but were afraid to ask. J Hydrol Eng 12(4):347–368
Genz A, Bretz F (2009) Computation of multivariate normal and t probabilities. Springer, Heidelberg. Software and documentation available at http://cran.r-project.org/web/packages/mvtnorm/index.html
Genest C, Nešlehová J (2007) A primer on copulas for count data. Astin Bulletin 37(2):475–515
Giles DE (2010) Hermite regression analysis of multi-modal count data. Econ Bull 30(4):2936–2945
Good IJ (1960) Generalizations to several variables of Lagrange’s expansion, with applications to stochastic processes. Math Proc Cambridge Philos Soc 56:367–380. doi:10.1017/S0305004100034666
Hastie T, Tibshirani R, Friedman J (2009) The elements of statistical learning: data mining, inference, and prediction, 2nd edn. Springer, New York. doi: 10.1007/b94608
Holgate P (1964) Estimation for the bivariate Poisson distribution. Biometrika 51:241–245
Iacono WG, Carlson SR, Taylor J, Elkins IJ, McGue M (1999) Behavioral disinhibition and the development of substance-use disorders: findings from the Minnesota Twin Family Study. Dev Psychopathol 11:869–900
Iacono WG, McGue M (2002) Minnesota Twin Family Study. Twin Res 5(5):482–487
Johnson NL, Kemp AW, Kotz S (2005) Univariate discrete distributions, 3rd edn. Wiley, Hoboken
Johnson NL, Kotz S, Balakrishnan N (1997) Discrete multivariate distributions. Wiley, New York
Kemp AW, Kemp CD (1966) An alternative derivation of the Hermite distribution. Biometrika 53:627–628
Keyes MA, Malone SM, Elkins IJ, Legrand LN, McGue M, Iacono WG (2009) The Enrichment Study of the Minnesota Twin Family Study: increasing the yield of twin families at high risk for externalizing psychopathology. Twin Res Human Gen 12(5):489–501
Kirkpatrick RM (2014) RMKdiscrete (Version 0.1). Software and documentation available at http://cran.r-project.org/web/packages/RMKdiscrete/
Kocherlakota S, Kocherlakota K (1992) Bivariate discrete distributions. Marcel Dekker Inc, New York
Kocherlakota S, Kocherlakota K (2001) Regression in the bivariate Poisson distribution. Commun Stat 30(5):815–825. doi:10.1081/STA-100002259
Kullback S, Leibler RA (1951) On information and sufficiency. Ann Math Stat 22(1):79–86
Lakshminarayana J, Pandit SNN, Rao KS (1999) On a bivariate Poisson distribution. Commun Stat 28(2):267–276. doi:10.1080/03610929908832297
Lee A (1999) Modelling rugby league data via bivariate negative binomial regression. Aust NZ J Stat 41(2):141–152
Lehmann EL (1999) Elements of large-sample theory. Springer, New York
Li C-S, Lu J-C, Park J, Kim K, Brinkley PA, Peterson JP (1999) Multivariate zero-inflated Poisson models and their applications. Technometrics 41(1):29–38
McGue M, Bouchard TJ (1984) Adjustment of twin data for the effects of age and sex. Behav Genet 14(4):325–343
Nelsen RB (2006) An introduction to copulas, 2nd edn. Springer, New York
Nikoloulopoulos AK, Karlis D (2009) Finite normal mixture copulas for multivariate discrete data modeling. J Stat Plan Inference 139:3878–3890. doi:10.1016/j.jspi.2009.05.034
R Core Team (2013) R: a language and environment for statistical computing. R Foundation for Statistical Computing, Vienna. http://www.R-project.org/. [computer software]
Teicher H (1954) On the multivariate Poisson distribution. Scand Actuar J 37:1–9
Warton DI (2005) Many zeros does not mean zero inflation: comparing the goodness-of-fit of parametric models to multivariate abundance data. Environmetrics 16:275–289. doi:10.1002/env.702
Wu H, Neale MC (2012) Adjusted confidence intervals for a bounded parameter. Behav Genet 42:886–898

Acknowledgments

The authors were supported by U.S. Public Health Service grant DA026119. William G. Iacono and Matt McGue provided the MTFS dataset, which was supported by U.S. Public Health Service Grants DA05147, AA009367, and DA013240. The first author gives his special thanks to Matt McGue, Niels G. Waller, and Hermine H. Maes for their comments on drafts of the paper.

Author information

Authors and Affiliations

Virginia Institute for Psychiatric & Behavioral Genetics, Virginia Commonwealth University, Richmond, VA, 23298-0126, USA
Robert M. Kirkpatrick & Michael C. Neale

Corresponding author

Correspondence to Robert M. Kirkpatrick.

Ethics declarations

Conflict of Interest

Robert M. Kirkpatrick and Michael C. Neale declare that they have no conflict of interest.

Human and animal rights and informed consent

The MTFS was reviewed and approved by the Institutional Review Board at the University of Minnesota. Written informed assent or consent was obtained from all participants, with parents providing written consent for their minor children.

Additional information

Edited by Gitta Lubke.

Electronic supplementary material

Below is the link to the electronic supplementary material.

Online Resource 1: Supplementary Appendices. Supplementary material 1 (pdf 150 kb)

10519_2015_9757_MOESM2_ESM.zip

Online Resource 2: 3 text files: a README file for the other two, an example R script from the Monte Carlo simulation, and an R script for producing graphs and summary statistics from the raw simulation data (read in over the web). Supplementary material 2 (zip 15 kb)

About this article

Cite this article

Kirkpatrick, R.M., Neale, M.C. Applying Multivariate Discrete Distributions to Genetically Informative Count Data. Behav Genet 46, 252–268 (2016). https://doi.org/10.1007/s10519-015-9757-z

Received: 03 November 2014
Accepted: 06 October 2015
Published: 24 October 2015
Issue Date: March 2016
DOI: https://doi.org/10.1007/s10519-015-9757-z
