
Decomposing Group Differences of Latent Means of Ordered Categorical Variables within a Genetic Factor Model

  • ORIGINAL RESEARCH
Behavior Genetics

Abstract

A genetic factor model is introduced for decomposition of group differences of the means of phenotypic behavior as well as individual differences when the research variables under consideration are ordered categorical. The model employs the general Genetic Factor Model proposed by Neale and Cardon (Methodology for genetic studies of twins and families, 1992) and, more specifically, the extension proposed by Dolan et al. (Behav Genet 22: 319–335, 1992) which enables decomposition of group differences of the means associated with genetic and environmental factors. Using a latent response variable (LRV) formulation (Muthén and Asparouhov, Latent variable analysis with categorical outcomes: multiple-group and growth modeling in Mplus. Mplus web notes: No. 4, Version 5, 2002), proportional differences of response categories between groups are modeled within the genetic factor model in terms of the distributional differences of latent response variables assumed to underlie the observed ordered categorical variables. Use of the proposed model is illustrated using a measure of conservatism in the data collected from the Australian Twin Registry.



Acknowledgments

Preparation of this research was supported by grants to Kenneth J. Sher (R37 AA07231) and Andrew C. Heath (P50 AA11998) from the National Institute on Alcohol Abuse and Alcoholism. This research was facilitated with access to the Australian Twin Registry, a national research resource supported by an Enabling Grant (ID 310667) from the National Health & Medical Research Council, and administered by The University of Melbourne. The authors are grateful to Dr. Nicholas G. Martin for providing access to the Australian Twin Registry data.

Author information

Corresponding author

Correspondence to Seung Bin Cho.

Additional information

Edited by Dorret Boomsma.

Appendices

Technical supplement A: mathematical details of model identification

Minimal constraints needed to identify the proposed model are:

  (a) The mean and variance of each latent response variable, \( y_{j}^{*} \), are set to zero and one, respectively, in the reference groups.

  (b) The mean and variance of each factor are set to zero and one, respectively, in the reference groups.

  (c) Factor loadings are constrained to be equal across groups.

  (d) The intercept of each variable is set to zero in both the reference and non-reference groups.

  (e) For each of three selected indicator variables, two thresholds are set to be equal across groups.

This section outlines how the factor means and the distributions of the latent response variables can be identified under the minimal identification constraints. Denote observed variable j for twin i (i = 1, 2) in group g as \( y_{ij}^{(g)} \) and its corresponding continuous latent response variable as \( y_{ij}^{*(g)} \). Constraints (a) and (b) identify all the parameters in the reference groups by fixing the scales of the latent response variables and factors, from which the factor loadings and thresholds are estimated. Thresholds are estimated as the z-scores that correspond to the cumulative proportions of each response category. Factor loadings in the non-reference groups are identified by constraint (c). Constraint (e) identifies the means and variances of the three selected variables in the non-reference groups. These three latent response variables in the non-reference groups are denoted \( y_{1}^{*(n)} \), \( y_{2}^{*(n)} \), and \( y_{3}^{*(n)} \). Subscripts identifying the first and second twin in a pair are omitted because twin order is assumed to be random, so the means and variances are assumed to be the same for both twins in a pair (Neale and Cardon 1992). For the first latent response variable, \( y_{1}^{*(n)} \), in the non-reference groups, denote the two thresholds constrained to be equal across groups as \( \tau_{11} \) and \( \tau_{12} \). The mean and variance, \( \mu_{1}^{*(n)} \) and \( \sigma_{1}^{2*(n)} \), respectively, can then be identified from the following.

$$ \frac{\tau_{11} - \mu_{1}^{*(n)}}{\sigma_{1}^{*(n)}} = z_{11}^{(n)},\quad \frac{\tau_{12} - \mu_{1}^{*(n)}}{\sigma_{1}^{*(n)}} = z_{12}^{(n)} $$
(10)

\( z_{11}^{(n)} \) and \( z_{12}^{(n)} \) are the z-scores that correspond to the cumulative proportions for the first and second response categories, respectively, of \( y_{1}^{*(n)} \) in the non-reference groups. The group superscripts on \( \tau_{11} \) and \( \tau_{12} \) are omitted because these thresholds are set to be equal across groups. Because \( \tau_{11} \) and \( \tau_{12} \) are given by constraint (e), and \( z_{11}^{(n)} \) and \( z_{12}^{(n)} \) are given by the data, Eq. 10 consists of two equations in two unknowns, \( \mu_{1}^{*(n)} \) and \( \sigma_{1}^{*(n)} \). Therefore, \( \mu_{1}^{*(n)} \) and \( \sigma_{1}^{*(n)} \) can be identified as follows.

$$ \mu_{1}^{*(n)} = \frac{\tau_{11} z_{12}^{(n)} - \tau_{12} z_{11}^{(n)}}{z_{12}^{(n)} - z_{11}^{(n)}},\quad \sigma_{1}^{*(n)} = \frac{\tau_{12} - \tau_{11}}{z_{12}^{(n)} - z_{11}^{(n)}} $$
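The closed-form solution of Eq. 10 can be checked numerically. The following is a minimal sketch, not the authors' code: the threshold values and category proportions are hypothetical, and the helper names `z_scores_from_proportions` and `mean_sd_from_two_thresholds` are introduced here only for illustration.

```python
from statistics import NormalDist


def z_scores_from_proportions(proportions):
    """Return the z-scores at the cumulative proportions of all but the last category."""
    std_normal = NormalDist()  # standard normal: mean 0, sd 1
    z_scores, cumulative = [], 0.0
    for p in proportions[:-1]:
        cumulative += p
        z_scores.append(std_normal.inv_cdf(cumulative))
    return z_scores


def mean_sd_from_two_thresholds(tau1, tau2, z1, z2):
    """Solve the two equations of Eq. 10 for the mean and sd of y*."""
    sd = (tau2 - tau1) / (z2 - z1)
    mean = (tau1 * z2 - tau2 * z1) / (z2 - z1)
    return mean, sd


# Hypothetical inputs, for illustration only.
tau_11, tau_12 = -0.5, 0.8           # thresholds held equal across groups, constraint (e)
props_nonref = [0.20, 0.45, 0.35]    # observed category proportions, non-reference group

z_11, z_12 = z_scores_from_proportions(props_nonref)
mu_1, sigma_1 = mean_sd_from_two_thresholds(tau_11, tau_12, z_11, z_12)
```

Substituting `mu_1` and `sigma_1` back into Eq. 10 should reproduce `z_11` and `z_12` up to rounding error.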

If \( y_{1}^{*(n)} \) has more than three response categories, the remaining thresholds can be identified from the mean and variance identified in Eq. 10. Denoting the third threshold of \( y_{1}^{*(n)} \) as \( \tau_{13}^{(n)} \), this threshold is identified from the z-score corresponding to the third category.

$$ \frac{{\tau_{13}^{(n)} - \mu_{1}^{*(n)} }}{{\sigma_{1}^{*(n)} }} = z_{13}^{(n)} $$

The means, variances, and remaining thresholds not included in constraint (e) for \( y_{2}^{*(n)} \) and \( y_{3}^{*(n)} \) can be identified in the same way.
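Rearranged, the relation for a remaining threshold is the mean plus the standard deviation times the z-score. A minimal sketch of that step, assuming hypothetical values for the mean, standard deviation, and cumulative proportion (the function name `remaining_threshold` is not from the original program):

```python
from statistics import NormalDist


def remaining_threshold(mean, sd, cumulative_proportion):
    """Return the threshold whose standardized value matches the observed cumulative proportion."""
    z = NormalDist().inv_cdf(cumulative_proportion)
    return mean + sd * z


# Hypothetical values, for illustration only.
mu_1, sigma_1 = 0.25, 1.10   # mean and sd of y1* as identified from Eq. 10
cum_prop_3 = 0.90            # cumulative proportion up to the third category
tau_13 = remaining_threshold(mu_1, sigma_1, cum_prop_3)
```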

Constraint (e), combined with constraints (c) and (d), identifies the means and variances of the factors in the non-reference groups. The means of the latent response variables selected in constraint (e) in the non-reference groups, \( \mu_{1}^{*(n)} \), \( \mu_{2}^{*(n)} \), and \( \mu_{3}^{*(n)} \), can be expressed as linear combinations of the factor loadings and the factor mean differences from the reference groups.

$$ \begin{gathered} \mu_{1}^{*(n)} = \lambda_{A1} \delta_{A}^{(n)} + \lambda_{C1} \delta_{C}^{(n)} + \lambda_{E1} \delta_{E}^{(n)} \hfill \\ \mu_{2}^{*(n)} = \lambda_{A2} \delta_{A}^{(n)} + \lambda_{C2} \delta_{C}^{(n)} + \lambda_{E2} \delta_{E}^{(n)} \hfill \\ \mu_{3}^{*(n)} = \lambda_{A3} \delta_{A}^{(n)} + \lambda_{C3} \delta_{C}^{(n)} + \lambda_{E3} \delta_{E}^{(n)} \hfill \\ \end{gathered} $$
(11)

\( \delta^{(n)} \) is the factor mean difference from the reference groups for each factor. Because the factor loadings are identified by constraint (c), Eq. 11 consists of three equations in three unknowns, so \( \delta^{(n)} \) can be identified for each factor. Factor variances can be identified from the polychoric correlations among \( y_{1}^{*(n)} \), \( y_{2}^{*(n)} \), and \( y_{3}^{*(n)} \).

$$ \begin{gathered} \rho_{12}^{(n)} = \frac{{\lambda_{A1} \lambda_{A2} \phi_{A}^{(n)} + \lambda_{C1} \lambda_{C2} \phi_{C}^{(n)} + \lambda_{E1} \lambda_{E2} \phi_{E}^{(n)} }}{{\sigma_{1}^{*(n)} \sigma_{2}^{*(n)} }} \hfill \\ \rho_{13}^{(n)} = \frac{{\lambda_{A1} \lambda_{A3} \phi_{A}^{(n)} + \lambda_{C1} \lambda_{C3} \phi_{C}^{(n)} + \lambda_{E1} \lambda_{E3} \phi_{E}^{(n)} }}{{\sigma_{1}^{*(n)} \sigma_{3}^{*(n)} }} \hfill \\ \rho_{23}^{(n)} = \frac{{\lambda_{A2} \lambda_{A3} \phi_{A}^{(n)} + \lambda_{C2} \lambda_{C3} \phi_{C}^{(n)} + \lambda_{E2} \lambda_{E3} \phi_{E}^{(n)} }}{{\sigma_{2}^{*(n)} \sigma_{3}^{*(n)} }} \hfill \\ \end{gathered} $$
(12)

\( \phi^{(n)} \) is the variance of each factor in the non-reference groups and \( \rho_{hk}^{(n)} \) is the polychoric correlation between variables h and k in the non-reference groups. Because the polychoric correlations are given by the data, the factor loadings are given by constraint (c), and the variances of the latent response variables are given by Eq. 10, Eq. 12 consists of three equations in three unknowns. Thus, the factor variances can be identified from Eq. 12. The means and variances of the latent response variables not included in constraint (e) can be identified from the factor loadings constrained by constraint (c), the factor means identified from Eq. 11, and the factor variances identified from Eq. 12. Denoting one of the latent response variables not included in constraint (e) as \( y_{4}^{*(n)} \) and its mean and variance as \( \mu_{4}^{*(n)} \) and \( \sigma_{4}^{2*(n)} \), respectively, \( \mu_{4}^{*(n)} \) and \( \sigma_{4}^{2*(n)} \) are identified from the following.

$$ \mu_{4}^{*(n)} = \lambda_{A4} \delta_{A}^{(n)} + \lambda_{C4} \delta_{C}^{(n)} + \lambda_{E4} \delta_{E}^{(n)} $$
(13)

$$ \rho_{14}^{(n)} = \frac{{\lambda_{A1} \lambda_{A4} \phi_{A}^{(n)} + \lambda_{C1} \lambda_{C4} \phi_{C}^{(n)} + \lambda_{E1} \lambda_{E4} \phi_{E}^{(n)} }}{{\sigma_{1}^{*(n)} \sigma_{4}^{*(n)} }} $$
(14)

Because the mean and variance of \( y_{4}^{*(n)} \) are identified, the thresholds of \( y_{4}^{*(n)} \) can be identified from the z-scores that correspond to the cumulative proportions of each response category. Denoting the first threshold of \( y_{4}^{*(n)} \) as \( \tau_{41}^{(n)} \),

$$ \frac{{\tau_{41}^{(n)} - \mu_{4}^{*(n)} }}{{\sigma_{4}^{*(n)} }} = z_{41}^{(n)} , $$
(15)

and the rest of the thresholds of \( y_{4}^{*(n)} \) are identified likewise. Means, variances, and thresholds for the rest of the variables in the non-reference groups can be identified in the same way as in Eqs. 13–15. Thus, all parameters are identified.
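The chain of Eqs. 11 through 15 can be traced numerically as a round-trip check. The sketch below is only an illustration under stated assumptions: all loadings, factor means, factor variances, and standard deviations are hypothetical "true" values used to generate consistent inputs, and `solve_linear` is a small helper written for this example rather than part of any package. Eqs. 11 and 12 are each treated as a linear system of three equations in three unknowns.

```python
def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented copy of the system
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system: parameters are not identified")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def dot(u, v):
    return sum(x * y for x, y in zip(u, v))


# Hypothetical "true" values, used only to build consistent inputs.
# Rows are y1*..y4*; columns are the A, C, and E factors.
loadings = [[0.7, 0.3, 0.5], [0.5, 0.6, 0.4], [0.4, 0.2, 0.8], [0.6, 0.4, 0.3]]
true_delta = [0.2, -0.1, 0.3]         # factor mean differences
true_phi = [1.2, 0.8, 1.1]            # factor variances
true_sigma = [1.10, 1.05, 1.00, 1.02]  # sd of each y* in the non-reference group


def factor_cov(h, k, phi):
    """Numerator of Eqs. 12 and 14: covariance of y_h* and y_k* implied by the factors."""
    return sum(loadings[h][f] * loadings[k][f] * phi[f] for f in range(3))


# Quantities treated as known at this stage of the argument.
pairs = [(0, 1), (0, 2), (1, 2)]
mu = [dot(loadings[j], true_delta) for j in range(3)]   # means from Eq. 10
sigma = true_sigma[:3]                                  # sds from Eq. 10
rho = {(h, k): factor_cov(h, k, true_phi) / (true_sigma[h] * true_sigma[k])
       for h, k in pairs + [(0, 3)]}                    # polychoric correlations

# Eq. 11: three means, three unknown factor mean differences.
delta = solve_linear(loadings[:3], mu)

# Eq. 12: three correlations, three unknown factor variances.
coef = [[loadings[h][f] * loadings[k][f] for f in range(3)] for h, k in pairs]
rhs = [rho[(h, k)] * sigma[h] * sigma[k] for h, k in pairs]
phi = solve_linear(coef, rhs)

# Eqs. 13-15 for a variable not covered by constraint (e).
mu_4 = dot(loadings[3], delta)
sigma_4 = factor_cov(0, 3, phi) / (rho[(0, 3)] * sigma[0])
z_41 = -0.4                           # hypothetical z-score of the first category
tau_41 = mu_4 + sigma_4 * z_41
```

If the loadings matrix or the matrix of loading products is singular, `solve_linear` raises an error, which corresponds to the case in which three equations do not pin down three unknowns.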

Technical supplement B: description of sample Mplus program

An excerpt of the Mplus program used for the racial sub-dimension is presented in Table 7 as sample program code. Rather than reviewing general Mplus programming, this description focuses on how the proposed model is specified in Mplus. The details of Mplus programming are described in the Mplus User’s Guide (Muthén and Muthén 2007).

The Mplus program starts with the Data statement, which specifies the location of the data file. The Variable statement specifies the variables to be used, the missing values, and the grouping variable. The Categorical statement specifies the categorical variables. In the Grouping statement, five groups are specified by the variable zygw1t1. Because grouping variables cannot be used in combination, when more than one variable defines the groups, such as zygosity and gender in this case, the grouping variable should be defined as a single variable beforehand.

The Model statement specifies the overall model. The model specific to each group should be specified after the overall model, using a Model statement followed by the group name. Latent factors are defined by the By statement followed by the indicator variables. The latent variables F1 through F3 are the additive genetic factor, common environmental factor, and unique environmental factor, respectively, for the first twin in a family. Factors F4 through F6 are the same factors, respectively, for the second twin. A variable name without any parameter or bracket represents the variance of an independent variable or the residual variance of a dependent variable. A variable name in square brackets [ ] represents the mean or intercept of the variable. Parameters can be labeled by a number in parentheses, and equality constraints are applied by assigning the same label. Parameters for the twins in the same family are constrained to be equal. Parameter labels are also used in the Model constraint option to apply non-linear constraints.

An asterisk after a parameter marks it as free, and is often used to make Mplus estimate a parameter that it would fix to a constant by default. Asterisks are needed for the factor loadings because Mplus fixes the loading of the first indicator of each factor to one by default. The symbol “@” fixes the parameter to the value that follows it. The With statement specifies the covariance between variables. Regression on a covariate is specified via the On statement. Each phenotype indicator variable is regressed on the variable aget1. Regression coefficients for the same variable are constrained to be equal across groups by assigning the same label across groups. Because Mplus estimates the covariances between exogenous variables unless specified otherwise, the covariances among factors must be explicitly set to zero.

Thresholds are specified by the variable name with a “$” sign in square brackets [ ]. The number following the “$” sign indicates the order of the threshold. There are two thresholds ($1 and $2) because each item has three response categories. To estimate the thresholds in the male groups that are not constrained to be equal to those in the female groups, labels different from those in the overall model are used. Factor means and variances are also estimated in the male groups by using labels different from those in the overall model statement. The scale parameter for each variable is specified by the variable name in braces { }. Scale parameters are estimated for the male groups, while they are set to one in the overall model statement. In the Model constraint option, non-linear constraints on the covariance structure in the male groups are specified to impose the correct correlation structure in Eq. 2.
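The requirement that groups be defined by a single variable means that zygosity and gender must be combined before the data reach Mplus. A minimal Python sketch of such a recoding step follows. The makeup of the five groups (same-sex MZ and DZ pairs of each gender plus opposite-sex DZ pairs), the numeric codes, and the function name `combined_group` are all assumptions made for illustration and are not taken from the original program.

```python
# Hypothetical coding of the single grouping variable; the codes 1-5 and the
# group definitions are assumptions, not the coding used for zygw1t1.
SAME_SEX_CODES = {
    ("MZ", "F"): 1,
    ("DZ", "F"): 2,
    ("MZ", "M"): 3,
    ("DZ", "M"): 4,
}
OPPOSITE_SEX_DZ_CODE = 5


def combined_group(zygosity, sex_twin1, sex_twin2):
    """Collapse zygosity and the two twins' sexes into one grouping code."""
    if sex_twin1 != sex_twin2:
        if zygosity != "DZ":
            raise ValueError("opposite-sex pairs are expected to be DZ")
        return OPPOSITE_SEX_DZ_CODE
    return SAME_SEX_CODES[(zygosity, sex_twin1)]


# Example pairs, for illustration only.
example_codes = [
    combined_group("MZ", "F", "F"),
    combined_group("DZ", "M", "M"),
    combined_group("DZ", "M", "F"),
]
```

The resulting code would be written to the data file as one column and named in the Grouping statement.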

Table 7 Excerpt of Mplus program


About this article

Cite this article

Cho, S.B., Wood, P.K. & Heath, A.C. Decomposing Group Differences of Latent Means of Ordered Categorical Variables within a Genetic Factor Model. Behav Genet 39, 101–122 (2009). https://doi.org/10.1007/s10519-008-9237-9


  • DOI: https://doi.org/10.1007/s10519-008-9237-9
