Modeling Associations Among Multivariate Longitudinal Categorical Variables in Survey Data: A Semiparametric Bayesian Approach

Tchumtchoua, Sylvie; Dey, Dipak K.

doi:10.1007/s11336-012-9274-4

Published: 30 May 2012

Psychometrika, Volume 77, pages 670–692 (2012)

Abstract

This paper proposes a semiparametric Bayesian framework for analyzing associations among multivariate longitudinal categorical variables in high-dimensional settings. Such data are common, especially in the social and behavioral sciences. A semiparametric hierarchical factor analysis model is developed in which the distributions of the factors are modeled nonparametrically through a dynamic hierarchical Dirichlet process prior. A Markov chain Monte Carlo algorithm is developed for fitting the model, and the methodology is illustrated with a study of the dynamics of public attitudes toward science and technology in the United States from 1992 to 2001.

References

Albert, J.H., & Chib, S. (1993). Bayesian analysis of binary and polychotomous response data. Journal of the American Statistical Association, 88(422), 669–679.
Bak, H.J. (2001). Education and public attitudes toward science: implications for the deficit model of education and support for science and technology. Social Science Quarterly, 82(4), 779–795.
Bhattacharya, A., & Dunson, D.B. (2011). Sparse Bayesian infinite factor models. Biometrika, 98(2), 291–306.
Caron, F., Davy, M., & Doucet, A. (2007). Generalized Pólya urn for time-varying Dirichlet process mixtures. In Proceedings of the twenty-third annual conference on uncertainty in artificial intelligence (pp. 33–40). Corvallis: AUAI Press.
Dunson, D.B. (2006). Bayesian dynamic modeling of latent trait distributions. Biostatistics, 7(4), 551–568.
Escofier, B., & Pagès, J. (1988). Analyses factorielles simples et multiples; objectifs, méthodes et interprétations. Paris: Dunod.
Everitt, B.S. (1992). The analysis of contingency tables (2nd ed.). London: Chapman & Hall.
Fokoue, E., & Titterington, D.M. (2003). Mixtures of factor analysers: Bayesian estimation and inference by stochastic simulation. Machine Learning, 50, 73–94.
Geweke, J. (1992). Evaluating the accuracy of sampling-based approaches to calculating posterior moments. In J.M. Bernardo, J.O. Berger, A.P. Dawid, & A.F.M. Smith (Eds.), Bayesian statistics (pp. 169–193). Oxford: Clarendon Press.
Ghosh, J., & Dunson, D.B. (2009). Default priors and efficient posterior computation in Bayesian factor analysis. Journal of Computational and Graphical Statistics, 18, 306–320.
Goodman, L.A. (1974). Exploratory latent structure analysis using both identifiable and unidentifiable models. Biometrika, 61, 215–231.
Goodman, L.A. (1986). Some useful extensions of the usual correspondence analysis approach and the usual log-linear models approach in the analysis of contingency tables (with discussion). International Statistical Review, 54, 243–270.
Goodman, L.A., & Hout, M. (1998). Statistical methods and graphical displays for analyzing how the association between two qualitative variables differs among countries, among groups, or over time: a modified regression-type approach. Sociological Methodology, 28, 175–230.
Greenacre, M.J. (2007). Correspondence analysis in practice (2nd ed.). Boca Raton: Chapman & Hall.
Ishwaran, H., & James, L.F. (2001). Gibbs sampling methods for stick-breaking priors. Journal of the American Statistical Association, 96, 161–173.
Lazarsfeld, P.F., & Henry, N.W. (1968). Latent structure analysis. Boston: Houghton Mifflin.
Lee, S.Y., & Song, X.Y. (2002). Bayesian selection on the number of factors in a factor analysis model. Behaviormetrika, 29, 23–40.
Lopes, H.F., & West, M. (2004). Bayesian model assessment in factor analysis. Statistica Sinica, 14, 41–67.
MacEachern, S.N. (1999). Dependent nonparametric processes. In Proceedings of Bayesian statistical science section (pp. 50–55). Alexandria: Am. Statist. Assoc.
MacEachern, S.N. (2000). Dependent Dirichlet processes. Unpublished manuscript, Department of Statistics, The Ohio State University.
MacEachern, S.N. (2001). Decision theoretic aspects of dependent nonparametric processes. In E. George & P. Nanopoulos (Eds.), Bayesian methods with applications to science, policy and official statistics (pp. 551–560). Crete: International Society for Bayesian Analysis.
McLachlan, G.J., Peel, D., & Bean, R.W. (2003). Modelling high-dimensional data by mixtures of factor analyzers. Computational Statistics & Data Analysis, 41, 379–388.
Mislevy, R.J. (1986). Recent developments in the factor analysis of categorical variables. Journal of Educational Statistics, 11(1), 3–31.
Ng, S., & Moench, E. (2011). A hierarchical factor analysis of US housing market dynamics. Econometrics Journal, 14(1), 1–24.
Ren, L., Dunson, D.B., & Carin, L. (2008). The dynamic hierarchical Dirichlet process. In Proceedings of the international conference on machine learning (pp. 824–831). Helsinki: ACM.
Sasaki, M., & Suzuki, T. (1991). Dimensions of public acceptance of science and technology among five industrialized nations. Behaviormetrika, 29, 73–82.
Teh, Y.W., Jordan, M.I., Beal, M.J., & Blei, D.M. (2006). Hierarchical Dirichlet processes. Journal of the American Statistical Association, 101, 1566–1581.
Thurstone, L.L. (1947). Multiple factor analysis. Chicago: University of Chicago Press.
Yang, M., & Dunson, D.B. (2010). Bayesian semiparametric structural equation models with latent variables. Psychometrika, 75(4), 675–693.

Acknowledgements

The authors thank the editor, an associate editor, and two anonymous reviewers for comments and suggestions that greatly improved the manuscript. This research was partly supported by a Warren J. Mitofsky Fellowship from the Roper Center at the University of Connecticut.

Author information

Authors and Affiliations

Statistical and Applied Mathematical Sciences Institute, 19 T. W. Alexander Drive, Research Triangle Park, P.O. Box 14006, Durham, NC, 27709, USA
Sylvie Tchumtchoua
Department of Statistics, University of Connecticut, 215 Glenbrook Rd. U-4120, Storrs, CT, 06269, USA
Dipak K. Dey

Corresponding author

Correspondence to Sylvie Tchumtchoua.

Appendices

Appendix A

Estimation proceeds through the following steps:

Step 1

Sample $z_{\mathit{tig}j}$, $t = 1,\ldots,T$, $i = 1,\ldots,n_{t}$, and $g = 1,\ldots,K$, from

where TN denotes a truncated normal distribution.
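
The Step 1 full conditional is not reproduced in this copy. In the Albert and Chib (1993) data-augmentation scheme, which the appendix cites for Step 10, each latent response is drawn from a normal distribution truncated to the interval that its observed category occupies. The Python sketch below assumes a unit-variance latent response with a caller-supplied mean (for example $\mu_{gj} + a_{g,j}\psi_{\mathit{tig}}$, consistent with Step 6) and maps category $r$ to $(\varsigma_{gj,r-1}, \varsigma_{gj,r}]$. The function names are hypothetical.

```python
import random
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal N(0, 1)


def sample_truncated_normal(mean, lower, upper, rng=random):
    """Draw from N(mean, 1) truncated to (lower, upper) by inverse-CDF sampling.

    lower and upper may be -inf or +inf. The unit variance is an assumption.
    This simple method loses accuracy far in the tails; a dedicated sampler
    such as scipy.stats.truncnorm is more robust there.
    """
    # Probability mass below each bound on the standard-normal scale.
    lo = 0.0 if lower == float("-inf") else STD_NORMAL.cdf(lower - mean)
    hi = 1.0 if upper == float("inf") else STD_NORMAL.cdf(upper - mean)
    # Uniform draw inside [lo, hi], kept strictly inside (0, 1) for inv_cdf.
    u = lo + (hi - lo) * rng.random()
    u = min(max(u, 1e-12), 1.0 - 1e-12)
    return mean + STD_NORMAL.inv_cdf(u)


def sample_latent(y, mean, thresholds, rng=random):
    """Albert-Chib draw of a latent response for observed category y in 1..R.

    thresholds = [-inf, s_1, ..., s_{R-1}, +inf], so category r maps to
    the interval (thresholds[r - 1], thresholds[r]].
    """
    return sample_truncated_normal(mean, thresholds[y - 1], thresholds[y], rng)
```

Storing the infinite end points in the threshold list lets the first and last categories use the same code path as the interior ones.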

Step 2

Sample $\tilde{\pi}_{t}$, t=1,…,T−1, and $\tilde{\omega}_{tl} $, t=1,…,T, l=1,…,L, from
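
The Step 2 full conditionals are also missing from this copy. For orientation only, the sketch below shows the standard blocked-Gibbs update of truncated stick-breaking weights from Ishwaran and James (2001). This update is a common building block for samplers of this kind, but it is not necessarily the exact conditional used here for $\tilde{\pi}_{t}$ or $\tilde{\omega}_{tl}$. The inputs `counts` (current component occupancy) and `alpha` (a concentration parameter) are hypothetical.

```python
import numpy as np


def sample_stick_weights(counts, alpha, rng):
    """Blocked-Gibbs update of truncated stick-breaking weights (Ishwaran & James, 2001).

    counts[l] is the number of units currently assigned to component l.
    Draws V_l ~ Beta(1 + counts[l], alpha + sum_{k > l} counts[k]) for l < L - 1,
    fixes V_{L-1} = 1 (truncation), and returns w_l = V_l * prod_{k < l} (1 - V_k).
    """
    counts = np.asarray(counts, dtype=float)
    # Suffix sums: after[l] = sum of counts strictly after position l.
    after = np.append(np.cumsum(counts[::-1])[::-1][1:], 0.0)
    V = rng.beta(1.0 + counts[:-1], alpha + after[:-1])
    V = np.append(V, 1.0)
    # Length of stick left before breaking off piece l.
    remaining = np.cumprod(np.append(1.0, 1.0 - V[:-1]))
    return V * remaining
```

Fixing the last stick fraction at 1 guarantees that the truncated weights sum to one.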

Step 3

Sample $m_{ti}$ and $c_{ti}$, $t = 1,\ldots,T$, $i = 1,\ldots,n_{t}$, from a multinomial distribution with

where $N_{r}(x;\mu,\Sigma)$ denotes the density of the $r$-dimensional multivariate normal distribution with mean vector $\mu$ and covariance matrix $\Sigma$, evaluated at $x$.
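
The Step 3 multinomial probabilities are not reproduced here, but they are built from the normal densities $N_{r}$. A generic sketch of the mechanics assumes that each candidate label $l$ has a log prior weight and a Gaussian likelihood $N_{r}(x;\mu_{l},\Sigma_{l})$. It draws one label with probability proportional to their product and works on the log scale to avoid underflow. The weights and component parameters are hypothetical inputs, not the paper's exact expressions for $m_{ti}$ and $c_{ti}$.

```python
import numpy as np


def log_mvn_pdf(x, mean, cov):
    """log N_r(x; mean, cov), computed through a Cholesky factor of cov."""
    r = x.shape[0]
    chol = np.linalg.cholesky(cov)
    # Solve L u = (x - mean), so that u'u = (x - mean)' cov^{-1} (x - mean).
    u = np.linalg.solve(chol, x - mean)
    log_det = 2.0 * np.sum(np.log(np.diag(chol)))
    return -0.5 * (r * np.log(2.0 * np.pi) + log_det + u @ u)


def sample_label(x, log_weights, means, covs, rng):
    """Draw index l with P(l) proportional to weight_l * N_r(x; means[l], covs[l])."""
    logp = np.array([lw + log_mvn_pdf(x, m, S)
                     for lw, m, S in zip(log_weights, means, covs)])
    logp -= logp.max()          # log-sum-exp shift for numerical stability
    probs = np.exp(logp)
    probs /= probs.sum()
    return rng.choice(len(probs), p=probs), probs
```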

Step 4

Sample the component parameters $\{ \varLambda_{l}^{*}\}_{l = 1}^{L}$ from

Step 5

Sample each $\psi_{\mathit{tig}}$, $t = 1,\ldots,T$, $i = 1,\ldots,n_{t}$, and $g = 1,\ldots,K$, from

$$\psi_{\mathit{tig}}|A_{g},B_{g},\varLambda_{ti},z_{\mathit{tig}}\sim N_{q_{g}}\bigl(W(A_{g}'z_{\mathit{tig}} + \varOmega_{g}^{ - 1}B{}_{g}\varLambda_{ti}),W\bigr), \quad W = \bigl(A_{g}'A_{g} + \varOmega_{g}^{ - 1}\bigr)^{ - 1}. $$
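
The Step 5 conditional is an ordinary Gaussian update and can be coded directly. The Python/NumPy sketch below forms $W$ and the conditional mean as printed. It then draws from $N_{q_g}$ through a Cholesky factor of the precision $W^{-1}$. The argument shapes are assumptions: $A_{g}$ is $p_{g} \times q_{g}$, $B_{g}$ is $q_{g} \times q_{g}$, and $\Lambda_{ti}$ and $z_{\mathit{tig}}$ are vectors of matching length. The function name is hypothetical.

```python
import numpy as np


def sample_psi(A, B, lam, z, Omega, rng):
    """Draw psi ~ N(W (A'z + Omega^{-1} B lam), W), with W = (A'A + Omega^{-1})^{-1}.

    Returns the draw together with its conditional mean and covariance W.
    """
    Omega_inv = np.linalg.inv(Omega)
    precision = A.T @ A + Omega_inv            # W^{-1}
    W = np.linalg.inv(precision)
    mean = W @ (A.T @ z + Omega_inv @ (B @ lam))
    # With precision = L L', mean + solve(L', e) has covariance (L L')^{-1} = W.
    L = np.linalg.cholesky(precision)
    e = rng.standard_normal(mean.shape[0])
    return mean + np.linalg.solve(L.T, e), mean, W
```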

Step 6

Sample $\mu_{gj}$, $g = 1,\ldots,K$, $j = 1,\ldots,p_{g}$, from

$$\mu_{gj}|\{z_{\mathit{tig}j}\},\{\psi_{\mathit{tig}}\},A_{g}\sim N\Biggl(\bigl(n + \sigma_{g0}^{ - 2}\bigr)^{ - 1}\Biggl( \sigma_{g0}^{ - 2}\mu_{g0} + \sum _{t = 1}^{T} \sum_{i = 1}^{n_{t}} \bigl(z_{\mathit{tig}j} - a_{g,j}\psi_{\mathit{tig}}\bigr)\Biggr) ,\ \bigl(n + \sigma_{g0}^{ - 2}\bigr)^{ - 1}\Biggr), \quad n = \sum_{t = 1}^{T} n_{t}. $$
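
Step 6 is the conjugate normal update for an item intercept under a $N(\mu_{g0},\sigma_{g0}^{2})$ prior with unit error variance. The minimal sketch below assumes that the latent responses for item $j$ of block $g$ have been stacked over all $t$ and $i$ into one vector, with the factor scores in the same row order. The names are hypothetical.

```python
import numpy as np


def sample_mu(z_j, a_j, psi, mu0, sigma0_sq, rng):
    """Conjugate normal draw of the intercept mu_gj (unit error variance assumed).

    z_j : (n,) latent responses for item j, stacked over all t and i
    a_j : (q,) j-th row of the loading matrix A_g
    psi : (n, q) factor scores psi_tig in the same row order as z_j
    """
    n = z_j.shape[0]
    resid_sum = np.sum(z_j - psi @ a_j)        # sum of (z_tigj - a_gj psi_tig)
    post_var = 1.0 / (n + 1.0 / sigma0_sq)
    post_mean = post_var * (mu0 / sigma0_sq + resid_sum)
    return rng.normal(post_mean, np.sqrt(post_var)), post_mean, post_var
```

Here `n` is the total number of stacked responses, which matches $n = \sum_{t} n_{t}$ in the formula.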

Step 7

Sample $a_{g,j}$, the $j$th row of $A_{g}$, $g = 1,\ldots,K$ and $j = 2, \ldots,p_{g_{1}} + p_{g_{2}}$, from

Step 8

Sample $b_{g,j}$, the $j$th diagonal element of $B_{g}$, $g = 2,\ldots,K$ and $j = 1,\ldots,q_{g}$, from

Step 9

Sample $\sigma_{g,j}^{ - 2}$, $g = 1,\ldots,K$ and $j = 1,\ldots,q_{g}$, from
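
The Step 9 conditional is not shown in this copy either. Suppose each precision $\sigma_{g,j}^{-2}$ has a gamma prior and enters the model only through normal residuals, which is a common setup assumed here rather than taken from the paper. Its full conditional is then again gamma. The sketch uses a hypothetical Gamma(`a0`, `b0`) prior in the shape-rate parameterization, and the caller supplies the residuals.

```python
import numpy as np


def sample_precision(resid, a0, b0, rng):
    """Conjugate draw of a precision sigma^{-2} under a Gamma(a0, b0) (shape, rate) prior,
    given residuals assumed i.i.d. N(0, sigma^2)."""
    resid = np.asarray(resid, dtype=float)
    shape = a0 + 0.5 * resid.size
    rate = b0 + 0.5 * np.sum(resid ** 2)
    # NumPy parameterizes the gamma by shape and *scale*, so pass 1 / rate.
    return rng.gamma(shape, 1.0 / rate), shape, rate
```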

Step 10

Sample the thresholds $\varsigma_{gj,r}$, $g = 1,\ldots,K$, $j = 1, \ldots,p_{g_{2}}$, and $r = 1,\ldots,\nu_{j}$, from the full conditional

which, following Albert and Chib (1993), is a uniform distribution on the interval

$$\bigl[\max\bigl\{ \max\{ z_{\mathit{tig}j}:y_{\mathit{tig}j} = r\} ,\varsigma_{gj,r - 1}\bigr\} ,\min\bigl\{ \min\{ z_{\mathit{tig}j}:y_{\mathit{tig}j} = r + 1\} ,\varsigma_{gj,r + 1}\bigr\} \bigr]. $$
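
Once the latent draws from Step 1 are available, the threshold update is a single uniform draw on the interval above. The sketch below handles one item. It assumes that `thresholds[k]` stores $\varsigma_{gj,k}$, with $-\infty$ and $+\infty$ at the two ends, and that categories are coded $1,\ldots,R$. The names are hypothetical.

```python
import numpy as np


def sample_threshold(z, y, r, thresholds, rng):
    """Albert-Chib (1993) uniform update of threshold r for one item.

    z, y       : arrays of latent responses and observed categories (1..R)
    thresholds : thresholds[k] = s_k, with thresholds[0] = -inf, thresholds[-1] = +inf
    """
    # Lower end: largest latent value in category r, or the previous threshold.
    lower = max(z[y == r].max(initial=-np.inf), thresholds[r - 1])
    # Upper end: smallest latent value in category r + 1, or the next threshold.
    upper = min(z[y == r + 1].min(initial=np.inf), thresholds[r + 1])
    return rng.uniform(lower, upper)
```

The `initial` arguments keep the update defined when a category happens to be empty, in which case the neighbouring threshold alone bounds the interval.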

Appendix B

Table B.1. National Science Foundation Surveys of Public’s Understanding of S&T data.

About this article

Cite this article

Tchumtchoua, S., Dey, D.K. Modeling Associations Among Multivariate Longitudinal Categorical Variables in Survey Data: A Semiparametric Bayesian Approach. Psychometrika 77, 670–692 (2012). https://doi.org/10.1007/s11336-012-9274-4

Received: 05 April 2010
Revised: 20 December 2011
Published: 30 May 2012
Issue Date: October 2012
DOI: https://doi.org/10.1007/s11336-012-9274-4
