Abstract
This paper proposes a semiparametric Bayesian framework for the analysis of associations among multivariate longitudinal categorical variables in high-dimensional data settings. Such data arise frequently, especially in the social and behavioral sciences. A semiparametric hierarchical factor analysis model is developed in which the distributions of the factors are modeled nonparametrically through a dynamic hierarchical Dirichlet process prior. A Markov chain Monte Carlo algorithm is developed for fitting the model, and the methodology is illustrated through a study of the dynamics of public attitudes toward science and technology in the United States over the period 1992–2001.
References
Albert, J.H., & Chib, S. (1993). Bayesian analysis of binary and polychotomous response data. Journal of the American Statistical Association, 88(422), 669–679.
Bak, H.J. (2001). Education and public attitudes toward science: implications for the deficit model of education and support for science and technology. Social Science Quarterly, 82(4), 779–795.
Bhattacharya, A., & Dunson, D.B. (2011). Sparse Bayesian infinite factor models. Biometrika, 98(2), 291–306.
Caron, F., Davy, M., & Doucet, A. (2007). Generalized Polya urn for time-varying Dirichlet process mixtures. In Proceedings of the twenty-third annual conference on uncertainty in artificial intelligence (pp. 33–40). Corvallis: AUAI Press.
Dunson, D.B. (2006). Bayesian dynamic modeling of latent trait distributions. Biostatistics, 7(4), 551–568.
Escofier, B., & Pagès, J. (1988). Analyses factorielles simples et multiples; objectifs, méthodes et interprétations. Paris: Dunod.
Everitt, B.S. (1992). The analysis of contingency tables (2nd ed.). London: Chapman & Hall.
Fokoue, E., & Titterington, D.M. (2003). Mixtures of factor analysers: Bayesian estimation and inference by stochastic simulation. Machine Learning, 50, 73–94.
Geweke, J. (1992). Evaluating the accuracy of sampling-based approaches to calculating posterior moments. In J.M. Bernardo, J.O. Berger, A.P. Dawid, & A.F.M. Smith (Eds.), Bayesian statistics (pp. 169–193). Oxford: Clarendon Press.
Ghosh, J., & Dunson, D.B. (2009). Default priors and efficient posterior computation in Bayesian factor analysis. Journal of Computational and Graphical Statistics, 18, 306–320.
Goodman, L.A. (1974). Exploratory latent structure analysis using both identifiable and unidentifiable models. Biometrika, 61, 215–231.
Goodman, L.A. (1986). Some useful extensions of the usual correspondence analysis approach and the usual log-linear models approach in the analysis of contingency tables (with discussion). International Statistical Review, 54, 243–270.
Goodman, L.A., & Hout, M. (1998). Statistical methods and graphical displays for analyzing how the association between two qualitative variables differs among countries, among groups, or over time: a modified regression-type approach. Sociological Methodology, 28, 175–230.
Greenacre, M.J. (2007). Correspondence analysis in practice (2nd ed.). Boca Raton: Chapman & Hall.
Ishwaran, H., & James, L.F. (2001). Gibbs sampling methods for stick-breaking priors. Journal of the American Statistical Association, 96, 161–173.
Lazarsfeld, P.F., & Henry, N.W. (1968). Latent structure analysis. Boston: Houghton Mifflin.
Lee, S.Y., & Song, X.Y. (2002). Bayesian selection on the number of factors in a factor analysis model. Behaviormetrika, 29, 23–40.
Lopes, H.F., & West, M. (2004). Bayesian model assessment in factor analysis. Statistica Sinica, 14, 41–67.
MacEachern, S.N. (1999). Dependent nonparametric processes. In Proceedings of Bayesian statistical science section (pp. 50–55). Alexandria: Am. Statist. Assoc.
MacEachern, S.N. (2000). Dependent Dirichlet processes. Unpublished manuscript, Department of Statistics, The Ohio State University.
MacEachern, S.N. (2001). Decision theoretic aspects of dependent nonparametric processes. In E. George & P. Nanopoulos (Eds.), Bayesian methods with applications to science, policy and official statistics (pp. 551–560). Crete: International Society for Bayesian Analysis.
McLachlan, G.J., Peel, D., & Bean, R.W. (2003). Modelling high-dimensional data by mixtures of factor analyzers. Computational Statistics & Data Analysis, 41, 379–388.
Mislevy, R.J. (1986). Recent developments in the factor analysis of categorical variables. Journal of Educational Statistics, 11(1), 3–31.
Ng, S., & Moench, E. (2011). A hierarchical factor analysis of US housing market dynamics. Econometrics Journal, 14(1), 1–24.
Ren, L., Dunson, D.B., & Carin, L. (2008). The dynamic hierarchical Dirichlet process. In Proceedings of the international conference on machine learning (pp. 824–831). Helsinki: ACM.
Sasaki, M., & Suzuki, T. (1991). Dimensions of public acceptance of science and technology among five industrialized nations. Behaviormetrika, 29, 73–82.
Teh, Y.W., Jordan, M.I., Beal, M.J., & Blei, D.M. (2006). Hierarchical Dirichlet processes. Journal of the American Statistical Association, 101, 1566–1581.
Thurstone, L.L. (1947). Multiple factor analysis. Chicago: University of Chicago Press.
Yang, M., & Dunson, D.B. (2010). Bayesian semiparametric structural equation models with latent variables. Psychometrika, 75(4), 675–693.
Acknowledgements
The authors thank the editor, an associate editor, and two anonymous reviewers for comments and suggestions that greatly improved the manuscript. This research was partly supported by a Warren J. Mitofsky Fellowship from the Roper Center at the University of Connecticut.
Appendices
Appendix A
Estimation proceeds through the following steps:
Step 1
Sample \(z_{tigj}\), \(t=1,\ldots,T\), \(i=1,\ldots,n_{t}\), and \(g=1,\ldots,K\), from
where TN denotes a truncated normal distribution.
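The display equation for this step did not survive extraction, so the exact mean, variance, and truncation bounds used by the authors are not shown here. As a minimal sketch of the general technique only, the following draws a latent variable from a normal distribution truncated to the interval between the two thresholds that bracket an observed ordinal category, in the spirit of the data augmentation of Albert and Chib (1993). The function names, the category coding 0, 1, 2, ..., and the unit standard deviation are assumptions made for illustration, not the paper's specification.

```python
import random
from statistics import NormalDist


def sample_truncated_normal(mean, sd, lower, upper, rng=random):
    """Draw from N(mean, sd^2) restricted to [lower, upper] by inverse-CDF sampling."""
    dist = NormalDist(mean, sd)
    # CDF values at the truncation bounds; infinite bounds map to 0 and 1.
    p_lo = 0.0 if lower == float("-inf") else dist.cdf(lower)
    p_hi = 1.0 if upper == float("inf") else dist.cdf(upper)
    # Uniform draw between the two CDF values, kept strictly inside (0, 1)
    # because NormalDist.inv_cdf is undefined at the endpoints.
    u = p_lo + (p_hi - p_lo) * rng.random()
    eps = 1e-12
    u = min(max(u, eps), 1.0 - eps)
    # Clamp guards against rounding that could push the draw past a bound.
    return min(max(dist.inv_cdf(u), lower), upper)


def sample_latent(y, mean, thresholds, sd=1.0, rng=random):
    """Latent draw for an ordinal response y coded 0, ..., len(thresholds) (assumed coding)."""
    cuts = [float("-inf")] + list(thresholds) + [float("inf")]
    return sample_truncated_normal(mean, sd, cuts[y], cuts[y + 1], rng)
```

Inverse-CDF sampling is simple but loses accuracy when the truncation interval lies far in a tail of the distribution; a rejection sampler designed for tail regions would be a reasonable substitute in that case.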
Step 2
Sample \(\tilde{\pi}_{t}\), \(t=1,\ldots,T-1\), and \(\tilde{\omega}_{tl}\), \(t=1,\ldots,T\), \(l=1,\ldots,L\), from
Step 3
Sample \(m_{ti}\) and \(c_{ti}\), \(t=1,\ldots,T\), \(i=1,\ldots,n_{t}\), from a multinomial distribution with
where \(N_{r}(x;\mu;\varSigma)\) denotes the probability density function of an \(r\)-dimensional vector having a multivariate normal distribution with mean vector \(\mu\) and covariance matrix \(\varSigma\), evaluated at \(x\).
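The multinomial probabilities for this step are missing from the extracted text. The sketch below illustrates only the generic form such an allocation step usually takes in a truncated mixture: each component receives a probability proportional to its weight times a normal density evaluated at the current factor value, and a label is drawn from the normalized probabilities. It handles a single indicator rather than the pair of indicators in the step above, and it assumes a diagonal covariance to stay dependency-free; both choices, along with all function and argument names, are illustrative assumptions and not the authors' formulas.

```python
import math
import random


def log_normal_pdf_diag(x, mean, var):
    """Log density at x of a normal with diagonal covariance (assumed for simplicity)."""
    return sum(
        -0.5 * (math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, mean, var)
    )


def allocation_probs(x, log_weights, means, variances):
    """Probabilities proportional to weight_l * N(x; mean_l, var_l), normalized."""
    log_p = [
        lw + log_normal_pdf_diag(x, m, v)
        for lw, m, v in zip(log_weights, means, variances)
    ]
    top = max(log_p)  # shift by the maximum before exponentiating (log-sum-exp)
    unnorm = [math.exp(lp - top) for lp in log_p]
    total = sum(unnorm)
    return [u / total for u in unnorm]


def sample_allocation(x, log_weights, means, variances, rng=random):
    """Draw one component label from the normalized allocation probabilities."""
    probs = allocation_probs(x, log_weights, means, variances)
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]
```

Working on the log scale and subtracting the largest log probability before exponentiating avoids underflow when the densities are very small, which is common in higher dimensions.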
Step 4
Sample the component parameters \(\{ \varLambda_{l}^{*}\}_{l = 1}^{L}\) from
Step 5
Sample each \(\psi_{tig}\), \(t=1,\ldots,T\), \(i=1,\ldots,n_{t}\), and \(g=1,\ldots,K\), from
Step 6
Sample \(\mu_{gj}\), \(g=1,\ldots,K\), \(j=1,\ldots,p_{g}\), from
Step 7
Sample \(a_{g,j}\), the \(j\)th row of \(A_{g}\), \(g=1,\ldots,K\), and \(j = 2, \ldots,p_{g_{1}} + p_{g_{2}}\), from
Step 8
Sample \(b_{g,j}\), the \(j\)th diagonal element of \(B_{g}\), \(g=2,\ldots,K\) and \(j=1,\ldots,q_{g}\), from
Step 9
Sample \(\sigma_{g,j}^{-2}\), \(g=1,\ldots,K\) and \(j=1,\ldots,q_{g}\), from
Step 10
Sample the thresholds \(\varsigma_{gj,r}\), \(g = 1,\ldots,K\), \(j = 1, \ldots,p_{g_{2}}\), and \(r=1,\ldots,\nu_{j}\), from the full conditional
which, following Albert and Chib (1993), is a uniform distribution on the interval
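The interval itself is missing from the extracted text. In the standard ordinal probit construction of Albert and Chib (1993), a threshold is drawn uniformly between the largest latent value in the category just below it (or the preceding threshold, whichever is larger) and the smallest latent value in the category just above it (or the following threshold, whichever is smaller). The sketch below implements that generic update for a single variable; the function name, the category coding 0, 1, 2, ..., and the convention that a response equals c when the latent value lies between thresholds c-1 and c are assumptions for illustration and may differ from the paper's indexing.

```python
import random


def sample_threshold(r, thresholds, z, y, rng=random):
    """Draw thresholds[r] uniformly between the latent values it must separate.

    thresholds[r] separates category r (latent values at or below it)
    from category r + 1 (latent values above it).
    """
    neg_inf, pos_inf = float("-inf"), float("inf")
    below = [zi for zi, yi in zip(z, y) if yi == r]      # must stay at or below the threshold
    above = [zi for zi, yi in zip(z, y) if yi == r + 1]  # must stay above the threshold
    prev_cut = thresholds[r - 1] if r > 0 else neg_inf
    next_cut = thresholds[r + 1] if r + 1 < len(thresholds) else pos_inf
    lower = max(below + [prev_cut])
    upper = min(above + [next_cut])
    if lower == neg_inf or upper == pos_inf:
        raise ValueError("interval is unbounded; the uniform draw is undefined")
    return rng.uniform(lower, upper)
```

Because the interval is bounded by the current latent values, this update can move slowly when many observations fall in adjacent categories; that is a known property of the uniform full conditional rather than a feature of this sketch.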
Appendix B
Cite this article
Tchumtchoua, S., Dey, D.K. Modeling Associations Among Multivariate Longitudinal Categorical Variables in Survey Data: A Semiparametric Bayesian Approach. Psychometrika 77, 670–692 (2012). https://doi.org/10.1007/s11336-012-9274-4
DOI: https://doi.org/10.1007/s11336-012-9274-4