Summary
In panel studies, binary outcome measures are collected over time on the same individual, together with time-stationary and time-varying explanatory variables. A regression analysis for this type of data must therefore allow for correlation among the outcomes of an individual. The multivariate probit model of Ashford and Sowden (1970) was the first regression model for multivariate binary responses. However, a likelihood analysis of the multivariate probit model with a general correlation structure is intractable in higher dimensions because it requires maximization over high-dimensional integrals, which has so far severely restricted its applicability. Czado (1996) developed a Markov Chain Monte Carlo (MCMC) algorithm to overcome this difficulty. In this paper we present an application of this algorithm to unemployment data from the Panel Study of Income Dynamics involving 11 waves of the panel study. In addition, we adapt Bayesian model checking techniques based on the posterior predictive distribution (see, for example, Gelman et al. (1996)) to the multivariate probit model. These help to identify mean and correlation specifications that fit the data well.
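To make the model concrete, the following is a minimal sketch of the latent-variable formulation usually associated with the multivariate probit model: each binary outcome is 1 when a correlated Gaussian latent variable is positive. It only simulates data and is not the MCMC algorithm of Czado (1996). The AR(1) correlation structure, the intercept-only design, the coefficient value, and all function names are assumptions chosen for illustration.

```python
import math
import numpy as np

def ar1_correlation(T, rho):
    """T x T correlation matrix with entries rho**|s - t| (an assumed structure)."""
    idx = np.arange(T)
    return rho ** np.abs(idx[:, None] - idx[None, :])

def simulate_mvprobit(X, beta, R, rng):
    """Draw binary panel outcomes Y[i, t] = 1 if the latent Z[i, t] > 0.

    X    : (n, T, p) covariates for n individuals over T waves
    beta : (p,) regression coefficients
    R    : (T, T) correlation matrix of the latent errors
    """
    n, T, _ = X.shape
    mean = X @ beta                            # (n, T) latent means
    L = np.linalg.cholesky(R)                  # R = L L^T
    eps = rng.standard_normal((n, T)) @ L.T    # each row is N(0, R)
    return (mean + eps > 0).astype(int)

def std_normal_cdf(x):
    """Standard normal distribution function, via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

# Hypothetical setup: 11 waves, intercept-only mean, latent AR(1) errors.
rng = np.random.default_rng(0)
n, T = 20000, 11
X = np.ones((n, T, 1))
beta = np.array([-1.0])
Y = simulate_mvprobit(X, beta, ar1_correlation(T, 0.6), rng)

# Each margin is an ordinary probit: P(Y = 1) = Phi(x' beta).
print(Y.mean(), std_normal_cdf(-1.0))
```

The marginal probabilities are simple, but the joint probability of one individual's full response vector is a T-dimensional normal orthant probability. This is the high-dimensional integral that makes direct likelihood maximization difficult.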
References
Amemiya, T. (1986). Advanced Econometrics. Harvard University Press, Cambridge, Mass.
Anderson, J.A. and Pemberton, J.D. (1985). The grouped continuous model for multivariate ordered categorical variables and covariate adjustment, Biometrics, 41, 875–885.
Ashby, M., Neuhaus, J.M., Hauck, W.W., Bacchetti, P., Heilbron, D.C., Jewell, N.P., Segal, M.R. and Fusaro, R.E. (1992). An Annotated Bibliography of Methods for Analyzing Correlated Categorical Data. Statistics in Medicine, 11, 67–99.
Ashford, J.R. and Sowden, R.R. (1970). Multivariate probit analysis, Biometrics, 26, 535–546.
Baltagi, B.H. (1996). Econometric Analysis of Panel Data, John Wiley & Sons, New York.
Besag, J., Green, P., Higdon, D. and Mengersen, K. (1995). Bayesian Computation and Stochastic Systems. Statistical Science, 10, No. 1, 3–66.
Best, N., Cowles, M.K. and Vines, K. (1995). CODA: Convergence Diagnosis and Output Analysis Software, MRC Biostatistics Unit, Institute of Public Health, Robinson Way, Cambridge CB2 2SR, UK, email: bugs@mrc-bsu.cam.ac.uk
Butler, J.S. and Moffitt, R. (1982). A computationally efficient quadrature procedure for the one-factor multinomial probit model. Econometrica, 50, 761–764.
Carey, V., Zeger, S.L. and Diggle, P.J. (1993). Modelling multivariate binary data with alternating logistic regressions, Biometrika, 80, 517–526.
Carlin, B.P., Polson, N.G. and Stoffer, D.S. (1992). A Monte Carlo approach to nonnormal and nonlinear state-space-modeling. J. Am. Statist. Ass., 87, 493–500.
Carter, C.K. and Kohn, R. (1994). On Gibbs sampling for state space models. Biometrika, 81, 541–553.
Chamberlain, G. (1984). Comments on “Adaptive estimation of nonlinear regression models”, Econometric Review, 3, 199–202.
le Cessie, S. and van Houwelingen, J.C. (1994). Logistic Regression for Correlated Binary Data. Appl. Statist., 43, No. 1, 95–108.
Cowles, M.K. and Carlin, B.P. (1995). Markov chain Monte Carlo convergence diagnostics: a comparative review, J. Am. Statist. Ass., 91, 883–904.
Cox, D.R. (1972). The analysis of multivariate binary data, Appl. Statist., 21, 113–120.
Czado, C. (1996). Multivariate Probit Analysis of Binary Time Series Data with Missing Responses, preprint (http://www-m4.mathematik.tu-muenchen.de/m4/Papers/Czado/cc-pubs.html).
Fahrmeir, L. and Tutz, G. (1994). Multivariate Statistical Modelling based on Generalized Linear Models. New York, Springer-Verlag.
Fitzmaurice, G.M. and Laird, N.M. (1993). A likelihood-based method for analysing longitudinal binary responses, Biometrika, 80, 1, 141–151.
Fitzmaurice, G.M., Laird, N.M. and Rotnitzky, A.G. (1993). Regression Models for Discrete Longitudinal Responses, Statist. Sci., 8, 284–309.
Fitzmaurice, G.M. and Lipsitz, S.R. (1995). A model for binary time series data with serial odds ratio patterns. Appl. Statist., 44, No. 1, 51–61.
Fruehwirth-Schnatter, S. (1994). Data augmentation and dynamic linear models. J. of Time Series Analysis, 15, 183–202.
Gelfand, A.E. and Smith, A.F.M. (1995). Bayesian Computations, New York, Wiley, in preparation.
Gelman, A., Carlin, J.B., Stern, H.S. and Rubin, D.B. (1995). Bayesian Data Analysis, New York, Chapman and Hall.
Gelman, A., Meng, X.-L. and Stern, H.S. (1996). Posterior predictive assessment of model fitness via realized discrepancies (with discussion), Statistica Sinica, 6, 733–807.
Geweke, J. (1991). Efficient simulation from the multivariate normal and student-t distributions subject to linear constraints, Computing Science and Statistics, Proceedings of the 23rd Symposium on the Interface, Seattle, Washington, April 21–24, 1991, 571–578.
Geweke, J., Keane, M. and Runkle, D. (1995). Recursively Simulating Multinomial Multiperiod Probit Probabilities, American Statistical Association 1994 Proceedings of the Business and Economic Statistics Section.
Gilks, W.R., Richardson, S. and Spiegelhalter, D.J. (1996). Markov Chain Monte Carlo in Practice, New York, Chapman and Hall.
Guttman, I. (1967). The use of the concept of a future observation in goodness-of-fit problems. J.R. Statist. Soc. B., 29, 83–100.
Hajivassiliou, V., McFadden, D. and Ruud, P. (1996). Simulation of multivariate normal rectangle probabilities and their derivatives. Theoretical and computational results. J. of Econometrics, 72, 85–134.
Hastie, T.J. and Tibshirani, R.J. (1990). Generalized Additive Models, Chapman and Hall, New York.
Heagerty, P.J. and Zeger, S.L. (1996). Marginal regression models for clustered ordinal measurements, J. Amer. Statist. Ass., 91, 1024–1036.
Heckman, J.J. and Borjas, G. (1980). Does Unemployment Cause Future Unemployment? Definitions, Questions and Answers from a Continuous Time Model of Heterogeneity and State Dependence. Economica, 47, 247–283.
Heumann, C. (1996). Marginal regression modeling of correlated multicategorical response: a likelihood approach, Discussion Paper 19, SFB 386, Seminar für Statistik, Ludwig-Maximilians-Universität, München.
Hsiao, C. (1986). Analysis of Panel Data. Cambridge University Press, Cambridge.
Knorr-Held, L. (1996). Conditional Prior Proposals in Dynamic Models, Discussion Paper 36, SFB 386, LMU Muenchen, Seminar für Statistik, (http://www.stat.uni-muenchen.de/sfb386/publikation.html).
Lee, P.M. (1997). Bayesian Statistics: An Introduction, Second Edition. John Wiley & Sons, New York.
Liang, K.-Y. and Zeger, S.L. (1986). Longitudinal data analysis using generalized linear models, Biometrika, 73, 13–22.
Liang, K.-Y., Zeger, S.L. and Qaqish, B. (1992). Multivariate regression analyses for categorical data (with discussion). J.R. Statist. Soc. B, 54, 3–40.
Lipsitz, S.R., Laird, N.M. and Harrington, D.P. (1991). Generalized estimating equations for correlated binary data: using the odds ratio as a measure of association, Biometrika, 78, 153–160.
Lipsitz, S.R., Fitzmaurice, G.M., Sleeper, L. and Zhao, L.P. (1995). Estimation methods for the joint distribution of repeated binary observations, Biometrics, 51, 562–570.
Molenberghs, G. and Lesaffre, E. (1994). Marginal Modeling of Correlated Ordinal Data Using a Multivariate Plackett Distribution. J. Amer. Statist. Ass., 89, No. 426, 633–644.
Müller, P. (1994). A Generic Approach to Posterior Integration and Gibbs Sampling. To appear in J. Amer. Stat. Assoc.
Niesing, W., van Praag, B.M.S. and Veenman, J. (1994). The unemployment of ethnic minority groups in the Netherlands. J. Econometrics, 61, 173–196.
Ochi, Y. and Prentice, R.L. (1984). Likelihood inference in a correlated probit regression model. Biometrika, 71, 531–543.
Pendergast, J.F., Gange, S.J., Newton, M.A., Lindstrom, M.J., Palta, M. and Fisher, M.R. (1996). A survey of methods for analyzing clustered binary response data, Inter. Statist. Rev., 89–118.
Plackett, R.L. (1965). A class of bivariate distributions, J. Amer. Statist. Ass., 60, 516–522.
Qu, Y., Piedmonte, M.R. and Medendorp, S.V. (1995). Regression models for clustered ordinal data. Biometrics, 51, 268–275.
Rice, J.A. and Silverman, B.W. (1991). Estimating the mean and covariance structure nonparametrically when the data are curves. J.R. Statist. Soc. B., 53, 233–243.
Robert, C.P. (1995). Simulation of truncated normal variables. Statistics and Computing, 5, 121–125.
Rubin, D.B. (1981). Estimation in parallel randomized experiments. J. Educ. Statist., 6, 377–401.
Rubin, D.B. (1984). Bayesianly justifiable and relevant frequency calculations for the applied statistician. Ann. Statist., 12, 1151–1172.
Spiess, M. and Hamerle, A. (1996). On properties of GEE estimators in the presence of invariant covariates. Biometrical J., 38, 931–940.
Spiess, M., Nagl, W. and Hamerle, A. (1996). Probit models: Regression parameter estimation using the ML principle despite misspecification of the correlation structure, Discussion Paper 67, SFB 386, (http://www.stat.uni-muenchen.de/sfb386/publikation.html).
Zhao, L.P. and Prentice, R.L. (1990). Correlated binary regression using a quadratic exponential model, Biometrika, 77, 642–648.
Additional information
C. Czado was supported by research grant OGP0089858 of the Natural Sciences and Engineering Research Council of Canada.
About this article
Cite this article
Czado, C. Multivariate regression analysis of panel data with binary outcomes applied to unemployment data. Statistical Papers 41, 281–304 (2000). https://doi.org/10.1007/BF02925924