Multivariate regression analysis of panel data with binary outcomes applied to unemployment data

Statistical Papers

Summary

In panel studies, binary outcome measures are collected over time on the same individual, together with time-stationary and time-varying explanatory variables. A regression analysis for this type of data must therefore allow for correlation among the outcomes of an individual. The multivariate probit model of Ashford and Sowden (1970) was the first regression model for multivariate binary responses. However, a likelihood analysis of the multivariate probit model with a general correlation structure is intractable in higher dimensions, because maximizing the likelihood requires evaluating high-dimensional integrals; this has so far severely restricted its applicability. Czado (1996) developed a Markov chain Monte Carlo (MCMC) algorithm to overcome this difficulty. In this paper we present an application of this algorithm to unemployment data from the Panel Study of Income Dynamics, involving 11 waves of the panel study. In addition, we adapt Bayesian model checking techniques based on the posterior predictive distribution (see, for example, Gelman et al. (1996)) to the multivariate probit model. These help to identify mean and correlation specifications that fit the data well.
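
The model named in the summary is usually written in latent-variable form. The block below is a sketch of that standard textbook formulation, not the paper's own notation: the symbols $Y_{it}$, $Z_{it}$, $x_{it}$, $X_i$, $\beta$, $R$, $A_{it}$ and the discrepancy function $D$ are assumptions introduced here for illustration.

```latex
% Individual i = 1,...,n observed at waves t = 1,...,T (T = 11 in the application).
% X_i is the T x p design matrix whose rows are x_{it}^T; phi_T is the N_T density.
\begin{align*}
  Y_{it} &= \mathbf{1}\{Z_{it} > 0\}, \qquad
  Z_{it} = x_{it}^{\top}\beta + \varepsilon_{it}, \\
  \varepsilon_i &= (\varepsilon_{i1},\dots,\varepsilon_{iT})^{\top} \sim N_T(0, R),
  \qquad R \text{ a correlation matrix}, \\
  P(Y_i = y_i \mid \beta, R) &= \int_{A_{i1}} \cdots \int_{A_{iT}}
  \phi_T(z;\, X_i\beta,\, R)\; dz_T \cdots dz_1,
  \qquad
  A_{it} =
  \begin{cases}
    (0,\infty)   & \text{if } y_{it} = 1, \\
    (-\infty, 0] & \text{if } y_{it} = 0.
  \end{cases}
\end{align*}
% Posterior predictive p-value in the sense of Gelman et al. (1996),
% with theta = (beta, R) and y^rep a replicate data set drawn from the model:
\[
  p = P\bigl( D(y^{\mathrm{rep}}, \theta) \ge D(y, \theta) \,\bigm|\, y \bigr).
\]
```

Under this formulation each individual's likelihood contribution is a $T$-dimensional normal rectangle probability, which illustrates why direct likelihood maximization becomes impractical as the number of waves grows. A posterior predictive p-value near 0 or 1 would indicate that the chosen mean or correlation specification reproduces the feature measured by $D$ poorly.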

References

  • Amemiya, T. (1986). Advanced Econometrics. Harvard University Press, Cambridge, Mass.

  • Anderson, J.A. and Pemberton, J.D. (1985). The grouped continuous model for multivariate ordered categorical variables and covariate adjustment, Biometrics, 41, 875–885.

  • Ashby, M., Neuhaus, J.M., Hauck, W.W., Bacchetti, P., Heilbron, D.C., Jewell, N.P., Segal, M.R. and Fusaro, R.E. (1992). An Annotated Bibliography of Methods for Analyzing Correlated Categorical Data. Statistics in Medicine, 11, 67–99.

  • Ashford, J.R. and Sowden, R.R. (1970). Multivariate probit analysis, Biometrics, 26, 535–546.

  • Baltagi, B.H. (1996). Econometric Analysis of Panel Data, John Wiley & Sons, New York.

  • Besag, J., Green, P., Higdon, D. and Mengersen, K. (1995). Bayesian Computation and Stochastic Systems. Statistical Science, 10, No. 1, 3–66.

  • Best, N., Cowles, M.K. and Vines, K. (1995). CODA: Convergence Diagnosis and Output Analysis Software, MRC Biostatistics Unit, Institute of Public Health, Robinson Way, Cambridge CB2 2SR, UK, email: bugs@mrc-bsu.cam.ac.uk

  • Butler, J.S. and Moffitt, R. (1982). A computationally efficient quadrature procedure for the one-factor multinomial probit model. Econometrica, 50, 761–764.

  • Carey, V., Zeger, S.L. and Diggle, P.J. (1993). Modelling multivariate binary data with alternating logistic regressions, Biometrika, 80, 517–526.

  • Carlin, B.P., Polson, N.G. and Stoffer, D.S. (1992). A Monte Carlo approach to nonnormal and nonlinear state-space modeling. J. Am. Statist. Ass., 87, 493–500.

  • Carter, C.K. and Kohn, R. (1994). On Gibbs sampling for state space models. Biometrika, 81, 541–553.

  • Chamberlain, G. (1984). Comments on “Adaptive estimation of nonlinear regression models”, Econometric Reviews, 3, 199–202.

  • le Cessie, S. and van Houwelingen, J.C. (1994). Logistic Regression for Correlated Binary Data. Appl. Statist., 43, No. 1, 95–108.

  • Cowles, M.K. and Carlin, B.P. (1995). Markov chain Monte Carlo convergence diagnostics: a comparative review, J. Am. Statist. Ass., 91, 883–904.

  • Cox, D.R. (1972). The analysis of multivariate binary data, Appl. Statist., 21, 113–120.

  • Czado, C. (1996). Multivariate Probit Analysis of Binary Time Series Data with Missing Responses, preprint (http://www-m4.mathematik.tu-muenchen.de/m4/Papers/Czado/cc-pubs.html).

  • Fahrmeir, L. and Tutz, G. (1994). Multivariate Statistical Modelling based on Generalized Linear Models. New York, Springer-Verlag.

  • Fitzmaurice, G.M. and Laird, N.M. (1993). A likelihood-based method for analysing longitudinal binary responses, Biometrika, 80, No. 1, 141–151.

  • Fitzmaurice, G.M., Laird, N.M. and Rotnitzky, A.G. (1993). Regression Models for Discrete Longitudinal Responses, Statist. Sci., 8, 284–309.

  • Fitzmaurice, G.M. and Lipsitz, S.R. (1995). A model for binary time series data with serial odds ratio patterns. Appl. Statist., 44, No. 1, 51–61.

  • Fruehwirth-Schnatter, S. (1994). Data augmentation and dynamic linear models. J. of Time Series Analysis, 15, 183–202.

  • Gelfand, A.E. and Smith, A.F.M. (1995). Bayesian Computations, New York, Wiley, in preparation.

  • Gelman, A., Carlin, J.B., Stern, H.S. and Rubin, D.B. (1995). Bayesian Data Analysis, New York, Chapman and Hall.

  • Gelman, A., Meng, X.-L. and Stern, H.S. (1996). Posterior predictive assessment of model fitness via realized discrepancies (with discussion), Statistica Sinica, 6, 733–807.

  • Geweke, J. (1991). Efficient simulation from the multivariate normal and student-t distributions subject to linear constraints, Computing Science and Statistics, Proceedings of the 23rd Symposium on the Interface, Seattle, Washington, April 21–24, 1991, 571–578.

  • Geweke, J., Keane, K. and Runkle, D. (1995). Recursively Simulating Multinomial Multiperiod Probit Probabilities, American Statistical Association 1994 Proceedings of the Business and Economic Statistics Section.

  • Gilks, W.R., Richardson, S. and Spiegelhalter, D.J. (1996). Markov Chain Monte Carlo in Practice, New York, Chapman and Hall.

  • Guttman, I. (1967). The use of the concept of a future observation in goodness-of-fit problems. J.R. Statist. Soc. B, 29, 83–100.

  • Hajivassiliou, V., McFadden, D. and Ruud, P. (1996). Simulation of multivariate normal rectangle probabilities and their derivatives. Theoretical and computational results. J. of Econometrics, 72, 85–134.

  • Hastie, T.J. and Tibshirani, R.J. (1990). Generalized Additive Models, Chapman and Hall, New York.

  • Heagerty, P.J. and Zeger, S.L. (1996). Marginal regression models for clustered ordinal measurements, J. Amer. Statist. Ass., 91, 1024–1036.

  • Heckman, J.J. and Borjas, G. (1980). Does Unemployment Cause Future Unemployment? Definitions, Questions and Answers from a Continuous Time Model of Heterogeneity and State Dependence. Economica, 47, 247–283.

  • Heumann, C. (1996). Marginal regression modeling of correlated multicategorical response: a likelihood approach, Discussion Paper 19, SFB 386, Seminar für Statistik, Ludwig-Maximilians-Universität, München.

  • Hsiao, C. (1986). Analysis of Panel Data. Cambridge University Press, Cambridge.

  • Knorr-Held, L. (1996). Conditional Prior Proposals in Dynamic Models, Discussion Paper 36, SFB 386, LMU Muenchen, Seminar für Statistik, (http://www.stat.uni-muenchen.de/sfb386/publikation.html).

  • Lee, P.M. (1997). Bayesian Statistics: An Introduction, Second Edition. John Wiley & Sons, New York.

  • Liang, K.-Y. and Zeger, S.L. (1986). Longitudinal data analysis using generalized linear models, Biometrika, 73, 13–22.

  • Liang, K.-Y., Zeger, S.L. and Qaqish, B. (1992). Multivariate regression analyses for categorical data (with discussion). J.R. Statist. Soc. B, 54, 3–40.

  • Lipsitz, S.R., Laird, N.M. and Harrington, D.P. (1991). Generalized estimating equations for correlated binary data: using the odds ratio as a measure of association, Biometrika, 78, 153–160.

  • Lipsitz, S.R., Fitzmaurice, G.M., Sleeper, L. and Zhao, L.P. (1995). Estimation methods for the joint distribution of repeated binary observations, Biometrics, 51, 562–570.

  • Molenberghs, G. and Lesaffre, E. (1994). Marginal Modeling of Correlated Ordinal Data Using a Multivariate Plackett Distribution. J. Amer. Statist. Ass., 89, No. 426, 633–644.

  • Müller, P. (1994). A Generic Approach to Posterior Integration and Gibbs Sampling. To appear in J. Amer. Stat. Assoc.

  • Niesing, W., van Praag, B.M.S. and Veenman, J. (1994). The unemployment of ethnic minority groups in the Netherlands. J. Econometrics, 61, 173–196.

  • Ochi, Y. and Prentice, R.L. (1984). Likelihood inference in a correlated probit regression model. Biometrika, 73, 531–543.

  • Pendergast, J.F., Gange, S.J., Newton, M.A., Lindstrom, M.J., Palta, M. and Fisher, M.R. (1996). A survey of methods for analyzing clustered binary response data, Inter. Statist. Rev., 89–118.

  • Plackett, R.L. (1965). A class of bivariate distributions, J. Amer. Statist. Ass., 60, 516–522.

  • Qu, Y., Piedmonte, M.R. and Medendorp, S.V. (1995). Regression models for clustered ordinal data. Biometrics, 51, 268–275.

  • Rice, J.A. and Silverman, B.W. (1991). Estimating the mean and covariance structure nonparametrically when the data are curves. J.R. Statist. Soc. B, 53, 233–243.

  • Robert, C.P. (1995). Simulation of truncated normal variables. Statistics and Computing, 5, 121–125.

  • Rubin, D.B. (1981). Estimation in parallel randomized experiments. J. Educ. Statist., 6, 377–401.

  • Rubin, D.B. (1984). Bayesianly justifiable and relevant frequency calculations for the applied statistician. Ann. Statist., 12, 1151–1172.

  • Spiess, M. and Hamerle, A. (1996). On properties of GEE estimators in the presence of invariant covariates. Biometrical J., 38, 931–940.

  • Spiess, M., Nagl, W. and Hamerle, A. (1996). Probit models: Regression parameter estimation using the ML principle despite misspecification of the correlation structure, Discussion Paper 67, SFB 386, (http://www.stat.uni-muenchen.de/sfb386/publikation.html).

  • Zhao, L.P. and Prentice, R.L. (1990). Correlated binary regression using a quadratic exponential model, Biometrika, 77, 642–648.

Author information

Correspondence to Claudia Czado.

Additional information

C. Czado was supported by research grant OGP0089858 of the Natural Sciences and Engineering Research Council of Canada.

About this article

Cite this article

Czado, C. Multivariate regression analysis of panel data with binary outcomes applied to unemployment data. Statistical Papers 41, 281–304 (2000). https://doi.org/10.1007/BF02925924
