Data augmentation, frequentist estimation, and the Bayesian analysis of multinomial logit models

Regular Article

First Online: 11 February 2009
Received: 04 April 2008
Revised: 20 January 2009
DOI: 10.1007/s00362-009-0205-0

Cite this article as: Scott, S.L. Stat Papers (2011) 52: 87. doi:10.1007/s00362-009-0205-0

Abstract

This article describes a convenient method of selecting Metropolis–Hastings proposal distributions for multinomial logit models. There are two key ideas involved. The first is that multinomial logit models have a latent variable representation similar to that exploited by Albert and Chib (J Am Stat Assoc 88:669–679, 1993) for probit regression. Augmenting the latent variables replaces the multinomial logit likelihood function with the complete data likelihood for a linear model with extreme value errors. While no conjugate prior is available for this model, a least squares estimate of the parameters is easily obtained. The asymptotic sampling distribution of the least squares estimate is Gaussian with known variance. The second key idea in this paper is to generate a Metropolis–Hastings proposal distribution by conditioning on the estimator instead of the full data set.

The resulting sampler has many of the benefits of so-called tailored or approximation Metropolis–Hastings samplers. However, because the proposal distributions are available in closed form, they can be implemented without numerical methods for exploring the posterior distribution. The algorithm is geometrically ergodic, its computational burden is minor, and it requires minimal user input. Improvements to the sampler’s mixing rate are investigated. The algorithm is also applied to partial credit models describing ordinal item response data from the 1998 National Assessment of Educational Progress. Its application to hierarchical models and Poisson regression is briefly discussed.
Keywords

Multinomial Poisson transformation; Discrete choice model; Partial credit model; Markov chain Monte Carlo; Gibbs sampler; Metropolis–Hastings; Logistic regression; Polytomous; Polychotomous

Steven L. Scott is a Senior Economic Analyst at Google.
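The latent variable representation mentioned in the abstract can be illustrated with a small simulation. The sketch below is not taken from the article; it only demonstrates the standard fact that adding independent standard extreme value (Gumbel) errors to each category's utility and picking the largest latent value reproduces multinomial logit choice probabilities. The function names, the example utilities, and the seed are all hypothetical choices made for illustration.

```python
import math
import random


def softmax(utilities):
    """Multinomial logit choice probabilities for a list of utilities."""
    m = max(utilities)  # subtract the max for numerical stability
    weights = [math.exp(u - m) for u in utilities]
    total = sum(weights)
    return [w / total for w in weights]


def sample_choice_via_latent_utilities(utilities, rng):
    """Draw one choice by adding standard extreme value (Gumbel) errors
    to each utility and returning the index of the largest latent value."""
    latent = []
    for u in utilities:
        uniform = rng.random()
        while uniform == 0.0:  # avoid log(0); random() can return 0.0
            uniform = rng.random()
        # Standard Gumbel draw by inverse CDF: -log(-log(U)).
        latent.append(u - math.log(-math.log(uniform)))
    return max(range(len(latent)), key=latent.__getitem__)


def empirical_frequencies(utilities, n_draws, seed=0):
    """Fraction of simulated choices falling in each category."""
    rng = random.Random(seed)
    counts = [0] * len(utilities)
    for _ in range(n_draws):
        counts[sample_choice_via_latent_utilities(utilities, rng)] += 1
    return [c / n_draws for c in counts]


if __name__ == "__main__":
    utilities = [0.0, 1.0, -0.5]  # hypothetical linear predictors
    print("logit probabilities:", softmax(utilities))
    print("simulated frequencies:", empirical_frequencies(utilities, 20000))
```

With enough draws the simulated frequencies should settle close to the logit probabilities, which is the property that lets the latent values be treated as the response of a linear model with extreme value errors.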
References

Abe M (1999) A generalized additive model for discrete-choice data. J Bus Econ Stat 17: 271–284
Agresti A (1990) Categorical data analysis. Wiley, New York
Albert J (1992) A Bayesian estimation of normal ogive item response curves using Gibbs sampling. J Educ Stat 17: 251–269
Albert JH, Chib S (1993) Bayesian analysis of binary and polychotomous response data. J Am Stat Assoc 88: 669–679
Allen NL, Donoghue JR, Schoeps TL (2000) The NAEP 1998 technical report. Tech. Rep. NCES 2001-509, United States Department of Education
Baker SG (1994) The multinomial-Poisson transformation. The Statistician 43: 495–504
Barnard J, McCulloch R, Meng X-L (2000) Modeling covariance matrices in terms of standard deviations and correlations, with application to shrinkage. Stat Sin 10(4): 1281–1311
Bradlow ET, Wainer H, Wang X (1999) A Bayesian random effects model for testlets. Psychometrika 64(2): 153–168
Chen Z, Kuo L (2001) A note on the estimation of multinomial logit models with random effects. Am Stat 55: 89–95
Chib S, Greenberg E (1995) Understanding the Metropolis–Hastings algorithm. Am Stat 49: 327–335
Cox DR, Oakes D (1984) Analysis of survival data. Chapman & Hall, New York
Dempster AP, Laird NM, Rubin DB (1977) Maximum likelihood from incomplete data via the EM algorithm (C/R: p 22–37). J R Stat Soc Ser B Methodol 39: 1–22
Foster DP, Stine RA, Waterman RP (1998) Business analysis using regression. Springer, Berlin
Frühwirth-Schnatter S, Frühwirth R (2007) Auxiliary mixture sampling with applications to logistic models. Comput Stat Data Anal 51: 3509–3528
Frühwirth-Schnatter S, Wagner H (2006) Auxiliary mixture sampling for parameter-driven models of time series of counts with applications to state space modelling. Biometrika 93: 827–841
Gelman A, Gilks WR, Roberts GO (1997) Weak convergence and optimal scaling of random walk Metropolis algorithms. Ann Appl Probab 7(1): 110–120
Gilks WR, Wild P (1992) Adaptive rejection sampling for Gibbs sampling. Appl Stat 41: 337–348
Hastie T, Tibshirani R, Friedman J (2001) The elements of statistical learning. Springer, Berlin
Hastings WK (1970) Monte Carlo sampling methods using Markov chains and their applications. Biometrika 57: 97–109
Higdon DM (2001) Discussion of “The art of data augmentation”. J Comput Graph Stat 10(1): 69–74
Holmes CC, Held L (2006) Bayesian auxiliary variable models for binary and multinomial regression. Bayesian Anal 1(1): 145–168
Johnson NL, Kotz S, Balakrishnan N (1995) Continuous univariate distributions, 2nd edn, vol 2. Wiley Interscience, Somerset
Johnson VE, Albert JH (1999) Ordinal data modeling. Springer, Berlin
Jones GL, Hobert JP (2001) Honest exploration of intractable probability distributions via Markov chain Monte Carlo. Stat Sci 16(4): 312–334
Lavine M (2003) A marginal ergodic theorem. In: Bernardo JM, Bayarri MJ, Berger JO, Dawid AP, Heckerman D, Smith AFM, West M (eds) Bayesian statistics, vol 7. Oxford University Press, Oxford, pp 577–586
Le Cam LM, Yang GL (2000) Asymptotics in statistics: some basic concepts. Springer, Berlin
Liu JS, Wong WH, Kong A (1994) Covariance structure of the Gibbs sampler with applications to the comparisons of estimators and augmentation schemes. Biometrika 81: 27–40
Liu JS, Wong WH, Kong A (1995) Covariance structure and convergence rate of the Gibbs sampler with various scans. J R Stat Soc Ser B Methodol 57: 157–169
Liu JS, Wu YN (1999) Parameter expansion for data augmentation. J Am Stat Assoc 94: 1264–1274
Lord FM, Novick MR (1968) Statistical theories of mental test scores. Addison-Wesley, Reading
McCullagh P, Nelder JA (1989) Generalized linear models, 2nd edn. Chapman & Hall, New York
McCulloch R, Rossi PE (1994) An exact likelihood analysis of the multinomial probit model. J Econom 64: 207–240
McFadden D (1974) Conditional logit analysis of qualitative choice behavior. In: Zarembka P (ed) Frontiers in econometrics. Academic Press, Dublin, pp 105–142
Meng X-L, van Dyk DA (1999) Seeking efficient data augmentation schemes via conditional and marginal augmentation. Biometrika 86(2): 301–320
Mengersen KL, Tweedie RL (1996) Rates of convergence of the Hastings and Metropolis algorithms. Ann Stat 24: 101–121
Metropolis N, Rosenbluth AW, Rosenbluth MN, Teller AH, Teller E (1953) Equation of state calculations by fast computing machines. J Chem Phys 21: 1087–1092
Muraki E (1992) A generalized partial credit model: application of an EM algorithm. Appl Psychol Meas 16: 159–176
Neal RM (2003) Slice sampling. Ann Stat 31(3): 705–767
Patz R, Junker BW (1999) A straightforward approach to Markov Chain Monte Carlo methods for item response models. J Educ Behav Stat 24: 146–178
Rosenthal JS (1995) Minorization conditions and convergence rates for Markov chain Monte Carlo (corr: 95v90 p 1136). J Am Stat Assoc 90: 558–566
Rousseeuw PJ, Molenberghs G (1994) The shape of correlation matrices. Am Stat 48: 276–279
Rubin DB (1984) Bayesianly justifiable and relevant frequency calculations for the applied statistician. Ann Stat 12: 1151–1172
Scott SL, Ip EH (2002) Empirical Bayes and item clustering effects in a latent variable hierarchical model: a case study from the National Assessment of Educational Progress. J Am Stat Assoc 97(458): 409–419
Spiegelhalter DJ, Thomas A, Best NG, Gilks WR (1996) BUGS: Bayesian inference using Gibbs sampling, Version 0.5 (version ii). http://www.mrc-bsu.cam.ac.uk/bugs
Tanner MA, Wong WH (1987) The calculation of posterior distributions by data augmentation (c/r: P 541–550). J Am Stat Assoc 82: 528–540
Tierney L (1994) Markov chains for exploring posterior distributions (disc: P1728-1762). Ann Stat 22: 1701–1728
Train KE (2003) Discrete choice methods with simulation. Cambridge University Press, New York. Available at http://elsa.berkeley.edu/books/choice2.html
Tüchler R (2008) Bayesian variable selection for logistic models using auxiliary mixture sampling. J Comput Graph Stat 17(1): 76–94
van Dyk DA, Meng X-L (2001) The art of data augmentation (disc: P 51–111). J Comput Graph Stat 10: 1–50
Wainer H, Bradlow ET, Du Z (2000) Testlet response theory: an analog for the 3PL model useful in testlet-based adaptive testing. In: Computerized adaptive testing: theory and practice. Kluwer Academic Publishers, Dordrecht, pp 245–269
Zellner A (1997) The Bayesian method of moments (BMOM): theory and applications. Adv Econom 12: 85–105