Abstract
A new Bayesian multinomial probit model is proposed for the analysis of panel choice data. Using a parameter expansion technique, we devise a Markov chain Monte Carlo algorithm that computes our Bayesian estimates efficiently. We also show that the proposed procedure enables the estimation of individual-level coefficients for the single-period multinomial probit model even when the available prior information is vague. We apply the new procedure to consumer purchase data, reanalyzing a well-known scanner panel dataset and uncovering new substantive insights. In addition, we delineate a number of advantages of the proposed procedure over several benchmark models. Finally, through a simulation analysis employing a fractional factorial design, we demonstrate that the results from the proposed model are quite robust across the experimental factors and conditions considered.
References
Addelman, S. (1962). Orthogonal main-effect plans for asymmetrical factorial experiments. Technometrics, 4, 21–46.
Albert, J. H., & Chib, S. (1993). Bayesian analysis of binary and polychotomous response data. Journal of the American Statistical Association, 88(422), 669–679.
Barnard, J., McCulloch, R., & Meng, X. (2000). Modeling covariance matrices in terms of standard deviations and correlations, with applications to shrinkage. Statistica Sinica, 10, 1281–1311.
Burgette, L. F., & Nordheim, E. V. (2012). The trace restriction: An alternative identification strategy for the Bayesian multinomial probit model. Journal of Business and Economic Statistics, 30(3), 404–410.
Chib, S., & Greenberg, E. (1998). Analysis of multivariate probit models. Biometrika, 85(2), 347–361.
Chib, S., Greenberg, E., & Chen, Y. (1998). MCMC methods for fitting and comparing multinomial response models. Working paper, Olin School of Business, Washington University.
Daganzo, C. (1980). Multinomial probit. New York: Academic Press.
Dawid, A. (1981). Some matrix-variate distribution theory: Notational considerations and a Bayesian application. Biometrika, 68(1), 265–274.
DeSarbo, W. S. (1982). GENNCLUS: New models for general nonhierarchical clustering analysis. Psychometrika, 47(4), 449–475.
DeSarbo, W. S., & Carroll, J. D. (1985). Three-way metric unfolding via alternating weighted least squares. Psychometrika, 50(3), 275–300.
DeSarbo, W. S., & Cron, W. L. (1988). A maximum likelihood methodology for clusterwise linear regression. Journal of Classification, 5(2), 249–282.
Dotson, J., Lenk, P., Brazell, J., Otter, T., MacEachern, S., & Allenby, G. M. (2010). A probit model with structured covariances for similarity effects and source of volume calculations. Working paper.
Fiebig, D. G., Keane, M. P., Louviere, J., & Wasi, N. (2010). The generalized multinomial logit model: Accounting for scale and coefficient heterogeneity. Marketing Science, 29, 393–421.
Fong, D. K. H., Ebbes, P., & DeSarbo, W. S. (2012). A heterogeneous Bayesian regression model for cross sectional data involving a single observation per response unit. Psychometrika, 77(2), 293–314.
Gupta, A. K., & Nagar, D. K. (2000). Matrix variate distributions. Monographs and surveys in pure and applied mathematics (Vol. 104). London: Chapman & Hall/CRC.
Hausman, J., & Wise, D. (1978). A conditional probit model for qualitative choice: Discrete decisions recognizing interdependence and heterogeneous preferences. Econometrica, 46(2), 403–426.
Hobert, J. P., & Marchev, D. (2008). A theoretical comparison of the data augmentation, marginal augmentation and PX-DA algorithms. Annals of Statistics, 36(2), 532–554.
Imai, K., & van Dyk, D. A. (2005). A Bayesian analysis of the multinomial probit model using marginal data augmentation. Journal of Econometrics, 124(2), 311–334.
Jedidi, K., & DeSarbo, W. S. (1991). A stochastic multidimensional scaling procedure for the spatial representation of three-mode, three-way pick any/J data. Psychometrika, 56(3), 471–494.
Liechty, J. C., Liechty, M. W., & Muller, P. (2004). Bayesian correlation estimation. Biometrika, 91(1), 1–14.
Liu, C. (2001). Discussion on the art of data augmentation. Journal of Computational and Graphical Statistics, 10(1), 75–81.
Liu, J. S., & Wu, Y. N. (1999). Parameter expansion for data augmentation. Journal of the American Statistical Association, 94(448), 1264–1274.
Liu, X., & Daniels, M. (2006). A new efficient algorithm for sampling a correlation matrix based on parameter expansion and re-parameterization. Journal of Computational and Graphical Statistics, 15(4), 897–914.
Maydeu-Olivares, A., & Hernández, A. (2007). Identification and small sample estimation of Thurstone’s unrestricted model for paired comparisons data. Multivariate Behavioral Research, 42(2), 323–347.
McCulloch, R., & Rossi, P. E. (1994). An exact likelihood analysis of the multinomial probit model. Journal of Econometrics, 64, 207–240.
McCulloch, R. E., Polson, N. G., & Rossi, P. E. (2000). A Bayesian analysis of the multinomial probit model with fully identified parameters. Journal of Econometrics, 99(1), 173–193.
Nobile, A. (1998). A hybrid Markov chain for the Bayesian analysis of the multinomial probit model. Statistics and Computing, 8, 229–242.
Nobile, A. (2000). Comment: Bayesian multinomial probit models with normalization constraint. Journal of Econometrics, 99(1), 335–345.
Rossi, P. E., Allenby, G. M., & McCulloch, R. E. (2005). Bayesian statistics and marketing. Chichester: Wiley.
Rossi, P. E., McCulloch, R. E., & Allenby, G. M. (1996). The value of purchase history data in target marketing. Marketing Science, 15(4), 321–340.
Rousseeuw, P., & Molenberghs, G. (1994). The shape of correlation matrices. The American Statistician, 48, 276–279.
Thurstone, L. (1927). A law of comparative judgment. Psychological Review, 34, 273–286.
Tsai, R. (2000). Remarks on the identifiability of Thurstonian ranking models: Case V, case III, or neither? Psychometrika, 65(2), 233–240.
Tsai, R. (2003). Remarks on the identifiability of Thurstonian paired comparison models under multiple judgment. Psychometrika, 68(1), 361–372.
Acknowledgments
We wish to thank the Editor and three anonymous reviewers for their constructive comments which helped to improve this manuscript. We also wish to thank Eric Bradlow and Greg Allenby for kindly providing the data for the application. This research was funded in part by the Smeal College of Business.
Appendices
Appendix 1: Derivation of the Full Conditional Distributions for the Proposed Model
1.1 Proof of Equation (6)
Since \(\varvec{\beta }_{h} \) now denotes the reduced \((k-1)\)-dimensional vector of parameters, Equation (2) becomes
where \(\tilde{{\mathbf{X}}}_{ht}\) is the matrix obtained from \({\mathbf{X}}_{ht}\) by deleting its first column. Then, the full conditional distribution of \({\mathbf{y}}_{ht}\) is
So \(({\mathbf{y}}_{ht} \mid \text{all others})\) follows a truncated normal distribution, \(TN\left( \tilde{{\mathbf{X}}}_{ht}\varvec{\beta}_{h},\varvec{R}\right)\), where the truncation is such that \(\varvec{y}_{ht[I_{ht}]} > \max (\varvec{y}_{ht[-I_{ht}]})\). The \(j\)th component of \({\mathbf{y}}_{ht}\), \(\varvec{y}_{ht[j]}\), has a univariate truncated normal distribution conditional on all other components of \({\mathbf{y}}_{ht}\) and the other parameters. Let \(\varvec{D}_j\) be a matrix that switches the first and the \(j\)th components of \({\mathbf{y}}_{ht}\):
Then,
where the truncation is given by
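As a concrete illustration, the component-wise draw described above can be sketched in Python. This is a minimal sketch, not the authors' implementation: the function name and argument layout are our own, the conditional mean and variance follow the standard multivariate-normal conditioning formulas, and `scipy.stats.truncnorm` is used for the one-dimensional truncated draw.

```python
import numpy as np
from scipy.stats import truncnorm

def draw_component(y, j, mu, R, chosen, rng):
    """Draw the j-th latent utility from its univariate truncated normal
    full conditional under y ~ N(mu, R), truncated so that the utility of
    the chosen alternative exceeds all others (names are illustrative)."""
    idx = [i for i in range(len(y)) if i != j]
    # Conditional moments of y[j] given the remaining components
    R12 = R[j, idx]
    R22_inv = np.linalg.inv(R[np.ix_(idx, idx)])
    cond_mu = mu[j] + R12 @ R22_inv @ (y[idx] - mu[idx])
    cond_sd = np.sqrt(R[j, j] - R12 @ R22_inv @ R12)
    if j == chosen:
        lower, upper = np.max(y[idx]), np.inf   # must beat every other utility
    else:
        lower, upper = -np.inf, y[chosen]       # must stay below the chosen one
    a, b = (lower - cond_mu) / cond_sd, (upper - cond_mu) / cond_sd
    return truncnorm.rvs(a, b, loc=cond_mu, scale=cond_sd, random_state=rng)
```

Cycling this draw over \(j=1,\ldots,m\) updates the full latent vector \({\mathbf{y}}_{ht}\) one Gibbs pass at a time.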
1.2 Proof of Equation (7)
The full conditional distribution of \(\varvec{\beta }_{h} \) is
So, \(\varvec{\beta}_{h} \mid \text{all others} \sim N(\varvec{\beta}_{h}^{0},\varvec{V}_{\beta}^{h})\), where
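The resulting Gibbs step for \(\varvec{\beta}_{h}\) is a standard conjugate normal draw. The sketch below is illustrative only: the names and the single-period layout are assumptions, and the posterior moments follow the usual Bayesian regression algebra rather than being copied verbatim from Equation (7).

```python
import numpy as np

def draw_beta_h(X_h, y_h, R_inv, V_beta_inv, prior_mean, rng):
    """One Gibbs draw of beta_h from its Gaussian full conditional,
    as in a standard conjugate Bayesian regression step.
    X_h        : reduced design matrix for unit h
    y_h        : latent utilities for unit h
    R_inv      : inverse of the correlation matrix R
    V_beta_inv : inverse of the prior covariance V_beta
    prior_mean : hierarchical prior mean (e.g., Delta' z_h)."""
    V_post = np.linalg.inv(X_h.T @ R_inv @ X_h + V_beta_inv)   # posterior covariance
    m_post = V_post @ (X_h.T @ R_inv @ y_h + V_beta_inv @ prior_mean)
    return rng.multivariate_normal(m_post, V_post)
```

With vague prior information the draw shrinks only weakly toward `prior_mean`, which is why the hierarchical structure matters for individual-level estimation.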
1.3 Proof of Equation (8)
The full conditional distribution of \(\varvec{\Delta }\) is
where etr denotes the exponentiated trace of a matrix, \(\mathrm{etr}(\cdot)=\exp \{\mathrm{tr}(\cdot)\}\).
So, \(\varvec{\Delta} \mid \text{all others} \sim MN((\varvec{BZ}^{\prime}+\varvec{\Delta}_{0}\varvec{A}_{d})(\varvec{ZZ}^{\prime}+\varvec{A}_{d})^{-1},\varvec{V}_{\beta},(\varvec{ZZ}^{\prime}+\varvec{A}_{d})^{-1})\), where
1.4 Proof of Equation (9)
The full conditional distribution of \(\varvec{V}_\beta ^{-1} \) is
which is proportional to
So, \(\varvec{V}_{\beta}^{-1} \mid \text{all others} \sim W(v+H+l,\varvec{V}_{n})\), where
Appendix 2: A Parameter Expansion Algorithm for Sampling the Correlation Matrix
1.1 Stage I: Parameter Expanded Reparameterization
We define the following one-to-one mapping from \(\left\{ {\mathbf{y}}_{ht},\varvec{R}\right\}\) to \(\left\{ {\mathbf{y}}_{ht}^{*},\varvec{\Sigma}\right\}\) to facilitate making random draws of the correlation matrix:
where \(\varvec{\Sigma}\) is a positive definite matrix, \(\sum _{h=1}^{H}\sum _{t=1}^{T_h}(\varvec{y}_{ht[j]}^{*})^{2}=1\) for every \(j=1,\ldots,m\), and \(\varvec{D}\) is a diagonal matrix; these constraints are needed to make the transformation in (20) a one-to-one mapping. Given \(\varvec{\beta}_{h}\), the step that draws \({\mathbf{y}}_{ht}\) implicitly draws \({\mathbf{y}}_{ht}^{*}\) and \(\varvec{D}\), because \(\varvec{D}_{jj}^{1/2}=[\sum _{h=1}^{H}\sum _{t=1}^{T_h}(\varvec{y}_{ht[j]}-\tilde{{\mathbf{X}}}_{ht[j]}\varvec{\beta}_{h})^{2}]^{-1/2}\), where \(\varvec{D}_{jj}\) is the \(j\)th diagonal element of \(\varvec{D}\) and \(\tilde{{\mathbf{X}}}_{ht[j]}\) is the \(j\)th row of \(\tilde{{\mathbf{X}}}_{ht}\). Thus, one can derive the joint conditional distribution of \(({\mathbf{y}}_{ht}^{*},\varvec{\Sigma} \mid \text{all others})\) from that of \(({\mathbf{y}}_{ht},\varvec{R} \mid \text{all others})\). When the prior distribution of \(\varvec{R}\) is \(\pi(\varvec{R})\propto |\varvec{R}|^{-(m+1)/2}\), the full conditional distribution of \(\varvec{\Sigma}\) is
where \(\varvec{S}=\sum _{h=1}^{H}\sum _{t=1}^{T_h}{\mathbf{y}}_{ht}^{*}{{\mathbf{y}}_{ht}^{*}}^{\prime}\). The expression in (21) is proportional to a Wishart density, and so \(\varvec{\Sigma}^{-1} \mid \text{all others} \sim W(\sum _{h}T_{h},\varvec{S})\).
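Stage I can be sketched as follows. The sketch assumes the single-observation-per-unit case, with all the \({\mathbf{y}}_{ht}^{*}\) stacked into one array; names are illustrative. `scipy.stats.wishart` is parameterized by a scale matrix, so under the convention that \(W(\sum_{h}T_{h},\varvec{S})\) has \(\varvec{S}\) as the sum-of-squares matrix, we pass \(\varvec{S}^{-1}\) as the scale.

```python
import numpy as np
from scipy.stats import wishart

def draw_sigma(Y_star, rng):
    """Draw Sigma from its Wishart full conditional in Stage I.
    Y_star : (n, m) array stacking the transformed latent vectors y*.
    Returns Sigma, obtained by inverting a draw of Sigma^{-1}."""
    n, m = Y_star.shape
    S = Y_star.T @ Y_star                 # S = sum of outer products of the y*'s
    # scipy's Wishart uses a scale matrix, so draw Sigma^{-1} ~ W(n, S^{-1})
    Sigma_inv = wishart.rvs(df=n, scale=np.linalg.inv(S), random_state=rng)
    return np.linalg.inv(Sigma_inv)
```

The draw is a covariance matrix; Stage II rescales it back to the correlation scale.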
1.2 Stage II: Parameter Expanded Metropolis–Hastings
After obtaining a random deviate of \(\varvec{\Sigma}\) from (21) in Stage I, we transform it into a correlation matrix using \(\varvec{R}^{*}=\varvec{D}^{-1/2}\varvec{\Sigma}\varvec{D}^{-1/2}\). Since \(\varvec{R}^{*}\) is obtained under the candidate prior \(\pi(\varvec{R}^{*})\propto |\varvec{R}^{*}|^{-(m+1)/2}\), it is accepted in a Metropolis–Hastings step with probability \(\alpha\), where, at iteration \(n+1\), \(\alpha=\min \{1,\exp [\frac{m+1}{2}(\log |\varvec{R}^{*}|-\log |\varvec{R}^{(n)}|)]\}\).
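Stage II, the rescaling of the draw to a correlation matrix followed by the acceptance rule above, can be sketched as follows (a minimal sketch with illustrative names; the rescaling here uses the diagonal of \(\varvec{\Sigma}\)):

```python
import numpy as np

def cov_to_corr(Sigma):
    """Rescale a covariance draw to a correlation matrix:
    R* = D^{-1/2} Sigma D^{-1/2}, taking D = diag(Sigma)."""
    d = 1.0 / np.sqrt(np.diag(Sigma))
    return Sigma * np.outer(d, d)

def accept_correlation(R_star, R_current, rng):
    """Metropolis-Hastings acceptance for the candidate correlation matrix,
    with alpha = min{1, exp[((m+1)/2)(log|R*| - log|R^(n)|)]}."""
    m = R_star.shape[0]
    _, logdet_star = np.linalg.slogdet(R_star)
    _, logdet_curr = np.linalg.slogdet(R_current)
    log_alpha = 0.5 * (m + 1) * (logdet_star - logdet_curr)
    if np.log(rng.uniform()) < min(0.0, log_alpha):
        return R_star        # accept the candidate
    return R_current         # keep the current state
```

Working with log-determinants (via `slogdet`) avoids overflow when \(m\) is large or the candidate matrix is nearly singular.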
Appendix 3
1.1 Proof of Theorem 1
Since the joint posterior distribution is
it is a proper probability distribution if each of the posterior distributions on the right-hand side of the above equation is proper. To establish the result, we make use of the likelihood function of our model when \(T_h =1\):
where \(\varvec{X}=[\tilde{{\mathbf {X}}}_1 ,\ldots ,\tilde{{\mathbf {X}}}_H ]\) and \(\varvec{Y}=[\varvec{y}_1 ^{\prime },\ldots ,\varvec{y}_H ^{{\prime }}]^{{\prime }}\).
-
For \(h=1,\ldots ,H\),
So, \(\varvec{\beta }_{h} |\varvec{I,X,Z,Y,\Delta ,R},\varvec{V}_\beta \sim N(\varvec{b}_h^0 ,\varvec{V}_\beta ^{\hbox {new}} )\), where
Thus, \(\pi (\varvec{\beta }_{h} |\varvec{I,X,Z,Y,\Delta ,R,}\varvec{V}_\beta )\) is proper.
-
Let \(\varvec{\eta }\) be the vectorization of \(\varvec{\Delta }\), and \(\varvec{\Upsilon }_h = \tilde{{\mathbf {X}}}_h (\varvec{E}_{k-1} \otimes \varvec{Z}_h )^{{\prime }}\), where \(\varvec{E}_{k-1} \) is the identity matrix with dimension \((k-1)\times (k-1)\) and \(\otimes \) denotes the Kronecker product. \(\pi (\varvec{\Delta }|\varvec{I,X,Z,Y,R,}\varvec{V}_\beta )\) is equivalent to \(\pi (\varvec{\eta } |\varvec{I,X,Z,Y,R,}\varvec{V}_\beta )\), where
Typically, one lets \(\varvec{A}_d\) in the prior distribution approach a zero matrix to represent vague information. Then, in the limit, the above expression goes to
Thus, \(\varvec{\eta }|\varvec{I,X,Z,Y,R,}\varvec{V}_\beta \sim N(\varvec{d}_0 ,\varvec{V}_\eta )\), where
So, the posterior distribution of \(\varvec{\eta } |\varvec{I,X,Z,Y,R,}\varvec{V}_\beta \) is proper.
-
Let
Since
So, \(\varvec{Y}|\varvec{I,X,Z,R,}\varvec{V}_\beta \sim \,\,TN(0,(\varvec{\Psi }-\varvec{\Psi \Upsilon }(\varvec{\Upsilon }^{\prime }\varvec{\Psi \Upsilon })^{-1}\varvec{\Upsilon }^{\prime }\varvec{\Psi })^{-1})\) with truncation restriction on \(I\{\varvec{y}_{h[I_h ]} >\varvec{y}_{h[-I_h ]} \}\) for any \(h=1,\ldots ,H.\) Thus, the posterior distribution of \(\varvec{Y}|\varvec{I,X,Z,R,}\varvec{V}_\beta \) is proper.
-
Since the space of \(m\)-dimensional correlation matrices is convex and compact (Rousseeuw & Molenberghs, 1994), the prior distribution we specify on \(\varvec{R}\) is proper. Thus, the conditional posterior distribution \(\pi (\varvec{R}|\varvec{I,X,Z,}\varvec{V}_\beta )\) is also proper.
-
By a similar argument, because the prior distribution on \(\varvec{V}_\beta ^{-1}\) is a Wishart distribution, which is proper, the conditional distribution of \(\varvec{V}_\beta |\varvec{I,X,Z}\) is proper.
Appendix 4
1.1 Proof of Theorem 2
When \(T_h =1,\) the joint posterior distribution is
We will show that the conditional distribution \(\pi (\varvec{\Lambda }|\varvec{I,X,Z,Y,}\varvec{V}_\beta )\) is improper. Recall that \(\varvec{\Lambda }=\mathrm{{diag}}(1,\sigma _2^2 ,\ldots ,\sigma _m^2 )\), and we let
Since the conditional distribution is proportional to the prior of \(\varvec{\Lambda }\) multiplied by the corresponding likelihood function, we have (see the proof of Theorem 1)
where \(\varvec{C}_1 \) is the normalizing constant. Again, we set the variance of the Gamma distribution to be large to represent vague information. So, in the limit as we let \(\nu \rightarrow 0\) and \(v_j \rightarrow \infty \), the above posterior distribution becomes
Let
Then,
Let \(F\) be the region defined by \(\{\sigma _j^{-2} >w,\forall j=2,\ldots ,m\}\). It is obvious that \(g(\sigma ^{-2})>0\) over \(F\), with the only possible exception occurring when \(\sigma _j^{-2}\) approaches infinity. However, we show that \(g(\sigma ^{-2})\ne 0\) in this limiting case. First, when \(\sigma _j^{-2} \rightarrow \infty \,\,\left( {j=2,\ldots ,m} \right) \), \((\tilde{{\mathbf{X}}}_j \varvec{V}_\beta \tilde{{\mathbf{X}}}_j ^{\prime}+\varvec{\Lambda })^{-1}\rightarrow (\tilde{{\mathbf{X}}}_j \varvec{V}_\beta \tilde{{\mathbf{X}}}_j ^{\prime}+\ddot{\varvec{E}})^{-1}\), where \(\ddot{\varvec{E}}\) is an \(m\times m\) zero matrix except that its \((1,1)\) component is \(1\). Then,
Therefore,
Hence, \(\min _{\sigma ^{-2}\in F} g(\sigma ^{-2})=m_0\), where \(0<m_0 <\infty\). Thus
Because \(\int \pi (\varvec{\Lambda }|\varvec{I,X,Z,Y,}\varvec{V}_\beta )d\varvec{\Lambda }\) does not exist, \(\pi (\varvec{\Lambda }|\varvec{I,X,Z,Y,}\varvec{V}_\beta )\) is not a proper distribution, and so the joint posterior distribution is improper.
Cite this article
Fong, D. K. H., Kim, S., Chen, Z., et al. A Bayesian multinomial probit model for the analysis of panel choice data. Psychometrika 81, 161–183 (2016). https://doi.org/10.1007/s11336-014-9437-6