A Bayesian Multinomial Probit Model for the Analysis of Panel Choice Data


Abstract

A new Bayesian multinomial probit model is proposed for the analysis of panel choice data. Using a parameter expansion technique, we devise a Markov chain Monte Carlo algorithm that computes our Bayesian estimates efficiently. We also show that the proposed procedure enables the estimation of individual-level coefficients for the single-period multinomial probit model even when the available prior information is vague. We apply the new procedure to consumer purchase data, reanalyzing a well-known scanner panel dataset; the reanalysis reveals new substantive insights. In addition, we delineate a number of advantageous features of our proposed procedure over several benchmark models. Finally, through a simulation analysis employing a fractional factorial design, we demonstrate that the results from our proposed model are robust across conditions defined by several varying design factors.


References

  • Addelman, S. (1962). Orthogonal main-effect plans for asymmetrical factorial experiments. Technometrics, 4, 21–46.

  • Albert, J. H., & Chib, S. (1993). Bayesian analysis of binary and polychotomous response data. Journal of the American Statistical Association, 88(422), 669–679.

  • Barnard, J., McCulloch, R., & Meng, X. (2000). Modeling covariance matrices in terms of standard deviations and correlations, with applications to shrinkage. Statistica Sinica, 10, 1281–1311.

  • Burgette, L. F., & Nordheim, E. V. (2012). The trace restriction: An alternative identification strategy for the Bayesian multinomial probit model. Journal of Business and Economic Statistics, 30(3), 404–410.

  • Chib, S., & Greenberg, E. (1998). Analysis of multivariate probit models. Biometrika, 85(2), 347–361.

  • Chib, S., Greenberg, E., & Chen, Y. (1998). MCMC methods for fitting and comparing multinomial response models. Working Paper, Olin School of Business, Washington University.

  • Daganzo, C. (1980). Multinomial probit. New York: Academic Press.

  • Dawid, A. (1981). Some matrix-variate distribution theory: Notational considerations and a Bayesian application. Biometrika, 68(1), 265–274.

  • DeSarbo, W. S. (1982). GENNCLUS: New models for general nonhierarchical clustering analysis. Psychometrika, 47(4), 449–475.

  • DeSarbo, W. S., & Carroll, J. D. (1985). Three-way metric unfolding via alternating weighted least squares. Psychometrika, 50(3), 275–300.

  • DeSarbo, W. S., & Cron, W. L. (1988). A maximum likelihood methodology for clusterwise linear regression. Journal of Classification, 5(2), 249–282.

  • Dotson, J., Lenk, P., Brazell, J., Otter, T., MacEachern, S., & Allenby, G. M. (2010). A probit model with structured covariances for similarity effects and source of volume calculations. Working paper.

  • Fiebig, D. G., Keane, M. P., Louviere, J., & Wasi, N. (2010). The generalized multinomial logit model: Accounting for scale and coefficient heterogeneity. Marketing Science, 29, 393–421.

  • Fong, D. K. H., Ebbes, P., & DeSarbo, W. S. (2012). A heterogeneous Bayesian regression model for cross sectional data involving a single observation per response unit. Psychometrika, 77(2), 293–314.

  • Gupta, A. K., & Nagar, D. K. (2000). Matrix variate distributions. Monographs and surveys in pure and applied mathematics (Vol. 104). London: Chapman & Hall/CRC.

  • Hausman, J., & Wise, D. (1978). A conditional probit model for qualitative choice: Discrete decisions recognizing interdependence and heterogeneous preferences. Econometrica, 46(2), 403–426.

  • Hobert, J. P., & Marchev, D. (2008). A theoretical comparison of the data augmentation, marginal augmentation and px-da algorithms. Annals of Statistics, 36(2), 532–554.

  • Imai, K., & van Dyk, D. A. (2005). A Bayesian analysis of the multinomial probit model using marginal data augmentation. Journal of Econometrics, 124(2), 311–334.

  • Jedidi, K., & DeSarbo, W. S. (1991). A stochastic multidimensional scaling procedure for the spatial representation of three-mode, three-way pick any/J data. Psychometrika, 56(3), 471–494.

  • Liechty, J. C., Liechty, M. W., & Muller, P. (2004). Bayesian correlation estimation. Biometrika, 91(1), 1–14.

  • Liu, C. (2001). Discussion on the art of data augmentation. Journal of Computational and Graphical Statistics, 10(1), 75–81.

  • Liu, J. S., & Wu, Y. N. (1999). Parameter expansion for data augmentation. Journal of the American Statistical Association, 94(448), 1264–1274.

  • Liu, X., & Daniels, M. (2006). A new efficient algorithm for sampling a correlation matrix based on parameter expansion and re-parameterization. Journal of Computational and Graphical Statistics, 15(4), 897–914.

  • Maydeu-Olivares, A., & Hernández, A. (2007). Identification and small sample estimation of Thurstone’s unrestricted model for paired comparisons data. Multivariate Behavioral Research, 42(2), 323–347.

  • McCulloch, R., & Rossi, P. E. (1994). An exact likelihood analysis of the multinomial probit model. Journal of Econometrics, 64, 207–240.

  • McCulloch, R. E., Polson, N. G., & Rossi, P. E. (2000). A Bayesian analysis of the multinomial probit model with fully identified parameters. Journal of Econometrics, 99(1), 173–193.

  • Nobile, A. (1998). A hybrid Markov chain for the Bayesian analysis of the multinomial probit model. Statistics and Computing, 8, 229–242.

  • Nobile, A. (2000). Comment: Bayesian multinomial probit models with normalization constraint. Journal of Econometrics, 99(1), 335–345.

  • Rossi, P. E., Allenby, G. M., & McCulloch, R. E. (2005). Bayesian statistics and marketing. Chichester: Wiley.

  • Rossi, P. E., McCulloch, R. E., & Allenby, G. M. (1996). The value of purchase history data in target marketing. Marketing Science, 15(4), 321–340.

  • Rousseeuw, P., & Molenberghs, G. (1994). The shape of correlation matrices. The American Statistician, 48, 276–279.

  • Thurstone, L. (1927). A law of comparative judgment. Psychological Review, 34, 273–286.

  • Tsai, R. (2000). Remarks on the identifiability of Thurstonian ranking models: Case V, case III, or neither? Psychometrika, 65(2), 233–240.

  • Tsai, R. (2003). Remarks on the identifiability of Thurstonian paired comparison models under multiple judgment. Psychometrika, 68(1), 361–372.

Acknowledgments

We wish to thank the Editor and three anonymous reviewers for their constructive comments which helped to improve this manuscript. We also wish to thank Eric Bradlow and Greg Allenby for kindly providing the data for the application. This research was funded in part by the Smeal College of Business.

Corresponding author

Correspondence to Wayne S. DeSarbo.

Appendices

Appendix 1: Derivation of the Full Conditional Distributions for the Proposed Model

1.1 Proof of Equation (6)

Since \(\varvec{\beta }_{h} \) now denotes the reduced \((k-1)\)-dimensional vector of parameters, Equation (2) becomes

$$\begin{aligned} {{\mathbf {y}}}_{ht} =\tilde{{\mathbf {X}}}_{ht} \varvec{\beta }_{h} +\varvec{\varepsilon }_{ht} , \end{aligned}$$

where \(\tilde{{\mathbf {X}}}_{ht} \) is the matrix obtained from \({{\mathbf {X}}}_{ht} \) by deleting its first column. Then, the full conditional distribution of \({{\mathbf {y}}}_{ht} \) is

$$\begin{aligned} \pi ({{\mathbf {y}}}_{ht} |\hbox {all}\,\,\hbox {others})\propto \mathrm{{exp}}\left\{ -\frac{1}{2}({{\mathbf {y}}}_{ht} -\tilde{{\mathbf {X}}}_{ht} \varvec{\beta }_{h} )^{{\prime }}\varvec{R}^{-1}({{\mathbf {y}}}_{ht} -\tilde{{\mathbf {X}}}_{ht} \varvec{\beta }_{h} )\right\} I(\varvec{y}_{ht[I_{ht} ]} >\mathrm{{max}}(\varvec{y}_{ht[-I_{ht} ]} )). \end{aligned}$$

So \(({{\mathbf {y}}}_{ht} |\hbox {all}\,\,\hbox {others})\) follows a truncated normal distribution, \(TN\left( {\tilde{{\mathbf {X}}}_{ht} \varvec{\beta }_{h} ,\varvec{R}} \right) \), where the truncation is such that \(\varvec{y}_{ht[I_{ht} ]} >\mathrm{{max}}(\varvec{y}_{ht[-I_{ht} ]} )\). The \(j\)th component of \({{\mathbf {y}}}_{ht} ,\varvec{y}_{ht[j]} \), has a univariate truncated normal distribution conditional on all other components of \({{\mathbf {y}}}_{ht} \) and other parameters. Let \(\varvec{D}_j \) be a matrix that switches the first and the \(j\)th components of \({{\mathbf {y}}}_{ht} \):

$$\begin{aligned} \varvec{D}_j \tilde{{\mathbf {X}}}_{ht} \varvec{\beta }_{h} \mathop {=}\limits ^{\Delta } \left( {{\begin{array}{l} {\mu _j } \\ {\varvec{\mu }_{-j} } \\ \end{array} }} \right) , \quad \hbox {and}\quad \varvec{D}_j \varvec{R}\varvec{D}_j ^{\prime } \mathop {=}\limits ^{\Delta } \left( {{\begin{array}{ll} {r_{jj} }&{} {r_{j12} } \\ {r_{j21} }&{} {r_{j22} } \\ \end{array} }} \right) . \end{aligned}$$

Then,

$$\begin{aligned} \varvec{y}_{ht[j]} |\hbox {all}\,\,\hbox {others}\sim TN(\mu _j +r_{j12} r_{j22}^{-1} (\varvec{y}_{ht[-j]} -\varvec{\mu }_{-j} ),r_{jj} -r_{j12} r_{j22}^{-1} r_{j21} ), \end{aligned}$$

where the truncation is given by

$$\begin{aligned} \left\{ {{\begin{array}{l} {\varvec{y}_{ht[j]} \in \left( {\mathrm{{max}}\left( {\varvec{y}_{ht\left[ {-j} \right] } } \right) ,+\infty } \right) ,\quad \hbox {if}\quad j =I_{ht} ;} \\ {\varvec{y}_{ht[j]} \in \left( {-\infty ,\varvec{y}_{ht\left[ {I_{ht} } \right] } } \right) ,\quad \hbox {if}\quad j\ne I_{ht} .} \\ \end{array} }} \right. \end{aligned}$$
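To make this step concrete, the following is a minimal sketch (in Python/NumPy, not the authors' code) of one Gibbs pass over the components of \({{\mathbf {y}}}_{ht}\), cycling through the univariate truncated normal full conditionals derived above. The function and variable names (`draw_latent_utilities`, `mu`, `chosen`) are illustrative assumptions, not from the paper.

```python
# A minimal sketch of the component-wise truncated normal update for y_ht.
import numpy as np
from scipy.stats import truncnorm

def draw_latent_utilities(y, mu, R, chosen, rng=None):
    """One Gibbs pass over the m components of y_ht.

    y      : (m,) current latent utilities
    mu     : (m,) mean vector X~_ht beta_h
    R      : (m, m) error correlation matrix
    chosen : index I_ht of the chosen alternative
    """
    m = len(y)
    for j in range(m):
        idx = [i for i in range(m) if i != j]
        # Conditional normal moments given the remaining components.
        r12 = R[j, idx]
        R22_inv = np.linalg.inv(R[np.ix_(idx, idx)])
        cond_mean = mu[j] + r12 @ R22_inv @ (y[idx] - mu[idx])
        cond_sd = np.sqrt(R[j, j] - r12 @ R22_inv @ r12)
        # Truncation region from the proof of Equation (6).
        if j == chosen:
            lo, hi = np.max(y[idx]), np.inf   # winner exceeds all others
        else:
            lo, hi = -np.inf, y[chosen]       # losers stay below the winner
        a, b = (lo - cond_mean) / cond_sd, (hi - cond_mean) / cond_sd
        y[j] = truncnorm.rvs(a, b, loc=cond_mean, scale=cond_sd,
                             random_state=rng)
    return y
```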

1.2 Proof of Equation (7)

The full conditional distribution of \(\varvec{\beta }_{h} \) is

$$\begin{aligned}&\!\! {\pi (\varvec{\beta }_{h} |\hbox {all}\,\,\hbox {others})} \\&\!\!\propto {\mathrm{{exp}}\left\{ -\frac{1}{2} \mathop \sum \limits _{t=1}^{T_h } \left[ ({{\mathbf {y}}}_{ht} -\tilde{{\mathbf {X}}}_{ht} \varvec{\beta }_{h} )^{{\prime }}\varvec{R}^{-1}({{\mathbf {y}}}_{ht} -\tilde{{\mathbf {X}}}_{ht} \varvec{\beta }_{h} )\right] -\frac{1}{2}(\varvec{\beta }_{h} -\varvec{\Delta }\varvec{Z}_h )^{{\prime }}\varvec{V}_\beta ^{-1} (\varvec{\beta }_{h} -\varvec{\Delta }\varvec{Z}_h )\right\} }\\&\!\!\propto {\mathrm{{exp}}\left\{ -\frac{1}{2}\left[ \varvec{\beta }_{h}^{\prime } \,\, \left( \mathop \sum \limits _t (\tilde{{\mathbf {X}}}_{ht} ^{{\prime }}\varvec{R}^{-1}\tilde{{\mathbf {X}}}_{ht} )+\varvec{V}_\beta ^{-1} \right) \varvec{\beta }_{h} -2\,\,\varvec{\beta }_{h}^{\prime } \left( \mathop \sum \limits _t (\tilde{{\mathbf {X}}}_{ht} ^{{\prime }}\varvec{R}^{-1}{{\mathbf {y}}}_{ht} )+\varvec{V}_\beta ^{-1} \varvec{\Delta }\varvec{Z}_h \right) \right] \right\} .} \end{aligned}$$

So, \(\varvec{\beta }_{h} |\hbox {all}\,\,\hbox {others}\sim N(\varvec{\beta }_{h}^0 ,\varvec{V}_\beta ^h )\), where

$$\begin{aligned} \varvec{V}_\beta ^h =\left( \mathop \sum \nolimits _t(\tilde{{\mathbf {X}}} _{ht} ^{{\prime }}\varvec{R}^{-1}\tilde{{\mathbf {X}}}_{ht} )+\varvec{V}_\beta ^{-1} \right) ^{-1}\,\,\hbox {and}\,\,\varvec{\beta }_{h}^0 =\varvec{V}_\beta ^h \left( \mathop \sum \nolimits _t (\tilde{{\mathbf {X}}}_{ht} ^{{\prime }}\varvec{R}^{-1}{{\mathbf {y}}}_{ht} )+\varvec{V}_\beta ^{-1} \varvec{\Delta }\varvec{Z}_h \right) . \end{aligned}$$
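A minimal sketch of the corresponding conjugate normal draw, under the same illustrative naming: it accumulates the precision and linear terms over \(t=1,\ldots ,T_h\) and then samples \(\varvec{\beta }_{h}\) from the resulting multivariate normal. `X_list` and `y_list` hold \(\tilde{{\mathbf {X}}}_{ht}\) and \({{\mathbf {y}}}_{ht}\).

```python
# A minimal sketch of the draw from Equation (7); names are illustrative.
import numpy as np

def draw_beta_h(X_list, y_list, R_inv, V_beta_inv, Delta, Z_h, rng):
    prec = V_beta_inv.copy()              # starts at V_beta^{-1}
    lin = V_beta_inv @ (Delta @ Z_h)      # starts at V_beta^{-1} Delta Z_h
    for X, y in zip(X_list, y_list):
        prec += X.T @ R_inv @ X           # accumulate X~' R^{-1} X~
        lin += X.T @ R_inv @ y            # accumulate X~' R^{-1} y
    V_h = np.linalg.inv(prec)             # posterior covariance V_beta^h
    b0 = V_h @ lin                        # posterior mean beta_h^0
    return rng.multivariate_normal(b0, V_h)
```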

1.3 Proof of Equation (8)

The full conditional distribution of \(\varvec{\Delta }\) is

$$\begin{aligned}&\pi (\varvec{\Delta }|\hbox {all}\,\,\hbox {others})\propto {\mathrm{{etr}}\left\{ -\frac{1}{2}\varvec{V}_\beta ^{-1} [(\varvec{B}-\varvec{\Delta }\varvec{Z})(\varvec{B}- \varvec{\Delta }\varvec{Z})^{{\prime }}+(\varvec{\Delta }-\varvec{\Delta }_0 )\varvec{A}_d (\varvec{\Delta }-\varvec{\Delta }_0 )^{{\prime }}]\right\} }\\&\propto \mathrm{{etr}}\left\{ -\frac{1}{2}\varvec{V}_\beta ^{-1} [\varvec{\Delta }(\varvec{ZZ}^{{\prime }}+\varvec{A}_d )\varvec{\Delta }^{{\prime }}-2\varvec{\Delta }(\varvec{ZB}^{{\prime }}+\varvec{A}_d \varvec{\Delta }_0^{\prime } )]\right\} , \end{aligned}$$

where etr denotes the exponential of the trace of a matrix, that is, \(\mathrm{{etr}}(\varvec{A})=\mathrm{{exp}}\{\mathrm{tr}(\varvec{A})\}\).

So, \(\varvec{\Delta }|\hbox {all}\,\,\hbox {others}\sim MN((\varvec{BZ}^{{\prime }}+\varvec{\Delta }_0 \varvec{A}_d )(\varvec{ZZ}^{{\prime }}+\varvec{A}_d )^{-1},\varvec{V}_\beta ,(\varvec{ZZ}^{{\prime }}+\varvec{A}_d )^{-1})\), where

$$\begin{aligned} \varvec{B}=[\varvec{\beta } _1 ,\ldots ,\varvec{\beta } _H ] \quad \hbox {and}\quad \varvec{Z}=[\varvec{Z}_1 ,\ldots ,\varvec{Z}_H ]. \end{aligned}$$
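The matrix normal draw can be implemented by factoring the row and column covariances: the sketch below samples \(\varvec{\Delta }=\varvec{M}+\varvec{L}_U \varvec{E}\varvec{L}_V^{\prime }\) with \(\varvec{E}\) a matrix of iid standard normals, which is one standard way to draw an \(MN(\varvec{M},\varvec{U},\varvec{V})\) variate. Names are illustrative assumptions.

```python
# A minimal sketch of the matrix normal draw for Delta in Equation (8).
import numpy as np

def draw_Delta(B, Z, Delta0, A_d, V_beta, rng):
    G = np.linalg.inv(Z @ Z.T + A_d)       # column covariance (ZZ' + A_d)^{-1}
    M = (B @ Z.T + Delta0 @ A_d) @ G       # posterior mean
    L_U = np.linalg.cholesky(V_beta)       # row covariance factor
    L_V = np.linalg.cholesky(G)            # column covariance factor
    E = rng.standard_normal(M.shape)
    return M + L_U @ E @ L_V.T
```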

1.4 Proof of Equation (9)

The full conditional distribution of \(\varvec{V}_\beta ^{-1} \) is

$$\begin{aligned}&{\pi (\varvec{V}_\beta ^{-1} |\hbox {all}\,\,\hbox {others})\propto |\varvec{V}_\beta ^{-1} |^{H/2}\mathrm{{etr}}\left\{ -\frac{1}{2}\varvec{V}_\beta ^{-1} (\varvec{B}-\varvec{\Delta }\varvec{Z}) (\varvec{B}-\varvec{\Delta }\varvec{Z})^{{\prime }}\right\} } \\&{\times |\varvec{V}_\beta ^{-1} |^{l/2}\mathrm{{etr}}\left\{ -\frac{1}{2}\varvec{V}_\beta ^{-1} (\varvec{\Delta }-\varvec{\Delta }_0 )\varvec{A}_d (\varvec{\Delta }-\varvec{\Delta }_0 )^{{\prime }}\right\} |\varvec{V}_\beta ^{-1} |^{\frac{v-k}{2}}\mathrm{{etr}}\left\{ -\frac{1}{2}\varvec{V}_\beta ^{-1} \varvec{V}^{-1}\right\} } \end{aligned}$$

which is proportional to

$$\begin{aligned} |\varvec{V}_\beta ^{-1} |^{\frac{v+H+l-k}{2}}\mathrm{{etr}}\left\{ -\frac{1}{2}\varvec{V}_\beta ^{-1} [(\varvec{B}-\varvec{\Delta }\varvec{Z})(\varvec{B}- \varvec{\Delta }\varvec{Z})^{{\prime }}+(\varvec{\Delta }- \varvec{\Delta }_0 )\varvec{A}_d (\varvec{\Delta }-\varvec{\Delta }_0 )^{{\prime }}+\varvec{V}^{-1}]\right\} . \end{aligned}$$

So, \(\varvec{V}_\beta ^{-1} |\hbox {all}\,\,\hbox {others}\sim W(v+H+l,\varvec{V}_n )\), where

$$\begin{aligned} \varvec{V}_n =[(\varvec{B}-\varvec{\Delta }\varvec{Z})(\varvec{B}- \varvec{\Delta }\varvec{Z})^{{\prime }}+(\varvec{\Delta }-\varvec{\Delta }_0 )\varvec{A}_d (\varvec{\Delta }-\varvec{\Delta }_0 )^{{\prime }}+\varvec{V}^{-1}]^{-1}. \end{aligned}$$
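A minimal sketch of the Wishart draw using scipy's sampler, whose `scale` argument is taken to be \(\varvec{V}_n\) as defined above; names are illustrative assumptions.

```python
# A minimal sketch of the draw for V_beta^{-1} in Equation (9).
import numpy as np
from scipy.stats import wishart

def draw_V_beta_inv(B, Z, Delta, Delta0, A_d, V_inv, v, rng=None):
    resid = B - Delta @ Z                  # (k-1) x H matrix B - Delta Z
    dev = Delta - Delta0
    V_n = np.linalg.inv(resid @ resid.T + dev @ A_d @ dev.T + V_inv)
    H, l = B.shape[1], Z.shape[0]          # number of panelists H, dim(Z_h) = l
    return wishart.rvs(df=v + H + l, scale=V_n, random_state=rng)
```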

Appendix 2: A Parameter Expansion Algorithm for Sampling the Correlation Matrix

1.1 Stage I: Parameter Expanded Reparameterization

We define the following one-to-one mapping from \(\left\{ {{{\mathbf {y}}}_{ht} ,\varvec{R}} \right\} \) to \(\left\{ {{{\mathbf {y}}}_{ht}^*,\varvec{\varSigma } } \right\} \) to facilitate making random draws of the correlation matrix:

$$\begin{aligned} \left\{ {{\begin{array}{l} {{{\mathbf {y}}}_{ht} =\tilde{{\mathbf {X}}} _{ht} \varvec{\beta }_{h} +\varvec{D}^{-1/2}{{\mathbf {y}}}_{ht}^*} \\ {\varvec{R}=\varvec{D}^{-1/2}\varvec{\varSigma } \varvec{D}^{-1/2}} \\ \end{array} }} \right. \quad h=1,\ldots ,\hbox {H}\quad \hbox {and} \quad t=1,\ldots ,T_h \end{aligned}$$
(20)

where \(\varvec{\Sigma }\) is a positive definite matrix, \(\mathop \sum \nolimits _{h=1}^H \mathop \sum \nolimits _{t=1}^{T_h } (\varvec{y}_{ht[j]}^{*} )^{2}=1\) for any \(j=1,\ldots ,m\), and \(\varvec{D}\) is a diagonal matrix. Note that the constraints, \(\mathop \sum \nolimits _{h=1}^H \mathop \sum \nolimits _{t=1}^{T_h } (\varvec{y}_{ht[j]}^{*} )^{2}=1\) for any \(j=1,\ldots ,m\), are needed to make the transformation in (20) a one-to-one mapping. Given \(\varvec{\beta }_{h} ,\) the step that draws \({{\mathbf {y}}}_{ht} \) implicitly draws \({{\mathbf {y}}}_{ht}^{*} \) and \(\varvec{D}\) because \(\varvec{D}_{jj}^{1/2} =[\mathop \sum \nolimits _{h=1}^H \mathop \sum \nolimits _{t=1}^{T_h }(\varvec{y}_{ht[j]} -\tilde{{\mathbf {X}}}_{ht[j]} \varvec{\beta }_{h} )^{2}]^{-1/2}\), where \(\varvec{D}_{jj} \) is the \(j\)th element of \(\varvec{D}\) and \(\tilde{{\mathbf {X}}}_{ht[j]} \) is the \(j\)th row of \(\tilde{{\mathbf {X}}}_{ht} \). Thus, one can derive the joint conditional distribution of (\({{\mathbf {y}}}_{ht}^*,\varvec{\varSigma } |\hbox {all}\,\,\hbox {others})\) from that of (\({{\mathbf {y}}}_{ht} ,\varvec{R}|\hbox {all}\,\,\hbox {others})\). When the prior distribution of \(\varvec{R}\) is \(\pi (\varvec{R})\propto |\varvec{R}|^{ -(m+1)/2}\), the full conditional distribution of \(\varvec{\Sigma }\) is

$$\begin{aligned} \pi \left( {\varvec{\Sigma }|\hbox {all}\,\,\hbox {others}} \right) \propto \mathop \prod \nolimits _{ht} \pi ({{\mathbf {y}}}_{ht}^*,\varvec{\varSigma } |\hbox {all}\,\,\hbox {others})\propto |\varvec{\Sigma }|^{-(\mathop \sum \nolimits _h T_h +m+1)/2}\mathrm{{exp}}\left\{ -\frac{1}{2}\mathrm{tr}(\varvec{S}\varvec{\Sigma }^{-1})\right\} \end{aligned}$$
(21)

where \(\varvec{S}=\mathop \sum \nolimits _{h=1}^H \mathop \sum \nolimits _{t=1}^{T_h } {{\mathbf {y}}}_{ht}^{*} {{\mathbf {y}}}_{ht}^{*{\prime }}\). The expression in (21) is proportional to a Wishart distribution, and so \(\varvec{\Sigma }^{-1}|\hbox {all}\,\,\hbox {others}\sim W(\mathop \sum \nolimits _h T_h ,\,\,\varvec{S})\).
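A minimal sketch of the Stage I computation: it rescales the residuals to obtain \({{\mathbf {y}}}_{ht}^{*}\) and \(\varvec{S}\), then draws \(\varvec{\Sigma }\) from the inverse-Wishart implied by (21); drawing \(\varvec{\Sigma }\) from scipy's `invwishart` with df \(\sum _h T_h\) and scale \(\varvec{S}\) is equivalent to drawing \(\varvec{\Sigma }^{-1}\) from the Wishart stated above. Names are illustrative assumptions.

```python
# A minimal sketch of the Stage I parameter-expanded draw of Sigma.
import numpy as np
from scipy.stats import invwishart

def draw_Sigma(resid_list, rng=None):
    """resid_list: residuals y_ht - X~_ht beta_h over all (h, t), each (m,)."""
    E = np.column_stack(resid_list)                # m x n, with n = sum_h T_h
    n = E.shape[1]
    d_half = 1.0 / np.sqrt((E ** 2).sum(axis=1))   # D_jj^{1/2} from (20)
    Y_star = d_half[:, None] * E                   # rescaled residuals y*
    S = Y_star @ Y_star.T                          # S = sum_{h,t} y* y*'
    # Sigma^{-1} ~ W(n, S) is equivalent to Sigma ~ inverse-Wishart(n, S)
    # in scipy's parameterization.
    return invwishart.rvs(df=n, scale=S, random_state=rng)
```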

1.2 Stage II: Parameter Expanded Metropolis–Hastings

After we have obtained a random deviate of \(\varvec{\Sigma }\) from (21) in Stage I, we can transform it into a correlation matrix using \(\varvec{R}^{{*}}=\varvec{D}^{-1/2}\varvec{\Sigma }\varvec{D}^{-1/2}\). Since \(\varvec{R}^{{*}}\) is obtained based on the candidate prior \(\pi (\varvec{R}^{*})\propto |\varvec{R}^{*}|^{-(m+1)/2}\), it is accepted in a Metropolis–Hastings step with probability \(\alpha \), where, at iteration \(n+1\), \(\alpha =\mathrm{{min}}\{1,\mathrm{{exp}}[\frac{m+1}{2}(\mathrm{log}|\varvec{R}^{{*}}|-\mathrm{log}|\varvec{R}^{(n)}|)]\}\).
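A minimal sketch of the Stage II accept/reject step, working with log-determinants for numerical stability and taking \(\varvec{D}=\mathrm{diag}(\varvec{\Sigma })\) so that \(\varvec{R}^{*}\) has a unit diagonal (an assumption consistent with the transformation above). Names are illustrative assumptions.

```python
# A minimal sketch of the Metropolis-Hastings step for the correlation matrix.
import numpy as np

def mh_accept_correlation(Sigma, R_current, rng):
    m = Sigma.shape[0]
    # Rescale with D = diag(Sigma): R* = D^{-1/2} Sigma D^{-1/2}.
    d = 1.0 / np.sqrt(np.diag(Sigma))
    R_star = d[:, None] * Sigma * d[None, :]
    _, logdet_star = np.linalg.slogdet(R_star)
    _, logdet_curr = np.linalg.slogdet(R_current)
    log_alpha = 0.5 * (m + 1) * (logdet_star - logdet_curr)
    if np.log(rng.uniform()) < log_alpha:
        return R_star                      # accept the candidate R*
    return R_current                       # keep the current correlation matrix
```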

Appendix 3

1.1 Proof of Theorem 1

Since the joint posterior distribution is

$$\begin{aligned} \pi (\varvec{Y},\varvec{B},\varvec{\Delta },\varvec{R}, \varvec{V}_{\varvec{\beta }} |\varvec{I},\varvec{X},\varvec{Z})\!&= \! {\left[ \mathop \prod \nolimits _{h=1}^H \pi (\varvec{\beta }_{\varvec{h}} |\varvec{I},\varvec{X},\varvec{Z},\varvec{y}_{\varvec{h}} ,\varvec{\Delta },\varvec{R},\varvec{V}_{\varvec{\beta }} )\right] \pi (\varvec{\Delta }|\varvec{I},\varvec{X},\varvec{Z},\varvec{Y},\varvec{R},\varvec{V}_{\varvec{\beta }} )}\\&\,\,\times \,\, \pi \left( {\varvec{Y}|\varvec{I},\varvec{X},\varvec{Z},\varvec{R},\varvec{V}_{\varvec{\beta }} } \right) \pi \left( {\varvec{R}|\varvec{I},\varvec{X},\varvec{Z},\varvec{V}_{\varvec{\beta }} } \right) \pi \left( {\varvec{V}_{\varvec{\beta }} | \varvec{I},\varvec{X},\varvec{Z}} \right) , \end{aligned}$$

it is a proper probability distribution if each of the posterior distributions on the right-hand side of the above equation is proper. To establish the result, we make use of the likelihood function of our model when \(T_h =1\):

$$\begin{aligned}&L(\varvec{Y,B,\Delta ,R,V}_{\varvec{\beta }} |\varvec{I,X,Z})\propto \mathop \prod \limits _{h=1}^H \,\,[I(\varvec{y}_{h[I_h ]} >\mathrm{{max}}(\varvec{y}_{h[-I_h ]} ))]|\varvec{R}|^{-{\mathrm{H}}/2}|\varvec{V}_\beta |^{-{\mathrm{H}}/2}\\&\times \,\,\mathrm{{exp}}\left\{ -\frac{1}{2}\mathop \sum \limits _{h=1}^H \,\,((\varvec{y}_h -\tilde{{\mathbf {X}}}_h \varvec{\beta }_{h} )^{{\prime }}\varvec{R}^{-1}(\varvec{y}_h -\tilde{{\mathbf {X}}}_h \varvec{\beta }_{h} )+(\varvec{\beta }_{h} -\varvec{\Delta }\varvec{Z}_h )^{{\prime }}\varvec{V}_\beta ^{-1} (\varvec{\beta }_{h} -\varvec{\Delta }\varvec{Z}_h ))\right\} , \end{aligned}$$

where \(\varvec{X}=[\tilde{{\mathbf {X}}}_1 ,\ldots ,\tilde{{\mathbf {X}}}_H ]\) and \(\varvec{Y}=[\varvec{y}_1 ^{\prime },\ldots ,\varvec{y}_H ^{{\prime }}]^{{\prime }}\).

  • For \(h=1,\ldots ,H\),

$$\begin{aligned}&{\pi (\varvec{\beta }_{h} |\varvec{I,X,Z,Y,\Delta ,R},\varvec{V}_\beta )=\pi (\varvec{\beta }_{h} |\varvec{X,Z,Y,\Delta ,R},\varvec{V}_\beta )}\\&\propto {exp\left\{ -\frac{1}{2}[(\varvec{y}_h -\tilde{{\mathbf {X}}}_h \varvec{\beta }_{h} )^{{\prime }}\varvec{R}^{-1}(\varvec{y}_h -\tilde{{\mathbf {X}}}_h \varvec{\beta }_{h} )+(\varvec{\beta }_{h} -\varvec{\Delta }\varvec{Z}_h )^{{\prime }}\varvec{V}_\beta ^{-1} (\varvec{\beta }_{h} -\varvec{\Delta }\varvec{Z}_h )]\right\} .} \end{aligned}$$

So, \(\varvec{\beta }_{h} |\varvec{I,X,Z,Y,\Delta ,R},\varvec{V}_\beta \sim N(\varvec{b}_h^0 ,\varvec{V}_\beta ^{\hbox {new}} )\), where

$$\begin{aligned} \varvec{V}_\beta ^{\hbox {new}} =(\tilde{{\mathbf {X}}}_h^{\prime }\varvec{R}^{-1}\tilde{{\mathbf {X}}}_h +\varvec{V}_\beta ^{-1} )^{-1}\quad \hbox {and}\quad \varvec{b}_h^0 =\varvec{V}_\beta ^{\hbox {new}} (\tilde{{\mathbf {X}}}_h^{\prime }\varvec{R}^{-1}\varvec{y}_h +\varvec{V}_\beta ^{-1} \varvec{\Delta }\varvec{Z}_h ). \end{aligned}$$

Thus, \(\pi (\varvec{\beta }_{h} |\varvec{I,X,Z,Y,\Delta ,R,}\varvec{V}_\beta )\) is proper.

  • Let \(\varvec{\eta }\) be the vectorization of \(\varvec{\Delta }\), and \(\varvec{\Upsilon }_h = \tilde{{\mathbf {X}}}_h (\varvec{E}_{k-1} \otimes \varvec{Z}_h )^{{\prime }}\), where \(\varvec{E}_{k-1} \) is the identity matrix with dimension \((k-1)\times (k-1)\) and \(\otimes \) denotes the Kronecker product. \(\pi (\varvec{\Delta }|\varvec{I,X,Z,Y,R,}\varvec{V}_\beta )\) is equivalent to \(\pi (\varvec{\eta } |\varvec{I,X,Z,Y,R,}\varvec{V}_\beta )\), where

$$\begin{aligned}&\pi (\varvec{\eta } |\varvec{I,X,Z,Y,R},\varvec{V}_\beta )=\pi (\varvec{\eta } |\varvec{X,Z,Y,R,}\varvec{V}_{\varvec{\beta }} )\propto L(\varvec{\eta } |\varvec{X,Z,Y,R,}\varvec{V}_\beta )\pi (\varvec{\eta } |\varvec{V}_{\varvec{\beta }} ) \\&\propto {\mathrm{{exp}}\left\{ -\frac{1}{2}\mathop \sum \nolimits _{h=1}^H \,\,[(\varvec{y}_h -\varvec{\Upsilon }_h \varvec{\eta } )^{{\prime }}(\tilde{{\mathbf {X}}}_h \varvec{V}_{\varvec{\beta }} \tilde{{\mathbf {X}}}_h ^{{\prime }}+\varvec{R})^{-1}(\varvec{y}_h -\varvec{\Upsilon }_h \varvec{\eta } )]\right\} } \\&\,\,{\times \,\, \mathrm{{exp}}\left\{ -\frac{1}{2}(\varvec{\eta } -\overline{\varvec{d}})^{{\prime }}(\varvec{V}_\beta \otimes \varvec{A}_d^{-1} )^{-1}(\varvec{\eta } -\overline{\varvec{d}} )\right\} } \\&\propto \mathrm{{exp}}\left\{ -\frac{1}{2}\left[ \varvec{\eta } ^{{\prime }}((\varvec{V}_\beta ^{-1} \otimes \varvec{A}_d )+\mathop \sum \nolimits _{h=1}^H \,\,[\varvec{\Upsilon }_h ^{{\prime }}(\tilde{{\mathbf {X}}}_h \varvec{V}_\beta \tilde{{\mathbf {X}}}_h ^{{\prime }}+\varvec{R})^{-1}\varvec{\Upsilon }_h ])\varvec{\eta }\right. \right. \\&\left. \left. -2\varvec{\eta } ^{{\prime }}\left( \mathop \sum \nolimits _{h=1}^H \,\,[\varvec{\Upsilon }_h ^{{\prime }}(\tilde{{\mathbf {X}}}_h \varvec{V}_\beta \tilde{{\mathbf {X}}}_h ^{{\prime }}+\varvec{R})^{-1}\varvec{y}_h ]+(\varvec{V}_\beta ^{-1} \otimes \varvec{A}_d )\overline{\varvec{d}} \right) \right] \right\} . \end{aligned}$$

Typically, one lets \(\varvec{A}_d \) in the prior distribution approach a zero matrix to represent vague information. Then, in the limit, the above expression goes to

$$\begin{aligned} \mathrm{{exp}}\{\frac{-1}{2}[\varvec{\eta } ^{{\prime }}(\mathop \sum \nolimits _{h=1}^H \,\,[\varvec{\Upsilon }_h ^{{\prime }}(\tilde{{\mathbf {X}}}_h \varvec{V}_\beta \tilde{{\mathbf {X}}}_h ^{{\prime }}+\varvec{R})^{-1}\varvec{\Upsilon }_h ])\varvec{\eta } -2\varvec{\eta }^{{\prime }}(\mathop \sum \nolimits _{h=1}^H \,\,[\varvec{\Upsilon }_h ^{{\prime }}(\tilde{{\mathbf {X}}}_h \varvec{V}_\beta \tilde{{\mathbf {X}}}_h ^{{\prime }}+\varvec{R})^{-1}\varvec{y}_h ])]\}. \end{aligned}$$

Thus, \(\varvec{\eta }|\varvec{I,X,Z,Y,R,}\varvec{V}_\beta \sim N(\varvec{d}_0 ,\varvec{V}_\eta )\), where

$$\begin{aligned} \varvec{V}_\eta&= \left[ \mathop \sum \nolimits _{h=1}^H \,\,(\varvec{\Upsilon }_h ^{{\prime }}(\tilde{{\mathbf {X}}}_h \varvec{V}_\beta \tilde{{\mathbf {X}}}_h ^{{\prime }}+\varvec{R})^{-1}\varvec{\Upsilon }_h )\right] ^{-1}\\&\quad \hbox {and}\quad \varvec{d}_0 =\varvec{V}_\eta \mathop \sum \nolimits _{h=1}^H \,\,(\varvec{\Upsilon }_h ^{{\prime }}(\tilde{{\mathbf {X}}}_h \varvec{V}_\beta \tilde{{\mathbf {X}}}_h ^{{\prime }}+\varvec{R})^{-1}\varvec{y}_h ). \end{aligned}$$

So, the posterior distribution of \(\varvec{\eta } |\varvec{I,X,Z,Y,R,}\varvec{V}_\beta \) is proper.

  • Let

$$\begin{aligned} \varvec{\Upsilon }=\left( {{\begin{array}{l} {\varvec{\Upsilon }_1 } \\ \vdots \\ {\varvec{\Upsilon }_H } \\ \end{array} }} \right) \quad \hbox {and}\quad \varvec{\Psi }=\left( {{\begin{array}{lll} {(\tilde{{\mathbf {X}}}_1 \varvec{V}_\beta \tilde{{\mathbf {X}}}_1 ^{{\prime }}+\varvec{R})^{-1}} &{} &{} \\ &{} \ddots &{} \\ &{} &{} {(\tilde{{\mathbf {X}}}_H \varvec{V}_\beta \tilde{{\mathbf {X}}}_H ^{{\prime }}+\varvec{R})^{-1}} \\ \end{array} }} \right) . \end{aligned}$$

Note that

$$\begin{aligned}&{\pi (\varvec{Y}|\varvec{I},\varvec{X},\varvec{Z},\varvec{R},\varvec{V}_\beta )}\\&{\propto \mathrm{{exp}}\left\{ -\frac{1}{2}\varvec{Y}^{{\prime }}(\varvec{\Psi }-\varvec{ \Psi \Upsilon }(\varvec{\Upsilon }^{\prime }\varvec{\Psi \Upsilon })^{-1}\varvec{\Upsilon }^{\prime }\varvec{\Psi })\varvec{Y}\right\} \mathop \prod \nolimits _h \,\,[I\{\varvec{y}_{h[I_h ]} >\mathrm{{max}}(\varvec{y}_{h[-I_h ]} )\}].}\\ \end{aligned}$$

So, \(\varvec{Y}|\varvec{I,X,Z,R,}\varvec{V}_\beta \sim \,\,TN(0,(\varvec{\Psi }-\varvec{\Psi \Upsilon }(\varvec{\Upsilon }^{\prime }\varvec{\Psi \Upsilon })^{-1}\varvec{\Upsilon }^{\prime }\varvec{\Psi })^{-1})\), where the truncation restricts \(\varvec{y}_{h[I_h ]} >\mathrm{{max}}(\varvec{y}_{h[-I_h ]} )\) for every \(h=1,\ldots ,H.\) Thus, the posterior distribution of \(\varvec{Y}|\varvec{I,X,Z,R,}\varvec{V}_\beta \) is proper.

  • Since the space of \(m\)-dimensional correlation matrices is convex and compact (Rousseeuw & Molenberghs, 1994), the prior distribution we specify on \(\varvec{R}\) is a proper distribution. Thus, the conditional posterior distribution \(\pi (\varvec{R}|\varvec{I,X,Z,}\varvec{V}_\beta )\) is also proper.

  • Similar to the argument given above, because the prior distribution on \(\varvec{V}_\beta ^{-1} \) is Wishart, which is a proper distribution, the conditional distribution of \(\varvec{V}_\beta |\varvec{I,X,Z}\) is proper.

Appendix 4

1.1 Proof of Theorem 2

When \(T_h =1,\) the joint posterior distribution is

$$\begin{aligned} \pi (\varvec{Y,B,}\varvec{\Delta },\varvec{\Lambda },\varvec{V}_{\varvec{\beta }} |\varvec{I,X,Z})\!&= \! \left[ \!\mathop \prod \nolimits _{h=1}^H \,\,\pi (\varvec{\beta }_{h} |\varvec{I,X,Z,}\varvec{y}_{\varvec{h}} ,\varvec{\Delta },\varvec{\Lambda },\varvec{V}_{\varvec{\beta }} )\!\right] \pi (\varvec{\Delta }|\varvec{I,X,Z,Y},\varvec{\Lambda },\varvec{V}_{\varvec{\beta }}) \\&\,\, \times \pi \left( {\varvec{\Lambda }\hbox {|}\varvec{I,X,Z,Y,}\varvec{V}_{\varvec{\beta }} } \right) \pi \left( {\varvec{Y}\hbox {|}\varvec{I,X,Z},\varvec{V}_{\varvec{\beta }} } \right) \pi \left( {\varvec{V}_{\varvec{\beta }} \hbox {|}\varvec{I,X,Z}} \right) . \end{aligned}$$

We will show that the conditional distribution \(\pi (\varvec{\Lambda }|\varvec{I,X,Z,Y,}\varvec{V}_\beta )\) is improper. Recall that \(\varvec{\Lambda }=\mathrm{{diag}}(1,\sigma _2^2 ,\ldots ,\sigma _m^2 )\), and we let

$$\begin{aligned} \varvec{\Psi }^{{*}}=\left( {{\begin{array}{lll} {(\tilde{{\mathbf {X}}}_1 \varvec{V}_\beta \tilde{{\mathbf {X}}}_1 ^{\prime } +\varvec{\Lambda })^{-1}} &{} &{}\\ &{} \ddots &{} \\ &{} &{} {(\tilde{{\mathbf {X}}}_H \varvec{V}_\beta \tilde{{\mathbf {X}}}_H ^{\prime } +\varvec{\Lambda })^{-1}} \end{array} }} \right) . \end{aligned}$$

Since the conditional distribution is proportional to the prior of \(\varvec{\Lambda }\) multiplied by the corresponding likelihood function, we have (see the proof of Theorem 1)

$$\begin{aligned}&\pi (\varvec{\Lambda }|\varvec{I,X,Z,Y,}\varvec{V}_\beta ) \\&= \varvec{C}_1 \mathop \prod \limits _{j=2}^m \left[ (\sigma _j^{-2} )^{\nu -1}\mathrm{{exp}}\left\{ -\frac{\sigma _j^{-2} }{v_j }\right\} \right] \left( \frac{|\varvec{\Psi }^{{*}}|}{|\varvec{\Upsilon }^{{\prime }}\varvec{\Psi }^{{*}}\varvec{\Upsilon }|}\right) ^{\frac{1}{2}} \mathrm{{exp}}\left\{ -\frac{1}{2}\varvec{Y}^{\prime }(\varvec{\Psi }^{*}-\varvec{\Psi }^{*}\varvec{\Upsilon }(\varvec{\Upsilon }^{\prime }\varvec{\Psi }^{*}\varvec{\Upsilon })^{-1}\varvec{\Upsilon }^{\prime }\varvec{\Psi }^{*})\varvec{Y}\right\} , \end{aligned}$$

where \(\varvec{C}_1 \) is a constant that does not depend on \(\varvec{\Lambda }\). Again, we set the variance of the Gamma distribution to be large to represent vague information. So, in the limit as we let \(\nu \rightarrow 0\) and \(v_j \rightarrow \infty \), the above posterior distribution becomes

$$\begin{aligned}&{\pi (\varvec{\Lambda }|\varvec{I,X,Z,Y,}\varvec{V}_\beta )} \\&= {\varvec{C}_1 \left[ \mathop \prod \nolimits _{j=2}^m \,\,(\sigma _j^{-2} )^{-1}\right] \left( \frac{|\varvec{\Psi }^{*}|}{|\varvec{\Upsilon }^{\prime }\varvec{\Psi }^{*}\varvec{\Upsilon }|}\right) ^{\frac{1}{2}}\mathrm{{exp}}\left\{ -\frac{1}{2}\varvec{Y}^{\prime }(\varvec{\Psi }^{*}-\varvec{\Psi }^{*}\varvec{\Upsilon }(\varvec{\Upsilon }^{\prime }\varvec{\Psi }^{*}\varvec{\Upsilon })^{-1}\varvec{\Upsilon }^{\prime }\varvec{\Psi }^{*})\varvec{Y}\right\} .} \end{aligned}$$

Let

$$\begin{aligned} {f(\sigma ^{-2})}&= \mathop \prod \nolimits _{j=2}^m \,\,(\sigma _j^{-2} )^{-1} \\ {g(\sigma ^{-2})}&= \left( \frac{|\varvec{\Psi }^{*}|}{|\varvec{\Upsilon }^{\prime }\varvec{\Psi }^{*}\varvec{\Upsilon }|}\right) ^{\frac{1}{2}}\mathrm{{exp}}\left\{ -\frac{1}{2}\varvec{Y}^{\prime }(\varvec{\Psi }^{*}-\varvec{\Psi }^{*}\varvec{\Upsilon }(\varvec{\Upsilon }^{\prime }\varvec{\Psi }^{*}\varvec{\Upsilon })^{-1}\varvec{\Upsilon }^{\prime }\varvec{\Psi }^{*})\varvec{Y}\right\} . \end{aligned}$$

Then,

$$\begin{aligned}&\int \,\,\pi (\varvec{\Lambda }|\varvec{I,X,Z,Y,}\varvec{V}_\beta )\mathrm{{d}}\varvec{\Lambda }\\&= \int \nolimits _0^\infty \,\,\cdots \int \nolimits _0^\infty \,\,\pi (\sigma _2^{-2} ,\ldots ,\sigma _m^{-2} |\varvec{I,X,Z,Y,}\varvec{V}_\beta )\mathrm{{d}}\sigma _2^{-2} \ldots \mathrm{{d}}\sigma _m^{-2} \\&\ge \int \nolimits _w^\infty \,\,\ldots \int \nolimits _w^\infty \,\,\pi (\sigma _2^{-2} ,\ldots ,\sigma _m^{-2} |\varvec{I,X,Z,Y,}\varvec{V}_\beta )\mathrm{{d}}\sigma _2^{-2} \ldots \mathrm{{d}}\sigma _m^{-2} ,\,\,\quad (\forall w>0)\\&= \varvec{C}_1 \int \nolimits _w^\infty \,\,\ldots \int \nolimits _w^\infty \,\,f(\sigma ^{-2})g(\sigma ^{-2})\mathrm{{d}}\sigma _2^{-2} \ldots \mathrm{{d}}\sigma _m^{-2} . \end{aligned}$$

Let \(F\) be the region defined by \(\{\sigma _j^{-2} >w,\forall j=2,\ldots ,m\}\). It is obvious that \(g(\sigma ^{-2})>0\) over \(F\), with the only possible exception when \(\sigma _j^{-2} \) approaches infinity. However, we show that \(g(\sigma ^{-2})\ne 0\) in the limiting case. First, when \(\sigma _j^{-2} \rightarrow \infty \,\,(j=2,\ldots ,m)\), \((\tilde{{\mathbf {X}}}_h \varvec{V}_\beta \tilde{{\mathbf {X}}}_h ^{{\prime }}+\varvec{\Lambda })^{-1}\rightarrow (\tilde{{\mathbf {X}}}_h \varvec{V}_\beta \tilde{{\mathbf {X}}}_h ^{{\prime }}+\ddot{\varvec{E}})^{-1}\) for each \(h=1,\ldots ,H\), where \(\ddot{\varvec{E}}\) is an \(m\times m\) zero matrix except that its \((1,1)\) component is \(1\). Then,

$$\begin{aligned} \varvec{\Psi }^{*}\rightarrow \left( {{\begin{array}{lll} {(\tilde{{\mathbf {X}}}_1 \varvec{V}_{\varvec{\beta }} \tilde{{\mathbf {X}}}_1 ^{\prime }+\ddot{\varvec{E}})^{-1}} &{} &{} \\ &{} \ddots &{} \\ &{} &{} {(\tilde{{\mathbf {X}}}_H \varvec{V}_{\varvec{\beta }} \tilde{{\mathbf {X}}}_H ^{\prime }+\ddot{\varvec{E}})^{-1}} \\ \end{array} }} \right) \mathop {=}\limits ^{\Delta } \varvec{\Psi }_\infty ^{*} . \end{aligned}$$

Therefore,

$$\begin{aligned} g(\sigma ^{-2})\rightarrow \left( \frac{|\varvec{\Psi }_\infty ^{*} |}{|\varvec{\Upsilon }^{\prime }\varvec{\Psi }_\infty ^*\varvec{\Upsilon }|}\right) ^{\frac{1}{2}}\mathrm{{exp}}\left\{ -\frac{1}{2}\varvec{Y}^{\prime }(\varvec{\Psi }_\infty ^*-\varvec{\Psi }_\infty ^*\varvec{\Upsilon }(\varvec{\Upsilon }^{\prime }\varvec{\Psi }_\infty ^*\varvec{\Upsilon })^{-1}\varvec{\Upsilon }^{\prime }\varvec{\Psi }_\infty ^*)\varvec{Y}\right\} >0. \end{aligned}$$

Hence, \(\min \nolimits _{\sigma ^{-2}\in F} g(\sigma ^{-2})=m_0 \), where \(0<m_0 <\infty \). Thus,

$$\begin{aligned} \int \pi (\varvec{\Lambda }|\varvec{I,X,Z,Y,}\varvec{V}_\beta )d\varvec{\Lambda }&\ge {\varvec{C}_1 \int \nolimits _w^\infty \,\,\ldots \int \nolimits _w^\infty \,\,f(\sigma ^{-2})g(\sigma ^{-2})\mathrm{{d}}\sigma _2^{-2} \ldots \mathrm{{d}}\sigma _m^{-2} }\\&\ge {m_0 \varvec{C}_1 \int \nolimits _w^\infty \,\,\cdots \int \nolimits _w^\infty \,\,\left[ \mathop \prod \nolimits _{j=2}^m \,\,(\sigma _j^{-2} )^{-1}\right] \mathrm{{d}}\sigma _2^{-2} \ldots \mathrm{{d}}\sigma _m^{-2} }\\&= {m_0 \varvec{C}_1 \mathop \prod \nolimits _{j=2}^m \,\,[\hbox {ln}\sigma _j^{-2} |_w^\infty ]=\infty .} \end{aligned}$$

Because \(\int \pi (\varvec{\Lambda }|\varvec{I,X,Z,Y,}\varvec{V}_\beta )\mathrm{{d}}\varvec{\Lambda }\) diverges, \(\pi (\varvec{\Lambda }|\varvec{I,X,Z,Y,}\varvec{V}_\beta )\) is not a proper distribution, and so the joint posterior distribution is improper.


Cite this article

Fong, D.K.H., Kim, S., Chen, Z. et al. A Bayesian Multinomial Probit Model for the Analysis of Panel Choice Data. Psychometrika 81, 161–183 (2016). https://doi.org/10.1007/s11336-014-9437-6
