Bayesian Semiparametric Longitudinal Inverse-Probit Mixed Models for Category Learning


Abstract

Understanding how the adult human brain learns novel categories is an important problem in neuroscience. Drift-diffusion models are popular in such contexts for their ability to mimic the underlying neural mechanisms. One such model for gradual longitudinal learning was recently developed in Paulon et al. (J Am Stat Assoc 116:1114–1127, 2021). In practice, however, category response accuracies are often the only reliable measure recorded by behavioral scientists to describe human learning. To our knowledge, drift-diffusion models for such scenarios have never been considered in the literature before. To address this gap, in this article, we build carefully on Paulon et al. (2021), but now with latent response times integrated out, to derive a novel biologically interpretable class of ‘inverse-probit’ categorical probability models for observed categories alone. This new marginal model, however, presents significant identifiability and inferential challenges not encountered originally for the joint model in Paulon et al. (2021). We address these new challenges using a novel projection-based approach with a symmetry-preserving identifiability constraint that allows us to work with conjugate priors in an unconstrained space. We adapt the model for group and individual-level inference in longitudinal settings. Building again on the model’s latent variable representation, we design an efficient Markov chain Monte Carlo algorithm for posterior computation. We evaluate the empirical performance of the method through simulation experiments. The practical efficacy of the method is illustrated in applications to longitudinal tone learning studies.



Notes

  1. We can see this in a simpler example. Suppose we are interested in generating a sample from the conditional distribution of \({\tau }=(\tau _{1},\tau _{2})\) given \(d=\arg \min _{j} \tau _{j}=1\), where \(\tau _{i} \sim \texttt{Uniform}(0,1)\), \(i=1,2\), independently. The conditional density of \({\tau }\) given \(d=1\) is \(f_{{\tau }\mid d} (\tau _{1},\tau _{2})= 2\) if \(0<\tau _{1}\le \tau _{2}<1\), and \(=0\) otherwise. However, if we first draw \(\tau _{1}\) from \(\texttt{Uniform}(0,1)\), let that realization be \(\tau ^{\star }\), and then draw \(\tau _{2}\) from the \(\texttt{Uniform}(0,1)\) distribution left-truncated at \(\tau ^{\star }\), then the joint density of the realized \((\tau _{1},\tau _{2})\) is \((1-\tau ^{\star })^{-1}\), which is not the target conditional density; the sketch below illustrates the discrepancy.
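
To see the bias concretely, here is a minimal numpy sketch (ours, not from the paper) comparing the two schemes: under the true conditional distribution \(E(\tau _{1}\mid d=1)=1/3\), whereas the sequential scheme leaves \(\tau _{1}\) marginally \(\texttt{Uniform}(0,1)\) with mean 1/2.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000

# Correct scheme: rejection sampling from the joint conditional given d = 1,
# i.e., keep only the draws with tau1 <= tau2.
t = rng.uniform(size=(n, 2))
accepted = t[t[:, 0] <= t[:, 1]]

# Naive sequential scheme: tau1 ~ Uniform(0,1), then tau2 ~ Uniform(tau1, 1).
t1 = rng.uniform(size=n)
t2 = rng.uniform(low=t1, high=1.0)

print(accepted[:, 0].mean())  # approx 1/3, the true E(tau1 | d = 1)
print(t1.mean())              # approx 1/2: tau1 stays marginally uniform
```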

References

  • Agresti, A. (2018). An introduction to categorical data analysis. Wiley.

  • Albert, J. H., & Chib, S. (1993). Bayesian analysis of binary and polychotomous response data. Journal of the American Statistical Association, 88, 669–679.

  • Ashby, F. G., Noble, S., Filoteo, J. V., Waldron, E. M., & Ell, S. W. (2003). Category learning deficits in Parkinson’s disease. Neuropsychology, 17, 115.

  • Beck, A. (2017). First-order methods in optimization. SIAM.

  • Bogacz, R., Wagenmakers, E.-J., Forstmann, B. U., & Nieuwenhuis, S. (2010). The neural basis of the speed-accuracy tradeoff. Trends in Neurosciences, 33, 10–16.

  • Borooah, V. K. (2002). Logit and probit: Ordered and multinomial models. Sage.

  • Brody, C. D., & Hanks, T. D. (2016). Neural underpinnings of the evidence accumulator. Current Opinion in Neurobiology, 37, 149–157.

  • Brown, S. D., & Heathcote, A. (2008). The simplest complete model of choice response time: Linear ballistic accumulation. Cognitive Psychology, 57, 153–178.

  • Burgette, L. F., & Nordheim, E. V. (2012). The trace restriction: An alternative identification strategy for the Bayesian multinomial probit model. Journal of Business & Economic Statistics, 30, 404–410.

  • Burgette, L. F., Puelz, D., & Hahn, P. R. (2021). A symmetric prior for multinomial probit models. Bayesian Analysis, 16, 991–1008.

  • Cavanagh, J. F., Wiecki, T. V., Cohen, M. X., Figueroa, C. M., Samanta, J., Sherman, S. J., & Frank, M. J. (2011). Subthalamic nucleus stimulation reverses mediofrontal influence over decision threshold. Nature Neuroscience, 14, 1462.

  • Chandrasekaran, B., Yi, H.-G., & Maddox, W. T. (2014). Dual-learning systems during speech category learning. Psychonomic Bulletin & Review, 21, 488–495.

  • Chandrasekaran, B., Yi, H.-G., Smayda, K. E., & Maddox, W. T. (2016). Effect of explicit dimensional instruction on speech category learning. Attention, Perception, & Psychophysics, 78, 566–582.

  • Chhikara, R. (1988). The inverse Gaussian distribution: Theory, methodology, and applications. CRC Press.

  • Chib, S., & Greenberg, E. (1998). Analysis of multivariate probit models. Biometrika, 85, 347–361.

  • Cox, D. R., & Miller, H. D. (1965). The theory of stochastic processes. CRC Press.

  • de Boor, C. (1978). A practical guide to splines. Springer.

  • Deo, S. (2018). Algebraic topology. Texts and Readings in Mathematics (Vol. 27). Hindustan Book Agency.

  • Ding, L., & Gold, J. I. (2013). The basal ganglia’s contributions to perceptual decision making. Neuron, 79, 640–649.

  • Duchi, J., Shalev-Shwartz, S., Singer, Y., & Chandra, T. (2008). Efficient projections onto the l1-ball for learning in high dimensions. In Proceedings of the 25th international conference on machine learning (pp. 272–279).

  • Dufau, S., Grainger, J., & Ziegler, J. C. (2012). How to say “no” to a nonword: A leaky competing accumulator model of lexical decision. Journal of Experimental Psychology: Learning, Memory, and Cognition, 38, 1117.

  • Dunson, D. B., & Neelon, B. (2003). Bayesian inference on order-constrained parameters in generalized linear models. Biometrics, 59, 286–295.

  • Eilers, P. H., & Marx, B. D. (1996). Flexible smoothing with B-splines and penalties. Statistical Science, 11, 89–102.

  • Filoteo, J. V., Lauritzen, S., & Maddox, W. T. (2010). Removing the frontal lobes: The effects of engaging executive functions on perceptual category learning. Psychological Science, 21, 415–423.

  • Glimcher, P. W., & Fehr, E. (2013). Neuroeconomics: Decision making and the brain. Academic Press.

  • Gold, J. I., & Shadlen, M. N. (2007). The neural basis of decision making. Annual Review of Neuroscience, 30, 535–574.

  • Gunn, L. H., & Dunson, D. B. (2005). A transformation approach for incorporating monotone or unimodal constraints. Biostatistics, 6, 434–449.

  • Heekeren, H. R., Marrett, S., Bandettini, P. A., & Ungerleider, L. G. (2004). A general mechanism for perceptual decision-making in the human brain. Nature, 431, 859.

  • Hubert, L., & Arabie, P. (1985). Comparing partitions. Journal of Classification, 2, 193–218.

  • Johndrow, J., Dunson, D., & Lum, K. (2013). Diagonal orthant multinomial probit models. In Artificial intelligence and statistics (pp. 29–38).

  • Kim, S., Potter, K., Craigmile, P. F., Peruggia, M., & Van Zandt, T. (2017). A Bayesian race model for recognition memory. Journal of the American Statistical Association, 112, 77–91.

  • Lau, J. W., & Green, P. J. (2007). Bayesian model-based clustering procedures. Journal of Computational and Graphical Statistics, 16, 526–558.

  • Leite, F. P., & Ratcliff, R. (2010). Modeling reaction time and accuracy of multiple-alternative decisions. Attention, Perception, & Psychophysics, 72, 246–273.

  • Llanos, F., McHaney, J. R., Schuerman, W. L., Yi, H. G., Leonard, M. K., & Chandrasekaran, B. (2020). Non-invasive peripheral nerve stimulation selectively enhances speech category learning in adults. NPJ Science of Learning, 1, 1–11.

  • Lu, J. (1995). Degradation processes and related reliability models. PhD thesis, McGill University, Montreal, Canada.

  • McHaney, J. R., Tessmer, R., Roark, C. L., & Chandrasekaran, B. (2021). Working memory relates to individual differences in speech category learning: Insights from computational modeling and pupillometry. Brain and Language, 22, 1–15.

  • Milosavljevic, M., Malmaud, J., Huth, A., Koch, C., & Rangel, A. (2010). The drift diffusion model can account for the accuracy and reaction time of value-based choices under high and low time pressure. Judgment and Decision Making, 5, 437–449.

  • Morris, J. S. (2015). Functional regression. Annual Review of Statistics and its Application, 2, 321–359.

  • Parthasarathy, A., Hancock, K. E., Bennett, K., DeGruttola, V., & Polley, D. B. (2020). Bottom-up and top-down neural signatures of disordered multi-talker speech perception in adults with normal hearing. Elife, 9, e51419.

  • Paulon, G., Llanos, F., Chandrasekaran, B., & Sarkar, A. (2021). Bayesian semiparametric longitudinal drift-diffusion mixed models for tone learning in adults. Journal of the American Statistical Association, 116, 1114–1127.

  • Peelle, J. E. (2018). Listening effort: How the cognitive consequences of acoustic challenge are reflected in brain and behavior. Ear and Hearing, 39, 204–214.

  • Purcell, B. A. (2013). Neural mechanisms of perceptual decision making. Vanderbilt University.

  • Ramsay, J. O., & Silverman, B. W. (2007). Applied functional data analysis: Methods and case studies. Springer.

  • Rand, W. M. (1971). Objective criteria for the evaluation of clustering methods. Journal of the American Statistical Association, 66, 846–850.

  • Ratcliff, R. (1978). A theory of memory retrieval. Psychological Review, 85, 59.

  • Ratcliff, R., & McKoon, G. (2008). The diffusion decision model: Theory and data for two-choice decision tasks. Neural Computation, 20, 873–922.

  • Ratcliff, R., & Rouder, J. N. (1998). Modeling response times for two-choice decisions. Psychological Science, 9, 347–356.

  • Ratcliff, R., Smith, P. L., Brown, S. D., & McKoon, G. (2016). Diffusion decision model: Current issues and history. Trends in Cognitive Sciences, 20, 260–281.

  • Reetzke, R., Xie, Z., Llanos, F., & Chandrasekaran, B. (2018). Tracing the trajectory of sensory plasticity across different stages of speech learning in adulthood. Current Biology, 28, 1419–1427.

  • Roark, C. L., Smayda, K. E., & Chandrasekaran, B. (2021). Auditory and visual category learning in musicians and nonmusicians. Journal of Experimental Psychology: General.

  • Robert, C. P., & Casella, G. (2004). Monte Carlo statistical methods. Springer Texts in Statistics (2nd ed.). Springer.

  • Robison, M. K., & Unsworth, N. (2019). Pupillometry tracks fluctuations in working memory performance. Attention, Perception, & Psychophysics, 81, 407–419.

  • Ross, S. M., Kelly, J. J., Sullivan, R. J., Perry, W. J., Mercer, D., Davis, R. M., Washburn, T. D., Sager, E. V., Boyce, J. B., & Bristow, V. L. (1996). Stochastic processes. Wiley.

  • Rudin, W. (1991). Functional analysis. International Series in Pure and Applied Mathematics (2nd ed.). McGraw-Hill Inc.

  • Schall, J. D. (2001). Neural basis of deciding, choosing and acting. Nature Reviews Neuroscience, 2, 33.

  • Sen, D., Patra, S., & Dunson, D. (2018). Constrained inference through posterior projections. arXiv preprint arXiv:1812.05741.

  • Smayda, K. E., Chandrasekaran, B., & Maddox, W. T. (2015). Enhanced cognitive and perceptual processing: A computational basis for the musician advantage in speech learning. Frontiers in Psychology, 1–14.

  • Smith, P. L., & Ratcliff, R. (2004). Psychology and neurobiology of simple decisions. Trends in Neurosciences, 27, 161–168.

  • Smith, P. L., & Vickers, D. (1988). The accumulator model of two-choice discrimination. Journal of Mathematical Psychology, 32, 135–168.

  • Usher, M., & McClelland, J. L. (2001). The time course of perceptual choice: The leaky, competing accumulator model. Psychological Review, 108, 550.

  • Wade, S. (2023). Bayesian cluster analysis. Philosophical Transactions of the Royal Society A, 381, 1–20.

  • Wang, J.-L., Chiou, J.-M., & Müller, H.-G. (2016). Functional data analysis. Annual Review of Statistics and its Application, 3, 257–295.

  • Wang, Y., Jongman, A., & Sereno, J. A. (2003). Acoustic and perceptual evaluation of Mandarin tone productions before and after perceptual training. The Journal of the Acoustical Society of America, 113, 1033–1043.

  • Wang, Y., Spence, M. M., Jongman, A., & Sereno, J. A. (1999). Training American listeners to perceive Mandarin tones. The Journal of the Acoustical Society of America, 106, 3649–3658.

  • Whitmore, G., & Seshadri, V. (1987). A heuristic derivation of the inverse Gaussian distribution. The American Statistician, 41, 280–281.

  • Winn, M. B., Wendt, D., Koelewijn, T., & Kuchinsky, S. E. (2018). Best practices and advice for using pupillometry to measure listening effort: An introduction for those who want to get started. Trends in Hearing, 22, 1–32.

  • Zekveld, A. A., Kramer, S. E., & Festen, J. M. (2011). Cognitive load during speech perception in noise: The influence of age, hearing loss, and cognition on the pupil response. Ear and Hearing, 32, 498–510.

Funding

This research was funded by the National Science Foundation grant DMS 1953712 and National Institute on Deafness and Other Communication Disorders Grants R01DC013315 and R01DC015504 awarded to Sarkar and Chandrasekaran.

Author information

Correspondence to Abhra Sarkar.


Supplementary Information

Below is the link to the electronic supplementary material.

Supplementary file 1 (pdf 401 KB)

Appendices

Appendix A: Proof of Lemma 1

Proof

It is easy to check that the offset parameters \(\delta _{s}\) are not identifiable since

$$\begin{aligned} P(d \mid s,\delta _{s},{\mu }_{1:d_{0},s}, {{\textbf{b}}}_{1:d_{0},s}) &= \int _{\delta _{s}}^{\infty } g(\tau \mid \delta _{s},\mu _{d,s},b_{d,s}) \prod _{d^{\prime } \ne d} \left\{ 1 - G(\tau \mid \delta _{s},\mu _{d^{\prime },s},b_{d^{\prime },s})\right\} d\tau \\ &= \int _{0}^{\infty } g(\tau \mid 0,\mu _{d,s},b_{d,s}) \prod _{d^{\prime } \ne d} \left\{ 1 - G(\tau \mid 0,\mu _{d^{\prime },s},b_{d^{\prime },s})\right\} d\tau \\ &= P(d \mid s, 0,{\mu }_{1:d_{0},s},{{\textbf{b}}}_{1:d_{0},s}). \end{aligned}$$

Next we will show that the drift parameters and decision boundaries are not separately identifiable, even if we fix offset parameters to a constant.

First note that Eq. (3) can also be represented as

$$\begin{aligned} \int _{\delta _{s}}^{\infty } \ldots \int _{\delta _{s}}^{\infty } \prod _{d^{\prime }\ne d} g(\tau _{d^{\prime }} \mid {\theta }_{d^{\prime },s}) \int _{\delta _{s}}^{\wedge _{d^{\prime }\ne d}\tau _{d^{\prime }} } g(\tau _{d} \mid {\theta }_{d,s}) d\tau _{d} \prod _{d^{\prime }\ne d} d\tau _{d^{\prime }}. \end{aligned}$$
(A.1)

Observe that \(\tau ^{\star }=\wedge _{d^{\prime }\ne d}\tau _{d^{\prime }} =\tau _{-1}^{\star }\wedge \tau _{1}\), where \(\tau _{-1}^{\star }=\wedge _{d^{\prime }\ne \{1,d\}}\tau _{d^{\prime }}\). Thus the integral above can be written as

$$\begin{aligned}&\int _{\delta _{s}}^{\infty } \ldots \int _{\delta _{s}}^{\infty } \prod _{d^{\prime }\ne \{1,d\}} g(\tau _{d^{\prime }} \mid {\theta }_{d^{\prime },s}) \left\{ \int _{\delta _{s}}^{\infty } g(\tau _{1} \mid {\theta }_{1,s}) \int _{\delta _{s}}^{\tau _{-1}^{\star } \wedge \tau _{1} } g(\tau _{d} \mid {\theta }_{d,s}) d\tau _{d} d\tau _{1}\right\} \prod _{d^{\prime }\ne \{1,d\}} d\tau _{d^{\prime }}\\&\quad =\int _{\delta _{s}}^{\infty } \ldots \int _{\delta _{s}}^{\infty } \prod _{d^{\prime }\ne \{1,d\}} g(\tau _{d^{\prime }} \mid {\theta }_{d^{\prime },s}) \left\{ \int _{\delta _{s}}^{\tau _{-1}^{\star }} g(\tau _{d} \mid {\theta }_{d,s}) \int _{\tau _{d}}^{\infty } g(\tau _{1} \mid {\theta }_{1,s}) d\tau _{1} d\tau _{d} \right\} \prod _{d^{\prime }\ne \{1,d\}} d\tau _{d^{\prime }}. \end{aligned}$$

Proceeding sequentially, one can show that the integral above is the same as in (3).

Using the above, we express the probability in (3) as in (A.1). As the offset parameter \(\delta _{s}\) has already been shown to be non-identifiable, it must be fixed; without loss of generality, we set \(\delta _{s}=0\). The probability density function of the inverse Gaussian distribution with parameters \({\theta }_{d^{\prime },s}=(\mu _{d^{\prime },s},b_{d^{\prime },s})\), evaluated at \(\tau _{d^{\prime }}\), namely \(g(\tau _{d^{\prime }}\mid {\theta }_{d^{\prime },s})\), is obtained from (1) by setting \(\delta _{s}=0\) and \(d=d^{\prime }\).

Consider the transformation of \(\tau _{d^{\prime }}\) to \(\tau _{d^{\prime }}^{\star }\) as \(\tau _{d^{\prime }}=c^2\tau _{d^{\prime }}^{\star }\), for some constant \(c>0\), and for all \(d^{\prime }\). Further, define \(b_{d^{\prime },s}^{\star }=b_{d^{\prime },s}/c\) and \(\mu _{d^{\prime },s}^{\star }=c\mu _{d^{\prime },s}\), for all \(d^{\prime }\). Then observe that

$$\begin{aligned} g(\tau _{d^{\prime }}\mid {\theta }_{d^{\prime },s}) d\tau _{d^{\prime }}&= (2\pi )^{-1/2} b_{d^{\prime },s}^{\star } (\tau _{d^{\prime }}^{\star })^{-3/2}\exp \left\{ -(2\tau _{d^{\prime }}^{\star })^{-1} \left( b_{d^{\prime },s}^{\star } -\mu _{d^{\prime },s}^{\star } \tau _{d^{\prime }}^{\star } \right) ^{2} \right\} d\tau _{d^{\prime }}^{\star }\\&=g(\tau _{d^{\prime }}^{\star }\mid {\theta }_{d^{\prime },s}^{\star } ) d\tau _{d^{\prime }}^{\star }, \end{aligned}$$

where \(g(\tau _{d^{\prime }}^{\star }\mid {\theta }_{d^{\prime },s}^{\star } )\) is the pdf of inverse Gaussian distribution with parameters \(\mu _{d^{\prime },s}^{\star }\) and \(b_{d^{\prime },s}^{\star }\), evaluated at the point \(\tau _{d^{\prime }}^{\star }\).

Applying this transformation to \(\tau _{d^{\prime }}\) for all \(d^{\prime }\), we find that the integral in (A.1) with \(\delta _{s}=0\) is the same as

$$\begin{aligned} \int _{0}^{\infty } \ldots \int _{0}^{\infty } \prod _{d^{\prime }\ne d} g(\tau _{d^{\prime }}^{\star } \mid {\theta }_{d^{\prime },s}^{\star }) \int _{0}^{\wedge _{d^{\prime }\ne d}\tau _{d^{\prime }}^{\star } } g(\tau _{d}^{\star } \mid {\theta }_{d,s}^{\star }) d\tau _{d}^{\star } \prod _{d^{\prime }\ne d} d\tau _{d^{\prime }}^{\star }. \end{aligned}$$

As c is arbitrary, this shows that the drifts and boundaries are not separately estimable. \(\square \)
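
The scaling invariance can also be checked numerically. The following Python sketch is our own illustration, not part of the paper: it uses scipy's invgauss, whose parameterization invgauss(m, scale=lam) has mean m*lam and shape lam, so the first-passage law with drift \(\mu \) and boundary b (mean \(b/\mu \), shape \(b^{2}\)) corresponds to m = 1/(b\(\mu \)), lam = \(b^{2}\).

```python
import numpy as np
from scipy.stats import invgauss

def category_probs(mu, b, n=200_000, seed=1):
    """Monte Carlo estimate of P(d) = P(tau_d = min_j tau_j) under
    independent inverse Gaussian times with mean b/mu and shape b^2."""
    rng = np.random.default_rng(seed)
    tau = invgauss.rvs(1.0 / (b * mu), scale=b**2,
                       size=(n, len(mu)), random_state=rng)
    return np.bincount(tau.argmin(axis=1), minlength=len(mu)) / n

mu, b, c = np.array([1.0, 1.5, 2.0]), np.array([2.0, 1.0, 1.5]), 3.0
print(category_probs(mu, b))          # some probability vector
print(category_probs(c * mu, b / c))  # identical: all times scale by 1/c^2,
                                      # so the argmin is unchanged
```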

Appendix B: Proof of Theorem 1

Proof

Let \({{\textbf{P}}}({\mu }_{1:d_{0},s})=\{p_{1}({\mu }_{1:d_{0},s}), \dots , p_{d_{0}}({\mu }_{1:d_{0},s}) \}^\textrm{T}\) be the map, given by (4), from \(\mathcal{S}_{0,k}\) to the unit probability simplex \(\Delta ^{d_{0}-1}\). For notational simplicity, we write \({\mu }_{1:d_{0},s} = {\mu }= (\mu _{1}, \dots , \mu _{d_{0}})^\textrm{T}\). We first find the matrix of partial derivatives \(\nabla {{\textbf{P}}}\) with respect to \({\mu }\).

For \({\mu }\in \mathcal{S}_{0,k}\), \( \textbf{1}^{T} {\mu }=k\), and hence the probability reduces to

$$\begin{aligned} p_{d}\left( {\mu }\right) = \frac{\left( be^{b}\right) ^{d_{0}}}{ (2 \pi )^{d_{0}/2}}\int _{0}^{\infty } \int _{\tau _{d}}^{\infty } \cdots \int _{\tau _{d}}^{\infty } |{\tau }|^{-3/2} \exp \left\{ -\frac{1}{2} \left( \textbf{1}^{T} {\tau }^{-1} \textbf{1} + {\mu }^{T} {\tau }{{\mu }} \right) \right\} d{\tau }_{-d} d\tau _{d}, \end{aligned}$$

for \(d=1,\ldots , d_{0}\), where \({\tau }=\textrm{diag}(\tau _{1}, \ldots , \tau _{d_{0}})\), and \({\tau }_{-d}\) is the sub-vector of \({\tau }\) excluding the d-th element. Next, differentiating \(p_{d}\left( {\mu }\right) \) with respect to \({\mu }\), we get

$$\begin{aligned} \frac{\partial {p_{d}\left( {\mu }\right) }}{\partial {\mu }} &= \frac{\left( be^{b}\right) ^{d_{0}}}{ (2 \pi )^{d_{0}/2}}\int _{0}^{\infty } \int _{\tau _{d}}^{\infty } \cdots \int _{\tau _{d}}^{\infty } |{\tau }|^{-3/2} \left( -{\tau }{\mu }\right) \exp \left\{ -\frac{1}{2} \left( \textbf{1}^{T} {\tau }^{-1} \textbf{1} + {\mu }^{T} {\tau }{{\mu }} \right) \right\} d{\tau }_{-d} d\tau _{d} \\ &= \begin{bmatrix} \mu _{1} \eta _{2}&\quad \cdots&\quad \mu _{d-1} \eta _{2}&\quad \mu _{d} \eta _{1}&\quad \mu _{d+1} \eta _{2}&\quad \cdots&\quad \mu _{d_{0}} \eta _{2} \end{bmatrix}^{T}, \end{aligned}$$

where \( \eta _{1} = - E\left\{ \tau _{1} {\mathbb {I}}\left( \tau _{2}> \tau _{1}, \ldots , \tau _{d_{0}}>\tau _{1}\right) \mid {\mu }\right\} \), \( \eta _{2} = -E\left\{ \tau _{2} {\mathbb {I}}\left( \tau _{2}> \tau _{1}, \ldots , \tau _{d_{0}}>\tau _{1}\right) \mid {\mu }\right\} \), and \({\mathbb {I}}(A)\) is the indicator function of the event A. Here the expectation is taken under the joint distribution of \(\left( \tau _{1}, \ldots , \tau _{d_{0}}\right) \), namely independent inverse Gaussians. Since \(\tau _{1}<\tau _{2}\) on the event \(\left\{ \tau _{2}> \tau _{1}, \ldots , \tau _{d_{0}}>\tau _{1}\right\} \), we clearly have \(\eta _{1}>\eta _{2}\).

From the above derivation, it is easy to obtain that

$$\begin{aligned} \nabla {{\textbf{P}}}\left( {\mu }\right) = \begin{bmatrix} \mu _{1} \eta _{1} &{}\quad \mu _{2} \eta _{2} &{}\quad \cdots &{}\quad \mu _{d_{0}} \eta _{2} \\ \mu _{1} \eta _{2} &{}\quad \mu _{2} \eta _{1} &{}\quad \cdots &{}\quad \mu _{d_{0}} \eta _{2} \\ \vdots &{}\quad \vdots &{}\quad \cdots &{}\quad \vdots \\ \mu _{1} \eta _{2} &{}\quad \mu _{2} \eta _{2} &{}\quad \cdots &{}\quad \mu _{d_{0}} \eta _{1} \end{bmatrix} = {{\textbf{M}}}\left\{ \left( \eta _{1}-\eta _{2}\right) I+\eta _{2} \textbf{1} \textbf{1}^{T} \right\} , \end{aligned}$$

where \({{\textbf{M}}}=\textrm{diag}\left( \mu _{1}, \ldots , \mu _{d_{0}}\right) \).

Now, suppose there exist \({\mu }\) and \({\nu }\) in \(\mathcal{S}_{0,k}\) such that \({\mu }\ne {\nu }\) and \({{\textbf{P}}}\left( {\mu }\right) = {{\textbf{P}}}\left( {\nu }\right) \). Define \({\gamma }: [0,1] \rightarrow {\mathbb {R}}^{d_{0}}\) by \({\gamma }(t)= {\mu }+ t \left( {\nu }- {\mu }\right) \), \(t\in [0,1]\), and let \(h(t)= \langle {{\textbf{P}}}\left( {\gamma }(t) \right) - {{\textbf{P}}}\left( {\mu }\right) , {\nu }- {\mu }\rangle \), the inner product of \({{\textbf{P}}}\left( {\gamma }(t) \right) - {{\textbf{P}}}\left( {\mu }\right) \) and \({\nu }-{\mu }\). Then \(h(0)=h(1)=0\) under the hypothesis \({{\textbf{P}}}\left( {\mu }\right) = {{\textbf{P}}}\left( {\nu }\right) \). Therefore, by the mean value theorem, there exists some point \(c\in (0,1)\) such that \(\left. \partial h(t) / \partial t \right| _{t=c} =0\). Now,

$$\begin{aligned} \frac{\partial h(t) }{ \partial t} &= \sum _{d^{\prime }=1}^{d_{0}} \left( \nu _{d^{\prime }} - \mu _{d^{\prime }} \right) \frac{\partial }{\partial t}\left[ p_{d^{\prime }} \left\{ {\gamma }(t) \right\} - p_{d^{\prime }} \left( {\mu }\right) \right] \\ &= \sum _{d^{\prime }=1}^{d_{0}} \left( \nu _{d^{\prime }} - \mu _{d^{\prime }} \right) \left\{ \frac{\partial }{\partial {\gamma }} p_{d^{\prime }} \left( {\gamma }\right) \right\} ^{T} \frac{\partial {\gamma }(t)}{\partial t} \\ &= \left( {\nu }-{\mu }\right) ^{T} \nabla {{\textbf{P}}}\{{\gamma }(t)\} \left( {\nu }-{\mu }\right) \\ &= \left( \eta _{1}-\eta _{2}\right) \left( {\nu }-{\mu }\right) ^{T} {\Gamma }(t) \left( {\nu }-{\mu }\right) + \eta _{2} \left( {\nu }-{\mu }\right) ^{T} {\Gamma }(t) \textbf{1} \textbf{1}^{T} \left( {\nu }-{\mu }\right) \\ &= \left( \eta _{1}-\eta _{2}\right) \left( {\nu }-{\mu }\right) ^{T} {\Gamma }(t) \left( {\nu }-{\mu }\right) , \end{aligned}$$

as \(\textbf{1}^{T} \left( {\nu }-{\mu }\right) =0\), where \({\Gamma }(t)=\textrm{diag}\{ {\gamma }(t) \}\).

As every component of \({\mu }\) and \({\nu }\) is positive, the matrix \({\Gamma }(c)\) is positive definite for any \(c\in (0,1)\). Further, as \(\eta _{1}>\eta _{2}\), \(\left. \partial h(t) / \partial t \right| _{t=c} =0\) only if \( {\mu }={\nu }\), which contradicts the assumption \({\mu }\ne {\nu }\). \(\square \)
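
For intuition, the injectivity claim can also be probed numerically. The following sketch is ours, not part of the proof; the common boundary b = 1 and the sum k = 3 are arbitrary choices. Two distinct drift vectors with the same sum yield visibly different category-probability vectors.

```python
import numpy as np
from scipy.stats import invgauss

# Distinct drift vectors on {1'mu = k, mu > 0} with a common boundary b
# produce distinct category-probability vectors (Theorem 1).
n, b = 500_000, 1.0
for mu in (np.array([0.5, 1.0, 1.5]), np.array([1.0, 1.0, 1.0])):
    rng = np.random.default_rng(2)  # shared seed for a fair comparison
    tau = invgauss.rvs(1.0 / (b * mu), scale=b**2,
                       size=(n, 3), random_state=rng)
    print(mu, np.bincount(tau.argmin(axis=1), minlength=3) / n)
```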

Appendix C: Algorithm for Minimal Distance Mapping

The problem of finding the projection of a point \({\mu }\) onto the space \(\mathcal{S}_{k,\varepsilon }\) is equivalent to the following nonlinear optimization problem:

$$\begin{aligned} \textrm{minimize}_{{{\textbf{w}}}} \Vert {{{\textbf{w}}}} -{\mu }\Vert ^2 \quad \text{ such } \text{ that } \quad \sum _{i=1}^{d_{0}} w_{i}=k,\quad w_{i} \ge \varepsilon . \end{aligned}$$

Duchi et al. (2008, Algorithm 1) provide a solution to the problem of projecting a given point \({\mu }\) onto the space \(\mathcal{S}_{k,\varepsilon }\) for \(\varepsilon =0\); the algorithm below modifies it for arbitrary \(\varepsilon \).

[Algorithm: Euclidean projection onto \(\mathcal{S}_{k,\varepsilon }\), a modification of Duchi et al. (2008, Algorithm 1); see the sketch below.]
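
For concreteness, here is a Python sketch of this minimal distance mapping (our implementation of the standard reduction, with `project_onto_S` a name we introduce, not the paper's): shift the coordinates by \(\varepsilon \), apply the simplex projection of Duchi et al. (2008, Algorithm 1) with total mass \(k-d_{0}\varepsilon \), and shift back.

```python
import numpy as np

def project_onto_S(mu, k, eps):
    """Euclidean projection of mu onto {w : sum(w) = k, w_i >= eps}."""
    mu = np.asarray(mu, dtype=float)
    d0 = mu.size
    z = k - d0 * eps
    assert z > 0, "need k > d0 * eps for a nondegenerate feasible set"
    v = mu - eps                                   # shifted coordinates
    u = np.sort(v)[::-1]                           # sort descending
    css = np.cumsum(u)
    j = np.arange(1, d0 + 1)
    rho = np.nonzero(u - (css - z) / j > 0)[0][-1]  # largest feasible index
    theta = (css[rho] - z) / (rho + 1)              # water-filling threshold
    return np.maximum(v - theta, 0.0) + eps
```

For example, project_onto_S(np.array([3.0, 0.0, 0.0]), k=3, eps=0.1) returns [2.8, 0.1, 0.1], which sums to 3 with every coordinate at least 0.1.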

Appendix D: Proof of Lemma 3

Proof

We take the unconditional distribution of \(\tau _{1:d_{0}}\) given the parameters \(\mu _{1:d_{0}}\) as the proposal distribution g. Clearly, the proposal distribution g and the target conditional joint distribution f satisfy \(f(\tau _{1:d_{0}}\mid \mu _{1:d_{0}})/g(\tau _{1:d_{0}}\mid \mu _{1:d_{0}})\le M\), where \(M^{-1}=P\left( \tau _{d} \le \tau _{1:d_{0}}\right) \). Therefore, for any random sample \(U\sim \texttt{Uniform}(0,1)\), \(f(\tau _{1:d_{0}}\mid \mu _{1:d_{0}})\ge M U g(\tau _{1:d_{0}}\mid \mu _{1:d_{0}})\) if the sample satisfies the condition \(\tau _{d} \le \tau _{1:d_{0}}\), and \(f(\tau _{1:d_{0}}\mid \mu _{1:d_{0}})< M U g(\tau _{1:d_{0}}\mid \mu _{1:d_{0}})\) otherwise. Hence, by Lemma 2.3.1 of Robert and Casella (2004), the algorithm above produces samples from the target distribution. \(\square \)
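
In code, the accept–reject step amounts to drawing the race times from their unconditional inverse Gaussian laws and keeping a draw only when category d indeed finishes first. A sketch follows (ours, with hypothetical names; it assumes the mean-\(b/\mu \), shape-\(b^{2}\) parameterization, drawn via scipy's invgauss(m, scale=lam), which has mean m*lam and shape lam).

```python
import numpy as np
from scipy.stats import invgauss

def sample_latent_times(d, mu, b, rng, max_tries=100_000):
    """Accept-reject sampler of Lemma 3: propose tau_{1:d0} from independent
    inverse Gaussians (mean b/mu, shape b^2); accept iff tau_d is the minimum,
    i.e., category d is reached first."""
    for _ in range(max_tries):
        tau = invgauss.rvs(1.0 / (b * mu), scale=b**2, random_state=rng)
        if np.argmin(tau) == d:
            return tau
    raise RuntimeError("acceptance event too rare; increase max_tries")

rng = np.random.default_rng(0)
print(sample_latent_times(0, np.array([1.0, 2.0]), np.array([1.5, 1.0]), rng))
```

The expected number of proposals per accepted draw is \(M = 1/P\left( \tau _{d} \le \tau _{1:d_{0}}\right) \), the reciprocal of the corresponding category probability, so the sampler is efficient except for very rarely observed categories.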

About this article

Cite this article

Mukhopadhyay, M., McHaney, J.R., Chandrasekaran, B. et al. Bayesian Semiparametric Longitudinal Inverse-Probit Mixed Models for Category Learning. Psychometrika (2024). https://doi.org/10.1007/s11336-024-09947-8

