Bayesian Semiparametric Longitudinal Inverse-Probit Mixed Models for Category Learning

Mukhopadhyay, Minerva; McHaney, Jacie R.; Chandrasekaran, Bharath; Sarkar, Abhra

doi:10.1007/s11336-024-09947-8


Theory & Methods
Published: 19 February 2024


Psychometrika

Minerva Mukhopadhyay¹, Jacie R. McHaney², Bharath Chandrasekaran² & Abhra Sarkar³ (ORCID: orcid.org/0000-0002-6924-8464)


Abstract

Understanding how the adult human brain learns novel categories is an important problem in neuroscience. Drift-diffusion models are popular in such contexts for their ability to mimic the underlying neural mechanisms. One such model for gradual longitudinal learning was recently developed in Paulon et al. (J Am Stat Assoc 116:1114–1127, 2021). In practice, category response accuracies are often the only reliable measure recorded by behavioral scientists to describe human learning. To our knowledge, however, drift-diffusion models for such scenarios have not been considered in the literature. To address this gap, in this article, we build carefully on Paulon et al. (J Am Stat Assoc 116:1114–1127, 2021), but now with latent response times integrated out, to derive a novel biologically interpretable class of ‘inverse-probit’ categorical probability models for observed categories alone. However, this new marginal model presents significant identifiability and inferential challenges not encountered originally for the joint model in Paulon et al. (J Am Stat Assoc 116:1114–1127, 2021). We address these new challenges using a novel projection-based approach with a symmetry-preserving identifiability constraint that allows us to work with conjugate priors in an unconstrained space. We adapt the model for group and individual-level inference in longitudinal settings. Building again on the model’s latent variable representation, we design an efficient Markov chain Monte Carlo algorithm for posterior computation. We evaluate the empirical performance of the method through simulation experiments. The practical efficacy of the method is illustrated in applications to longitudinal tone learning studies.


Algorithm 1


Notes

We can see this in a simpler example. Suppose we want to sample from the conditional distribution of ${\tau }=(\tau _{1},\tau _{2})$ given $d=\arg \min _{j} \tau _{j}=1$, where $\tau _{i} \sim \texttt{Uniform}(0,1)$, $i=1,2$, independently. Since $P(d=1)=1/2$, the conditional density of ${\tau }$ given $d=1$ is $f_{{\tau }\mid d} (\tau _{1},\tau _{2})= 2$ if $0<\tau _{1}\le \tau _{2}<1$, and $0$ otherwise. However, if we first draw $\tau _{1}$ from $\texttt{Uniform}(0,1)$, with realization $\tau ^{\star }$, and then draw $\tau _{2}$ from the uniform distribution left-truncated at $\tau ^{\star }$, i.e., $\texttt{Uniform}(\tau ^{\star },1)$, then the joint pdf of the realization of $(\tau _{1},\tau _{2})$ is $(1-\tau ^{\star })^{-1}$. This is not constant over the support and hence differs from the target.
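A short simulation makes the footnote concrete. The sketch below uses plain Python and the standard library only; the function names are illustrative and not from the paper. It compares an exact rejection sampler for $(\tau_1,\tau_2)\mid \tau_1\le\tau_2$ with the naive sequential scheme. Under the exact conditional law, whose density is constant on the triangle $0<\tau_1\le\tau_2<1$, $E(\tau_1)=1/3$ and $E(\tau_2)=2/3$. The sequential scheme instead gives $E(\tau_1)=1/2$ and $E(\tau_2)=3/4$.

```python
import random


def exact_conditional_draw(rng):
    """Rejection sampler: draw (tau1, tau2) iid Uniform(0, 1) until tau1 <= tau2.

    Accepted pairs follow the exact conditional law of (tau1, tau2) given d = 1.
    """
    while True:
        t1, t2 = rng.random(), rng.random()
        if t1 <= t2:
            return t1, t2


def naive_sequential_draw(rng):
    """Draw tau1 ~ Uniform(0, 1), then tau2 ~ Uniform(tau1, 1).

    The pair satisfies tau1 <= tau2, but its joint density is 1 / (1 - tau1),
    so it is NOT the target conditional distribution.
    """
    t1 = rng.random()
    t2 = t1 + (1.0 - t1) * rng.random()
    return t1, t2


def mean_pair(sampler, n, seed):
    """Monte Carlo estimates of (E tau1, E tau2) from n draws of a sampler."""
    rng = random.Random(seed)
    s1 = s2 = 0.0
    for _ in range(n):
        t1, t2 = sampler(rng)
        s1 += t1
        s2 += t2
    return s1 / n, s2 / n


n = 100_000
exact_means = mean_pair(exact_conditional_draw, n, seed=1)
naive_means = mean_pair(naive_sequential_draw, n, seed=2)
print("exact:", exact_means)  # close to (1/3, 2/3)
print("naive:", naive_means)  # close to (1/2, 3/4)
```

The rejection step is the same device used for the multi-category sampler justified in Appendix D. Proposals are drawn from the unconditional law and kept only when the required ordering holds.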

References

Agresti, A. (2018). An introduction to categorical data analysis. Wiley.
Albert, J. H., & Chib, S. (1993). Bayesian analysis of binary and polychotomous response data. Journal of the American Statistical Association, 88, 669–679.
Ashby, F. G., Noble, S., Filoteo, J. V., Waldron, E. M., & Ell, S. W. (2003). Category learning deficits in Parkinson’s disease. Neuropsychology, 17, 115.
Beck, A. (2017). First-order methods in optimization. SIAM.
Bogacz, R., Wagenmakers, E.-J., Forstmann, B. U., & Nieuwenhuis, S. (2010). The neural basis of the speed-accuracy tradeoff. Trends in Neurosciences, 33, 10–16.
Borooah, V. K. (2002). Logit and probit: Ordered and multinomial models. Sage.
Brody, C. D., & Hanks, T. D. (2016). Neural underpinnings of the evidence accumulator. Current Opinion in Neurobiology, 37, 149–157.
Brown, S. D., & Heathcote, A. (2008). The simplest complete model of choice response time: Linear ballistic accumulation. Cognitive Psychology, 57, 153–178.
Burgette, L. F., & Nordheim, E. V. (2012). The trace restriction: An alternative identification strategy for the Bayesian multinomial probit model. Journal of Business & Economic Statistics, 30, 404–410.
Burgette, L. F., Puelz, D., & Hahn, P. R. (2021). A symmetric prior for multinomial probit models. Bayesian Analysis, 16, 991–1008.
Cavanagh, J. F., Wiecki, T. V., Cohen, M. X., Figueroa, C. M., Samanta, J., Sherman, S. J., & Frank, M. J. (2011). Subthalamic nucleus stimulation reverses mediofrontal influence over decision threshold. Nature Neuroscience, 14, 1462.
Chandrasekaran, B., Yi, H.-G., & Maddox, W. T. (2014). Dual-learning systems during speech category learning. Psychonomic Bulletin & Review, 21, 488–495.
Chandrasekaran, B., Yi, H.-G., Smayda, K. E., & Maddox, W. T. (2016). Effect of explicit dimensional instruction on speech category learning. Attention, Perception, & Psychophysics, 78, 566–582.
Chhikara, R. (1988). The inverse Gaussian distribution: Theory, methodology, and applications. CRC Press.
Chib, S., & Greenberg, E. (1998). Analysis of multivariate probit models. Biometrika, 85, 347–361.
Cox, D. R., & Miller, H. D. (1965). The theory of stochastic processes. CRC Press.
de Boor, C. (1978). A practical guide to splines. Springer.
Deo, S. (2018). Algebraic topology. Texts and Readings in Mathematics (Vol. 27). Hindustan Book Agency.
Ding, L., & Gold, J. I. (2013). The basal ganglia’s contributions to perceptual decision making. Neuron, 79, 640–649.
Duchi, J., Shalev-Shwartz, S., Singer, Y., & Chandra, T. (2008). Efficient projections onto the ℓ1-ball for learning in high dimensions. In Proceedings of the 25th international conference on machine learning (pp. 272–279).
Dufau, S., Grainger, J., & Ziegler, J. C. (2012). How to say “no” to a nonword: A leaky competing accumulator model of lexical decision. Journal of Experimental Psychology: Learning, Memory, and Cognition, 38, 1117.
Dunson, D. B., & Neelon, B. (2003). Bayesian inference on order-constrained parameters in generalized linear models. Biometrics, 59, 286–295.
Eilers, P. H., & Marx, B. D. (1996). Flexible smoothing with B-splines and penalties. Statistical Science, 11, 89–102.
Filoteo, J. V., Lauritzen, S., & Maddox, W. T. (2010). Removing the frontal lobes: The effects of engaging executive functions on perceptual category learning. Psychological Science, 21, 415–423.
Glimcher, P. W., & Fehr, E. (2013). Neuroeconomics: Decision making and the brain. Academic Press.
Gold, J. I., & Shadlen, M. N. (2007). The neural basis of decision making. Annual Review of Neuroscience, 30, 535–574.
Gunn, L. H., & Dunson, D. B. (2005). A transformation approach for incorporating monotone or unimodal constraints. Biostatistics, 6, 434–449.
Heekeren, H. R., Marrett, S., Bandettini, P. A., & Ungerleider, L. G. (2004). A general mechanism for perceptual decision-making in the human brain. Nature, 431, 859.
Hubert, L., & Arabie, P. (1985). Comparing partitions. Journal of Classification, 2, 193–218.
Johndrow, J., Dunson, D., & Lum, K. (2013). Diagonal orthant multinomial probit models. In Artificial intelligence and statistics (pp. 29–38).
Kim, S., Potter, K., Craigmile, P. F., Peruggia, M., & Van Zandt, T. (2017). A Bayesian race model for recognition memory. Journal of the American Statistical Association, 112, 77–91.
Lau, J. W., & Green, P. J. (2007). Bayesian model-based clustering procedures. Journal of Computational and Graphical Statistics, 16, 526–558.
Leite, F. P., & Ratcliff, R. (2010). Modeling reaction time and accuracy of multiple-alternative decisions. Attention, Perception, & Psychophysics, 72, 246–273.
Llanos, F., McHaney, J. R., Schuerman, W. L., Yi, H. G., Leonard, M. K., & Chandrasekaran, B. (2020). Non-invasive peripheral nerve stimulation selectively enhances speech category learning in adults. NPJ Science of Learning, 1, 1–11.
Lu, J. (1995). Degradation processes and related reliability models. PhD thesis, McGill University, Montreal, Canada.
McHaney, J. R., Tessmer, R., Roark, C. L., & Chandrasekaran, B. (2021). Working memory relates to individual differences in speech category learning: Insights from computational modeling and pupillometry. Brain and Language, 22, 1–15.
Milosavljevic, M., Malmaud, J., Huth, A., Koch, C., & Rangel, A. (2010). The drift diffusion model can account for the accuracy and reaction time of value-based choices under high and low time pressure. Judgment and Decision Making, 5, 437–449.
Morris, J. S. (2015). Functional regression. Annual Review of Statistics and its Application, 2, 321–359.
Parthasarathy, A., Hancock, K. E., Bennett, K., DeGruttola, V., & Polley, D. B. (2020). Bottom-up and top-down neural signatures of disordered multi-talker speech perception in adults with normal hearing. Elife, 9, e51419.
Paulon, G., Llanos, F., Chandrasekaran, B., & Sarkar, A. (2021). Bayesian semiparametric longitudinal drift-diffusion mixed models for tone learning in adults. Journal of the American Statistical Association, 116, 1114–1127.
Peelle, J. E. (2018). Listening effort: How the cognitive consequences of acoustic challenge are reflected in brain and behavior. Ear and Hearing, 39, 204–214.
Purcell, B. A. (2013). Neural mechanisms of perceptual decision making. Vanderbilt University.
Ramsay, J. O., & Silverman, B. W. (2007). Applied functional data analysis: Methods and case studies. Springer.
Rand, W. M. (1971). Objective criteria for the evaluation of clustering methods. Journal of the American Statistical Association, 66, 846–850.
Ratcliff, R. (1978). A theory of memory retrieval. Psychological Review, 85, 59.
Ratcliff, R., & McKoon, G. (2008). The diffusion decision model: Theory and data for two-choice decision tasks. Neural Computation, 20, 873–922.
Ratcliff, R., & Rouder, J. N. (1998). Modeling response times for two-choice decisions. Psychological Science, 9, 347–356.
Ratcliff, R., Smith, P. L., Brown, S. D., & McKoon, G. (2016). Diffusion decision model: Current issues and history. Trends in Cognitive Sciences, 20, 260–281.
Reetzke, R., Xie, Z., Llanos, F., & Chandrasekaran, B. (2018). Tracing the trajectory of sensory plasticity across different stages of speech learning in adulthood. Current Biology, 28, 1419–1427.
Roark, C. L., Smayda, K. E., & Chandrasekaran, B. (2021). Auditory and visual category learning in musicians and nonmusicians. Journal of Experimental Psychology: General.
Robert, C. P., & Casella, G. (2004). Monte Carlo statistical methods. Springer Texts in Statistics (2nd ed.). Springer.
Robison, M. K., & Unsworth, N. (2019). Pupillometry tracks fluctuations in working memory performance. Attention, Perception, & Psychophysics, 81, 407–419.
Ross, S. M., Kelly, J. J., Sullivan, R. J., Perry, W. J., Mercer, D., Davis, R. M., Washburn, T. D., Sager, E. V., Boyce, J. B., & Bristow, V. L. (1996). Stochastic processes. Wiley.
Rudin, W. (1991). Functional analysis. International Series in Pure and Applied Mathematics (2nd ed.). McGraw-Hill Inc.
Schall, J. D. (2001). Neural basis of deciding, choosing and acting. Nature Reviews Neuroscience, 2, 33.
Sen, D., Patra, S., & Dunson, D. (2018). Constrained inference through posterior projections. arXiv preprint arXiv:1812.05741.
Smayda, K. E., Chandrasekaran, B., & Maddox, W. T. (2015). Enhanced cognitive and perceptual processing: A computational basis for the musician advantage in speech learning. Frontiers in Psychology, 1–14.
Smith, P. L., & Ratcliff, R. (2004). Psychology and neurobiology of simple decisions. Trends in Neurosciences, 27, 161–168.
Smith, P. L., & Vickers, D. (1988). The accumulator model of two-choice discrimination. Journal of Mathematical Psychology, 32, 135–168.
Usher, M., & McClelland, J. L. (2001). The time course of perceptual choice: The leaky, competing accumulator model. Psychological Review, 108, 550.
Wade, S. (2023). Bayesian cluster analysis. Philosophical Transactions of the Royal Society A, 381, 1–20.
Wang, J.-L., Chiou, J.-M., & Müller, H.-G. (2016). Functional data analysis. Annual Review of Statistics and its Application, 3, 257–295.
Wang, Y., Jongman, A., & Sereno, J. A. (2003). Acoustic and perceptual evaluation of Mandarin tone productions before and after perceptual training. The Journal of the Acoustical Society of America, 113, 1033–1043.
Wang, Y., Spence, M. M., Jongman, A., & Sereno, J. A. (1999). Training American listeners to perceive Mandarin tones. The Journal of the Acoustical Society of America, 106, 3649–3658.
Whitmore, G., & Seshadri, V. (1987). A heuristic derivation of the inverse Gaussian distribution. The American Statistician, 41, 280–281.
Winn, M. B., Wendt, D., Koelewijn, T., & Kuchinsky, S. E. (2018). Best practices and advice for using pupillometry to measure listening effort: An introduction for those who want to get started. Trends in Hearing, 22, 1–32.
Zekveld, A. A., Kramer, S. E., & Festen, J. M. (2011). Cognitive load during speech perception in noise: The influence of age, hearing loss, and cognition on the pupil response. Ear and Hearing, 32, 498–510.


Funding

This research was funded by the National Science Foundation grant DMS 1953712 and National Institute on Deafness and Other Communication Disorders Grants R01DC013315 and R01DC015504 awarded to Sarkar and Chandrasekaran.

Author information

Authors and Affiliations

Department of Mathematics and Statistics, Indian Institute of Technology, Kanpur, 208016, Uttar Pradesh, India
Minerva Mukhopadhyay
Department of Communication Sciences and Disorders, Northwestern University, 70 Arts Circle Drive, Evanston, IL, 60208, USA
Jacie R. McHaney & Bharath Chandrasekaran
Department of Statistics and Data Sciences, University of Texas at Austin, 105 East 24th Street D9800, Austin, TX, 78712, USA
Abhra Sarkar


Corresponding author

Correspondence to Abhra Sarkar.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

Below is the link to the electronic supplementary material.

Supplementary file 1 (pdf 401 KB)

Appendices

Appendix

Appendix A: Proof of Lemma 1

Proof

The offset parameters $\delta _{s}$ are not identifiable: the change of variables $\tau \mapsto \tau -\delta _{s}$ gives

$$\begin{aligned} P(d \mid s,\delta _{s},{\mu }_{1:d_{0},s}, {{\textbf{b}}}_{1:d_{0},s})= & {} \int _{\delta _{s}}^{\infty } g(\tau \mid \delta _{s},\mu _{d,s},b_{d,s}) \prod _{d^{\prime } \ne d} \left\{ 1 - G(\tau \mid \delta _{s},\mu _{d^{\prime },s},b_{d^{\prime },s})\right\} d\tau \\= & {} \int _{0}^{\infty } g(\tau \mid 0,\mu _{d,s},b_{d,s}) \prod _{d^{\prime } \ne d} \left\{ 1 - G(\tau \mid 0,\mu _{d^{\prime },s},b_{d^{\prime },s})\right\} d\tau \\= & {} P(d \mid s, 0,{\mu }_{1:d_{0},s},{{\textbf{b}}}_{1:d_{0},s}). \end{aligned}$$

Next we show that the drift parameters and decision boundaries are not separately identifiable, even when the offset parameters are fixed at a constant.

First note that Eq. (3) can also be represented as

$$\begin{aligned} \int _{\delta _{s}}^{\infty } \ldots \int _{\delta _{s}}^{\infty } \prod _{d^{\prime }\ne d} g(\tau _{d^{\prime }} \mid {\theta }_{d^{\prime },s}) \int _{\delta _{s}}^{\wedge _{d^{\prime }\ne d}\tau _{d^{\prime }} } g(\tau _{d} \mid {\theta }_{d,s}) d\tau _{d} \prod _{d^{\prime }\ne d} d\tau _{d^{\prime }}. \end{aligned} \tag{A.1}$$

To see this, suppose without loss of generality that $d\ne 1$, and observe that $\tau ^{\star }=\wedge _{d^{\prime }\ne d}\tau _{d^{\prime }} =\tau _{-1}^{\star }\wedge \tau _{1}$, where $\tau _{-1}^{\star }=\wedge _{d^{\prime }\notin \{1,d\}}\tau _{d^{\prime }}$. Thus the integral above can be written as

$$\begin{aligned}&\int _{\delta _{s}}^{\infty } \ldots \int _{\delta _{s}}^{\infty } \prod _{d^{\prime }\notin \{1,d\}} g(\tau _{d^{\prime }} \mid {\theta }_{d^{\prime },s}) \left\{ \int _{\delta _{s}}^{\infty } g(\tau _{1} \mid {\theta }_{1,s}) \int _{\delta _{s}}^{\tau _{-1}^{\star } \wedge \tau _{1} } g(\tau _{d} \mid {\theta }_{d,s})\, d\tau _{d}\, d\tau _{1}\right\} \prod _{d^{\prime }\notin \{1,d\}} d\tau _{d^{\prime }}\\&\quad =\int _{\delta _{s}}^{\infty } \ldots \int _{\delta _{s}}^{\infty } \prod _{d^{\prime }\notin \{1,d\}} g(\tau _{d^{\prime }} \mid {\theta }_{d^{\prime },s}) \left\{ \int _{\delta _{s}}^{\tau _{-1}^{\star }} g(\tau _{d} \mid {\theta }_{d,s}) \int _{\tau _{d}}^{\infty } g(\tau _{1} \mid {\theta }_{1,s})\, d\tau _{1}\, d\tau _{d} \right\} \prod _{d^{\prime }\notin \{1,d\}} d\tau _{d^{\prime }}. \end{aligned}$$

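The inner integral over $\tau _{1}$ equals $1-G(\tau _{d}\mid {\theta }_{1,s})$. Integrating out the remaining $\tau _{d^{\prime }}$, $d^{\prime }\notin \{1,d\}$, one at a time in the same way shows that the integral above equals $\int _{\delta _{s}}^{\infty } g(\tau _{d} \mid {\theta }_{d,s}) \prod _{d^{\prime } \ne d} \{ 1 - G(\tau _{d} \mid {\theta }_{d^{\prime },s})\} d\tau _{d}$, which is the probability in (3).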

Using this representation, we write the probability in (3) as in (A.1). Since the offset parameter $\delta _{s}$ has already been shown to be non-identifiable, it must be fixed; without loss of generality, we set it to 0. The inverse Gaussian density with parameters ${\theta }_{d^{\prime },s}=(\mu _{d^{\prime },s},b_{d^{\prime },s})$ evaluated at $\tau _{d^{\prime }}$, $g(\tau _{d^{\prime }}\mid {\theta }_{d^{\prime },s})$, is obtained from (1) by setting $\delta _{s}=0$ and replacing $d$ with $d^{\prime }$.

Consider the change of variables $\tau _{d^{\prime }}=c^2\tau _{d^{\prime }}^{\star }$ for all $d^{\prime }$, for some constant $c>0$. Further, define $b_{d^{\prime },s}^{\star }=b_{d^{\prime },s}/c$ and $\mu _{d^{\prime },s}^{\star }=c\mu _{d^{\prime },s}$, for all $d^{\prime }$. Then observe that

$$\begin{aligned} g(\tau _{d^{\prime }}\mid {\theta }_{d^{\prime },s}) d\tau _{d^{\prime }}&= (2\pi )^{-1/2} b_{d^{\prime },s}^{\star } (\tau _{d^{\prime }}^{\star })^{-3/2}\exp \left\{ -(2\tau _{d^{\prime }}^{\star })^{-1} \left( b_{d^{\prime },s}^{\star } -\mu _{d^{\prime },s}^{\star } \tau _{d^{\prime }}^{\star } \right) ^{2} \right\} d\tau _{d^{\prime }}^{\star }\\&=g(\tau _{d^{\prime }}^{\star }\mid {\theta }_{d^{\prime },s}^{\star } ) d\tau _{d^{\prime }}^{\star }, \end{aligned}$$

where $g(\tau _{d^{\prime }}^{\star }\mid {\theta }_{d^{\prime },s}^{\star } )$ is the pdf of the inverse Gaussian distribution with parameters $\mu _{d^{\prime },s}^{\star }$ and $b_{d^{\prime },s}^{\star }$, evaluated at $\tau _{d^{\prime }}^{\star }$.

Applying this change of variables to every $\tau _{d^{\prime }}$, we find that the integral in (A.1) with $\delta _{s}=0$ equals

$$\begin{aligned} \int _{0}^{\infty } \ldots \int _{0}^{\infty } \prod _{d^{\prime }\ne d} g(\tau _{d^{\prime }}^{\star } \mid {\theta }_{d^{\prime },s}^{\star }) \int _{0}^{\wedge _{d^{\prime }\ne d}\tau _{d^{\prime }}^{\star } } g(\tau _{d}^{\star } \mid {\theta }_{d,s}^{\star }) d\tau _{d}^{\star } \prod _{d^{\prime }\ne d} d\tau _{d^{\prime }}^{\star }. \end{aligned}$$

As c is arbitrary, this shows that the drifts and boundaries are not separately estimable. $\square $
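Lemma 1 can also be illustrated numerically. The following is a minimal sketch under stated assumptions. It uses the offset-zero density from (1) as written out in the proof, $g(\tau\mid\mu,b)=(2\pi)^{-1/2}b\,\tau^{-3/2}\exp\{-(b-\mu\tau)^2/(2\tau)\}$, which is an inverse Gaussian with mean $b/\mu$ and shape $b^2$. It evaluates $P(d\mid\cdot)$ by Simpson's rule on a truncated grid. It then checks that the category probabilities are unchanged by a common shift $\delta_s$ and by the rescaling $(\mu,b)\mapsto(c\mu,b/c)$. The names `race_probs`, `ig_pdf` and `ig_cdf` are hypothetical helpers, not code from the paper. The drift, boundary and $c$ values are arbitrary illustrations.

```python
import math


def norm_cdf(x):
    """Standard normal CDF, written with math.erf."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def ig_pdf(t, mu, b):
    """Offset-zero density g(t | mu, b) = b (2 pi)^(-1/2) t^(-3/2) exp{-(b - mu t)^2 / (2 t)}."""
    if t <= 0.0:
        return 0.0
    return b / math.sqrt(2.0 * math.pi * t ** 3) * math.exp(-((b - mu * t) ** 2) / (2.0 * t))


def ig_cdf(t, mu, b):
    """CDF G(t | mu, b) of the same inverse Gaussian law (mean b / mu, shape b ** 2)."""
    if t <= 0.0:
        return 0.0
    s = math.sqrt(t)
    return norm_cdf((mu * t - b) / s) + math.exp(2.0 * mu * b) * norm_cdf(-(mu * t + b) / s)


def race_probs(mu, b, delta=0.0, n=8000):
    """Approximate, for every d,
    P(d) = int_delta^inf g(tau - delta | mu_d, b_d) prod_{d' != d} {1 - G(tau - delta | mu_d', b_d')} dtau
    by composite Simpson's rule (n even) on [delta, delta + upper]."""
    upper = 50.0 * max(bk / mk for mk, bk in zip(mu, b))  # 50 x the largest mean passage time
    h = upper / n
    grid = [delta + i * h for i in range(n + 1)]
    pdf = [[ig_pdf(t - delta, mk, bk) for t in grid] for mk, bk in zip(mu, b)]
    surv = [[1.0 - ig_cdf(t - delta, mk, bk) for t in grid] for mk, bk in zip(mu, b)]
    weights = [1.0 if i in (0, n) else (4.0 if i % 2 else 2.0) for i in range(n + 1)]
    probs = []
    for d in range(len(mu)):
        total = 0.0
        for i in range(n + 1):
            val = pdf[d][i]
            for k in range(len(mu)):
                if k != d:
                    val *= surv[k][i]  # accumulator k has not yet finished
            total += weights[i] * val
        probs.append(total * h / 3.0)
    return probs


# Illustrative values only (d0 = 3 categories); not taken from the paper.
mu = [1.5, 1.0, 0.8]
b = [2.0, 2.0, 2.0]
c = 1.7

p = race_probs(mu, b)
p_shifted = race_probs(mu, b, delta=0.3)                            # common offset delta_s = 0.3
p_rescaled = race_probs([c * m for m in mu], [bk / c for bk in b])  # (mu, b) -> (c mu, b / c)
print([round(x, 6) for x in p])
print(max(abs(x - y) for x, y in zip(p, p_shifted)))   # ~0: offset not identifiable
print(max(abs(x - y) for x, y in zip(p, p_rescaled)))  # ~0: drift/boundary scale not identifiable
```

Because the grid is rescaled together with the parameters, the Simpson sums for the original and rescaled models coincide term by term. This mirrors the change-of-variables argument rather than merely agreeing to quadrature accuracy.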

Appendix B: Proof of Theorem 1

Proof

Let ${{\textbf{P}}}({\mu }_{1:d_{0},s})=\{p_{1}({\mu }_{1:d_{0},s}), \dots , p_{d_{0}}({\mu }_{1:d_{0},s}) \}^\textrm{T}$ be the function, given by (4), from $\mathcal{S}_{0,k}$ to the unit probability simplex $\Delta ^{d_{0}-1}$. For notational simplicity, we write ${\mu }_{1:d_{0},s} = {\mu }= (\mu _{1}, \dots , \mu _{d_{0}})^\textrm{T}$. We first find the matrix of partial derivatives $\nabla {{\textbf{P}}}$ with respect to ${\mu }$.

For ${\mu }\in \mathcal{S}_{0,k}$, $ \textbf{1}^{T} {\mu }=k$, and hence the probability reduces to

$$\begin{aligned} p_{d}\left( {\mu }\right) = \frac{\left( be^{b}\right) ^{d_{0}}}{ (2 \pi )^{d_{0}/2}}\int _{0}^{\infty } \int _{\tau _{d}}^{\infty } \cdots \int _{\tau _{d}}^{\infty } |{\tau }|^{-3/2} \exp \left\{ -\frac{1}{2} \left( \textbf{1}^{T} {\tau }^{-1} \textbf{1} + {\mu }^{T} {\tau }{{\mu }} \right) \right\} d{\tau }_{-d} d\tau _{d}, \end{aligned}$$

for $d=1,\ldots , d_{0}$, where ${\tau }=\textrm{diag}(\tau _{1}, \ldots , \tau _{d_{0}})$, and ${\tau }_{-d}$ is the sub-vector of ${\tau }$ excluding the d-th element. Next, differentiating $p_{d}\left( {\mu }\right) $ with respect to ${\mu }$, we get

$$\begin{aligned} \frac{\partial {p_{d}\left( {\mu }\right) }}{\partial {\mu }}= & {} \displaystyle \frac{\left( be^{b}\right) ^{d_{0}}}{ (2 \pi )^{d_{0}/2}}\int _{0}^{\infty } \int _{\tau _{d}}^{\infty } \cdots \int _{\tau _{d}}^{\infty } |{\tau }|^{-3/2} \left( -{\tau }{\mu }\right) \exp \left\{ -\frac{1}{2} \left( \textbf{1}^{T} {\tau }^{-1} \textbf{1} + {\mu }^{T} {\tau }{{\mu }} \right) \right\} d{\tau }_{-d} d\tau _{d}, \\= & {} \begin{bmatrix} \mu _{1} \eta _{2}&\quad \cdots&\quad \mu _{d-1} \eta _{2}&\quad \mu _{d} \eta _{1}&\quad \mu _{d+1} \eta _{2}&\quad \cdots&\quad \mu _{d_{0}} \eta _{2} \end{bmatrix}^{T}, \end{aligned}$$

where $ \eta _{1} = - E\left\{ \tau _{1} {\mathbb {I}}\left( \tau _{2}> \tau _{1}, \ldots , \tau _{d_{0}}>\tau _{1}\right) \left| {\mu }\right. \right\} $, $ \eta _{2} = -E\left\{ \tau _{2} {\mathbb {I}}\left( \tau _{2}> \tau _{1}, \ldots , \tau _{d_{0}}>\tau _{1}\right) \left| {\mu }\right. \right\} $, and ${\mathbb {I}}(A)$ is the indicator function of the event A. Here the expectation is taken under the joint distribution of $\left( \tau _{1}, \ldots , \tau _{d_{0}}\right) $, whose components are independent inverse Gaussian. Since $\tau _{2}>\tau _{1}$ on the event in question, $\eta _{2}<\eta _{1}<0$; in particular, $\eta _{1}-\eta _{2}>0$.

From the above derivation, it is easy to obtain that

$$\begin{aligned} \nabla {{\textbf{P}}}\left( {\mu }\right) = \begin{bmatrix} \mu _{1} \eta _{1} &{}\quad \mu _{1} \eta _{2} &{}\quad \cdots &{}\quad \mu _{1} \eta _{2} \\ \mu _{2} \eta _{2} &{}\quad \mu _{2} \eta _{1} &{}\quad \cdots &{}\quad \mu _{2} \eta _{2} \\ \vdots &{}\quad \vdots &{}\quad \ddots &{}\quad \vdots \\ \mu _{d_{0}} \eta _{2} &{}\quad \mu _{d_{0}} \eta _{2} &{}\quad \cdots &{}\quad \mu _{d_{0}} \eta _{1} \end{bmatrix} = {{\textbf{M}}}\left\{ \left( \eta _{1}-\eta _{2}\right) I+\eta _{2} \textbf{1} \textbf{1}^{T} \right\} , \end{aligned}$$

where ${{\textbf{M}}}=\textrm{diag}\left( \mu _{1}, \ldots , \mu _{d_{0}}\right) $, and the d-th column of $\nabla {{\textbf{P}}}\left( {\mu }\right) $ is $\partial p_{d}({\mu })/\partial {\mu }$.

Now, suppose there exist ${\mu }$ and ${\nu }$ in $\mathcal{S}_{k}$ such that ${\mu }\ne {\nu }$ and ${{\textbf{P}}}\left( {\mu }\right) = {{\textbf{P}}}\left( {\nu }\right) $. Define ${\gamma }: [0,1] \rightarrow {\mathbb {R}}^{d_{0}}$ by ${\gamma }(t)= {\mu }+ t \left( {\nu }- {\mu }\right) $, $t\in [0,1]$. Further, define $h(t)= \langle {{\textbf{P}}}\left( {\gamma }(t) \right) - {{\textbf{P}}}\left( {\mu }\right) , {\nu }- {\mu }\rangle $, the inner product of ${{\textbf{P}}}\left( {\gamma }(t) \right) - {{\textbf{P}}}\left( {\mu }\right) $ and ${\nu }-{\mu }$. Then $h(0)=h(1)=0$ under the supposition that ${{\textbf{P}}}\left( {\mu }\right) = {{\textbf{P}}}\left( {\nu }\right) $. Therefore, by the Mean Value Theorem (in the form of Rolle's theorem), there exists some point $c\in (0,1)$ such that $\left. \partial h(t) / \partial t \right| _{t=c} =0$. Now,

$$\begin{aligned} \frac{\partial h(t) }{ \partial t}= & {} \sum _{d^{\prime }=1}^{d_{0}} \left( \nu _{d^{\prime }} - \mu _{d^{\prime }} \right) \frac{\partial }{\partial t}\left[ p_{d^{\prime }} \left\{ {\gamma }(t) \right\} - p_{d^{\prime }} \left( {\mu }\right) \right] \\= & {} \sum _{d^{\prime }=1}^{d_{0}} \left( \nu _{d^{\prime }} - \mu _{d^{\prime }} \right) \left\{ \frac{\partial }{\partial {\gamma }} p_{d^{\prime }} \left( {\gamma }\right) \right\} ^{T} \frac{\partial {\gamma }(t)}{\partial t}\\= & {} \left( {\nu }-{\mu }\right) ^{T} \nabla {{\textbf{P}}}\{{\gamma }(t)\} \left( {\nu }-{\mu }\right) \\= & {} \left( \eta _{1}-\eta _{2}\right) \left( {\nu }-{\mu }\right) ^{T} {\Gamma }(t) \left( {\nu }-{\mu }\right) + \eta _{2} \left( {\nu }-{\mu }\right) ^{T} {\Gamma }(t) \textbf{1} \textbf{1}^{T} \left( {\nu }-{\mu }\right) \\= & {} \left( \eta _{1}-\eta _{2}\right) \left( {\nu }-{\mu }\right) ^{T} {\Gamma }(t) \left( {\nu }-{\mu }\right) , \end{aligned}$$

as $\textbf{1}^{T} \left( {\nu }-{\mu }\right) =0$, where ${\Gamma }(t)=\textrm{diag}\{ {\gamma }(t) \}$.

Since every component of ${\mu }$ and ${\nu }$ is positive, so is every component of ${\gamma }(c)$ for any $c\in (0,1)$, and hence the matrix ${\Gamma }(c)$ is positive definite. Further, as $\eta _{1}>\eta _{2}$, $\left. \partial h(t) / \partial t \right| _{t=c} =0$ only if $ {\mu }={\nu }$, which contradicts the supposition. $\square $
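The injectivity argument can be probed numerically. The sketch below assumes a common boundary $b$, set to 1 purely for illustration, and offset 0. It computes $\textbf{P}(\mu)$ by quadrature and evaluates $h(1)=\langle \textbf{P}(\nu)-\textbf{P}(\mu),\nu-\mu\rangle$ for pairs with $\textbf{1}^T\mu=\textbf{1}^T\nu$. The proof gives $\partial h/\partial t>0$ on $(0,1)$ and $h(0)=0$. Hence $h(1)>0$ whenever $\mu\ne\nu$, so distinct drift vectors with the same total map to distinct category probabilities. The functions `category_probs` and `h_at_one` and the drift values are hypothetical illustrations, not the paper's code.

```python
import math


def norm_cdf(x):
    """Standard normal CDF, written with math.erf."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def density_and_survival(t, mu, b):
    """(pdf, 1 - cdf) at t of the inverse Gaussian with mean b / mu and shape b ** 2."""
    if t <= 0.0:
        return 0.0, 1.0
    s = math.sqrt(t)
    pdf = b / math.sqrt(2.0 * math.pi * t ** 3) * math.exp(-((b - mu * t) ** 2) / (2.0 * t))
    cdf = norm_cdf((mu * t - b) / s) + math.exp(2.0 * mu * b) * norm_cdf(-(mu * t + b) / s)
    return pdf, 1.0 - cdf


def category_probs(mu, b=1.0, n=20000):
    """P(mu) = (p_1, ..., p_d0) for drifts mu, a common boundary b and offset 0,
    via composite Simpson's rule (n even) on a truncated interval [0, upper]."""
    upper = 80.0 / min(mu) ** 2  # slowest survival decays roughly like exp(-mu^2 t / 2)
    h = upper / n
    table = [[density_and_survival(i * h, m, b) for i in range(n + 1)] for m in mu]
    probs = []
    for d in range(len(mu)):
        total = 0.0
        for i in range(n + 1):
            weight = 1.0 if i in (0, n) else (4.0 if i % 2 else 2.0)
            val = table[d][i][0]  # density of the winning accumulator d
            for k in range(len(mu)):
                if k != d:
                    val *= table[k][i][1]  # survival of every other accumulator
            total += weight * val
        probs.append(total * h / 3.0)
    return probs


def h_at_one(mu, nu, b=1.0):
    """h(1) = <P(nu) - P(mu), nu - mu> from the proof of Theorem 1."""
    p_mu, p_nu = category_probs(mu, b), category_probs(nu, b)
    return sum((q - p) * (y - x) for p, q, x, y in zip(p_mu, p_nu, mu, nu))


# Hypothetical pairs with the same total drift k = 3.
pairs = [([1.2, 1.0, 0.8], [0.8, 1.0, 1.2]), ([1.0, 1.0, 1.0], [1.5, 1.0, 0.5])]
for mu, nu in pairs:
    print(mu, nu, h_at_one(mu, nu))  # strictly positive, as the proof predicts
```

A strictly positive $h(1)$ rules out $\textbf{P}(\mu)=\textbf{P}(\nu)$ for that pair. The check is only illustrative and is no substitute for the proof.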

Appendix C: Algorithm for Minimal Distance Mapping

The problem of finding the projection of a point ${\mu }$ onto the space $\mathcal{S}_{k,\varepsilon }$ is the following convex quadratic program:

$$\begin{aligned} \textrm{minimize}_{{{\textbf{w}}}} \Vert {{{\textbf{w}}}} -{\mu }\Vert ^2 \quad \text{ such } \text{ that } \quad \sum _{i=1}^{d_{0}} w_{i}=k,\quad w_{i} \ge \varepsilon . \end{aligned}$$

Duchi et al. (2008, Algorithm 1) give an algorithm for projecting a given point ${\mu }$ onto $\mathcal{S}_{k,\varepsilon }$ when $\varepsilon =0$; it is modified below for a general $\varepsilon $.
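One way to handle a general $\varepsilon$ is the change of variables $v=w-\varepsilon\textbf{1}$. This turns the problem into projecting $\mu-\varepsilon\textbf{1}$ onto $\{v:\sum_i v_i=k-d_0\varepsilon,\ v_i\ge 0\}$, which is exactly the $\varepsilon=0$ case handled by the sort-based method of Duchi et al. (2008). The sketch below takes this route. It may differ in detail from the modified algorithm the paper actually uses, and `project_shifted_simplex` and its `tol` argument are hypothetical names.

```python
def project_shifted_simplex(mu, k, eps=0.0, tol=1e-12):
    """Euclidean projection of mu onto S_{k,eps} = {w : sum(w) = k, w_i >= eps}.

    With v = w - eps, the problem becomes the projection of mu - eps onto the
    scaled simplex {v : sum(v) = z, v_i >= 0}, z = k - d0 * eps, which the
    sort-based method of Duchi et al. (2008) solves exactly in O(d0 log d0).
    """
    d0 = len(mu)
    z = k - d0 * eps
    if z < -tol:
        raise ValueError("S_{k,eps} is empty: need k >= d0 * eps")
    if z <= tol:
        return [eps] * d0  # the feasible set is the single point (eps, ..., eps)
    v = [m - eps for m in mu]
    u = sorted(v, reverse=True)
    cumsum, theta = 0.0, 0.0
    for j, u_j in enumerate(u, start=1):
        cumsum += u_j
        t = (cumsum - z) / j
        if u_j - t > 0:  # holds exactly for j = 1, ..., rho
            theta = t    # so theta ends as (sum of the rho largest v_i - z) / rho
    return [max(v_i - theta, 0.0) + eps for v_i in v]


w = project_shifted_simplex([0.5, 2.0, 3.5], k=3.0, eps=0.1)
print(w)  # approximately [0.1, 0.7, 2.2]
```

The output satisfies the optimality conditions of the quadratic program. Coordinates with $\mu_i-\varepsilon\le\theta$ are pushed to the bound $\varepsilon$, and the others are shifted by the common amount $\theta$ so that the constraint $\sum_i w_i=k$ holds.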

Appendix D: Proof of Lemma 3

Proof

We take as the proposal distribution $g$ the unconditional joint distribution of $\tau _{1:d_{0}}$ given the parameters $\mu _{1:d_{0}}$, i.e., the product of the independent inverse Gaussian densities. The target conditional joint distribution $f$ is $g$ restricted to the event $\{\tau _{d} \le \tau _{d^{\prime }}\ \text{for all}\ d^{\prime }\}$, written $\tau _{d} \le \tau _{1:d_{0}}$, and renormalized. Hence $f(\tau _{1:d_{0}}|\mu _{1:d_{0}})/g(\tau _{1:d_{0}}|\mu _{1:d_{0}})\le M$, where $M^{-1}=P\left( \tau _{d} \le \tau _{1:d_{0}}\right) $, with equality on that event. Therefore, for any $U\sim U(0,1)$, $f(\tau _{1:d_{0}}|\mu _{1:d_{0}})\ge M U g(\tau _{1:d_{0}}|\mu _{1:d_{0}})$ if the sample satisfies $\tau _{d} \le \tau _{1:d_{0}}$, and $f(\tau _{1:d_{0}}|\mu _{1:d_{0}})< M U g(\tau _{1:d_{0}}|\mu _{1:d_{0}})$ otherwise (almost surely). Hence, by Lemma 2.3.1 of Robert and Casella (2004), the algorithm above, which accepts a proposal exactly when $\tau _{d} \le \tau _{1:d_{0}}$, produces samples from the target distribution. $\square $
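The accept-reject scheme in this proof can be sketched in a few lines. This sketch assumes the inverse Gaussian parameterization used in Appendix A (mean $b/\mu$, shape $b^2$). To draw from that law it swaps in the transformation method of Michael, Schucany and Haas (1976), which is not necessarily what the paper's implementation uses. `sample_inverse_gaussian` and `sample_given_winner` are hypothetical names. The average number of proposals per accepted draw estimates $M=1/P(\tau_d\le\tau_{1:d_0})$.

```python
import math
import random


def sample_inverse_gaussian(mu, b, rng):
    """One draw from the inverse Gaussian with mean b / mu and shape b ** 2
    (first-passage time of a unit-variance diffusion with drift mu to boundary b),
    via the transformation method of Michael, Schucany and Haas (1976)."""
    m, lam = b / mu, b * b
    y = rng.gauss(0.0, 1.0) ** 2
    x = m + m * m * y / (2.0 * lam) - (m / (2.0 * lam)) * math.sqrt(4.0 * m * lam * y + (m * y) ** 2)
    return x if rng.random() <= m / (m + x) else m * m / x


def sample_given_winner(mu, b, d, rng, max_tries=1_000_000):
    """Rejection sampler for tau_{1:d0} given that accumulator d finishes first.

    Proposal g: independent inverse Gaussian draws (the unconditional joint law).
    Accept iff tau_d <= tau_d' for all d'. The expected number of proposals per
    accepted draw is M = 1 / P(tau_d <= tau_{1:d0}).
    Returns the accepted vector and the number of proposals used.
    """
    for tries in range(1, max_tries + 1):
        tau = [sample_inverse_gaussian(m, bk, rng) for m, bk in zip(mu, b)]
        if all(tau[d] <= t for t in tau):
            return tau, tries
    raise RuntimeError("no acceptance within max_tries")


rng = random.Random(7)
draws = [sample_given_winner([1.0, 1.0, 1.0], [1.0, 1.0, 1.0], 0, rng) for _ in range(3000)]
acceptance_rate = len(draws) / sum(tries for _, tries in draws)
print(acceptance_rate)  # close to 1/3 for three identical accumulators
```

Acceptance becomes rare when category $d$ is unlikely to win. In that case the expected cost $M$ per accepted draw grows accordingly, which is the usual trade-off of plain rejection sampling.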

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article

Cite this article

Mukhopadhyay, M., McHaney, J.R., Chandrasekaran, B. et al. Bayesian Semiparametric Longitudinal Inverse-Probit Mixed Models for Category Learning. Psychometrika (2024). https://doi.org/10.1007/s11336-024-09947-8


Received: 12 February 2023
Published: 19 February 2024
DOI: https://doi.org/10.1007/s11336-024-09947-8
