Abstract
Understanding how the adult human brain learns novel categories is an important problem in neuroscience. Drift-diffusion models are popular in such contexts for their ability to mimic the underlying neural mechanisms. One such model for gradual longitudinal learning was recently developed in Paulon et al. (J Am Stat Assoc 116:1114–1127, 2021). In practice, however, category response accuracies are often the only reliable measure recorded by behavioral scientists to describe human learning. To our knowledge, drift-diffusion models for such scenarios have not previously been considered in the literature. To address this gap, we build carefully on Paulon et al. (J Am Stat Assoc 116:1114–1127, 2021), but now with latent response times integrated out, to derive a novel biologically interpretable class of ‘inverse-probit’ categorical probability models for observed categories alone. This new marginal model, however, presents significant identifiability and inferential challenges not encountered originally for the joint model in Paulon et al. (J Am Stat Assoc 116:1114–1127, 2021). We address these new challenges using a novel projection-based approach with a symmetry-preserving identifiability constraint that allows us to work with conjugate priors in an unconstrained space. We adapt the model for group and individual-level inference in longitudinal settings. Building again on the model’s latent variable representation, we design an efficient Markov chain Monte Carlo algorithm for posterior computation. We evaluate the empirical performance of the method through simulation experiments. The practical efficacy of the method is illustrated in applications to longitudinal tone learning studies.
Notes
We can see this in a simpler example. Suppose we are interested in generating a sample from the conditional distribution of \(\varvec{\tau }=(\tau _{1},\tau _{2})\) given \(d=\arg \min _{j} \tau _{j}=1\), where \(\tau _{i} \sim \texttt{Uniform}(0,1)\), \(i=1,2\), independently. Since \(P(d=1)=0.5\), the conditional density of \(\varvec{\tau }\) given \(d=1\) is \(f_{\varvec{\tau }\mid d} (\tau _{1},\tau _{2})= 1/0.5 = 2\) if \(0<\tau _{1}\le \tau _{2}<1\), and \(=0\) otherwise. However, if we draw \(\tau _{1}\) from \(\texttt{Uniform}(0,1)\) first and let that realization be \(\tau ^{\star }\), and then draw \(\tau _{2}\) from the uniform distribution left truncated at \(\tau ^{\star }\), then the pdf of the realization of \((\tau _{1},\tau _{2})\) is \((1-\tau ^{\star })^{-1}\), which is not constant on this region and therefore differs from the target density.
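The contrast between the two sampling schemes in this note can be checked by simulation. The sketch below is illustrative only and is not part of the original analysis; the function names are hypothetical. It compares the mean of \(\tau _{1}\) under exact conditioning (accept a pair only if \(\tau _{1}\le \tau _{2}\)) with the mean under the sequential scheme. Under the true conditional distribution the marginal density of \(\tau _{1}\) is \(2(1-\tau _{1})\), so its mean is 1/3, whereas the sequential scheme leaves \(\tau _{1}\) uniform with mean 1/2.

```python
import random


def rejection_draw(rng):
    """Draw (tau1, tau2) from Uniform(0,1)^2 conditioned on tau1 <= tau2."""
    while True:
        t1, t2 = rng.random(), rng.random()
        if t1 <= t2:
            return t1, t2


def sequential_draw(rng):
    """Draw tau1 ~ Uniform(0,1) first, then tau2 ~ Uniform(tau1, 1)."""
    t1 = rng.random()
    t2 = t1 + (1.0 - t1) * rng.random()
    return t1, t2


def mean_tau1(sampler, n=100_000, seed=0):
    """Monte Carlo estimate of E[tau1] under the given sampling scheme."""
    rng = random.Random(seed)
    return sum(sampler(rng)[0] for _ in range(n)) / n


exact_mean = mean_tau1(rejection_draw)        # theoretical value: 1/3
sequential_mean = mean_tau1(sequential_draw)  # theoretical value: 1/2
print(f"exact conditioning: {exact_mean:.3f}, sequential scheme: {sequential_mean:.3f}")
```

Both schemes always return pairs with \(\tau _{1}\le \tau _{2}\), so the discrepancy lies only in how the mass is spread over that region.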
References
Agresti, A. (2018). An introduction to categorical data analysis. Wiley.
Albert, J. H., & Chib, S. (1993). Bayesian analysis of binary and polychotomous response data. Journal of the American Statistical Association, 88, 669–679.
Ashby, F. G., Noble, S., Filoteo, J. V., Waldron, E. M., & Ell, S. W. (2003). Category learning deficits in Parkinson’s disease. Neuropsychology, 17, 115.
Beck, A. (2017). First-order methods in optimization. SIAM.
Bogacz, R., Wagenmakers, E.-J., Forstmann, B. U., & Nieuwenhuis, S. (2010). The neural basis of the speed-accuracy tradeoff. Trends in Neurosciences, 33, 10–16.
Borooah, V. K. (2002). Logit and probit: Ordered and multinomial models. Sage.
Brody, C. D., & Hanks, T. D. (2016). Neural underpinnings of the evidence accumulator. Current Opinion in Neurobiology, 37, 149–157.
Brown, S. D., & Heathcote, A. (2008). The simplest complete model of choice response time: Linear ballistic accumulation. Cognitive Psychology, 57, 153–178.
Burgette, L. F., & Nordheim, E. V. (2012). The trace restriction: An alternative identification strategy for the Bayesian multinomial probit model. Journal of Business & Economic Statistics, 30, 404–410.
Burgette, L. F., Puelz, D., & Hahn, P. R. (2021). A symmetric prior for multinomial probit models. Bayesian Analysis, 16, 991–1008.
Cavanagh, J. F., Wiecki, T. V., Cohen, M. X., Figueroa, C. M., Samanta, J., Sherman, S. J., & Frank, M. J. (2011). Subthalamic nucleus stimulation reverses mediofrontal influence over decision threshold. Nature Neuroscience, 14, 1462.
Chandrasekaran, B., Yi, H.-G., & Maddox, W. T. (2014). Dual-learning systems during speech category learning. Psychonomic Bulletin & Review, 21, 488–495.
Chandrasekaran, B., Yi, H.-G., Smayda, K. E., & Maddox, W. T. (2016). Effect of explicit dimensional instruction on speech category learning. Attention, Perception, & Psychophysics, 78, 566–582.
Chhikara, R. (1988). The inverse Gaussian distribution: Theory, methodology, and applications. CRC Press.
Chib, S., & Greenberg, E. (1998). Analysis of multivariate probit models. Biometrika, 85, 347–361.
Cox, D. R., & Miller, H. D. (1965). The theory of stochastic processes. CRC Press.
de Boor, C. (1978). A practical guide to splines. Springer.
Deo, S. (2018). Algebraic topology. Texts and Readings in Mathematics (Vol. 27). Hindustan Book Agency.
Ding, L., & Gold, J. I. (2013). The basal ganglia’s contributions to perceptual decision making. Neuron, 79, 640–649.
Duchi, J., Shalev-Shwartz, S., Singer, Y., & Chandra, T. (2008). Efficient projections onto the l1-ball for learning in high dimensions. In Proceedings of the 25th international conference on machine learning (pp. 272–279).
Dufau, S., Grainger, J., & Ziegler, J. C. (2012). How to say “no” to a nonword: A leaky competing accumulator model of lexical decision. Journal of Experimental Psychology: Learning, Memory, and Cognition, 38, 1117.
Dunson, D. B., & Neelon, B. (2003). Bayesian inference on order-constrained parameters in generalized linear models. Biometrics, 59, 286–295.
Eilers, P. H., & Marx, B. D. (1996). Flexible smoothing with B-splines and penalties. Statistical Science, 11, 89–102.
Filoteo, J. V., Lauritzen, S., & Maddox, W. T. (2010). Removing the frontal lobes: The effects of engaging executive functions on perceptual category learning. Psychological Science, 21, 415–423.
Glimcher, P. W., & Fehr, E. (2013). Neuroeconomics: Decision making and the brain. Academic Press.
Gold, J. I., & Shadlen, M. N. (2007). The neural basis of decision making. Annual Review of Neuroscience, 30, 535–574.
Gunn, L. H., & Dunson, D. B. (2005). A transformation approach for incorporating monotone or unimodal constraints. Biostatistics, 6, 434–449.
Heekeren, H. R., Marrett, S., Bandettini, P. A., & Ungerleider, L. G. (2004). A general mechanism for perceptual decision-making in the human brain. Nature, 431, 859.
Hubert, L., & Arabie, P. (1985). Comparing partitions. Journal of Classification, 2, 193–218.
Johndrow, J., Dunson, D., & Lum, K. (2013). Diagonal orthant multinomial probit models. In Artificial intelligence and statistics (pp. 29–38).
Kim, S., Potter, K., Craigmile, P. F., Peruggia, M., & Van Zandt, T. (2017). A Bayesian race model for recognition memory. Journal of the American Statistical Association, 112, 77–91.
Lau, J. W., & Green, P. J. (2007). Bayesian model-based clustering procedures. Journal of Computational and Graphical Statistics, 16, 526–558.
Leite, F. P., & Ratcliff, R. (2010). Modeling reaction time and accuracy of multiple-alternative decisions. Attention, Perception, & Psychophysics, 72, 246–273.
Llanos, F., McHaney, J. R., Schuerman, W. L., Yi, H. G., Leonard, M. K., & Chandrasekaran, B. (2020). Non-invasive peripheral nerve stimulation selectively enhances speech category learning in adults. NPJ Science of Learning, 1, 1–11.
Lu, J. (1995). Degradation processes and related reliability models. PhD thesis, McGill University, Montreal, Canada.
McHaney, J. R., Tessmer, R., Roark, C. L., & Chandrasekaran, B. (2021). Working memory relates to individual differences in speech category learning: Insights from computational modeling and pupillometry. Brain and Language, 22, 1–15.
Milosavljevic, M., Malmaud, J., Huth, A., Koch, C., & Rangel, A. (2010). The drift diffusion model can account for the accuracy and reaction time of value-based choices under high and low time pressure. Judgment and Decision Making, 5, 437–449.
Morris, J. S. (2015). Functional regression. Annual Review of Statistics and its Application, 2, 321–359.
Parthasarathy, A., Hancock, K. E., Bennett, K., DeGruttola, V., & Polley, D. B. (2020). Bottom-up and top-down neural signatures of disordered multi-talker speech perception in adults with normal hearing. Elife, 9, e51419.
Paulon, G., Llanos, F., Chandrasekaran, B., & Sarkar, A. (2021). Bayesian semiparametric longitudinal drift-diffusion mixed models for tone learning in adults. Journal of the American Statistical Association, 116, 1114–1127.
Peelle, J. E. (2018). Listening effort: How the cognitive consequences of acoustic challenge are reflected in brain and behavior. Ear and Hearing, 39, 204–214.
Purcell, B. A. (2013). Neural mechanisms of perceptual decision making. Vanderbilt University.
Ramsay, J. O., & Silverman, B. W. (2007). Applied functional data analysis: Methods and case studies. Springer.
Rand, W. M. (1971). Objective criteria for the evaluation of clustering methods. Journal of the American Statistical Association, 66, 846–850.
Ratcliff, R. (1978). A theory of memory retrieval. Psychological Review, 85, 59.
Ratcliff, R., & McKoon, G. (2008). The diffusion decision model: Theory and data for two-choice decision tasks. Neural Computation, 20, 873–922.
Ratcliff, R., & Rouder, J. N. (1998). Modeling response times for two-choice decisions. Psychological Science, 9, 347–356.
Ratcliff, R., Smith, P. L., Brown, S. D., & McKoon, G. (2016). Diffusion decision model: Current issues and history. Trends in Cognitive Sciences, 20, 260–281.
Reetzke, R., Xie, Z., Llanos, F., & Chandrasekaran, B. (2018). Tracing the trajectory of sensory plasticity across different stages of speech learning in adulthood. Current Biology, 28, 1419–1427.
Roark, C. L., Smayda, K. E., & Chandrasekaran, B. (2021). Auditory and visual category learning in musicians and nonmusicians. Journal of Experimental Psychology: General.
Robert, C. P., & Casella, G. (2004). Monte Carlo statistical methods. Springer Texts in Statistics (2nd ed.). Springer.
Robison, M. K., & Unsworth, N. (2019). Pupillometry tracks fluctuations in working memory performance. Attention, Perception, & Psychophysics, 81, 407–419.
Ross, S. M., Kelly, J. J., Sullivan, R. J., Perry, W. J., Mercer, D., Davis, R. M., Washburn, T. D., Sager, E. V., Boyce, J. B., & Bristow, V. L. (1996). Stochastic processes. Wiley.
Rudin, W. (1991). Functional analysis. International Series in Pure and Applied Mathematics (2nd ed.). McGraw-Hill Inc.
Schall, J. D. (2001). Neural basis of deciding, choosing and acting. Nature Reviews Neuroscience, 2, 33.
Sen, D., Patra, S., & Dunson, D. (2018). Constrained inference through posterior projections. arXiv preprint arXiv:1812.05741.
Smayda, K. E., Chandrasekaran, B., & Maddox, W. T. (2015). Enhanced cognitive and perceptual processing: A computational basis for the musician advantage in speech learning. Frontiers in Psychology, 1–14.
Smith, P. L., & Ratcliff, R. (2004). Psychology and neurobiology of simple decisions. Trends in Neurosciences, 27, 161–168.
Smith, P. L., & Vickers, D. (1988). The accumulator model of two-choice discrimination. Journal of Mathematical Psychology, 32, 135–168.
Usher, M., & McClelland, J. L. (2001). The time course of perceptual choice: The leaky, competing accumulator model. Psychological Review, 108, 550.
Wade, S. (2023). Bayesian cluster analysis. Philosophical Transactions of the Royal Society A, 381, 1–20.
Wang, J.-L., Chiou, J.-M., & Müller, H.-G. (2016). Functional data analysis. Annual Review of Statistics and its Application, 3, 257–295.
Wang, Y., Jongman, A., & Sereno, J. A. (2003). Acoustic and perceptual evaluation of Mandarin tone productions before and after perceptual training. The Journal of the Acoustical Society of America, 113, 1033–1043.
Wang, Y., Spence, M. M., Jongman, A., & Sereno, J. A. (1999). Training American listeners to perceive Mandarin tones. The Journal of the Acoustical Society of America, 106, 3649–3658.
Whitmore, G., & Seshadri, V. (1987). A heuristic derivation of the inverse Gaussian distribution. The American Statistician, 41, 280–281.
Winn, M. B., Wendt, D., Koelewijn, T., & Kuchinsky, S. E. (2018). Best practices and advice for using pupillometry to measure listening effort: An introduction for those who want to get started. Trends in Hearing, 22, 1–32.
Zekveld, A. A., Kramer, S. E., & Festen, J. M. (2011). Cognitive load during speech perception in noise: The influence of age, hearing loss, and cognition on the pupil response. Ear and Hearing, 32, 498–510.
Funding
This research was funded by the National Science Foundation grant DMS 1953712 and National Institute on Deafness and Other Communication Disorders Grants R01DC013315 and R01DC015504 awarded to Sarkar and Chandrasekaran.
Appendices
Appendix A: Proof of Lemma 1
Proof
It is easy to check that the offset parameters \(\delta _{s}\) are not identifiable since
Next we show that the drift parameters and decision boundaries are not separately identifiable, even if we fix the offset parameters at a constant.
First note that Eq. (3) can also be represented as
To see this, observe that \(\tau ^{\star }=\wedge _{d^{\prime }\ne d}\tau _{d^{\prime }} =\tau _{-1}^{\star }\wedge \tau _{1}\), where \(\tau _{-1}^{\star }=\wedge _{d^{\prime }\notin \{1,d\}}\tau _{d^{\prime }}\). Thus the integral above can be written as
Proceeding sequentially one can show that the integral above is the same as in (3).
Using the above, we express the probability in (3) as in (A.1). As the offset parameter \(\delta _{s}\) has already been shown to be non-identifiable, it must be fixed. Without loss of generality, we fix the offset parameter at 0. The probability density function of the inverse Gaussian distribution with parameters \({\theta }_{d^{\prime },s}=(\mu _{d^{\prime },s},b_{d^{\prime },s})\), evaluated at \(\tau _{d^{\prime }}\) and denoted \(g(\tau _{d^{\prime }}\mid {\theta }_{d^{\prime },s})\), can be obtained from (1) by setting \(\delta _{s}=0\) and \(d=d^{\prime }\).
Consider the transformation of \(\tau _{d^{\prime }}\) to \(\tau _{d^{\prime }}^{\star }\) as \(\tau _{d^{\prime }}=c^2\tau _{d^{\prime }}^{\star }\), for some constant \(c>0\), and for all \(d^{\prime }\). Further, define \(b_{d^{\prime },s}^{\star }=b_{d^{\prime },s}/c\) and \(\mu _{d^{\prime },s}^{\star }=c\mu _{d^{\prime },s}\), for all \(d^{\prime }\). Then observe that
where \(g(\tau _{d^{\prime }}^{\star }\mid {\theta }_{d^{\prime },s}^{\star } )\) is the pdf of the inverse Gaussian distribution with parameters \(\mu _{d^{\prime },s}^{\star }\) and \(b_{d^{\prime },s}^{\star }\), evaluated at the point \(\tau _{d^{\prime }}^{\star }\).
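The effect of this change of variables can be verified directly. The worked step below is a sketch that assumes Eq. (1) with \(\delta _{s}=0\) takes the standard first-passage-time form of the inverse Gaussian density with drift \(\mu \) and boundary \(b\); that form is an assumption here, since Eq. (1) is not reproduced in this section, and subscripts are dropped for readability.

\[
g(\tau \mid \mu , b) = \frac{b}{\sqrt{2\pi }}\,\tau ^{-3/2} \exp \left\{ -\frac{(b-\mu \tau )^{2}}{2\tau }\right\} , \qquad \tau >0 .
\]

Substituting \(\tau =c^{2}\tau ^{\star }\), \(b=c\,b^{\star }\) and \(\mu =\mu ^{\star }/c\) gives

\[
g(\tau \mid \mu , b) = \frac{c\,b^{\star }}{\sqrt{2\pi }}\,c^{-3}(\tau ^{\star })^{-3/2} \exp \left\{ -\frac{c^{2}(b^{\star }-\mu ^{\star }\tau ^{\star })^{2}}{2c^{2}\tau ^{\star }}\right\} = c^{-2}\, g(\tau ^{\star }\mid \mu ^{\star }, b^{\star }),
\]

and, since \(d\tau =c^{2}\,d\tau ^{\star }\), it follows that \(g(\tau \mid \mu , b)\,d\tau = g(\tau ^{\star }\mid \mu ^{\star }, b^{\star })\,d\tau ^{\star }\). The ordering of the \(\tau _{d^{\prime }}\) is unaffected because every coordinate is rescaled by the same factor \(c^{2}\).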
Applying the transformation to \(\tau _{d^{\prime }}\) for all \(d^{\prime }\), we find that the integral in (A.1) with \(\delta _{s}=0\) is the same as
As c is arbitrary, this shows that the drifts and boundaries are not separately estimable. \(\square \)
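The invariance established in this proof can also be checked numerically. The sketch below is illustrative only: it assumes the first-passage-time parameterization of the inverse Gaussian (drift \(\mu \), boundary \(b\), offset 0), uses the standard closed-form inverse Gaussian CDF, and evaluates each category probability as the integral of \(g_{d}(\tau )\prod _{d^{\prime }\ne d}\{1-G_{d^{\prime }}(\tau )\}\) by a simple midpoint rule. The function names, parameter values, and integration settings are hypothetical choices, not taken from the article.

```python
import math


def ig_pdf(t, mu, b):
    """First-passage-time (inverse Gaussian) density: drift mu, boundary b, offset 0."""
    return b / math.sqrt(2 * math.pi * t**3) * math.exp(-((b - mu * t) ** 2) / (2 * t))


def norm_cdf(x):
    """Standard normal CDF."""
    return 0.5 * math.erfc(-x / math.sqrt(2))


def ig_cdf(t, mu, b):
    """Closed-form inverse Gaussian CDF in the same (mu, b) parameterization."""
    s = math.sqrt(t)
    return norm_cdf((mu * t - b) / s) + math.exp(2 * mu * b) * norm_cdf(-(mu * t + b) / s)


def choice_probs(mus, bs, upper=40.0, n=20_000):
    """P(d = argmin_j tau_j) for each d, by midpoint-rule integration on (0, upper)."""
    h = upper / n
    probs = []
    for d in range(len(mus)):
        total = 0.0
        for i in range(n):
            t = (i + 0.5) * h
            survival = 1.0
            for j in range(len(mus)):
                if j != d:
                    survival *= 1.0 - ig_cdf(t, mus[j], bs[j])
            total += ig_pdf(t, mus[d], bs[d]) * survival
        probs.append(total * h)
    return probs


mus, bs, c = [1.0, 1.5, 2.0], [2.0, 2.0, 2.0], 2.0
p_original = choice_probs(mus, bs)
# Rescale drifts by c and boundaries by 1/c, as in the proof.
p_rescaled = choice_probs([c * m for m in mus], [b / c for b in bs])
print(p_original)
print(p_rescaled)
```

Up to integration error, the two probability vectors agree, which is the practical meaning of the statement that drifts and boundaries cannot be estimated separately from category labels alone.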
Appendix B: Proof of Theorem 1
Proof
Let \({{\textbf{P}}}({\mu }_{1:d_{0},s})=\{p_{1}({\mu }_{1:d_{0},s}), \dots , p_{d_{0}}({\mu }_{1:d_{0},s}) \}^\textrm{T}\) be the function, given by (4), from \(\mathcal{S}_{0,k}\) to the unit probability simplex \(\Delta ^{d_{0}-1}\). For notational simplicity, we write \({\mu }_{1:d_{0},s} = {\mu }= (\mu _{1}, \dots , \mu _{d_{0}})^\textrm{T}\). We first find the matrix of partial derivatives \(\nabla {{\textbf{P}}}\) with respect to \({\mu }\).
For \({\mu }\in \mathcal{S}_{0,k}\), \( \textbf{1}^{T} {\mu }=k\), and hence the probability reduces to
for \(d=1,\ldots , d_{0}\), where \({\tau }=\textrm{diag}(\tau _{1}, \ldots , \tau _{d_{0}})\), and \({\tau }_{-d}\) is the sub-vector of \({\tau }\) excluding the d-th element. Next, differentiating \(p_{d}\left( {\mu }\right) \) with respect to \({\mu }\), we get
where \( \eta _{1} = - E\left\{ \tau _{1} {\mathbb {I}}\left( \tau _{2}> \tau _{1}, \ldots , \tau _{d_{0}}>\tau _{1}\right) \left| {\mu }\right. \right\} \), \( \eta _{2} = -E\left\{ \tau _{2} {\mathbb {I}}\left( \tau _{2}> \tau _{1}, \ldots , \tau _{d_{0}}>\tau _{1}\right) \left| {\mu }\right. \right\} \), and \({\mathbb {I}}(A)\) is the indicator function of the event A. Here the expectation is taken under the joint distribution of \(\left( \tau _{1}, \ldots , \tau _{d_{0}}\right) \), whose components are independent inverse Gaussian. Since \(\tau _{2}>\tau _{1}>0\) on the event inside the indicator, clearly \(0>\eta _{1}>\eta _{2}\).
From the above derivation, it is easy to obtain that
where \({{\textbf{M}}}=\textrm{diag}\left( \mu _{1}, \ldots , \mu _{d_{0}}\right) \).
Now, suppose there exist \({\mu }\) and \({\nu }\) in \(\mathcal{S}_{0,k}\) such that \({\mu }\ne {\nu }\) and \({{\textbf{P}}}\left( {\mu }\right) = {{\textbf{P}}}\left( {\nu }\right) \). Define \({\gamma }: [0,1] \rightarrow {\mathbb {R}}^{d_{0}}\) such that \({\gamma }(t)= {\mu }+ t \left( {\nu }- {\mu }\right) \), \(t\in [0,1]\). Further, define \(h(t)= \langle {{\textbf{P}}}\left( {\gamma }(t) \right) - {{\textbf{P}}}\left( {\mu }\right) , {\nu }- {\mu }\rangle \), the inner product of \({{\textbf{P}}}\left( {\gamma }(t) \right) - {{\textbf{P}}}\left( {\mu }\right) \) and \({\nu }-{\mu }\). Then \(h(1)=h(0)=0\) under the supposition that \({{\textbf{P}}}\left( {\mu }\right) = {{\textbf{P}}}\left( {\nu }\right) \). Therefore, by the Mean Value Theorem, there exists some point \(c\in (0,1)\) such that \(\left. \partial h(t) / \partial t \right| _{t=c} =0\). Now,
as \(\textbf{1}^{T} \left( {\nu }-{\mu }\right) =0\), where \({\Gamma }(t)=\textrm{diag}\{ {\gamma }(t) \}\).
As every component of \({\mu }\) and \({\nu }\) is positive, for any \(c\in (0,1)\), the matrix \({\Gamma }(c)\) is positive definite. Further, as \(\eta _{1}>\eta _{2}\), \(\left. \partial h(t) / \partial t \right| _{t=c} =0\) only if \( {\mu }={\nu }\), which contradicts the supposition that \({\mu }\ne {\nu }\). \(\square \)
Appendix C: Algorithm for Minimal Distance Mapping
The problem of finding the projection of a point \({\mu }\) onto the space \(\mathcal{S}_{k,\varepsilon }\) is equivalent to the following nonlinear optimization problem:
Duchi et al. (2008, Algorithm 1) provide a solution to the problem of projecting a given point \({\mu }\) onto the space \(\mathcal{S}_{k,\varepsilon }\) for \(\varepsilon =0\), which is modified for any given \(\varepsilon \) below.
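One way such a modification could look is sketched below. This is an illustration rather than the article's own algorithm listing, and it assumes that \(\mathcal{S}_{k,\varepsilon }\) is the set of vectors whose components sum to \(k\) and are each at least \(\varepsilon \); that definition is an assumption, since the set is not defined in this section. Under that assumption, subtracting \(\varepsilon \) from every coordinate reduces the problem to the \(\varepsilon =0\) case with total \(k-d_{0}\varepsilon \), which the sort-and-threshold procedure of Duchi et al. (2008) handles. The function names are hypothetical.

```python
def project_to_simplex(v, z=1.0):
    """Euclidean projection of v onto {w : w_j >= 0, sum_j w_j = z}.

    Sort-and-threshold procedure in the style of Duchi et al. (2008, Algorithm 1).
    """
    u = sorted(v, reverse=True)
    cumulative, theta = 0.0, 0.0
    for j, u_j in enumerate(u, start=1):
        cumulative += u_j
        candidate = (cumulative - z) / j
        if u_j - candidate > 0:
            theta = candidate  # keep the threshold for the largest admissible j
    return [max(x - theta, 0.0) for x in v]


def project_to_shifted_simplex(mu, k, eps=0.0):
    """Projection onto {m : m_j >= eps, sum_j m_j = k} (assumed form of the target set)."""
    z = k - len(mu) * eps
    if z <= 0:
        raise ValueError("need k > d0 * eps for a non-empty target set")
    shifted = project_to_simplex([m - eps for m in mu], z)
    return [w + eps for w in shifted]


projected = project_to_shifted_simplex([3.0, 1.0, -2.0], k=2.0, eps=0.1)
print(projected)
```

A point that already satisfies both constraints is returned unchanged up to rounding, which is a convenient sanity check on any implementation of the mapping.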
Appendix D: Proof of Lemma 3
Proof
We take the unconditional distribution of \(\tau _{1:d_{0}}\), given the parameters \(\mu _{1:d_{0}}\), as the proposal distribution g. Clearly, the proposal distribution g and the target conditional joint distribution f satisfy \(f(\tau _{1:d_{0}}|\mu _{1:d_{0}})/g(\tau _{1:d_{0}}|\mu _{1:d_{0}})\le M\), where \(M^{-1}=P\left( \tau _{d} \le \tau _{1:d_{0}}\right) \). Therefore, for any random sample \(U\sim U(0,1)\), \(f(\tau _{1:d_{0}}|\mu _{1:d_{0}})\ge M U g(\tau _{1:d_{0}}|\mu _{1:d_{0}})\) if the sample satisfies the condition \(\tau _{d} \le \tau _{1:d_{0}}\), and \(f(\tau _{1:d_{0}}|\mu _{1:d_{0}})< M U g(\tau _{1:d_{0}}|\mu _{1:d_{0}})\) otherwise. Hence, by Lemma 2.3.1 of Robert and Casella (2004), the algorithm above produces samples from the target distribution. \(\square \)
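The accept-reject scheme described in this proof can be sketched as follows. The code is a minimal illustration, not the article's implementation: it assumes that each \(\tau _{j}\) is inverse Gaussian with mean \(b_{j}/\mu _{j}\) and shape \(b_{j}^{2}\) (the first-passage-time parameterization with offset 0), draws inverse Gaussian variates with the Michael, Schucany and Haas transformation method, and uses a 0-based category index. The function names and the `max_tries` safeguard are hypothetical.

```python
import math
import random


def sample_inverse_gaussian(mean, shape, rng):
    """One inverse Gaussian draw via the Michael-Schucany-Haas transformation method."""
    y = rng.gauss(0.0, 1.0) ** 2
    x = (
        mean
        + mean**2 * y / (2 * shape)
        - mean / (2 * shape) * math.sqrt(4 * mean * shape * y + mean**2 * y**2)
    )
    # Choose between the two roots with the appropriate probability.
    return x if rng.random() <= mean / (mean + x) else mean**2 / x


def sample_latent_times(d, mus, bs, rng, max_tries=100_000):
    """Draw tau_{1:d0} unconditionally and accept only if tau_d is the smallest."""
    for _ in range(max_tries):
        taus = [sample_inverse_gaussian(b / m, b * b, rng) for m, b in zip(mus, bs)]
        if all(taus[d] <= t for t in taus):
            return taus
    raise RuntimeError("no draw accepted; the acceptance probability may be very small")


rng = random.Random(1)
draws = [sample_latent_times(1, [1.0, 1.5, 2.0], [2.0, 2.0, 2.0], rng) for _ in range(200)]
print(draws[0])
```

The expected number of proposals per accepted draw is \(M\), the reciprocal of the probability that category \(d\) has the smallest latent time, so the scheme slows down when the observed category is very unlikely under the current parameters.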
About this article
Cite this article
Mukhopadhyay, M., McHaney, J.R., Chandrasekaran, B. et al. Bayesian Semiparametric Longitudinal Inverse-Probit Mixed Models for Category Learning. Psychometrika (2024). https://doi.org/10.1007/s11336-024-09947-8
DOI: https://doi.org/10.1007/s11336-024-09947-8