Optimal speech motor control and token-to-token variability: a Bayesian modeling approach

Patri, Jean-François; Diard, Julien; Perrier, Pascal

doi:10.1007/s00422-015-0664-4

Original Article
Published: 26 October 2015

Biological Cybernetics, volume 109, pages 611–626 (2015)

Jean-François Patri^1,2,
Julien Diard^3,4 &
Pascal Perrier^1,2

Abstract

The remarkable capacity of the speech motor system to adapt to various speech conditions is due to an excess of degrees of freedom, which enables producing similar acoustical properties with different sets of control strategies. To explain how the central nervous system selects one of the possible strategies, a common approach, in line with optimal motor control theories, is to model speech motor planning as the solution of an optimality problem based on cost functions. Despite the success of this approach, one of its drawbacks is the intrinsic contradiction between the concept of optimality and the observed experimental intra-speaker token-to-token variability. The present paper proposes an alternative approach by formulating feedforward optimal control in a probabilistic Bayesian modeling framework. This is illustrated by controlling a biomechanical model of the vocal tract for speech production and by comparing it with an existing optimal control model (GEPPETO). The essential elements of this optimal control model are presented first. From them the Bayesian model is constructed in a progressive way. Performance of the Bayesian model is evaluated based on computer simulations and compared to the optimal control model. This approach is shown to be appropriate for solving the speech planning problem while accounting for variability in a principled way.
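
The contrast drawn above between a single optimal solution and token-to-token variability can be sketched with a toy example. This is not the authors' model: the one-dimensional command `m`, the quadratic `cost`, the sharpness parameter `kappa`, and the choice of a distribution proportional to `exp(-kappa * cost)` are all assumptions made for illustration only.

```python
import math
import random


def cost(m):
    """Hypothetical one-dimensional cost with its minimum at m = 2.0."""
    return (m - 2.0) ** 2


def optimal_command(grid):
    """Deterministic planning: every call returns the same minimizer."""
    return min(grid, key=cost)


def sample_command(grid, kappa, rng):
    """Probabilistic planning (assumed form): draw m with probability
    proportional to exp(-kappa * cost(m))."""
    weights = [math.exp(-kappa * cost(m)) for m in grid]
    return rng.choices(grid, weights=weights, k=1)[0]


grid = [i * 0.05 for i in range(81)]  # candidate commands in [0, 4]
rng = random.Random(0)                # fixed seed for reproducibility
tokens = [sample_command(grid, kappa=20.0, rng=rng) for _ in range(200)]
mean_token = sum(tokens) / len(tokens)
```

In this sketch the deterministic planner produces no variability at all, whereas repeated draws scatter around the same optimum, and a larger `kappa` concentrates them more tightly.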

Notes

Muscle force F generated by the biomechanical model is specified as
$$F=\rho [\exp (cA)-1], \tag{1}$$
where c is a form parameter accounting for the gain of the feedback from the muscle to the motoneuron pool and $\rho $ is a magnitude parameter directly related to force-generating capability. A is the muscle activation, given by
$$A=l-\lambda +\mu \dot{l}, \tag{2}$$
where l is the actual muscle length, $\lambda $ is the centrally specified threshold length of the $\lambda $ model (Feldman 1986), $\dot{l}$ is the muscle shortening or lengthening velocity and $\mu $ is a damping coefficient due to proprioceptive feedback (Payan and Perrier 1997).
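
As a minimal sketch, Eqs. (1) and (2) can be evaluated directly. The function names and all numerical values below are hypothetical and chosen only to make the arithmetic easy to follow; they are not parameters of the biomechanical model.

```python
import math


def muscle_activation(length, lam, length_rate, mu):
    """Eq. (2): A = l - lambda + mu * dl/dt."""
    return length - lam + mu * length_rate


def muscle_force(activation, rho, c):
    """Eq. (1): F = rho * (exp(c * A) - 1)."""
    return rho * (math.exp(c * activation) - 1.0)


# Hypothetical values: a muscle at rest (zero velocity) that is
# 0.01 length units longer than its threshold length lambda.
A = muscle_activation(length=0.05, lam=0.04, length_rate=0.0, mu=0.01)
F = muscle_force(A, rho=1.0, c=100.0)  # c * A is about 1, so F is about e - 1
```

Taken literally, Eq. (1) gives zero force at zero activation and a negative value when A is negative; how sub-threshold activation is handled is not stated in this note.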
For simplicity, and without loss of generality, the main text presents the case of three-phoneme sequences. For a general n-phoneme sequence, the proposed cost function would correspond to the perimeter of the $(n-1)$-simplex defined by the n control variables in the six-dimensional control space. For the present three-phoneme case, the 2-simplex is the triangle introduced in the text. Strictly speaking, the influence of every phoneme of the sequence on every other one would be better modeled by a cost function involving the distances between every pair of phonemes. To avoid the resulting quadratic growth in the number of terms, the cost function has been simplified to the definition presented here.
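
The difference between the two formulations can be sketched as follows. The note does not spell out how the perimeter is computed for more than three phonemes, so the reading used here (a closed path through the targets in sequence order, giving n terms) is an assumption, as are the function names and the example points.

```python
import math
from itertools import combinations


def perimeter_cost(targets):
    """Assumed reading of the simplified cost: length of the closed path
    visiting the targets in sequence order (n terms for n >= 3)."""
    n = len(targets)
    return sum(math.dist(targets[i], targets[(i + 1) % n]) for i in range(n))


def pairwise_cost(targets):
    """All-pairs alternative: one distance per pair, n * (n - 1) / 2 terms."""
    return sum(math.dist(a, b) for a, b in combinations(targets, 2))


# Hypothetical targets in a six-dimensional control space.
p0 = (0.0, 0.0, 0.0, 0.0, 0.0, 0.0)
p1 = (3.0, 4.0, 0.0, 0.0, 0.0, 0.0)
p2 = (3.0, 0.0, 0.0, 0.0, 0.0, 0.0)
p3 = (0.0, 4.0, 0.0, 0.0, 0.0, 0.0)

three = [p0, p1, p2]
four = [p0, p1, p2, p3]
```

For three targets both functions sum the same three triangle edges, so they agree. For four targets the all-pairs version has six terms against four, which is the growth the simplification avoids.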

References

Attias H (2003) Planning by probabilistic inference. In: Bishop CM, Frey BJ (eds) Proceedings of the ninth international workshop on artificial intelligence and statistics, Key West
Bessière P, Laugier C, Siegwart R (eds) (2008) Probabilistic reasoning and decision making in sensory-motor systems. Springer tracts in advanced robotics, vol 46. Springer, Berlin
Bessière P, Mazer E, Ahuactzin JM, Mekhnacha K (2013) Bayesian programming. CRC Press, Boca Raton
Boutilier C, Dean T, Hanks S (1999) Decision theoretic planning: structural assumptions and computational leverage. J Artif Intell Res 10:1–94
Bowers JS, Davis CJ (2012) Bayesian just-so stories in psychology and neuroscience. Psychol Bull 138(3):389–414
Brown LD (1981) A complete class theorem for statistical problems with finite sample spaces. Ann Stat 9(6):1289–1300
Calliope (1984) La parole et son traitement automatique. Masson, Paris
Colas F, Diard J, Bessière P (2010) Common Bayesian models for common cognitive issues. Acta Biotheor 58(2–3):191–216
Daunizeau J, den Ouden HEM, Pessiglione M, Kiebel SJ, Stephan KE, Friston KJ (2010) Observing the observer (I): meta-Bayesian models of learning and decision-making. PLoS One 5(12):e15554
Feldman AG (1986) Once more on the equilibrium-point hypothesis ($\lambda $ model) for motor control. J Mot Behav 18(1):17–54
Friston K (2010) The free-energy principle: a unified brain theory? Nat Rev Neurosci 11(2):127–138
Friston K (2011) What is optimal about motor control? Neuron 72(3):488–498
Friston KJ, Frith CD (2015) Active inference, communication and hermeneutics. Cortex 68:129–143
Friston KJ, Daunizeau J, Kiebel SJ (2009) Reinforcement learning or active inference? PLoS One 4(7):e6421
Friston K, Mattout J, Kilner J (2011) Action understanding and active inference. Biol Cybern 104(1–2):137–160
Friston K, Samothrakis S, Montague R (2012) Active inference and agency: optimal control without cost functions. Biol Cybern 106(8–9):523–541
Ganesh G, Haruno M, Kawato M, Burdet E (2010) Motor memory and local minimization of error and effort, not global optimization, determine motor behavior. J Neurophysiol 104(1):382–390
Goodman ND, Mansinghka VK, Roy DM, Bonawitz K, Tenenbaum JB (2008) Church: a language for generative models. In: Proceedings of the 24th conference on uncertainty in artificial intelligence, vol 22, p 23
Gordon AD, Henzinger TA, Nori AV, Rajamani SK (2014) Probabilistic programming. In: Proceedings of the 36th international conference on software engineering (ICSE 2014, Future of Software Engineering track). ACM, New York, pp 167–181
Guenther FH (1995) Speech sound acquisition, coarticulation, and rate effects in a neural network model of speech production. Psychol Rev 102(3):594–621
Guenther FH, Hampson M, Johnson D (1998) A theoretical investigation of reference frames for the planning of speech movements. Psychol Rev 105(4):611–633
Hahn U (2014) The Bayesian boom: good thing or bad? Front Psychol 5. Art ID 765
Honda K (1996) Organization of tongue articulation for vowels. J Phon 24:39–52
Jones M, Love B (2011) Bayesian fundamentalism or enlightenment? On the explanatory status and theoretical contributions of Bayesian models of cognition. Behav Brain Sci 34:169–231
Jordan MI (1996) Computational motor control. In: Gazzaniga MS (ed) The cognitive neurosciences. MIT Press, Cambridge, pp 597–609
Kaelbling L, Littman M, Cassandra A (1998) Planning and acting in partially observable stochastic domains. Artif Intell 101(1–2):99–134
Kappen HJ, Gómez V, Opper M (2012) Optimal control as a graphical model inference problem. Mach Learn 87(2):159–182
Kawato M (1999) Internal models for motor control and trajectory planning. Curr Opin Neurobiol 9(6):718–727
Laboissière R, Ostry DJ, Feldman AG (1996) The control of multi-muscle systems: human jaw and hyoid movements. Biol Cybern 74(4):373–384
Lebeltel O, Bessière P, Diard J, Mazer E (2004) Bayesian robot programming. Auton Robot 16(1):49–79
Ma WJ (2010) Signal detection theory, uncertainty, and Poisson-like population codes. Vis Res 50:2308–2319
Ma WJ (2012) Organizing probabilistic models of perception. Trends Cogn Sci 16(10):511–518
Ma L, Perrier P, Dang J (2006) Anticipatory coarticulation in vowel-consonant-vowel sequences: a crosslinguistic study of French and Mandarin speakers. In: Proceedings of the 7th international seminar on speech production. Ubatuba, pp 151–158
Marr D (1982) Vision: a computational investigation into the human representation and processing of visual information. W.H. Freeman, New York
Ménard L (2002) Production et perception des voyelles au cours de la croissance du conduit vocal: variabilité, invariance et normalisation. Unpublished Ph.D. thesis, Université Stendhal de Grenoble
Murphy K (2002) Dynamic Bayesian networks: representation, inference and learning. Unpublished Ph.D. thesis, University of California, Berkeley, Berkeley, CA
Nelson W (1983) Physical principles for economies of skilled movements. Biol Cybern 46:135–147
Payan Y, Perrier P (1997) Synthesis of VV sequences with a 2D biomechanical tongue model controlled by the equilibrium point hypothesis. Speech Commun 22(2):185–205
Perkell JS, Nelson WL (1985) Variability in production of the vowels /i/ and /a/. J Acoust Soc Am 77:1889–1895
Perkell J, Matthies M, Lane H, Guenther F, Wilhelms-Tricarico R, Wozniak J, Guiod P (1997) Speech motor control: acoustic goals, saturation effects, auditory feedback and internal models. Speech Commun 22(2):227–250
Perrier P, Boë LJ, Sock R (1992) Vocal tract area function estimation from midsagittal dimensions with CT scans and a vocal tract cast: modeling the transition with two sets of coefficients. J Speech Lang Hear Res 35(1):53–67
Perrier P, Payan Y, Zandipour M, Perkell J (2003) Influences of tongue biomechanics on speech movements during the production of velar stop consonants: a modeling study. J Acoust Soc Am 114(3):1582–1599
Perrier P, Ma L, Payan Y (2005) Modeling the production of VCV sequences via the inversion of a biomechanical model of the tongue. In: Proceedings of interspeech 2005, Lisbon, Portugal, pp 1041–1044
Poggio T, Girosi F (1989) A theory of networks for approximation and learning. Tech. rep., Artificial Intelligence Laboratory & Center for Biological Information Processing, MIT, Cambridge, MA, USA
Pouget A, Beck JM, Ma WJ, Latham PE (2013) Probabilistic brains: knowns and unknowns. Nat Neurosci 16(9):1170–1178
Robert C (2007) The Bayesian choice: from decision-theoretic foundations to computational implementation. Springer, New York
Robert-Ribes J (1995) Modèles d’intégration audiovisuelle de signaux linguistiques: de la perception humaine à la reconnaissance automatique des voyelles. Unpublished Ph.D. thesis, Institut National Polytechnique de Grenoble
Schmolesky MT, Wang Y, Hanes DP, Thompson KG, Leutgeb S, Schall JD, Leventhal AG (1998) Signal timing across the macaque visual system. J Neurophysiol 79(6):3272–3278
Shim JK, Latash ML, Zatsiorsky VM (2003) Prehension synergies: trial-to-trial variability and hierarchical organization of stable performance. Exp Brain Res 152(2):173–184
Todorov E (2004) Optimality principles in sensorimotor control. Nat Neurosci 7(9):907–915
Todorov E, Jordan MI (2002) Optimal feedback control as a theory of motor coordination. Nat Neurosci 5(11):1226–1235
Tourville JA, Reilly KJ, Guenther FH (2008) Neural mechanisms underlying auditory feedback control of speech. Neuroimage 39(3):1429–1443
Toussaint M (2009) Probabilistic inference as a model of planned behavior. Künstl Intell 3(9):23–29
Uno Y, Kawato M, Suzuki R (1989) Formation and control of optimal trajectory in human multijoint arm movement: minimum torque-change model. Biol Cybern 61:89–101
Wolpert DM (2007) Probabilistic models in human sensorimotor control. Hum Mov Sci 26:511–524

Acknowledgments

The authors wish to thank Pierre Bessière and Jean-Luc Schwartz for guidance and inspiring conversations.

Author information

Authors and Affiliations

GIPSA-Lab, Université Grenoble Alpes, 11 Rue des Mathématiques, Saint-Martin-d’Hères, F-38000, Grenoble, France
Jean-François Patri & Pascal Perrier
GIPSA-Lab, CNRS, F-38000, Grenoble, France
Jean-François Patri & Pascal Perrier
LPNC, Université Grenoble Alpes, 1251 Avenue Centrale, Saint-Martin-d’Hères, F-38000, Grenoble, France
Julien Diard
LPNC, CNRS, F-38000, Grenoble, France
Julien Diard

Corresponding author

Correspondence to Jean-François Patri.

Additional information

The research leading to these results has received funding from the European Research Council under the European Community’s Seventh Framework Programme (FP7/2007-2013 Grant Agreement No. 339152, “Speech Unit(e)s,” PI: Jean-Luc Schwartz).

Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary material 1 (pdf 56 KB)

About this article

Cite this article

Patri, JF., Diard, J. & Perrier, P. Optimal speech motor control and token-to-token variability: a Bayesian modeling approach. Biol Cybern 109, 611–626 (2015). https://doi.org/10.1007/s00422-015-0664-4

Received: 20 April 2015
Accepted: 30 September 2015
Published: 26 October 2015
Issue Date: December 2015
DOI: https://doi.org/10.1007/s00422-015-0664-4
