Abstract
The remarkable capacity of the speech motor system to adapt to various speech conditions is due to an excess of degrees of freedom, which enables producing similar acoustical properties with different sets of control strategies. To explain how the central nervous system selects one of the possible strategies, a common approach, in line with optimal motor control theories, is to model speech motor planning as the solution of an optimality problem based on cost functions. Despite the success of this approach, one of its drawbacks is the intrinsic contradiction between the concept of optimality and the observed experimental intra-speaker token-to-token variability. The present paper proposes an alternative approach by formulating feedforward optimal control in a probabilistic Bayesian modeling framework. This is illustrated by controlling a biomechanical model of the vocal tract for speech production and by comparing it with an existing optimal control model (GEPPETO). The essential elements of this optimal control model are presented first. From them the Bayesian model is constructed in a progressive way. Performance of the Bayesian model is evaluated based on computer simulations and compared to the optimal control model. This approach is shown to be appropriate for solving the speech planning problem while accounting for variability in a principled way.
Notes
The muscle force \(F\) generated by the biomechanical model is specified as
$$\begin{aligned} F=\rho [\exp (cA)-1], \end{aligned}\tag{1}$$
where \(c\) is a form parameter accounting for the gain of the feedback from the muscle to the motoneuron pool and \(\rho \) is a magnitude parameter directly related to force-generating capability. \(A\) is the muscle activation, given by
$$\begin{aligned} A=l-\lambda +\mu \dot{l}, \end{aligned}\tag{2}$$
where \(l\) is the actual muscle length, \(\dot{l}\) the muscle shortening or lengthening velocity and \(\mu \) a damping coefficient due to proprioceptive feedback (Payan and Perrier 1997).
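As an illustration, Eqs. (1) and (2) can be evaluated numerically as in the following minimal sketch. The function names and all numerical values are hypothetical and chosen only to exercise the formulas; they are not parameters of the biomechanical model. The expressions are implemented exactly as written above, without any additional thresholding of the activation.

```python
import math


def muscle_activation(length, lam, mu, velocity):
    """Eq. (2): A = l - lambda + mu * dl/dt."""
    return length - lam + mu * velocity


def muscle_force(activation, rho, c):
    """Eq. (1): F = rho * (exp(c * A) - 1)."""
    # math.expm1(x) computes exp(x) - 1 accurately for small x.
    return rho * math.expm1(c * activation)


# Hypothetical values (assumptions for illustration only).
A = muscle_activation(length=10.0, lam=8.0, mu=0.5, velocity=2.0)
F = muscle_force(A, rho=2.0, c=0.1)
print(A, F)
```

With zero activation the force is zero, and it grows exponentially with \(A\) at a rate set by \(c\) and a scale set by \(\rho \).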
For simplicity, and without loss of generality, the main text presents the case of three-phoneme sequences. For a general n-phoneme sequence, the proposed cost function would correspond to the perimeter of the \((n-1)\)-simplex defined by the n control variables in the six-dimensional control space. In the present three-phoneme case, this 2-simplex is the triangle introduced in the text. Strictly speaking, the influence of every phoneme of the sequence on every other one would be better modeled by a cost function involving the distances between every pair of phonemes. To avoid the resulting quadratic growth in the number of terms of the cost function, its definition has been simplified to the one presented here.
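For the three-phoneme case, the perimeter cost can be sketched as follows. This is only an illustration under assumptions: it takes the distance in the six-dimensional control space to be Euclidean, and the function name and the example control vectors are hypothetical.

```python
import math


def triangle_perimeter(p1, p2, p3):
    """Perimeter of the triangle whose vertices are three control vectors."""
    # math.dist returns the Euclidean distance between two points (Python 3.8+).
    return math.dist(p1, p2) + math.dist(p2, p3) + math.dist(p3, p1)


# Hypothetical six-dimensional control vectors, one per phoneme.
v1 = (0.0, 0.0, 0.0, 0.0, 0.0, 0.0)
v2 = (3.0, 4.0, 0.0, 0.0, 0.0, 0.0)
v3 = (3.0, 0.0, 0.0, 0.0, 0.0, 0.0)

# The sides of this triangle have lengths 5, 4 and 3.
cost = triangle_perimeter(v1, v2, v3)
print(cost)
```

A shorter perimeter means that the three control vectors lie closer to one another in the control space.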
References
Attias H (2003) Planning by probabilistic inference. In: Bishop CM, Frey BJ (eds) Proceedings of the ninth international workshop on artificial intelligence and statistics, Key West
Bessière P, Laugier C, Siegwart R (eds) (2008) Probabilistic reasoning and decision making in sensory-motor systems. Springer tracts in advanced robotics, vol 46. Springer, Berlin
Bessière P, Mazer E, Ahuactzin JM, Mekhnacha K (2013) Bayesian programming. CRC Press, Boca Raton
Boutilier C, Dean T, Hanks S (1999) Decision theoretic planning: structural assumptions and computational leverage. J Artif Intell Res 10:1–94
Bowers JS, Davis CJ (2012) Bayesian just-so stories in psychology and neuroscience. Psychol Bull 138(3):389–414
Brown LD (1981) A complete class theorem for statistical problems with finite sample spaces. Ann Stat 9(6):1289–1300
Calliope (1984) La parole et son traitement automatique. Masson, Paris
Colas F, Diard J, Bessière P (2010) Common Bayesian models for common cognitive issues. Acta Biotheor 58(2–3):191–216
Daunizeau J, den Ouden HEM, Pessiglione M, Kiebel SJ, Stephan KE, Friston KJ (2010) Observing the observer (I): meta-Bayesian models of learning and decision-making. PLoS One 5(12):e15554
Feldman AG (1986) Once more on the equilibrium-point hypothesis (\(\lambda \) model) for motor control. J Mot Behav 18(1):17–54
Friston K (2010) The free-energy principle: a unified brain theory? Nat Rev Neurosci 11(2):127–138
Friston K (2011) What is optimal about motor control? Neuron 72(3):488–498
Friston KJ, Frith CD (2015) Active inference, communication and hermeneutics. Cortex 68:129–143
Friston KJ, Daunizeau J, Kiebel SJ (2009) Reinforcement learning or active inference? PLoS One 4(7):e6421
Friston K, Mattout J, Kilner J (2011) Action understanding and active inference. Biol Cybern 104(1–2):137–160
Friston K, Samothrakis S, Montague R (2012) Active inference and agency: optimal control without cost functions. Biol Cybern 106(8–9):523–541
Ganesh G, Haruno M, Kawato M, Burdet E (2010) Motor memory and local minimization of error and effort, not global optimization, determine motor behavior. J Neurophysiol 104(1):382–390
Goodman ND, Mansinghka VK, Roy DM, Bonawitz K, Tenenbaum JB (2008) Church: a language for generative models. In: Proceedings of the 24th conference on uncertainty in artificial intelligence, vol 22, p 23
Gordon AD, Henzinger TA, Nori AV, Rajamani SK (2014) Probabilistic programming. In: Proceedings of the 36th international conference on software engineering (ICSE 2014, Future of Software Engineering track). ACM, New York, pp 167–181
Guenther FH (1995) Speech sound acquisition, coarticulation, and rate effects in a neural network model of speech production. Psychol Rev 102(3):594–621
Guenther FH, Hampson M, Johnson D (1998) A theoretical investigation of reference frames for the planning of speech movements. Psychol Rev 105(4):611–633
Hahn U (2014) The Bayesian boom: good thing or bad? Front Psychol 5. Art ID 765
Honda K (1996) Organization of tongue articulation for vowels. J Phon 24:39–52
Jones M, Love B (2011) Bayesian fundamentalism or enlightenment? On the explanatory status and theoretical contributions of Bayesian models of cognition. Behav Brain Sci 34:169–231
Jordan MI (1996) Computational motor control. In: Gazzaniga MS (ed) The cognitive neurosciences. MIT Press, Cambridge, pp 597–609
Kaelbling L, Littman M, Cassandra A (1998) Planning and acting in partially observable stochastic domains. Artif Intell 101(1–2):99–134
Kappen HJ, Gómez V, Opper M (2012) Optimal control as a graphical model inference problem. Mach Learn 87(2):159–182
Kawato M (1999) Internal models for motor control and trajectory planning. Curr Opin Neurobiol 9(6):718–727
Laboissière R, Ostry DJ, Feldman AG (1996) The control of multi-muscle systems: human jaw and hyoid movements. Biol Cybern 74(4):373–384
Lebeltel O, Bessière P, Diard J, Mazer E (2004) Bayesian robot programming. Auton Robot 16(1):49–79
Ma WJ (2010) Signal detection theory, uncertainty, and Poisson-like population codes. Vis Res 50:2308–2319
Ma WJ (2012) Organizing probabilistic models of perception. Trends Cogn Sci 16(10):511–518
Ma L, Perrier P, Dang J (2006) Anticipatory coarticulation in vowel-consonant-vowel sequences: a crosslinguistic study of French and Mandarin speakers. In: Proceedings of the 7th international seminar on speech production. Ubatuba, pp 151–158
Marr D (1982) Vision: a computational investigation into the human representation and processing of visual information. W.H. Freeman, New York
Ménard L (2002) Production et perception des voyelles au cours de la croissance du conduit vocal: variabilité, invariance et normalisation. Unpublished Ph.D. thesis, Université Stendhal de Grenoble
Murphy K (2002) Dynamic Bayesian networks: representation, inference and learning. Unpublished Ph.D. thesis, University of California, Berkeley, CA
Nelson W (1983) Physical principles for economies of skilled movements. Biol Cybern 46:135–147
Payan Y, Perrier P (1997) Synthesis of VV sequences with a 2D biomechanical tongue model controlled by the equilibrium point hypothesis. Speech Commun 22(2):185–205
Perkell JS, Nelson WL (1985) Variability in production of the vowels /i/ and /a/. J Acoust Soc Am 77:1889–1895
Perkell J, Matthies M, Lane H, Guenther F, Wilhelms-Tricarico R, Wozniak J, Guiod P (1997) Speech motor control: acoustic goals, saturation effects, auditory feedback and internal models. Speech Commun 22(2):227–250
Perrier P, Boë LJ, Sock R (1992) Vocal tract area function estimation from midsagittal dimensions with CT scans and a vocal tract cast: modeling the transition with two sets of coefficients. J Speech Lang Hear Res 35(1):53–67
Perrier P, Payan Y, Zandipour M, Perkell J (2003) Influences of tongue biomechanics on speech movements during the production of velar stop consonants: a modeling study. J Acoust Soc Am 114(3):1582–1599
Perrier P, Ma L, Payan Y (2005) Modeling the production of VCV sequences via the inversion of a biomechanical model of the tongue. In: Proceedings of interspeech 2005, Lisbon, Portugal, pp 1041–1044
Poggio T, Girosi F (1989) A theory of networks for approximation and learning. Tech. rep., Artificial Intelligence Laboratory & Center for Biological Information Processing, MIT, Cambridge, MA, USA
Pouget A, Beck JM, Ma WJ, Latham PE (2013) Probabilistic brains: knowns and unknowns. Nat Neurosci 16(9):1170–1178
Robert C (2007) The Bayesian choice: from decision-theoretic foundations to computational implementation. Springer, New York
Robert-Ribes J (1995) Modèles d’intégration audiovisuelle de signaux linguistiques: de la perception humaine à la reconnaissance automatique des voyelles. Unpublished Ph.D. thesis, Institut National Polytechnique de Grenoble
Schmolesky MT, Wang Y, Hanes DP, Thompson KG, Leutgeb S, Schall JD, Leventhal AG (1998) Signal timing across the macaque visual system. J Neurophysiol 79(6):3272–3278
Shim JK, Latash ML, Zatsiorsky VM (2003) Prehension synergies: trial-to-trial variability and hierarchical organization of stable performance. Exp Brain Res 152(2):173–184
Todorov E (2004) Optimality principles in sensorimotor control. Nat Neurosci 7(9):907–915
Todorov E, Jordan MI (2002) Optimal feedback control as a theory of motor coordination. Nat Neurosci 5(11):1226–1235
Tourville JA, Reilly KJ, Guenther FH (2008) Neural mechanisms underlying auditory feedback control of speech. Neuroimage 39(3):1429–1443
Toussaint M (2009) Probabilistic inference as a model of planned behavior. Künstl Intell 3(9):23–29
Uno Y, Kawato M, Suzuki R (1989) Formation control of optimal trajectory in human multijoint arm movement: minimum torque-change model. Biol Cybern 61:89–101
Wolpert DM (2007) Probabilistic models in human sensorimotor control. Hum Mov Sci 26:511–524
Acknowledgments
The authors wish to thank Pierre Bessière and Jean-Luc Schwartz for their guidance and inspiring conversations.
Additional information
The research leading to these results has received funding from the European Research Council under the European Community’s Seventh Framework Programme (FP7/2007-2013 Grant Agreement No. 339152, “Speech Unit(e)s,” PI: Jean-Luc Schwartz).
Cite this article
Patri, JF., Diard, J. & Perrier, P. Optimal speech motor control and token-to-token variability: a Bayesian modeling approach. Biol Cybern 109, 611–626 (2015). https://doi.org/10.1007/s00422-015-0664-4
DOI: https://doi.org/10.1007/s00422-015-0664-4