Phonemic Restoration Based on the Movement Continuity of Articulation

Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 10639)


Phonemic restoration refers to the human ability to perceptually recover defective speech after part of the signal has been replaced by noise. The movement continuity of articulation is believed to be one of the main factors underlying this effect. Based on this consideration, this paper proposes an effective method for recovering missing speech signal, which also provides partial verification of the hypothesis. In the proposed method, the mapping between acoustic and articulatory features is learned with a deep neural network (DNN), where a hierarchical DNN architecture with bottleneck features improves the performance of acoustic-to-articulatory inversion; the articulatory features corresponding to the missing speech segments are then restored by cubic spline interpolation. For evaluation, 25 sentences were selected from the MNGU0 database, and short segments of each sentence were replaced by zeros and/or noise. Experimental results show that the proposed method effectively improves the perceptual evaluation of speech quality (PESQ) score of speech with missing signal, and they provide preliminary experimental evidence for the first hypothesized mechanism of phonemic restoration: coarticulation.
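The interpolation step described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the helper name `restore_articulatory_gap` and the synthetic sine trajectory are assumptions standing in for a real electromagnetic-articulography channel from MNGU0, and the gap plays the role of a segment replaced by zeros or noise.

```python
import numpy as np
from scipy.interpolate import CubicSpline

def restore_articulatory_gap(t, traj, gap_mask):
    """Fill masked samples of one articulatory trajectory by fitting a
    cubic spline to the surviving samples and evaluating it over the gap
    (hypothetical helper mirroring the paper's cubic-spline restoration)."""
    known = ~gap_mask
    spline = CubicSpline(t[known], traj[known])
    restored = traj.copy()
    restored[gap_mask] = spline(t[gap_mask])
    return restored

# Toy trajectory standing in for one EMA coordinate (e.g. tongue tip height).
t = np.linspace(0.0, 1.0, 200)
traj = np.sin(2.0 * np.pi * t)
gap = (t > 0.4) & (t < 0.5)          # simulate a short missing segment
restored = restore_articulatory_gap(t, traj, gap)
```

Because articulator movements are smooth and continuous, a low-order spline fitted to the intact neighborhood approximates the hidden movement well over short gaps, which is exactly the continuity assumption the method exploits.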


Phonemic restoration effect · Coarticulation · Movement continuity of articulation · Deep neural network · Spline interpolation



The research is partially supported by the National Basic Research Program of China (No. 2013CB329301) and the National Natural Science Foundation of China (No. 61233009). We are also grateful for the partial support of JSPS KAKENHI Grant (16K00297).


  1. Kashino, M.: Phonemic restoration: the brain creates missing speech sounds. Acoust. Sci. Technol. 27(6), 318–321 (2006)
  2. Warren, R.M.: Perceptual restoration of missing speech sounds. Science 167(3917), 392–393 (1970)
  3. Riecke, L., Vanbussel, M., Hausfeld, L.: Hearing an illusory vowel in noise: suppression of auditory cortical activity. J. Neurosci. 32(23), 8024–8034 (2012)
  4. Başkent, D.: Effect of speech degradation on top-down repair: phonemic restoration with simulations of cochlear implants and combined electric-acoustic stimulation. J. Assoc. Res. Otolaryngol. 13(5), 683 (2012)
  5. Liederman, J., Gilbert, K., Fisher, J.M.: Are women more influenced than men by top-down semantic information when listening to disrupted speech? Lang. Speech 54(1), 33–48 (2011)
  6. Başkent, D., Eiler, C.L., Edwards, B.: Phonemic restoration by hearing-impaired listeners with mild to moderate sensorineural hearing loss. Hear. Res. 260(1), 54–62 (2010)
  7. Newman, R.S.: Perceptual restoration in children versus adults. Appl. Psycholinguist. 25(4), 481–493 (2004)
  8. Harding, P., Milner, B.: Speech enhancement by reconstruction from cleaned acoustic features. In: INTERSPEECH, Italy, pp. 1189–1192 (2011)
  9. Kolossa, D., Häb-Umbach, R.: Robust Speech Recognition of Uncertain or Missing Data. Springer, Heidelberg (2011)
  10. Devault, D., Sagae, K., Traum, D.R.: Detecting the status of a predictive incremental speech understanding model for real-time decision-making in a spoken dialogue system. In: INTERSPEECH, Italy, pp. 1021–1024 (2011)
  11. Cohen, M.M., Massaro, D.W.: Modeling coarticulation in synthetic visual speech. In: Thalmann, N.M., Thalmann, D. (eds.) Models and Techniques in Computer Animation. Springer, Tokyo (1993)
  12. Liu, P., Yu, Q., Wu, Z.: A deep recurrent approach for acoustic-to-articulatory inversion. In: International Conference on Acoustics, Speech and Signal Processing, Australia, pp. 4450–4454 (2015)
  13. Liu, Z.C., Ling, Z.H., Dai, L.R.: Articulatory-to-acoustic conversion with cascaded prediction of spectral and excitation features using neural networks. In: INTERSPEECH, USA, pp. 1502–1506 (2016)
  14. Hinton, G.E.: A practical guide to training restricted Boltzmann machines. Momentum 9(1), 599–619 (2012)
  15. Ren, B., Wang, L., Lu, L., Ueda, Y., Kai, A.: Combination of bottleneck feature extraction and dereverberation for distant-talking speech recognition. Multimedia Tools Appl. 75(9), 5093–5108 (2016)
  16. Ueda, Y., Wang, L., Kai, A., Ren, B.: Environment-dependent denoising autoencoder for distant-talking speech recognition. EURASIP J. Adv. Sig. Process. 92(1), 1–11 (2015)
  17. Zhang, Z., Wang, L., Kai, A., Odani, K., Li, W., Iwahashi, M.: Deep neural network-based bottleneck feature and denoising autoencoder-based dereverberation for distant-talking speaker identification. EURASIP J. Audio Music Speech Process. 12(1), 1–13 (2015)
  18. Niwa, K., Koizumi, Y., Kawase, T.: Pinpoint extraction of distant sound source based on DNN mapping from multiple beamforming outputs to prior SNR. In: International Conference on Acoustics, Speech and Signal Processing, China, pp. 435–439 (2016)
  19. Canevari, C., Badino, L., Fadiga, L.: Relevance weighted reconstruction of articulatory features in deep-neural-network-based acoustic-to-articulatory mapping. In: INTERSPEECH, France (2013)
  20. Schonle, P.W., Grabe, K., Wenig, P., Hohne, J., Schrader, J., Conrad, B.: Electromagnetic articulography: use of alternating magnetic fields for tracking movements of multiple points inside and outside the vocal tract. Brain Lang. 31(1), 26–35 (1987)
  21. Kawahara, H.: Speech representation and transformation using adaptive interpolation of weighted spectrum: VOCODER revisited. In: International Conference on Acoustics, Speech, and Signal Processing, Germany, p. 1303 (1997)
  22. Atal, B.S., Rioul, O.: Neural networks for estimating articulatory positions from speech. J. Acoust. Soc. Am. 86(S1), S67 (1989)
  23. Rahim, A.M., Goodyear, C., Kleijn, B., Schroeter, J., Sondhi, M.: On the use of neural networks in articulatory speech synthesis. J. Acoust. Soc. Am. 93(2), 1109–1121 (1993)
  24. Kjellström, H., Engwall, O.: Audiovisual-to-articulatory inversion. Speech Commun. 51(3), 195–209 (2009)
  25. Hazewinkel, M.: Spline interpolation. In: Encyclopedia of Mathematics. Springer, Dordrecht (2001)
  26. Richmond, K., Hoole, P., King, S.: Announcing the electromagnetic articulography (Day 1) subset of the mngu0 articulatory corpus. In: INTERSPEECH 2011, Italy, pp. 1505–1508 (2011)
  27. Pennock, S.: Accuracy of the perceptual evaluation of speech quality (PESQ) algorithm. In: Measurement of Speech and Audio Quality in Networks (MESAQIN) Workshop (2002)

Copyright information

© Springer International Publishing AG 2017

Authors and Affiliations

  1. Tianjin Key Laboratory of Cognitive Computing and Application, Tianjin University, Tianjin, China
  2. Japan Advanced Institute of Science and Technology, Ishikawa, Japan
