The Use of Artificial Neural Networks in the Speech Understanding Model - SUM

  • Daniel Nehme Müller
  • Mozart Lemos de Siqueira
  • Philippe O. A. Navaux
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4669)


Recent neurocognitive research has shown how the brain naturally processes spoken sentences. Adequate human-computer speech interaction does not yet exist, and achieving it remains a computational challenge. Towards this goal, we propose a speech comprehension software architecture that reproduces the processing flow of this neurocognitive model. In this architecture, the first step processes the speech signal into a coding of written words and prosody. This coding is then used as input to syntactic and prosodic-semantic analyses. Both analyses run concurrently, and their outputs are matched to select the best result. The computational implementation applies wavelet transforms to code the speech signal and extract prosodic data, and connectionist models to perform syntactic parsing and prosodic-semantic mapping.
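The abstract states that wavelet transforms are used to code the speech signal but does not say which wavelet, how many decomposition levels, or which features feed the later stages. The sketch below is therefore only an illustration under stated assumptions: it uses the orthonormal Haar wavelet in a multilevel pyramid (repeatedly splitting the low-pass band) and turns one speech frame into a vector of per-sub-band log energies. The functions `haar_step`, `haar_dwt` and `subband_log_energies` are hypothetical names introduced here, not the authors' code.

```python
import math


def haar_step(signal):
    """One level of the orthonormal Haar DWT: returns (approximation, detail)."""
    if len(signal) % 2:
        raise ValueError("signal length must be even at every level")
    s = 1 / math.sqrt(2)
    approx = [(signal[i] + signal[i + 1]) * s for i in range(0, len(signal), 2)]
    detail = [(signal[i] - signal[i + 1]) * s for i in range(0, len(signal), 2)]
    return approx, detail


def haar_dwt(signal, levels):
    """Multilevel pyramid: keep splitting the approximation (low-pass) band.

    Returns [cD1, cD2, ..., cD_levels, cA_levels]; the frame length must be
    divisible by 2 ** levels.
    """
    coeffs = []
    approx = list(signal)
    for _ in range(levels):
        approx, detail = haar_step(approx)
        coeffs.append(detail)
    coeffs.append(approx)
    return coeffs


def subband_log_energies(frame, levels=3):
    """Hypothetical feature vector: log energy of each wavelet sub-band.

    A small constant avoids log(0) for silent sub-bands.
    """
    return [math.log(sum(c * c for c in band) + 1e-12)
            for band in haar_dwt(frame, levels)]


if __name__ == "__main__":
    frame = [1, 2, 3, 4, 5, 6, 7, 8]  # toy 8-sample "speech frame"
    for band in haar_dwt(frame, 3):
        print([round(c, 3) for c in band])
    print(subband_log_energies(frame))
```

Because the Haar transform is orthonormal, the total energy of the coefficients equals the energy of the frame, so the sub-band energies simply redistribute the frame's energy across frequency bands. A real front end would also window and overlap frames, and would likely use a smoother wavelet family; those choices are not specified in the abstract.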


Copyright information

© Springer-Verlag Berlin Heidelberg 2007

Authors and Affiliations

  • Daniel Nehme Müller (1)
  • Mozart Lemos de Siqueira (1)
  • Philippe O. A. Navaux (1)
  1. Federal University of Rio Grande do Sul, Porto Alegre, Rio Grande do Sul, Brazil