
Recognition of Multiple Language Voice Navigation Queries in Traffic Situations

  • Gellért Sárosi
  • Tamás Mozsolics
  • Balázs Tarján
  • András Balog
  • Péter Mihajlik
  • Tibor Fegyó
Part of the Lecture Notes in Computer Science book series (LNCS, volume 6800)

Abstract

This paper introduces our work and results on a multiple-language continuous speech recognition task. The aim was to design a system that introduces only a tolerable amount of recognition errors for point-of-interest words in voice navigation queries, even in the presence of real-life traffic noise. An additional challenge was that no task-specific training databases were available for language or acoustic modeling. Instead, general-purpose acoustic databases were obtained and (probabilistic) context-free grammars were constructed for the acoustic and language models, respectively. A public pronunciation lexicon was used for English, whereas rule- and exception-dictionary-based pronunciation modeling was applied for French, German, Italian, Spanish and Hungarian. For the last four languages, the classical phoneme-based pronunciation modeling approach was also compared to a grapheme-based pronunciation modeling technique. Noise robustness was addressed by applying various feature extraction methods. The results show that achieving high word recognition accuracy is feasible if cooperative speakers can be assumed.
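As a rough illustration of the two pronunciation-modeling strategies compared in the paper, the sketch below contrasts a phoneme-based lexicon (exception dictionary plus letter-to-sound rules) with a grapheme-based lexicon in which every letter is treated as its own acoustic unit. This is a minimal sketch under assumed data: the rules, exception entries and example words are hypothetical placeholders, not taken from the paper or its dictionaries.

```python
# Minimal, hypothetical sketch of the two pronunciation-modeling strategies
# mentioned in the abstract; not the authors' implementation.

# Hypothetical exception dictionary: words whose pronunciation the rules miss.
EXCEPTIONS = {
    "platz": ["p", "l", "a", "t", "s"],
}

# Toy letter-to-sound rules; a real system would use a far richer rule set.
RULES = {"sch": ["S"], "ch": ["x"], "z": ["t", "s"]}


def phoneme_pronunciation(word: str) -> list:
    """Exception dictionary first, then greedy longest-match rules,
    falling back to the letter itself as a phone symbol."""
    if word in EXCEPTIONS:
        return EXCEPTIONS[word]
    phones, i = [], 0
    while i < len(word):
        for length in (3, 2, 1):
            chunk = word[i:i + length]
            if chunk in RULES:
                phones.extend(RULES[chunk])
                i += length
                break
        else:
            phones.append(word[i])  # no rule matched: keep the letter
            i += 1
    return phones


def grapheme_pronunciation(word: str) -> list:
    """Grapheme-based modeling: each letter becomes its own acoustic unit."""
    return list(word)


if __name__ == "__main__":
    for w in ["marienplatz", "schlossstrasse"]:
        print(w, phoneme_pronunciation(w), grapheme_pronunciation(w))
```

The grapheme-based variant needs no language-specific pronunciation knowledge, which is exactly why it is attractive when, as in this task, no task-specific lexicons are available; the trade-off is that the acoustic models must absorb the letter-to-sound ambiguity.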

Keywords

Point of interest · speech recognition · context free grammar · noise robustness · feature extraction · multiple languages · navigation system



Copyright information

© Springer-Verlag Berlin Heidelberg 2011

Authors and Affiliations

  • Gellért Sárosi ¹
  • Tamás Mozsolics ¹ ²
  • Balázs Tarján ¹
  • András Balog ¹ ²
  • Péter Mihajlik ¹ ²
  • Tibor Fegyó ¹ ³

  1. Department of Telecommunications and Media Informatics, Budapest University of Technology and Economics, Hungary
  2. THINKTech Research Center Nonprofit LLC., Hungary
  3. Aitia International Inc., Hungary
