A Robust Speech Recognition System for Service-Robotics Applications

  • Masrur Doostdar
  • Stefan Schiffer
  • Gerhard Lakemeyer
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 5399)


Mobile service robots in human environments need versatile abilities to perceive and interact with their surroundings. Spoken language is a natural way to interact with a robot in general, and to instruct it in particular. However, most existing speech recognition systems suffer from the high environmental noise present in the target domain, and adapting them to reach the desired accuracy requires in-depth knowledge of the underlying theory. We propose and evaluate an architecture for a robust, speaker-independent speech recognition system using off-the-shelf technology and simple additional methods. We first use close speech detection to segment utterances, which eases the subsequent recognition step. By further combining an FSG-based and an N-gram-based speech decoder, we reduce false positive recognitions while achieving high accuracy.
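The dual-decoder idea from the abstract can be illustrated with a minimal sketch: a restrictive FSG (finite state grammar) decoder tends to force-fit noise or out-of-grammar speech into valid commands, so its hypothesis is accepted only when it agrees sufficiently with the output of a less constrained N-gram decoder. The function name `accept_utterance` and the word-overlap agreement metric below are illustrative assumptions, not the paper's actual criterion:

```python
def accept_utterance(fsg_hyp: str, ngram_hyp: str, min_overlap: float = 0.6) -> bool:
    """Accept the grammar-based hypothesis only if it sufficiently
    agrees with the hypothesis of a less restricted N-gram decoder.

    Agreement is measured here as the fraction of FSG words that also
    occur in the N-gram output (a deliberately simplistic stand-in for
    a proper alignment or confidence measure)."""
    if not fsg_hyp:
        return False
    fsg_words = fsg_hyp.split()
    ngram_words = set(ngram_hyp.split())
    overlap = sum(w in ngram_words for w in fsg_words) / len(fsg_words)
    return overlap >= min_overlap

# The N-gram decoder confirms the command -> accepted.
print(accept_utterance("robot go to the kitchen",
                       "robot go to the kitchen please"))  # True
# The N-gram decoder hears something unrelated, suggesting the FSG
# decoder force-fitted noise into a command -> rejected.
print(accept_utterance("robot go to the kitchen",
                       "uh the weather is nice"))          # False
```

In a real system the two hypotheses would come from two decoder passes over the same segmented utterance, e.g. with a toolkit such as Sphinx configured once with a command grammar and once with an N-gram language model.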


Keywords: Linear Discriminant Analysis · Speech Recognition · Language Model · False Recognition · Speech Recognition System



Copyright information

© Springer-Verlag Berlin Heidelberg 2009

Authors and Affiliations

  • Masrur Doostdar (1)
  • Stefan Schiffer (1)
  • Gerhard Lakemeyer (1)

  1. Knowledge-Based Systems Group, Department of Computer Science 5, RWTH Aachen University, Germany
