RoboASR: A Dynamic Speech Recognition System for Service Robots

Abdelhamid, Abdelaziz A.; Abdulla, Waleed H.; MacDonald, Bruce A.

doi:10.1007/978-3-642-34103-8_49

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 7621)

Included in the following conference series:

International Conference on Social Robotics

Abstract

This paper proposes a new method for building dynamic speech decoding graphs for state-based spoken human-robot interaction (HRI). Current robotic speech recognition systems are based on a finite state grammar (FSG), on statistical N-gram models, or on a dual FSG and N-gram arrangement that uses multi-pass decoding. The proposed method merges the FSG and the N-gram model into a single decoding graph by converting the FSG rules into a weighted finite state acceptor (WFSA) and then composing it with a large N-gram-based weighted finite state transducer (WFST). The result is a tiny decoding graph that can be used in single-pass decoding. The method is applied in our speech recognition system (RoboASR) for controlling service robots with limited resources. The proposed approach has three advantages. First, it takes advantage of both FSG and N-gram decoders by composing them into a single tiny decoding graph. Second, it is robust: the resulting tiny decoding graph is highly accurate because it is fitted to the HRI state. Third, it has a fast response time in comparison with current state-of-the-art speech recognition systems. The proposed system has a large vocabulary containing 64K words with more than 69K entries. Experimental results show that the average response time is 0.05% of the utterance length and the average ratio between the true and false positives is 89% when tested on 15 interaction scenarios using live speech.
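
The composition step described in the abstract can be illustrated with a small sketch. This is not the RoboASR implementation: it treats both the grammar and the N-gram model as plain weighted acceptors (so composition reduces to intersection), uses a toy bigram model in place of a large WFST, and every name, word, and cost below (`intersect`, `path_cost`, `VOCAB`, `BIGRAM_COST`, `BACKOFF_COST`) is a hypothetical chosen for illustration.

```python
from collections import deque

# A weighted acceptor is stored as arcs[state] = [(label, cost, next_state), ...].
# Costs are negative log probabilities, so they add along a path.


def intersect(arcs_a, start_a, finals_a, arcs_b, start_b, finals_b):
    """Product construction: keep only word sequences accepted by both inputs."""
    start = (start_a, start_b)
    arcs, finals = {}, set()
    queue, seen = deque([start]), {start}
    while queue:
        state = queue.popleft()
        qa, qb = state
        arcs[state] = []
        if qa in finals_a and qb in finals_b:
            finals.add(state)
        for label_a, cost_a, next_a in arcs_a.get(qa, []):
            for label_b, cost_b, next_b in arcs_b.get(qb, []):
                if label_a == label_b:
                    nxt = (next_a, next_b)
                    arcs[state].append((label_a, cost_a + cost_b, nxt))
                    if nxt not in seen:
                        seen.add(nxt)
                        queue.append(nxt)
    return arcs, start, finals


def path_cost(arcs, start, finals, words):
    """Cost of a word sequence in a deterministic acceptor, or None if rejected."""
    state, total = start, 0.0
    for word in words:
        for label, cost, nxt in arcs.get(state, []):
            if label == word:
                state, total = nxt, total + cost
                break
        else:
            return None  # no arc carries this word from the current state
    return total if state in finals else None


# Hypothetical grammar for one interaction state: "go to (kitchen | bedroom)".
grammar_arcs = {
    0: [("go", 0.0, 1)],
    1: [("to", 0.0, 2)],
    2: [("kitchen", 0.0, 3), ("bedroom", 0.0, 3)],
}

# Hypothetical bigram model as an acceptor whose states are the previous word.
VOCAB = ["go", "to", "kitchen", "bedroom", "stop", "the"]
BIGRAM_COST = {
    ("<s>", "go"): 0.5,
    ("go", "to"): 0.25,
    ("to", "kitchen"): 1.0,
    ("to", "bedroom"): 1.5,
}
BACKOFF_COST = 5.0  # assumed flat cost for unseen bigrams
lm_arcs = {
    history: [(w, BIGRAM_COST.get((history, w), BACKOFF_COST), w) for w in VOCAB]
    for history in ["<s>"] + VOCAB
}
lm_finals = set(lm_arcs)  # simplification: any state may end the utterance

graph_arcs, graph_start, graph_finals = intersect(
    grammar_arcs, 0, {3}, lm_arcs, "<s>", lm_finals
)

for words in (["go", "to", "kitchen"], ["stop", "the"]):
    print(words, path_cost(graph_arcs, graph_start, graph_finals, words))
```

In this toy setup the bigram acceptor has 42 arcs while the composed graph keeps only the 4 arcs the grammar allows, each still weighted by the bigram costs. That is the intuition behind the "tiny decoding graph": the grammar for the current interaction state prunes the large statistical model down to a small graph that a single decoding pass can search.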

Author information

Authors and Affiliations

Electrical and Computer Engineering, The University of Auckland, New Zealand
Abdelaziz A. Abdelhamid, Waleed H. Abdulla & Bruce A. MacDonald

Editor information

Editors and Affiliations

Interactive Digital Media Institute, Social Robotics Laboratory, National University of Singapore, 4 Engineering Drive 3, 117576, Singapore, Singapore
Shuzhi Sam Ge & John-John Cabibihan
Artificial Intelligence Laboratory, Department of Computer Science, Stanford University, Gates Building, 94305-9010, Stanford, CA, USA
Oussama Khatib
Robotics Institute, School of Computer Science, Carnegie Mellon University, 5000 Forbes Avenue, 15213, Pittsburgh, PA, USA
Reid Simmons
Faculty of Information Technology, University of Technology, Human-robot Collaboration Studio, 2007, Sydney, NSW, Australia
Mary-Anne Williams

About this paper

Cite this paper

Abdelhamid, A.A., Abdulla, W.H., MacDonald, B.A. (2012). RoboASR: A Dynamic Speech Recognition System for Service Robots. In: Ge, S.S., Khatib, O., Cabibihan, JJ., Simmons, R., Williams, MA. (eds) Social Robotics. ICSR 2012. Lecture Notes in Computer Science, vol 7621. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-34103-8_49

DOI: https://doi.org/10.1007/978-3-642-34103-8_49
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-34102-1
Online ISBN: 978-3-642-34103-8
eBook Packages: Computer Science, Computer Science (R0)
