
Multidomain Voice Activity Detection during Human-Robot Interaction

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 8239)

Abstract

The rapid proliferation of social robots is quickly leading to the cohabitation of humans and robots at home. These robots interact primarily through verbal communication and are usually endowed with microphones to capture the voice of the people they interact with. However, by the very principle microphones operate on, they also pick up all kinds of non-verbal signals. It is therefore crucial to determine whether a received signal is voice or not.

In this work, we present a Voice Activity Detection (VAD) system to address this problem. The audio signal captured by the robot is analyzed online, and several characteristics, or statistics, are extracted. These statistics belong to three different domains: time, frequency, and time-frequency. Combining them yields a robust VAD system that, by means of the microphones located on the robot, detects when a person starts and stops talking.

Finally, several experiments are conducted to test the performance of the system. They show a high success rate in classifying different audio signals as voiced or unvoiced.
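The idea of combining per-frame statistics from several domains can be sketched as follows. This is a minimal illustration, not the paper's actual feature set or thresholds: it votes over short-time energy and zero-crossing rate (time domain) and spectral flatness (frequency domain), with all threshold values chosen here purely for demonstration.

```python
import numpy as np

def frame_features(frame):
    """Per-frame statistics drawn from different analysis domains."""
    # Time domain: short-time energy and zero-crossing rate
    energy = np.mean(frame ** 2)
    zcr = np.mean(np.abs(np.diff(np.sign(frame)))) / 2
    # Frequency domain: spectral flatness (tonal speech -> low flatness)
    mag = np.abs(np.fft.rfft(frame)) + 1e-12
    flatness = np.exp(np.mean(np.log(mag))) / np.mean(mag)
    return energy, zcr, flatness

def is_voice(frame, e_thr=1e-3, z_thr=0.25, f_thr=0.3):
    """Majority vote over thresholded statistics (illustrative thresholds)."""
    energy, zcr, flatness = frame_features(frame)
    votes = [energy > e_thr, zcr < z_thr, flatness < f_thr]
    return sum(votes) >= 2

# Synthetic 100 ms frames at 16 kHz: a harmonic "voiced" tone vs. white noise
sr = 16000
t = np.arange(sr // 10) / sr
voiced = 0.5 * np.sin(2 * np.pi * 150 * t) + 0.25 * np.sin(2 * np.pi * 300 * t)
noise = 0.05 * np.random.default_rng(0).standard_normal(t.size)

print(is_voice(voiced), is_voice(noise))  # prints: True False
```

A real system along these lines would add time-frequency statistics (e.g., wavelet-based), tune the thresholds on recorded data, and smooth decisions across frames to find utterance start and end points.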



Copyright information

© 2013 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Alonso-Martin, F., Castro-González, Á., Gorostiza, J.F., Salichs, M.A. (2013). Multidomain Voice Activity Detection during Human-Robot Interaction. In: Herrmann, G., Pearson, M.J., Lenz, A., Bremner, P., Spiers, A., Leonards, U. (eds) Social Robotics. ICSR 2013. Lecture Notes in Computer Science, vol 8239. Springer, Cham. https://doi.org/10.1007/978-3-319-02675-6_7


  • DOI: https://doi.org/10.1007/978-3-319-02675-6_7

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-02674-9

  • Online ISBN: 978-3-319-02675-6

  • eBook Packages: Computer Science (R0)
