
Multidomain Voice Activity Detection during Human-Robot Interaction

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 8239)

Abstract

The rapid proliferation of social robots is quickly leading to the cohabitation of humans and robots at home. These robots interact primarily through verbal communication and are usually endowed with microphones to capture the voice of the people they interact with. However, by the very principle microphones operate on, they also pick up all kinds of non-verbal signals. It is therefore crucial to determine whether a received signal is voice or not.

In this work, we present a Voice Activity Detection (VAD) system to address this problem. The audio signal captured by the robot is analyzed online, and several characteristics, or statistics, are extracted. These statistics belong to three different domains: time, frequency, and time-frequency. Combining them yields a robust VAD system that, by means of the microphones located on the robot, detects when a person starts and stops talking.

Finally, several experiments are conducted to test the performance of the system. They show a high success rate in classifying different audio signals as voiced or unvoiced.
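The idea of combining per-frame statistics from several domains can be sketched as follows. This is a minimal illustration, not the paper's actual feature set or thresholds: it votes over short-time energy and zero-crossing rate (time domain) and spectral flatness (frequency domain), with all threshold values chosen here purely for demonstration.

```python
import numpy as np

def frame_features(frame):
    """Per-frame statistics drawn from different analysis domains."""
    # Time domain: short-time energy and zero-crossing rate
    energy = np.mean(frame ** 2)
    zcr = np.mean(np.abs(np.diff(np.sign(frame)))) / 2
    # Frequency domain: spectral flatness (tonal speech -> low flatness)
    mag = np.abs(np.fft.rfft(frame)) + 1e-12
    flatness = np.exp(np.mean(np.log(mag))) / np.mean(mag)
    return energy, zcr, flatness

def is_voice(frame, e_thr=1e-3, z_thr=0.25, f_thr=0.3):
    """Majority vote over thresholded statistics (illustrative thresholds)."""
    energy, zcr, flatness = frame_features(frame)
    votes = [energy > e_thr, zcr < z_thr, flatness < f_thr]
    return sum(votes) >= 2

# Synthetic 100 ms frames at 16 kHz: a harmonic "voiced" tone vs. white noise
sr = 16000
t = np.arange(sr // 10) / sr
voiced = 0.5 * np.sin(2 * np.pi * 150 * t) + 0.25 * np.sin(2 * np.pi * 300 * t)
noise = 0.05 * np.random.default_rng(0).standard_normal(t.size)

print(is_voice(voiced), is_voice(noise))  # prints: True False
```

A real system along these lines would add time-frequency statistics (e.g., wavelet-based), tune the thresholds on recorded data, and smooth decisions across frames to find utterance start and end points.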



Copyright information

© 2013 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Alonso-Martin, F., Castro-González, Á., Gorostiza, J.F., Salichs, M.A. (2013). Multidomain Voice Activity Detection during Human-Robot Interaction. In: Herrmann, G., Pearson, M.J., Lenz, A., Bremner, P., Spiers, A., Leonards, U. (eds) Social Robotics. ICSR 2013. Lecture Notes in Computer Science, vol 8239. Springer, Cham. https://doi.org/10.1007/978-3-319-02675-6_7


  • DOI: https://doi.org/10.1007/978-3-319-02675-6_7

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-02674-9

  • Online ISBN: 978-3-319-02675-6

  • eBook Packages: Computer Science (R0)
