
Multidomain Voice Activity Detection during Human-Robot Interaction

  • Conference paper

Part of the Lecture Notes in Computer Science book series (LNAI, volume 8239)

Abstract

The growing number of social robots is quickly leading to the cohabitation of humans and social robots at home. The main mode of interaction with these robots is verbal communication. Social robots are usually equipped with microphones to receive the voice signal of the people they interact with. However, owing to the principle on which microphones are based, they also pick up all kinds of non-verbal signals. Therefore, it is crucial to determine whether the received signal is voice or not.

In this work, we present a Voice Activity Detection (VAD) system to address this problem. To this end, the audio signal captured by the robot is analyzed on-line and several characteristics, or statistics, are extracted. These statistics belong to three different domains: time, frequency, and time-frequency. The combination of these statistics results in a robust VAD system that, using the microphones located on the robot, is able to detect when a person starts talking and when he or she stops.
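
The page does not detail the statistics the authors extract, so the following is only a minimal sketch of the general idea, assuming a framed audio signal and illustrative features: short-term energy and zero-crossing rate for the time domain and the spectral centroid for the frequency domain, combined by a placeholder threshold rule. A time-frequency statistic (for example, a wavelet-based measure) would be added analogously; the authors' actual features and decision logic may differ.

```python
import numpy as np

def frame_features(frame, sr):
    """Illustrative per-frame statistics (not the authors' exact feature set)."""
    # Time domain: short-term energy and zero-crossing rate
    energy = float(np.mean(frame ** 2))
    zcr = float(np.mean(np.abs(np.diff(np.sign(frame)))) / 2.0)

    # Frequency domain: spectral centroid of the magnitude spectrum
    spectrum = np.abs(np.fft.rfft(frame))
    freqs = np.fft.rfftfreq(len(frame), d=1.0 / sr)
    centroid = float(np.sum(freqs * spectrum) / (np.sum(spectrum) + 1e-12))

    return energy, zcr, centroid

def is_voice(frame, sr, energy_thr=1e-4, zcr_thr=0.25,
             centroid_band=(80.0, 1000.0)):
    """Hypothetical decision rule: a frame is flagged as voice only if every
    statistic falls in a voice-like range (thresholds are placeholders)."""
    energy, zcr, centroid = frame_features(frame, sr)
    return (energy > energy_thr
            and zcr < zcr_thr
            and centroid_band[0] < centroid < centroid_band[1])

# Usage sketch: slide a short window (e.g. 32 ms) over the captured signal
# and take the first/last frame classified as voice as the start/end of speech.
```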

Finally, several experiments are conducted to test the performance of the system. These experiments show a high percentage of success in classifying different audio signals as voice or non-voice.

Keywords

  • Audio Signal
  • Social Robot
  • Voice Activity Detection
  • Human Voice
  • Voice Signal

Copyright information

© 2013 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Alonso-Martin, F., Castro-González, Á., Gorostiza, J.F., Salichs, M.A. (2013). Multidomain Voice Activity Detection during Human-Robot Interaction. In: Herrmann, G., Pearson, M.J., Lenz, A., Bremner, P., Spiers, A., Leonards, U. (eds) Social Robotics. ICSR 2013. Lecture Notes in Computer Science (LNAI), vol 8239. Springer, Cham. https://doi.org/10.1007/978-3-319-02675-6_7

  • DOI: https://doi.org/10.1007/978-3-319-02675-6_7

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-02674-9

  • Online ISBN: 978-3-319-02675-6

  • eBook Packages: Computer Science, Computer Science (R0)