HelloNPU: A Corpus for Small-Footprint Wake-Up Word Detection Research

  • Conference paper
Man-Machine Speech Communication (NCMMSC 2017)

Part of the book series: Communications in Computer and Information Science (CCIS, volume 807)

Abstract

As the very first step in activating speech interfaces, wake-up word detection aims to deliver a fully hands-free experience by detecting a specific word or phrase that activates the speech recognition and understanding modules. The task usually calls for a detector that is low-latency, highly accurate, small-footprint, and easy to port to power-limited environments. In this paper, we describe the creation of HelloNPU, a publicly available corpus that provides a common testbed to facilitate wake-up word detection research. We also report baseline experimental results on the proposed corpus using the Deep KWS approach. We hope the release of this corpus will encourage more studies on small-footprint wake-up word detection.

Author information

Corresponding authors

Correspondence to Senmao Wang, Jingyong Hou, Lei Xie or Yufeng Hao.

Copyright information

© 2018 Springer Nature Singapore Pte Ltd.

About this paper

Cite this paper

Wang, S., Hou, J., Xie, L., Hao, Y. (2018). HelloNPU: A Corpus for Small-Footprint Wake-Up Word Detection Research. In: Tao, J., Zheng, T., Bao, C., Wang, D., Li, Y. (eds) Man-Machine Speech Communication. NCMMSC 2017. Communications in Computer and Information Science, vol 807. Springer, Singapore. https://doi.org/10.1007/978-981-10-8111-8_7

  • DOI: https://doi.org/10.1007/978-981-10-8111-8_7

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-10-8110-1

  • Online ISBN: 978-981-10-8111-8

  • eBook Packages: Computer Science, Computer Science (R0)
