HelloNPU: A Corpus for Small-Footprint Wake-Up Word Detection Research

Wang, Senmao; Hou, Jingyong; Xie, Lei; Hao, Yufeng

doi:10.1007/978-981-10-8111-8_7

Part of the book series: Communications in Computer and Information Science (CCIS, volume 807)

Included in the following conference series:

National Conference on Man-Machine Speech Communication

Abstract

As the very first step in activating a speech interface, wake-up word detection aims to provide a fully hands-free experience by detecting a specific word or phrase that triggers the speech recognition and understanding modules. The task usually calls for a detector that is low-latency, highly accurate, small-footprint, and easy to port to power-limited environments. In this paper, we describe the creation of HelloNPU, a publicly available corpus that provides a common testbed to facilitate wake-up word detection research. We also report baseline experimental results on the corpus using the deep KWS approach. We hope the release of this corpus will encourage further studies on small-footprint wake-up word detection.

Author information

Authors and Affiliations

School of Computer Science, Northwestern Polytechnical University, Xi’an, 710072, China
Senmao Wang, Jingyong Hou & Lei Xie
Beijing Haitian Ruisheng Science Technology Ltd. (Speechocean), Beijing, China
Yufeng Hao

Corresponding authors

Correspondence to Senmao Wang, Jingyong Hou, Lei Xie or Yufeng Hao.

Editor information

Editors and Affiliations

Institute of Automation, Chinese Academy of Sciences, Beijing, China
Jianhua Tao
Computer Science and Technology, Tsinghua University, Beijing, China
Thomas Fang Zheng
Beijing University of Technology, Beijing, China
Changchun Bao
Tsinghua University, Beijing, China
Dong Wang
Institute of Automation, Chinese Academy of Sciences, Beijing, China
Ya Li

About this paper

Cite this paper

Wang, S., Hou, J., Xie, L., Hao, Y. (2018). HelloNPU: A Corpus for Small-Footprint Wake-Up Word Detection Research. In: Tao, J., Zheng, T., Bao, C., Wang, D., Li, Y. (eds) Man-Machine Speech Communication. NCMMSC 2017. Communications in Computer and Information Science, vol 807. Springer, Singapore. https://doi.org/10.1007/978-981-10-8111-8_7

DOI: https://doi.org/10.1007/978-981-10-8111-8_7
Published: 03 February 2018
Publisher Name: Springer, Singapore
Print ISBN: 978-981-10-8110-1
Online ISBN: 978-981-10-8111-8
eBook Packages: Computer Science (R0)
