A Unified Confidence Measure Framework Using Auxiliary Normalization Graph

Chen, Zhehuai; Qian, Yanmin; Yu, Kai

doi:10.1007/978-3-319-67777-4_11

Part of the book series: Lecture Notes in Computer Science (LNIP, volume 10559)

Included in the following conference series:

International Conference on Intelligent Science and Big Data Engineering

Abstract

Because search spaces and efficiency demands differ across ASR applications, state-of-the-art confidence measures and their decoding frameworks are heterogeneous among keyword spotting, domain-specific recognition and LVCSR. Inspired by the success of replacing the word lattice with a phone-level language model in discriminative training, this work proposes the auxiliary normalization graph, which is constructed to model the observation probability in hypothesis-posterior-based confidence measures. In this way, modelling of the normalizing term becomes independent of the original search space, and the confidence measures can be grouped into a unified framework. Experiments on three typical ASR applications show that the proposed unified confidence measure framework achieves performance comparable to the separately optimized system on each task.
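
As a reading aid, the hypothesis posterior behind such confidence measures can be written with Bayes' rule. The symbols W (hypothesis), O (observation sequence) and G_aux (auxiliary normalization graph) are notation assumed here for illustration and are not taken from the chapter. The second expression is only a sketch of the idea stated in the abstract, not the authors' exact formulation.

```latex
P(W \mid O) = \frac{p(O \mid W)\, P(W)}{p(O)},
\qquad
p(O) \approx \sum_{W' \in \mathcal{G}_{\mathrm{aux}}} p(O \mid W')\, P(W')
```

Under this reading, the normalizing term p(O) is summed over paths of the auxiliary graph rather than over the task-specific search space, which is what would allow one normalizer to serve keyword spotting, domain-specific recognition and LVCSR alike.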

K. Yu—This work was supported by the Shanghai Sailing Program No. 16YF1405300, the China NSFC projects (No. 61573241 and No. 61603252) and the Interdisciplinary Program (14JCZ03) of Shanghai Jiao Tong University in China. Experiments have been carried out on the PI supercomputer at Shanghai Jiao Tong University.

Notes

1. LVCSR-based KWS is not included in the discussion because it is mostly a matter of improving acoustic model performance and the keyword indexing algorithm. In addition, its computational burden makes it unsuitable for resource-limited scenarios.
2. In languages such as English, by contrast, the mapping is many-to-many.
3. Decoding based on a grammar language model is used, because the in-domain and out-of-domain evaluations discussed in Sect. 2.2 are similar for grammar-based and class-based models.
4. The comparison between CMs in the CTC and HMM frameworks was conducted in previous research [15]; all comparisons below are within the CTC framework.
5. As in [14], the acoustic model is a small one used in embedded applications. Therefore, computation time is comparable across all of the above portions in the three tasks.

References

1. Weintraub, M.: Keyword-spotting using SRI's DECIPHER large-vocabulary speech-recognition system. In: 1993 IEEE International Conference on Acoustics, Speech, and Signal Processing, vol. 2, pp. 463–466. IEEE (1993)
2. Woodland, P.C., Odell, J.J., Valtchev, V., Young, S.J.: Large vocabulary continuous speech recognition using HTK. In: 1994 IEEE International Conference on Acoustics, Speech, and Signal Processing, vol. 2, pp. II–125. IEEE (1994)
3. Ward, W., Issar, S.: A class based language model for speech recognition. In: 1996 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP-1996, Conference Proceedings, vol. 1, pp. 416–418. IEEE (1996)
4. Vasserman, L., Haynor, B., Aleksic, P.: Contextual language model adaptation using dynamic classes. In: 2016 IEEE Spoken Language Technology Workshop (SLT), pp. 441–446. IEEE (2016)
5. Cleveland, J., Thakur, D., Dames, P., Phillips, C., Kientz, T., Daniilidis, K., Bergstrom, J., Kumar, V.: Automated system for semantic object labeling with soft-object recognition and dynamic programming segmentation. IEEE Trans. Autom. Sci. Eng. 14(2), 820–833 (2017)
6. Hakkani-Tür, D., Béchet, F., Riccardi, G., Tur, G.: Beyond ASR 1-best: using word confusion networks in spoken language understanding. Comput. Speech Lang. 20(4), 495–514 (2006)
7. Hu, W., Qian, Y., Soong, F.K.: A new DNN-based high quality pronunciation evaluation for computer-aided language learning (CALL). In: INTERSPEECH, pp. 1886–1890 (2013)
8. Chen, G., Parada, C., Heigold, G.: Small-footprint keyword spotting using deep neural networks. In: 2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 4087–4091. IEEE (2014)
9. Young, S.R.: Detecting misrecognitions and out-of-vocabulary words. In: 1994 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP-1994, vol. 2, pp. II–21. IEEE (1994)
10. Wessel, F., Schluter, R., Macherey, K., Ney, H.: Confidence measures for large vocabulary continuous speech recognition. IEEE Trans. Speech Audio Process. 9(3), 288–298 (2001)
11. Rose, R.C., Juang, B.-H., Lee, C.-H.: A training procedure for verifying string hypotheses in continuous speech recognition. In: 1995 International Conference on Acoustics, Speech, and Signal Processing, ICASSP-1995, vol. 1, pp. 281–284. IEEE (1995)
12. Chen, S.F., Kingsbury, B., Mangu, L., Povey, D., Saon, G., Soltau, H., Zweig, G.: Advances in speech transcription at IBM under the DARPA EARS program. IEEE Trans. Audio Speech Lang. Process. 14(5), 1596–1608 (2006)
13. Povey, D., Peddinti, V., Galvez, D., Ghahremani, P., Manohar, V., Na, X., Wang, Y., Khudanpur, S.: Purely sequence-trained neural networks for ASR based on lattice-free MMI. In: Submitted to Interspeech (2016)
14. Chen, Z., Deng, W., Xu, T., Yu, K.: Phone synchronous decoding with CTC lattice. In: Interspeech 2016, pp. 1923–1927 (2016). http://dx.doi.org/10.21437/Interspeech.2016-831
15. Chen, Z., Zhuang, Y., Yu, K.: Confidence measures for CTC-based phone synchronous decoding. In: 2017 IEEE International Conference on Acoustics, Speech and Signal Processing. IEEE (2017)
16. Chen, Z., Zhuang, Y., Qian, Y., Yu, K.: Phone synchronous speech recognition with CTC lattices. IEEE/ACM Trans. Audio Speech Lang. Process. 25(1), 86–97 (2017)
17. Stolcke, A., et al.: SRILM: an extensible language modeling toolkit. In: Interspeech 2002 (2002)
18. Chen, I.-F., Ni, C., Lim, B.P., Chen, N.F., Lee, C.-H.: A novel keyword+ LVCSR-filler based grammar network representation for spoken keyword search. In: 2014 9th International Symposium on Chinese Spoken Language Processing (ISCSLP), pp. 192–196. IEEE (2014)

Author information

Authors and Affiliations

Key Laboratory of Shanghai Education Commission for Intelligent Interaction and Cognitive Engineering, SpeechLab, Department of Computer Science and Engineering, Brain Science and Technology Research Center, Shanghai Jiao Tong University, Shanghai, China
Zhehuai Chen, Yanmin Qian & Kai Yu

Corresponding author

Correspondence to Kai Yu.

Editor information

Editors and Affiliations

Dalian University of Technology, Dalian, China
Yi Sun
Dalian University of Technology, Dalian, China
Huchuan Lu
Dalian University of Technology, Dalian, China
Lihe Zhang
Nanjing University of Science and Technology, Nanjing, China
Jian Yang
Beijing Institute of Technology, Beijing, China
Hua Huang

About this paper

Cite this paper

Chen, Z., Qian, Y., Yu, K. (2017). A Unified Confidence Measure Framework Using Auxiliary Normalization Graph. In: Sun, Y., Lu, H., Zhang, L., Yang, J., Huang, H. (eds) Intelligence Science and Big Data Engineering. IScIDE 2017. Lecture Notes in Computer Science, vol 10559. Springer, Cham. https://doi.org/10.1007/978-3-319-67777-4_11

DOI: https://doi.org/10.1007/978-3-319-67777-4_11
Published: 14 September 2017
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-67776-7
Online ISBN: 978-3-319-67777-4
eBook Packages: Computer Science, Computer Science (R0)
