
A Unified Confidence Measure Framework Using Auxiliary Normalization Graph

  • Conference paper
Intelligence Science and Big Data Engineering (IScIDE 2017)

Part of the book series: Lecture Notes in Computer Science (LNIP, volume 10559)

Abstract

Because different ASR applications have distinct search spaces and efficiency demands, state-of-the-art confidence measures and their decoding frameworks are heterogeneous across keyword spotting, domain-specific recognition and LVCSR. Inspired by the success of replacing the word lattice with a phone-level language model in discriminative training, this work proposes an auxiliary normalization graph, constructed to model the observation probability in hypothesis-posterior-based confidence measures. In this way, modelling the normalization term of the confidence measure becomes independent of the original search space, and the confidence measures for these tasks can be brought into a unified framework. Experiments on three typical ASR applications show that the proposed unified confidence measure framework achieves performance comparable to that of separately optimized systems on each task.
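The abstract builds on the standard hypothesis-posterior confidence, P(W | O) = p(O | W) P(W) / p(O), where the normalizing term p(O) is approximated by summing over a graph that is separate from the decoding search space. The Python sketch below illustrates that idea on toy data under simplifying assumptions that are ours, not the paper's: frame-level phone log-likelihoods are given, every phone has a single state with a fixed self-loop probability `p_stay`, and the auxiliary normalization graph is a phone loop weighted by a phone bigram. All function names and parameters are hypothetical, and the paper's actual graph construction and its CTC-based phone synchronous decoding are not reproduced here.

```python
import math
from typing import List, Sequence

NEG_INF = float("-inf")


def safe_log(p: float) -> float:
    """log(p) that maps zero probability to -inf instead of raising."""
    return math.log(p) if p > 0.0 else NEG_INF


def logsumexp(values: Sequence[float]) -> float:
    """Numerically stable log(sum(exp(v) for v in values))."""
    m = max(values)
    if m == NEG_INF:
        return NEG_INF
    return m + math.log(sum(math.exp(v - m) for v in values))


def hypothesis_score(frame_loglik: List[List[float]], hyp: Sequence[int],
                     prior: Sequence[float], bigram: List[List[float]],
                     p_stay: float = 0.5) -> float:
    """Approximate log p(O, W): forward score over a left-to-right graph that
    only admits the hypothesis phone sequence `hyp` (one state per phone)."""
    stay, move = safe_log(p_stay), safe_log(1.0 - p_stay)
    n = len(hyp)
    alpha = [NEG_INF] * n
    alpha[0] = safe_log(prior[hyp[0]]) + frame_loglik[0][hyp[0]]
    for frame in frame_loglik[1:]:
        new = [NEG_INF] * n
        for k in range(n):
            cands = [alpha[k] + stay]  # remain in the same hypothesis phone
            if k > 0:  # advance to the next hypothesis phone, weighted by the bigram
                cands.append(alpha[k - 1] + move + safe_log(bigram[hyp[k - 1]][hyp[k]]))
            new[k] = logsumexp(cands) + frame[hyp[k]]
        alpha = new
    return alpha[-1]  # the path must end in the last hypothesis phone


def normalization_score(frame_loglik: List[List[float]], prior: Sequence[float],
                        bigram: List[List[float]], p_stay: float = 0.5) -> float:
    """Approximate log p(O): forward score summed over ALL paths of a phone-loop
    graph (the hypothetical auxiliary normalization graph), independent of `hyp`."""
    num_phones = len(prior)
    # trans[i][j]: log prob of moving from phone state i to phone state j.
    trans = [[safe_log((p_stay if i == j else 0.0) + (1.0 - p_stay) * bigram[i][j])
              for j in range(num_phones)] for i in range(num_phones)]
    alpha = [safe_log(prior[j]) + frame_loglik[0][j] for j in range(num_phones)]
    for frame in frame_loglik[1:]:
        alpha = [logsumexp([alpha[i] + trans[i][j] for i in range(num_phones)]) + frame[j]
                 for j in range(num_phones)]
    return logsumexp(alpha)


def posterior_confidence(frame_loglik: List[List[float]], hyp: Sequence[int],
                         prior: Sequence[float], bigram: List[List[float]],
                         p_stay: float = 0.5) -> float:
    """Log hypothesis posterior: log p(O, W) - log p(O), which is <= 0."""
    return (hypothesis_score(frame_loglik, hyp, prior, bigram, p_stay)
            - normalization_score(frame_loglik, prior, bigram, p_stay))


if __name__ == "__main__":
    # Toy, made-up data: 3 phones, 4 frames of phone probabilities.
    prior = [1 / 3, 1 / 3, 1 / 3]
    bigram = [[0.2, 0.6, 0.2], [0.2, 0.2, 0.6], [0.6, 0.2, 0.2]]
    frames = [[math.log(p) for p in row] for row in
              [[0.8, 0.1, 0.1], [0.7, 0.2, 0.1], [0.1, 0.8, 0.1], [0.1, 0.1, 0.8]]]
    print("matched hypothesis:", posterior_confidence(frames, [0, 1, 2], prior, bigram))
    print("mismatched hypothesis:", posterior_confidence(frames, [2, 1, 0], prior, bigram))
```

Every path through the hypothesis graph also exists in the phone loop with at least the same weight, so the score never exceeds 0, and a hypothesis that fits the acoustics scores closer to 0 than a mismatched one. Because the normalization routine never looks at the hypothesis or the original search space, the same function could in principle normalize keyword, grammar or LVCSR hypotheses alike, which is the kind of decoupling the abstract describes.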

K. Yu—This work was supported by the Shanghai Sailing Program No. 16YF1405300, the China NSFC projects (No. 61573241 and No. 61603252) and the Interdisciplinary Program (14JCZ03) of Shanghai Jiao Tong University in China. Experiments have been carried out on the PI supercomputer at Shanghai Jiao Tong University.

Notes

  1. LVCSR-based KWS is not included in the discussion because it is mainly a matter of improving the acoustic model and the keyword indexing algorithm. Moreover, its computational cost is unsuitable for resource-limited scenarios.

  2. In languages such as English, by contrast, the mapping is many-to-many.

  3. Grammar-based language model decoding is used, since the in-domain and out-of-domain evaluations discussed in Sect. 2.2 behave similarly for grammar-based and class-based models.

  4. CMs in the CTC and HMM frameworks were compared in previous research [15]; all comparisons below are within the CTC framework.

  5. As in [14], the acoustic model is a small one intended for embedded applications. Therefore, the computation time of each of the above components is comparable across the three tasks.

References

  1. Weintraub, M.: Keyword-spotting using SRI’s DECIPHER large-vocabulary speech-recognition system. In: 1993 IEEE International Conference on Acoustics, Speech, and Signal Processing, vol. 2, pp. 463–466. IEEE (1993)

  2. Woodland, P.C., Odell, J.J., Valtchev, V., Young, S.J.: Large vocabulary continuous speech recognition using HTK. In: 1994 IEEE International Conference on Acoustics, Speech, and Signal Processing, vol. 2, pp. II–125. IEEE (1994)

  3. Ward, W., Issar, S.: A class based language model for speech recognition. In: 1996 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP-1996, Conference Proceedings, vol. 1, pp. 416–418. IEEE (1996)

  4. Vasserman, L., Haynor, B., Aleksic, P.: Contextual language model adaptation using dynamic classes. In: 2016 IEEE Spoken Language Technology Workshop (SLT), pp. 441–446. IEEE (2016)

  5. Cleveland, J., Thakur, D., Dames, P., Phillips, C., Kientz, T., Daniilidis, K., Bergstrom, J., Kumar, V.: Automated system for semantic object labeling with soft-object recognition and dynamic programming segmentation. IEEE Trans. Autom. Sci. Eng. 14(2), 820–833 (2017)

  6. Hakkani-Tür, D., Béchet, F., Riccardi, G., Tur, G.: Beyond ASR 1-best: using word confusion networks in spoken language understanding. Comput. Speech Lang. 20(4), 495–514 (2006)

  7. Hu, W., Qian, Y., Soong, F.K.: A new DNN-based high quality pronunciation evaluation for computer-aided language learning (CALL). In: INTERSPEECH, pp. 1886–1890 (2013)

  8. Chen, G., Parada, C., Heigold, G.: Small-footprint keyword spotting using deep neural networks. In: 2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 4087–4091. IEEE (2014)

  9. Young, S.R.: Detecting misrecognitions and out-of-vocabulary words. In: 1994 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP-1994, vol. 2, pp. II–21. IEEE (1994)

  10. Wessel, F., Schlüter, R., Macherey, K., Ney, H.: Confidence measures for large vocabulary continuous speech recognition. IEEE Trans. Speech Audio Process. 9(3), 288–298 (2001)

  11. Rose, R.C., Juang, B.-H., Lee, C.-H.: A training procedure for verifying string hypotheses in continuous speech recognition. In: 1995 International Conference on Acoustics, Speech, and Signal Processing, ICASSP-1995, vol. 1, pp. 281–284. IEEE (1995)

  12. Chen, S.F., Kingsbury, B., Mangu, L., Povey, D., Saon, G., Soltau, H., Zweig, G.: Advances in speech transcription at IBM under the DARPA EARS program. IEEE Trans. Audio Speech Lang. Process. 14(5), 1596–1608 (2006)

  13. Povey, D., Peddinti, V., Galvez, D., Ghahremani, P., Manohar, V., Na, X., Wang, Y., Khudanpur, S.: Purely sequence-trained neural networks for ASR based on lattice-free MMI. In: Submitted to Interspeech (2016)

  14. Chen, Z., Deng, W., Xu, T., Yu, K.: Phone synchronous decoding with CTC lattice. In: Interspeech 2016, pp. 1923–1927 (2016). http://dx.doi.org/10.21437/Interspeech.2016-831

  15. Chen, Z., Zhuang, Y., Yu, K.: Confidence measures for CTC-based phone synchronous decoding. In: 2017 IEEE International Conference on Acoustics, Speech and Signal Processing. IEEE (2017)

  16. Chen, Z., Zhuang, Y., Qian, Y., Yu, K.: Phone synchronous speech recognition with CTC lattices. IEEE/ACM Trans. Audio Speech Lang. Process. 25(1), 86–97 (2017)

  17. Stolcke, A., et al.: SRILM – an extensible language modeling toolkit. In: Interspeech 2002 (2002)

  18. Chen, I.-F., Ni, C., Lim, B.P., Chen, N.F., Lee, C.-H.: A novel keyword+LVCSR-filler based grammar network representation for spoken keyword search. In: 2014 9th International Symposium on Chinese Spoken Language Processing (ISCSLP), pp. 192–196. IEEE (2014)

Author information

Corresponding author

Correspondence to Kai Yu.

Copyright information

© 2017 Springer International Publishing AG

About this paper

Cite this paper

Chen, Z., Qian, Y., Yu, K. (2017). A Unified Confidence Measure Framework Using Auxiliary Normalization Graph. In: Sun, Y., Lu, H., Zhang, L., Yang, J., Huang, H. (eds) Intelligence Science and Big Data Engineering. IScIDE 2017. Lecture Notes in Computer Science, vol 10559. Springer, Cham. https://doi.org/10.1007/978-3-319-67777-4_11

  • DOI: https://doi.org/10.1007/978-3-319-67777-4_11

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-67776-7

  • Online ISBN: 978-3-319-67777-4

  • eBook Packages: Computer Science, Computer Science (R0)