A Decade of Discriminative Language Modeling for Automatic Speech Recognition

  • Conference paper
  • Speech and Computer (SPECOM 2015)
  • Part of the book series: Lecture Notes in Computer Science (LNAI, volume 9319)

Abstract

This paper summarizes research on discriminative language modeling, focusing on its application to automatic speech recognition (ASR). A discriminative language model (DLM) is typically a linear or log-linear model consisting of a weight vector associated with a feature vector representation of a sentence. This flexible representation can include linguistically and statistically motivated features that incorporate morphological and syntactic information. At test time, DLMs are used to rerank the output of an ASR system, represented as an N-best list or lattice. During training, both negative and positive examples are used with the aim of directly optimizing the error rate. Various machine learning methods, including the structured perceptron, large-margin methods, and maximum regularized conditional log-likelihood, have been used to estimate the parameters of DLMs. Typically, positive examples for DLM training come from the manual transcriptions of acoustic data, while negative examples are obtained by processing the same acoustic data with an ASR system. Recent research generalizes DLM training by either using automatic transcriptions for the positive examples or simulating the negative examples.
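
To make the model and training procedure above concrete, the following is a minimal sketch (in Python) of a linear DLM with simple n-gram features, structured-perceptron training, and N-best reranking. All function names and the toy data are illustrative assumptions, not the authors' implementation; real systems use far richer feature sets.

```python
# Minimal sketch of a discriminative language model (DLM) used for N-best
# reranking, trained with the structured perceptron, as described in the
# abstract. All names and the toy data are illustrative, not taken from
# the paper's implementation.

from collections import defaultdict


def ngram_features(words):
    """Sparse feature vector Phi(y) for a hypothesis: unigram and bigram
    counts, a common baseline feature set for DLMs."""
    feats = defaultdict(float)
    for i, w in enumerate(words):
        feats[("1g", w)] += 1.0
        if i > 0:
            feats[("2g", words[i - 1], w)] += 1.0
    return feats


def dlm_score(weights, words, base_score=0.0):
    """Linear model: baseline ASR score plus the dot product w . Phi(y)."""
    return base_score + sum(weights[f] * v for f, v in ngram_features(words).items())


def rerank(weights, nbest):
    """Return the (words, base_score) pair with the highest combined score
    from an N-best list."""
    return max(nbest, key=lambda hyp: dlm_score(weights, hyp[0], hyp[1]))


def perceptron_train(data, epochs=5):
    """Structured perceptron: whenever the currently best-scoring hypothesis
    (the negative example) differs from the reference transcription (the
    positive example), move the weights toward the reference features and
    away from the hypothesis features."""
    weights = defaultdict(float)
    for _ in range(epochs):
        for reference, nbest in data:
            best_words, _ = rerank(weights, nbest)
            if best_words != reference:
                for f, v in ngram_features(reference).items():
                    weights[f] += v
                for f, v in ngram_features(best_words).items():
                    weights[f] -= v
    return weights


if __name__ == "__main__":
    # One toy utterance: the reference transcription and a 2-best list of
    # (hypothesis words, baseline ASR score) pairs.
    data = [
        (["recognize", "speech"],
         [(["wreck", "a", "nice", "beach"], 0.2),
          (["recognize", "speech"], 0.1)]),
    ]
    weights = perceptron_train(data)
    print(rerank(weights, data[0][1])[0])  # expected: ['recognize', 'speech']
```

Practical systems also typically average the perceptron weights over updates and tune a scale factor for combining the DLM score with the baseline acoustic and language model scores; both refinements are omitted here for brevity.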

Acknowledgments

This research is supported in part by TUBITAK projects 105E102 and 109E142, and by the Bogazici University Research Fund (BU-BAP) projects 07HA201D and 14A02D3 (D-7948).

Author information

Corresponding author

Correspondence to Murat Saraclar.


Copyright information

© 2015 Springer International Publishing Switzerland

About this paper

Cite this paper

Saraclar, M., Dikici, E., Arisoy, E. (2015). A Decade of Discriminative Language Modeling for Automatic Speech Recognition. In: Ronzhin, A., Potapova, R., Fakotakis, N. (eds.) Speech and Computer. SPECOM 2015. Lecture Notes in Computer Science (LNAI), vol 9319. Springer, Cham. https://doi.org/10.1007/978-3-319-23132-7_2

  • DOI: https://doi.org/10.1007/978-3-319-23132-7_2

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-23131-0

  • Online ISBN: 978-3-319-23132-7
