Abstract
This paper summarizes the research on discriminative language modeling, focusing on its application to automatic speech recognition (ASR). A discriminative language model (DLM) is typically a linear or log-linear model consisting of a weight vector associated with a feature vector representation of a sentence. This flexible representation can include linguistically and statistically motivated features that incorporate morphological and syntactic information. At test time, DLMs are used to rerank the output of an ASR system, represented as an N-best list or lattice. During training, both positive and negative examples are used with the aim of directly optimizing the error rate. Various machine learning methods, including the structured perceptron, large-margin methods, and regularized conditional log-likelihood maximization, have been used to estimate the parameters of DLMs. Typically, the positive examples for DLM training come from the manual transcriptions of acoustic data, while the negative examples are obtained by processing the same acoustic data with an ASR system. Recent research generalizes DLM training by either using automatic transcriptions for the positive examples or simulating the negative examples.
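The reranking setup described above can be illustrated with a minimal sketch, not taken from the paper itself: a structured perceptron over word n-gram features that scores each hypothesis in an N-best list and is updated toward the oracle (lowest-error) hypothesis. All function names and the toy data are illustrative assumptions.

```python
from collections import Counter

def ngram_features(sentence, n=2):
    """Map a hypothesis string to a sparse vector of word n-gram counts (1..n)."""
    words = sentence.split()
    feats = Counter()
    for k in range(1, n + 1):
        for i in range(len(words) - k + 1):
            feats[" ".join(words[i:i + k])] += 1
    return feats

def score(weights, feats):
    """Linear model: dot product of the weight vector and the feature vector."""
    return sum(weights.get(f, 0.0) * v for f, v in feats.items())

def rerank(weights, nbest):
    """Return the hypothesis from the N-best list with the highest model score."""
    return max(nbest, key=lambda hyp: score(weights, ngram_features(hyp)))

def perceptron_train(data, epochs=5):
    """Structured perceptron training.

    data: list of (nbest_list, oracle_index) pairs, where oracle_index points
    at the hypothesis with the lowest error against the reference transcript.
    On a mistake, move weights toward the oracle hypothesis (positive example)
    and away from the current top-scoring one (negative example).
    """
    weights = {}
    for _ in range(epochs):
        for nbest, oracle_idx in data:
            predicted = rerank(weights, nbest)
            oracle = nbest[oracle_idx]
            if predicted != oracle:
                for f, v in ngram_features(oracle).items():
                    weights[f] = weights.get(f, 0.0) + v
                for f, v in ngram_features(predicted).items():
                    weights[f] = weights.get(f, 0.0) - v
    return weights
```

In a real DLM the feature vector would also carry the morphological and syntactic features mentioned above, and the N-best lists would come from decoding acoustic data with an ASR system; averaging the perceptron weights over updates is the usual stabilizing refinement.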
Acknowledgments
This research is supported in part by TÜBİTAK projects 105E102 and 109E142 and by Boğaziçi University Research Fund (BU-BAP) projects 07HA201D and 14A02D3 (D-7948).
Copyright information
© 2015 Springer International Publishing Switzerland
Cite this paper
Saraclar, M., Dikici, E., Arisoy, E. (2015). A Decade of Discriminative Language Modeling for Automatic Speech Recognition. In: Ronzhin, A., Potapova, R., Fakotakis, N. (eds) Speech and Computer. SPECOM 2015. Lecture Notes in Computer Science(), vol 9319. Springer, Cham. https://doi.org/10.1007/978-3-319-23132-7_2
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-23131-0
Online ISBN: 978-3-319-23132-7