Abstract
This paper summarizes the research on discriminative language modeling, focusing on its application to automatic speech recognition (ASR). A discriminative language model (DLM) is typically a linear or log-linear model consisting of a weight vector associated with a feature vector representation of a sentence. This flexible representation can include linguistically and statistically motivated features that incorporate morphological and syntactic information. At test time, DLMs are used to rerank the output of an ASR system, represented as an N-best list or lattice. During training, both positive and negative examples are used with the aim of directly optimizing the error rate. Various machine learning methods, including the structured perceptron, large-margin methods, and regularized conditional log-likelihood maximization, have been used to estimate the parameters of DLMs. Typically, the positive examples for DLM training come from the manual transcriptions of acoustic data, while the negative examples are obtained by processing the same acoustic data with an ASR system. Recent research generalizes DLM training by either using automatic transcriptions for the positive examples or simulating the negative examples.
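The reranking setup described above can be illustrated with a minimal sketch, not taken from the paper itself: a structured perceptron over word n-gram features that scores each hypothesis in an N-best list and is updated toward the oracle (lowest-error) hypothesis. All function names and the toy data are illustrative assumptions.

```python
from collections import Counter

def ngram_features(sentence, n=2):
    """Map a hypothesis string to a sparse vector of word n-gram counts (1..n)."""
    words = sentence.split()
    feats = Counter()
    for k in range(1, n + 1):
        for i in range(len(words) - k + 1):
            feats[" ".join(words[i:i + k])] += 1
    return feats

def score(weights, feats):
    """Linear model: dot product of the weight vector and the feature vector."""
    return sum(weights.get(f, 0.0) * v for f, v in feats.items())

def rerank(weights, nbest):
    """Return the hypothesis from the N-best list with the highest model score."""
    return max(nbest, key=lambda hyp: score(weights, ngram_features(hyp)))

def perceptron_train(data, epochs=5):
    """Structured perceptron training.

    data: list of (nbest_list, oracle_index) pairs, where oracle_index points
    at the hypothesis with the lowest error against the reference transcript.
    On a mistake, move weights toward the oracle hypothesis (positive example)
    and away from the current top-scoring one (negative example).
    """
    weights = {}
    for _ in range(epochs):
        for nbest, oracle_idx in data:
            predicted = rerank(weights, nbest)
            oracle = nbest[oracle_idx]
            if predicted != oracle:
                for f, v in ngram_features(oracle).items():
                    weights[f] = weights.get(f, 0.0) + v
                for f, v in ngram_features(predicted).items():
                    weights[f] = weights.get(f, 0.0) - v
    return weights
```

In a real DLM the feature vector would also carry the morphological and syntactic features mentioned above, and the N-best lists would come from decoding acoustic data with an ASR system; averaging the perceptron weights over updates is the usual stabilizing refinement.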
Acknowledgments
This research is supported in part by TÜBİTAK projects 105E102 and 109E142 and by Boğaziçi University Research Fund (BU-BAP) projects 07HA201D and 14A02D3 (D-7948).
Copyright information
© 2015 Springer International Publishing Switzerland
Cite this paper
Saraclar, M., Dikici, E., Arisoy, E. (2015). A Decade of Discriminative Language Modeling for Automatic Speech Recognition. In: Ronzhin, A., Potapova, R., Fakotakis, N. (eds) Speech and Computer. SPECOM 2015. Lecture Notes in Computer Science(), vol 9319. Springer, Cham. https://doi.org/10.1007/978-3-319-23132-7_2
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-23131-0
Online ISBN: 978-3-319-23132-7