Synonyms
Generative models
Definition
A language model assigns a probability to a piece of unseen text, based on some training data. For example, a language model based on a large English newspaper archive is expected to assign a higher probability to “a bit of text” than to “aw pit tov tags,” because the words in the former phrase (or word pairs or word triples if so-called N-gram models are used) occur more frequently in the data than the words in the latter phrase. For information retrieval, the typical approach is to build a language model for each document. At search time, the top-ranked document is the one whose language model assigns the highest probability to the query.
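The ranking idea in the definition can be illustrated with a minimal sketch. It assumes unigram (single-word) models, whitespace tokenization, and linear interpolation of each document model with a collection-wide model so that a missing query word does not force a zero probability. The smoothing step, the weight `lam`, the function names, and the toy documents are assumptions made for illustration and are not taken from the entry.

```python
import math
from collections import Counter


def tokenize(text):
    # Assumption: lowercase and split on whitespace; real systems do more.
    return text.lower().split()


def rank_by_query_likelihood(docs, query, lam=0.5):
    """Return (doc_id, log P(query | document model)) pairs, best first.

    Each document gets a unigram model. Term probabilities are interpolated
    with a collection model using weight `lam` (an illustrative choice).
    """
    doc_counts = {doc_id: Counter(tokenize(text)) for doc_id, text in docs.items()}
    collection = Counter()
    for counts in doc_counts.values():
        collection.update(counts)
    collection_len = sum(collection.values())

    scores = {}
    for doc_id, counts in doc_counts.items():
        doc_len = sum(counts.values())
        log_prob = 0.0
        for term in tokenize(query):
            p_doc = counts[term] / doc_len if doc_len else 0.0
            p_coll = collection[term] / collection_len if collection_len else 0.0
            p = lam * p_doc + (1 - lam) * p_coll
            if p == 0.0:
                # Term unseen in the whole collection: probability is zero.
                log_prob = float("-inf")
                break
            log_prob += math.log(p)
        scores[doc_id] = log_prob
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical toy collection, for illustration only.
docs = {
    "d1": "a bit of text about language models",
    "d2": "speech recognition uses an acoustic model",
    "d3": "a newspaper archive of english text",
}
ranking = rank_by_query_likelihood(docs, "bit of text")
for doc_id, score in ranking:
    print(doc_id, round(score, 3))
```

In this toy collection the document containing all three query words (`d1`) is ranked first, and the document containing none of them (`d2`) is ranked last. Log probabilities are summed rather than multiplying raw probabilities, to avoid numerical underflow on longer queries.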
Historical Background
The term language models originates from probabilistic models of language generation developed for automatic speech recognition systems in the early 1980s [9]. Speech recognition systems use a language model to complement the results of the acoustic model, which models the relation between words (or...
Recommended Reading
Allan J, Aslam J, Belkin N, Buckley C, Callan J, Croft B, Dumais S, Fuhr N, Harman D, Harper DJ, Hiemstra D, Hofmann T, Hovy E, Kraaij W, Lafferty J, Lavrenko V, Lewis D, Liddy L, Manmatha R, McCallum A, Ponte J, Prager J, Radev D, Resnik P, Robertson S, Rosenfeld R, Roukos S, Sanderson M, Schwartz R, Singhal A, Smeaton A, Turtle H, Voorhees E, Weischedel E, Xu J, Zhai CX, editors. Challenges in information retrieval and language modeling. SIGIR Forum. 2003;37(1):31–47.
Balog K, Azzopardi L, de Rijke M. Formal models for expert finding in enterprise corpora. In: Proceedings of the 29th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval; 2006. p. 43–50.
Basharin GP, Langville AN, Naumov VA. The life and work of A.A. Markov. Linear Algebra Appl. 2004;386(1):3–26.
Berger A, Lafferty J. Information retrieval as statistical translation. In: Proceedings of the 22nd Annual International ACM SIGIR Conference on Research and Development in Information Retrieval; 1999. p. 222–9.
Blei DM, Ng AY, Jordan MI. Latent Dirichlet allocation. J Machine Learn Res. 2003;3(5):993–1022.
Hiemstra D, de Jong F. Disambiguation strategies for cross-language information retrieval. In: Proceedings of the 3rd European Conference on Research and Advanced Technology for Digital Libraries. Lecture notes in computer science; 1999. p. 274–93.
Hiemstra D, Kraaij W. Twenty-one at TREC-7: ad-hoc and cross-language track. In: Proceedings of 7th Text Retrieval Conference; 1998. p. 227–38.
Hofmann T. Probabilistic latent semantic indexing. In: Proceedings of 22nd Annual International ACM SIGIR Conference on Research and Development in Information Retrieval; 1999. p. 50–57.
Jelinek F. Statistical methods for speech recognition. Cambridge, MA: MIT Press; 1997.
Jin H, Schwartz R, Sista S, Walls F. Topic tracking for radio, TV broadcast and newswire. In: Proceedings of DARPA Broadcast News Workshop; 1999.
Kraaij W, Westerveld T, Hiemstra D. The importance of prior probabilities for entry page search. In: Proceedings of the 25th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval; 2002. p. 27–34.
Kraft DH, Croft WB, Harper DJ, Zobel J, editors. Proceedings of the 24th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval; 2001.
Lavrenko V, Croft WB. Relevance models in information retrieval. In: Croft WB, Lafferty J, editors. Language modeling for information retrieval. Dordrecht: Kluwer; 2003. p. 11–56.
Miller DRH, Leek T, Schwartz RM. A hidden Markov model information retrieval system. In: Proceedings of the 22nd Annual International ACM SIGIR Conference on Research and Development in Information Retrieval; 1999. p. 214–21.
Ponte JM, Croft WB. A language modeling approach to information retrieval. In: Proceedings of the 21st Annual International ACM SIGIR Conference on Research and Development in Information Retrieval; 1998. p. 275–81.
Schwartz RM, Sista S, Leek T. Unsupervised topic discovery. In: Proceedings of Language Models for Information Retrieval Workshop; 2001.
Shannon CE. A mathematical theory of communication. Bell Syst Tech J. 1948;27:379–423, 623–56.
Spitters M, Kraaij W. Language models for topic tracking. In: Croft WB, Lafferty J, editors. Language modeling for information retrieval. Dordrecht: Kluwer; 2003. p. 95–124.
Xu J, Weischedel R. A probabilistic approach to term translation for cross-lingual retrieval. In: Croft WB, Lafferty J, editors. Language modeling for information retrieval. Dordrecht: Kluwer; 2003. p. 125–40.
Zhai C, Lafferty J. Model-based feedback in the language modeling approach to information retrieval. In: Proceedings of ACM International Conference on Information and Knowledge Management; 2001. p. 403–10.
Copyright information
© 2018 Springer Science+Business Media, LLC, part of Springer Nature
About this entry
Cite this entry
Hiemstra, D. (2018). Language Models. In: Liu, L., Özsu, M.T. (eds) Encyclopedia of Database Systems. Springer, New York, NY. https://doi.org/10.1007/978-1-4614-8265-9_923
DOI: https://doi.org/10.1007/978-1-4614-8265-9_923
Publisher Name: Springer, New York, NY
Print ISBN: 978-1-4614-8266-6
Online ISBN: 978-1-4614-8265-9
eBook Packages: Computer Science; Reference Module Computer Science and Engineering