Abstract
This chapter provides an overview of the maximum entropy framework and its application to a problem in natural language processing. The framework offers a way to combine many pieces of evidence from an annotated training set into a single probability model, and it has been applied to a wide range of tasks in natural language processing, including part-of-speech tagging. The chapter covers the maximum entropy formulation, its relationship to maximum likelihood estimation, a parameter estimation method, and the details of the part-of-speech tagging application.
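As a concrete illustration of the formulation the abstract describes, the following is a minimal sketch, not the chapter's implementation, of a conditional maximum entropy model trained with generalized iterative scaling (GIS) on a toy tagging dataset. The predicates, tags, and training events are all invented for the example.

```python
import math
from collections import defaultdict

# Toy training events: (set of active context predicates, tag).
# The predicates, tags, and data are invented for illustration; they are
# not the chapter's actual feature set.
data = [
    ({"word=the"}, "DET"),
    ({"word=dog"}, "NOUN"),
    ({"word=runs"}, "VERB"),
    ({"word=the"}, "DET"),
    ({"word=cat"}, "NOUN"),
]

tags = sorted({t for _, t in data})
predicates = sorted({f for x, _ in data for f in x})
# One indicator feature per (predicate, tag) pair; every event activates
# exactly one feature, so the GIS feature-sum constant C is 1.
weights = {(f, t): 0.0 for f in predicates for t in tags}
C = 1.0

def prob(x):
    """Conditional maxent distribution p(tag | active predicates x)."""
    s = {t: math.exp(sum(weights[(f, t)] for f in x)) for t in tags}
    z = sum(s.values())
    return {t: v / z for t, v in s.items()}

# Empirical feature expectations (here, raw counts over the training set).
emp = defaultdict(float)
for x, t in data:
    for f in x:
        emp[(f, t)] += 1.0

for _ in range(100):  # GIS iterations
    model = defaultdict(float)  # model feature expectations
    for x, _ in data:
        p = prob(x)
        for t in tags:
            for f in x:
                model[(f, t)] += p[t]
    for k in weights:
        # Skipping zero-count features is a practical simplification;
        # strictly, maximum likelihood drives their weights to -inf.
        if emp[k] > 0:
            weights[k] += (1.0 / C) * math.log(emp[k] / model[k])

p = prob({"word=dog"})
print(max(p, key=p.get))  # the trained model prefers NOUN for "dog"
```

Each GIS update scales a weight by the log-ratio of empirical to model feature expectations, which is the parameter estimation method of Darroch and Ratcliff that the chapter's framework builds on; real taggers use far richer contextual predicates and smoothing.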
Copyright information
© 2017 Springer Science+Business Media New York
About this entry
Cite this entry
Ratnaparkhi, A. (2017). Maximum Entropy Models for Natural Language Processing. In: Sammut, C., Webb, G.I. (eds) Encyclopedia of Machine Learning and Data Mining. Springer, Boston, MA. https://doi.org/10.1007/978-1-4899-7687-1_525
Print ISBN: 978-1-4899-7685-7
Online ISBN: 978-1-4899-7687-1