Abstract
The maximum entropy (ME) method is a powerful supervised machine learning technique that is useful for various tasks. In this paper, we introduce new studies that successfully employ ME for natural language processing (NLP) problems including machine translation and information extraction. Specifically, we demonstrate, using simulation results, three applications of ME for NLP: estimation of categories, extraction of important features, and correction of error data items. We also evaluate the comparative performance of the proposed ME methods with other state-of-the-art approaches.
Notes
We used anonymous symbols such as A to F because the six translation systems are commercial products, and we did not want to influence the market.
There are many studies on feature selection [2, 14]; however, their main purpose is to reduce the number of features for better learning, whereas our purpose in extracting features is to examine the experimental results. In addition to the study that used the ME method to extract features and examine experimental results, we conducted another study that estimated the referential properties of noun phrases [15]. In that study, we classified features into two types: (i) strong features, on which an output category is strongly dependent and by which it is necessarily determined (their normalized alpha values are close to 1.0, the maximum value, since a normalized alpha value is a kind of probability); and (ii) weak features, which indicate a tendency toward a particular output category, although another category can still be output when stronger features appear. The classified results were highly useful for examining the experimental results.
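The strong/weak distinction above can be sketched in code. The following is a minimal illustration (not the authors' implementation): feature weights are exponentiated into alpha values, normalized across categories into probability-like scores, and a feature is called "strong" when its top score is close to 1.0. The weight dictionary, function name, and the 0.95 cutoff are illustrative assumptions.

```python
import math

def classify_features(weights, strong_threshold=0.95):
    """weights: {feature: {category: lambda}}.
    Returns (strong, weak) dicts mapping feature -> most likely category.
    The 0.95 threshold for "strong" is an illustrative assumption."""
    strong, weak = {}, {}
    for feat, lambdas in weights.items():
        # Alpha value of a feature for a category is exp(lambda).
        alphas = {c: math.exp(l) for c, l in lambdas.items()}
        total = sum(alphas.values())
        # Normalizing across categories yields a probability-like score.
        normalized = {c: a / total for c, a in alphas.items()}
        top_cat = max(normalized, key=normalized.get)
        if normalized[top_cat] >= strong_threshold:
            strong[feat] = top_cat   # effectively determines the category
        else:
            weak[feat] = top_cat     # only indicates a tendency
    return strong, weak

# Toy weights: the suffix feature dominates, the word feature is weak.
weights = {
    "suffix=-ed": {"past": 4.0, "present": 0.0},
    "word=run":   {"past": 0.3, "present": 0.6},
}
strong, weak = classify_features(weights)
```

Here "suffix=-ed" comes out strong (normalized alpha about 0.98 for "past"), while "word=run" is weak (about 0.57 for "present").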
This correction is identical to changing an original category to the category estimated by the method in Sect. "Method of Categorization". The error-correction technique is thus closely related to the category-estimation technique.
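The connection can be sketched as follows. This is an assumed interface, not the paper's implementation: a trained model re-estimates each item's category, and a corpus label is replaced when the model disagrees with it at high confidence. The `predict_proba` callable and the 0.9 confidence cutoff are illustrative assumptions.

```python
def correct_errors(items, predict_proba, confidence=0.9):
    """items: list of (features, label) pairs from the corpus.
    predict_proba(features) -> {category: probability}.
    The 0.9 confidence cutoff is an illustrative assumption."""
    corrected = []
    for features, label in items:
        probs = predict_proba(features)
        best = max(probs, key=probs.get)
        if best != label and probs[best] >= confidence:
            label = best  # replace a likely annotation error with the estimate
        corrected.append((features, label))
    return corrected

# Toy model: always 95% confident the category is "A".
toy = lambda feats: {"A": 0.95, "B": 0.05}
data = [({"f": 1}, "B"), ({"f": 2}, "A")]
result = correct_errors(data, toy)  # first item's "B" is corrected to "A"
```

Category estimation and error correction thus share the same estimator; correction simply compares the estimate against the existing label.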
References
Berger AL, Pietra SAD, Pietra VJD. A maximum entropy approach to natural language processing. Comput Linguist. 1996;22(1):39–71.
Ristad ES. Maximum entropy modeling for natural language. Madrid: ACL/EACL Tutorial Program; 1997.
Pietra SD, Pietra VD, Lafferty J. Inducing features of random fields. Technical report, Carnegie Mellon University CMU-CS-95-144. 1995.
Utiyama M. Maximum entropy modeling package. 2006. http://www.nict.go.jp/x/x161/members/mutiyama/software.html#maxent.
Murata M, Ma Q, Uchimoto K, Kanamaru T, Isahara H. Japanese-to-English translations of tense, aspect, and modality using machine-learning methods and comparison with machine-translation systems on market. Lang Resour Eval. 2007;40:233–242.
Ratnaparkhi A. A maximum entropy model for part-of-speech tagging. In: Proceedings of empirical methods for natural language processing. 1996. p. 133–142.
Borthwick A, Sterling J, Agichtein E, Grishman R. Exploiting diverse knowledge sources via maximum entropy in named entity recognition. In: Proceedings of the sixth workshop on very large corpora. 1998. p. 152–160.
Ratnaparkhi A. A linear observed time statistical parser based on maximum entropy models. In: Proceedings of empirical methods for natural language processing. 1997.
Nigam K, Lafferty J, McCallum A. Using maximum entropy for text classification. In: Proceedings of the IJCAI-99 workshop on machine learning for information filtering. 1999. p. 61–67.
Uchimoto K, Murata M, Ozaku H, Ma Q, Isahara H. Named entity extraction based on maximum entropy model and transformation rules. In: Proceedings of the 38th annual meeting of the association of computational linguistics. 2000.
Ittycheriah A, Franz M, Zhu WJ, Ratnaparkhi A. Question answering using maximum entropy components. In: NAACL-2001. 2001.
Murata M, Utiyama M, Uchimoto K, Ma Q, Isahara H. Correction of errors in a verb modality corpus used for machine translation with a machine-learning method. ACM Trans Asian Lang Inf Process. 2005;4(1):18–37.
Murata M, Nishimura R, Doi K, Kanamaru T, Torisawa K. Analysis of the degree of importance of information using newspapers and questionnaires. In: Proceedings of 2008 IEEE international conference on natural language processing and knowledge engineering (IEEE NLP-KE 2008). 2008. p. 137–144.
Jebara T, Jaakkola T. Feature selection and dualities in maximum entropy discrimination. In: Uncertainty in artificial intelligence. 2000. p. 291–300.
Murata M, Uchimoto K, Ma Q, Isahara H. A machine-learning approach to estimating the referential properties of Japanese noun phrases. In: Computational linguistics and intelligent text processing, second international conference (CICLing 2001), Mexico City, February 2001, proceedings. 2001. p. 142–154.
Cristianini N, Shawe-Taylor J. An introduction to support vector machines and other kernel-based learning methods. Cambridge: Cambridge University Press; 2000.
Taira H, Haruno M. Feature selection in SVM text categorization. In: Proceedings of AAAI2001. 2001. p. 480–486.
Nakagawa T, Kudoh T, Matsumoto Y. Unknown word guessing and part-of-speech tagging using support vector machines. In: NLPRS'2001. 2001.
Suzuki J, Sasaki Y, Maeda E. SVM answer selection for open-domain question answering. In: Proceedings of the 19th international conference on computational linguistics (COLING-2002). 2002. p. 974–980.
Murata M, Ma Q, Isahara H. Comparison of three machine-learning methods for Thai part-of-speech tagging. ACM Trans Asian Lang Inf Process. 2002;1(2):145–158.
Yang Y, Liu X. A re-examination of text categorization methods. In: Proceedings of the 22nd annual international ACM SIGIR conference on research and development in information retrieval (SIGIR '99). 1999. p. 42–49.
Murata M, Utiyama M, Uchimoto K, Ma Q, Isahara H. Japanese word sense disambiguation using the simple bayes and support vector machine methods. In: Proceedings of SENSEVAL-2. 2001.
Cite this article
Murata, M., Uchimoto, K., Utiyama, M. et al. Using the Maximum Entropy Method for Natural Language Processing: Category Estimation, Feature Extraction, and Error Correction. Cogn Comput 2, 272–279 (2010). https://doi.org/10.1007/s12559-010-9046-3