
Using the Maximum Entropy Method for Natural Language Processing: Category Estimation, Feature Extraction, and Error Correction


Abstract

The maximum entropy (ME) method is a powerful supervised machine learning technique that is useful for various tasks. In this paper, we introduce new studies that successfully employ ME for natural language processing (NLP) problems, including machine translation and information extraction. Specifically, we demonstrate, using simulation results, three applications of ME in NLP: estimation of categories, extraction of important features, and correction of erroneous data items. We also compare the performance of the proposed ME methods with that of other state-of-the-art approaches.


Notes

  1. There are many studies on categorization using the ME method [6, 7, 8, 9, 10, 11].
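
A maximum entropy classifier is equivalent to multinomial logistic regression, so the kind of category estimation described here can be illustrated in a few lines of Python. This is a hypothetical sketch, not the authors' implementation (they used the ME modeling package in [4]), and all feature names and categories are invented:

```python
# A minimal, hypothetical sketch of ME category estimation; the authors
# used the ME modeling package in [4], not this code. Maximum entropy
# classification is equivalent to multinomial logistic regression.
from sklearn.feature_extraction import DictVectorizer
from sklearn.linear_model import LogisticRegression

# Invented training items: binary context features -> output category.
train_feats = [{"prev=the": 1, "suffix=ing": 1},
               {"prev=will": 1, "suffix=base": 1},
               {"prev=the": 1, "suffix=ed": 1}]
train_cats = ["VBG", "VB", "VBN"]

vec = DictVectorizer()
X = vec.fit_transform(train_feats)

# ME model: p(c | x) is proportional to exp(sum_i lambda_i * f_i(x, c)).
clf = LogisticRegression(max_iter=1000)
clf.fit(X, train_cats)

x = vec.transform([{"prev=the": 1, "suffix=ing": 1}])
print(clf.predict(x))        # most probable category
print(clf.predict_proba(x))  # conditional distribution over all categories
```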

  2. We used anonymized labels (A to F) for the six translation systems because they are commercially available, and we did not want to influence the market.

  3. There are many studies on feature selection [2, 14]. However, their main purpose is to reduce the number of features for better learning, whereas our purpose in extracting features is to examine the experimental results. In addition to the study that used the ME method to extract features and examine experimental results, we conducted another study that estimated the referential properties of noun phrases [15]. In that study, we classified features into two types: (i) strong features, on which an output category was strongly dependent and by which it was necessarily determined (the normalized alpha values of these features were almost 1.0, the maximum value, since a normalized alpha value is a kind of probability), and (ii) weak features, which indicated the category likely to be output, although another category could be output when stronger features appeared. The classified results were highly useful for examining the experimental results.
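
The strong/weak split can be sketched in code under one assumption that is ours, not the paper's: that the "normalized alpha value" of feature f for category c is alpha_{c,f} / sum over c' of alpha_{c',f}, with alpha = exp(lambda). The 0.95 threshold is likewise illustrative:

```python
import numpy as np

def split_features(lambdas, feature_names, categories, strong_threshold=0.95):
    """Split ME features into strong and weak ones.

    lambdas: (n_categories, n_features) matrix of learned ME weights;
    alpha = exp(lambda), normalized here over categories so that each
    feature gets a probability-like score per category in [0, 1].
    """
    alphas = np.exp(lambdas)
    norm = alphas / alphas.sum(axis=0, keepdims=True)
    strong, weak = [], []
    for j, name in enumerate(feature_names):
        c = int(norm[:, j].argmax())
        score = float(norm[c, j])
        # Scores near 1.0: the feature (almost) determines the category.
        bucket = strong if score >= strong_threshold else weak
        bucket.append((name, categories[c], score))
    return strong, weak
```

For a multiclass scikit-learn LogisticRegression such as the one sketched under note 1, clf.coef_ (one row of weights per category) could serve as lambdas; this wiring is our assumption, not a step from the paper.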

  4. This correction is equivalent to changing an original category to the category estimated by the method in Sect. “Method of Categorization”. The technique for error correction is thus closely related to that for category estimation.
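
A hedged sketch of this correction step, assuming any trained classifier with scikit-learn's predict_proba interface (for instance the ME sketch under note 1): replace an annotated category when the model disagrees with high confidence. The 0.9 threshold is our assumption; the actual procedure is described in [12].

```python
def correct_corpus(clf, X, annotated, threshold=0.9):
    """Replace annotated categories that a trained ME classifier
    contradicts with high confidence; returns the corrected list."""
    proba = clf.predict_proba(X)                  # p(c | x) per corpus item
    predicted = clf.classes_[proba.argmax(axis=1)]
    confidence = proba.max(axis=1)
    corrected = list(annotated)
    for i in range(len(annotated)):
        if predicted[i] != annotated[i] and confidence[i] >= threshold:
            corrected[i] = predicted[i]           # adopt the ME estimate
    return corrected
```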

  5. Although the simple Bayes method sometimes shows high performance in restricted tasks such as text categorization [21] and word sense disambiguation [22], it generally shows low performance in categorization tasks.
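
Such comparisons can be reproduced on a small scale with the following self-contained toy harness, which fits a naive ("simple") Bayes baseline and an ME classifier on the same invented features; it illustrates the experimental setup only, not the paper's results:

```python
# Self-contained toy comparison of a simple (naive) Bayes baseline with an
# ME classifier; data, features, and categories are all invented.
from sklearn.feature_extraction import DictVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import BernoulliNB

items = [({"f=a": 1, "f=b": 1}, "X"),
         ({"f=b": 1, "f=c": 1}, "Y"),
         ({"f=a": 1, "f=c": 1}, "X")]
vec = DictVectorizer()
X = vec.fit_transform([feats for feats, _ in items])
y = [cat for _, cat in items]

for model in (BernoulliNB(), LogisticRegression(max_iter=1000)):
    model.fit(X, y)
    print(type(model).__name__, model.predict(vec.transform([{"f=a": 1}])))
```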

References

  1. Berger AL, Pietra SAD, Pietra VJD. A maximum entropy approach to natural language processing. Comput Linguist. 1996;22(1):39–71.

  2. Ristad ES. Maximum entropy modeling for natural language. Madrid: ACL/EACL Tutorial Program; 1997.

  3. Pietra SD, Pietra VD, Lafferty J. Inducing features of random fields. Technical report, Carnegie Mellon University CMU-CS-95-144. 1995.

  4. Utiyama M. Maximum entropy modeling package. 2006. http://www.nict.go.jp/x/x161/members/mutiyama/software.html#maxent.

  5. Murata M, Ma Q, Uchimoto K, Kanamaru T, Isahara H. Japanese-to-English translations of tense, aspect, and modality using machine-learning methods and comparison with machine-translation systems on market. Lang Resour Eval. 2007;40:233–242.

  6. Ratnaparkhi A. A maximum entropy model for part-of-speech tagging. In: Proceedings of empirical methods for natural language processing. 1996. p. 133–142.

  7. Borthwick A, Sterling J, Agichtein E, Grishman R. Exploiting diverse knowledge sources via maximum entropy in named entity recognition. In: Proceedings of the sixth workshop on very large corpora. 1998. p. 152–160.

  8. Ratnaparkhi A. A linear observed time statistical parser based on maximum entropy models. In: Proceedings of empirical methods for natural language processing. 1997.

  9. Nigam K, Lafferty J, McCallum A. Using maximum entropy for text classification. In: Proceedings of the IJCAI-99 workshop on machine learning for information filtering. 1999. p. 61–67.

  10. Uchimoto K, Murata M, Ozaku H, Ma Q, Isahara H. Named entity extraction based on maximum entropy model and transformation rules. In: Proceedings of the 38th annual meeting of the Association for Computational Linguistics. 2000.

  11. Ittycheriah A, Franz M, Zhu WJ, Ratnaparkhi A. Question answering using maximum entropy components. In: Proceedings of NAACL-2001. 2001.

  12. Murata M, Utiyama M, Uchimoto K, Ma Q, Isahara H. Correction of errors in a verb modality corpus used for machine translation with a machine-learning method. ACM Trans Asian Lang Inf Process. 2005;4(1):18–37.

  13. Murata M, Nishimura R, Doi K, Kanamaru T, Torisawa K. Analysis of the degree of importance of information using newspapers and questionnaires. In: Proceedings of 2008 IEEE international conference on natural language processing and knowledge engineering (IEEE NLP-KE 2008). 2008. p. 137–144.

  14. Jebara T, Jaakkola T. Feature selection and dualities in maximum entropy discrimination. In: Uncertainty in artificial intelligence. 2000. p. 291–300.

  15. Murata M, Uchimoto K, Ma Q, Isahara H. A machine-learning approach to estimating the referential properties of Japanese noun phrases. In: Computational linguistics and intelligent text processing: second international conference (CICLing 2001), Mexico City, February 2001, proceedings. 2001. p. 142–154.

  16. Cristianini N, Shawe-Taylor J. An introduction to support vector machines and other kernel-based learning methods. Cambridge: Cambridge University Press; 2000.

  17. Taira H, Haruno M. Feature selection in SVM text categorization. In: Proceedings of AAAI2001. 2001. p. 480–486.

  18. Nakagawa T, Kudoh T, Matsumoto Y. Unknown word guessing and part-of-speech tagging using support vector machines. In: NLPRS 2001. 2001.

  19. Suzuki J, Sasaki Y, Maeda E. SVM answer selection for open-domain question answering. In: Proceedings of the 19th international conference on computational linguistics (COLING-2002). 2002. p. 974–980.

  20. Murata M, Ma Q, Isahara H. Comparison of three machine-learning methods for Thai part-of-speech tagging. ACM Trans Asian Lang Inf Process. 2002;1(2):145–158.

  21. Yang Y, Liu X. A re-examination of text categorization methods. In: Proceedings of the 22nd annual international ACM SIGIR conference on research and development in information retrieval (SIGIR ’99). 1999. p. 42–49.

  22. Murata M, Utiyama M, Uchimoto K, Ma Q, Isahara H. Japanese word sense disambiguation using the simple Bayes and support vector machine methods. In: Proceedings of SENSEVAL-2. 2001.

Author information

Corresponding author

Correspondence to Masaki Murata.

Cite this article

Murata, M., Uchimoto, K., Utiyama, M. et al. Using the Maximum Entropy Method for Natural Language Processing: Category Estimation, Feature Extraction, and Error Correction. Cogn Comput 2, 272–279 (2010). https://doi.org/10.1007/s12559-010-9046-3
