Cognitive Computation, Volume 2, Issue 4, pp 272–279

Using the Maximum Entropy Method for Natural Language Processing: Category Estimation, Feature Extraction, and Error Correction

  • Masaki Murata
  • Kiyotaka Uchimoto
  • Masao Utiyama
  • Qing Ma
  • Ryo Nishimura
  • Yasuhiko Watanabe
  • Kouichi Doi
  • Kentaro Torisawa


The maximum entropy (ME) method is a powerful supervised machine learning technique that is useful for various tasks. In this paper, we introduce new studies that successfully employ ME for natural language processing (NLP) problems, including machine translation and information extraction. Specifically, we demonstrate, using simulation results, three applications of ME for NLP: estimation of categories, extraction of important features, and correction of erroneous data items. We also compare the performance of the proposed ME methods with that of other state-of-the-art approaches.
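The three applications named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a conditional ME model (equivalent to multinomial logistic regression) trained by plain gradient ascent with a small L2 penalty. The toy corpus, the feature names, the use of raw weights as a proxy for feature importance, and the 0.5 threshold for flagging suspicious annotations are all hypothetical choices made for illustration.

```python
import math

LABELS = ["past", "present"]

# Hypothetical toy corpus: (set of binary features, annotated category).
# The last item is deliberately mislabeled to illustrate error detection.
DATA = (
    [({"yesterday", "went"}, "past")] * 3
    + [({"now", "goes"}, "present")] * 3
    + [({"now", "goes"}, "past")]
)

def predict_proba(weights, feats):
    """Conditional ME model: p(y | x) is proportional to exp(sum of weights)."""
    scores = {y: sum(weights.get((f, y), 0.0) for f in feats) for y in LABELS}
    top = max(scores.values())  # subtract the max for numerical stability
    exps = {y: math.exp(s - top) for y, s in scores.items()}
    z = sum(exps.values())
    return {y: e / z for y, e in exps.items()}

def train(data, epochs=200, lr=0.5, l2=0.01):
    """Batch gradient ascent on the L2-regularized conditional log-likelihood."""
    weights = {}
    for _ in range(epochs):
        grad = {}
        for feats, gold in data:
            probs = predict_proba(weights, feats)
            for f in feats:
                for y in LABELS:
                    # Observed feature count minus the count expected by the model.
                    observed = 1.0 if y == gold else 0.0
                    grad[(f, y)] = grad.get((f, y), 0.0) + observed - probs[y]
        for key, g in grad.items():
            w = weights.get(key, 0.0)
            weights[key] = w + lr * (g / len(data) - l2 * w)
    return weights

weights = train(DATA)

# 1. Category estimation: pick the most probable category for a new item.
#    The unseen feature "home" has no weight and contributes nothing.
probs = predict_proba(weights, {"yesterday", "went", "home"})
estimated = max(probs, key=probs.get)

# 2. Feature extraction: rank features by their learned weight for one category.
past_weights = {f: w for (f, y), w in weights.items() if y == "past"}
top_past = sorted(past_weights, key=past_weights.get, reverse=True)[:2]

# 3. Error correction: flag items whose annotated category the model finds unlikely.
suspicious = [
    i for i, (feats, gold) in enumerate(DATA)
    if predict_proba(weights, feats)[gold] < 0.5
]
```

In this sketch, a flagged item would be handed to a human annotator or relabeled with the model's most probable category. The duplicated training items matter: the model can only contradict an annotation when other items with the same features carry a different label.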


Keywords: Maximum entropy, Category estimation, Important feature, Error correction



Copyright information

© Springer Science+Business Media, LLC 2010

Authors and Affiliations

  • Masaki Murata (1), corresponding author
  • Kiyotaka Uchimoto (2)
  • Masao Utiyama (2)
  • Qing Ma (3)
  • Ryo Nishimura (3)
  • Yasuhiko Watanabe (3)
  • Kouichi Doi (4)
  • Kentaro Torisawa (2)
  1. Tottori University, Tottori, Japan
  2. National Institute of Information and Communications Technology, Kyoto, Japan
  3. Ryukoku University, Shiga, Japan
  4. Pharma Security Consulting Inc., Tokyo, Japan
