Information Extraction from Text

Chapter

Abstract

Information extraction is the task of finding structured information in unstructured or semi-structured text. It is an important task in text mining and has been extensively studied in various research communities, including natural language processing, information retrieval, and Web mining. It has a wide range of applications in domains such as biomedical literature mining and business intelligence. Two fundamental tasks of information extraction are named entity recognition and relation extraction. The former refers to finding names of entities such as people, organizations, and locations. The latter refers to finding semantic relations between entities, such as FounderOf and HeadquarteredIn. In this chapter we survey the major work on named entity recognition and relation extraction over the past few decades, with a focus on work from the natural language processing community.
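To make the two tasks concrete, here is a minimal, self-contained sketch of both steps on a toy sentence. The gazetteer-based entity tagger and the hand-written extraction patterns (in the spirit of early pattern-based systems such as DIPRE/Snowball) are illustrative assumptions, not the methods surveyed in this chapter; the entity strings, relation labels (FounderOf, HeadquarteredIn), and patterns are all hypothetical examples.

```python
import re

# Toy gazetteer mapping surface strings to entity types (hypothetical data).
GAZETTEER = {
    "Larry Page": "PERSON",
    "Google": "ORGANIZATION",
    "Mountain View": "LOCATION",
}

def recognize_entities(text):
    """Gazetteer-based NER: return (surface, type, offset) for each match."""
    entities = []
    for surface, etype in GAZETTEER.items():
        for m in re.finditer(re.escape(surface), text):
            entities.append((surface, etype, m.start()))
    return sorted(entities, key=lambda e: e[2])

# Pattern-based relation extraction: each entry pairs a regex whose two
# groups capture the argument strings with a relation label.
PATTERNS = [
    (re.compile(r"(\w[\w ]*) founded (\w[\w ]*)"), "FounderOf"),
    (re.compile(r"(\w[\w ]*) is headquartered in (\w[\w ]*)"), "HeadquarteredIn"),
]

def extract_relations(text):
    """Return (relation, arg1, arg2) triples matched by any pattern."""
    relations = []
    for pattern, label in PATTERNS:
        for m in pattern.finditer(text):
            relations.append((label, m.group(1).strip(), m.group(2).strip()))
    return relations

sentence = "Larry Page founded Google. Google is headquartered in Mountain View."
print(recognize_entities(sentence))
print(extract_relations(sentence))
```

Real systems replace the gazetteer with statistical sequence labelers and the hand-written patterns with learned or kernel-based classifiers, as discussed in the chapter.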

Keywords

Information extraction · Named entity recognition · Relation extraction



Copyright information

© Springer Science+Business Media, LLC 2012

Authors and Affiliations

1. Singapore Management University, Singapore
