A Machine Learning Approach to Information Extraction

  • Conference paper
Computational Linguistics and Intelligent Text Processing (CICLing 2005)

Abstract

Information extraction applies natural language processing to automatically extract the essential details from text documents. A major disadvantage of current approaches is their intrinsic dependence on the application domain and the target language. Several machine learning techniques have been applied to make information extraction systems more portable. This paper describes a general method for building an information extraction system using regular expressions along with supervised learning algorithms. In this method, extraction decisions are led by a set of classifiers instead of sophisticated linguistic analyses. The paper also presents a system called TOPO that extracts information related to natural disasters from newspaper articles in Spanish. Experimental results with this system indicate that the proposed method can be a practical solution for building information extraction systems, reaching an F-measure as high as 72%.
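The general idea of pairing regular expressions with supervised classifiers can be illustrated with a minimal sketch. It is not the TOPO implementation, whose patterns, features, and learning algorithms are not given in the abstract. The candidate pattern, the context-window features, the small Naive Bayes classifier, the labels (`dead`, `houses`, `none`), and the English toy sentences are all assumptions made for illustration.

```python
import math
import re
from collections import Counter, defaultdict

# Assumed candidate pattern: any standalone run of digits in the text.
CANDIDATE = re.compile(r"\b\d+\b")


def context_features(text, match, window=2):
    """Lower-cased words within `window` tokens on each side of a match."""
    left = re.findall(r"\w+", text[:match.start()].lower())[-window:]
    right = re.findall(r"\w+", text[match.end():].lower())[:window]
    return left + right


class NaiveBayes:
    """Small multinomial Naive Bayes with add-one smoothing (a stand-in classifier)."""

    def __init__(self):
        self.word_counts = defaultdict(Counter)
        self.label_counts = Counter()
        self.vocab = set()

    def fit(self, examples):
        for features, label in examples:
            self.label_counts[label] += 1
            self.word_counts[label].update(features)
            self.vocab.update(features)

    def predict(self, features):
        total = sum(self.label_counts.values())

        def score(label):
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            s = math.log(self.label_counts[label] / total)
            for word in features:
                s += math.log((counts[word] + 1) / denom)
            return s

        return max(self.label_counts, key=score)


def training_examples(labeled_texts):
    """Pair each regex candidate with its hand-assigned label, in text order."""
    for text, labels in labeled_texts:
        for match, label in zip(CANDIDATE.finditer(text), labels):
            yield context_features(text, match), label


def extract(text, clf):
    """Keep only the candidates that the classifier does not label 'none'."""
    results = []
    for match in CANDIDATE.finditer(text):
        label = clf.predict(context_features(text, match))
        if label != "none":
            results.append((label, match.group()))
    return results


# Invented toy training data: one label per numeric candidate in each sentence.
TRAIN = [
    ("The earthquake left 12 people dead and 40 houses destroyed.", ["dead", "houses"]),
    ("At least 7 people dead after the flood on 3 May.", ["dead", "none"]),
    ("The storm destroyed 150 houses in 2 days.", ["houses", "none"]),
]

clf = NaiveBayes()
clf.fit(training_examples(TRAIN))
print(extract("The landslide left 5 people dead and 20 houses damaged on 9 June.", clf))
```

The regular expression only proposes candidates; the decision to keep or discard each one rests on the classifier and the words around the candidate, so no syntactic parsing is involved. Under these assumptions, moving to another domain or language would mainly mean supplying new patterns and new labeled examples.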



Copyright information

© 2005 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Téllez-Valero, A., Montes-y-Gómez, M., Villaseñor-Pineda, L. (2005). A Machine Learning Approach to Information Extraction. In: Gelbukh, A. (ed.) Computational Linguistics and Intelligent Text Processing. CICLing 2005. Lecture Notes in Computer Science, vol 3406. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-30586-6_58

  • DOI: https://doi.org/10.1007/978-3-540-30586-6_58

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-24523-0

  • Online ISBN: 978-3-540-30586-6

  • eBook Packages: Computer Science (R0)
