A Machine Learning Approach to Information Extraction

Téllez-Valero, Alberto; Montes-y-Gómez, Manuel; Villaseñor-Pineda, Luis

doi:10.1007/978-3-540-30586-6_58


Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 3406)

Included in the following conference series:

International Conference on Intelligent Text Processing and Computational Linguistics


Abstract

Information extraction applies natural language processing to automatically extract the essential details from text documents. A major disadvantage of current approaches is their intrinsic dependence on the application domain and the target language. Several machine learning techniques have been applied to make information extraction systems more portable. This paper describes a general method for building an information extraction system using regular expressions together with supervised learning algorithms. In this method, the extraction decisions are led by a set of classifiers instead of sophisticated linguistic analyses. The paper also presents a system called TOPO that extracts information about natural disasters from Spanish-language newspaper articles. Experimental results with this system indicate that the proposed method can be a practical solution for building information extraction systems, reaching an F-measure as high as 72%.
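
The general idea of combining regular expressions with supervised classifiers can be illustrated with a minimal sketch. This is not the TOPO system: the abstract does not name the patterns, features, or learning algorithm used, so everything below is an assumption made for illustration. The `NUMBER` pattern, the `deaths`/`other` labels, the invented training sentences, the three-word context window, and the small Naive Bayes classifier are all hypothetical. A regular expression proposes candidate text segments, and a classifier trained on the words around each candidate decides whether to extract it.

```python
import math
import re
from collections import Counter, defaultdict

# Hypothetical pattern: every run of digits is a candidate segment.
NUMBER = re.compile(r"\d+")

def context(text, match, window=3):
    """Lowercased words within `window` tokens before and after the match."""
    before = re.findall(r"\w+", text[:match.start()].lower())[-window:]
    after = re.findall(r"\w+", text[match.end():].lower())[:window]
    return before + after

class NaiveBayes:
    """Tiny multinomial Naive Bayes with add-one smoothing (illustrative)."""

    def __init__(self):
        self.word_counts = defaultdict(Counter)
        self.label_counts = Counter()
        self.vocab = set()

    def fit(self, samples):
        for words, label in samples:
            self.label_counts[label] += 1
            self.word_counts[label].update(words)
            self.vocab.update(words)

    def predict(self, words):
        total = sum(self.label_counts.values())

        def score(label):
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            s = math.log(self.label_counts[label] / total)
            for w in words:
                s += math.log((counts[w] + 1) / denom)
            return s

        return max(self.label_counts, key=score)

# Invented training sentences, each containing exactly one number.
TRAIN = [
    ("el sismo dejó 12 muertos en la costa", "deaths"),
    ("la inundación causó 30 muertos ayer", "deaths"),
    ("el huracán dejó 5 muertos y daños", "deaths"),
    ("ocurrió el 14 de marzo en Puebla", "other"),
    ("a las 9 horas del lunes", "other"),
    ("el 22 de junio de este año", "other"),
]

clf = NaiveBayes()
clf.fit([(context(t, NUMBER.search(t)), label) for t, label in TRAIN])

def extract(text):
    """Return (label, value) pairs for candidates not classified as 'other'."""
    results = []
    for m in NUMBER.finditer(text):
        label = clf.predict(context(text, m))
        if label != "other":
            results.append((label, m.group()))
    return results

print(extract("El terremoto dejó 40 muertos el 3 de mayo"))
```

In this toy setup the regular expression finds both `40` and `3`, and the classifier keeps only the first because of its surrounding words. The extraction decision rests on a learned model over local context rather than on syntactic or semantic analysis, which is the trade-off the abstract describes.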



Author information

Authors and Affiliations

Language Technologies Group, Computer Science Department, National Institute of Astrophysics, Optics and Electronics (INAOE), Mexico
Alberto Téllez-Valero, Manuel Montes-y-Gómez & Luis Villaseñor-Pineda
Department of Information Systems and Computation, Polytechnic University of Valencia, Spain
Manuel Montes-y-Gómez


Editor information

Editors and Affiliations

National Polytechnic Institute, Center for Computing Research, 07738, Mexico City, México
Alexander Gelbukh


About this paper

Cite this paper

Téllez-Valero, A., Montes-y-Gómez, M., Villaseñor-Pineda, L. (2005). A Machine Learning Approach to Information Extraction. In: Gelbukh, A. (eds) Computational Linguistics and Intelligent Text Processing. CICLing 2005. Lecture Notes in Computer Science, vol 3406. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-30586-6_58


DOI: https://doi.org/10.1007/978-3-540-30586-6_58
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-24523-0
Online ISBN: 978-3-540-30586-6
eBook Packages: Computer Science, Computer Science (R0)
