A Hybrid Machine Learning Approach for Information Extraction from Free Text

Neumann, Günter

doi:10.1007/3-540-31314-1_47

Part of the book series: Studies in Classification, Data Analysis, and Knowledge Organization (STUDIES CLASS)

Abstract

We present a hybrid machine learning approach for information extraction from unstructured documents that integrates a learned classifier based on Maximum Entropy Modeling (MEM) with a classifier based on our work on Data-Oriented Parsing (DOP). The hybrid behavior is achieved through a voting mechanism applied by an iterative tag-insertion algorithm. We tested the method on a corpus of German newspaper articles about company turnover and achieved an F-measure of 85.2% with the hybrid approach, compared to 79.3% for MEM and 51.9% for DOP when each is run in isolation.
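
The abstract does not spell out how the voting mechanism works, so the following is only a minimal sketch of one common way two classifiers can be combined: a weighted sum of per-label scores. The function name `vote`, the labels `TURNOVER` and `O`, the scores, and the weights are all hypothetical and chosen for illustration; the iterative tag-insertion algorithm itself is not reproduced here.

```python
from collections import defaultdict


def vote(proposals, weights):
    """Return the label with the highest weighted score across classifiers.

    proposals: classifier name -> {label: score} (hypothetical format)
    weights:   classifier name -> weight for that classifier's scores
    """
    totals = defaultdict(float)
    for name, scores in proposals.items():
        for label, score in scores.items():
            totals[label] += weights[name] * score
    # Iterate over sorted labels so that ties resolve deterministically.
    return max(sorted(totals), key=lambda label: totals[label])


# Hypothetical scores for a single token from the two classifiers.
proposals = {
    "mem": {"TURNOVER": 0.9, "O": 0.1},
    "dop": {"TURNOVER": 0.4, "O": 0.6},
}
# Hypothetical weights; in practice they might reflect held-out accuracy.
weights = {"mem": 0.6, "dop": 0.4}

winner = vote(proposals, weights)
print(winner)
```

In this toy setting the two classifiers disagree, and the weighted sum lets the more confident and more heavily weighted one decide. A hybrid system could apply such a vote each time a tag is about to be inserted, though whether the chapter does so in this form is an assumption.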

Thanks to Volker Morbach for his great help during the implementation and evaluation phase of the project. This work was supported by a research grant from BMBF to the DFKI project Quetal (FKZ: 01 IW C02).

Author information

Authors and Affiliations

LT-Lab, DFKI Saarbrücken, D-66123, Saarbrücken, Germany
Günter Neumann

Editor information

Editors and Affiliations

Institut für Technische und Betriebliche Informationssysteme, Otto-von-Guericke-Universität Magdeburg, Universitätsplatz 2, 39106, Magdeburg, Germany
Myra Spiliopoulou
Institut für Wissens- und Sprachverarbeitung, Otto-von-Guericke-Universität Magdeburg, Universitätsplatz 2, 39106, Magdeburg, Germany
Rudolf Kruse, Christian Borgelt & Andreas Nürnberger
Institut für Entscheidungstheorie und Unternehmensforschung, Universität Karlsruhe (TH), 76128, Karlsruhe
Wolfgang Gaul

About this paper

Cite this paper

Neumann, G. (2006). A Hybrid Machine Learning Approach for Information Extraction from Free Text. In: Spiliopoulou, M., Kruse, R., Borgelt, C., Nürnberger, A., Gaul, W. (eds) From Data and Information Analysis to Knowledge Engineering. Studies in Classification, Data Analysis, and Knowledge Organization. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-31314-1_47

DOI: https://doi.org/10.1007/3-540-31314-1_47
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-31313-7
Online ISBN: 978-3-540-31314-4
eBook Packages: Mathematics and Statistics, Mathematics and Statistics (R0)
