Abstract
We present a hybrid machine learning approach to information extraction from unstructured documents that integrates a learned classifier based on Maximum Entropy Modeling (MEM) with a classifier based on our work on Data-Oriented Parsing (DOP). The hybrid behavior is achieved through a voting mechanism applied by an iterative tag-insertion algorithm. We tested the method on a corpus of German newspaper articles about company turnover and achieved an F-measure of 85.2% with the hybrid approach, compared with 79.3% for MEM and 51.9% for DOP when each is run in isolation.
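The abstract does not spell out the voting mechanism or the tag-insertion procedure, so the following is only a minimal sketch of one way such a combination could look, not the method used in the paper. It assumes that each classifier emits span proposals with a confidence score, that votes are combined as a weighted sum, and that tags are inserted greedily, one per iteration, without overlap. The names `Proposal`, `vote`, `insert_tags`, the weights, the threshold, and the example sentence and scores are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Proposal:
    """A hypothetical span proposal from one classifier."""
    start: int    # index of the first token in the span
    end: int      # index one past the last token in the span
    label: str    # slot name, e.g. "AMOUNT"
    score: float  # classifier confidence in [0, 1]


def vote(mem, dop, w_mem=0.5, w_dop=0.5):
    """Sum the weighted confidences of both classifiers per (span, label)."""
    totals = {}
    for weight, proposals in ((w_mem, mem), (w_dop, dop)):
        for p in proposals:
            key = (p.start, p.end, p.label)
            totals[key] = totals.get(key, 0.0) + weight * p.score
    return totals


def insert_tags(totals, threshold=0.3):
    """Insert the best-voted tag per iteration, skipping overlapping spans."""
    chosen = []
    remaining = dict(totals)
    while remaining:
        key = max(remaining, key=remaining.get)
        score = remaining.pop(key)
        if score < threshold:
            break  # every remaining candidate is weaker than the cutoff
        start, end, _ = key
        if all(end <= s or start >= e for s, e, _ in chosen):
            chosen.append(key)
    return sorted(chosen)


# Made-up example: a tokenized German sentence about company turnover.
tokens = ["Firma", "X", "steigerte", "den", "Umsatz", "auf", "87", "Mio", "Euro"]
mem = [Proposal(0, 2, "ORG", 0.9), Proposal(6, 9, "AMOUNT", 0.6)]
dop = [Proposal(6, 9, "AMOUNT", 0.7), Proposal(5, 7, "AMOUNT", 0.5)]

result = insert_tags(vote(mem, dop))
for start, end, label in result:
    print(label, " ".join(tokens[start:end]))
```

In this sketch the span proposed by both classifiers accumulates the highest vote and is inserted first; the competing span proposed by only one classifier falls below the assumed threshold and is dropped.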
Thanks to Volker Morbach for his great help during the implementation and evaluation phase of the project. This work was supported by a research grant from BMBF to the DFKI project Quetal (FKZ: 01 IW C02).
Copyright information
© 2006 Springer Berlin · Heidelberg
About this paper
Cite this paper
Neumann, G. (2006). A Hybrid Machine Learning Approach for Information Extraction from Free Text. In: Spiliopoulou, M., Kruse, R., Borgelt, C., Nürnberger, A., Gaul, W. (eds) From Data and Information Analysis to Knowledge Engineering. Studies in Classification, Data Analysis, and Knowledge Organization. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-31314-1_47
DOI: https://doi.org/10.1007/3-540-31314-1_47
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-31313-7
Online ISBN: 978-3-540-31314-4
eBook Packages: Mathematics and Statistics (R0)