Abstract
Template-based information extraction generalizes standard token-level binary relation extraction: it attempts to fill a complex template comprising multiple slots with information given in a text. In the approach presented in this paper, templates and their possible fillers are defined by a given ontology, and the information extraction task consists of filling the slots of a template with previously recognized entities or literal values. We cast the task as a structure prediction problem and propose a joint probabilistic model based on factor graphs to account for the interdependence of slot assignments. Inference is implemented as a heuristic based on Markov chain Monte Carlo sampling. As our main contribution, we investigate the impact of soft constraints modeled as single-slot factors, which measure the preferences of individual slots for ranges of fillers, and pairwise slot factors, which model the compatibility between the fillers of two slots. Instead of relying on expert knowledge to acquire such soft constraints, our approach captures them directly in the model and learns them from training data. We show that both types of factors improve information extraction on a real-world data set of full-text papers from the biomedical domain. Pairwise factors in particular improve the precision of our extraction model by up to \({+}0.43\) points, leading to an F\(_1\) score of 0.90 for individual templates.
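To make the two factor types concrete, the following is a minimal Python sketch, not the authors' implementation, of how single-slot and pairwise slot factors could jointly score a filled template, and how a Metropolis-style sampler could search over slot assignments. The template, its slot names (`species`, `gender`, `weight`), the candidate fillers, and all factor weights are hypothetical and hand-set for illustration; in the paper the weights are learned from training data, and the authors' sampling heuristic may differ from this generic one.

```python
import math
import random
from itertools import combinations

# Hypothetical template: each slot maps to candidate fillers
# (previously recognized entities or literal values); None leaves the slot empty.
CANDIDATES = {
    "species": ["rat", "mouse", None],
    "gender": ["female", "male", None],
    "weight": ["200-250 g", "20-25 g", None],
}

# Single-slot factors: preference of one slot for one filler.
# Hand-set here for illustration; the paper learns such weights from training data.
SINGLE = {
    ("species", "rat"): 1.0,
    ("species", "mouse"): 0.8,
    ("gender", "female"): 0.5,
    ("gender", "male"): 0.4,
    ("weight", "200-250 g"): 0.3,
    ("weight", "20-25 g"): 0.3,
}

# Pairwise slot factors: compatibility between the fillers of two slots (hypothetical).
PAIRWISE = {
    (("species", "rat"), ("weight", "200-250 g")): 1.5,
    (("species", "mouse"), ("weight", "20-25 g")): 1.5,
    (("species", "rat"), ("weight", "20-25 g")): -2.0,
    (("species", "mouse"), ("weight", "200-250 g")): -2.0,
}


def score(state):
    """Unnormalized log-score: sum of all single-slot and pairwise factor weights."""
    total = sum(SINGLE.get((slot, value), 0.0) for slot, value in state.items())
    for a, b in combinations(state, 2):
        # Look up both orderings so the pair table need not list each pair twice.
        total += PAIRWISE.get(((a, state[a]), (b, state[b])), 0.0)
        total += PAIRWISE.get(((b, state[b]), (a, state[a])), 0.0)
    return total


def sample(steps=1000, seed=0):
    """Metropolis sampling over slot assignments, returning the best state seen."""
    rng = random.Random(seed)
    slots = list(CANDIDATES)
    state = {slot: None for slot in slots}
    current = score(state)
    best, best_score = dict(state), current
    for _ in range(steps):
        # Propose re-filling one slot with a uniformly chosen candidate (symmetric proposal).
        slot = rng.choice(slots)
        proposal = dict(state)
        proposal[slot] = rng.choice(CANDIDATES[slot])
        proposed = score(proposal)
        delta = proposed - current
        # Always accept improvements; accept worse states with probability exp(delta).
        if delta >= 0 or rng.random() < math.exp(delta):
            state, current = proposal, proposed
            if current > best_score:
                best, best_score = dict(state), current
    return best, best_score


if __name__ == "__main__":
    best, best_score = sample(steps=2000, seed=42)
    print(best, round(best_score, 2))
```

In this toy setup the single-slot weights alone rate both weight ranges equally (0.3), so a rat weighing 20-25 g would score the same as a rat weighing 200-250 g (1.8 each with a female gender filler). Only the pairwise factors separate them: the compatible assignment scores 3.3, while the incompatible one drops to -0.2. This mirrors, in miniature, the kind of cross-slot compatibility the abstract attributes to pairwise factors.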
Keywords
- Ontology-based information extraction
- Slot filling
- Probabilistic graphical models
- Soft constraints
- Database population
Acknowledgments
This work has been funded by the Federal Ministry of Education and Research (BMBF, Germany) in the PSINK project (project numbers 031L0028A/B).
Copyright information
© 2018 Springer International Publishing AG, part of Springer Nature
About this paper
Cite this paper
ter Horst, H., Hartung, M., Klinger, R., Brazda, N., Müller, H.W., Cimiano, P. (2018). Assessing the Impact of Single and Pairwise Slot Constraints in a Factor Graph Model for Template-Based Information Extraction. In: Silberztein, M., Atigui, F., Kornyshova, E., Métais, E., Meziane, F. (eds) Natural Language Processing and Information Systems. NLDB 2018. Lecture Notes in Computer Science, vol 10859. Springer, Cham. https://doi.org/10.1007/978-3-319-91947-8_18
DOI: https://doi.org/10.1007/978-3-319-91947-8_18
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-91946-1
Online ISBN: 978-3-319-91947-8
eBook Packages: Computer Science, Computer Science (R0)