DAVE: Extracting Domain Attributes and Values from Text Corpus
Open Information Extraction (OpenIE), which aims to extract structured information from free text, has been studied extensively. In this paper, we address a novel OpenIE problem, Domain-specified Attribute-Value Extraction: given a text corpus and a domain Knowledge Base (KB) containing a number of domain attributes and their corresponding attribute values, the task is to extend the KB by identifying additional domain attributes and attribute values in the text corpus. Existing solutions adapted from other OpenIE problems rely heavily on either deep linguistic parsing or the identification of effective lexical patterns. However, linguistic parsing does not always work well, especially on short texts, while learned lexical patterns are too strict to reach a high extraction recall. We propose a graph-based iterative extraction approach that relies on the co-occurrence of attribute terms and attribute value terms within the same sentences. Experiments on two large real-world data collections demonstrate that our method outperforms state-of-the-art approaches, achieving 10% higher extraction precision and recall.
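To make the co-occurrence intuition concrete, a minimal Python sketch follows. It is not the DAVE algorithm itself, whose graph construction and scoring are not specified in this abstract; it is only an illustration under stated assumptions. Sentences are assumed to be already reduced to lists of candidate terms, and the function names (`build_cooccurrence_graph`, `support`, `expand_kb`) as well as the `min_support` threshold rule are hypothetical choices made for this example.

```python
from collections import Counter, defaultdict
from itertools import combinations


def build_cooccurrence_graph(sentences):
    """Count, for every pair of terms, how many sentences contain both."""
    graph = defaultdict(Counter)
    for terms in sentences:
        for a, b in combinations(sorted(set(terms)), 2):
            graph[a][b] += 1
            graph[b][a] += 1
    return graph


def support(graph, term, group):
    """Total co-occurrence count between `term` and the terms in `group`."""
    return sum(graph[term][other] for other in group)


def expand_kb(sentences, seed_attributes, seed_values, min_support=2, max_rounds=10):
    """Iteratively grow the attribute and value sets from the seed KB.

    Assumed rule (illustrative only): a new term that co-occurs mostly with
    known attributes is taken as a value, and a new term that co-occurs
    mostly with known values is taken as an attribute.
    """
    graph = build_cooccurrence_graph(sentences)
    attributes, values = set(seed_attributes), set(seed_values)
    for _ in range(max_rounds):
        new_attributes, new_values = set(), set()
        for term in list(graph):
            if term in attributes or term in values:
                continue
            from_attributes = support(graph, term, attributes)
            from_values = support(graph, term, values)
            if from_attributes >= min_support and from_attributes > from_values:
                new_values.add(term)
            elif from_values >= min_support and from_values > from_attributes:
                new_attributes.add(term)
        if not new_attributes and not new_values:
            break  # nothing new was found, so the KB has stopped growing
        attributes |= new_attributes
        values |= new_values
    return attributes, values


# Toy corpus: each sentence is a list of candidate terms (made-up data).
sentences = [
    ["color", "red"],
    ["color", "blue"],
    ["color", "blue", "shipping"],
    ["material", "cotton"],
    ["fabric", "cotton"],
    ["fabric", "cotton", "red"],
    ["fabric", "wool"],
    ["fabric", "wool"],
    ["shipping", "free"],
]
attributes, values = expand_kb(sentences, {"color", "material"}, {"red", "cotton"})
print(sorted(attributes))
print(sorted(values))
```

On this toy corpus, the term "wool" is only reachable in the second round, after "fabric" has been added as an attribute, which is the kind of propagation an iterative scheme permits and a single pass does not. Weakly supported terms such as "shipping" stay out of the KB because they never reach the assumed threshold.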
This research is partially supported by the National Natural Science Foundation of China (Grant Nos. 61632016, 61572336, 61472263, 61232006), the Postdoctoral Scientific Research Funding of Jiangsu Province (No. 1501090B), the National Postdoctoral Funding (Nos. 2015M581859, 2016T90493), and the Natural Science Research Project of Jiangsu Higher Education Institution (No. 17KJA520003).