Advertisement

Weakly Supervised, Data-Driven Acquisition of Rules for Open Information Extraction

  • Fabrizio GottiEmail author
  • Philippe LanglaisEmail author
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11489)

Abstract

We propose a way to acquire rules for Open Information Extraction, based on lemma sequence patterns (including potential typographical symbols) linking two named entities in a sentence. Rule acquisition is data-driven and requires little supervision. Given an arbitrary relation, we identify, in a large corpus, pairs of entities that are linked by the relation and then gather, score and rank other phrases that link the same entity pairs. We experimented with 81 relations and acquired 20 extraction rules for each by mining ClueWeb12. We devised a semi-automatic evaluation protocol to measure recall and precision and found them to be at most 79.9% and 62.4% respectively. Verbal patterns are of better quality than non-verbal ones, although the latter achieve a maximum recall of 76.5%. The strategy proposed does not necessitate expensive resources or time-consuming handcrafted resources, but does require a large amount of text.

Keywords

Open Information Extraction Weak supervision Natural Language Processing 

References

  1. 1.
    Banko, M., Cafarella, M.J., Soderland, S., Broadhead, M., Etzioni, O.: Open information extraction from the web. In: Proceedings of the 20th International Joint Conference on Artificial Intelligence, IJCAI 2007, India, pp. 2670–2676 (2007)Google Scholar
  2. 2.
    Carlson, A., Betteridge, J., Kisiel, B., Settles, B., Hruschka Jr., E.R., Mitchell, T.M.: Toward an architecture for never-ending language learning. In: Proceedings of the Twenty-Fourth Conference on Artificial Intelligence (AAAI 2010), p. 3, July 2010Google Scholar
  3. 3.
    Carlson, A., Betteridge, J., Wang, R.C., Hruschka, Jr., E.R., Mitchell, T.M.: Coupled semi-supervised learning for information extraction. In: Proceedings of the Third ACM International Conference on Web Search and Data Mining, WSDM 2010, pp. 101–110. ACM, New York (2010)Google Scholar
  4. 4.
    Cetto, M., Niklaus, C., Freitas, A., Handschuh, S.: Graphene: semantically-linked propositions in open information extraction. In: Proceedings of the 27th International Conference on Computational Linguistics, pp. 2300–2311. ACL (2018)Google Scholar
  5. 5.
    Del Corro, L., Abujabal, A., Gemulla, R., Weikum, G.: FINET: context-aware fine-grained named entity typing. In: Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, pp. 868–878. ACL (2015)Google Scholar
  6. 6.
    Del Corro, L., Gemulla, R.: ClausIE: clause-based open information extraction. In: Proceedings of the 22nd International Conference on World Wide Web, WWW 2013, pp. 355–366. ACM, New York (2013)Google Scholar
  7. 7.
    Fader, A., Soderland, S., Etzioni, O.: Identifying relations for open information extraction. In: Empirical Methods in Natural Language Processing, EMNLP 2011, pp. 1535–1545 (2011)Google Scholar
  8. 8.
    Fader, A., Zettlemoyer, L., Etzioni, O.: Open question answering over curated and extracted knowledge bases. In: Proceedings of the 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD 2014, pp. 1156–1165. ACM, New York (2014)Google Scholar
  9. 9.
    Gashteovski, K., Gemulla, R., Del Corro, L.: MinIE: minimizing facts in open information extraction. In: Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, pp. 2630–2640. ACL (2017)Google Scholar
  10. 10.
    Hoffmann, R., Zhang, C., Ling, X., Zettlemoyer, L., Weld, D.S.: Knowledge-based weak supervision for information extraction of overlapping relations. In: Proceedings of the 49th Annual Meeting of the ACL: Human Language Technologies - Volume 1, HLT 2011, pp. 541–550. ACL (2011)Google Scholar
  11. 11.
    Léchelle, W., Gotti, F., Langlais, P.: WiRe57 : A Fine-Grained Benchmark for Open Information Extraction. arXiv:1809.08962 [cs], September 2018
  12. 12.
    Mausam Schmitz, M., Bart, R., Soderland, S., Etzioni, O.: Open language learning for information extraction. In: Joint Conference on Empirical Methods in NLP and Computational Natural Language Learning, pp. 523–534 (2012)Google Scholar
  13. 13.
    Mishra, B.D., Tandon, N., Clark, P.: Domain-targeted, high precision knowledge extraction. Trans. ACL 5, 233–246 (2017)Google Scholar
  14. 14.
    Nakashole, N., Weikum, G., Suchanek, F.: PATTY: a taxonomy of relational patterns with semantic types. In: Joint Conference on Empirical Methods in NLP and Computational Natural Language Learning, pp. 1135–1145 (2012)Google Scholar
  15. 15.
    Pal, H., Mausam: demonyms and compound relational nouns in nominal open IE. In: AKBC@NAACL-HLT (2016)Google Scholar
  16. 16.
    Saha, S., Pal, H., Mausam: bootstrapping for numerical open IE. In: Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers). pp. 317–323. ACL, Vancouver (2017)Google Scholar
  17. 17.
    Stanovsky, G., Dagan, I.: Creating a large benchmark for open information extraction. In: Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing, pp. 2300–2305. ACL, Austin (2016)Google Scholar

Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  1. 1.RALIUniversité de MontréalMontrealCanada

Personalised recommendations