Relation Extraction from the Web Using Distant Supervision

  • Isabelle Augenstein
  • Diana Maynard
  • Fabio Ciravegna
Part of the Lecture Notes in Computer Science book series (LNCS, volume 8876)


Extracting information from Web pages requires the ability to work at Web scale in terms of the number of documents, the number of domains and domain complexity. Recent approaches have used existing knowledge bases to learn to extract information with promising results. In this paper we propose the use of distant supervision for relation extraction from the Web. Distant supervision is a method which uses background information from the Linking Open Data cloud to automatically label sentences with relations to create training data for relation classifiers. Although the method is promising, existing approaches are still not suitable for Web extraction as they suffer from three main issues: data sparsity, noise and lexical ambiguity. Our approach reduces the impact of data sparsity by making entity recognition tools more robust across domains, as well as extracting relations across sentence boundaries. We reduce the noise caused by lexical ambiguity by employing statistical methods to strategically select training data. Our experiments show that using a more robust entity recognition approach and expanding the scope of relation extraction results in about 8 times the number of extractions, and that strategically selecting training data can result in an error reduction of about 30%.
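
The labelling step described in the abstract can be illustrated with a minimal sketch. It assumes a toy knowledge base of (subject, relation, object) triples and plain substring matching in place of entity recognition. The triples, the sentences and the function names are hypothetical, and the ambiguity filter is a simple stand-in heuristic, not the statistical selection method the paper evaluates.

```python
from collections import defaultdict

# Hypothetical toy knowledge base: (subject, relation, object) triples.
KB = [
    ("The Beatles", "album", "Let It Be"),
    ("The Beatles", "track", "Let It Be"),
    ("The Beatles", "album", "Abbey Road"),
]

# Hypothetical sentences standing in for text retrieved from Web pages.
SENTENCES = [
    "Let It Be is the final studio album by The Beatles.",
    "The Beatles recorded Abbey Road in 1969.",
    "Abbey Road is a street in London.",
]


def label_sentences(kb, sentences):
    """Label a sentence with a relation if it mentions both subject and object."""
    examples = []
    for subj, rel, obj in kb:
        for sent in sentences:
            # Substring matching is an assumption; a real system would use
            # entity recognition to find the mentions.
            if subj in sent and obj in sent:
                examples.append((sent, subj, obj, rel))
    return examples


def ambiguous_pairs(kb):
    """Return (subject, object) pairs linked by more than one relation."""
    relations = defaultdict(set)
    for subj, rel, obj in kb:
        relations[(subj, obj)].add(rel)
    return {pair for pair, rels in relations.items() if len(rels) > 1}


def select_unambiguous(examples, kb):
    """Drop training examples whose entity pair is lexically ambiguous."""
    bad = ambiguous_pairs(kb)
    return [ex for ex in examples if (ex[1], ex[2]) not in bad]


if __name__ == "__main__":
    noisy = label_sentences(KB, SENTENCES)
    for example in select_unambiguous(noisy, KB):
        print(example)
```

In this sketch the first sentence is labelled with both "album" and "track", because the object name is shared by two relations. This is the kind of noise from lexical ambiguity that selecting training data is meant to reduce. The third sentence is never labelled, since it mentions only one of the two entities.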


Noun Phrase, Relation Extraction, Lexical Ambiguity, Sentence Boundary, Musical Artist
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.




Copyright information

© Springer International Publishing Switzerland 2014

Authors and Affiliations

  • Isabelle Augenstein (1)
  • Diana Maynard (1)
  • Fabio Ciravegna (1)
  1. Department of Computer Science, The University of Sheffield, UK
