Knowledge and Information Systems, Volume 17, Issue 1, pp 17–33

Self-supervised relation extraction from the Web

  • Benjamin Rozenfeld
  • Ronen Feldman
Regular Paper

Abstract

Web extraction systems attempt to use the immense amount of unlabeled text on the Web to create large lists of entities and relations. Unlike traditional information extraction methods, Web extraction systems do not label every mention of the target entity or relation; instead, they focus on extracting as many different instances as possible while keeping the precision of the resulting list reasonably high. SRES is a self-supervised Web relation extraction system that learns powerful extraction patterns from unlabeled text, using short descriptions of the target relations and their attributes. SRES automatically generates the training data needed for its pattern-learning component. The performance of SRES is further enhanced by classifying its output instances using the properties of the instances and the patterns. The features we use for classification and the trained classification model are independent of the target relation, which we demonstrate in a series of experiments. We also compare the performance of SRES to that of the state-of-the-art KnowItAll system, and to that of its pattern-learning component, which learns a simpler pattern language than SRES.
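The general idea of generating training data from unlabeled text and then learning extraction patterns from it can be illustrated with a toy sketch. This is not the SRES pattern language or its learning algorithm, which the abstract does not specify. The function names `make_patterns` and `extract`, the seed pair, the three-sentence corpus, and the "capitalized token" slot definition are all assumptions made for illustration only.

```python
import re


def make_patterns(seeds, sentences):
    """Collect the text between the two attributes of each seed instance.

    Sentences that mention both attributes of a known seed serve as
    automatically generated positive examples (an illustrative simplification).
    """
    patterns = set()
    for first, second in seeds:
        for sentence in sentences:
            if first in sentence and second in sentence:
                start = sentence.index(first) + len(first)
                end = sentence.index(second)
                if start < end:
                    patterns.add(sentence[start:end])
    return patterns


def extract(patterns, sentences):
    """Apply each learned context between two capitalized-token slots."""
    slot = r"([A-Z][A-Za-z]+)"  # hypothetical slot filler: one capitalized word
    found = set()
    for middle in patterns:
        regex = re.compile(slot + re.escape(middle) + slot)
        for sentence in sentences:
            for match in regex.finditer(sentence):
                found.add((match.group(1), match.group(2)))
    return found


# Toy inputs, invented for this sketch.
seeds = [("Oracle", "PeopleSoft")]
corpus = [
    "Oracle agreed to acquire PeopleSoft last year.",
    "Adobe agreed to acquire Macromedia in a stock deal.",
    "Google opened an office in Zurich.",
]

patterns = make_patterns(seeds, corpus)
instances = extract(patterns, corpus)
```

Here a single seed yields one context pattern, which then matches a second, previously unseen pair in the corpus. A real system would also need to score or classify the extracted instances, as the abstract describes, because literal context matching of this kind is prone to false positives.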

Keywords

Web extraction, Text mining, Pattern learning, Unsupervised learning, Relationship extraction

Copyright information

© Springer-Verlag London Limited 2007

Authors and Affiliations

  1. Information Systems, HU School of Business Administration, Hebrew University, Jerusalem, Israel
