PORE: Positive-Only Relation Extraction from Wikipedia Text

  • Gang Wang
  • Yong Yu
  • Haiping Zhu
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4825)

Abstract

Extracting semantic relations is of great importance for creating Semantic Web content. It is highly beneficial to extract relations semi-automatically from the free text of Wikipedia using the structured content readily available in it. Pattern matching methods that exploit information redundancy do not work well, since Wikipedia contains much less redundant information than the Web. Multi-class classification methods are not suitable either, since no classification of relation types is available in Wikipedia. In this paper, we propose PORE (Positive-Only Relation Extraction) for relation extraction from Wikipedia text. The core algorithm, B-POL, extends a state-of-the-art positive-only learning algorithm using bootstrapping, strong negative identification, and transductive inference to work with fewer positive training examples. We conducted experiments on several relations with different amounts of training data. The experimental results show that B-POL works effectively given only a small number of positive training examples, and that it significantly outperforms the original positive learning approaches and a multi-class SVM. Furthermore, although PORE is applied in the context of Wikipedia, the core algorithm B-POL is a general approach for Ontology Population and can be adapted to other domains.
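
To make the three ingredients named above more concrete, the following is a minimal sketch of a generic positive-only (positive-unlabeled) learning loop. It is not the B-POL algorithm from the paper, whose details are not given in this abstract. As assumptions for illustration only, it swaps in a simple cosine-similarity centroid classifier, and the function `pu_bootstrap`, the parameters `neg_threshold` and `margin`, and the toy feature vectors are all hypothetical. Strong negatives are taken to be unlabeled examples far from the positive centroid, bootstrapping repeatedly absorbs confidently scored unlabeled examples into the training sets, and the labelling is transductive in the sense that it is applied to the same unlabeled pool the model was built from.

```python
import math


def cosine(a, b):
    """Cosine similarity between two dense vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def centroid(vectors):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def pu_bootstrap(positives, unlabeled, neg_threshold=0.3, margin=0.2, max_rounds=10):
    """Illustrative positive-unlabeled loop (hypothetical, not the paper's B-POL)."""
    pos = list(positives)
    pos_c = centroid(pos)

    # Strong negative identification: unlabeled examples far from the positives.
    neg, pool = [], []
    for u in unlabeled:
        (neg if cosine(u, pos_c) < neg_threshold else pool).append(u)
    if not neg:
        raise ValueError("no strong negatives found; try a higher neg_threshold")

    # Bootstrapping: absorb confidently scored examples, then re-estimate centroids.
    for _ in range(max_rounds):
        pos_c, neg_c = centroid(pos), centroid(neg)
        unsure = []
        for u in pool:
            score = cosine(u, pos_c) - cosine(u, neg_c)
            if score > margin:
                pos.append(u)
            elif score < -margin:
                neg.append(u)
            else:
                unsure.append(u)
        if len(unsure) == len(pool):  # nothing new was labelled; stop early
            break
        pool = unsure

    pos_c, neg_c = centroid(pos), centroid(neg)
    # Final classifier: the nearer centroid wins.
    return lambda v: cosine(v, pos_c) > cosine(v, neg_c)


# Toy data (made up): dimension 0 signals the relation, dimension 1 does not.
positives = [[1.0, 0.0], [0.9, 0.1]]
unlabeled = [[0.8, 0.2], [0.0, 1.0], [0.1, 0.9], [0.6, 0.5]]
classify = pu_bootstrap(positives, unlabeled)
```

Calling `classify(vector)` returns `True` when the vector lies closer to the positive centroid than to the negative one. A real system would replace the toy vectors with features extracted from text and would likely use a stronger base learner than centroids.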

Keywords

Relation Extraction, Ontology Population, Positive-Only Learning

Copyright information

© Springer-Verlag Berlin Heidelberg 2007

Authors and Affiliations

  • Gang Wang (1)
  • Yong Yu (1)
  • Haiping Zhu (1)
  1. Apex Data & Knowledge Management Lab, Department of Computer Science and Engineering, Shanghai Jiao Tong University, Shanghai, 200240, China
