Mining the Web Through Verbs: A Case Study

  • Peyman Sazedj
  • H. Sofia Pinto
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4519)


Mining non-taxonomic relations is an important part of the Semantic Web puzzle. Building on the work of the semantic annotation community, we address the problem of extracting relation instances among annotated entities. In particular, we analyze the problem of verb-based relation instantiation in some detail and present a heuristic, domain-independent approach, based on verb chunking and entity clustering, which does not require parsing. We also address the problem of mapping linguistic tuples to relations from the ontology. A case study conducted within the biography domain demonstrates the validity of our results in contrast to related work, whilst examining the complexity of the extraction task and the feasibility of verb-based extraction in general.



Copyright information

© Springer Berlin Heidelberg 2007

Authors and Affiliations

  • Peyman Sazedj (1)
  • H. Sofia Pinto (1)
  1. Inesc-ID, Rua Alves Redol 9, Apartado 13069, 1000-029 Lisboa, Portugal
