Using the Web to Validate Lexico-Semantic Relations

  • Hernani Pereira Costa
  • Hugo Gonçalo Oliveira
  • Paulo Gomes
Part of the Lecture Notes in Computer Science book series (LNCS, volume 7026)


The evaluation of semantic relations acquired automatically from text is a challenging task, which generally ends up being done by humans. Although less prone to errors, manual evaluation is hard to repeat, time-consuming and sometimes subjective. In this paper, we evaluate relational triples automatically, exploiting popular Web-based similarity measures. After using these measures to score triples according to the co-occurrence of their arguments and of textual patterns denoting their relation, some scores proved to be highly correlated with the correctness rate of the triples. The measures were also used to select correct triples from a set, with the best F1 scores around 96%.
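
The abstract does not name the specific Web similarity measures, so the following is only a minimal sketch, assuming page (hit) counts have already been retrieved from a search engine. It shows three commonly used co-occurrence measures (WebJaccard, WebPMI and the Normalized Google Distance) and a simple threshold-based selection of triples scored with F1. The index size `N_PAGES`, all function names, and the use of a single joint hit count per triple are illustrative assumptions, not the authors' implementation.

```python
import math

# Hypothetical assumption: approximate number of pages indexed by the search
# engine. Real engines do not publish an exact figure.
N_PAGES = 1e10


def web_jaccard(hits_a: float, hits_b: float, hits_ab: float) -> float:
    """Jaccard coefficient estimated from page counts (0 if the terms never co-occur)."""
    if hits_ab <= 0:
        return 0.0
    return hits_ab / (hits_a + hits_b - hits_ab)


def web_pmi(hits_a: float, hits_b: float, hits_ab: float, n: float = N_PAGES) -> float:
    """Pointwise mutual information (log base 2) estimated from page counts."""
    if hits_ab <= 0 or hits_a <= 0 or hits_b <= 0:
        return 0.0
    return math.log2((hits_ab / n) / ((hits_a / n) * (hits_b / n)))


def ngd(hits_a: float, hits_b: float, hits_ab: float, n: float = N_PAGES) -> float:
    """Normalized Google Distance: 0 when the terms always co-occur; larger means less related."""
    if hits_ab <= 0 or hits_a <= 0 or hits_b <= 0:
        return math.inf
    log_a, log_b, log_ab = math.log(hits_a), math.log(hits_b), math.log(hits_ab)
    return (max(log_a, log_b) - log_ab) / (math.log(n) - min(log_a, log_b))


def f1_at_threshold(scores, labels, threshold):
    """F1 obtained by accepting every triple whose score is >= threshold."""
    accepted = [s >= threshold for s in scores]
    tp = sum(1 for a, gold in zip(accepted, labels) if a and gold)
    fp = sum(1 for a, gold in zip(accepted, labels) if a and not gold)
    fn = sum(1 for a, gold in zip(accepted, labels) if not a and gold)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def best_threshold(scores, labels):
    """Try each observed score as a cut-off; return (threshold, F1) with the highest F1."""
    return max(
        ((t, f1_at_threshold(scores, labels, t)) for t in sorted(set(scores))),
        key=lambda pair: pair[1],
    )
```

For a triple (a, relation, b), `hits_a` and `hits_b` would be the page counts of each argument and `hits_ab` the count for a query joining both arguments through a textual pattern for the relation (again an assumption about query construction). Because NGD is a distance, its values would need to be negated before being passed to `best_threshold`, which treats higher scores as better.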







Copyright information

© Springer-Verlag Berlin Heidelberg 2011

Authors and Affiliations

  • Hernani Pereira Costa (1)
  • Hugo Gonçalo Oliveira (1)
  • Paulo Gomes (1)

  1. Cognitive and Media Systems Group, CISUC, University of Coimbra, Portugal
