Abstract
In this paper we explain how to build a recognizing textual entailment (RTE) system that uses only WordNet-based semantic similarity measures. We show how widely used WordNet-based semantic measures can be generalized into sentence-level semantic metrics for use in both monolingual and cross-lingual textual entailment. We experiment with a wide variety of RTE datasets and evaluate the contribution of an algorithm that expands the monolingual RTE corpus. The results obtained with this method show statistically significant differences when predicting RTE test sets. We provide an efficiency analysis of these metrics and draw some conclusions about their practical utility for recognizing textual entailment. We also analyze the cross-lingual textual entailment task, create a bilingual English–Spanish corpus, and propose a procedure for creating a cross-lingual textual entailment corpus for any pair of languages. Finally, we show that the proposed method, which uses WordNet as its only source of lexical-semantic knowledge, is sufficient to build an RTE system with average scores in both monolingual and cross-lingual textual entailment.
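As a rough illustration of how a word-level measure might be lifted to a sentence-level score, the sketch below averages, for each hypothesis word, its best match in the text. This is a minimal sketch under stated assumptions, not the formulation used in the paper: the aggregation scheme is one common choice among several, and `TOY_SCORES`, `toy_word_similarity`, and `directed_similarity` are hypothetical names, with a small lookup table standing in for a real WordNet-based measure.

```python
from typing import Callable, Sequence

# Hypothetical scores standing in for a WordNet-based word measure.
# A real system would query WordNet instead of this table.
TOY_SCORES = {
    frozenset(("car", "automobile")): 1.0,
    frozenset(("buy", "purchase")): 0.9,
    frozenset(("man", "person")): 0.8,
}


def toy_word_similarity(a: str, b: str) -> float:
    """Return a similarity in [0, 1]; identical words score 1.0."""
    if a == b:
        return 1.0
    return TOY_SCORES.get(frozenset((a, b)), 0.0)


def directed_similarity(
    text: Sequence[str],
    hypothesis: Sequence[str],
    word_sim: Callable[[str, str], float],
) -> float:
    """Average, over hypothesis words, of the best-matching text word.

    Assumption: this best-match averaging is only one way to build a
    sentence-level score from a word-level measure.
    """
    if not text or not hypothesis:
        return 0.0
    best = [max(word_sim(h, t) for t in text) for h in hypothesis]
    return sum(best) / len(best)


text = "a man buy a car".split()
hypothesis = "a person purchase an automobile".split()
score = directed_similarity(text, hypothesis, toy_word_similarity)
print(f"sentence-level score: {score:.2f}")
```

The score is directional (hypothesis against text), which mirrors the asymmetry of entailment; a score like this could then serve as one feature for a classifier that decides whether the text entails the hypothesis.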
Notes
Intel Core 2 Duo 2.00 GHz, 4 GB RAM.
Cite this article
Castillo, J.J. A WordNet-based semantic approach to textual entailment and cross-lingual textual entailment. Int. J. Mach. Learn. & Cyber. 2, 177–189 (2011). https://doi.org/10.1007/s13042-011-0026-z
DOI: https://doi.org/10.1007/s13042-011-0026-z