Language Resources and Evaluation, Volume 50, Issue 1, pp 95–124

SICK through the SemEval glasses. Lesson learned from the evaluation of compositional distributional semantic models on full sentences through semantic relatedness and textual entailment

  • Luisa Bentivogli
  • Raffaella Bernardi
  • Marco Marelli
  • Stefano Menini
  • Marco Baroni
  • Roberto Zamparelli
Original Paper

Abstract

This paper is an extended description of SemEval-2014 Task 1, the task on the evaluation of Compositional Distributional Semantics Models on full sentences. Systems participating in the task were presented with pairs of sentences and were evaluated on their ability to predict human judgments on (1) semantic relatedness and (2) entailment. Training and testing data were subsets of the SICK (Sentences Involving Compositional Knowledge) data set. SICK was developed with the aim of providing a proper benchmark to evaluate compositional semantic systems, though task participation was open to systems based on any approach. Taking advantage of the SemEval experience, in this paper we analyze the SICK data set in order to evaluate the extent to which it meets its design goal, and to shed light on the linguistic phenomena that are still challenging for state-of-the-art computational semantic systems. Qualitative and quantitative error analyses show that many systems are quite sensitive to changes in the proportion of sentence pair types, and degrade in the presence of additional lexico-syntactic complexities which do not affect human judgments. More compositional systems seem to perform better when the task proportions are changed, but the effect needs further confirmation.
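
The two subtasks mentioned above come with concrete scoring procedures: relatedness predictions are compared against averaged human scores, and entailment labels against gold labels. The sketch below is a minimal illustration, assuming Pearson correlation for the relatedness subtask and plain label accuracy for the entailment subtask (the headline metrics commonly reported for this task); all data values and variable names are invented for illustration.

    from scipy.stats import pearsonr

    # Hypothetical gold standard and system output for four SICK-style sentence
    # pairs. Relatedness scores lie on the 1-5 scale used in SICK; entailment
    # labels are ENTAILMENT / CONTRADICTION / NEUTRAL.
    gold_relatedness = [4.5, 1.2, 3.7, 2.9]
    pred_relatedness = [4.1, 1.5, 3.9, 3.2]

    gold_entailment = ["ENTAILMENT", "NEUTRAL", "CONTRADICTION", "NEUTRAL"]
    pred_entailment = ["ENTAILMENT", "NEUTRAL", "NEUTRAL", "NEUTRAL"]

    # Relatedness subtask: Pearson correlation between system scores and the
    # (averaged) human judgments.
    r, _ = pearsonr(gold_relatedness, pred_relatedness)

    # Entailment subtask: proportion of sentence pairs with the correct label.
    accuracy = sum(g == p for g, p in zip(gold_entailment, pred_entailment)) / len(gold_entailment)

    print(f"Pearson r = {r:.3f}  entailment accuracy = {accuracy:.2%}")

Because the gold relatedness scores are averages over multiple crowdsourced judgments, a correlation measure is a more natural choice for that subtask than exact agreement.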

Keywords

Compositionality · Computational semantics · Distributional semantics models

Copyright information

© Springer Science+Business Media Dordrecht 2016

Authors and Affiliations

  • Luisa Bentivogli (1)
  • Raffaella Bernardi (2)
  • Marco Marelli (2)
  • Stefano Menini (1)
  • Marco Baroni (2)
  • Roberto Zamparelli (2)

  1. Fondazione Bruno Kessler (FBK), Povo, Italy
  2. University of Trento, Rovereto, Italy
