Multi-layer and Co-learning Systems for Semantic Textual Similarity, Semantic Relatedness and Recognizing Textual Entailment

  • Ngoc Phuoc An Vo
  • Octavian Popescu
Conference paper
Part of the Communications in Computer and Information Science book series (CCIS, volume 914)

Abstract

Similarity plays a central role in the process of language understanding. However, it is difficult to define precisely which types of data and which similarity metrics should be used to assess the similarity of two texts. Previously, we proposed a four-layer system [69] that takes into account not only string and semantic word similarities, but also word alignment and sentence structure. Our system achieved new state-of-the-art results, or results competitive with the state of the art, on different test corpora of the Semantic Textual Similarity (STS) task from 2012 to 2015. The multi-layer architecture helps to deal with heterogeneous corpora that may not have been drawn from the same distribution or the same domain. In this extended paper, we examine the correlation between two semantic processing tasks, Semantic Relatedness (a broader task than STS) and Recognizing Textual Entailment (RTE), in order to construct a co-learning model that integrates our multi-layer architecture with the Corpus Patterns technique, with the goal of improving performance on both tasks.
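The abstract names string similarity as one of the four layers but does not list the individual measures. As a rough illustration only, here is a minimal Python sketch of how such a layer could turn a sentence pair into a feature vector. It assumes a few standard string-level measures drawn from works in the reference list: a longest-common-subsequence ratio [5] and Broder-style n-gram resemblance and containment [13]. The function names, the whitespace tokenizer and the choice of word bigrams are hypothetical; this is not the authors' implementation.

```python
from typing import List, Set, Tuple


def tokenize(text: str) -> List[str]:
    """Lowercased whitespace tokenization (a deliberate simplification)."""
    return text.lower().split()


def lcs_length(a: List[str], b: List[str]) -> int:
    """Length of the longest common subsequence of two token lists (dynamic programming)."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def shingles(tokens: List[str], n: int) -> Set[Tuple[str, ...]]:
    """Set of contiguous word n-grams ("shingles")."""
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def string_features(s1: str, s2: str, n: int = 2) -> List[float]:
    """Hypothetical string-layer features: [LCS ratio, n-gram resemblance, containment of s1 in s2]."""
    t1, t2 = tokenize(s1), tokenize(s2)
    if not t1 or not t2:
        return [0.0, 0.0, 0.0]
    # LCS length normalized by the shorter sentence.
    lcs = lcs_length(t1, t2) / min(len(t1), len(t2))
    g1, g2 = shingles(t1, n), shingles(t2, n)
    union = g1 | g2
    # Resemblance: Jaccard overlap of the two shingle sets.
    resemblance = len(g1 & g2) / len(union) if union else 0.0
    # Containment: fraction of s1's shingles that also occur in s2.
    containment = len(g1 & g2) / len(g1) if g1 else 0.0
    return [lcs, resemblance, containment]


if __name__ == "__main__":
    print(string_features("A man is playing a guitar.", "A man plays the guitar."))
    # → [0.6, 0.125, 0.2]
```

In a multi-layer setup, one could concatenate a vector like this with features from the other layers (semantic word similarity, alignment, structure) and pass the result to a learned regressor; that combination step is not shown here.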

Keywords

Machine learning, Natural Language Processing, Semantic Textual Similarity, Semantic Relatedness, Recognizing Textual Entailment, Corpus Patterns

References

  1. Agirre, E., et al.: SemEval-2015 task 2: semantic textual similarity, English, Spanish and pilot on interpretability. In: Proceedings of the 9th International Workshop on Semantic Evaluation (SemEval 2015). Association for Computational Linguistics, Denver (2015)
  2. Agirre, E., et al.: SemEval-2014 task 10: multilingual semantic textual similarity. In: SemEval 2014, p. 81 (2014)
  3. Agirre, E., Cer, D., Diab, M., Gonzalez-Agirre, A., Guo, W.: *SEM 2013 shared task: semantic textual similarity, including a pilot on typed-similarity. In: *SEM 2013: The Second Joint Conference on Lexical and Computational Semantics. Association for Computational Linguistics. Citeseer (2013)
  4. Agirre, E., Diab, M., Cer, D., Gonzalez-Agirre, A.: SemEval-2012 task 6: a pilot on semantic textual similarity. In: Proceedings of the First Joint Conference on Lexical and Computational Semantics-Volume 1: Proceedings of the Main Conference and the Shared Task, and Volume 2: Proceedings of the Sixth International Workshop on Semantic Evaluation, pp. 385–393. Association for Computational Linguistics (2012)
  5. Allison, L., Dix, T.I.: A bit-string longest-common-subsequence algorithm. Inf. Process. Lett. 23(5), 305–310 (1986)
  6. Banerjee, S., Lavie, A.: METEOR: an automatic metric for MT evaluation with improved correlation with human judgments. In: Proceedings of the ACL Workshop on Intrinsic and Extrinsic Evaluation Measures for Machine Translation and/or Summarization, pp. 65–72 (2005)
  7. Bär, D., Biemann, C., Gurevych, I., Zesch, T.: UKP: computing semantic textual similarity by combining multiple content similarity measures. In: Proceedings of the First Joint Conference on Lexical and Computational Semantics-Volume 1: Proceedings of the Main Conference and the Shared Task, and Volume 2: Proceedings of the Sixth International Workshop on Semantic Evaluation, pp. 435–440. Association for Computational Linguistics (2012)
  8. Baroni, M., Dinu, G., Kruszewski, G.: Don’t count, predict! A systematic comparison of context-counting vs. context-predicting semantic vectors. In: ACL, vol. 1, pp. 238–247 (2014)
  9. Baroni, M., Zamparelli, R.: Nouns are vectors, adjectives are matrices: representing adjective-noun constructions in semantic space. In: Proceedings of the 2010 Conference on Empirical Methods in Natural Language Processing, EMNLP 2010, pp. 1183–1193. Association for Computational Linguistics, Stroudsburg (2010). http://dl.acm.org/citation.cfm?id=1870658.1870773
  10. Barrón-Cedeño, A., Rosso, P., Agirre, E., Labaka, G.: Plagiarism detection across distant language pairs. In: Proceedings of the 23rd International Conference on Computational Linguistics, pp. 37–45. Association for Computational Linguistics (2010)
  11. Berant, J., Dagan, I., Goldberger, J.: Learning entailment relations by global graph structure optimization. Comput. Linguist. 38(1), 73–111 (2012)
  12. Blacoe, W., Lapata, M.: A comparison of vector-based representations for semantic composition. In: Proceedings of the 2012 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning, pp. 546–556. Association for Computational Linguistics, Jeju Island (2012). http://www.aclweb.org/anthology/D12-1050
  13. Broder, A.Z.: On the resemblance and containment of documents. In: Proceedings of Compression and Complexity of Sequences 1997, pp. 21–29. IEEE (1997)
  14. Budanitsky, A., Hirst, G.: Evaluating WordNet-based measures of lexical semantic relatedness. Comput. Linguist. 32(1), 13–47 (2006). https://doi.org/10.1162/coli.2006.32.1.13
  15. Clark, S., Pulman, S.: Combining symbolic and distributional models of meaning. In: AAAI Spring Symposium: Quantum Interaction, pp. 52–55 (2007)
  16. Dolan, B., Brockett, C., Quirk, C.: Microsoft Research Paraphrase Corpus (2005). Accessed 29 Mar 2008
  17. Fellbaum, C.: WordNet. Wiley Online Library, New York (1998)
  18. Gabrilovich, E., Markovitch, S.: Computing semantic relatedness using Wikipedia-based explicit semantic analysis. In: IJCAI, vol. 7, pp. 1606–1611 (2007)
  19. Galitsky, B.: Machine learning of syntactic parse trees for search and classification of text. Eng. Appl. Artif. Intell. 26(3), 1072–1091 (2013)
  20. Glickman, O., Dagan, I.: Acquiring lexical paraphrases from a single corpus. In: Recent Advances in Natural Language Processing III, pp. 81–90. John Benjamins Publishing, Amsterdam (2004)
  21. Guevara, E.: A regression model of adjective-noun compositionality in distributional semantics. In: Proceedings of the 2010 Workshop on GEometrical Models of Natural Language Semantics, GEMS 2010, pp. 33–37. Association for Computational Linguistics, Stroudsburg (2010). http://dl.acm.org/citation.cfm?id=1870516.1870521
  22. Guo, W., Diab, M.: Modeling sentences in the latent space. In: Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics: Long Papers-Volume 1, pp. 864–872. Association for Computational Linguistics (2012)
  23. Gusfield, D.: Algorithms on Strings, Trees and Sequences: Computer Science and Computational Biology. Cambridge University Press, Cambridge (1997)
  24. Han, L., Kashyap, A., Finin, T., Mayfield, J., Weese, J.: UMBC ebiquity-core: semantic textual similarity systems. In: *SEM 2013: The Second Joint Conference on Lexical and Computational Semantics. Association for Computational Linguistics (2013)
  25. Han, L., Martineau, J., Cheng, D., Thomas, C.: Samsung: align-and-differentiate approach to semantic textual similarity. In: SemEval-2015, p. 172 (2015)
  26. Hänig, C., Remus, R., De La Puente, X.: ExB themis: extensive feature extraction from word alignments for semantic textual similarity. In: SemEval-2015, p. 264 (2015)
  27. Hanks, P., Pustejovsky, J.: A pattern dictionary for natural language processing. Revue française de linguistique appliquée 10(2), 63–82 (2005)
  28. Harris, Z.S.: Mathematical Structures of Language. Interscience Publishers, Geneva (1968)
  29. Hirst, G., St-Onge, D.: Lexical chains as representations of context for the detection and correction of malapropisms. In: WordNet: An Electronic Lexical Database, vol. 305, pp. 305–332 (1998)
  30. Jezek, E., Hanks, P.: What lexical sets tell us about conceptual categories. Lexis 4(7), 22 (2010)
  31. Jiang, J.J., Conrath, D.W.: Semantic similarity based on corpus statistics and lexical taxonomy. arXiv preprint cmp-lg/9709008 (1997)
  32. Kawahara, D., Peterson, D.W., Popescu, O., Palmer, M.: Inducing example-based semantic frames from a massive amount of verb uses. In: Proceedings of the 14th Conference of the European Chapter of the Association for Computational Linguistics (2014)
  33. Klein, D., Manning, C.D.: Accurate unlexicalized parsing. In: Proceedings of the 41st Annual Meeting on Association for Computational Linguistics-Volume 1, pp. 423–430. Association for Computational Linguistics (2003)
  34. Landauer, T.K., Foltz, P.W., Laham, D.: An introduction to latent semantic analysis. Discourse Process. 25(2–3), 259–284 (1998)
  35. Leacock, C., Miller, G.A., Chodorow, M.: Using corpus statistics and WordNet relations for sense identification. Comput. Linguist. 24(1), 147–165 (1998)
  36. Lin, D.: An information-theoretic definition of similarity. In: ICML, vol. 98, pp. 296–304 (1998)
  37. Lyon, C., Malcolm, J., Dickerson, B.: Detecting short passages of similar text in large document collections. In: Proceedings of the 2001 Conference on Empirical Methods in Natural Language Processing, pp. 118–125 (2001)
  38. Marelli, M., Menini, S., Baroni, M., Bentivogli, L., Bernardi, R., Zamparelli, R.: A SICK cure for the evaluation of compositional distributional semantic models. In: Proceedings of LREC 2014, Reykjavik (Iceland): ELRA (2014)
  39. Marsi, E., Moen, H., Bungum, L., Sizov, G., Gambäck, B., Lynum, A.: NTNU-CORE: combining strong features for semantic similarity. In: *SEM 2013: The Second Joint Conference on Lexical and Computational Semantics. Association for Computational Linguistics (2013)
  40. Meadow, C.T.: Text Information Retrieval Systems. Academic Press, Inc., Cambridge (1992)
  41. Mihalcea, R.: SemCor semantically tagged corpus. Unpublished manuscript (1998)
  42. Mihalcea, R., Corley, C., Strapparava, C.: Corpus-based and knowledge-based measures of text semantic similarity. In: AAAI, vol. 6, pp. 775–780 (2006)
  43. Mikolov, T., Chen, K., Corrado, G., Dean, J.: Efficient estimation of word representations in vector space. arXiv preprint arXiv:1301.3781 (2013)
  44. Mitchell, J., Lapata, M.: Composition in distributional models of semantics. Cogn. Sci. 34(8), 1388–1429 (2010)
  45. Moschitti, A.: Efficient convolution kernels for dependency and constituent syntactic trees. In: Fürnkranz, J., Scheffer, T., Spiliopoulou, M. (eds.) ECML 2006. LNCS (LNAI), vol. 4212, pp. 318–329. Springer, Heidelberg (2006). https://doi.org/10.1007/11871842_32
  46. Niles, I., Pease, A.: Towards a standard upper ontology. In: Proceedings of the International Conference on Formal Ontology in Information Systems-Volume 2001, pp. 2–9. ACM (2001)
  47. Pedregosa, F., et al.: Scikit-learn: machine learning in Python. J. Mach. Learn. Res. 12, 2825–2830 (2011)
  48. Pilehvar, M.T., Jurgens, D., Navigli, R.: Align, disambiguate and walk: a unified approach for measuring semantic similarity. In: ACL, vol. 1, pp. 1341–1351 (2013)
  49. Plotkin, G.D.: A note on inductive generalization. Mach. Intell. 5(1), 153–163 (1970)
  50. Popescu, O.: Learning corpus pattern with finite state automata. In: Proceedings of the ICSC 2013 (2013)
  51. Popescu, O., Cabrio, E., Magnini, B.: Textual entailment using chain clarifying relationships. In: Proceedings of the IJCAI Workshop Learning by Reasoning and its Applications in Intelligent Question-Answering (2011)
  52. Popescu, O., Cabrio, E., Magnini, B.: Textual entailment using chain clarifying relationships. In: Proceedings of FAM-LbR/KRAQ’11, IJCAI-11 (2011)
  53. Popescu, O., Magnini, B.: Sense discriminative patterns for word sense disambiguation. In: SCAR Workshop, NODALIDA (2007)
  54. Popescu, O., Palmer, M., Hanks, P.: Mapping CPA onto OntoNotes. In: Proceedings of the 9th International Conference on Language Resources and Evaluation - LREC14 (to appear)
  55. Resnik, P.: Using information content to evaluate semantic similarity in a taxonomy. arXiv preprint cmp-lg/9511007 (1995)
  56. Sahlgren, M.: The word-space model: using distributional analysis to represent syntagmatic and paradigmatic relations between words in high-dimensional vector spaces (2006)
  57. Salton, G., McGill, M.J.: Introduction to modern information retrieval (1983)
  58. Šarić, F., Glavaš, G., Karan, M., Šnajder, J., Bašić, B.D.: TakeLab: systems for measuring semantic text similarity. In: Proceedings of the First Joint Conference on Lexical and Computational Semantics-Volume 1: Proceedings of the Main Conference and the Shared Task, and Volume 2: Proceedings of the Sixth International Workshop on Semantic Evaluation, pp. 441–448. Association for Computational Linguistics (2012)
  59. Schmid, H.: Probabilistic part-of-speech tagging using decision trees. In: Proceedings of International Conference on New Methods in Language Processing, Manchester, UK, vol. 12, pp. 44–49 (1994)
  60. Shareghi, E., Bergler, S.: CLaC-CORE: exhaustive feature combination for measuring textual similarity. In: *SEM 2013: The Second Joint Conference on Lexical and Computational Semantics. Association for Computational Linguistics (2013)
  61. Snover, M., Dorr, B., Schwartz, R., Micciulla, L., Makhoul, J.: A study of translation edit rate with targeted human annotation. In: Proceedings of Association for Machine Translation in the Americas, pp. 223–231 (2006)
  62. Sultan, M.A., Bethard, S., Sumner, T.: Back to basics for monolingual alignment: exploiting word similarity and contextual evidence. Trans. Assoc. Comput. Linguist. 2, 219–230 (2014)
  63. Sultan, M.A., Bethard, S., Sumner, T.: DLS@CU: sentence similarity from word alignment. In: Proceedings of the 8th International Workshop on Semantic Evaluation (SemEval 2014), p. 241 (2014)
  64. Sultan, M.A., Bethard, S., Sumner, T.: DLS@CU: sentence similarity from word alignment and semantic vector composition. In: Proceedings of the 9th International Workshop on Semantic Evaluation (SemEval 2015), pp. 148–153 (2015)
  65. Surdeanu, M., Ciaramita, M., Zaragoza, H.: Learning to rank answers to non-factoid questions from web collections. Comput. Linguist. 37(2), 351–383 (2011)
  66. Turney, P.D., Pantel, P.: From frequency to meaning: vector space models of semantics. J. Artif. Intell. Res. 37(1), 141–188 (2010)
  67. Vo, N.P.A., Caselli, T., Popescu, O.: FBK-TR: applying SVM with multiple linguistic features for cross-level semantic similarity. In: SemEval 2014, p. 284 (2014)
  68. Vo, N.P.A., Popescu, O.: Corpora for learning the mutual relationship between semantic relatedness and textual entailment. In: The 10th International Conference on Language Resources and Evaluation (LREC) (2016)
  69. Vo, N.P.A., Popescu, O.: A multi-layer system for semantic textual similarity. In: The 9th International Joint Conference on Knowledge Discovery and Information Retrieval (KDIR) (2016)
  70. Vo, N.P.A., Popescu, O., Caselli, T.: FBK-TR: SVM for semantic relatedness and corpus patterns for RTE. In: SemEval 2014, p. 289 (2014)
  71. Weischedel, R., et al.: OntoNotes: a large training corpus for enhanced processing. In: Handbook of Natural Language Processing and Machine Translation. Springer, Heidelberg (2011)
  72. Wise, M.J.: String similarity via greedy string tiling and running Karp-Rabin matching. Online Preprint, Dec 119 (1993)
  73. Wise, M.J.: YAP3: improved detection of similarities in computer program and other texts. In: ACM SIGCSE Bulletin, vol. 28, pp. 130–134. ACM (1996)
  74. Wu, Z., Palmer, M.: Verb semantics and lexical selection. In: Proceedings of the 32nd Annual Meeting on Association for Computational Linguistics, pp. 133–138. Association for Computational Linguistics (1994)
  75. Zanzotto, F.M., Dell’Arciprete, L.: Distributed tree kernels. arXiv preprint arXiv:1206.4607 (2012)

Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  1. IBM T. J. Watson Research Center, Yorktown Heights, USA
