Interpretable Semantic Textual Similarity Using Lexical and Cosine Similarity

  • Goutam Majumder (email author)
  • Partha Pakray
  • David Eduardo Pinto Avendaño
Conference paper
Part of the Communications in Computer and Information Science book series (CCIS, volume 836)

Abstract

The digital transformation of information changes how people form their views and carry out their daily activities. Social media is a key platform on which people express their views about events, and it plays an important role in daily life. Digital marketing is one example of this transformation: social channels now use the personal information of their users to launch products and tools. Digital education is another. In both settings, Natural Language Processing of people's views and of blog conversations plays an important role. An explanatory layer is particularly important for an Intelligent Tutoring System (ITS), where students interact with an application through natural language. This paper proposes a method that measures the interpretability between two sentences by rating their degree of semantic equivalence on a graded scale from 0 (not aligned) to 5 (semantically equivalent). The goal of the paper is not to add an interpretable layer but to develop a method that can explain the similarities and differences between two sentences. The task is motivated by SemEval 2016 Task 2. The proposed method was developed and tested on the headlines dataset. Against the gold standard data, an accuracy of 0.64 for alignment type and score is reported.
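To make the 0 to 5 grading concrete, the sketch below scores two text fragments with a plain bag-of-words cosine similarity and maps the result linearly onto the graded scale. It is an illustration only and not the authors' pipeline: the abstract does not give the features, the alignment procedure, or the mapping, so the function names `cosine_similarity` and `graded_score`, the whitespace tokenisation, and the linear rounding are all assumptions.

```python
import math
from collections import Counter


def cosine_similarity(tokens_a, tokens_b):
    """Cosine similarity between two bag-of-words count vectors."""
    counts_a, counts_b = Counter(tokens_a), Counter(tokens_b)
    # Dot product over the tokens the two fragments share.
    dot = sum(counts_a[t] * counts_b[t] for t in counts_a.keys() & counts_b.keys())
    norm_a = math.sqrt(sum(c * c for c in counts_a.values()))
    norm_b = math.sqrt(sum(c * c for c in counts_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty fragment has no similarity to anything
    return dot / (norm_a * norm_b)


def graded_score(text_a, text_b):
    """Map cosine similarity in [0, 1] onto the 0-5 scale.

    Hypothetical linear mapping, assumed for illustration only.
    """
    sim = cosine_similarity(text_a.lower().split(), text_b.lower().split())
    return min(5, round(5 * sim))


if __name__ == "__main__":
    print(graded_score("stocks fall in Asia", "stocks fall in Asia"))
    print(graded_score("stocks fall in Asia", "heavy rain floods city"))
```

A count-based vector only rewards exact token overlap. The keywords suggest the paper also draws on Word2Vec and WordNet, which would capture similarity between different but related words.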

Keywords

Semantic similarity, Word2Vec, WordNet, String similarity

Notes

Acknowledgement

The work presented here falls under Research Project Grant No. YSS/2015/000988 and is supported by the Department of Science & Technology (DST) and the Science and Engineering Research Board (SERB), Govt. of India. The authors would like to acknowledge the Department of Computer Science & Engineering, National Institute of Technology Mizoram, India, for providing infrastructural facilities and support.

Copyright information

© Springer Nature Singapore Pte Ltd. 2018

Authors and Affiliations

  • Goutam Majumder (1), email author
  • Partha Pakray (1)
  • David Eduardo Pinto Avendaño (2)

  1. Department of Computer Science and Engineering, National Institute of Technology Mizoram, Aizawl, India
  2. Facultad de Ciencias de la Computación, BUAP, Puebla, Mexico