Interpretable Semantic Textual Similarity Using Lexical and Cosine Similarity

Majumder, Goutam; Pakray, Partha; Avendaño, David Eduardo Pinto

doi:10.1007/978-981-13-1343-1_59

Part of the book series: Communications in Computer and Information Science (CCIS, volume 836)

Included in the following conference series:

Annual Convention of the Computer Society of India

Abstract

Transforming information digitally changes people's views and their daily functioning. Social media is a key platform where people express their views on events, and it also plays an important role in daily activities. Digital marketing is one example of such digital transformation of information. In the present era, social channels use the personal information of their users to launch products or tools. Digital education also plays a key role in transforming information digitally. In such settings, Natural Language Processing of people's views and blog chats plays an important role. Adding an explanatory layer is important for an Intelligent Tutoring System (ITS), where students interact with an application through natural language. This paper proposes a method that measures the interpretability between two sentences by rating the degree of semantic equivalence on a graded scale from 0 (not aligned) to 5 (semantically equivalent). The goal of the paper is not to add an interpretable layer but to develop a method that can explain the similarities and differences between two sentences. The task is motivated by SemEval 2016 Task 2. The proposed method was developed and tested on the headlines dataset. Against the gold standard data, an accuracy of 0.64 for alignment type and score is reported.
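
As a rough illustration of the kind of scoring the abstract describes, the sketch below computes the cosine similarity between bag-of-words vectors of two text segments and maps it linearly onto the 0 to 5 scale. It is not the authors' implementation: the whitespace tokenisation, the linear mapping, and the names `cosine_similarity` and `graded_score` are assumptions made for this example.

```python
import math
from collections import Counter


def cosine_similarity(tokens_a, tokens_b):
    """Cosine of the angle between two bag-of-words count vectors."""
    counts_a, counts_b = Counter(tokens_a), Counter(tokens_b)
    # Only tokens present in both segments contribute to the dot product.
    shared = counts_a.keys() & counts_b.keys()
    dot = sum(counts_a[t] * counts_b[t] for t in shared)
    norm_a = math.sqrt(sum(c * c for c in counts_a.values()))
    norm_b = math.sqrt(sum(c * c for c in counts_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty segment has no direction to compare
    return dot / (norm_a * norm_b)


def graded_score(segment_a, segment_b):
    """Map cosine similarity in [0, 1] onto 0..5 (assumed linear mapping)."""
    # Assumption: lowercase whitespace tokenisation stands in for real preprocessing.
    tokens_a = segment_a.lower().split()
    tokens_b = segment_b.lower().split()
    return round(5 * cosine_similarity(tokens_a, tokens_b))
```

Under these assumptions, two identical segments receive a score of 5 and two segments with no tokens in common receive 0; partially overlapping segments fall in between.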

Acknowledgement

The work presented here falls under Research Project Grant No. YSS/2015/000988 and is supported by the Department of Science & Technology (DST) and the Science and Engineering Research Board (SERB), Govt. of India. The authors would like to acknowledge the Department of Computer Science & Engineering, National Institute of Technology Mizoram, India, for providing infrastructural facilities and support.

Author information

Authors and Affiliations

Department of Computer Science and Engineering, National Institute of Technology Mizoram, Aizawl, India
Goutam Majumder & Partha Pakray
Facultad de Ciencias de la Computación, BUAP, Puebla, Mexico
David Eduardo Pinto Avendaño

Corresponding author

Correspondence to Goutam Majumder.

Editor information

Editors and Affiliations

Kalyani University, Kalyani, India
Jyotsna Kumar Mandal
University of Calcutta, Kolkata, India
Devadatta Sinha

About this paper

Cite this paper

Majumder, G., Pakray, P., Avendaño, D.E.P. (2018). Interpretable Semantic Textual Similarity Using Lexical and Cosine Similarity. In: Mandal, J., Sinha, D. (eds) Social Transformation – Digital Way. CSI 2018. Communications in Computer and Information Science, vol 836. Springer, Singapore. https://doi.org/10.1007/978-981-13-1343-1_59

DOI: https://doi.org/10.1007/978-981-13-1343-1_59
Published: 24 August 2018
Publisher Name: Springer, Singapore
Print ISBN: 978-981-13-1342-4
Online ISBN: 978-981-13-1343-1
eBook Packages: Computer Science, Computer Science (R0)
