Interpretable Semantic Textual Similarity Using Lexical and Cosine Similarity

  • Conference paper
Social Transformation – Digital Way (CSI 2018)

Abstract

Transforming information digitally changes people's views and their daily functioning. Social media is a key platform where people express their views about events, and it also plays an important role in daily activities. Digital marketing is one example of such digital transformation of information. At present, social channels use users' personal information to launch products or tools. Digital education also plays a key role in transforming information digitally. In such cases, Natural Language Processing of people's views and blog chats plays an important role. Adding an explanatory layer is important for an Intelligent Tutoring System (ITS), where students interact with an application through natural language. This paper proposes a method that measures the interpretability between two sentences by rating the degree of semantic equivalence on a graded scale from 0 (not aligned) to 5 (semantically equivalent). The goal of the paper is not to add an interpretable layer but to develop a method that can explain the similarities and differences between two sentences. The task is motivated by SemEval 2016 Task 2. The proposed method was developed and tested on the headlines dataset. Against the gold standard data, an accuracy of 0.64 for alignment type and score is reported.
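
As a rough illustration of the kind of scoring the abstract describes, the sketch below computes cosine similarity between bag-of-words count vectors of two sentences and maps it onto the 0 to 5 scale. This is not the authors' method: the whitespace tokenization, the function names `cosine_similarity` and `graded_score`, and the linear mapping from similarity to score are all assumptions made for illustration only.

```python
import math
from collections import Counter


def cosine_similarity(tokens_a, tokens_b):
    """Cosine similarity between two bag-of-words count vectors."""
    counts_a, counts_b = Counter(tokens_a), Counter(tokens_b)
    # Only tokens present in both sentences contribute to the dot product.
    shared = counts_a.keys() & counts_b.keys()
    dot = sum(counts_a[t] * counts_b[t] for t in shared)
    norm_a = math.sqrt(sum(c * c for c in counts_a.values()))
    norm_b = math.sqrt(sum(c * c for c in counts_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty sentence has no direction to compare
    return dot / (norm_a * norm_b)


def graded_score(sentence_a, sentence_b):
    """Map cosine similarity in [0, 1] to an integer 0-5 score.

    The linear mapping is a hypothetical choice, not taken from the paper.
    """
    sim = cosine_similarity(sentence_a.lower().split(), sentence_b.lower().split())
    return round(5 * sim)
```

Calling `graded_score` on two headlines returns an integer between 0 and 5. A system of the kind the abstract describes would also need lexical resources and an alignment step to explain which parts of the sentences match, and this sketch does not attempt either.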

Notes

  1. http://alt.qcri.org/semeval2015/task2/

  2. http://alt.qcri.org/semeval2016/task2/

  3. https://wordnet.princeton.edu/wordnet/download/old-versions/

  4. https://projects.csail.mit.edu/jwi/

  5. https://opennlp.apache.org/download.html

  6. http://sentiwordnet.isti.cnr.it/

  7. https://groups.google.com/forum/#!topic/word2vec-toolkit/z0Aw5powUco

Acknowledgement

The work presented here falls under Research Project Grant No. YSS/2015/000988 and is supported by the Department of Science & Technology (DST) and the Science and Engineering Research Board (SERB), Govt. of India. The authors would like to acknowledge the Department of Computer Science & Engineering, National Institute of Technology Mizoram, India, for providing infrastructural facilities and support.

Author information

Corresponding author

Correspondence to Goutam Majumder.

Copyright information

© 2018 Springer Nature Singapore Pte Ltd.

About this paper

Cite this paper

Majumder, G., Pakray, P., Avendaño, D.E.P. (2018). Interpretable Semantic Textual Similarity Using Lexical and Cosine Similarity. In: Mandal, J., Sinha, D. (eds) Social Transformation – Digital Way. CSI 2018. Communications in Computer and Information Science, vol 836. Springer, Singapore. https://doi.org/10.1007/978-981-13-1343-1_59

  • DOI: https://doi.org/10.1007/978-981-13-1343-1_59

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-13-1342-4

  • Online ISBN: 978-981-13-1343-1

  • eBook Packages: Computer Science, Computer Science (R0)
