Semantic Analysis Using Pairwise Sentence Comparison with Word Embeddings

Menon, Vijay Krishna; M., Sabdhi; K., Harikumar; K.P., Soman

doi:10.1007/978-3-319-68385-0_23

Conference paper
First Online: 21 October 2017

Part of the book series: Advances in Intelligent Systems and Computing (AISC, volume 683)

Abstract

Comparing the semantics of a pair of sentences is an interesting yet unstructured problem. Semantic analysis remains elusive because the semantics of natural language constructs cannot be measured directly, let alone compared with one another. Methods such as Latent Semantic Analysis (LSA) and Latent Dirichlet Allocation (LDA) capture broad semantics between documents, but their contribution to pairwise comparison tasks, which require deeper semantics, may be limited. In this paper we present a local-alignment-based scoring scheme for sentence pairs using word embeddings, and show how it can be used as a feature for popular text analysis tasks such as summarization, paraphrase comparison, topic profiling, and other semantic comparison tasks. We also present a theoretical analysis of the metrics used in this approach and a separability argument using t-SNE plots. Furthermore, we detail our Spark implementation model for pairwise comparison and summarization.
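
The abstract does not spell out the scoring scheme, so the following is only an illustrative sketch of what a local-alignment score over word embeddings could look like. It assumes a Smith-Waterman-style recurrence in which the substitution score for two words is their cosine similarity minus a threshold. The function names, the `threshold` and `gap` parameters, and the toy three-dimensional embeddings are all hypothetical and are not taken from the paper.

```python
import math

def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def local_alignment_score(sent_a, sent_b, emb, threshold=0.5, gap=0.5):
    """Best local alignment score between two token lists.

    Assumed scoring (hypothetical, not the paper's): a word pair scores
    similarity - threshold, where similarity is 1.0 for identical tokens,
    the embedding cosine when both tokens have vectors, and 0.0 otherwise.
    Skipping a token costs `gap`.
    """
    def substitution(x, y):
        if x == y:
            sim = 1.0
        elif x in emb and y in emb:
            sim = cosine(emb[x], emb[y])
        else:
            sim = 0.0
        return sim - threshold

    rows, cols = len(sent_a) + 1, len(sent_b) + 1
    # H[i][j] holds the best score of a local alignment ending at a[i-1], b[j-1].
    H = [[0.0] * cols for _ in range(rows)]
    best = 0.0
    for i in range(1, rows):
        for j in range(1, cols):
            H[i][j] = max(
                0.0,  # restart: local alignments never carry a negative prefix
                H[i - 1][j - 1] + substitution(sent_a[i - 1], sent_b[j - 1]),
                H[i - 1][j] - gap,  # skip a token of sent_a
                H[i][j - 1] - gap,  # skip a token of sent_b
            )
            best = max(best, H[i][j])
    return best

# Toy embeddings, invented purely for illustration.
toy_emb = {
    "cat": [0.9, 0.1, 0.0],
    "kitten": [0.8, 0.2, 0.0],
    "sat": [0.0, 0.9, 0.1],
    "rested": [0.1, 0.8, 0.1],
    "stocks": [0.0, 0.0, 1.0],
    "fell": [0.1, 0.0, 0.9],
}

s1 = ["the", "cat", "sat"]
s2 = ["the", "kitten", "rested"]
s3 = ["stocks", "fell"]

paraphrase_score = local_alignment_score(s1, s2, toy_emb)
unrelated_score = local_alignment_score(s1, s3, toy_emb)
```

With these toy vectors the near-paraphrase pair (`s1`, `s2`) scores higher than the unrelated pair (`s1`, `s3`), which is the property that would make such a score usable as a feature for paraphrase comparison or summarization. In practice the embeddings would come from a trained model rather than a hand-written dictionary.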

Author information

Authors and Affiliations

Center for Computational Engineering and Networking, Amrita University, Coimbatore, India
Vijay Krishna Menon, Sabdhi M., Harikumar K. & Soman K.P.

Corresponding author

Correspondence to Vijay Krishna Menon.

Editor information

Editors and Affiliations

School of CS/IT, Indian Institute of Information Technology, Trivandrum, Kerala, India
Sabu M. Thampi
Machine Intelligence Unit, Indian Statistical Institute, Kolkata, India
Sushmita Mitra
Department of Computer Science and Engineering, Indian Institute of Technology, Kharagpur, West Bengal, India
Jayanta Mukhopadhyay
Xiamen University, Xiamen, China
Kuan-Ching Li
Department of Electrical and Electronic, Nazarbayev University, Astana, Kazakhstan
Alex Pappachen James
Dipartimento di Ingegneria, Università degli Studi di Firenze, Firenze, Italy
Stefano Berretti

About this paper

Cite this paper

Menon, V.K., M., S., K., H., K.P., S. (2018). Semantic Analysis Using Pairwise Sentence Comparison with Word Embeddings. In: Thampi, S., Mitra, S., Mukhopadhyay, J., Li, KC., James, A., Berretti, S. (eds) Intelligent Systems Technologies and Applications. ISTA 2017. Advances in Intelligent Systems and Computing, vol 683. Springer, Cham. https://doi.org/10.1007/978-3-319-68385-0_23

DOI: https://doi.org/10.1007/978-3-319-68385-0_23
Published: 21 October 2017
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-68384-3
Online ISBN: 978-3-319-68385-0
eBook Packages: Engineering, Engineering (R0)
