Abstract
Comparing the semantics of a pair of sentences is an interesting yet unstructured problem. Semantic analysis remains elusive because the semantics of natural-language constructs cannot be measured directly, let alone compared with one another. Methods such as Latent Semantic Analysis (LSA) and Latent Dirichlet Allocation (LDA) capture broad semantics across documents, but their contribution to pairwise comparison tasks, which require deeper semantics, may be limited. In this paper we present a local-alignment-based scoring scheme for sentence pairs using word embeddings, and show how it can be used as a feature for popular text analysis tasks such as summarization, paraphrase comparison, topic profiling and other semantic comparison tasks. We also present a theoretical analysis of the metrics used in this approach and a separability argument using t-SNE plots. Furthermore, we detail our Spark implementation model for pairwise comparison and summarization.
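To make the idea of local-alignment scoring over word embeddings concrete, the following is a minimal sketch and not the paper's actual scheme, which the abstract does not specify. It assumes a Smith-Waterman-style recurrence in which the substitution score is the cosine similarity of two word vectors minus a threshold, with a linear gap penalty. The function names, the `threshold` and `gap` values, and the two-dimensional toy vectors are all hypothetical and chosen only for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def local_alignment_score(sent_a, sent_b, embeddings, threshold=0.5, gap=0.5):
    """Best local alignment score between two tokenized sentences.

    Assumption: word pairs score cosine(u, v) - threshold, so similar words
    contribute positively and dissimilar words negatively; gaps cost `gap`.
    Words missing from `embeddings` are skipped.
    """
    a = [embeddings[w] for w in sent_a if w in embeddings]
    b = [embeddings[w] for w in sent_b if w in embeddings]
    prev = [0.0] * (len(b) + 1)  # previous row of the alignment matrix
    best = 0.0
    for i in range(1, len(a) + 1):
        cur = [0.0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            sub = cosine(a[i - 1], b[j - 1]) - threshold
            # Local alignment: a cell never drops below zero.
            cur[j] = max(0.0, prev[j - 1] + sub, prev[j] - gap, cur[j - 1] - gap)
            best = max(best, cur[j])
        prev = cur
    return best


# Hypothetical toy embeddings, not taken from any trained model.
toy = {
    "cat": [1.0, 0.0], "feline": [0.9, 0.1],
    "sat": [0.0, 1.0], "rested": [0.1, 0.9],
    "stock": [-1.0, 0.0], "fell": [0.0, -1.0],
}

related = local_alignment_score(["cat", "sat"], ["feline", "rested"], toy)
unrelated = local_alignment_score(["cat", "sat"], ["stock", "fell"], toy)
print(related, unrelated)
```

Under these assumptions, a sentence pair built from near-synonyms scores higher than a pair of unrelated sentences, which is the property that would let such a score serve as a feature for paraphrase comparison or summarization.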
Copyright information
© 2018 Springer International Publishing AG
About this paper
Cite this paper
Menon, V.K., M., S., K., H., K.P., S. (2018). Semantic Analysis Using Pairwise Sentence Comparison with Word Embeddings. In: Thampi, S., Mitra, S., Mukhopadhyay, J., Li, KC., James, A., Berretti, S. (eds) Intelligent Systems Technologies and Applications. ISTA 2017. Advances in Intelligent Systems and Computing, vol 683. Springer, Cham. https://doi.org/10.1007/978-3-319-68385-0_23
DOI: https://doi.org/10.1007/978-3-319-68385-0_23
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-68384-3
Online ISBN: 978-3-319-68385-0
eBook Packages: Engineering, Engineering (R0)