Semantic Analysis Using Pairwise Sentence Comparison with Word Embeddings

  • Conference paper

Part of the book series: Advances in Intelligent Systems and Computing (AISC, volume 683)

Abstract

Comparing the semantics of a pair of sentences is an interesting yet unstructured problem. Semantic analysis remains elusive because the semantics of natural language constructs cannot be measured directly, let alone compared with one another. Methods such as Latent Semantic Analysis (LSA) and Latent Dirichlet Allocation (LDA) capture broad semantics across documents, but their contribution to pairwise comparison tasks, which require deeper semantics, may be limited. In this paper we present a local-alignment-based scoring scheme for sentence pairs using word embeddings, and show how it can be used as a feature for popular text analysis tasks such as summarization, paraphrase comparison, topic profiling, and other semantic comparison tasks. We also present a theoretical analysis of the metrics used in this approach and a separability argument using t-SNE plots. Furthermore, we detail our Spark implementation model for pairwise comparison and summarization.
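
The abstract does not spell out the scoring scheme, so the following is only a minimal sketch of what a local-alignment score over word embeddings could look like. It assumes a Smith-Waterman-style recurrence in which the substitution score is the cosine similarity of two word vectors minus a threshold. The function names, the `gap` and `threshold` values, and the toy two-dimensional embeddings are all hypothetical and are not taken from the paper.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def local_alignment_score(sent_a, sent_b, embeddings, gap=0.5, threshold=0.5):
    """Smith-Waterman-style local alignment score between two tokenized sentences.

    Assumption (not from the paper): a word pair scores `cosine - threshold`,
    so similar words contribute positively and dissimilar words negatively.
    Words missing from `embeddings` are treated as having similarity 0.
    """
    rows, cols = len(sent_a) + 1, len(sent_b) + 1
    table = [[0.0] * cols for _ in range(rows)]
    best = 0.0
    for i in range(1, rows):
        for j in range(1, cols):
            u = embeddings.get(sent_a[i - 1])
            v = embeddings.get(sent_b[j - 1])
            sim = cosine(u, v) if u is not None and v is not None else 0.0
            table[i][j] = max(
                0.0,                                      # restart the local alignment
                table[i - 1][j - 1] + (sim - threshold),  # align the two words
                table[i - 1][j] - gap,                    # skip a word in sent_a
                table[i][j - 1] - gap,                    # skip a word in sent_b
            )
            best = max(best, table[i][j])
    return best


# Hypothetical toy embeddings, for illustration only.
TOY_EMBEDDINGS = {
    "cat": [1.0, 0.0],
    "feline": [0.9, 0.1],
    "sat": [0.0, 1.0],
    "rested": [0.1, 0.9],
    "stocks": [-1.0, 0.0],
    "fell": [0.0, -1.0],
}

paraphrase = local_alignment_score(["cat", "sat"], ["feline", "rested"], TOY_EMBEDDINGS)
unrelated = local_alignment_score(["cat", "sat"], ["stocks", "fell"], TOY_EMBEDDINGS)
```

Under these assumptions the paraphrase pair scores higher than the unrelated pair, which is the property that would let such a score serve as a feature for the downstream tasks the abstract lists.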

Author information

Corresponding author

Correspondence to Vijay Krishna Menon.

Copyright information

© 2018 Springer International Publishing AG

About this paper

Cite this paper

Menon, V.K., M., S., K., H., K.P., S. (2018). Semantic Analysis Using Pairwise Sentence Comparison with Word Embeddings. In: Thampi, S., Mitra, S., Mukhopadhyay, J., Li, KC., James, A., Berretti, S. (eds) Intelligent Systems Technologies and Applications. ISTA 2017. Advances in Intelligent Systems and Computing, vol 683. Springer, Cham. https://doi.org/10.1007/978-3-319-68385-0_23

  • DOI: https://doi.org/10.1007/978-3-319-68385-0_23

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-68384-3

  • Online ISBN: 978-3-319-68385-0

  • eBook Packages: Engineering, Engineering (R0)
