Abstract
Semantic similarity plays an important role in understanding the context of text data. In this paper, we compute the semantic similarity between large texts using different neural embeddings and review the utility of these deep neural embeddings for text representation. Most earlier papers have studied the semantic similarity of text using individual word embeddings. We instead evaluate neural embedding techniques on large texts using the Essay Dataset. Our experiments use recent neural embedding methods, namely Google Sentence Encoder, ELMo, and GloVe, along with traditional similarity metrics, including TF-IDF and the Jaccard index. The experimental evaluation shows that Google Sentence Encoder and ELMo embeddings perform best on the semantic similarity task.
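The two traditional baselines named in the abstract, the Jaccard index and TF-IDF, can be illustrated with a minimal sketch. The abstract does not describe the authors' implementation, so everything below is an assumption made for illustration: the function names are hypothetical, tokenization is a plain whitespace split, and the IDF term uses a common smoothed variant. The sample texts are invented and are not from the Essay Dataset.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase and split on whitespace (an illustrative simplification)."""
    return text.lower().split()


def jaccard_index(a, b):
    """Size of the token-set intersection divided by the size of the union."""
    set_a, set_b = set(tokenize(a)), set(tokenize(b))
    if not set_a and not set_b:
        return 1.0  # convention chosen here: two empty texts are identical
    return len(set_a & set_b) / len(set_a | set_b)


def tfidf_vectors(docs):
    """Return one sparse TF-IDF vector (a dict of term -> weight) per document."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term appears.
    doc_freq = Counter(term for toks in tokenized for term in set(toks))
    # Smoothed IDF (an assumption), so terms found in every document keep a weight.
    idf = {t: math.log((1 + n_docs) / (1 + df)) + 1 for t, df in doc_freq.items()}
    vectors = []
    for toks in tokenized:
        counts = Counter(toks)
        vectors.append({t: (c / len(toks)) * idf[t] for t, c in counts.items()})
    return vectors


def cosine_similarity(u, v):
    """Cosine of the angle between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


# Invented sample texts, for illustration only.
essays = [
    "the cat sat on the mat",
    "the cat lay on the rug",
    "stocks fell sharply today",
]
vecs = tfidf_vectors(essays)
print("Jaccard(0, 1):", jaccard_index(essays[0], essays[1]))
print("TF-IDF cosine(0, 1):", cosine_similarity(vecs[0], vecs[1]))
print("TF-IDF cosine(0, 2):", cosine_similarity(vecs[0], vecs[2]))
```

Both measures depend only on surface token overlap, so two texts that express the same idea in different words score low. Embedding-based methods are usually compared with the same cosine measure, applied to dense vectors produced by the encoder rather than to sparse term weights.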
References
Cambria, E., White, B.: Jumping NLP curves: a review of natural language processing research. IEEE Comput. Intell. Mag. 9(2), 48–57 (2014)
Cer, D., Yang, Y., Kong, S.Y., Hua, N., Limtiaco, N., John, R.S., Constant, N., Guajardo-Cespedes, M., Yuan, S., Tar, C., et al.: Universal Sentence Encoder. arXiv preprint arXiv:1803.11175 (2018)
Clark, E., Celikyilmaz, A., Smith, N.A.: Sentence mover's similarity: automatic evaluation for multi-sentence texts. In: Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, pp. 2748–2760 (2019)
Khurana, D., Koli, A., Khatter, K., Singh, S.: Natural Language Processing: State of the Art, Current Trends and Challenges. arXiv preprint arXiv:1708.05148 (2017)
Melamud, O., Goldberger, J., Dagan, I.: context2vec: Learning generic context embedding with bidirectional LSTM. In: Proceedings of The 20th SIGNLL Conference on Computational Natural Language Learning, pp. 51–61. Association for Computational Linguistics, Berlin, Germany (2016). https://doi.org/10.18653/v1/K16-1006
Mikolov, T., Chen, K., Corrado, G., Dean, J.: Efficient Estimation of Word Representations in Vector Space. arXiv preprint arXiv:1301.3781 (2013)
Mikolov, T., Sutskever, I., Chen, K., Corrado, G.S., Dean, J.: Distributed representations of words and phrases and their compositionality. In: Advances in Neural Information Processing Systems, pp. 3111–3119 (2013)
Pawar, A., Mago, V.: Challenging the boundaries of unsupervised learning for semantic similarity. IEEE Access 7, 16291–16308 (2019)
Pennington, J., Socher, R., Manning, C.: GloVe: Global vectors for word representation. In: Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), pp. 1532–1543 (2014)
Peters, M.E., Neumann, M., Iyyer, M., Gardner, M., Clark, C., Lee, K., Zettlemoyer, L.: Deep contextualized word representations. In: Proceedings of NAACL (2018)
Prize, A.S.A.: The Hewlett Foundation: Automated Essay Scoring (2012). https://www.kaggle.com/c/asap-aes/
Tashu, T.M., Horváth, T.: Pair-wise: automatic essay evaluation using word mover’s distance. CSEDU 1, 59–66 (2018)
Wang, B., Wang, A., Chen, F., Wang, Y., Kuo, C.C.J.: Evaluating Word Embedding Models: Methods and Experimental Results. arXiv preprint arXiv:1901.09785 (2019)
Young, T., Hazarika, D., Poria, S., Cambria, E.: Recent trends in deep learning based natural language processing. IEEE Comput. Intell. Mag. 13(3), 55–75 (2018)
Zhu, G., Iglesias, C.A.: Computing semantic similarity of concepts in knowledge graphs. IEEE Trans. Knowl. Data Eng. 29(1), 72–85 (2016)
Copyright information
© 2021 The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Hendre, M., Mukherjee, P., Godse, M. (2021). Utility of Neural Embeddings in Semantic Similarity of Text Data. In: Bhateja, V., Peng, SL., Satapathy, S.C., Zhang, YD. (eds) Evolution in Computational Intelligence. Advances in Intelligent Systems and Computing, vol 1176. Springer, Singapore. https://doi.org/10.1007/978-981-15-5788-0_21
DOI: https://doi.org/10.1007/978-981-15-5788-0_21
Publisher Name: Springer, Singapore
Print ISBN: 978-981-15-5787-3
Online ISBN: 978-981-15-5788-0
eBook Packages: Intelligent Technologies and Robotics (R0)