Semantic textual similarity between sentences using bilingual word semantics

Shajalal, Md.; Aono, Masaki

doi:10.1007/s13748-019-00180-4

Regular Paper
Published: 09 March 2019

Volume 8, pages 263–272 (2019)

Progress in Artificial Intelligence

811 Accesses
23 Citations

Abstract

Semantic textual similarity between sentences is indispensable for many information retrieval tasks. Traditional lexical similarity measures cannot capture similarity beyond a superficial level: they capture only textual (surface) similarity, not semantic similarity. In this paper, we propose a method for semantic textual similarity that leverages bilingual word-level semantics to compute the semantic similarity between sentences. To capture word-level semantics, we employ distributed representations of words in two different languages. A similarity function based on the relationships between the concepts that the words denote is also used for the same purpose. Multiple new semantic similarity measures are introduced based on word-embedding models trained on two different corpora in two different languages. In addition, another new semantic similarity measure is introduced based on word-sense comparison. The final similarity score between two sentences is then computed by combining all proposed measures in a linear ranking approach, with the importance score of each measure estimated by a supervised feature selection technique. We conducted experiments on the SemEval Semantic Textual Similarity (STS-2017) test collections. The experimental results demonstrate that our method is effective for measuring semantic textual similarity and outperforms several known related methods.
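The abstract does not give the individual formulas, so the following is only an illustrative sketch under stated assumptions, not the authors' implementation. It scores a sentence pair by aligning each word with its most similar word in the other sentence (cosine similarity of word embeddings), averages the two alignment directions, and then fuses several measures with a weighted linear sum. The greedy max-alignment scheme, the toy two-dimensional embeddings, the helper names (`directional_sim`, `embedding_sentence_sim`, `combine`), the measure names, and the weights are all hypothetical. In a bilingual setup one would presumably compute one such score per language-specific embedding model and feed each into the combination as a separate measure.

```python
import math
from typing import Dict, List, Sequence

Vector = Sequence[float]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity of two vectors; 0.0 if either has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def directional_sim(src: List[str], tgt: List[str], emb: Dict[str, Vector]) -> float:
    """Mean over in-vocabulary src words of their best cosine match in tgt.

    Hypothetical helper: out-of-vocabulary words are simply skipped.
    """
    tgt_vecs = [emb[w] for w in tgt if w in emb]
    if not tgt_vecs:
        return 0.0
    best = [max(cosine(emb[w], t) for t in tgt_vecs) for w in src if w in emb]
    return sum(best) / len(best) if best else 0.0


def embedding_sentence_sim(s1: List[str], s2: List[str], emb: Dict[str, Vector]) -> float:
    """Symmetric score: average of the s1->s2 and s2->s1 alignments."""
    return 0.5 * (directional_sim(s1, s2, emb) + directional_sim(s2, s1, emb))


def combine(scores: Dict[str, float], weights: Dict[str, float]) -> float:
    """Weighted linear combination of per-measure scores.

    Hypothetical: the paper estimates each measure's importance with a
    supervised feature selection technique; here the weights are given.
    """
    return sum(weights[name] * value for name, value in scores.items())


if __name__ == "__main__":
    # Toy 2-d "embeddings" (hypothetical, for illustration only).
    emb = {
        "cat": [1.0, 0.0],
        "kitten": [0.9, 0.1],
        "sat": [0.0, 1.0],
        "sits": [0.1, 0.9],
    }
    s1, s2 = ["the", "cat", "sat"], ["a", "kitten", "sits"]
    emb_score = embedding_sentence_sim(s1, s2, emb)
    # A second, placeholder measure (e.g. from another language's model).
    scores = {"emb_lang_a": emb_score, "emb_lang_b": 0.8}
    weights = {"emb_lang_a": 0.6, "emb_lang_b": 0.4}
    print(round(emb_score, 3), round(combine(scores, weights), 3))
```

The max-alignment average is one common way to lift word-level similarity to sentence level; a weighted sum keeps the final model linear, which matches the abstract's description of a linear ranking over all measures.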

Similar content being viewed by others

SimiT: A Text Similarity Method Using Lexicon and Dependency Representations

Article 17 June 2020

Hybrid Conceptual and Statistical Measure for Semantic Textual Similarity Evaluation

Cross-Lingual Semantic Textual Similarity Modeling Using Neural Networks

Notes

SemEval: https://en.wikipedia.org/wiki/SemEval.
Indri’s Stopwords: http://www.lemurproject.org/stopwords/stoplist.dft.
STS2017: http://alt.qcri.org/semeval2017/task1/.
Pearson correlation coefficient: https://en.wikipedia.org/wiki/Pearson_correlation_coefficient.
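The last note points to the Pearson correlation coefficient, the measure the SemEval STS tasks use to compare system scores with human gold-standard scores. For reference, here is a minimal dependency-free sketch of the standard formula r = cov(x, y) / (sd_x * sd_y); the function name and error handling are choices of this sketch, and the example scores are invented.

```python
import math
from typing import Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation between two equal-length score lists."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # The 1/n normalisation factors cancel, so raw sums are sufficient.
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0.0 or sd_y == 0.0:
        raise ValueError("correlation undefined for constant input")
    return cov / (sd_x * sd_y)


if __name__ == "__main__":
    gold = [0.0, 1.5, 3.0, 4.2, 5.0]     # invented gold scores (0-5 scale)
    system = [0.4, 1.2, 2.9, 3.8, 4.9]   # invented system predictions
    print(round(pearson(gold, system), 4))
```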

Author information

Authors and Affiliations

Department of Computer Science and Mathematics, Bangladesh Agricultural University, 2202, Mymensingh, Bangladesh
Md. Shajalal
Department of Computer Science and Engineering, Toyohashi University of Technology, Toyohashi, Aichi, Japan
Masaki Aono

Corresponding author

Correspondence to Md. Shajalal.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Shajalal, M., Aono, M. Semantic textual similarity between sentences using bilingual word semantics. Prog Artif Intell 8, 263–272 (2019). https://doi.org/10.1007/s13748-019-00180-4

Received: 04 December 2018
Accepted: 02 March 2019
Published: 09 March 2019
Issue Date: 01 June 2019
DOI: https://doi.org/10.1007/s13748-019-00180-4
