Sentence similarity based on semantic kernels for intelligent text retrieval

Amir, Samir; Tanasescu, Adrian; Zighed, Djamel A.

doi:10.1007/s10844-016-0434-3

Published: 28 November 2016

Volume 48, pages 675–689, (2017)

Journal of Intelligent Information Systems

Samir Amir¹,
Adrian Tanasescu¹ &
Djamel A. Zighed¹

757 Accesses
14 Citations

Abstract

We propose a new approach to computing semantic similarity between sentences. It is based on the semantic kernel, composed of the subject, verb, and object, which we assume summarizes the general meaning of each sentence. Using available linguistic resources such as the Stanford Parser, many features are extracted from the semantic kernels and aggregated by means of weights. The weights are produced by a supervised machine learning technique trained on a data set annotated by human experts as ground truth. Cross-validation shows good performance. With this sentence similarity measure, one can build an intelligent text retrieval engine that is more sensitive to semantic content and better suited to short texts than classical bag-of-words methods. An application is being developed for highlighting parts of speech in scientific articles.
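As a rough illustration of the aggregation step described in the abstract, the sketch below compares two subject-verb-object kernels slot by slot and combines the per-slot scores with weights. It is not the authors' implementation: the names `Kernel`, `word_sim`, `kernel_features`, and `sentence_similarity` are hypothetical, the kernels are written by hand rather than extracted with a parser, the word similarity is a character-overlap placeholder rather than a semantic measure, and the weights are fixed by hand rather than learned from annotated data.

```python
from difflib import SequenceMatcher
from typing import List, NamedTuple, Sequence


class Kernel(NamedTuple):
    """Hypothetical container for a sentence's subject, verb, and object."""
    subject: str
    verb: str
    obj: str


def word_sim(a: str, b: str) -> float:
    """Placeholder similarity in [0, 1] based on character overlap.

    A real system would use a semantic word measure here; this stand-in
    only keeps the example self-contained.
    """
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def kernel_features(k1: Kernel, k2: Kernel) -> List[float]:
    """One feature per kernel slot: subject, verb, object."""
    return [
        word_sim(k1.subject, k2.subject),
        word_sim(k1.verb, k2.verb),
        word_sim(k1.obj, k2.obj),
    ]


def sentence_similarity(k1: Kernel, k2: Kernel, weights: Sequence[float]) -> float:
    """Weighted average of the slot features (weights assumed non-negative)."""
    features = kernel_features(k1, k2)
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must have a positive sum")
    return sum(w * f for w, f in zip(weights, features)) / total


# Hand-written kernels and hand-picked weights, for illustration only.
k_a = Kernel("dog", "chases", "cat")
k_b = Kernel("dog", "chases", "car")
score = sentence_similarity(k_a, k_b, weights=(0.3, 0.4, 0.3))
```

In a supervised setting, the weight vector would instead be fitted so that the aggregated score matches human similarity judgments on a training set.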

Notes

http://lsa.colorado.edu/

References

Breaux, H.J. (1968). A modification of Efroymson’s technique for stepwise regression analysis. Communications of the ACM, 11(8), 556–558.
Budanitsky, A., & Hirst, G. (2006). Evaluating WordNet-based measures of lexical semantic relatedness. Computational Linguistics, 32(1), 13–47.
Che, L.M., Wei, C.J., Cheng, H.T., Hui, C.H., & Chen, C.H. (2012). A sentence similarity metric based on semantic patterns. Advances in Information Sciences and Service Sciences, 4(1), 576–585.
Croft, D., Coupland, S., Shell, J., & Brown, S. (2013). A fast and efficient semantic short text similarity metric.
De Boni, M., & Manandhar, S. (2003). The use of sentence similarity as a semantic relevance metric for question answering. In New directions in question answering, papers from 2003 AAAI spring symposium (pp. 138–144). Stanford: Stanford University.
de Marneffe, M.-C., & Manning, C.D. (2008). The Stanford typed dependencies representation. In Coling 2008: Proceedings of the Workshop on Cross-Framework and Cross-Domain Parser Evaluation, CrossParser ’08 (pp. 1–8). Stroudsburg: Association for Computational Linguistics.
Hardin, J.W., & Hilbe, J. (2001). Generalized linear models and extensions. College Station: Stata Press.
Hatzivassiloglou, V., Klavans, J.L., & Eskin, E. (1999). Detecting text similarity over short passages: Exploring linguistic feature combinations via machine learning. In 1999 Joint SIGDAT conference on empirical methods in natural language processing and very large corpora (pp. 203–212).
Heidinger, V. (1984). Analyzing Syntax and Semantics: Workbook. Gallaudet University Press.
Hirst, G., & St-Onge, D. (1994). WORDNET: A Lexical database for English. In Human language technology, proceedings of a workshop held at Plainsboro, New Jersey, USA, March 8–11.
Hirst, G., & St-Onge, D. (1998). Lexical chains as representations of context for the detection and correction of malapropisms. The MIT Press.
Islam, A., & Inkpen, D. (2008). Semantic text similarity using corpus-based word similarity and string similarity. ACM Transactions on Knowledge Discovery from Data, 2(2), 10:1–10:25.
Jurafsky, D., & Martin, J.H. (2000). Speech and Language Processing: An Introduction to Natural Language Processing, Computational Linguistics, and Speech Recognition, 1st edn. Upper Saddle River: Prentice Hall PTR.
Landauer, T.K., Foltz, P.W., & Laham, D. (1998). An introduction to latent semantic analysis. Discourse Processes, 25, 259–284.
Lehmann, J., Isele, R., Jakob, M., Jentzsch, A., Kontokostas, D., Mendes, P.N., Hellmann, S., Morsey, M., Kleef, P.V., Auer, S., & Bizer, C. (2015). DBpedia - A large-scale, multilingual knowledge base extracted from Wikipedia. Semantic Web, 6(2), 167–195.
Li, Y., McLean, D., Bandar, Z.A., O’Shea, J.D., & Crockett, K. (2006). Sentence similarity based on semantic nets and corpus statistics. IEEE Transactions on Knowledge and Data Engineering, 18(8), 1138–1150.
Oliva, J., Serrano, J.I., Dolores del Castillo, M., & Iglesias, A. (2011). SyMSS: A syntax-based measure for short-text semantic similarity. Data & Knowledge Engineering, 70(4), 390–405.
O’Shea, J., Bandar, Z., Crockett, K.A., & McLean, D. (2008). A comparative study of two short text semantic similarity measures. In Proceedings on Agent and multi-agent systems: Technologies and applications, second KES international symposium, KES-AMSTA 2008, Incheon, Korea, March 26–28, 2008 (pp. 172–181).
O’Shea, J., Bandar, Z., & Crockett, K. (2014). A new benchmark dataset with production methodology for short text semantic similarity algorithms. ACM Transactions on Speech and Language Processing, 10(4), 19:1–19:63.
Rakesh, P., Shivapratap, G., Divya, G., & Soman, K.P. (2009). Evaluation of SVD and NMF methods for latent semantic analysis. International Journal of Recent Trends in Engineering, 1(3).
Salton, G., Wong, A., & Yang, C. (1975). A vector space model for automatic indexing. Communications of the ACM, 18(11), 613–620.
Salton, G., & McGill, M. (1984). Introduction to Modern Information Retrieval. McGraw-Hill Book Company.
Spaeth, A., & Desmarais, M.C. (2013). Combining collaborative filtering and text similarity for expert profile recommendations in social websites. In Proceedings on User modeling, adaptation, and personalization - 21th international conference, UMAP 2013, Rome, Italy, June 10–14, 2013 (pp. 178–189).
Tsatsaronis, G., Varlamis, I., & Vazirgiannis, M. (2010). Text relatedness based on a word thesaurus. Journal of Artificial Intelligence Research, 37(1), 1–40.
Winkler, W.E. (1999). The state of record linkage and current research problems. Technical report, Statistical Research Division, U.S. Census Bureau.

Author information

Authors and Affiliations

Institut des Sciences de l’Homme, 14 Avenue Berthelot, 69007, Lyon, France
Samir Amir, Adrian Tanasescu & Djamel A. Zighed

Corresponding author

Correspondence to Samir Amir.

Appendix

Table 4 The benchmark used for the first experiment (O’Shea et al. 2008)

Table 5 The benchmark used for the second experiment (O’Shea et al. 2014)

About this article

Cite this article

Amir, S., Tanasescu, A. & Zighed, D.A. Sentence similarity based on semantic kernels for intelligent text retrieval. J Intell Inf Syst 48, 675–689 (2017). https://doi.org/10.1007/s10844-016-0434-3

Received: 19 January 2016
Revised: 06 October 2016
Accepted: 10 October 2016
Published: 28 November 2016
Issue Date: June 2017
DOI: https://doi.org/10.1007/s10844-016-0434-3
