Abstract
Short text similarity deals with determining how close two texts are in meaning, either lexically or semantically. Various short text similarity approaches have been proposed, based on lexical matching, semantic background knowledge, or combinations of models. Lexical-based models do not capture the actual meaning behind the words. Semantic approaches, however, rely on background knowledge or a corpus, which cannot be assumed to be available when handling the many new words, the data sparseness, and the noise found in short text. This work focuses on lexical-based similarity models for analysing unstructured short text. Term-based and edit distance models are compared in terms of their applicability to computing the similarity value of short text. The experimental results show that each model has its own key strengths and limitations in computing the similarity value of short text.
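The abstract does not state which term-based or edit distance measures the chapter uses. As a hedged illustration only, the sketch below assumes Jaccard and cosine similarity over whitespace-separated words as the term-based measures, and Levenshtein distance normalised by the longer string's length as the edit distance measure. The function names and the tokenisation are hypothetical choices for this example, not the authors' implementation.

```python
from collections import Counter
import math


def tokenize(text: str) -> list[str]:
    # Assumption: lowercase and split on whitespace only; real short text
    # (tweets, chat messages) would need richer normalisation.
    return text.lower().split()


def jaccard(a: str, b: str) -> float:
    """Term-based: size of the shared word set over the size of the union."""
    sa, sb = set(tokenize(a)), set(tokenize(b))
    if not sa and not sb:
        return 1.0  # two empty texts are treated as identical
    return len(sa & sb) / len(sa | sb)


def cosine(a: str, b: str) -> float:
    """Term-based: cosine of the raw term-frequency vectors."""
    ca, cb = Counter(tokenize(a)), Counter(tokenize(b))
    dot = sum(ca[t] * cb[t] for t in ca)  # missing terms count as 0
    norm = math.sqrt(sum(v * v for v in ca.values())) * math.sqrt(
        sum(v * v for v in cb.values())
    )
    return dot / norm if norm else 0.0


def levenshtein(a: str, b: str) -> int:
    """Edit distance: minimum character insertions, deletions, substitutions."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ch_a in enumerate(a, start=1):
        curr = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def edit_similarity(a: str, b: str) -> float:
    """Normalise edit distance to [0, 1], where 1 means identical strings."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - levenshtein(a, b) / longest
```

For the noisy pair "see you tomorrow" and "see u tomorow", `jaccard` returns 0.2 and `cosine` about 0.33, because only "see" matches as a whole term, while `edit_similarity` returns 0.8125 (three character deletions over 16 characters). This is one concrete instance of the trade-off the abstract describes: term overlap ignores near-miss spellings and abbreviations, whereas character-level edit distance tolerates them but compares surface form only.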
Acknowledgment
This research is sponsored by the Ministry of Higher Education, under the Fundamental Research Grants Scheme vot 59467.
Copyright information
© 2019 Springer Nature Switzerland AG
About this paper
Cite this paper
Che Alhadi, A., Deraman, A., Abdul Jalil, M., Wan Yussof, W.N.J., Mohd Noah, S.A. (2019). Short Text Computing Based on Lexical Similarity Model. In: Damaševičius, R., Vasiljevienė, G. (eds) Information and Software Technologies. ICIST 2019. Communications in Computer and Information Science, vol 1078. Springer, Cham. https://doi.org/10.1007/978-3-030-30275-7_27
DOI: https://doi.org/10.1007/978-3-030-30275-7_27
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-30274-0
Online ISBN: 978-3-030-30275-7
eBook Packages: Computer Science, Computer Science (R0)