An Ensemble Similarity Model for Short Text Retrieval

  • Arifah Che Alhadi (Email author)
  • Aziz Deraman
  • Masita@Masila Abdul Jalil
  • Wan Nural Jawahir Wan Yussof
  • Akashah Amin Mohamed
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 10404)


The rapid growth of the World Wide Web has extended information retrieval technology, making it easier to submit queries for information needs. One such platform is online question answering (QA). Members of an online community can post questions and get direct responses to their specific information needs on various platforms. This creates large, unorganized repositories of valuable knowledge resources. Effective QA retrieval is required to make these repositories accessible so that users' information requests can be fulfilled quickly. The repositories may already contain questions and answers similar to a user's newly asked question. This paper explores similarity-based models for ranking search result candidates in a QA system. We use the Damerau-Levenshtein distance and a cosine similarity model to obtain ranking scores between the question posted by a registered user and similar candidate questions in the repository. Experimental results are encouraging: the proposed ensemble models give significantly better similarity values, which improves search ranking results.
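The ranking idea described in the abstract can be sketched roughly as follows. This is only a minimal illustration, not the authors' implementation: the abstract does not say how the two measures are combined, so the weighted average in `ensemble_score`, the `weight` parameter, the normalisation in `dl_similarity`, the whitespace tokenisation, and the use of the optimal string alignment variant of Damerau-Levenshtein are all assumptions made for this example.

```python
import math
from collections import Counter


def damerau_levenshtein(a: str, b: str) -> int:
    """Edit distance with adjacent transpositions (optimal string alignment variant)."""
    rows, cols = len(a) + 1, len(b) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i  # delete all i characters of a
    for j in range(cols):
        d[0][j] = j  # insert all j characters of b
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # substitution or match
            )
            # transposition of two adjacent characters
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[-1][-1]


def dl_similarity(a: str, b: str) -> float:
    """Assumed normalisation: map the distance into [0, 1], where 1 means identical."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - damerau_levenshtein(a, b) / longest


def cosine_similarity(a: str, b: str) -> float:
    """Cosine of the angle between term-frequency vectors of whitespace tokens."""
    va, vb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(va[t] * vb[t] for t in va)
    norm = math.sqrt(sum(c * c for c in va.values())) * math.sqrt(
        sum(c * c for c in vb.values())
    )
    return dot / norm if norm else 0.0


def ensemble_score(query: str, candidate: str, weight: float = 0.5) -> float:
    """Hypothetical ensemble: weighted average of the two similarity values."""
    string_part = dl_similarity(query.lower(), candidate.lower())
    return weight * string_part + (1.0 - weight) * cosine_similarity(query, candidate)


def rank(query: str, candidates: list[str], weight: float = 0.5) -> list[tuple[float, str]]:
    """Return (score, candidate) pairs sorted from most to least similar."""
    scored = [(ensemble_score(query, c, weight), c) for c in candidates]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)
```

For instance, `rank("how to reset my password", ["best pizza in town", "how to rset my password"])` would place the misspelled near-duplicate first: the character-level measure tolerates the typo in "rset", while the token-level cosine rewards the four shared words.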


Keywords: Ensemble similarity model; Damerau-Levenshtein distance; Cosine; Information retrieval



Copyright information

© Springer International Publishing AG 2017

Authors and Affiliations

  • Arifah Che Alhadi (1, Email author)
  • Aziz Deraman (1)
  • Masita@Masila Abdul Jalil (1)
  • Wan Nural Jawahir Wan Yussof (1)
  • Akashah Amin Mohamed (1)

  1. School of Informatics and Applied Mathematics, Universiti Malaysia Terengganu, Kuala Nerus, Malaysia
