Ameliorating Search Results Recommendation System Based on K-Means Clustering Algorithm and Distance Measurements

  • Marwa Massaâbi
  • Olfa Layouni
  • Jalel Akaichi
Part of the Lecture Notes in Social Networks book series (LNSN)


Due to technological progress and the continuous uploading of content to the Web, an enormous number of documents has accumulated. This accumulation has become an issue, since it makes the data big and difficult to mine. Therefore, this work focuses on extracting useful data, in terms of both quality and time, by improving search results. In this paper, we propose a framework that first eliminates duplicates and then uses a clustering algorithm combined with a distance measure to filter and classify the results. The goal is to reduce the number of documents efficiently while improving both document quality and search time. The proposed architecture is based on the k-means clustering algorithm and the cosine similarity measure. The system showed encouraging results.
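The abstract describes the pipeline only at a high level. The following is a minimal Python sketch of one way such a pipeline could work, not the authors' implementation. It rests on several assumptions that do not come from the paper. Duplicates are detected by exact match after lowercasing and tokenization. Documents are represented as raw term-frequency vectors. k-means assigns each document to the centroid with the highest cosine similarity and is initialized from the first k documents. "Filtering" means keeping the cluster whose centroid is most similar to the query and ranking its members by cosine similarity. All function names (`remove_duplicates`, `kmeans`, `filter_results`, etc.) are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase and split on non-alphanumeric characters (simplifying assumption)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def remove_duplicates(docs):
    """Step 1 (assumed): drop documents whose normalized token sequence was already seen."""
    seen, unique = set(), []
    for doc in docs:
        key = " ".join(tokenize(doc))
        if key not in seen:
            seen.add(key)
            unique.append(doc)
    return unique


def vectorize(text):
    """Raw term-frequency vector as a sparse dict (TF-IDF would be a drop-in alternative)."""
    return Counter(tokenize(text))


def cosine(u, v):
    """Cosine similarity between two sparse vectors; 0.0 if either vector is empty."""
    dot = sum(w * v.get(t, 0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def centroid(vectors):
    """Component-wise mean of a non-empty list of sparse vectors."""
    total = Counter()
    for vec in vectors:
        total.update(vec)
    return {t: w / len(vectors) for t, w in total.items()}


def kmeans(vectors, k, max_iter=10):
    """Step 2 (assumed): k-means where each vector joins its most cosine-similar centroid."""
    centroids = [dict(v) for v in vectors[:k]]  # deterministic init: first k vectors
    labels = None
    for _ in range(max_iter):
        new_labels = [max(range(k), key=lambda c: cosine(v, centroids[c])) for v in vectors]
        if new_labels == labels:
            break  # assignments are stable, so the algorithm has converged
        labels = new_labels
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:  # keep the previous centroid if a cluster becomes empty
                centroids[c] = centroid(members)
    return labels, centroids


def filter_results(query, docs, k=2):
    """Step 3 (assumed): keep the cluster closest to the query, ranked by similarity."""
    unique = remove_duplicates(docs)
    if not unique:
        return []
    vectors = [vectorize(d) for d in unique]
    labels, centroids = kmeans(vectors, min(k, len(vectors)))
    q = vectorize(query)
    best = max(range(len(centroids)), key=lambda c: cosine(q, centroids[c]))
    kept = [(d, v) for d, v, lab in zip(unique, vectors, labels) if lab == best]
    kept.sort(key=lambda dv: cosine(q, dv[1]), reverse=True)
    return [d for d, _ in kept]


docs = [
    "k-means clustering of text documents",
    "football league match results",
    "K-means clustering of text documents!",  # duplicate after normalization
    "clustering text documents with cosine similarity",
    "league tables after the football match",
]
print(filter_results("clustering documents with k-means", docs))
# prints ['k-means clustering of text documents', 'clustering text documents with cosine similarity']
```

Using cosine similarity in the assignment step makes this sketch closer to spherical k-means than to the Euclidean variant. A fuller system would likely add TF-IDF weighting, stemming, and a smarter initialization such as k-means++.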


Copyright information

© Springer International Publishing AG, part of Springer Nature 2018

Authors and Affiliations

  • Marwa Massaâbi (1)
  • Olfa Layouni (1)
  • Jalel Akaichi (2)
  1. BESTMOD Laboratory, University of Tunis, Institut Supérieur de Gestion de Tunis, Tunis, Tunisia
  2. College of Computer Science, King Khalid University, Abha, Saudi Arabia