Abstract
Clustering is by far the most commonly used unsupervised data mining technique for discovering interesting knowledge and patterns. It aims to group a set of data objects into clusters that are internally coherent but clearly distinct from one another. In this work, we apply clustering algorithms to Information Retrieval (IR) to strengthen the user's original query with appropriate additional terms and return more relevant information. The overall procedure consists of the following steps: (i) use the k-medoids clustering algorithm to group terms into clusters with similar characteristics, (ii) use the k-means algorithm to calculate the centroid of the query terms, (iii) select the clusters relevant to the original query and return the expansion term candidates, (iv) evaluate the expansion term candidates against the centroid and add the best ones to the original query, (v) run a search with the expanded query. We present numerical experiments based on real data from a large online health database. The results of our numerical testing demonstrate the effectiveness of the proposed method compared with prior state-of-the-art approaches.
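Steps (i) to (iv) of the procedure can be illustrated with a minimal sketch. It is not the authors' implementation: the two-dimensional term vectors, the function names, the deterministic medoid initialisation, and the choice of Euclidean distance are all assumptions made for illustration, and the abstract does not specify how terms are represented. The query centroid is computed as a plain arithmetic mean, which is the k-means centroid update for a single cluster. Step (v), running the search itself, is left out.

```python
import math

def dist(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def k_medoids(vectors, k, max_iter=20):
    """Simple alternating k-medoids; returns {medoid_term: [member_terms]}.

    Assumes all term vectors are distinct, so no cluster ends up empty.
    """
    terms = sorted(vectors)
    medoids = terms[:k]  # deterministic init (an assumption, for reproducibility)
    clusters = {}
    for _ in range(max_iter):
        # Assignment step: attach every term to its nearest medoid.
        clusters = {m: [] for m in medoids}
        for t in terms:
            nearest = min(medoids, key=lambda m: dist(vectors[t], vectors[m]))
            clusters[nearest].append(t)
        # Update step: the new medoid minimises total distance within its cluster.
        new_medoids = [
            min(members, key=lambda c: sum(dist(vectors[c], vectors[o]) for o in members))
            for members in clusters.values()
        ]
        if set(new_medoids) == set(medoids):
            break
        medoids = new_medoids
    return clusters

def centroid(points):
    """Arithmetic mean of a list of vectors (k-means centroid of one cluster)."""
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(len(points[0])))

def expand_query(query, vectors, k=2, top_n=1):
    """Return the query extended with the top_n candidates closest to its centroid."""
    clusters = k_medoids(vectors, k)                       # step (i)
    center = centroid([vectors[t] for t in query])         # step (ii)
    # Step (iii): keep the cluster whose medoid lies closest to the centroid.
    best = min(clusters, key=lambda m: dist(vectors[m], center))
    candidates = [t for t in clusters[best] if t not in query]
    # Step (iv): rank candidates by distance to the centroid, keep the best ones.
    candidates.sort(key=lambda t: dist(vectors[t], center))
    return list(query) + candidates[:top_n]

# Hypothetical toy vectors; real term representations would come from the corpus.
toy_vectors = {
    "diabetes": (1.0, 0.1), "insulin": (0.9, 0.2), "glucose": (0.8, 0.0),
    "metformin": (0.7, 0.3), "fracture": (0.1, 1.0), "bone": (0.0, 0.9),
    "cast": (0.2, 0.8),
}
expanded = expand_query(["diabetes", "glucose"], toy_vectors)
print(expanded)
```

In this toy setting the expanded query would then be passed to whatever retrieval function the system uses; selecting a single nearest cluster and a single expansion term are simplifications, and both `k` and `top_n` would need tuning on real data.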
Copyright information
© 2019 Springer Nature Switzerland AG
About this paper
Cite this paper
Khennak, I., Drias, H., Kechid, A., Moulai, H. (2019). Clustering Algorithms for Query Expansion Based Information Retrieval. In: Nguyen, N., Chbeir, R., Exposito, E., Aniorté, P., Trawiński, B. (eds) Computational Collective Intelligence. ICCCI 2019. Lecture Notes in Computer Science, vol 11684. Springer, Cham. https://doi.org/10.1007/978-3-030-28374-2_23
DOI: https://doi.org/10.1007/978-3-030-28374-2_23
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-28373-5
Online ISBN: 978-3-030-28374-2
eBook Packages: Computer Science, Computer Science (R0)