Clustering Algorithms for Query Expansion Based Information Retrieval

Khennak, Ilyes; Drias, Habiba; Kechid, Amine; Moulai, Hadjer

doi:10.1007/978-3-030-28374-2_23

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 11684)

Included in the following conference series:

International Conference on Computational Collective Intelligence

Abstract

Clustering is by far the most commonly used unsupervised data mining technique for discovering interesting knowledge and patterns. It aims to group a set of data objects into clusters that are internally coherent but clearly different from each other. In this work, we apply clustering algorithms to Information Retrieval (IR) in order to strengthen the user's original query with appropriate additional terms and return more relevant information. The overall procedure consists of the following steps: (i) use the k-medoids clustering algorithm to group terms into clusters with similar characteristics, (ii) use the k-means algorithm to compute the centroid of the query terms, (iii) select the clusters relevant to the original query and return the expansion term candidates, (iv) evaluate the expansion term candidates against the centroid and add the best ones to the original query, (v) run a search with the expanded query. We present numerical experiments based on real data from a large online health database. The results demonstrate the effectiveness of the proposed method compared with prior state-of-the-art methods.
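Steps (i) to (iv) can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say how terms are represented or compared, so the toy 2-D term vectors, the cosine measure, and the names `k_medoids` and `expand_query` are all assumptions made for illustration. The k-means step is reduced here to a single mean vector (k-means with k = 1), and step (v), the actual search, is left out.

```python
import numpy as np

def cosine_distance_matrix(X):
    # Pairwise cosine distance between row vectors (assumed measure).
    unit = X / np.linalg.norm(X, axis=1, keepdims=True)
    return 1.0 - unit @ unit.T

def k_medoids(D, k, n_iter=50, seed=0):
    # Simple alternating k-medoids on a precomputed distance matrix D.
    rng = np.random.default_rng(seed)
    medoids = rng.choice(len(D), size=k, replace=False)
    for _ in range(n_iter):
        labels = np.argmin(D[:, medoids], axis=1)
        new_medoids = medoids.copy()
        for c in range(k):
            members = np.flatnonzero(labels == c)
            if members.size:
                # New medoid: member with the smallest total distance to its cluster.
                within = D[np.ix_(members, members)].sum(axis=1)
                new_medoids[c] = members[np.argmin(within)]
        if np.array_equal(new_medoids, medoids):
            break
        medoids = new_medoids
    return medoids, np.argmin(D[:, medoids], axis=1)

def expand_query(query, vocab, X, k=2, n_clusters=1, n_terms=2):
    index = {term: i for i, term in enumerate(vocab)}
    # Step (i): cluster all vocabulary terms with k-medoids.
    medoids, labels = k_medoids(cosine_distance_matrix(X), k)
    # Step (ii): centroid of the query terms (mean vector, i.e. k-means with k = 1).
    q_idx = [index[t] for t in query if t in index]
    centroid = X[q_idx].mean(axis=0)

    def similarity(V):
        # Cosine similarity of each row of V to the query centroid.
        return (V @ centroid) / (np.linalg.norm(V, axis=1) * np.linalg.norm(centroid))

    # Step (iii): keep the clusters whose medoid is closest to the centroid.
    chosen = set(np.argsort(-similarity(X[medoids]))[:n_clusters].tolist())
    candidates = np.array(
        [i for i in range(len(vocab)) if labels[i] in chosen and i not in q_idx],
        dtype=int,
    )
    # Step (iv): rank candidates against the centroid and add the best ones.
    ranked = candidates[np.argsort(-similarity(X[candidates]))][:n_terms]
    return list(query) + [vocab[i] for i in ranked]

# Toy vocabulary with made-up 2-D vectors (illustrative data only).
vocab = ["diabetes", "insulin", "glucose", "metformin", "fracture", "bone", "cast"]
X = np.array([
    [0.90, 0.10], [0.80, 0.20], [0.85, 0.15], [0.70, 0.30],
    [0.10, 0.90], [0.20, 0.80], [0.15, 0.85],
])

print(expand_query(["diabetes", "insulin"], vocab, X))
```

In this toy setting the query terms and their expansion candidates fall into the same cluster, so only terms from that cluster are considered for expansion. With real data, the number of clusters and the number of added terms would need to be tuned.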

Author information

Authors and Affiliations

Laboratory for Research in Artificial Intelligence, USTHB, Algiers, Algeria
Ilyes Khennak, Habiba Drias, Amine Kechid & Hadjer Moulai

Corresponding author

Correspondence to Ilyes Khennak.

Editor information

Editors and Affiliations

Wrocław University of Science and Technology, Wrocław, Poland
Ngoc Thanh Nguyen
University of Pau and Pays de l'Adour, Pau, France
Richard Chbeir
University of Pau and Pays de l'Adour, Pau, France
Ernesto Exposito
University of Pau and Pays de l'Adour, Pau, France
Philippe Aniorté
Wrocław University of Science and Technology, Wrocław, Poland
Bogdan Trawiński

About this paper

Cite this paper

Khennak, I., Drias, H., Kechid, A., Moulai, H. (2019). Clustering Algorithms for Query Expansion Based Information Retrieval. In: Nguyen, N., Chbeir, R., Exposito, E., Aniorté, P., Trawiński, B. (eds) Computational Collective Intelligence. ICCCI 2019. Lecture Notes in Computer Science, vol 11684. Springer, Cham. https://doi.org/10.1007/978-3-030-28374-2_23

DOI: https://doi.org/10.1007/978-3-030-28374-2_23
Published: 09 August 2019
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-28373-5
Online ISBN: 978-3-030-28374-2
eBook Packages: Computer Science, Computer Science (R0)
