Clustering Algorithms for Query Expansion Based Information Retrieval

  • Conference paper
Computational Collective Intelligence (ICCCI 2019)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 11684)

Abstract

Clustering is by far the most commonly used unsupervised data mining technique for discovering interesting knowledge and patterns. It aims to group a set of data objects into clusters that are internally coherent but clearly distinct from one another. In this work, we apply clustering algorithms to Information Retrieval (IR) to strengthen the user’s original query with appropriate additional terms and return more relevant information. The overall procedure consists of the following steps: (i) use the k-medoids clustering algorithm to group terms into clusters with similar characteristics, (ii) use the k-means algorithm to calculate the centroid of the query terms, (iii) select the clusters relevant to the original query and return the expansion term candidates, (iv) evaluate the expansion term candidates against the centroid and add the best ones to the original query, (v) run a search with the expanded query. We present numerical experiments based on real data from a large online health database. The results demonstrate the effectiveness of the proposed method compared to the prior state of the art.
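
Steps (i) to (iv) of the procedure can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not say how terms are represented or how distances are measured, so the 2-D term vectors, the Euclidean distance, the naive medoid initialisation, and the names `TERM_VECTORS`, `k_medoids`, `centroid` and `expand_query` are all assumptions made for illustration. The query centroid is taken as the plain mean of the query-term vectors, which corresponds to the k-means centroid update for a single cluster. Step (v), running the search, is left to whatever retrieval engine is in use.

```python
import math

# Hypothetical 2-D term vectors (assumption); real vectors would be
# derived from the document collection.
TERM_VECTORS = {
    "diabetes": [0.90, 0.10],
    "insulin":  [0.80, 0.30],
    "glucose":  [0.85, 0.20],
    "fracture": [0.10, 0.90],
    "bone":     [0.20, 0.80],
    "cast":     [0.30, 0.95],
}


def dist(a, b):
    """Euclidean distance between two vectors (assumed metric)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def k_medoids(vectors, k, max_iter=100):
    """Simple alternating k-medoids; returns {medoid: [member terms]}."""
    terms = sorted(vectors)
    medoids = terms[:k]  # naive deterministic initialisation (assumption)
    clusters = {}
    for _ in range(max_iter):
        # Assignment: attach every term to its nearest medoid.
        clusters = {m: [] for m in medoids}
        for t in terms:
            nearest = min(medoids, key=lambda m: dist(vectors[t], vectors[m]))
            clusters[nearest].append(t)
        # Update: the new medoid is the member with the smallest total
        # distance to the other members of its cluster.
        new_medoids = []
        for m, members in clusters.items():
            if not members:  # keep the old medoid if a cluster is empty
                new_medoids.append(m)
                continue
            new_medoids.append(min(
                members,
                key=lambda c: sum(dist(vectors[c], vectors[o]) for o in members),
            ))
        if sorted(new_medoids) == sorted(medoids):
            break
        medoids = new_medoids
    return clusters


def centroid(points):
    """Component-wise mean of a list of vectors."""
    return [sum(component) / len(points) for component in zip(*points)]


def expand_query(query, vectors, k=2, n_terms=1):
    clusters = k_medoids(vectors, k)                      # step (i)
    center = centroid([vectors[t] for t in query])        # step (ii)
    # Step (iii): keep the cluster whose medoid is closest to the centroid.
    best = min(clusters, key=lambda m: dist(vectors[m], center))
    candidates = [t for t in clusters[best] if t not in query]
    # Step (iv): rank candidates by distance to the centroid, keep the best.
    candidates.sort(key=lambda t: dist(vectors[t], center))
    return list(query) + candidates[:n_terms]


print(expand_query(["diabetes", "insulin"], TERM_VECTORS))
```

On this toy vocabulary the query is expanded with a term from the cluster that contains the query terms, and terms from the other cluster are never considered. Selecting only the single nearest cluster is a simplification; the abstract speaks of selecting "the relevant clusters", which may be more than one.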

Author information

Corresponding author

Correspondence to Ilyes Khennak.

Copyright information

© 2019 Springer Nature Switzerland AG

About this paper

Cite this paper

Khennak, I., Drias, H., Kechid, A., Moulai, H. (2019). Clustering Algorithms for Query Expansion Based Information Retrieval. In: Nguyen, N., Chbeir, R., Exposito, E., Aniorté, P., Trawiński, B. (eds) Computational Collective Intelligence. ICCCI 2019. Lecture Notes in Computer Science, vol 11684. Springer, Cham. https://doi.org/10.1007/978-3-030-28374-2_23

  • DOI: https://doi.org/10.1007/978-3-030-28374-2_23

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-28373-5

  • Online ISBN: 978-3-030-28374-2

  • eBook Packages: Computer Science, Computer Science (R0)
