Abstract
The difficulty of disambiguating the sense of the incomplete and imprecise keywords that are extensively used in search queries has caused search systems to fail to retrieve the desired information. One of the most powerful and promising methods to overcome this shortcoming and improve the performance of search engines is Query Expansion, whereby the user’s original query is augmented with new keywords that best characterize the user’s information needs and produce a more useful query. In this paper, a new Firefly Algorithm-based approach is proposed to enhance the retrieval effectiveness of query expansion while maintaining low computational complexity. In contrast to the existing literature, the proposed approach uses a Firefly Algorithm to find the best expanded query among a set of expanded query candidates. Moreover, this new approach allows the length of the expanded query to be determined empirically. Experimental results on MEDLINE, the on-line medical information database, show that our proposed approach is more effective and efficient than the state-of-the-art.
Additional information
This article is part of the Topical Collection on Systems-Level Quality Improvement
About this article
Cite this article
Khennak, I., Drias, H. A Firefly Algorithm-based Approach for Pseudo-Relevance Feedback: Application to Medical Database. J Med Syst 40, 240 (2016). https://doi.org/10.1007/s10916-016-0603-5
DOI: https://doi.org/10.1007/s10916-016-0603-5