Abstract
We address the problem of selecting information sources when a large number of distributed sources are available. We formulate source selection as a combinatorial optimization problem whose goal is to find the best set of relevant information sources for a given query. A solution is a combination of sources drawn from a very large predefined set. We propose a genetic algorithm that tackles the problem by maximizing the similarity between a selection and the query. Extensive experiments were performed on databases of scientific research documents covering different domains, such as computer science and medicine. The results, evaluated with the precision measure, are very encouraging.
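The formulation above can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the encoding, the similarity function, or the genetic operators, so everything below is an assumption made for illustration. A selection is encoded as a binary mask over the sources, each source and the query are represented by hypothetical term-weight vectors, similarity is taken to be the cosine between the query and the summed vectors of the selected sources, and the operators (tournament selection, one-point crossover, bit-flip mutation, elitism) are common textbook choices. The names `fitness` and `select_sources` are hypothetical.

```python
import math
import random


def cosine(u, v):
    """Cosine similarity between two equal-length weight vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def fitness(mask, sources, query):
    """Assumed fitness: similarity between the query and the summed
    term vectors of the sources whose bit is set in the mask."""
    chosen = [src for bit, src in zip(mask, sources) if bit]
    if not chosen:
        return 0.0
    merged = [sum(column) for column in zip(*chosen)]
    return cosine(merged, query)


def select_sources(sources, query, pop_size=30, generations=50,
                   mutation_rate=0.05, seed=0):
    """Evolve binary masks over the sources (requires at least 2 sources)."""
    rng = random.Random(seed)
    n = len(sources)

    def score(mask):
        return fitness(mask, sources, query)

    population = [[rng.randint(0, 1) for _ in range(n)]
                  for _ in range(pop_size)]
    best = max(population, key=score)
    for _ in range(generations):
        next_gen = [best[:]]  # elitism: keep the best mask unchanged
        while len(next_gen) < pop_size:
            # Tournament selection with two contestants per parent.
            parent1 = max(rng.sample(population, 2), key=score)
            parent2 = max(rng.sample(population, 2), key=score)
            cut = rng.randint(1, n - 1)  # one-point crossover
            child = parent1[:cut] + parent2[cut:]
            # Bit-flip mutation, applied independently to each gene.
            child = [1 - g if rng.random() < mutation_rate else g
                     for g in child]
            next_gen.append(child)
        population = next_gen
        best = max(population, key=score)
    return best, score(best)


if __name__ == "__main__":
    # Toy data (made up): four sources described over three terms.
    toy_sources = [[1, 0, 0], [0, 1, 0], [0, 0, 1], [1, 1, 0]]
    toy_query = [1, 1, 0]
    mask, value = select_sources(toy_sources, toy_query)
    print(mask, round(value, 3))
```

Because the best mask of each generation is copied unchanged into the next one, the best fitness never decreases across generations. In a realistic setting the source vectors would come from source descriptions or samples rather than hand-written lists, and the fitness would have to be replaced by whatever similarity measure the full paper defines.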
Notes
- 1. Java Genetic Algorithms and Genetic Programming, http://jgap.sourceforge.net/.
- 2.
- 3. Indexing and information searching system, http://www.lemurproject.org/indri.
Copyright information
© 2017 Springer International Publishing AG
About this paper
Cite this paper
Lebib, F.Z., Drias, H., Mellah, H. (2017). Selection of Information Sources Using a Genetic Algorithm. In: Rocha, Á., Correia, A., Adeli, H., Reis, L., Costanzo, S. (eds) Recent Advances in Information Systems and Technologies. WorldCIST 2017. Advances in Intelligent Systems and Computing, vol 569. Springer, Cham. https://doi.org/10.1007/978-3-319-56535-4_6
DOI: https://doi.org/10.1007/978-3-319-56535-4_6
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-56534-7
Online ISBN: 978-3-319-56535-4
eBook Packages: Engineering, Engineering (R0)