Selection of Information Sources Using a Genetic Algorithm

  • Conference paper
Recent Advances in Information Systems and Technologies (WorldCIST 2017)

Part of the book series: Advances in Intelligent Systems and Computing (AISC, volume 569)

Abstract

We address the problem of information source selection in the context of a large number of distributed sources. We formulate source selection as a combinatorial optimization problem whose goal is to yield the best set of relevant information sources for a given query. A solution is a combination of sources drawn from a large predefined set. We propose a genetic algorithm that tackles the problem by maximizing the similarity between a selection and the query. Extensive experiments were performed on databases of scientific research documents covering different domains such as computer science and medicine. The results, evaluated with the precision measure, are very encouraging.
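
The abstract does not give the chromosome encoding, the fitness function, or the genetic operators, so the following is only a minimal sketch of the general idea under stated assumptions: each source is summarized by a term-weight vector, a candidate solution is a bit mask over the sources, and fitness is the cosine similarity between the query vector and the sum of the selected source vectors. The names `cosine`, `fitness`, and `select_sources`, the parameter defaults, and the toy data are all hypothetical and are not taken from the paper.

```python
import math
import random


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def fitness(mask, sources, query):
    """Assumed fitness: similarity between the query and the merged selected sources."""
    chosen = [s for bit, s in zip(mask, sources) if bit]
    if not chosen:
        return 0.0  # an empty selection is treated as worthless
    merged = [sum(col) for col in zip(*chosen)]
    return cosine(merged, query)


def select_sources(sources, query, pop_size=20, generations=40,
                   mutation_rate=0.1, seed=0):
    """Generic GA sketch: tournament selection, one-point crossover,
    bit-flip mutation, and elitism. Requires at least two sources."""
    rng = random.Random(seed)
    n = len(sources)

    def score(mask):
        return fitness(mask, sources, query)

    pop = [[rng.randint(0, 1) for _ in range(n)] for _ in range(pop_size)]
    best = max(pop, key=score)
    for _ in range(generations):
        nxt = [best[:]]  # elitism: carry the best mask over unchanged
        while len(nxt) < pop_size:
            p1 = max(rng.sample(pop, 2), key=score)  # binary tournament
            p2 = max(rng.sample(pop, 2), key=score)
            cut = rng.randint(1, n - 1)              # one-point crossover
            child = p1[:cut] + p2[cut:]
            child = [1 - g if rng.random() < mutation_rate else g
                     for g in child]                 # bit-flip mutation
            nxt.append(child)
        pop = nxt
        best = max(pop, key=score)
    return best, score(best)


# Toy data (hypothetical): four sources described over three terms.
sources = [[3, 0, 0], [0, 2, 0], [0, 0, 4], [1, 1, 0]]
query = [1, 1, 0]
best_mask, best_score = select_sources(sources, query)
print(best_mask, round(best_score, 3))
```

In this toy setting the mask that selects only the fourth source matches the query direction exactly, so its fitness is 1. A heuristic search like this one is not guaranteed to reach that optimum on every run, and a realistic setup would also need a way to build the source vectors, for example from sampled documents.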

Notes

  1. Java Genetic Algorithms and Genetic Programming, http://jgap.sourceforge.net/.

  2. https://www.sndl.cerist.dz/.

  3. Indexing and information searching system, http://www.lemurproject.org/indri.

Author information

Corresponding author

Correspondence to Fatma Zohra Lebib.

Copyright information

© 2017 Springer International Publishing AG

About this paper

Cite this paper

Lebib, F.Z., Drias, H., Mellah, H. (2017). Selection of Information Sources Using a Genetic Algorithm. In: Rocha, Á., Correia, A., Adeli, H., Reis, L., Costanzo, S. (eds) Recent Advances in Information Systems and Technologies. WorldCIST 2017. Advances in Intelligent Systems and Computing, vol 569. Springer, Cham. https://doi.org/10.1007/978-3-319-56535-4_6

  • DOI: https://doi.org/10.1007/978-3-319-56535-4_6

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-56534-7

  • Online ISBN: 978-3-319-56535-4

  • eBook Packages: Engineering, Engineering (R0)
