An Evolutionary Approach in Information Retrieval

  • T. Amghar
  • B. Levrat
  • F. Saubion
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 3991)


One critical step in information retrieval is skimming the documents that an information retrieval system returns as globally relevant responses to a user’s query. This skimming is generally needed to find the parts of the returned documents that contain the information satisfying the user’s information need. The task can be particularly burdensome when only small parts of the returned documents relate to the queried topic. We therefore propose to replace this manual skimming with an automatic extraction and recomposition process that provides the user with synthetic documents, called here composite documents, made of parts extracted from the set of documents returned in response to a query. The composite documents are built so that they summarize as concisely as possible the various aspects of the information relevant to the query, which are initially scattered among the returned documents. Because of the combinatorial cost of the recomposition process, we use a genetic algorithm whose individuals are texts and which aims to optimize a satisfaction criterion based on similarity. We have implemented several variants of the algorithm, and we propose an analysis of the first experimental results, which seem promising for preliminary work.
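The abstract does not specify the encoding, the genetic operators, or the exact satisfaction criterion, so the following is only a minimal sketch of the general idea under stated assumptions. It assumes an individual is a boolean mask over candidate passages, that satisfaction is the bag-of-words cosine similarity between the selected text and the query, and that a small per-passage penalty (the hypothetical `length_penalty`) rewards concision. The function names, parameters, truncation selection, and one-point crossover are illustrative choices, not the authors' method.

```python
import math
import random
from collections import Counter


def bag(text):
    """Lowercased bag-of-words term counts (assumed text representation)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def fitness(mask, passages, query_vec, length_penalty=0.02):
    """Similarity of the selected passages to the query, minus an assumed
    penalty per selected passage that favours concise composites."""
    chosen = [p for keep, p in zip(mask, passages) if keep]
    if not chosen:
        return 0.0
    return cosine(bag(" ".join(chosen)), query_vec) - length_penalty * len(chosen)


def evolve(passages, query, pop_size=20, generations=40, mutation_rate=0.1, seed=0):
    """Evolve a passage selection; needs at least 2 passages and pop_size >= 4."""
    rng = random.Random(seed)
    query_vec = bag(query)
    n = len(passages)

    def score(mask):
        return fitness(mask, passages, query_vec)

    # Each individual is a boolean mask: True keeps the passage at that index.
    population = [[rng.random() < 0.5 for _ in range(n)] for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=score, reverse=True)
        parents = population[: pop_size // 2]  # truncation selection, keeps the best
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, n)  # one-point crossover
            child = a[:cut] + b[cut:]
            # Bit-flip mutation on each gene with probability mutation_rate.
            child = [(not g) if rng.random() < mutation_rate else g for g in child]
            children.append(child)
        population = parents + children
    best = max(population, key=score)
    return [p for keep, p in zip(best, passages) if keep], score(best)
```

Calling `evolve(passages, query)` with a list of passage strings and a query string returns the selected passages in their original order together with their score. Because the better half of each generation survives unchanged, the best score never decreases from one generation to the next.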


Keywords: Genetic Algorithm, Information Retrieval, Good Individual, Information Retrieval System, Initial Query



Copyright information

© Springer-Verlag Berlin Heidelberg 2006

Authors and Affiliations

  • T. Amghar (1)
  • B. Levrat (1)
  • F. Saubion (1)
  1. LERIA, Université d’Angers, Angers, France
