(1+1)-Evolutionary Gradient Strategy to Evolve Global Term Weights in Information Retrieval

  • Conference paper
Advances in Computational Intelligence Systems

Part of the book series: Advances in Intelligent Systems and Computing ((AISC,volume 513))

Abstract

In many Information Retrieval (IR) contexts, term weights play an important role in retrieving the documents relevant to users’ queries. A term weight measures the importance, or information content, of a keyword that occurs in the documents of the IR system. It can be divided into two parts: the Global Term Weight (GTW) and the Local Term Weight (LTW). The GTW is a value assigned to each index term to indicate the topic of the documents; it captures the term's power to discriminate between documents in the same collection. The LTW is a value that measures the contribution of the index term within a document. This paper proposes an approach, based on an evolutionary gradient strategy, that evolves the GTWs of the collection and uses Term Frequency-Average Term Occurrence (TF-ATO) as the LTWs. Evolving only the GTWs reduces the problem size of term-weight evolution, which reduces the computational time and helps achieve improved IR effectiveness compared to other Evolutionary Computation (EC) approaches in the literature. The paper also investigates the limitation that relevance judgments can impose on this approach by conducting two sets of experiments, for partially and fully evolved GTWs. The proposed approach outperformed the Okapi BM25 and TF-ATO with DA weighting schemes in terms of Mean Average Precision (MAP), Average Precision (AP) and Normalized Discounted Cumulative Gain (NDCG).
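The split between fixed local weights and evolved global weights can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes a document's score for a query is the sum of LTW × GTW over the query terms, keeps the LTWs fixed, and evolves only the GTW vector with a plain (1+1) strategy: one parent, one Gaussian-mutated child, and the child replaces the parent when its fitness is no worse. The gradient-based step of the paper is not reproduced here. The names `score`, `evolve_gtw` and `toy_fitness`, the step size `sigma`, the clipping at zero and all data values are hypothetical.

```python
import random

def score(doc_ltw, gtw, query_terms):
    """Score a document as the sum of LTW * GTW over the query terms (assumed form)."""
    return sum(doc_ltw.get(t, 0.0) * gtw[t] for t in query_terms)

def evolve_gtw(gtw, fitness, iterations=200, sigma=0.1, seed=0):
    """Generic (1+1) strategy: mutate the single parent, keep the child if it is no worse."""
    rng = random.Random(seed)
    parent = dict(gtw)
    parent_fit = fitness(parent)
    for _ in range(iterations):
        # Gaussian mutation of every global weight, clipped at zero (an assumption).
        child = {t: max(0.0, w + rng.gauss(0.0, sigma)) for t, w in parent.items()}
        child_fit = fitness(child)
        if child_fit >= parent_fit:
            parent, parent_fit = child, child_fit
    return parent, parent_fit

# Toy data: fixed local weights for two documents and one query (hypothetical values).
docs = {
    "d1": {"retrieval": 1.2, "weight": 0.8},
    "d2": {"retrieval": 0.4, "noise": 1.5},
}
query = ["retrieval", "noise"]
relevant = "d1"

def toy_fitness(gtw):
    """Margin by which the relevant document outscores the non-relevant one."""
    return score(docs[relevant], gtw, query) - score(docs["d2"], gtw, query)

start = {"retrieval": 1.0, "weight": 1.0, "noise": 1.0}
best, best_fit = evolve_gtw(start, toy_fitness)
print("fitness before:", toy_fitness(start), "after:", best_fit)
```

In this sketch the search vector has one entry per index term rather than one per term-document pair, which is the kind of problem-size reduction the abstract describes. A realistic fitness function would presumably be an effectiveness measure computed over queries with relevance judgments, not the toy margin used here.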

Author information

Corresponding author

Correspondence to Osman Ali Sadek Ibrahim.

Appendix: Detailed Results of the Experimental Study

Tables 6, 7, 8 and 9 show the Average Precision (AP) and Mean Average Precision (MAP) of Okapi BM25, TF-ATO with DA and the proposed approach in the experiments with fully and partially evolved GTWs.
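For readers who want to reproduce figures of this kind, the following is a minimal sketch of the standard definitions of AP and MAP, together with NDCG, which the abstract also reports. It is not taken from the paper. The function names and the sample rankings are hypothetical, and the NDCG variant shown (linear gain, log2(rank + 1) discount, ideal ranking built from the same list of gains) is only one common choice and may differ from the one the authors used.

```python
import math

def average_precision(ranked_ids, relevant_ids):
    """AP: sum of precision@k at each rank k holding a relevant document,
    divided by the total number of relevant documents."""
    relevant_ids = set(relevant_ids)
    if not relevant_ids:
        return 0.0
    hits, total = 0, 0.0
    for k, doc_id in enumerate(ranked_ids, start=1):
        if doc_id in relevant_ids:
            hits += 1
            total += hits / k
    return total / len(relevant_ids)

def mean_average_precision(runs):
    """MAP: mean AP over (ranking, relevant set) pairs, one pair per query."""
    return sum(average_precision(r, rel) for r, rel in runs) / len(runs)

def ndcg(ranked_gains, k=None):
    """NDCG with linear gain and log2(rank + 1) discount (one common variant).
    Assumes ranked_gains lists the graded relevance of every judged document."""
    gains = ranked_gains[:k] if k else ranked_gains
    dcg = sum(g / math.log2(i + 1) for i, g in enumerate(gains, start=1))
    ideal = sorted(ranked_gains, reverse=True)[: len(gains)]
    idcg = sum(g / math.log2(i + 1) for i, g in enumerate(ideal, start=1))
    return dcg / idcg if idcg > 0 else 0.0

# Hypothetical rankings for two queries.
run_a = (["d3", "d1", "d7", "d2"], {"d1", "d2"})
run_b = (["a", "b"], {"a"})
ap_a = average_precision(*run_a)
map_ab = mean_average_precision([run_a, run_b])
print(ap_a, map_ab, ndcg([3, 2, 1]))
```

AP summarizes a single query, while MAP averages AP over the whole query set.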

Copyright information

© 2017 Springer International Publishing AG

About this paper

Cite this paper

Ibrahim, O.A.S., Landa-Silva, D. (2017). (1+1)-Evolutionary Gradient Strategy to Evolve Global Term Weights in Information Retrieval. In: Angelov, P., Gegov, A., Jayne, C., Shen, Q. (eds) Advances in Computational Intelligence Systems. Advances in Intelligent Systems and Computing, vol 513. Springer, Cham. https://doi.org/10.1007/978-3-319-46562-3_25

  • DOI: https://doi.org/10.1007/978-3-319-46562-3_25

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-46561-6

  • Online ISBN: 978-3-319-46562-3

  • eBook Packages: Engineering, Engineering (R0)
