(1+1)-Evolutionary Gradient Strategy to Evolve Global Term Weights in Information Retrieval

Ibrahim, Osman Ali Sadek; Landa-Silva, Dario

doi:10.1007/978-3-319-46562-3_25

Osman Ali Sadek Ibrahim & Dario Landa-Silva

Part of the book series: Advances in Intelligent Systems and Computing (AISC, volume 513)

Abstract

In many Information Retrieval (IR) contexts, term weights play an important role in retrieving the documents relevant to users' queries. A term weight measures the importance, or information content, of a keyword that occurs in the documents of the IR system. The term weight can be divided into two parts: the Global Term Weight (GTW) and the Local Term Weight (LTW). The GTW is a value assigned to each index term to indicate the topic of the documents; it captures the power of the term to discriminate between documents in the same collection. The LTW is a value that measures the contribution of the index term within a document. This paper proposes an approach, based on an evolutionary gradient strategy, that evolves the GTWs of the collection and uses Term Frequency-Average Term Occurrence (TF-ATO) as the LTWs. The approach reduces the problem size of term-weight evolution, which in turn reduces the computational time and helps to achieve improved IR effectiveness compared with other Evolutionary Computation (EC) approaches in the literature. The paper also investigates the limitation that relevance judgments can impose on this approach by conducting two sets of experiments, for partially and fully evolved GTWs. The proposed approach outperformed the Okapi BM25 and TF-ATO with DA weighting schemes in terms of Mean Average Precision (MAP), Average Precision (AP) and Normalized Discounted Cumulative Gain (NDCG).
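To make the idea of evolving one global weight per index term concrete, the following is a minimal sketch, not the authors' method. It assumes a plain (1+1) evolution strategy with Gaussian mutation rather than the gradient-based variant the paper proposes, scores a document as the sum over query terms of local weight times global weight, and uses MAP over the training queries as fitness. The function names, the toy local weights and the parameters `steps`, `sigma` and `seed` are all hypothetical.

```python
import random


def average_precision(ranked_ids, relevant):
    """Average Precision of one ranked list, given the set of relevant ids."""
    hits, total = 0, 0.0
    for position, doc_id in enumerate(ranked_ids, start=1):
        if doc_id in relevant:
            hits += 1
            total += hits / position
    return total / len(relevant) if relevant else 0.0


def rank(docs, query_terms, gtw):
    """Order documents by sum(local weight * global weight) over the query terms."""
    scores = {
        doc_id: sum(ltw.get(t, 0.0) * gtw.get(t, 0.0) for t in query_terms)
        for doc_id, ltw in docs.items()
    }
    # Ties are broken by document id so the ranking is deterministic.
    return sorted(scores, key=lambda d: (-scores[d], d))


def one_plus_one_es(docs, queries, qrels, terms, steps=200, sigma=0.3, seed=0):
    """Evolve one global weight per term; one parent, one child per step."""
    rng = random.Random(seed)

    def fitness(gtw):
        aps = [
            average_precision(rank(docs, q, gtw), qrels[qid])
            for qid, q in queries.items()
        ]
        return sum(aps) / len(aps)  # MAP over the training queries

    parent = {t: 1.0 for t in terms}  # assumed starting point: uniform weights
    best = fitness(parent)
    for _ in range(steps):
        # Gaussian mutation of every weight, clipped at zero.
        child = {t: max(0.0, w + rng.gauss(0.0, sigma)) for t, w in parent.items()}
        score = fitness(child)
        if score >= best:  # the child replaces the parent only if it is no worse
            parent, best = child, score
    return parent, best


# Toy collection: local weights per document are made-up numbers.
docs = {"d1": {"apple": 0.5, "pie": 0.5}, "d2": {"apple": 1.0}, "d3": {"pie": 1.0}}
queries = {"q1": ["apple", "pie"]}
qrels = {"q1": {"d3"}}
evolved_gtw, evolved_map = one_plus_one_es(docs, queries, qrels, ["apple", "pie"])
```

Because only the global weights are evolved, the search space has one dimension per index term rather than one per term-document pair, which is the problem-size reduction the abstract refers to.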

Author information

Authors and Affiliations

ASAP Research Group, School of Computer Science, The University of Nottingham, Nottingham, UK
Osman Ali Sadek Ibrahim & Dario Landa-Silva
CS Department, Minia University, Al-minya, Egypt
Osman Ali Sadek Ibrahim

Corresponding author

Correspondence to Osman Ali Sadek Ibrahim.

Editor information

Editors and Affiliations

School of Computing and Communications, Lancaster University, Bailrigg, Lancaster, United Kingdom
Plamen Angelov
School of Computing, University of Portsmouth, Portsmouth, Hampshire, United Kingdom
Alexander Gegov
School of Comp. Sci. & Digital Media, Robert Gordon University, Aberdeen, United Kingdom
Chrisina Jayne
Ins. of Mathematics, Physics & Comp. Sci, Aberystwyth University, Aberystwyth, United Kingdom
Qiang Shen

Appendix: Detailed Results of the Experimental Study

Tables 6, 7, 8 and 9 show the Average Precision (AP) and Mean Average Precision (MAP) of Okapi BM25, TF-ATO with DA and the proposed approach in the experiments with fully and partially evolved GTWs.
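Alongside AP and MAP, the abstract names NDCG as an evaluation measure. The sketch below shows one common way to compute it, assuming a linear gain and a log2 position discount, and assuming that every judged document appears in the ranked list so the ideal ordering can be taken from the same gains. The chapter's exact formulation is not given here, so this variant is an assumption and the function names are hypothetical.

```python
import math


def dcg(gains):
    """Discounted Cumulative Gain: the gain at rank i is divided by log2(i + 1)."""
    return sum(g / math.log2(i + 1) for i, g in enumerate(gains, start=1))


def ndcg(gains, k=None):
    """DCG of the ranking divided by the DCG of the ideal (sorted) ranking."""
    ranked = gains if k is None else gains[:k]
    ideal = sorted(gains, reverse=True)
    ideal = ideal if k is None else ideal[:k]
    best = dcg(ideal)
    return dcg(ranked) / best if best > 0 else 0.0


# Graded relevance of the documents in the order a system returned them.
score = ndcg([3, 2, 3, 0, 1, 2], k=5)
```

A ranking that already lists documents in decreasing order of relevance gets an NDCG of 1, and a ranking with no relevant documents gets 0.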

Cite this paper

Ibrahim, O.A.S., Landa-Silva, D. (2017). (1+1)-Evolutionary Gradient Strategy to Evolve Global Term Weights in Information Retrieval. In: Angelov, P., Gegov, A., Jayne, C., Shen, Q. (eds) Advances in Computational Intelligence Systems. Advances in Intelligent Systems and Computing, vol 513. Springer, Cham. https://doi.org/10.1007/978-3-319-46562-3_25

DOI: https://doi.org/10.1007/978-3-319-46562-3_25
Published: 07 September 2016
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-46561-6
Online ISBN: 978-3-319-46562-3
eBook Packages: Engineering, Engineering (R0)
