Abstract
In many Information Retrieval (IR) contexts, term weights play an important role in retrieving the documents relevant to users' queries. A term weight measures the importance, or information content, of a keyword that occurs in the documents of an IR system. It can be divided into two parts: the Global Term Weight (GTW) and the Local Term Weight (LTW). The GTW is a value assigned to each index term that reflects how well the term indicates the topic of documents, that is, its power to discriminate between documents in the same collection. The LTW measures the contribution of the index term within a particular document. This paper proposes an approach, based on an evolutionary gradient strategy, that evolves the GTWs of the collection while using Term Frequency-Average Term Occurrence (TF-ATO) as the LTWs. By evolving only the GTWs, the approach reduces the size of the term-weight evolution problem, which lowers the computational time and helps achieve better IR effectiveness than other Evolutionary Computation (EC) approaches in the literature. The paper also investigates the limitation that relevance judgments can impose on this approach by conducting two sets of experiments, with partially and fully evolved GTWs. The proposed approach outperformed the Okapi BM25 and TF-ATO with DA (discriminative approach) weighting schemes in terms of Mean Average Precision (MAP), Average Precision (AP) and Normalized Discounted Cumulative Gain (NDCG).
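The split between a local and a global weight can be made concrete with a small Python sketch. It assumes TF-ATO is a term's raw frequency in a document divided by that document's average term occurrence (total term occurrences over the number of distinct terms), which is our reading of the TF-ATO definition. The matching rule, a plain sum of LTW x GTW over the distinct query terms, and the names `tf_ato` and `score` are illustrative assumptions, not the paper's exact similarity function.

```python
from collections import Counter


def tf_ato(doc_tokens):
    """Local term weights (LTWs) for one document, TF-ATO style.

    Assumption: ATO = total term occurrences / number of distinct terms,
    so each weight is the term's raw frequency divided by that average.
    """
    tf = Counter(doc_tokens)
    if not tf:
        return {}
    ato = sum(tf.values()) / len(tf)
    return {term: count / ato for term, count in tf.items()}


def score(query_tokens, doc_tokens, gtw):
    """Hypothetical matching function: sum of LTW x GTW over distinct query terms.

    `gtw` maps each index term to its global weight (the vector that would be
    evolved); terms missing from the document or from `gtw` contribute 0.
    """
    ltw = tf_ato(doc_tokens)
    return sum(ltw.get(t, 0.0) * gtw.get(t, 0.0) for t in set(query_tokens))
```

For the document `["gene", "gene", "protein", "cell"]`, ATO is 4/3, so "gene" receives an LTW of 1.5 and "protein" and "cell" 0.75 each; with GTWs of 0.8 for "gene" and 0.2 for "protein", the query `["gene", "protein"]` scores about 1.35. Only the `gtw` dictionary would change during evolution, which is why fixing the LTWs keeps the search space to one value per index term.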
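The three effectiveness measures named in the abstract are standard, and a compact Python sketch of them may help when reading the results. It uses binary relevance sets for AP and MAP, and the linear-gain NDCG variant with a log2(rank + 1) discount; the paper may use a different NDCG gain or cutoff, so these functions are illustrative rather than the authors' evaluation code.

```python
import math


def average_precision(ranking, relevant):
    """AP: sum of precision@k at each rank k holding a relevant document,
    divided by the total number of relevant documents for the query."""
    if not relevant:
        return 0.0
    hits, total = 0, 0.0
    for k, doc_id in enumerate(ranking, start=1):
        if doc_id in relevant:
            hits += 1
            total += hits / k
    return total / len(relevant)


def mean_average_precision(runs):
    """MAP: AP averaged over queries; `runs` is a list of (ranking, relevant_set)."""
    return sum(average_precision(r, rel) for r, rel in runs) / len(runs)


def ndcg_at_k(ranking, gains, k):
    """NDCG@k with linear gains and a log2(rank + 1) discount.

    `gains` maps document id -> graded relevance; unjudged documents count as 0.
    """
    dcg = sum(gains.get(d, 0) / math.log2(i + 1)
              for i, d in enumerate(ranking[:k], start=1))
    ideal = sorted(gains.values(), reverse=True)[:k]
    idcg = sum(g / math.log2(i + 1) for i, g in enumerate(ideal, start=1))
    return dcg / idcg if idcg > 0 else 0.0
```

For the ranking `["d3", "d1", "d7", "d2"]` with relevant set `{"d1", "d2"}`, the relevant documents sit at ranks 2 and 4, giving precisions 1/2 and 2/4 and therefore AP = 0.5. A ranking that places every relevant document first yields NDCG = 1.0.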
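The abstract names a (1+1)-Evolutionary Gradient Strategy for evolving the GTW vector but does not spell out its operators. The Python sketch below is a generic (1+1) evolutionary-gradient-style loop written under our own assumptions: one antithetic Gaussian probe per iteration to estimate a gradient direction, elitist acceptance of the single child, a simple multiplicative step-size rule, and weights clamped to be non-negative. It is not the authors' exact algorithm; the name `one_plus_one_egs` and all parameters are hypothetical. In the paper's setting, `fitness` would be a retrieval-effectiveness score (for example MAP) computed from relevance judgments.

```python
import random


def one_plus_one_egs(fitness, x0, sigma=0.1, iterations=200, seed=0):
    """Maximize `fitness` over a real vector, e.g. one GTW per index term.

    Hypothetical sketch: each iteration estimates the gradient along one
    Gaussian direction, steps along it, and keeps the child only if it is at
    least as fit as the parent (elitist (1+1) selection).
    """
    rng = random.Random(seed)
    parent = list(x0)
    best = fitness(parent)
    for _ in range(iterations):
        z = [rng.gauss(0.0, 1.0) for _ in parent]
        plus = [p + sigma * zi for p, zi in zip(parent, z)]
        minus = [p - sigma * zi for p, zi in zip(parent, z)]
        # Central finite difference: estimated directional derivative along z.
        slope = (fitness(plus) - fitness(minus)) / (2.0 * sigma)
        child = [p + sigma * slope * zi for p, zi in zip(parent, z)]
        # Assumed constraint: global term weights stay non-negative.
        child = [max(0.0, c) for c in child]
        f_child = fitness(child)
        if f_child >= best:
            parent, best = child, f_child
            sigma *= 1.2  # success: try a larger step next time
        else:
            sigma *= 0.9  # failure: shrink the step
    return parent, best
```

Because a child replaces the parent only when it is at least as fit, the returned fitness never falls below that of the starting vector. With a toy fitness such as the negative squared distance to a fixed target vector, the loop moves the weights toward that target; with real retrieval fitness, each evaluation would require re-ranking the training queries, which is where a smaller GTW-only search space saves time.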
Copyright information
© 2017 Springer International Publishing AG
About this paper
Cite this paper
Ibrahim, O.A.S., Landa-Silva, D. (2017). (1+1)-Evolutionary Gradient Strategy to Evolve Global Term Weights in Information Retrieval. In: Angelov, P., Gegov, A., Jayne, C., Shen, Q. (eds) Advances in Computational Intelligence Systems. Advances in Intelligent Systems and Computing, vol 513. Springer, Cham. https://doi.org/10.1007/978-3-319-46562-3_25
DOI: https://doi.org/10.1007/978-3-319-46562-3_25
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-46561-6
Online ISBN: 978-3-319-46562-3
eBook Packages: Engineering, Engineering (R0)