Information Retrieval, Volume 9, Issue 3, pp 311–330

Evolving local and global weighting schemes in information retrieval


DOI: 10.1007/s10791-006-1682-6

Cite this article as:
Cummins, R. & O’Riordan, C. Inf Retrieval (2006) 9: 311. doi:10.1007/s10791-006-1682-6

Abstract

This paper describes a method that uses Genetic Programming to automatically determine term-weighting schemes for the vector space model. Based on a set of queries and their human-determined relevant documents, weighting schemes are evolved that achieve a high average precision. In Information Retrieval (IR) systems, useful information for term-weighting schemes is available from the query, individual documents, and the collection as a whole.
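As a rough illustration of the quantity being optimised, the sketch below computes non-interpolated average precision for one ranked list and averages it over a query set. It assumes binary relevance judgments; the function names and the dictionary layout (`rankings`, `judgments`) are hypothetical and are not taken from the paper, which may define its fitness function differently.

```python
def average_precision(ranked_doc_ids, relevant_ids):
    """Non-interpolated average precision for a single ranked list.

    Assumes binary relevance: a document is either relevant or not.
    """
    relevant = set(relevant_ids)
    if not relevant:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for rank, doc_id in enumerate(ranked_doc_ids, start=1):
        if doc_id in relevant:
            hits += 1
            # Precision at the rank of each relevant document retrieved.
            precision_sum += hits / rank
    # Relevant documents that were never retrieved contribute zero.
    return precision_sum / len(relevant)


def mean_average_precision(rankings, judgments):
    """Mean of per-query average precision.

    `rankings` maps query id -> ranked list of doc ids (hypothetical layout);
    `judgments` maps query id -> iterable of relevant doc ids.
    """
    scores = [average_precision(rankings.get(q, []), rel)
              for q, rel in judgments.items()]
    return sum(scores) / len(scores) if scores else 0.0
```

In an evolutionary setting, a value of this kind could serve as the fitness of a candidate weighting scheme: rank the documents for each training query with the candidate, then score the rankings against the human judgments.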

We evolve term weighting schemes in both local (within-document) and global (collection-wide) domains which interact with each other correctly to achieve a high average precision. These weighting schemes are tested on well-known test collections and are compared to the traditional tf-idf weighting scheme and to the BM25 weighting scheme using standard IR performance metrics.
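To make the local/global split concrete, here is a minimal sketch of a tf-idf style score written as a product of a local (within-document) weight and a global (collection-wide) weight. It assumes raw term frequency and `log(N / df)`, one common tf-idf variant and not necessarily the exact baseline used in the paper; all function names are hypothetical.

```python
import math
from collections import Counter


def local_tf(term, doc_tokens):
    """Local weight: raw count of the term within one document."""
    return Counter(doc_tokens)[term]


def global_idf(term, docs):
    """Global weight: log(N / df), computed over the whole collection."""
    df = sum(1 for d in docs if term in d)
    return math.log(len(docs) / df) if df else 0.0


def score(query_tokens, doc_tokens, docs,
          local=local_tf, global_=global_idf):
    """Sum of local * global weights over the distinct query terms.

    `local` and `global_` are pluggable, so either component can be
    swapped for a different (for example, evolved) function.
    """
    return sum(local(t, doc_tokens) * global_(t, docs)
               for t in set(query_tokens))
```

Keeping the two components as separate callables mirrors the idea of treating local and global weightings as distinct functions that are combined at scoring time.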

Furthermore, we show that the global weighting schemes evolved on small collections also increase average precision on larger TREC data. These global weighting schemes are shown to adhere to Luhn’s resolving power as both high and low frequency terms are assigned low weights. However, the local weightings evolved on small collections do not perform as well on large collections. We conclude that in order to evolve improved local (within-document) weighting schemes it is necessary to evolve these on large collections.

Keywords

Genetic Programming; Information Retrieval; Term-Weighting Schemes

Copyright information

© Springer Science + Business Media, LLC 2006

Authors and Affiliations

  1. Department of Information Technology, National University of Ireland, Galway, Ireland
