Applying genetic algorithms to the feature selection problem in information retrieval
The demand for accuracy and speed in Information Retrieval has revealed the need for a good classification of the large document collections held in databases and on Web servers. Representing documents in the vector space model, with terms as features, makes it possible to apply Machine Learning techniques. This paper presents a filter method that selects the most relevant features before the classification process. A Genetic Algorithm (GA) is used as a powerful tool to search for solutions in the domain of relevant features. The method has been implemented and some preliminary experiments have been carried out. The application of this technique to the vector space model in Information Retrieval is outlined as future work.
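To make the idea of a GA-driven filter concrete, the following is a minimal sketch and not the implementation described in the paper. The abstract does not state the fitness function, the selection scheme, or the parameter values, so everything here is an assumption chosen for illustration: each individual is a bit mask over the terms, the filter score of a term is the absolute difference between its mean weight in the two classes, the fitness is the summed score of the selected terms minus a hypothetical per-term `penalty`, and parents are picked by binary tournament. All function names are hypothetical.

```python
import random

def relevance_scores(X, y):
    """Filter score per feature: absolute difference of the class means.
    Assumes binary labels (0/1) and at least one document in each class."""
    scores = []
    for j in range(len(X[0])):
        pos = [row[j] for row, label in zip(X, y) if label == 1]
        neg = [row[j] for row, label in zip(X, y) if label == 0]
        scores.append(abs(sum(pos) / len(pos) - sum(neg) / len(neg)))
    return scores

def fitness(mask, scores, penalty):
    """Summed relevance of the selected features minus a cost per feature kept
    (an assumed criterion, evaluated without running any classifier)."""
    return sum(s - penalty for s, bit in zip(scores, mask) if bit)

def tournament(population, fits, rng, k=2):
    """Return the fittest of k randomly drawn individuals."""
    contenders = rng.sample(range(len(population)), k)
    return population[max(contenders, key=lambda i: fits[i])]

def ga_select(X, y, penalty=0.5, pop_size=20, generations=40,
              p_cross=0.8, p_mut=0.05, seed=0):
    """Evolve a 0/1 mask over the features; all defaults are illustrative."""
    rng = random.Random(seed)
    scores = relevance_scores(X, y)
    n = len(scores)
    population = [[rng.randint(0, 1) for _ in range(n)] for _ in range(pop_size)]
    best = max(population, key=lambda m: fitness(m, scores, penalty))
    for _ in range(generations):
        fits = [fitness(m, scores, penalty) for m in population]
        children = [best[:]]  # elitism: the best mask survives unchanged
        while len(children) < pop_size:
            a = tournament(population, fits, rng)
            b = tournament(population, fits, rng)
            if n > 1 and rng.random() < p_cross:
                cut = rng.randint(1, n - 1)   # one-point crossover
                child = a[:cut] + b[cut:]
            else:
                child = a[:]
            # bit-flip mutation, applied independently to each gene
            child = [bit ^ 1 if rng.random() < p_mut else bit for bit in child]
            children.append(child)
        population = children
        best = max(population, key=lambda m: fitness(m, scores, penalty))
    return best

# Toy term-weight matrix: rows are documents, columns are terms.
X = [[1, 0, 3], [1, 1, 2], [0, 0, 3], [0, 1, 2]]
y = [1, 1, 0, 0]
mask = ga_select(X, y)
selected_terms = [j for j, bit in enumerate(mask) if bit]
```

Because the fitness depends only on the data and not on a classifier, this follows the filter approach rather than the wrapper approach. The additive score used here treats each term independently, which keeps the sketch short; a criterion that evaluates the subset as a whole would slot into `fitness` without changing the rest of the loop.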