On the Scalability of Genetic Algorithms to Very Large-Scale Feature Selection
Feature Selection is a promising optimisation strategy for Pattern Recognition systems, but as an NP-complete task it is extremely difficult to carry out. Past studies were therefore limited in either the cardinality of the feature space or the number of patterns used to assess feature subset performance.
This study examines the scalability of Distributed Genetic Algorithms to very large-scale Feature Selection. As the domain of application, a classification system for optical characters is chosen. The system is tailored to classify hand-written digits and involves 768 binary features. Owing to the size of the investigated problem, this study forms a step into new realms in Feature Selection for classification.
We present a set of customisations of GAs that allow known concepts to be applied to Feature Selection problems of practical interest. Some limitations of GAs in the domain of Feature Selection are revealed and improvements are suggested. A widely used strategy to accelerate the optimisation process, Training Set Sampling, was observed to fail in this domain of application.
Experiments on unseen validation data suggest that Distributed GAs are capable of reducing the problem complexity significantly. The results show that the classification accuracy can be maintained while reducing the feature space cardinality by about 50%. Genetic Algorithms are demonstrated to scale well to very large-scale problems in Feature Selection.
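The abstract does not spell out the chromosome encoding or the genetic operators, so the following is only a minimal sketch of GA-based feature selection under stated assumptions: a bit-mask chromosome with one bit per feature, tournament selection, uniform crossover, bit-flip mutation, and a nearest-centroid classifier evaluated on small synthetic binary data. None of these choices, nor the names `make_data`, `accuracy`, `fitness`, `run_ga` or the `penalty` weight, are taken from the paper, and a single-population GA stands in here for the Distributed GA it studies.

```python
import random

def make_data(rng, n_patterns=200, n_features=24, n_informative=6):
    """Synthetic binary patterns; only the first n_informative features depend on the class."""
    data = []
    for _ in range(n_patterns):
        label = rng.randrange(2)
        row = []
        for j in range(n_features):
            p = (0.8 if label == 1 else 0.2) if j < n_informative else 0.5
            row.append(1 if rng.random() < p else 0)
        data.append((row, label))
    return data

def accuracy(mask, train, test):
    """Nearest-centroid accuracy using only the features whose mask bit is 1."""
    idx = [j for j, bit in enumerate(mask) if bit]
    if not idx:
        return 0.0  # an empty feature subset cannot classify anything
    sums = {0: [0.0] * len(idx), 1: [0.0] * len(idx)}
    counts = {0: 0, 1: 0}
    for row, label in train:
        counts[label] += 1
        for k, j in enumerate(idx):
            sums[label][k] += row[j]
    means = {c: [s / max(counts[c], 1) for s in sums[c]] for c in (0, 1)}
    correct = 0
    for row, label in test:
        dist = {c: sum((row[j] - means[c][k]) ** 2 for k, j in enumerate(idx))
                for c in (0, 1)}
        pred = 0 if dist[0] <= dist[1] else 1
        correct += int(pred == label)
    return correct / len(test)

def fitness(mask, train, valid, penalty=0.1):
    """Accuracy minus a small (assumed) penalty on the fraction of selected features."""
    return accuracy(mask, train, valid) - penalty * sum(mask) / len(mask)

def run_ga(train, valid, n_features, rng, pop_size=30, generations=40, p_mut=0.02):
    """Single-population GA over bit masks; returns the fittest mask of the final population."""
    pop = [[rng.randrange(2) for _ in range(n_features)] for _ in range(pop_size)]
    for _ in range(generations):
        scored = [(fitness(m, train, valid), m) for m in pop]
        best = max(scored, key=lambda s: s[0])[1]

        def pick():
            # Binary tournament: the fitter of two random individuals wins.
            a, b = rng.sample(scored, 2)
            return a[1] if a[0] >= b[0] else b[1]

        nxt = [best[:]]  # elitism: carry the current best over unchanged
        while len(nxt) < pop_size:
            p1, p2 = pick(), pick()
            child = [x if rng.random() < 0.5 else y for x, y in zip(p1, p2)]  # uniform crossover
            child = [1 - b if rng.random() < p_mut else b for b in child]     # bit-flip mutation
            nxt.append(child)
        pop = nxt
    return max(pop, key=lambda m: fitness(m, train, valid))

if __name__ == "__main__":
    rng = random.Random(0)
    data = make_data(rng)
    train, valid, unseen = data[:100], data[100:150], data[150:]
    best = run_ga(train, valid, 24, rng)
    full = [1] * 24
    print("selected features:", sum(best), "of", len(best))
    print("accuracy on unseen data, all features:", accuracy(full, train, unseen))
    print("accuracy on unseen data, selected subset:", accuracy(best, train, unseen))
```

The final comparison is made on a split that played no part in the search, mirroring the abstract's point that results are reported on unseen validation data. Evaluating `fitness` on a random subsample of `train` would correspond to Training Set Sampling, which the abstract reports as failing in the authors' setting.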
- Book Title: Real-World Applications of Evolutionary Computing
- Book Subtitle: EvoWorkshops 2000: EvoIASP, EvoSCONDI, EvoTel, EvoSTIM, EvoRob, and EvoFlight Edinburgh, Scotland, UK, April 17, 2000 Proceedings
- Pages: 77-86
- Series Title: Lecture Notes in Computer Science
- Publisher: Springer Berlin Heidelberg
- Copyright Holder: Springer-Verlag Berlin Heidelberg
- Editor: Stefano Cagnoni (4)
- Editor Affiliations: 4. Department of Computer Engineering, University of Parma
- Author Affiliations: 5. German Research Center for Artificial Intelligence GmbH, 67608, Kaiserslautern, Germany; 6. Department of Computer Science and Automation, Indian Institute of Science, Bangalore, 560 012, India