Abstract
An overview of the principal feature subset selection methods is given. We investigate a number of measures of feature subset quality, using large commercial databases. We develop an entropic measure, based upon the information gain approach used within ID3 and C4.5 to build trees, which is shown to give the best performance over our databases. This measure is used within a simple feature subset selection algorithm and the technique is used to generate subsets of high quality features from the databases. A simulated annealing based data mining technique is presented and applied to the databases. The performance using all features is compared to that achieved using the subset selected by our algorithm. We show that a substantial reduction in the number of features may be achieved together with an improvement in the performance of our data mining system. We also present a modification of the data mining algorithm, which allows it to simultaneously search for promising feature subsets and high quality rules. The effect of varying the generality level of the desired pattern is also investigated.
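The entropic measure described above builds on the standard information gain calculation from ID3/C4.5: the reduction in class entropy obtained by partitioning the records on a feature's values. The following is a minimal sketch of that calculation, not the paper's exact quality measure; the function and variable names are illustrative.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy of a class-label sequence, in bits."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())

def information_gain(feature_values, labels):
    """Reduction in class entropy obtained by partitioning the
    records on the (discrete) values of a single feature."""
    total = len(labels)
    base = entropy(labels)
    remainder = 0.0
    for v in set(feature_values):
        subset = [c for f, c in zip(feature_values, labels) if f == v]
        remainder += (len(subset) / total) * entropy(subset)
    return base - remainder

# Toy illustration: a feature that perfectly separates the classes
# recovers the full label entropy; an uninformative feature scores zero.
labels  = ['yes', 'yes', 'no', 'no']
good    = ['a', 'a', 'b', 'b']
useless = ['a', 'b', 'a', 'b']
print(information_gain(good, labels))     # 1.0
print(information_gain(useless, labels))  # 0.0
```

Ranking features by this score and keeping the top scorers is the simplest form of the filter-style selection the paper investigates.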
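The modified algorithm that searches feature subsets directly can be pictured as standard simulated annealing over bit vectors, where each bit marks a feature as in or out of the subset. This is a generic sketch with a geometric cooling schedule, not the authors' algorithm; `anneal_subsets`, its parameters, and the toy objective are all assumptions for illustration.

```python
import math
import random

def anneal_subsets(n_features, score, t0=1.0, cooling=0.95, steps=500, seed=0):
    """Simulated-annealing search over 0/1 feature-inclusion vectors,
    maximising the caller-supplied subset quality function `score`."""
    rng = random.Random(seed)
    current = tuple(rng.randint(0, 1) for _ in range(n_features))
    cur_score = score(current)
    best, best_score = current, cur_score
    t = t0
    for _ in range(steps):
        # Neighbour: flip one randomly chosen feature in or out of the subset.
        i = rng.randrange(n_features)
        cand = current[:i] + (1 - current[i],) + current[i + 1:]
        cand_score = score(cand)
        delta = cand_score - cur_score
        # Always accept improvements; accept worsening moves with
        # probability exp(delta / t), which shrinks as t cools.
        if delta >= 0 or rng.random() < math.exp(delta / t):
            current, cur_score = cand, cand_score
            if cur_score > best_score:
                best, best_score = current, cur_score
        t *= cooling  # geometric cooling schedule
    return best, best_score

# Toy objective: negated Hamming distance to a hypothetical ideal subset.
target = (1, 1, 1, 0, 0, 0)
best, s = anneal_subsets(6, lambda x: -sum(a != b for a, b in zip(x, target)))
```

In the paper's setting the objective would instead be a rule-quality or classification measure evaluated on the database, so that subset search and rule search proceed together.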
References
ANGOSS: 1994, KnowledgeSEEKER 3.11.05 on-line help.
ANGOSS: 1995, KnowledgeSEEKER for Windows Version 3.0 User's Guide, Toronto, Canada.
Biggs, D., de Ville, B. and Suen, E.: 1991, A method of choosing multiway partitions for classification and decision trees, Journal of Applied Statistics 18(1), 49–62.
Breiman, L., Friedman, J. H., Olshen, R. A. and Stone, C. J.: 1984, Classification and Regression Trees, Wadsworth and Brooks, Monterey, CA.
Chang, C. Y.: 1973, Dynamic programming as applied to feature selection in pattern recognition systems, IEEE Trans. Syst. Man and Cybernet. 3, 166–171.
Chardaire, P., Lutton, J. L. and Sutter, A.: 1995, Thermostatistical persistency: A powerful improving concept for simulated annealing algorithms, European Journal of Operational Research 86.
Cover, T. M. and Van Campenhout, J. M.: 1977, On the possible orderings in the measurement selection problem, IEEE Trans. Sys. Man Cybern. 7, 651–661.
de la Iglesia, B., Debuse, J. C. W. and Rayward-Smith, V. J.: 1996, Discovering knowledge in commercial databases using modern heuristic techniques, in E. Simoudis, J. W. Han and U. Fayyad (eds), Proc. of the Second Int. Conf. on Knowledge Discovery and Data Mining (KDD-96), pp. 44–49.
de Ville, B.: 1990, Applying statistical knowledge to database analysis and knowledge base construction, Proceedings of the Sixth IEEE Conference on Artificial Intelligence Applications, IEEE Computer Society, Washington.
Devijver, P. A. and Kittler, J.: 1982, Pattern Recognition: a Statistical Approach, Prentice-Hall International, London.
Foroutan, I. and Sklansky, J.: 1987, Feature selection for automatic classification of non-gaussian data, IEEE Trans. Sys. Man Cybern. 17, 187–198.
John, G. H., Kohavi, R. and Pfleger, K.: 1994, Irrelevant features and the subset selection problem, in W. W. Cohen and H. Hirsh (eds), Machine Learning: Proceedings of the Eleventh International Conference, Morgan Kaufmann, San Francisco, pp. 121–129.
Kass, G. V.: 1980, An exploratory technique for investigating large quantities of categorical data, Appl. Statist. 29, 119–127.
Kohavi, R. and Sommerfield, D.: 1995, Feature subset selection using the wrapper method: overfitting and dynamic search space topology, in U. M. Fayyad and R. Uthurusamy (eds), Proc. of the First Int. Conf. on Knowledge Discovery and Data Mining, AAAI Press, pp. 192–197.
Koller, D. and Sahami, M.: 1996, Toward optimal feature selection, in (Saitta, 1996).
Liu, H. and Setiono, R.: 1996a, Feature selection and classification: a probabilistic wrapper approach, Proc. of the 9th Int. Conf. on Industrial and Engineering Applications of AI and Expert Systems, pp. 419–424.
Liu, H. and Setiono, R.: 1996b, A probabilistic approach to feature selection: a filter solution, in (Saitta, 1996).
Lundy, M. and Mees, A.: 1986, Convergence of an annealing algorithm, Mathematical Programming 34, 111–124.
Mann, J. W.: 1995, X-SAmson v1.0 user manual, School of Information Systems, University of East Anglia.
Marill, T. and Green, D. M.: 1963, On the effectiveness of receptors in recognition systems, IEEE Trans. Inform. Theory 9, 11–17.
Michael, M. and Lin, W. C.: 1973, Experimental study of information measures and inter-intra class distance ratios on feature selection and ordering, IEEE Trans. Systems Man Cybernet. 3, 172–181.
Narendra, P. M. and Fukunaga, K.: 1977, A branch and bound algorithm for feature subset selection, IEEE Transactions on Computers, pp. 917–922.
Pei, M., Goodman, E. D., Punch, W. F. and Ding, Y.: 1995, Genetic algorithms for classification and feature extraction, Proc. of the Classification Society Conf.
Quinlan, J. R.: 1983, Learning efficient classification procedures and their application to chess end games, in R. S. Michalski, J. G. Carbonell and T. M. Mitchell (eds), Machine Learning: An Artificial Intelligence Approach, Morgan Kaufmann, San Mateo, CA.
Quinlan, J. R.: 1986, Induction of decision trees, Machine Learning 1.
Quinlan, J. R.: 1993, C4.5: Programs for Machine Learning, Morgan Kaufmann.
Rauber, T. W.: 1996, The tooldiag package. Available electronically from: http://www.uninova.pt/~tr/home/tooldiag.html.
Rayward-Smith, V. J., Debuse, J. C. W. and de la Iglesia, B.: 1995, Using a genetic algorithm to data mine in the financial services sector, in A. Macintosh and C. Cooper (eds), Applications and Innovations in Expert Systems III, SGES Publications, pp. 237–252.
Saitta, L. (ed.): 1996, Proc. of the 13th Int. Conf. on Machine Learning.
Shannon, C. E. and Weaver, W.: 1963, The Mathematical Theory of Communication, University of Illinois Press, Urbana.
Thrun, S., Bala, J., Bloedorn, E., Bratko, I., Cestnik, B., Cheng, J., De Jong, K., Dzeroski, S., Fahlman, S. E., Fisher, D., Hamman, R., Kaufman, K., Keller, I., Kononenko, I., Kreuziger, J., Michalski, R. S., Mitchell, T., Pachowicz, P., Reich, Y., Vafaie, H., Van de Welde, W., Wenzel, W., Wnek, J. and Zhang, J.: 1991, The MONK's problems: a performance comparison of different learning algorithms, Carnegie Mellon University CMU-CS-91-197.
Weiss, S. M. and Kulikowski, C. A.: 1991, Computer Systems That Learn: Classification and Prediction Methods from Statistics, Neural Nets, Machine Learning and Expert Systems, Morgan Kaufmann, San Francisco.
Whitney, A.: 1971, A direct method of nonparametric measurement selection, IEEE Trans. Comput. 20, 1100–1103.
Cite this article
Debuse, J.C., Rayward-Smith, V.J. Feature Subset Selection within a Simulated Annealing Data Mining Algorithm. Journal of Intelligent Information Systems 9, 57–81 (1997). https://doi.org/10.1023/A:1008641220268