Abstract
Considering the importance of the domain relationship in eliminating noisy features during feature selection, we present an alternative approach to designing a multi-objective fitness function, based on multiple correlation, for the genetic algorithm (GA) that serves as the search tool in this problem. Multiple correlation is a simple statistical technique that uses the multiple correlation coefficient to measure the relationship between a dependent variable and a set of independent variables within the domain space. Simulation studies were conducted on both real-world and controlled data sets to assess the performance of the proposed fitness function, and a comparison with the traditional fitness function is also reported. The results show that the proposed fitness function outperforms the traditional one in all cases considered, including different data types and multi-class, multi-dimensional data.
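The idea of scoring a candidate feature subset by how strongly its features jointly relate to the class variable can be sketched as follows. This is a minimal illustration, not the paper's actual multi-objective function: the helper names (`multiple_correlation`, `fitness`) are hypothetical, and the real fitness combines additional objectives not reproduced here. The multiple correlation coefficient R is computed, as is standard, as the Pearson correlation between the dependent variable and its least-squares fit on the selected features.

```python
import numpy as np

def multiple_correlation(X, y):
    """Multiple correlation coefficient R between y and the columns of X,
    computed as the correlation between y and its least-squares prediction."""
    # Add an intercept column and solve the least-squares problem.
    A = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    y_hat = A @ coef
    # R is the Pearson correlation between y and the fitted values.
    return np.corrcoef(y, y_hat)[0, 1]

def fitness(mask, X, y):
    """Illustrative GA fitness for a binary feature mask: reward subsets
    whose selected features jointly correlate with the target variable.
    (Hypothetical form -- the paper's function is multi-objective.)"""
    if not mask.any():          # an empty subset explains nothing
        return 0.0
    return multiple_correlation(X[:, mask], y)
```

In a GA, each chromosome would be such a binary mask over the feature set, and selection pressure would favor masks whose retained features jointly predict the class while noisy features, which add little to R, tend to be dropped.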
Cite this article
Chaikla, N., Qi, Y. Feature Selection Using the Domain Relationship with Genetic Algorithms. Knowledge and Information Systems 1, 377–390 (1999). https://doi.org/10.1007/BF03325105