Feature Selection Using the Domain Relationship with Genetic Algorithms

  • Critical Reviews
Knowledge and Information Systems

Abstract

Domain relationships are important for eliminating noisy features during feature selection. We therefore present an alternative approach to designing a multi-objective fitness function, based on multiple correlation, for the genetic algorithm (GA) that serves as the search tool for this problem. Multiple correlation is a simple statistical technique that uses the multiple correlation coefficient to measure the relationship between a dependent variable and a set of independent variables within the domain space. Simulation studies on both real-world and controlled data sets assessed the performance of the proposed fitness function and compared it with the traditional fitness function. The results show that the proposed fitness function can perform more satisfactorily than the traditional one in all cases considered, including different data types and multi-class and multi-dimensional data.
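The abstract does not give the fitness function itself, so the following is only an illustrative sketch of the general idea, not the authors' formulation. It computes the multiple correlation coefficient R between a dependent variable and a chosen feature subset by ordinary least squares, then combines R with a subset-size penalty into a single score. The weighted-sum form, the `size_weight` parameter, the function names, and the toy data are all assumptions introduced for illustration.

```python
import math

def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system: selected features are collinear")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x

def multiple_correlation(rows, target, subset):
    """Return R for the least-squares fit of target on the selected columns."""
    if not subset:
        return 0.0
    # Design matrix with an intercept column followed by the selected features.
    design = [[1.0] + [row[j] for j in subset] for row in rows]
    k = len(design[0])
    # Normal equations: (X^T X) beta = X^T y.
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(k)] for i in range(k)]
    xty = [sum(d[i] * y for d, y in zip(design, target)) for i in range(k)]
    beta = solve_linear(xtx, xty)
    fitted = [sum(b * v for b, v in zip(beta, d)) for d in design]
    mean_y = sum(target) / len(target)
    ss_tot = sum((y - mean_y) ** 2 for y in target)
    ss_res = sum((y - f) ** 2 for y, f in zip(target, fitted))
    if ss_tot == 0:
        return 0.0
    return math.sqrt(max(0.0, 1.0 - ss_res / ss_tot))

def fitness(mask, rows, target, size_weight=0.1):
    """Hypothetical score: reward high R, penalise large feature subsets."""
    subset = [j for j, bit in enumerate(mask) if bit]
    r = multiple_correlation(rows, target, subset)
    return r - size_weight * len(subset) / len(mask)

# Toy data (made up): the target depends only on feature 0.
rows = [[1, 5, 2], [2, 3, 7], [3, 8, 1], [4, 1, 4], [5, 6, 3]]
target = [3, 5, 7, 9, 11]

score_relevant_only = fitness([1, 0, 0], rows, target)
score_with_noise = fitness([1, 1, 0], rows, target)
```

In a GA, each chromosome would be a bit mask like `[1, 0, 0]`, and a score of this kind would rank candidate subsets. In the toy data, adding the irrelevant second feature cannot raise R, so the size penalty makes the smaller subset score higher.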

Author information

Corresponding author

Correspondence to Nidapan Chaikla.

About this article

Cite this article

Chaikla, N., Qi, Y. Feature Selection Using the Domain Relationship with Genetic Algorithms. Knowledge and Information Systems 1, 377–390 (1999). https://doi.org/10.1007/BF03325105

  • DOI: https://doi.org/10.1007/BF03325105
