Knowledge and Information Systems, Volume 1, Issue 3, pp 377–390

Feature Selection Using the Domain Relationship with Genetic Algorithms

Critical Reviews

Abstract

Considering the importance of the domain relationship in eliminating noisy features during feature selection, we present an alternative approach to designing a multi-objective fitness function based on multiple correlation for the genetic algorithm (GA), which serves as the search tool in this problem. Multiple correlation is a simple statistical technique that uses the multiple correlation coefficient to measure the relationship between a dependent variable and a set of independent variables within the domain space. Simulation studies were conducted on both real-world and controlled data sets to assess the performance of the proposed fitness function, and a comparison between the traditional fitness function and the proposed one is also reported. The results show that the proposed fitness function performs more satisfactorily than the traditional one in all cases considered, including different data types and multi-class, multi-dimensional data.
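As a rough illustration of the statistic the abstract describes, the multiple correlation coefficient R between a target variable and a set of features can be computed from a least-squares fit: R is the correlation between the target and its fitted values, i.e. the square root of the fit's R². The sketch below is a minimal implementation of that definition; the `subset_fitness` helper is a hypothetical example of how such a coefficient might enter a GA fitness function (with a small size penalty), not the paper's actual formulation.

```python
import numpy as np

def multiple_correlation(X, y):
    """Multiple correlation coefficient R between y and the columns of X.

    Fits y ~ X by least squares (with an intercept) and returns
    sqrt(R^2), the correlation between y and its fitted values.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    A = np.column_stack([np.ones(len(y)), X])   # add intercept column
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    y_hat = A @ coef
    ss_res = np.sum((y - y_hat) ** 2)           # residual sum of squares
    ss_tot = np.sum((y - y.mean()) ** 2)        # total sum of squares
    return np.sqrt(max(0.0, 1.0 - ss_res / ss_tot))

def subset_fitness(mask, X, y):
    """Hypothetical GA fitness for a binary feature mask (illustrative only):
    reward subsets whose features jointly correlate with the target,
    lightly penalising subset size."""
    selected = [i for i, bit in enumerate(mask) if bit]
    if not selected:
        return 0.0
    return multiple_correlation(X[:, selected], y) - 0.01 * len(selected)
```

In a GA wrapper, each chromosome would be such a 0/1 mask over the feature set, and the coefficient gives the GA a cheap, classifier-free signal for ranking candidate subsets.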

Keywords

Feature selection, genetic algorithm, fitness function, domain relationship, multiple correlation



Copyright information

© Springer-Verlag Singapore Pte. Ltd. 1999

Authors and Affiliations

Computer Science and Information Management Division, School of Advanced Technologies, Asian Institute of Technology, Klong Luang, Thailand
