Iteratively Selecting Feature Subsets for Mining from High-Dimensional Databases

  • Hiroshi Mamitsuka
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 2431)

Abstract

We propose a new data mining method that is effective for mining from extremely high-dimensional databases. The method iteratively selects a subset of features from a database and builds a hypothesis with that subset. Each selection of a feature subset has two steps: first, selecting the subset of instances on which the predictions of the previously obtained hypotheses are most unreliable; second, selecting the subset of features whose distribution of values in the selected instances differs most from that in all instances of the database. We empirically evaluate the proposed method by comparing its performance with that of two other methods, including the method of Xing et al., one of the latest feature subset selection methods. The evaluation was performed on a real-world data set with approximately 140,000 features. Our results show that the proposed method outperforms the other methods in terms of both the final predictive accuracy and the precision attained at a recall given by Xing et al.’s method. We also examined the effect of noise in the data and found that the advantage of the proposed method becomes more pronounced at higher noise levels.
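
The two-step selection loop described in the abstract can be sketched roughly as follows. This is an illustrative sketch only, not the paper's implementation: the abstract does not specify the component learner, the unreliability measure, or the distribution-difference measure. Here all function names are hypothetical, features and labels are assumed to be binary (0/1), the component learner is a single-feature stump, unreliability is taken to be a small committee vote margin, the distribution difference is the absolute difference in feature means, each hypothesis is trained on all instances, and the final prediction is a majority vote.

```python
import random

def train_stump(X, y, features):
    """Assumed component learner: the single feature (or its negation)
    among `features` that agrees with the 0/1 labels most often."""
    best = None
    for f in features:
        agree = sum(1 for row, label in zip(X, y) if row[f] == label)
        for polarity, score in ((1, agree), (0, len(y) - agree)):
            if best is None or score > best[0]:
                best = (score, f, polarity)
    _, f, polarity = best
    return lambda row: row[f] if polarity == 1 else 1 - row[f]

def vote_margin(committee, row):
    """Assumed unreliability measure: 0 means the committee is evenly split."""
    votes = sum(h(row) for h in committee)
    return abs(2 * votes - len(committee))

def select_instances(committee, X, n_instances):
    """Step 1: indices of the instances with the smallest vote margin."""
    order = sorted(range(len(X)), key=lambda i: vote_margin(committee, X[i]))
    return order[:n_instances]

def select_features(X, chosen, n_features):
    """Step 2: features whose mean over the chosen instances differs most
    from the mean over all instances (assumed difference measure)."""
    def shift(f):
        mean_all = sum(row[f] for row in X) / len(X)
        mean_sel = sum(X[i][f] for i in chosen) / len(chosen)
        return abs(mean_sel - mean_all)
    return sorted(range(len(X[0])), key=shift, reverse=True)[:n_features]

def iterative_subset_mining(X, y, rounds=5, n_instances=20, n_features=3, seed=0):
    """Build one hypothesis per round, each on a newly selected feature subset."""
    rng = random.Random(seed)
    # No hypotheses exist yet, so the first subset is drawn at random (assumption).
    features = rng.sample(range(len(X[0])), n_features)
    committee = [train_stump(X, y, features)]
    for _ in range(rounds - 1):
        chosen = select_instances(committee, X, n_instances)
        features = select_features(X, chosen, n_features)
        committee.append(train_stump(X, y, features))
    return committee

def predict(committee, row):
    """Majority vote over all hypotheses (assumption; ties go to class 1)."""
    return 1 if 2 * sum(h(row) for h in committee) >= len(committee) else 0
```

In this sketch only one feature subset is examined per round, so the cost of each round depends on the subset size rather than on the full dimensionality, apart from the single pass that scores every feature in step 2.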

Keywords

Support Vector Machine, Feature Subset, Inductive Algorithm, Feature Subset Selection, Component Algorithm
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

References

  1. Breiman, L.: Pasting Small Votes for Classification in Large Databases and On-line. Machine Learning 36 (1999) 85–103
  2. Joachims, T.: Making Large-scale SVM Learning Practical. In: Schölkopf, B., Burges, C., Smola, A. (eds.): Advances in Kernel Methods - Support Vector Learning. MIT Press, Cambridge (1999)
  3. Kohavi, R., John, G. H.: Wrappers for Feature Subset Selection. Artificial Intelligence 97 (1997) 273–324
  4. Koller, D., Sahami, M.: Toward Optimal Feature Selection. In: Saitta, L. (ed.): Proceedings of the Thirteenth International Conference on Machine Learning. Morgan Kaufmann, Bari, Italy (1996) 284–292
  5. Kononenko, I., Hong, S. J.: Attribute Selection for Modelling. Future Generation Computer Systems 13 (1997) 181–195
  6. Liu, H., Motoda, H.: Feature Selection for Knowledge Discovery and Data Mining. Kluwer Academic Publishers, Boston (1998)
  7. Mamitsuka, H., Abe, N.: Efficient Mining from Large Databases by Query Learning. In: Langley, P. (ed.): Proceedings of the Seventeenth International Conference on Machine Learning. Morgan Kaufmann, Stanford Univ., CA (2000) 575–582
  8. Ng, A.: On Feature Selection: Learning with Exponentially Many Irrelevant Features as Training Examples. In: Shavlik, J. (ed.): Proceedings of the Fifteenth International Conference on Machine Learning. Morgan Kaufmann, Madison, WI (1998) 404–412
  9. Provost, F., Kolluri, V.: A Survey of Methods for Scaling up Inductive Algorithms. Data Mining and Knowledge Discovery 3 (1999) 131–169
  10. Quinlan, J. R.: C4.5: Programs for Machine Learning. Morgan Kaufmann, San Francisco (1993)
  11. Seung, H. S., Opper, M., Sompolinsky, H.: Query by Committee. In: Haussler, D. (ed.): Proceedings of the Fifth International Conference on Computational Learning Theory. Morgan Kaufmann, NY (1992) 287–294
  12. Xing, E. P., Jordan, M. I., Karp, R. M.: Feature Selection for High-dimensional Genomic Microarray Data. In: Brodley, C. E., Danyluk, A. P. (eds.): Proceedings of the Eighteenth International Conference on Machine Learning. Morgan Kaufmann, Madison, WI (2001) 601–608

Copyright information

© Springer-Verlag Berlin Heidelberg 2002

Authors and Affiliations

  • Hiroshi Mamitsuka (1)
  1. Institute for Chemical Research, Kyoto University, Gokasho, Japan