Knowledge and Information Systems, Volume 9, Issue 1, pp 91–108

Query-learning-based iterative feature-subset selection for learning from high-dimensional data sets

  • Hiroshi Mamitsuka
Regular Paper


Abstract

We propose a new data-mining method that is effective for learning from extremely high-dimensional data sets. The proposed method selects a subset of features from a high-dimensional data set by a process of iterative refinement. Each refinement consists of two steps. The first step selects, from the data set, a subset of instances for which the predictions of previously obtained hypotheses are least reliable. The second step selects a subset of features whose values in the selected instances deviate most from their values over all instances of the data set. We empirically evaluate the effectiveness of the proposed method by comparing its performance with those of four other methods, including one of the latest feature-subset selection methods. The evaluation was performed on a real-world data set with approximately 140,000 features. Our results show that the proposed method outperforms the other methods in terms of prediction accuracy, precision at a fixed recall value, and computation time to reach a given prediction accuracy. We have also examined the effect of noise in the data and found that the advantage of the proposed method becomes more pronounced at higher noise levels. Extended abstracts of parts of the work presented in this paper have appeared in Mamitsuka [14] and Mamitsuka [15].
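The two selection steps described above can be sketched in a few lines of NumPy. This is a minimal illustrative sketch, not the paper's exact formulation: the function names are hypothetical, uncertainty is scored here as the margin of a binary committee vote, and feature deviation as a standardized shift of the per-feature mean on the selected instances.

```python
import numpy as np

def select_uncertain_instances(votes, k):
    """Step 1: pick the k instances whose committee predictions are
    least reliable. `votes` is an (n_hypotheses, n_instances) array
    of {0, 1} predictions; a vote fraction near 0.5 means the
    hypotheses disagree, i.e. the prediction is unreliable."""
    pos_fraction = votes.mean(axis=0)
    margin = np.abs(pos_fraction - 0.5)      # small margin = unreliable
    return np.argsort(margin, kind="stable")[:k]

def select_varying_features(X, selected_idx, m):
    """Step 2: pick the m features whose mean value on the selected
    instances deviates most (in standard-deviation units) from the
    mean over all instances of the data set."""
    mu_all = X.mean(axis=0)
    sd_all = X.std(axis=0) + 1e-12           # avoid division by zero
    mu_sel = X[selected_idx].mean(axis=0)
    deviation = np.abs(mu_sel - mu_all) / sd_all
    return np.argsort(deviation, kind="stable")[::-1][:m]
```

Iterating these two steps, retraining hypotheses on the reduced feature subset each round, yields the kind of iterative refinement the abstract describes.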


Keywords: Query learning · Feature-subset selection · High-dimensional data set · Uncertainty sampling · Drug design


References


  1. Breiman L (1999) Pasting small votes for classification in large databases and on-line. Mach Learn 36(1–2):85–103
  2. Forman G (2003) An extensive empirical study of feature selection metrics for text classification. J Mach Learn Res 3:1289–1305
  3. Freund Y, Schapire RE (1997) A decision-theoretic generalization of on-line learning and an application to boosting. J Comput Syst Sci 55(1):119–139
  4. Freund Y, Seung H, Shamir E, Tishby N (1997) Selective sampling using the query by committee algorithm. Mach Learn 28(2–3):133–168
  5. Hagmann M (2000) Computers aid vaccine design. Science 290(5489):80–82
  6. Ho TK (1998) The random subspace method for constructing decision forests. IEEE Trans Pattern Anal Mach Intell 20(8):832–844
  7. Joachims T (1999) Making large-scale SVM learning practical. In: Schölkopf B, Burges C, Smola A (eds) Advances in kernel methods: support vector learning. MIT Press, Cambridge, MA, pp 41–56
  8. Kohavi R, John GH (1997) Wrappers for feature subset selection. Artif Intell 97(1–2):273–324
  9. Koller D, Sahami M (1996) Toward optimal feature selection. In: Saitta L (ed) Proceedings of the thirteenth international conference on machine learning. Morgan Kaufmann, Bari, Italy, pp 284–292
  10. Kononenko I, Hong SJ (1997) Attribute selection for modelling. Future Gener Comput Syst 13(2–3):181–195
  11. Lewis D, Catlett J (1994) Heterogeneous uncertainty sampling for supervised learning. In: Cohen W, Hirsh H (eds) Proceedings of the eleventh international conference on machine learning. Morgan Kaufmann, New Brunswick, NJ, pp 148–156
  12. Lewis D, Gale W (1994) Training text classifiers by uncertainty sampling. In: Smeaton AF (ed) Proceedings of the seventeenth annual international ACM SIGIR conference on research and development in information retrieval. ACM, Dublin, Ireland, pp 3–12
  13. Liu H, Motoda H (1998) Feature selection for knowledge discovery and data mining. Kluwer, Boston
  14. Mamitsuka H (2002) Iteratively selecting feature subsets for mining from high-dimensional databases. In: Elomaa T, Mannila H, Toivonen H (eds) Proceedings of the 6th European conference on principles and practice of knowledge discovery in databases. Springer, Berlin Heidelberg New York, pp 361–372
  15. Mamitsuka H (2003) Empirical evaluation of ensemble feature subset selection methods for learning from a high-dimensional database in drug design. In: Bourbakis N (ed) Proceedings of the third IEEE international symposium on bioinformatics and bioengineering. IEEE Computer Society Press, Bethesda, MD, pp 253–257
  16. Mamitsuka H, Abe N (2000) Efficient mining from large databases by query learning. In: Langley P (ed) Proceedings of the seventeenth international conference on machine learning. Morgan Kaufmann, Stanford, CA, pp 575–582
  17. Miller MA (2002) Chemical database techniques in drug discovery. Nat Rev Drug Discovery 1:220–227
  18. Ng A (1998) On feature selection: learning with exponentially many irrelevant features as training examples. In: Shavlik J (ed) Proceedings of the fifteenth international conference on machine learning. Morgan Kaufmann, Madison, WI, pp 404–412
  19. Provost F, Kolluri V (1999) A survey of methods for scaling up inductive algorithms. Data Min Knowl Discovery 3(2):131–169
  20. Quinlan J (1983) Learning efficient classification procedures and their applications to chess endgames. In: Michalski RS, Carbonell JG, Mitchell TM (eds) Machine learning: an artificial intelligence approach. Morgan Kaufmann, Palo Alto, CA, pp 463–482
  21. Quinlan J (1993) C4.5: programs for machine learning. Morgan Kaufmann, San Mateo, CA
  22. Rätsch G, Onoda T, Müller KR (2001) Soft margins for AdaBoost. Mach Learn 42(3):287–320
  23. Seung HS, Opper M, Sompolinsky H (1992) Query by committee. In: Haussler D (ed) Proceedings of the fifth annual workshop on computational learning theory. ACM, Pittsburgh, PA, pp 287–294
  24. Xing EP, Karp RM (2001) CLIFF: clustering of high-dimensional microarray data via feature filtering using normalized cuts. Bioinformatics 17(Suppl 1):S306–S315
  25. Xing EP, Jordan MI, Karp RM (2001) Feature selection for high-dimensional genomic microarray data. In: Brodley CE, Danyluk AP (eds) Proceedings of the eighteenth international conference on machine learning. Morgan Kaufmann, Williamstown, MA, pp 601–608

Copyright information

© Springer-Verlag London Ltd. 2005

Authors and Affiliations

  1. Institute for Chemical Research, Kyoto University, Gokasho, Japan
