
Abstract

Dimensionality reduction is a commonly used step in machine learning, especially when dealing with a high-dimensional feature space. The original feature space is mapped onto a new, reduced-dimensionality space. Dimensionality reduction is usually performed by selecting a subset of the original dimensions, by constructing new dimensions, or both. This paper deals with feature subset selection for dimensionality reduction in machine learning. We provide a brief overview of the feature subset selection techniques that are commonly used in machine learning, with a detailed description of feature subset selection as commonly applied to text data. For illustration, we show the performance of several methods on document categorization of real-world data.
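To make the idea concrete, the following is a minimal sketch of one widely used scoring approach for feature subset selection on text data: ranking terms by information gain with respect to a binary class label and keeping only the top-scoring terms. The toy corpus, labels, and the cutoff k are hypothetical illustration values, not taken from the paper's experiments.

```python
import math

def entropy(pos, neg):
    """Binary entropy of a (pos, neg) count pair; 0 if either side is empty."""
    total = pos + neg
    if total == 0 or pos == 0 or neg == 0:
        return 0.0
    p = pos / total
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))

def information_gain(docs, labels, term):
    """Information gain of a term for a binary label, using document-level
    term presence (each doc is a set of terms, each label is 0 or 1)."""
    n = len(docs)
    pos = sum(labels)
    base = entropy(pos, n - pos)
    n1 = sum(1 for d in docs if term in d)                     # docs containing the term
    p1 = sum(y for d, y in zip(docs, labels) if term in d)     # of those, positive docs
    n0, p0 = n - n1, pos - p1
    cond = (n1 / n) * entropy(p1, n1 - p1) + (n0 / n) * entropy(p0, n0 - p0)
    return base - cond

# Hypothetical toy corpus: classify "sports" (1) vs "other" (0)
docs = [{"ball", "team", "win"}, {"ball", "goal"},
        {"stock", "market"}, {"market", "win"}]
labels = [1, 1, 0, 0]
vocab = sorted(set().union(*docs))
ranked = sorted(vocab, key=lambda t: information_gain(docs, labels, t),
                reverse=True)
top_k = ranked[:3]  # keep only the k best-scoring features
```

Here "ball" and "market" each perfectly separate the classes and get the maximum score, while "win" occurs equally in both classes and scores zero, so it would be discarded first.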

Keywords

Feature Selection, Dimensionality Reduction, Feature Subset, Feature Selection Method, Text Data



Copyright information

© Springer-Verlag Berlin Heidelberg 2006

Authors and Affiliations

  • Dunja Mladenić 1
  1. Jožef Stefan Institute, Ljubljana, Slovenia
