Improving Supervised Learning by Feature Decomposition

  • Oded Maimon
  • Lior Rokach
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 2284)

Abstract

This paper presents the Feature Decomposition Approach for improving supervised learning tasks. While in Feature Selection the aim is to identify a representative set of features from which to construct a classification model, in Feature Decomposition the goal is to decompose the original set of features into several subsets. A classification model is built for each subset, and all generated models are then combined. This paper presents theoretical and practical aspects of the Feature Decomposition Approach. A greedy procedure, called DOT (Decomposed Oblivious Trees), is developed to decompose the input feature set into subsets and to build a classification model for each subset separately. The results of an empirical comparison with well-known learning algorithms (such as C4.5) indicate the superiority of the Feature Decomposition Approach in learning tasks that contain a high number of features and a moderate number of tuples.
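To make the decompose-then-combine idea concrete, the sketch below partitions the feature set into disjoint subsets, trains one decision tree per subset, and combines the models by summing their log class probabilities (a naive-Bayes-style combination). This is a minimal illustration of the general approach, not the DOT procedure itself; the random partition, the scikit-learn decision-tree base learner, and the breast-cancer dataset are assumptions made only for the example.

import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Load an example dataset and hold out a test set.
X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Decompose the original feature set into k disjoint subsets
# (a random partition here; DOT searches for a good partition greedily).
k = 3
rng = np.random.default_rng(0)
feature_subsets = np.array_split(rng.permutation(X.shape[1]), k)

# Build one classification model per feature subset.
models = [
    DecisionTreeClassifier(max_depth=4, random_state=0).fit(X_train[:, subset], y_train)
    for subset in feature_subsets
]

# Combine the generated models: sum log class probabilities across subsets
# (a naive-Bayes-style combination), then predict the most probable class.
log_proba = sum(
    np.log(model.predict_proba(X_test[:, subset]) + 1e-12)
    for model, subset in zip(models, feature_subsets)
)
y_pred = log_proba.argmax(axis=1)
print("combined accuracy:", (y_pred == y_test).mean())

Each per-subset model sees only a few features and is therefore simpler than a single model built over the full feature set, which is the regime (many features, moderately many tuples) where the paper reports the approach's advantage.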

Keywords

Feature Selection · Terminal Node · Unlabeled Data · Target Attribute · Generalization Error


References

  1. Ali K. M., Pazzani M. J., Error Reduction through Learning Multiple Descriptions, Machine Learning, 24(3): 173–202, 1996.
  2. Almuallim H. and Dietterich T. G., Learning Boolean concepts in the presence of many irrelevant features. Artificial Intelligence, 69(1–2): 279–306, 1994.
  3. Attneave, F., Applications of Information Theory to Psychology. Holt, Rinehart and Winston, 1959.
  4. Bay, S., Nearest neighbor classification from multiple feature subsets. Intelligent Data Analysis, 3(3): 191–209, 1999.
  5. Bellman, R., Adaptive Control Processes: A Guided Tour, Princeton University Press, 1961.
  6. Blum, A. and Mitchell, T., "Combining Labeled and Unlabeled Data with Co-training", COLT: Proceedings of the Workshop on Computational Learning Theory, Morgan Kaufmann Publishers, 1998.
  7. Buntine, W., "Graphical Models for Discovering Knowledge", in U. Fayyad, G. Piatetsky-Shapiro, P. Smyth, and R. Uthurusamy, editors, Advances in Knowledge Discovery and Data Mining, pp. 59–82, AAAI/MIT Press, 1996.
  8. Chan, P. K. and Stolfo, S. J., A Comparative Evaluation of Voting and Meta-learning on Partitioned Data, Proc. 12th Intl. Conf. on Machine Learning (ICML-95), 1995.
  9. Dietterich, T. G., and Bakiri, G., Solving multiclass learning problems via error-correcting output codes. Journal of Artificial Intelligence Research, 2: 263–286, 1995.
  10. Domingos, P., and Pazzani, M., "On the Optimality of the Simple Bayesian Classifier under Zero-One Loss," Machine Learning, 29: 103–130, 1997.
  11. Duda, R., and Hart, P., Pattern Classification and Scene Analysis, New York, NY: Wiley, 1973.
  12. Dunteman, G. H., Principal Components Analysis, Sage Publications, 1989.
  13. Fayyad, U., Piatetsky-Shapiro, G., and Smyth, P., "From Data Mining to Knowledge Discovery: An Overview," in U. Fayyad, G. Piatetsky-Shapiro, P. Smyth, and R. Uthurusamy, editors, Advances in Knowledge Discovery and Data Mining, pp. 1–30, MIT Press, 1996.
  14. Friedman, J. H., and Tukey, J. W., "A Projection Pursuit Algorithm for Exploratory Data Analysis," IEEE Transactions on Computers, 23(9): 881–889, 1974.
  15. Friedman, J. H., "On bias, variance, 0/1-loss and the curse of dimensionality," Data Mining and Knowledge Discovery, 1(1): 55–77, 1997.
  16. Fukunaga, K., Introduction to Statistical Pattern Recognition. San Diego, CA: Academic Press, 1990.
  17. Hwang J., Lay S., and Lippman A., "Nonparametric multivariate density estimation: A comparative study," IEEE Transactions on Signal Processing, 42: 2795–2810, 1994.
  18. Jimenez, L. O., and Landgrebe, D. A., "Supervised Classification in High-Dimensional Space: Geometrical, Statistical, and Asymptotical Properties of Multivariate Data," IEEE Transactions on Systems, Man, and Cybernetics, Part C: Applications and Reviews, 28: 39–54, 1998.
  19. Kim, J. O., and Mueller, C. W., Factor Analysis: Statistical Methods and Practical Issues. Sage Publications, 1978.
  20. Kononenko, I., "Comparison of inductive and naive Bayesian learning approaches to automatic knowledge acquisition," in Current Trends in Knowledge Acquisition, IOS Press, 1990.
  21. Kononenko, I., "Semi-naive Bayesian classifier," in Proceedings of the Sixth European Working Session on Learning, Springer-Verlag, pp. 206–219, 1991.
  22. Kusiak, A., Decomposition in Data Mining: An Industrial Case Study, IEEE Transactions on Electronics Packaging Manufacturing, 23(4): 345–353, 2000.
  23. Langley, P., "Selection of relevant features in machine learning," in Proceedings of the AAAI Fall Symposium on Relevance, AAAI Press, 1994.
  24. Langley, P. and Sage, S., Oblivious decision trees and abstract cases. Working Notes of the AAAI-94 Workshop on Case-Based Reasoning, Seattle, WA: AAAI Press, pp. 113–117.
  25. Liu, H. and Motoda, H., Feature Selection for Knowledge Discovery and Data Mining, Kluwer Academic Publishers, 1998.
  26. Maimon, O., and Last, M., Knowledge Discovery and Data Mining: The Info-Fuzzy Network (IFN) Methodology, Kluwer Academic Publishers, 2000.
  27. Maimon, O. and Rokach, L., "Data Mining by Attribute Decomposition with semiconductors manufacturing case study," in D. Braha, editor, Data Mining for Design and Manufacturing: Methods and Applications, Kluwer Academic Publishers, 2001.
  28. Mansour, Y., and McAllester, D., Generalization Bounds for Decision Trees, COLT 2000, pp. 220–224.
  29. Merz, C. J., and Murphy, P. M., UCI Repository of Machine Learning Databases. Irvine, CA: University of California, Department of Information and Computer Science, 1998.
  30. Michie, D., "Problem decomposition and the learning of skills," in Proceedings of the European Conference on Machine Learning, Springer-Verlag, pp. 17–31, 1995.
  31. Pfahringer, B., "Controlling constructive induction in CiPF," in Proceedings of the European Conference on Machine Learning, Springer-Verlag, pp. 242–256, 1994.
  32. Pickard, L., Kitchenham, B., and Linkman, S., "An investigation of analysis techniques for software datasets," in Proc. 6th IEEE Intl. Software Metrics Symposium, Boca Raton, FL: IEEE Computer Society, 1999.
  33. Quinlan, J. R., C4.5: Programs for Machine Learning, Morgan Kaufmann, 1993.
  34. Ridgeway, G., Madigan, D., Richardson, T., and O'Kane, J., "Interpretable Boosted Naive Bayes Classification," Proceedings of the Fourth International Conference on Knowledge Discovery and Data Mining, pp. 101–104, 1998.
  35. Salzberg, S. L., "On Comparing Classifiers: Pitfalls to Avoid and a Recommended Approach," Data Mining and Knowledge Discovery, 1: 312–327, Kluwer Academic Publishers, Boston, 1997.
  36. Schmitt, M., On the complexity of computing and learning with multiplicative neural networks, to appear in Neural Computation, 2001.
  37. Schlimmer, J. C., Efficiently inducing determinations: A complete and systematic search algorithm that uses optimal pruning. In Proceedings of the 1993 International Conference on Machine Learning, pp. 284–290, San Mateo, CA: Morgan Kaufmann, 1993.
  38. Shapiro, A. D., Structured Induction in Expert Systems, Turing Institute Press in association with Addison-Wesley Publishing Company, 1987.
  39. Vapnik, V. N., The Nature of Statistical Learning Theory. Springer-Verlag, New York, 1995.
  40. Wallace, C. S., MML Inference of Predictive Trees, Graphs and Nets. In Computational Learning and Probabilistic Reasoning, A. Gammerman (ed.), Wiley, pp. 43–66, 1996.
  41. Walpole, R. E., and Myers, R. H., Probability and Statistics for Engineers and Scientists, pp. 268–272, 1986.
  42. Zaki, M. J., and Ho, C. T., eds., Large-Scale Parallel Data Mining. New York: Springer-Verlag, 2000.
  43. Zupan, B., Bohanec, M., Demsar, J., and Bratko, I., "Feature transformation by function decomposition," IEEE Intelligent Systems & Their Applications, 13: 38–43, 1998.

Copyright information

© Springer-Verlag Berlin Heidelberg 2002

Authors and Affiliations

  • Oded Maimon (1)
  • Lior Rokach (1)
  1. Department of Industrial Engineering, Tel-Aviv University, Tel Aviv, Israel