Pattern Analysis and Applications, Volume 14, Issue 4, pp 395–413

CORES: fusion of supervised and unsupervised training methods for a multi-class classification problem

  • Igor T. Podolak
  • Adam Roman
Theoretical Advances

Abstract

This paper describes in full detail a model of a hierarchical classifier (HC). The original classification problem is broken down into several subproblems, and a weak classifier is built for each of them. Each subproblem consists of examples from a subset of the whole set of output classes. It is essential for this classification framework that the generated subproblems overlap, i.e. individual classes may belong to more than one subproblem. This approach makes it possible to reduce the overall risk. The individual classifiers built for the subproblems are weak, i.e. their accuracy is only slightly better than that of a random classifier. The notion of weakness for a multi-class model is extended in this paper and is more intuitive than the approaches proposed so far. In the HC model described here, after a single node is trained, its problem is split into several subproblems by a clustering algorithm, which groups classes that are classified similarly. The main focus of this paper is on finding the most appropriate clustering method. Several such algorithms are defined and compared. Finally, we compare the whole HC with other machine learning approaches.
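As a loose illustration of the scheme the abstract describes — overlapping groups of classes, one weak classifier trained per group, and a vote over the groups at prediction time — consider the following minimal Python sketch. The sliding-window grouping and the toy nearest-centroid weak learner are placeholders of our own, not the clustering methods or classifiers the paper actually studies.

```python
# Hypothetical sketch of the HC idea: split a multi-class problem into
# OVERLAPPING groups of classes (each class appears in several groups),
# train one weak classifier per group, and combine their votes.
from collections import defaultdict

def make_overlapping_groups(classes, group_size):
    """Form overlapping class subsets via a sliding window that wraps around."""
    n = len(classes)
    return [
        [classes[(i + j) % n] for j in range(group_size)]
        for i in range(n)
    ]

class CentroidWeakClassifier:
    """Toy weak learner: predicts the class whose 2-D centroid is nearest."""
    def fit(self, X, y):
        sums = defaultdict(lambda: [0.0, 0.0])
        counts = defaultdict(int)
        for (a, b), label in zip(X, y):
            sums[label][0] += a
            sums[label][1] += b
            counts[label] += 1
        self.centroids = {
            c: (s[0] / counts[c], s[1] / counts[c]) for c, s in sums.items()
        }
        return self

    def predict(self, x):
        return min(
            self.centroids,
            key=lambda c: (x[0] - self.centroids[c][0]) ** 2
                        + (x[1] - self.centroids[c][1]) ** 2,
        )

class HierarchicalClassifier:
    """One weak classifier per overlapping class group; majority vote overall."""
    def fit(self, X, y, group_size=2):
        self.groups = make_overlapping_groups(sorted(set(y)), group_size)
        self.members = []
        for group in self.groups:
            # Each member sees only the examples of its group's classes.
            idx = [i for i, label in enumerate(y) if label in group]
            clf = CentroidWeakClassifier().fit(
                [X[i] for i in idx], [y[i] for i in idx]
            )
            self.members.append(clf)
        return self

    def predict(self, x):
        votes = defaultdict(int)
        for clf in self.members:
            votes[clf.predict(x)] += 1
        return max(votes, key=votes.get)
```

With three classes and `group_size=2`, every class belongs to two of the three groups, so even when one member misclassifies a point, the overlapping members can outvote it — the risk-reduction effect the abstract attributes to overlapping subproblems.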

Keywords

Hierarchical classification · Neural networks · Multi-class classification · Clustering algorithms · Face recognition

Copyright information

© Springer-Verlag London Limited 2011

Authors and Affiliations

  1. Faculty of Mathematics and Computer Science, Institute of Computer Science, Jagiellonian University, Kraków, Poland