Summary
Many complex pattern classification problems involve high-dimensional inputs as well as a large number of classes. In this chapter, we present a modular learning framework called the Binary Hierarchical Classifier (BHC) that takes a coarse-to-fine approach to dealing with a large number of output classes. The BHC decomposes a C-class problem into a set of C-1 two-(meta)class problems, arranged in a binary tree with C leaf nodes and C-1 internal nodes. Each internal node comprises a feature extractor and a classifier that discriminates between the two meta-classes represented by its two children. Both bottom-up and top-down approaches for building such a BHC are presented in this chapter. The Bottom-up Binary Hierarchical Classifier (BU-BHC) is built by applying agglomerative clustering to the set of C classes. The Top-down Binary Hierarchical Classifier (TD-BHC) is built by recursively partitioning the set of classes at each internal node into two disjoint groups or meta-classes. The coupled problems of finding a good partition and of searching for a linear feature extractor that best discriminates the two resulting meta-classes are solved simultaneously at each stage of the recursive algorithm. The hierarchical, multistage classification approach taken by the BHC also helps in dealing with high-dimensional data, since simpler feature spaces are often adequate for solving the two-(meta)class problems. In addition, it leads to the discovery of useful domain knowledge such as class hierarchies or ontologies, and yields more interpretable results.
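To make the tree structure concrete, below is a minimal Python sketch of a top-down, BHC-style classifier. It is illustrative only: the class-partitioning heuristic (2-means clustering of class mean vectors) and the per-node model (scikit-learn's LinearDiscriminantAnalysis, which provides both a linear discriminant direction and a binary decision) are assumed stand-ins for the coupled partition/feature-extraction procedure developed in the chapter, and the names build_bhc and predict_one are hypothetical.

```python
# Minimal top-down BHC-style sketch (illustrative, not the authors' algorithm).
import numpy as np
from sklearn.cluster import KMeans
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis


class BHCNode:
    def __init__(self, classes):
        self.classes = classes   # original class labels handled by this node
        self.model = None        # binary (meta-)class discriminator at internal nodes
        self.left = None         # child handling meta-class 0
        self.right = None        # child handling meta-class 1


def build_bhc(X, y, classes=None):
    """Recursively split the class set into two meta-classes (C-1 internal nodes)."""
    classes = sorted(set(y)) if classes is None else classes
    node = BHCNode(classes)
    if len(classes) == 1:        # leaf: a single original class remains
        return node
    # Stand-in partitioning step: cluster the class mean vectors into two groups.
    means = np.vstack([X[y == c].mean(axis=0) for c in classes])
    km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(means)
    left_cls = [c for c, g in zip(classes, km.labels_) if g == 0]
    right_cls = [c for c, g in zip(classes, km.labels_) if g == 1]
    # Train the internal-node discriminator on the two meta-classes.
    mask = np.isin(y, classes)
    meta_y = np.isin(y[mask], right_cls).astype(int)
    node.model = LinearDiscriminantAnalysis().fit(X[mask], meta_y)
    # Recurse on each meta-class, using only the samples it contains.
    node.left = build_bhc(X[mask][meta_y == 0], y[mask][meta_y == 0], left_cls)
    node.right = build_bhc(X[mask][meta_y == 1], y[mask][meta_y == 1], right_cls)
    return node


def predict_one(node, x):
    """Route a sample down the tree until a leaf (single class) is reached."""
    while len(node.classes) > 1:
        go_right = node.model.predict(x.reshape(1, -1))[0] == 1
        node = node.right if go_right else node.left
    return node.classes[0]
```

As in the chapter's formulation, the tree has C leaves and C-1 internal binary problems; each internal node only ever sees the samples belonging to its own meta-classes, so the individual decisions remain simple even when C is large.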