Summary
Many complex pattern classification problems involve high-dimensional inputs as well as a large number of classes. In this chapter, we present a modular learning framework called the Binary Hierarchical Classifier (BHC) that takes a coarse-to-fine approach to dealing with a large number of output classes. The BHC decomposes a C-class problem into a set of C-1 two-(meta)class problems, arranged in a binary tree with C leaf nodes and C-1 internal nodes. Each internal node comprises a feature extractor and a classifier that discriminates between the two meta-classes represented by its two children. Both bottom-up and top-down approaches for building such a BHC are presented in this chapter. The Bottom-up Binary Hierarchical Classifier (BU-BHC) is built by applying agglomerative clustering to the set of C classes. The Top-down Binary Hierarchical Classifier (TD-BHC) is built by recursively partitioning the set of classes at each internal node into two disjoint groups or meta-classes. The coupled problems of finding a good partition and of searching for a linear feature extractor that best discriminates the two resulting meta-classes are solved simultaneously at each stage of the recursive algorithm. The hierarchical, multistage classification approach taken by the BHC also helps in dealing with high-dimensional data, since simpler feature spaces are often adequate for solving the two-(meta)class problems. In addition, it leads to the discovery of useful domain knowledge such as class hierarchies or ontologies, and yields more interpretable results.
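To make the tree structure concrete, below is a minimal Python sketch of a top-down, BHC-style classifier. It is illustrative only: the class-partitioning heuristic (2-means clustering of class mean vectors) and the per-node model (scikit-learn's LinearDiscriminantAnalysis, which provides both a linear discriminant direction and a binary decision) are assumed stand-ins for the coupled partition/feature-extraction procedure developed in the chapter, and the names build_bhc and predict_one are hypothetical.

```python
# Minimal top-down BHC-style sketch (illustrative, not the authors' algorithm).
import numpy as np
from sklearn.cluster import KMeans
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis


class BHCNode:
    def __init__(self, classes):
        self.classes = classes   # original class labels handled by this node
        self.model = None        # binary (meta-)class discriminator at internal nodes
        self.left = None         # child handling meta-class 0
        self.right = None        # child handling meta-class 1


def build_bhc(X, y, classes=None):
    """Recursively split the class set into two meta-classes (C-1 internal nodes)."""
    classes = sorted(set(y)) if classes is None else classes
    node = BHCNode(classes)
    if len(classes) == 1:        # leaf: a single original class remains
        return node
    # Stand-in partitioning step: cluster the class mean vectors into two groups.
    means = np.vstack([X[y == c].mean(axis=0) for c in classes])
    km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(means)
    left_cls = [c for c, g in zip(classes, km.labels_) if g == 0]
    right_cls = [c for c, g in zip(classes, km.labels_) if g == 1]
    # Train the internal-node discriminator on the two meta-classes.
    mask = np.isin(y, classes)
    meta_y = np.isin(y[mask], right_cls).astype(int)
    node.model = LinearDiscriminantAnalysis().fit(X[mask], meta_y)
    # Recurse on each meta-class, using only the samples it contains.
    node.left = build_bhc(X[mask][meta_y == 0], y[mask][meta_y == 0], left_cls)
    node.right = build_bhc(X[mask][meta_y == 1], y[mask][meta_y == 1], right_cls)
    return node


def predict_one(node, x):
    """Route a sample down the tree until a leaf (single class) is reached."""
    while len(node.classes) > 1:
        go_right = node.model.predict(x.reshape(1, -1))[0] == 1
        node = node.right if go_right else node.left
    return node.classes[0]
```

As in the chapter's formulation, the tree has C leaves and C-1 internal binary problems; each internal node only ever sees the samples belonging to its own meta-classes, so the individual decisions remain simple even when C is large.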