
Automatic Discovery of Class Hierarchies via Output Space Decomposition

Chapter in Advanced Methods for Knowledge Discovery from Complex Data

Summary

Many complex pattern classification problems involve high-dimensional inputs as well as a large number of classes. In this chapter, we present a modular learning framework called the Binary Hierarchical Classifier (BHC) that takes a coarse-to-fine approach to dealing with a large number of output classes. The BHC decomposes a C-class problem into a set of C-1 two-(meta)class problems, arranged in a binary tree with C leaf nodes and C-1 internal nodes. Each internal node comprises a feature extractor and a classifier that discriminates between the two meta-classes represented by its two children. Both bottom-up and top-down approaches for building such a BHC are presented in this chapter. The Bottom-up Binary Hierarchical Classifier (BU-BHC) is built by applying agglomerative clustering to the set of C classes. The Top-down Binary Hierarchical Classifier (TD-BHC) is built by recursively partitioning the set of classes at each internal node into two disjoint groups or meta-classes. The coupled problems of finding a good partition and of searching for a linear feature extractor that best discriminates the two resulting meta-classes are solved simultaneously at each stage of the recursive algorithm. The hierarchical, multistage classification approach taken by the BHC also helps in dealing with high-dimensional data, since simpler feature spaces are often adequate for solving the two-(meta)class problems. In addition, it leads to the discovery of useful domain knowledge such as class hierarchies or ontologies, and yields more interpretable classifications.
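The bottom-up construction described above can be sketched as follows. This is a minimal illustration, not the chapter's implementation: it merges meta-classes agglomeratively by Euclidean distance between class mean vectors, whereas the BHC couples the merge/partition choice with discriminant-based feature extraction at each node. All names (`Node`, `build_bu_bhc`) and the toy class means are hypothetical.

```python
import numpy as np

class Node:
    """A meta-class in the BHC tree: leaves hold one class, internal
    nodes hold the union of their two children's classes."""
    def __init__(self, classes, left=None, right=None):
        self.classes = classes   # frozenset of class labels
        self.left = left
        self.right = right

def build_bu_bhc(class_means):
    """Agglomeratively merge the two closest meta-classes until a single
    root remains, yielding a binary tree with C leaves and C-1 internal
    nodes. class_means maps class label -> mean feature vector.
    Distance is a simplification (Euclidean between meta-class means)."""
    nodes = [Node(frozenset([c])) for c in class_means]
    means = {frozenset([c]): np.asarray(m, float) for c, m in class_means.items()}
    sizes = {frozenset([c]): 1 for c in class_means}
    while len(nodes) > 1:
        # Find the closest pair of current meta-classes.
        best = None
        for i in range(len(nodes)):
            for j in range(i + 1, len(nodes)):
                d = np.linalg.norm(means[nodes[i].classes] - means[nodes[j].classes])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        a, b = nodes[i], nodes[j]
        merged = Node(a.classes | b.classes, left=a, right=b)
        na, nb = sizes[a.classes], sizes[b.classes]
        means[merged.classes] = (na * means[a.classes] + nb * means[b.classes]) / (na + nb)
        sizes[merged.classes] = na + nb
        nodes = [n for k, n in enumerate(nodes) if k not in (i, j)] + [merged]
    return nodes[0]

def count_internal(node):
    """Internal nodes of the tree; should equal C-1 for C classes."""
    if node.left is None:
        return 0
    return 1 + count_internal(node.left) + count_internal(node.right)
```

For C = 4 toy classes with means clustered in two well-separated groups, the resulting tree has 4 leaves and 3 internal nodes, and its top split recovers the two natural coarse meta-classes before refining each into individual classes.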




Copyright information

© 2005 Dr Sanghamitra Bandyopadhyay

About this chapter

Cite this chapter

Ghosh, J., Kumar, S., Crawford, M.M. (2005). Automatic Discovery of Class Hierarchies via Output Space Decomposition. In: Advanced Methods for Knowledge Discovery from Complex Data. Advanced Information and Knowledge Processing. Springer, London. https://doi.org/10.1007/1-84628-284-5_2


  • Print ISBN: 978-1-85233-989-0

  • Online ISBN: 978-1-84628-284-3

  • eBook Packages: Computer Science (R0)
