Taxonomic Prediction with Tree-Structured Covariances

  • Matthew B. Blaschko
  • Wojciech Zaremba
  • Arthur Gretton
Part of the Lecture Notes in Computer Science book series (LNCS, volume 8189)


Taxonomies have been proposed numerous times in the literature in order to encode semantic relationships between classes. Such taxonomies have been used to improve classification results by increasing the statistical efficiency of learning, as similarities between classes can be used to increase the amount of relevant data during training. In this paper, we show how data-derived taxonomies may be used in a structured prediction framework, and compare the performance of learned and semantically constructed taxonomies. Structured prediction in this case is multi-class categorization with the assumption that categories are taxonomically related. We make three main contributions: (i) we prove the equivalence between tree-structured covariance matrices and taxonomies; (ii) we use this covariance representation to develop a highly computationally efficient optimization algorithm for structured prediction with taxonomies; (iii) we show that the taxonomies learned from data using the Hilbert-Schmidt Independence Criterion (HSIC) often perform better than imputed semantic taxonomies. Source code of this implementation, as well as machine-readable learned taxonomies, are available for download from
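To make the link between taxonomies and tree-structured covariance matrices concrete, the following is a minimal sketch and not the authors' implementation. It assumes the usual definition of a tree-structured covariance: the entry for two classes is the total branch length shared by their root-to-leaf paths. The function name `tree_covariance`, the toy taxonomy, and the branch lengths are all hypothetical and chosen only for illustration.

```python
import numpy as np


def tree_covariance(parent, length, leaves):
    """Build a covariance matrix from a rooted tree (hypothetical helper).

    parent: dict mapping each node to its parent (the root maps to None).
    length: dict mapping each non-root node to the length of the edge
            joining it to its parent.
    leaves: ordered list of leaf nodes (the classes).
    """
    def path_edges(node):
        # Each edge is identified by its child endpoint.
        edges = set()
        while parent[node] is not None:
            edges.add(node)
            node = parent[node]
        return edges

    paths = [path_edges(leaf) for leaf in leaves]
    n = len(leaves)
    B = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            # Shared edges lie on the path from the root to the
            # most recent common ancestor of leaves i and j.
            B[i, j] = sum(length[e] for e in paths[i] & paths[j])
    return B


# Assumed toy taxonomy: root -> {animal -> {cat, dog}, car}
parent = {"root": None, "animal": "root", "car": "root",
          "cat": "animal", "dog": "animal"}
length = {"animal": 1.0, "car": 2.0, "cat": 1.0, "dog": 1.0}
leaves = ["cat", "dog", "car"]

B = tree_covariance(parent, length, leaves)
print(B)
```

In this toy example, `cat` and `dog` share the edge into `animal` and so receive a positive covariance, while `car` shares no edge with either and receives zero. Reading the matrix back into a tree is the other direction of the equivalence stated in the abstract and is not shown here.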





Copyright information

© Springer-Verlag Berlin Heidelberg 2013

Authors and Affiliations

  • Matthew B. Blaschko (1, 2)
  • Wojciech Zaremba (1, 2)
  • Arthur Gretton (3)
  1. Center for Visual Computing, École Centrale Paris, France
  2. Équipe Galen, INRIA Saclay, Île-de-France, France
  3. Gatsby Computational Neuroscience Unit, University College London, UK
