MMF: Multi-task Multi-structure Fusion for Hierarchical Image Classification

  • Conference paper
Artificial Neural Networks and Machine Learning – ICANN 2021 (ICANN 2021)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 12894)

Abstract

Hierarchical classification matters for complex tasks because it provides multi-granular predictions and encourages better mistakes. Because its performance depends on the label structure, many existing approaches try to construct a good label structure to improve classification results. In this paper, we argue that different label structures provide different prior knowledge for category recognition, so fusing them helps achieve better hierarchical classification results. We therefore propose a multi-task multi-structure fusion model to integrate different label structures. It contains two kinds of branches: one is the traditional classification branch that classifies the common subclasses; the other identifies the heterogeneous superclasses defined by the different label structures. Beyond the effect of multiple label structures, we also explore the architecture of the deep model for better hierarchical classification and adjust the hierarchical evaluation metrics for multiple label structures. Experimental results on CIFAR100 and Car196 show that our method obtains significantly better results than a flat classifier or a hierarchical classifier with any single label structure.
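To make the two kinds of branches concrete, the following is a minimal sketch of how a multi-task objective over several label structures could be assembled. It is not the paper's implementation: the abstract does not state the loss, so the sum of cross-entropy terms, the `weight` parameter, the toy `STRUCTURES` mappings, and all function names are assumptions made purely for illustration.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(logits, target):
    """Cross-entropy of a single example given raw logits and a class index."""
    return -math.log(softmax(logits)[target])


# Hypothetical toy setup: 4 subclasses and two label structures.
# Each list maps a subclass index to its superclass index under that structure.
STRUCTURES = {
    "semantic": [0, 0, 1, 1],
    "visual": [0, 1, 0, 1],
}


def multi_task_loss(sub_logits, super_logits, sub_label,
                    structures=STRUCTURES, weight=1.0):
    """Subclass loss plus one superclass loss per label structure.

    sub_logits:   logits from the subclass (flat) branch
    super_logits: dict of structure name -> logits from that superclass branch
    weight:       assumed scalar weight on each superclass term
    """
    loss = cross_entropy(sub_logits, sub_label)
    for name, mapping in structures.items():
        # The superclass target is derived from the subclass label,
        # so no extra annotation is needed per structure.
        super_label = mapping[sub_label]
        loss += weight * cross_entropy(super_logits[name], super_label)
    return loss


uniform = multi_task_loss(
    [0.0, 0.0, 0.0, 0.0],
    {"semantic": [0.0, 0.0], "visual": [0.0, 0.0]},
    sub_label=2,
)
confident = multi_task_loss(
    [0.0, 0.0, 5.0, 0.0],
    {"semantic": [0.0, 5.0], "visual": [5.0, 0.0]},
    sub_label=2,
)
```

In this sketch every superclass branch shares the same subclass label and only differs in the mapping it applies, which is one simple way to let heterogeneous label structures supervise a common model.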

Supported by the National Natural Science Foundation of China (No. 62006221), the Open Research Project of the State Key Laboratory of Media Convergence and Communication, Communication University of China, China (No. SKLMCC2020KF004), the Beijing Municipal Science & Technology Commission (Z191100007119002), and the Key Research Program of Frontier Sciences, CAS (Grant No. ZDBS-LY-7024).

Corresponding author

Correspondence to Yucan Zhou.

Copyright information

© 2021 Springer Nature Switzerland AG

About this paper

Cite this paper

Li, X., Zhou, Y., Zhou, Y., Wang, W. (2021). MMF: Multi-task Multi-structure Fusion for Hierarchical Image Classification. In: Farkaš, I., Masulli, P., Otte, S., Wermter, S. (eds) Artificial Neural Networks and Machine Learning – ICANN 2021. ICANN 2021. Lecture Notes in Computer Science, vol 12894. Springer, Cham. https://doi.org/10.1007/978-3-030-86380-7_6

  • DOI: https://doi.org/10.1007/978-3-030-86380-7_6

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-86379-1

  • Online ISBN: 978-3-030-86380-7

  • eBook Packages: Computer Science (R0)
