MMF: Multi-task Multi-structure Fusion for Hierarchical Image Classification

Li, Xiaoni; Zhou, Yucan; Zhou, Yu; Wang, Weiping

doi:10.1007/978-3-030-86380-7_6

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 12894)

Included in the following conference series:

International Conference on Artificial Neural Networks

Abstract

Hierarchical classification matters for complex tasks because it provides multi-granular predictions and encourages better mistakes. Because the label structure determines its performance, many existing approaches attempt to construct a good label structure to improve classification results. In this paper, we argue that different label structures provide different prior knowledge for category recognition, so fusing them helps achieve better hierarchical classification results. To this end, we propose a multi-task multi-structure fusion model that integrates different label structures. It contains two kinds of branches: one is the traditional classification branch, which classifies the common subclasses; the other identifies the heterogeneous superclasses defined by the different label structures. Beyond the effect of multiple label structures, we also explore the architecture of the deep model for better hierarchical classification and adjust the hierarchical evaluation metrics for multiple label structures. Experimental results on CIFAR100 and Car196 show that our method obtains significantly better results than a flat classifier or a hierarchical classifier with any single label structure.
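
The two kinds of branches described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the backbone, the loss, or how the branches are weighted, so the summed cross-entropy objective, the `superclass_weight` parameter, the toy label structures, and all function names below are assumptions made for illustration only. One head predicts the common subclasses, and one extra head per label structure predicts that structure's superclasses from the same shared features.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def linear(weights, bias, features):
    """Apply one linear head: one output logit per row of weights."""
    return [sum(w * x for w, x in zip(row, features)) + b
            for row, b in zip(weights, bias)]

def zero_head(n_out, n_in):
    """Hypothetical helper: a head with all-zero weights and biases."""
    return [[0.0] * n_in for _ in range(n_out)], [0.0] * n_out

def mmf_style_loss(features, subclass_head, superclass_heads,
                   subclass_label, structures, superclass_weight=1.0):
    """Assumed multi-task loss: subclass cross-entropy plus one superclass
    cross-entropy term per label structure.

    structures[k][c] gives the superclass of subclass c under structure k.
    """
    sub_probs = softmax(linear(*subclass_head, features))
    loss = -math.log(sub_probs[subclass_label])
    for head, mapping in zip(superclass_heads, structures):
        super_probs = softmax(linear(*head, features))
        loss += superclass_weight * -math.log(super_probs[mapping[subclass_label]])
    return loss

# Toy setup: 6 subclasses grouped by two different (made-up) label structures.
structures = [
    [0, 0, 0, 1, 1, 1],  # structure A: 2 superclasses
    [0, 1, 2, 0, 1, 2],  # structure B: 3 superclasses
]
features = [0.5, -1.0, 2.0, 0.1]  # stand-in for shared backbone features
subclass_head = zero_head(6, len(features))
superclass_heads = [zero_head(max(m) + 1, len(features)) for m in structures]

loss = mmf_style_loss(features, subclass_head, superclass_heads,
                      subclass_label=3, structures=structures)
# With all-zero heads every prediction is uniform, so the loss equals
# log(6) + log(2) + log(3).
print(loss)
```

In this sketch each label structure contributes its own supervision signal on the shared features, which is one plausible reading of "fusing" structures through multi-task learning; the paper itself should be consulted for the actual architecture and objective.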

Supported by the National Natural Science Foundation of China (No. 62006221), the Open Research Project of the State Key Laboratory of Media Convergence and Communication, Communication University of China, China (No. SKLMCC2020KF004), the Beijing Municipal Science & Technology Commission (Z191100007119002), and the Key Research Program of Frontier Sciences, CAS (Grant No. ZDBS-LY-7024).

Author information

Authors and Affiliations

Institute of Information Engineering, Chinese Academy of Sciences, Beijing, China
Xiaoni Li, Yucan Zhou, Yu Zhou & Weiping Wang
School of Cyber Security, University of Chinese Academy of Sciences, Beijing, China
Xiaoni Li

Corresponding author

Correspondence to Yucan Zhou.

Editor information

Editors and Affiliations

Comenius University in Bratislava, Bratislava, Slovakia
Igor Farkaš
iMotions A/S, Copenhagen, Denmark
Paolo Masulli
University of Tübingen, Tübingen, Baden-Württemberg, Germany
Sebastian Otte
Universität Hamburg, Hamburg, Germany
Stefan Wermter

About this paper

Cite this paper

Li, X., Zhou, Y., Zhou, Y., Wang, W. (2021). MMF: Multi-task Multi-structure Fusion for Hierarchical Image Classification. In: Farkaš, I., Masulli, P., Otte, S., Wermter, S. (eds) Artificial Neural Networks and Machine Learning – ICANN 2021. ICANN 2021. Lecture Notes in Computer Science, vol 12894. Springer, Cham. https://doi.org/10.1007/978-3-030-86380-7_6

DOI: https://doi.org/10.1007/978-3-030-86380-7_6
Published: 07 September 2021
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-86379-1
Online ISBN: 978-3-030-86380-7
eBook Packages: Computer Science, Computer Science (R0)
