
Misclassification-guided loss under the weighted cross-entropy loss framework

  • Regular Paper
  • Published in: Knowledge and Information Systems

Abstract

As deep neural networks for visual recognition gain momentum, many studies have modified the loss function to improve classification performance on long-tailed data. Typical and effective strategies assign different weights to different classes or samples, yielding a series of cost-sensitive re-weighting cross-entropy losses. However, most of these strategies focus only on properties of the training data, such as the data distribution and the samples' distinguishability. This paper unifies these strategies under a weighted cross-entropy loss framework with a simple product form (\(\text {WCEL}_{\prod }\)), which accounts for the distinct characteristics of different losses. It further proposes a new loss function, the misclassification-guided loss (MGL), which generalizes the class-wise difficulty-balanced loss and utilizes the misclassification rate on validation data to update class weights during training. For MGL, a series of weighting functions with different relative preferences are introduced. Both softmax MGL and sigmoid MGL are derived to address multi-class and multi-label classification problems, respectively. Experiments are conducted on four public datasets (MNIST-LT, CIFAR-10-LT, CIFAR-100-LT, and ImageNet-LT) and a self-built dataset of 4 main-classes, 44 sub-classes, and altogether 57,944 images. The results show that on the self-built dataset, the exponential weighting function achieves higher balanced accuracy than the polynomial one. Ablation studies also show that MGL performs better in combination with most other state-of-the-art loss functions under the \(\text {WCEL}_{\prod }\) framework.
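To make the update rule described in the abstract concrete, the following is a minimal PyTorch-style sketch, not the authors' released implementation. The weighting functions `exp_weight` and `poly_weight`, the hyperparameters `tau` and `gamma`, the mean-normalization of the weights, and the once-per-epoch refresh schedule are all illustrative assumptions; the paper defines its own weighting-function family and schedule.

```python
import torch
import torch.nn.functional as F

# Two hypothetical weighting-function families mapping per-class
# validation misclassification rates err_c in [0, 1] to class weights.
# The exact forms and hyperparameters (tau, gamma) are assumptions.
def exp_weight(err, tau=2.0):
    return torch.exp(tau * err)          # exponential family

def poly_weight(err, gamma=1.0):
    return (1.0 + err) ** gamma          # polynomial family

def class_misclassification_rates(model, val_loader, num_classes, device):
    """Per-class error rate on validation data: err_c = 1 - accuracy_c."""
    correct = torch.zeros(num_classes, device=device)
    total = torch.zeros(num_classes, device=device)
    model.eval()
    with torch.no_grad():
        for x, y in val_loader:
            x, y = x.to(device), y.to(device)
            pred = model(x).argmax(dim=1)
            for c in range(num_classes):
                mask = y == c
                total[c] += mask.sum()
                correct[c] += (pred[mask] == c).sum()
    return 1.0 - correct / total.clamp(min=1.0)

def softmax_mgl(logits, targets, class_weights):
    """Softmax variant: each sample's cross-entropy term is rescaled
    by the weight of its ground-truth class."""
    return F.cross_entropy(logits, targets, weight=class_weights)

# Sketch of a training loop: refresh the class weights once per epoch.
# for epoch in range(num_epochs):
#     err = class_misclassification_rates(model, val_loader, C, device)
#     w = exp_weight(err)              # or poly_weight(err)
#     w = w / w.mean()                 # normalization is an assumption
#     for x, y in train_loader:
#         loss = softmax_mgl(model(x.to(device)), y.to(device), w)
#         loss.backward(); optimizer.step(); optimizer.zero_grad()
```

For the sigmoid (multi-label) variant mentioned in the abstract, the same per-class weights could be passed as the `pos_weight` argument of `F.binary_cross_entropy_with_logits`; this again mirrors the abstract's description rather than the paper's exact derivation.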




Acknowledgements

We thank Min Wang and Zhi-Heng Zhang for their valuable suggestions. This work is supported in part by the Nanchong Municipal Government-Universities Scientific Cooperation Project (Grant Nos. 23XNSYSX0062 and 23XNSYSX0013) and the Scientific Research Project of Sichuan Tourism University (Grant No. 2023SCTUZK97).

Author information

Contributions

Yan-Xue Wu was involved in methodology, funding acquisition, and writing—original draft; Kai Du took part in investigation, software, validation, and writing—review & editing; Xian-Jie Wang was involved in investigation, visualization, and writing—review & editing; and Fan Min took part in conceptualization, supervision, funding acquisition, and writing—review & editing.

Corresponding author

Correspondence to Fan Min.

Ethics declarations

Conflict of interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Wu, YX., Du, K., Wang, XJ. et al. Misclassification-guided loss under the weighted cross-entropy loss framework. Knowl Inf Syst (2024). https://doi.org/10.1007/s10115-024-02123-5
