
Misclassification-guided loss under the weighted cross-entropy loss framework

  • Regular Paper
  • Published in: Knowledge and Information Systems

Abstract

As deep neural networks for visual recognition gain momentum, many studies have modified the loss function to improve classification performance on long-tailed data. Typical and effective strategies assign different weights to different classes or samples, yielding a series of cost-sensitive re-weighting cross-entropy losses. However, most of these strategies focus only on properties of the training data, such as the data distribution and the samples' distinguishability. This paper unifies these strategies under a weighted cross-entropy loss framework with a simple product form (\(\text {WCEL}_{\prod }\)), which accounts for the distinct characteristics of different losses. It further proposes a new loss function, the misclassification-guided loss (MGL), which generalizes the class-wise difficulty-balanced loss and utilizes the misclassification rate on validation data to update class weights during training. For MGL, a series of weighting functions with different relative preferences are introduced. Both softmax MGL and sigmoid MGL are derived to address multi-class and multi-label classification problems, respectively. Experiments are conducted on four public datasets (MNIST-LT, CIFAR-10-LT, CIFAR-100-LT, and ImageNet-LT) and a self-built dataset of 4 main-classes, 44 sub-classes, and altogether 57,944 images. The results show that on the self-built dataset, the exponential weighting function achieves higher balanced accuracy than the polynomial one. Ablation studies also show that MGL performs better in combination with most other state-of-the-art loss functions under the \(\text {WCEL}_{\prod }\) framework.
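To make the update rule described in the abstract concrete, the following is a minimal PyTorch-style sketch, not the authors' released implementation. The weighting functions `exp_weight` and `poly_weight`, the hyperparameters `tau` and `gamma`, the mean-normalization of the weights, and the once-per-epoch refresh schedule are all illustrative assumptions; the paper defines its own weighting-function family and schedule.

```python
import torch
import torch.nn.functional as F

# Two hypothetical weighting-function families mapping per-class
# validation misclassification rates err_c in [0, 1] to class weights.
# The exact forms and hyperparameters (tau, gamma) are assumptions.
def exp_weight(err, tau=2.0):
    return torch.exp(tau * err)          # exponential family

def poly_weight(err, gamma=1.0):
    return (1.0 + err) ** gamma          # polynomial family

def class_misclassification_rates(model, val_loader, num_classes, device):
    """Per-class error rate on validation data: err_c = 1 - accuracy_c."""
    correct = torch.zeros(num_classes, device=device)
    total = torch.zeros(num_classes, device=device)
    model.eval()
    with torch.no_grad():
        for x, y in val_loader:
            x, y = x.to(device), y.to(device)
            pred = model(x).argmax(dim=1)
            for c in range(num_classes):
                mask = y == c
                total[c] += mask.sum()
                correct[c] += (pred[mask] == c).sum()
    return 1.0 - correct / total.clamp(min=1.0)

def softmax_mgl(logits, targets, class_weights):
    """Softmax variant: each sample's cross-entropy term is rescaled
    by the weight of its ground-truth class."""
    return F.cross_entropy(logits, targets, weight=class_weights)

# Sketch of a training loop: refresh the class weights once per epoch.
# for epoch in range(num_epochs):
#     err = class_misclassification_rates(model, val_loader, C, device)
#     w = exp_weight(err)              # or poly_weight(err)
#     w = w / w.mean()                 # normalization is an assumption
#     for x, y in train_loader:
#         loss = softmax_mgl(model(x.to(device)), y.to(device), w)
#         loss.backward(); optimizer.step(); optimizer.zero_grad()
```

For the sigmoid (multi-label) variant mentioned in the abstract, the same per-class weights could be passed as the `pos_weight` argument of `F.binary_cross_entropy_with_logits`; this again mirrors the abstract's description rather than the paper's exact derivation.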




Acknowledgements

We thank Min Wang and Zhi-Heng Zhang for their valuable suggestions. This work is supported in part by the Nanchong Municipal Government-Universities Scientific Cooperation Project (Grant Nos. 23XNSYSX0062 and 23XNSYSX0013) and the Scientific Research Project of Sichuan Tourism University (Grant No. 2023SCTUZK97).

Author information

Contributions

Yan-Xue Wu was involved in methodology, funding acquisition, and writing—original draft; Kai Du took part in investigation, software, validation, and writing—review & editing; Xian-Jie Wang was involved in investigation, visualization, and writing—review & editing; and Fan Min took part in conceptualization, supervision, funding acquisition, and writing—review & editing.

Corresponding author

Correspondence to Fan Min.

Ethics declarations

Conflict of interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Wu, YX., Du, K., Wang, XJ. et al. Misclassification-guided loss under the weighted cross-entropy loss framework. Knowl Inf Syst (2024). https://doi.org/10.1007/s10115-024-02123-5
