
Adaptive Discriminative Regularization for Visual Classification

Published in International Journal of Computer Vision (2024).

Abstract

How to improve discriminative feature learning is central to classification. Existing works address this problem by explicitly increasing inter-class separability and intra-class compactness, either by constructing positive and negative pairs for contrastive learning or by imposing tighter class-separating margins. These methods do not exploit the similarity between different classes because they adhere to the assumption that the data are independent and identically distributed. In this paper, we embrace the real-world setting in which some classes share semantic overlap owing to their similar appearances or concepts. Based on this hypothesis, we propose a novel regularization to improve discriminative learning. We first calibrate the estimated highest likelihood of a sample based on its semantically neighboring classes, and then encourage the overall likelihood predictions to be deterministic by imposing an adaptive exponential penalty. Because the gradient of the proposed method is roughly proportional to the uncertainty of the predicted likelihoods, we name it adaptive discriminative regularization (ADR); it is trained along with a standard cross-entropy loss in classification. Extensive experiments demonstrate that ADR yields consistent and non-trivial performance improvements across a variety of visual classification tasks (over 10 benchmarks). Furthermore, we find it is robust to long-tailed and noisy-label data distributions. Its flexible design makes it compatible with mainstream classification architectures and losses.
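As a rough illustration of how the regularizer described above pairs with the standard classification loss, here is a minimal PyTorch-style sketch of the joint objective. The helper name `adr_loss` and the weight `lam` are hypothetical placeholders; the paper's exact formulation and weighting are given in the main text and in the appendix below.

```python
import torch
import torch.nn.functional as F

def joint_objective(logits, targets, adr_loss, lam=1.0):
    """Cross-entropy plus an ADR term, as described in the abstract.

    `adr_loss` and `lam` are illustrative placeholders; the paper's exact
    regularizer and weighting are defined in the main text and appendix.
    """
    ce = F.cross_entropy(logits, targets)      # standard classification loss
    probs = torch.softmax(logits, dim=-1)      # predicted likelihoods
    return ce + lam * adr_loss(probs)          # add the adaptive regularizer
```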

Data Availability

The datasets used and analyzed during the current study are available in the following public domain resources: https://image-net.org/index.php, https://www.cs.toronto.edu/~kriz/cifar.html, https://www2.eecs.berkeley.edu/Research/Projects/CS/vision/grouping/resources.html, https://www.kaggle.com/c/challenges-in-representation-learning-facial-expression-recognition-challenge/data, http://www.cbsr.ia.ac.cn/english/CASIA-WebFace-Database.html, https://www.robots.ox.ac.uk/~vgg/data/flowers/102/, http://host.robots.ox.ac.uk/pascal/VOC/, http://vis-www.cs.umass.edu/lfw/, http://whdeng.cn/CALFW/index.html, http://whdeng.cn/CPLFW/index.html, https://ibug.doc.ic.ac.uk/resources/agedb/, http://www.cfpw.io/, and http://rose1.ntu.edu.sg/datasets/actionrecognition.asp. The models and source data generated and analyzed during the current study are available from the corresponding author upon reasonable request.

Notes

  1. The InsightFace project: https://github.com/deepinsight/insightface.git

  2. PyTorch 1.9.0 documentation: https://pytorch.org/docs/stable/_modules/torch/optim/lr_scheduler.html

References

  • Arbelaez, P., Maire, M., Fowlkes, C., & Malik, J. (2010). Contour detection and hierarchical image segmentation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 33(5), 898–916.

  • Arora, S., Ge, R., Neyshabur, B., & Zhang, Y. (2018). Stronger generalization bounds for deep nets via a compression approach. In International Conference on Machine Learning, PMLR, pp 254–263.

  • Banburski, A., De La Torre, F., Pant, N., Shastri, I., & Poggio, T (2021). Distribution of classification margins: Are all data equal? arXiv preprint arXiv:2107.10199

  • Cao, D., Zhu, X., Huang, X., Guo, J., & Lei, Z. (2020). Domain balancing: Face recognition on long-tailed domains. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. pp 5671–5679.

  • Castellano, G., & Vessio, G. (2022). A deep learning approach to clustering visual arts. International Journal of Computer Vision, 130(11), 2590–2605.

  • De Boer, P. T., Kroese, D. P., Mannor, S., & Rubinstein, R. Y. (2005). A tutorial on the cross-entropy method. Annals of Operations Research, 134, 19–67.

  • Deng, J., Guo, J., Xue, N., & Zafeiriou, S. (2019). Arcface: Additive angular margin loss for deep face recognition. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp 4690–4699.

  • DeVries, T., & Taylor, G. W. (2017). Improved regularization of convolutional neural networks with cutout. arXiv preprint arXiv:1708.04552

  • Dosovitskiy, A., Beyer, L., Kolesnikov, A., Weissenborn, D., Zhai, X., Unterthiner, T., Dehghani, M., Minderer, M., Heigold, G., Gelly, S., & Uszkoreit, J. (2021). An image is worth 16x16 words: Transformers for image recognition at scale. ICLR.

  • Du, B., Ye, J., Zhang, J., Liu, J., & Tao, D. (2022). I3cl: Intra-and inter-instance collaborative learning for arbitrary-shaped scene text detection. International Journal of Computer Vision, 130(8), 1961–1977.

  • Dynkin, E. (1978). Sufficient statistics and extreme points. The Annals of Probability, 6(5), 705–730.

  • Everingham, M., Eslami, S. A., Van Gool, L., Williams, C. K., Winn, J., & Zisserman, A. (2015). The pascal visual object classes challenge: A retrospective. International Journal of Computer Vision, 111, 98–136.

  • Gong, C., Liu, T., Tang, Y., Yang, J., Yang, J., & Tao, D. (2017). A regularization approach for instance-based superset label learning. IEEE Transactions on Cybernetics, 48(3), 967–978.

  • Goodfellow, I. J., Erhan, D., Carrier, P. L., Courville, A., Mirza, M., Hamner, B., Cukierski, W., Tang, Y., Thaler, D., Lee, D. H., & Zhou, Y. (2013). Challenges in representation learning: A report on three machine learning contests. In: International conference on neural information processing, Springer, pp 117–124.

  • Guariglia, E. (2021). Fractional calculus, zeta functions and Shannon entropy. Open Mathematics, 19(1), 87–100.

  • Guo, C., Pleiss, G., Sun, Y., & Weinberger, K. Q. (2017). On calibration of modern neural networks. In International Conference on Machine Learning, PMLR, pp 1321–1330.

  • Guo, J., Zhu, X., Zhao, C., Cao, D., Lei, Z., & Li, S. Z. (2020). Learning meta face recognition in unseen domains. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp 6163–6172

  • Hadsell, R., Chopra, S., & LeCun, Y. (2006). Dimensionality reduction by learning an invariant mapping. In 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR’06), IEEE, pp 1735–1742.

  • He, K., Zhang, X., Ren, S., & Sun, J. (2016). Deep residual learning for image recognition. In: CVPR, pp 770–778.

  • Huang, G. B., Mattar, M., Berg, T., & Learned-Miller, E. (2008). Labeled faces in the wild: A database for studying face recognition in unconstrained environments. In Workshop on Faces in 'Real-Life' Images: Detection, Alignment, and Recognition.

  • Ji, X., Zhao, Q., Cheng, J., & Ma, C. (2021). Exploiting spatio-temporal representation for 3d human action recognition from depth map sequences. Knowledge-Based Systems, 227, 107040.

  • Kanezaki, A. (2018). Unsupervised image segmentation by backpropagation. In 2018 IEEE international conference on acoustics, speech and signal processing (ICASSP), IEEE, pp 1543–1547.

  • Khaireddin, Y., & Chen, Z. (2021). Facial emotion recognition: State of the art performance on fer2013. arXiv preprint arXiv:2105.03588

  • Kim, W., Kanezaki, A., & Tanaka, M. (2020). Unsupervised learning of image segmentation based on differentiable feature clustering. IEEE Transactions on Image Processing, 29, 8055–8068.

  • Kong, Y., & Fu, Y. (2022). Human action recognition and prediction: A survey. International Journal of Computer Vision, 130(5), 1366–1401.

  • Krizhevsky, A. (2009). Learning multiple layers of features from tiny images. Master’s thesis, University of Toronto.

  • Krizhevsky, A. (2014). One weird trick for parallelizing convolutional neural networks. arXiv preprint arXiv:1404.5997

  • Lee, C. Y., Xie, S., Gallagher, P., Zhang, Z., & Tu, Z. (2015). Deeply-supervised nets. In Artificial intelligence and statistics, PMLR, pp 562–570.

  • Li, H., Jiang, T., & Zhang, K. (2003). Efficient and robust feature extraction by maximum margin criterion. Advances in neural information processing systems. Vol. 16.

  • Liu, W., Wen, Y., Yu, Z., & Yang, M. (2016). Large-margin softmax loss for convolutional neural networks. In ICML, p 7.

  • Liu, W., Wen, Y., Yu, Z., Li, M., Raj, B., & Song, L. (2017). Sphereface: Deep hypersphere embedding for face recognition. In Proceedings of the IEEE conference on computer vision and pattern recognition, pp 212–220.

  • Moschoglou, S., Papaioannou, A., Sagonas, C., Deng, J., Kotsia, I., & Zafeiriou, S. (2017). Agedb: The first manually collected, in-the-wild age database. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, pp 51–59.

  • Müller, R., Kornblith, S., & Hinton, G. (2019). When does label smoothing help? arXiv preprint arXiv:1906.02629

  • Nilsback, M. E., & Zisserman, A. (2008). Automated flower classification over a large number of classes. In Indian Conference on Computer Vision, Graphics and Image Processing.

  • Qian, N. (1999). On the momentum term in gradient descent learning algorithms. Neural Networks, 12(1), 145–151.

  • Radosavovic, I., Kosaraju, R. P., Girshick, R., He, K., & Dollár, P. (2020). Designing network design spaces. In: CVPR.

  • Russakovsky, O., Deng, J., Su, H., et al. (2015). Imagenet large scale visual recognition challenge. International Journal of Computer Vision, 115(3), 211–252.

  • Schroff, F., Kalenichenko, D., Philbin, J. (2015). Facenet: A unified embedding for face recognition and clustering. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 815–823.

  • Sen, K., De Proft, F., Borgoo, A., et al. (2005). N-derivative of shannon entropy of shape function for atoms. Chemical Physics Letters, 410(1–3), 70–76.

  • Sengupta, S., Chen, J. C., Castillo, C., Patel, V. M., Chellappa, R., & Jacobs, D. W. (2016). Frontal to profile face verification in the wild. In 2016 IEEE Winter Conference on Applications of Computer Vision (WACV), IEEE, pp 1–9.

  • Shahroudy, A., Liu, J., Ng, T. T., & Wang, G. (2016). NTU RGB+D: A large scale dataset for 3D human activity analysis. In CVPR, pp 1010–1019.

  • Shi, Y., & Jain, A. K. (2019). Probabilistic face embeddings. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV).

  • Sun, Y., Cheng, C., Zhang, Y., Zhang, C., Zheng, L., Wang, Z., & Wei, Y. (2020). Circle loss: A unified perspective of pair similarity optimization. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp 6398–6407.

  • Szegedy, C., Vanhoucke, V., Ioffe, S., Shlens, J., & Wojna, Z., (2016). Rethinking the inception architecture for computer vision. In Proceedings of the IEEE conference on computer vision and pattern recognition, pp 2818–2826

  • Toneva, M., Sordoni, A., Combes, R. T. D., Trischler, A., Bengio, Y., & Gordon, G. J. (2019). An empirical study of example forgetting during deep neural network learning. In ICLR.

  • Touvron, H., Cord, M., Douze, M., Massa, F., Sablayrolles, A., & Jégou, H. (2021). Training data-efficient image transformers and distillation through attention. In International Conference on Machine Learning, pp 10,347–10,357.

  • Trockman, A., & Kolter, J. Z. (2022). Patches are all you need? arXiv preprint arXiv:2201.09792

  • Wang, D. B., Zhang, M. L., Li, L. (2021). Adaptive graph guided disambiguation for partial label learning. IEEE Transactions on Pattern Analysis and Machine Intelligence.

  • Wang, H., Wang, Y., Zhou, Z., Ji, X., Gong, D., Zhou, J., Li, Z., & Liu, W. (2018). Cosface: Large margin cosine loss for deep face recognition. In Proceedings of the IEEE conference on computer vision and pattern recognition, pp 5265–5274.

  • Wang, Q. W., Li, Y. F., Zhou, Z. H. (2019a). Partial label learning with unlabeled data. In IJCAI, pp 3755–3761.

  • Wang, Y., Ma, X., Chen, Z., Luo, Y., Yi, J., & Bailey, J. (2019b). Symmetric cross entropy for robust learning with noisy labels. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pp 322–330.

  • Wu, B., Jia, F., Liu, W., et al. (2018). Multi-label learning with missing labels using mixed dependency graphs. International Journal of Computer Vision, 126(8), 875–896.

  • Xu, N., Qiao, C., Geng, X., et al. (2021). Instance-dependent partial label learning. Advances in Neural Information Processing Systems, 34, 27,119-27,130.

  • Xu, X., Meng, Q., Qin, Y., Guo, J., Zhao, C., Zhou, F., & Lei, Z. (2021b). Searching for alignment in face recognition. In Proceedings of the AAAI Conference on Artificial Intelligence, pp 3065–3073.

  • Yao, Y., Deng, J., Chen, X., Gong, C., Wu, J., & Yang, J. (2020). Deep discriminative CNN with temporal ensembling for ambiguously-labeled image classification. In Proceedings of the AAAI Conference on Artificial Intelligence, pp 12,669–12,676.

  • Yi, D., Lei, Z., Liao, S., & Li, S. Z. (2014). Learning face representation from scratch. arXiv preprint arXiv:1411.7923

  • Yin, D., Pananjady, A., Lam, M., Papailiopoulos, D., Ramchandran, K., & Bartlett, P., (2018). Gradient diversity: A key ingredient for scalable distributed learning. In International Conference on Artificial Intelligence and Statistics, PMLR, pp 1998–2007.

  • Yuan, L., Tay, F. E., Li, G., Wang, T., & Feng, J. (2020). Revisiting knowledge distillation via label smoothing regularization. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp 3903–3911.

  • Zhang, C. B., Jiang, P. T., Hou, Q., et al. (2021). Delving deep into label smoothing. IEEE Transactions on Image Processing, 30, 5984–5996.

  • Zhang, L., Song, J., Gao, A., Chen, J., Bao, C., & Ma, K. (2019). Be your own teacher: Improve the performance of convolutional neural networks via self distillation. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pp 3713–3722.

  • Zhao, Q., Wang, Y., Dou, S., Gong, C., Wang, Y., & Zhao, C. (2022). Adaptive discriminative regularization for visual classification. arXiv preprint arXiv:2203.00833

  • Zheng, T., & Deng, W. (2018). Cross-pose lfw: A database for studying cross-pose face recognition in unconstrained environments. Beijing University of Posts and Telecommunications, Tech Rep, 5, 7.

  • Zheng, T., Deng, W., Hu, J. (2017). Cross-age lfw: A database for studying cross-age face recognition in unconstrained environments. arXiv preprint arXiv:1708.08197

  • Zhu, X., Liu, H., Lei, Z., et al. (2019). Large-scale bisample learning on id versus spot face recognition. International Journal of Computer Vision, 127(6), 684–700.

  • Zidek, J. V., & van Eeden, C. (2003). Uncertainty, entropy, variance and the effect of partial information. Lecture Notes-Monograph Series pp 155–167.

Acknowledgements

This work was supported in part by the National Natural Science Foundation of China (62076184, 61976158, 61976160, 62076182, 62276190), in part by the Fundamental Research Funds for the Central Universities and the State Key Laboratory of Integrated Services Networks (Xidian University), and in part by the Shanghai Innovation Action Project of Science and Technology (20511100700) and the Shanghai Natural Science Foundation (22ZR1466700). Thanks to Xiaopeng Ji (State Key Lab of CAD & CG, Zhejiang University, China; email: xp.ji@cad.zju.edu.cn) and Xinyang Jiang (Microsoft Research Asia (Shanghai), Shanghai, China; email: xinyangjiang@microsoft.com) for their help with this work.

Author information

Corresponding author

Correspondence to Cairong Zhao.

Additional information

Communicated by Liwei Wang.

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

A preprint version of this research work was put on arXiv (Zhao et al., 2022).

Appendix

Our adaptive discriminative regularization loss for one sample can be written as

$$\begin{aligned} \begin{aligned} \mathcal{L}_d({{\tilde{y}}}_i)&= \prod _{j=1}^{\tau } \frac{1}{\sqrt{2\pi \varphi _{i}}} \text {exp} \left\{ -\frac{1}{2\varphi _{i}}{{\hat{y}}}^2_{ij}\right\} \\&= \prod _{j=1}^{\tau } \mathcal{F}_j({{\hat{y}}}_{ij}), \\ \end{aligned} \end{aligned}$$
(10)

where \(\varphi _{i}\) is a function of \({{\tilde{y}}}_{i}\) and \({{\hat{y}}}_{i}\) is generated by the non-linear function \(\text {TopK}({\tilde{y}}_{i})\). The function \(\mathcal{F}_j({{\hat{y}}}_{ij})\) in Eq. (10) can be written as

$$\begin{aligned} \begin{aligned} \mathcal{F}_j({{\hat{y}}}_{ij})&= \mathcal{H}({{\hat{y}}}_{ij})\mathcal{E}({{\hat{y}}}_{ij}), \end{aligned} \end{aligned}$$
(11)

where \(\mathcal{H}({{\hat{y}}}_{ij}) = \frac{1}{\sqrt{2\pi \varphi _{i}}}\) is the base measure function and \(\mathcal{E}({{\hat{y}}}_{ij}) = \text {exp}\{-\frac{1}{2\varphi _{i}}{{\hat{y}}}^2_{ij}\}\) is the exponential term.
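To make Eqs. (10) and (11) concrete, the following is a minimal PyTorch sketch that evaluates \(\mathcal{L}_d\) for a single sample. Treating \(\varphi _{i}\) as the Shannon entropy of the Top-\(\tau \) likelihoods is an assumption made only for this illustration (motivated by the Shannon-entropy references used for Eq. (17)); the paper's precise definitions of \(\varphi _{i}\) and of the TopK calibration appear in the main text.

```python
import math
import torch

def adr_loss_single(y_tilde, tau=5, eps=1e-12):
    """Illustrative evaluation of Eq. (10) for one sample.

    y_tilde : 1-D tensor of predicted likelihoods (e.g., softmax outputs).
    tau     : number of top-ranked classes kept by TopK.
    phi     : assumed here to be the Shannon entropy of the Top-tau
              likelihoods; the paper defines varphi_i in the main text.
    """
    y_hat, _ = torch.topk(y_tilde, k=tau)             # \hat{y}_i = TopK(\tilde{y}_i)
    phi = -(y_hat * torch.log(y_hat + eps)).sum()     # assumed form of \varphi_i
    base = 1.0 / torch.sqrt(2 * math.pi * phi)        # base measure H(\hat{y}_{ij})
    expo = torch.exp(-y_hat.pow(2) / (2 * phi))       # exponential term E(\hat{y}_{ij})
    return (base * expo).prod()                       # product of the tau factors F_j
```

For instance, `adr_loss_single(torch.softmax(logits, dim=-1))` would return the scalar \(\mathcal{L}_d({{\tilde{y}}}_i)\) under this assumed \(\varphi _{i}\).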

During backward propagation, \( \frac{\partial \mathcal{L}_d({\tilde{y}}_i)}{\partial {{\tilde{y}}}_{i}} \) can be calculated as

$$\begin{aligned} \begin{aligned} \frac{\partial \mathcal{L}_d({{\tilde{y}}}_i)}{\partial {{\tilde{y}}}_{i}}&= \sum _{j=1}^{\tau } \left[ \mathcal{F}'_j({{\hat{y}}}_{ij}) \prod _{m\ne j}^{\tau } \mathcal{F}_m({{\hat{y}}}_{im}) \right] , \end{aligned} \end{aligned}$$
(12)

The derivative function \(\mathcal{F}'_j({{\hat{y}}}_{ij}) \) in Eq. (12) can be computed with

$$\begin{aligned} \begin{aligned} \mathcal{F}'_{j}({{\hat{y}}}_{ij})&= \mathcal{H}'({{\hat{y}}}_{ij})\mathcal{E}({{\hat{y}}}_{ij}) + \mathcal{E}'({{\hat{y}}}_{ij})\mathcal{H}({{\hat{y}}}_{ij}), \end{aligned} \end{aligned}$$
(13)

In Eq. (13), \(\mathcal{H}'({{\hat{y}}}_{ij}) \) and \(\mathcal{E}'({{\hat{y}}}_{ij}) \) can be calculated by

$$\begin{aligned} \begin{aligned} \frac{\partial \mathcal{H}({{\hat{y}}}_{ij})}{\partial {{\hat{y}}}_{ij}}&= \frac{1}{\sqrt{2\pi }}\left( -\frac{1}{2}\varphi _{i}^{-\frac{3}{2}}\right) \varphi _{ij}' \\&= -\frac{\varphi _{ij}'}{2\varphi _{i}} \mathcal{H}({{\hat{y}}}_{ij}),\\ \frac{\partial \mathcal{E}({{\hat{y}}}_{ij})}{\partial {{\hat{y}}}_{ij}}&= \left[ \frac{-{{\hat{y}}}_{ij}^2}{2\varphi _{i}} \right] '\mathcal{E}({{\hat{y}}}_{ij}) \\&= \left[ \frac{{{\hat{y}}}_{ij}^2\varphi _{ij}'-2{{\hat{y}}}_{ij}\varphi _{i}}{2\varphi _{i}^2}\right] \mathcal{E}({{\hat{y}}}_{ij}). \end{aligned} \end{aligned}$$
(14)

Putting Eq. (14) into Eq. (13), \(\mathcal{F}'_{j}({{\hat{y}}}_{ij}) \) can be rewritten as

$$\begin{aligned} \begin{aligned} \mathcal{F}'_{j}({{\hat{y}}}_{ij})&= \left[ \frac{{{\hat{y}}}_{ij}^2\varphi _{ij}'-2{{\hat{y}}}_{ij}\varphi _{i}}{2\varphi _{i}^2} - \frac{\varphi _{ij}'}{2\varphi _{i}} \right] \mathcal{F}_{j}({{\hat{y}}}_{ij}) \\&= \left[ \frac{{{\hat{y}}}_{ij}^2\varphi _{ij}'-2{{\hat{y}}}_{ij}\varphi _{i} - \varphi _{i}\varphi _{ij}'}{2\varphi _{i}^2} \right] \mathcal{F}_{j}({{\hat{y}}}_{ij}), \end{aligned} \end{aligned}$$
(15)

Then, putting Eq. (15) into Eq. (12), \( \frac{\partial \mathcal{L}_d({{\tilde{y}}}_i)}{\partial {{\tilde{y}}}_{i}} \) can be rewritten as

$$\begin{aligned} \begin{aligned} \frac{\partial \mathcal{L}_d({{\tilde{y}}}_i)}{\partial {{\tilde{y}}}_{i}}&= \sum _{j=1}^{\tau } \left[ \mathcal{F}'_j({{\hat{y}}}_{ij}) \prod _{m\ne j}^{\tau } \mathcal{F}_m({{\hat{y}}}_{im}) \right] \\&= \sum _{j=1}^{\tau } \left[ \left( \frac{{{\hat{y}}}_{ij}^2\varphi _{ij}'-2{{\hat{y}}}_{ij}\varphi _{i} - \varphi _{i}\varphi _{ij}'}{2\varphi _{i}^2} \right) \prod _{m=1}^{\tau }\mathcal{F}_m({{\hat{y}}}_{im}) \right] \\&= \sum _{j=1}^{\tau } \left[ \left( \frac{{{\hat{y}}}_{ij}^2\varphi _{ij}'-2{{\hat{y}}}_{ij}\varphi _{i} - \varphi _{i}\varphi _{ij}'}{2\varphi _{i}^2} \right) \mathcal{L}_d({{\tilde{y}}}_i)\right] , \end{aligned} \end{aligned}$$
(16)

where \(\varphi _{ij}' \) denotes the partial derivative of \(\varphi _{i}\) with respect to \({{\hat{y}}}_{ij} \). Following Sen et al. (2005) and Guariglia (2021), \(\varphi _{ij}' \) can be computed as

$$\begin{aligned} \begin{aligned} \frac{\partial \varphi _{i}}{\partial {{\hat{y}}}_{ij}}&= - \frac{\varphi _{i}+\text {log}({{\hat{y}}}_{ij})}{1-{{\hat{y}}}_{ij}}. \end{aligned} \end{aligned}$$
(17)
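As a worked illustration of the chain from Eq. (12) to Eq. (17), the sketch below evaluates the right-hand side of Eq. (16), using Eq. (17) for \(\varphi _{ij}'\). The helper name is hypothetical, and \(\varphi _{i}\) is passed in directly because its exact definition is given in the main text.

```python
import math
import torch

def adr_grad_closed_form(y_hat, phi):
    """Evaluate the right-hand side of Eq. (16), with Eq. (17) for phi'.

    y_hat : 1-D tensor of the tau calibrated Top-K likelihoods.
    phi   : scalar tensor varphi_i (its definition is given in the main text).
    """
    # Eq. (17): phi'_{ij} = -(phi + log y_hat_{ij}) / (1 - y_hat_{ij})
    phi_prime = -(phi + torch.log(y_hat)) / (1.0 - y_hat)
    # Eq. (10): L_d is the product of the tau Gaussian-like factors
    loss_d = (torch.exp(-y_hat.pow(2) / (2 * phi))
              / torch.sqrt(2 * math.pi * phi)).prod()
    # Eq. (16): sum over j of the bracketed coefficient, each times L_d
    coeff = (y_hat.pow(2) * phi_prime - 2 * y_hat * phi - phi * phi_prime) / (2 * phi**2)
    return (coeff * loss_d).sum()
```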

We also give the derivative \(\mathcal{L}'_e(p) \) of the entropy for binary classification. It can be calculated by

$$\begin{aligned} \begin{aligned} \frac{\partial \mathcal{L}_e(p)}{\partial p}&= -\left[ \log (p)-\log (1-p)\right] \\&= \log \left( \frac{1-p}{p}\right) . \end{aligned} \end{aligned}$$
(18)
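As a quick numerical sanity check of Eq. (18), the snippet below compares PyTorch's autograd derivative of the binary entropy with the closed form; the probability 0.3 is an arbitrary illustrative value.

```python
import torch

# Check Eq. (18): d/dp of L_e(p) = -[p log p + (1-p) log(1-p)] equals log((1-p)/p).
p = torch.tensor(0.3, requires_grad=True)
entropy = -(p * torch.log(p) + (1 - p) * torch.log(1 - p))
entropy.backward()
print(p.grad)                    # autograd:    ~0.8473
print(torch.log((1 - p) / p))    # closed form: ~0.8473
```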

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Zhao, Q., Wang, Y., Dou, S. et al. Adaptive Discriminative Regularization for Visual Classification. Int J Comput Vis (2024). https://doi.org/10.1007/s11263-024-02080-0
