
Ranking-preserved generative label enhancement


Abstract

Label distribution learning (LDL) is effective for addressing label ambiguity. In LDL, ground-truth label distributions are rarely available due to the high annotation cost, whereas examples with logical labels are relatively easy to obtain. Hence, label enhancement (LE) has been proposed to automatically transform logical labels into label distributions. Most existing LE methods adopt discriminative approaches. However, discriminative approaches specialize in achieving strong predictive performance under supervised learning, and their capability is limited in LE, which lacks supervisory information. We therefore propose a generative LE model and infer label distributions via variational Bayes in a manner that preserves the label ranking within the logical label vector. Our method consists of a generation process and an inference process. In the generation process, we treat label distributions as latent variables and assume that they generate the logical labels and feature values of each instance as well as the logical labels of its neighbors. In the inference process, we design a function that mines label correlations and preserves the label ranking within the logical label vector, and use it to parameterize the variational posterior. Finally, we conduct extensive experiments to validate our proposal.


Availability of data and materials

The datasets used in this paper are public. We have provided the corresponding reference for each dataset.

Code availability

The code is available at https://github.com/yunan-lu/GLERB.

Notes

  1. LE methods usually normalize a real-valued vector, whose elements take values between 0 and 1 and may not sum to 1, to the form of a label distribution. We call this unnormalized real-valued vector the label confidence (a minimal normalization sketch follows these notes).

  2. In the Appendix, we discuss the demerits of other distributions w.r.t. modelling the variational posterior for a variable with a Beta prior.

  3. Since label relations play a vital role in this paper, we exclude the Yeast datasets, which contain fewer than five label variables.

  4. https://pytorch.org/docs/1.13/generated/torch.optim.Adam.html.
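As a concrete illustration of the normalization described in Note 1 (a minimal sketch of one common choice; the paper does not prescribe a specific scheme, and the function name below is ours):

```python
import numpy as np

def to_label_distribution(confidence):
    """Normalize a label-confidence vector (entries in [0, 1] that need not
    sum to 1) into a label distribution whose entries sum to 1."""
    c = np.asarray(confidence, dtype=float)
    return c / c.sum()

print(to_label_distribution([0.9, 0.2, 0.6]))  # [0.5294... 0.1176... 0.3529...]
```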


Funding

This work was supported by the National Natural Science Foundation of China (62176123).

Author information

Authors and Affiliations

Authors

Contributions

YL contributed to conceptualization, methodology, experiments, and writing. WL and HL contributed to reviewing. XJ contributed to conceptualization, methodology, and reviewing.

Corresponding author

Correspondence to Xiuyi Jia.

Ethics declarations

Conflict of interest

The authors declare that they have no known competing interests or conflicts of interest relevant to the content of this paper.

Ethics approval

Not applicable.

Consent to participate

Not applicable. The experiments in this paper do not involve animals, plants, or human subjects.

Consent for publication

Not applicable. The paper does not include data or images requiring permissions.

Additional information

Editor: Eyke Hüllermeier.

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendix

1.1 Why choose the Kumaraswamy distribution

Here we discuss in more detail the rationale for modeling the variational posterior as a Kumaraswamy distribution. We briefly present several probability distributions with support (0, 1) as follows:

$$\begin{aligned} \text {Beta}(x\mid \alpha ,\beta )&= \frac{x^{\alpha -1}(1-x)^{\beta -1}}{\mathrm {B}(\alpha ,\beta )}\\ \text {Arcsine}(x)&=\frac{1}{\pi \sqrt{x(1-x)}} \\ \text {PERT}(x)&= \frac{(x-a)^{\alpha -1}(c-x)^{\beta -1}}{\mathrm {B}(\alpha ,\beta )(c-a)^{\alpha +\beta -1}}\\ \text {Logit-normal}(x)&=\frac{1}{\sigma \sqrt{2\pi }}\,e^{-\frac{(\operatorname {logit}(x)-\mu )^{2}}{2\sigma ^{2}}}\frac{1}{x(1-x)}\\ \text {Kumaraswamy}(x\mid a,b)&= ab\,x^{a-1}(1-x^{a})^{b-1} \end{aligned}$$
(35)

In addition to the probability distributions listed in Eq. (35), there are probability distributions with more complicated forms, such as the truncated normal, U-quadratic, triangular, and trapezoidal distributions. Using these distributions to model the variational posterior raises several problems. The shapes of the Arcsine and Uniform distributions cannot be controlled by a learnable function; the PERT distribution makes the reparameterization trick difficult to apply; the Logit-normal and truncated normal distributions have no closed-form KL divergence w.r.t. the prior distribution (a Beta distribution); the triangular and trapezoidal distributions have piecewise density functions, which hinder gradient-based optimization; and the U-quadratic distribution cannot easily express a mean around 0.5. Among all these distributions, the Kumaraswamy distribution is the most suitable for modeling the variational posterior of a variable with a Beta prior, because it both admits an easy implementation of the reparameterization trick and has a closed-form KL divergence w.r.t. the Beta prior.
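Both advantages are straightforward to realize in code. The sketch below (ours, not part of the paper; names are hypothetical) draws a reparameterized sample through the closed-form inverse CDF \(x=(1-(1-u)^{1/b})^{1/a}\) with \(u\sim \text {Uniform}(0,1)\), and evaluates the KL divergence to a Beta prior via the closed-form expression derived by Nalisnick and Smyth (2017) for stick-breaking variational autoencoders, truncating its infinite series to a few terms:

```python
import torch

def sample_kumaraswamy(a, b, eps=1e-7):
    """Reparameterized sample: x = (1 - (1 - u)^(1/b))^(1/a), u ~ Uniform(0, 1).
    Gradients flow through a and b because u is parameter-free."""
    u = torch.rand_like(a).clamp(eps, 1 - eps)
    return (1 - (1 - u).pow(1 / b)).pow(1 / a)

def kl_kumaraswamy_beta(a, b, alpha, beta, n_terms=10):
    """KL(Kumaraswamy(a, b) || Beta(alpha, beta)), truncating the series
    over m to `n_terms` terms (10 is usually ample)."""
    euler_gamma = 0.5772156649015329
    log_beta_fn = torch.lgamma(alpha) + torch.lgamma(beta) - torch.lgamma(alpha + beta)
    kl = (a - alpha) / a * (-euler_gamma - torch.digamma(b) - 1 / b)
    kl = kl + torch.log(a * b) + log_beta_fn - (b - 1) / b
    series = torch.zeros_like(a)
    for m in range(1, n_terms + 1):
        # B(m/a, b) / (m + a*b), computed in log space for stability
        log_B = torch.lgamma(m / a) + torch.lgamma(b) - torch.lgamma(m / a + b)
        series = series + torch.exp(log_B) / (m + a * b)
    return kl + (beta - 1) * b * series

# Sanity check: Kumaraswamy(1, 1) and Beta(1, 1) are both Uniform(0, 1), so KL = 0.
one = torch.tensor(1.0)
print(kl_kumaraswamy_beta(one, one, one, one))  # tensor(0.)
```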

1.2 Proof of Theorem 1

Suppose that \({{\varvec{y}}}\) is any logical label vector, \({\textbf{P}}\) is the symmetrically normalized Laplacian of any label co-occurrence graph, and \({{\varvec{v}}}= {\varphi }({{\varvec{y}}},\alpha )\). Then \({{\varvec{v}}}\simeq {{\varvec{y}}}\) holds if \(\alpha \in (0, \alpha ^\star _{{\varvec{y}}})\), where \(\alpha _{{\varvec{y}}}^\star =\inf ({\mathbb {A}} \cup \{1\})\) and

$$\begin{aligned} {\mathbb {A}} = \bigcup _{\begin{array}{c} {i\in \{i\mid {y}_i=1\}}\\ {j\in \{j\mid {y}_j=0\}} \end{array}} \Big \{ \frac{1}{2}\le \alpha <1\mid {\varphi }_i({{\varvec{y}}}, \alpha ) = {\varphi }_j({{\varvec{y}}},\alpha ) \Big \}. \end{aligned}$$

Proof

Let \([{{\varvec{u}}}]^+\triangleq \inf \{ {u}_i\mid {y}_i=1 \}\) and \([{{\varvec{u}}}]^-\triangleq \sup \{{u}_i\mid {y}_i=0\}\), where \({{\varvec{u}}}\) is an \({m}\)-dimensional vector.

1) Proving that \({{\varvec{v}}}\simeq {{\varvec{y}}}\) holds for any \(\alpha \in (0, 2^{-1})\). We consider the source of \({{\varvec{v}}}={\varphi }({{\varvec{y}}}, \alpha )\), i.e., the iteration \({{\varvec{v}}}^{(t+1)} = \alpha {\textbf{P}}{{\varvec{v}}}^{(t)} + (1 - \alpha ) {{\varvec{y}}}\). Suppose that \({{\varvec{v}}}^{(t)}\in [0,1]^{m}\); then \({\textbf{P}}{{\varvec{v}}}^{(t)} \in [0, 1]^{m}\), and \({{\varvec{v}}}^{(t+1)}=\alpha {\textbf{P}}{{\varvec{v}}}^{(t)} + (1-\alpha ){{\varvec{y}}}\in [0,1]^{m}\). Since \({{\varvec{v}}}^{(0)}={{\varvec{y}}}\), we have \({{\varvec{v}}}^{(t)}\in [0,1]^{m}\) for all \(t\ge 0\). Moreover, \({{\varvec{v}}}^{(t)}\in [0,1]^{m}\) implies \([\alpha {\textbf{P}}{{\varvec{v}}}^{(t)}]^+\ge 0\) and \([\alpha {\textbf{P}}{{\varvec{v}}}^{(t)}]^-\le \alpha\), so \([{{\varvec{v}}}^{(t+1)}]^+\ge 1-\alpha\) and \([{{\varvec{v}}}^{(t+1)}]^-\le \alpha\); since \(0<\alpha <0.5\), we have \([{{\varvec{v}}}^{(t+1)}]^+>[{{\varvec{v}}}^{(t+1)}]^-\). Therefore, \([{{\varvec{v}}}^{(t)}]^+>[{{\varvec{v}}}^{(t)}]^-\) holds for all \(t\ge 0\), i.e., \({{\varvec{v}}}\simeq {{\varvec{y}}}\).

2) Proving that \({{\varvec{v}}}\simeq {{\varvec{y}}}\) holds for any \(\alpha \in [2^{-1},\alpha _{{\varvec{y}}}^\star )\). Suppose that there exists \(\alpha _0\in [2^{-1}, \alpha _{{\varvec{y}}}^\star )\) such that \(v_i < v_j\) for some \(i\in \{i\mid {y}_i=1\}\) and \(j\in \{j\mid {y}_j=0\}\). Then there must exist \(\alpha ^\prime \in [2^{-1}, \alpha _0)\) such that \(v_i=v_j\) (since \(v_i > v_j\) always holds for \(0<\alpha <2^{-1}\), and \({{\varvec{v}}}={\varphi }({{\varvec{y}}},\alpha )\) is continuous in \(\alpha\)), which contradicts the fact that \(\alpha ^\star _{{\varvec{y}}}\) is the minimum of the roots (between \(2^{-1}\) and 1) of all equations \(v_i=v_j\) with \((i,j)\in \{(i,j)\mid {y}_i=1\wedge {y}_j=0\}\). Therefore, \({{\varvec{v}}}\simeq {{\varvec{y}}}\) holds for any \(\alpha \in [2^{-1},\alpha _{{\varvec{y}}}^\star )\).

Therefore, \({{\varvec{v}}}\simeq {{\varvec{y}}}\) holds for any \(\alpha \in (0, \alpha _{{\varvec{y}}}^\star )\). \(\square\)
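Part 1) of the proof is easy to check numerically. The sketch below (ours; the paper does not include it) runs the iteration \({{\varvec{v}}}^{(t+1)} = \alpha {\textbf{P}}{{\varvec{v}}}^{(t)} + (1 - \alpha ){{\varvec{y}}}\) to convergence and verifies \([{{\varvec{v}}}]^+ > [{{\varvec{v}}}]^-\), using the matrix from the example in the next subsection:

```python
import numpy as np

def propagate(y, P, alpha, n_iter=200):
    """Iterate v <- alpha * P @ v + (1 - alpha) * y from v = y; for alpha < 1
    this converges to the fixed point (1 - alpha) * inv(I - alpha * P) @ y."""
    v = y.astype(float).copy()
    for _ in range(n_iter):
        v = alpha * P @ v + (1 - alpha) * y
    return v

def ranking_preserved(v, y):
    """[v]^+ > [v]^-: every positive label outranks every negative label."""
    return v[y == 1].min() > v[y == 0].max()

P = np.array([[1/2, 1/3, 1/6],
              [1/3, 1/2, 1/6],
              [1/6, 1/6, 2/3]])
y = np.array([1, 0, 1])
for alpha in (0.2, 0.49, 0.7):
    v = propagate(y, P, alpha)
    print(alpha, v.round(3), ranking_preserved(v, y))  # True for all three
```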

1.3 Example of calculating \(\alpha _{{\varvec{y}}}^\star\) in Theorem 1

Suppose that \({{\varvec{y}}}=[1,0,1]^\top\); the Laplacian matrix \({\textbf{P}}\) of the label co-occurrence graph can be factorized as:

$$\begin{aligned} {\textbf{P}}&= \begin{bmatrix} 2^{-1} & 3^{-1} & 6^{-1} \\ 3^{-1} & 2^{-1} & 6^{-1} \\ 6^{-1} & 6^{-1} & 2\times 3^{-1} \end{bmatrix} = {\textbf{U}} \begin{bmatrix} 1 & 0 & 0 \\ 0 & 0.17 & 0 \\ 0 & 0 & 0.5 \end{bmatrix} {\textbf{U}}^\top ,\\ \text {where } {\textbf{U}}&= [{{\varvec{u}}}_1,{{\varvec{u}}}_2,{{\varvec{u}}}_3]= \begin{bmatrix} -0.58 & -0.71 & -0.41 \\ -0.58 & 0.71 & -0.41 \\ -0.58 & 0 & 0.82 \end{bmatrix}. \end{aligned}$$

Then, we have:

$$\begin{aligned} \frac{1}{1-\alpha } {{\varvec{v}}}&=\frac{ {{\varvec{u}}}_1{{\varvec{u}}}_1^\top {{\varvec{y}}}}{1-\alpha }+\frac{{{\varvec{u}}}_2{{\varvec{u}}}_2^\top {{\varvec{y}}}}{1-0.17\alpha } +\frac{{{\varvec{u}}}_3{{\varvec{u}}}_3^\top {{\varvec{y}}}}{1-0.5\alpha } \\&= \begin{bmatrix} 0.67(1-\alpha )^{-1} + 0.5(1-0.17\alpha )^{-1} - 0.17(1-0.5\alpha )^{-1}\\ 0.67(1-\alpha )^{-1} - 0.5(1-0.17\alpha )^{-1} - 0.17(1-0.5\alpha )^{-1} \\ 0.67(1-\alpha )^{-1} + 0.33(1-0.5\alpha )^{-1} \end{bmatrix}. \end{aligned}$$

Finally, solving \(v_1=v_2\) and \(v_3 = v_2\) for \(2^{-1}\le \alpha <1\) yields no solution in either case, so \({\mathbb {A}} = \varnothing\) and \(\alpha _{{\varvec{y}}}^\star =\inf ({\mathbb {A}}\cup \{1\})=1\).
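When the equations \(v_i=v_j\) are hard to solve analytically, \(\alpha ^\star _{{\varvec{y}}}\) can also be approximated by scanning \(\alpha\) over a grid and evaluating the fixed point \({{\varvec{v}}}=(1-\alpha )({\textbf{I}}-\alpha {\textbf{P}})^{-1}{{\varvec{y}}}\) directly. A minimal sketch (ours; the paper does not include this procedure):

```python
import numpy as np

def alpha_star(y, P, grid=10000):
    """Approximate alpha*_y = inf(A ∪ {1}): the smallest alpha in [1/2, 1)
    at which some positive entry of v no longer exceeds every negative one."""
    m = len(y)
    pos, neg = y == 1, y == 0
    for alpha in np.linspace(0.5, 1.0, grid, endpoint=False):
        v = (1 - alpha) * np.linalg.solve(np.eye(m) - alpha * P, y.astype(float))
        if v[pos].min() <= v[neg].max():  # a crossing v_i = v_j has occurred
            return alpha
    return 1.0

P = np.array([[1/2, 1/3, 1/6],
              [1/3, 1/2, 1/6],
              [1/6, 1/6, 2/3]])
y = np.array([1, 0, 1])
print(alpha_star(y, P))  # 1.0, matching A = ∅ above
```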

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Lu, Y., Li, W., Li, H. et al. Ranking-preserved generative label enhancement. Mach Learn 112, 4693–4721 (2023). https://doi.org/10.1007/s10994-023-06388-9

