Abstract
Label distribution learning (LDL) is effective for addressing label ambiguity. In LDL, ground-truth label distributions are rarely available due to the high annotation cost, whereas examples with logical labels are relatively easy to obtain. Hence, label enhancement (LE) has been proposed to automatically transform logical labels into label distributions. Most existing LE methods employ discriminative approaches. However, discriminative approaches specialize in achieving better predictive performance under supervised learning, and their capability is limited in LE, which lacks supervisory information. Therefore, we propose a generative LE model and infer label distributions via variational Bayes in a way that preserves the label ranking within the logical label vector. Our method consists of a generation process and an inference process. In the generation process, we treat label distributions as latent variables and assume that they generate the logical labels and feature values of the instance itself, as well as the logical labels of its neighbors. In the inference process, we design a function that mines label correlations and preserves the label ranking within the logical label vector to parameterize the variational posterior. Finally, we conduct extensive experiments to validate our proposal.
Availability of data and materials
The datasets used in this paper are publicly available, and the corresponding reference for each dataset is provided.
Code availability
The code is available at https://github.com/yunan-lu/GLERB.
Notes
LE methods usually normalize a real-valued vector, whose elements take values between 0 and 1 and may not sum to 1, into the form of a label distribution. We call this unnormalized real-valued vector the label confidence.
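As a concrete illustration of this normalization step, here is a minimal Python sketch (the helper name is our own, not from the paper):

```python
def to_label_distribution(confidence):
    """Normalize a label-confidence vector (entries in [0, 1] that need
    not sum to 1) into a label distribution whose entries sum to 1."""
    total = sum(confidence)
    return [c / total for c in confidence]

# e.g. [0.9, 0.3, 0.3] maps to roughly [0.6, 0.2, 0.2]
d = to_label_distribution([0.9, 0.3, 0.3])
assert abs(sum(d) - 1.0) < 1e-12
```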
In the Appendix, we discuss the drawbacks of additional distributions for modelling the variational posterior of a variable with a Beta prior.
Since label relations play a vital role in this paper, we exclude the Yeast datasets, which contain fewer than five label variables.
References
Burges, C., Shaked, T., Renshaw, E., Lazier, A., Deeds, M., Hamilton, N., & Hullender, G. (2005). Learning to rank using gradient descent. In International conference on machine learning (pp. 89–96).
Chen, K., Kämäräinen, J., & Zhang, Z. (2016). Facial age estimation using robust label distribution. In ACM international conference on multimedia (pp. 77–81).
Crammer, K., & Singer, Y. (2001). Pranking with ranking. In Advances in neural information processing systems.
Deng, Z., Zhao, M., Liu, H., Yu, Z., & Feng, F. (2020). Learning neighborhood-reasoning label distribution (NRLD) for facial age estimation. In IEEE international conference on multimedia and expo (pp. 1–6).
Gao, B., Zhou, H., Wu, J., & Geng, X. (2018). Age estimation using expectation of label distribution learning. In International joint conference on artificial intelligence (pp. 712–718).
Gayar, N., Schwenker, F., & Palm, G. (2006). A study of the robustness of knn classifiers trained using soft labels. In Artificial neural networks in pattern recognition (pp. 67–80).
Geng, X., & Xia, Y. (2014). Head pose estimation based on multivariate label distribution. In IEEE conference on computer vision and pattern recognition (pp. 1837–1842).
Geng, X. (2016). Label distribution learning. IEEE Transactions on Knowledge and Data Engineering, 28(7), 1734–1748.
Geng, X., Smith-Miles, K., & Zhou, Z.-H. (2013). Facial age estimation by learning from label distributions. IEEE Transactions on Pattern Analysis and Machine Intelligence, 35, 2401–2412.
Guiver, J., & Snelson, E. (2009). Bayesian inference for Plackett-Luce ranking models. In International conference on machine learning (pp. 377–384).
Guo, B., Han, S., Han, X., Huang, H., & Lu, T. (2021). Label confusion learning to enhance text classification models. In AAAI conference on artificial intelligence (pp. 12929–12936).
Hou, P., Geng, X., & Zhang, M.-L. (2016). Multi-label manifold learning. In AAAI conference on artificial intelligence (pp. 1680–1686).
Hou, P., Geng, X., Huo, Z., & Lv, J. (2017). Semi-supervised adaptive label distribution learning for facial age estimation. In AAAI conference on artificial intelligence (pp. 2015–2021).
Jebara, T. (2012). Machine learning: Discriminative and generative (Vol. 755). Springer.
Jia, X., Lu, Y., & Zhang, F. (2023). Label enhancement by maintaining positive and negative label relation. IEEE Transactions on Knowledge and Data Engineering, 35(2), 1708–1720.
Jiang, X., Yi, Z., & Lv, J. (2006). Fuzzy SVM with a new fuzzy membership function. Neural Computing & Applications, 15(3), 268–276.
Jia, X., Shen, X., Li, W., Lu, Y., & Zhu, J. (2023). Label distribution learning by maintaining label ranking relation. IEEE Transactions on Knowledge and Data Engineering, 35(2), 1695–1707.
Kendall, A., Gal, Y., & Cipolla, R. (2018). Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In IEEE conference on computer vision and pattern recognition (pp. 7482–7491).
Kingma, D., & Ba, J. (2015). Adam: A method for stochastic optimization. In International conference on learning representations.
Kingma, D. P., & Welling, M. (2014). Auto-encoding variational bayes. In International conference on learning representations.
Kipf, T., & Welling, M. (2017). Semi-supervised classification with graph convolutional networks. In International conference on learning representations.
Kumaraswamy, P. (1980). A generalized probability density function for double-bounded random processes. Journal of Hydrology, 46(1), 79–88.
Li, Y., Zhang, M.-L., & Geng, X. (2015). Leveraging implicit relative labeling-importance information for effective multi-label learning. In IEEE international conference on data mining (pp. 251–260).
Ling, M.-L., & Geng, X. (2019). Indoor crowd counting by mixture of Gaussians label distribution learning. IEEE Transactions on Image Processing, 28, 5691–5701.
Liu, T., Venkatachalam, A., Sanjay Bongale, P., & Homan, C. (2019). Learning to predict population-level label distributions. In World wide web conference (pp. 1111–1120).
Liu, P., Wang, X., Wang, S., Ye, W., Xi, X., & Zhang, S. (2021). Improving embedding-based large-scale retrieval via label enhancement. In Conference on empirical methods in natural language processing (pp. 133–142).
Liu, X., Zhu, J., Li, Z., Tian, Z., Jia, X., & Chen, L. (2021). Unified framework for learning with label distribution. Information Fusion, 75, 116–130.
Liu, X., Zhu, J., Zheng, Q., Li, Z., Liu, R., & Wang, J. (2021). Bidirectional loss function for label enhancement and distribution learning. Knowledge-Based System, 213, 106690.
Lv, J.-Q., Xu, N., Zheng, R.-Y., & Geng, X. (2019). Weakly supervised multi-label learning via label enhancement. In AAAI conference on artificial intelligence (pp. 3101–3107).
Peng, K., Chen, T., Sadovnik, A., & Gallagher, A. (2015). A mixed bag of emotions: Model, predict, and transfer emotion distributions. In IEEE conference on computer vision and pattern recognition (pp. 860–868).
Ren, T., Jia, X., Li, W., Chen, L., & Li, Z. (2019). Label distribution learning with label-specific features. In International joint conference on artificial intelligence (pp. 3318–3324).
Shao, R., Geng, X., & Xu, N. (2018). Multi-label learning with label enhancement. In IEEE international conference on data mining (pp. 437–446).
Tan, C., Chen, S., Ji, G., & Geng, X. (2022). A novel probabilistic label enhancement algorithm for multi-label distribution learning. IEEE Transactions on Knowledge and Data Engineering, 34(11), 5098–5113.
Tang, H., Zhu, J., Zheng, Q., Wang, J., Pang, S., & Li, Z. (2020). Label enhancement with sample correlations via low-rank representation. In AAAI conference on artificial intelligence (pp. 5932–5939).
Tsoumakas, G., & Katakis, I. (2007). Multi-label classification: An overview. International Journal of Data Warehousing and Mining, 3(3), 1–13.
Wang, J., & Geng, X. (2019). Classification with label distribution learning. In International joint conference on artificial intelligence (pp. 3712–3718).
Wang, J., & Geng, X. (2021). Label distribution learning machine. In International conference on machine learning.
Wang, J., & Geng, X. (2021). Learn the highest label and rest label description degrees. In International joint conference on artificial intelligence (pp. 3097–3103).
Wang, J., Geng, X., & Xue, H. (2022). Re-weighting large margin label distribution learning for classification. IEEE Transactions on Pattern Analysis and Machine Intelligence, 44(9), 5445–5459.
Wang, K., Xu, N., Ling, M.-L., & Geng, X. (2023). Fast label enhancement for label distribution learning. IEEE Transactions on Knowledge and Data Engineering, 35(2), 1502–1514.
Weerasooriya, T., Liu, T., & Homan, C. (2020). Neighborhood-based pooling for population-level label distribution learning. In European conference on artificial intelligence (pp. 490–497).
Wen, X., Li, B., Guo, H., Liu, Z., Hu, G., Tang, M., & Wang, J. (2020). Adaptive variance based label distribution learning for facial age estimation. In European conference on computer vision (pp. 379–395).
Wen, T., Li, W., Chen, L., & Jia, X. (2022). Semi-supervised label enhancement via structured semantic extraction. International Journal of Machine Learning and Cybernetics, 13, 1131–1144.
Xu, N., Lv, J., & Geng, X. (2019). Partial label learning via label enhancement. In AAAI conference on artificial intelligence (pp. 5557–5564).
Xu, N., Shu, J., Liu, Y., & Geng, X. (2020). Variational label enhancement. In International conference on machine learning (pp. 10597–10606).
Xu, N., Tao, A., & Geng, X. (2018). Label enhancement for label distribution learning. In International joint conference on artificial intelligence (pp. 2926–2932).
Xu, N., Liu, Y.-P., & Geng, X. (2021). Label enhancement for label distribution learning. IEEE Transactions on Knowledge and Data Engineering, 33, 1632–1643.
Yang, J., Sun, M., & Sun, X. (2017). Learning visual sentiment distributions via augmented conditional probability neural network. In AAAI conference on artificial intelligence (pp. 224–230).
Zeng, X., Chen, Q., Chen, S., & Zuo, J. (2021). Emotion label enhancement via emotion wheel and lexicon. Mathematical Problems in Engineering, 2021, 1–11.
Zhang, Q.-W., Zhong, Y., & Zhang, M.-L. (2018). Feature-induced labeling information enrichment for multi-label learning. In AAAI conference on artificial intelligence (pp. 4446–4453).
Zhang, Z., Wang, M.-L., & Geng, X. (2015). Crowd counting in public video surveillance by label distribution learning. Neurocomputing, 166, 151–163.
Zhang, M.-L., & Zhou, Z.-H. (2014). A review on multi-label learning algorithms. IEEE Transactions on Knowledge and Data Engineering, 26, 1819–1837.
Zheng, Q., Zhu, J., Tang, H., Liu, X., Li, Z., & Lu, H. (2023). Generalized label enhancement with sample correlations. IEEE Transactions on Knowledge and Data Engineering, 35(1), 482–495.
Zhou, D., Zhang, X., Zhou, Y., Zhao, Q., & Geng, X. (2016). Emotion distribution learning from texts. In Conference on empirical methods in natural language processing (pp. 638–647).
Zhu, W., Jia, X., & Li, W. (2020). Privileged label enhancement with multi-label learning. In International joint conference on artificial intelligence (pp. 2376–2382).
Funding
This work was supported by the National Natural Science Foundation of China (62176123).
Author information
Authors and Affiliations
Contributions
YL contributed to conceptualization, methodology, experiments, and writing. WL and HL contributed to reviewing. XJ contributed to conceptualization, methodology, and reviewing.
Corresponding author
Ethics declarations
Conflict of interest
The authors declare that they have no known competing or conflicting interests relevant to the content of this paper.
Ethics approval
Not applicable.
Consent to participate
Not applicable. None of the experiments in this paper involve animals, plants, or human subjects.
Consent for publication
Not applicable. The paper does not include data or images requiring permissions.
Additional information
Editor: Eyke Hüllermeier.
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Appendix
Appendix
1.1 Why choose Kumaraswamy distribution
Here we discuss in more detail the rationale for modeling the variational posterior as a Kumaraswamy distribution. We briefly list several probability distributions with support (0, 1) as follows:
In addition to the probability distributions listed in Eq. (35), there are also distributions with more complicated forms, such as the truncated normal, U-quadratic, triangular, and trapezoidal distributions. Using these distributions to model the variational posterior raises various problems. The shapes of “Arcsine” and “Uniform” cannot be controlled by a learnable function; the reparameterization trick is difficult to apply to “PERT”; “Logit-normal” and the truncated normal distribution have no closed-form KL divergence w.r.t. the prior (Beta) distribution; the triangular and trapezoidal distributions have piecewise density functions, which hinder gradient-based optimization; and the U-quadratic distribution struggles to express a mean value around 0.5. Among all these distributions, the Kumaraswamy distribution is therefore the most suitable for modeling the variational posterior of a variable with a Beta prior: it allows easy implementation of the reparameterization trick and has a closed-form KL divergence w.r.t. the Beta prior.
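To illustrate why the Kumaraswamy distribution makes the reparameterization trick easy, the Python sketch below (function name and parameters are our own illustrative choices) uses the fact that its CDF \(F(x) = 1 - (1 - x^a)^b\) inverts in closed form, so a uniform noise draw maps to a sample through a smooth function of the distribution parameters:

```python
import random

def kumaraswamy_rsample(a, b, rng=random):
    """Draw one reparameterized sample from Kumaraswamy(a, b) on (0, 1).

    The CDF F(x) = 1 - (1 - x**a)**b has the closed-form inverse
    x = (1 - (1 - u)**(1/b))**(1/a), so the sample is a differentiable
    transform of parameter-free uniform noise u.
    """
    u = rng.random()
    return (1.0 - (1.0 - u) ** (1.0 / b)) ** (1.0 / a)

# Sanity check: samples stay inside the open interval (0, 1).
random.seed(0)
xs = [kumaraswamy_rsample(2.0, 3.0) for _ in range(10_000)]
assert all(0.0 < x < 1.0 for x in xs)
```

By contrast, the Beta distribution has no closed-form inverse CDF, which is what makes direct reparameterization awkward there.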
1.2 Proof of Theorem 1
Suppose that \({{\varvec{y}}}\) is any logical label vector, \({\textbf{P}}\) is the symmetrically normalized Laplacian of any label co-occurrence graph, \({{\varvec{v}}}= {\varphi }({{\varvec{y}}},\alpha )\). Then we have \({{\varvec{v}}}\simeq {{\varvec{y}}}\) if \(\alpha \in (0, \alpha ^\star _{{\varvec{y}}})\), where \(\alpha _{{\varvec{y}}}^\star =\inf ({\mathbb {A}} \cup \{1\})\),
$$\begin{aligned} {\mathbb {A}} = \bigcup _{\begin{array}{c} {i\in \{i\mid {y}_i=1\}}\\ {j\in \{j\mid {y}_j=0\}} \end{array}} \Big \{ \frac{1}{2}\le \alpha <1\mid {\varphi }_i({{\varvec{y}}}, \alpha ) = {\varphi }_j({{\varvec{y}}},\alpha ) \Big \}. \end{aligned}$$
Proof
Let \([{{\varvec{u}}}]^+\triangleq \inf \{ {u}_i\mid {y}_i=1 \}\) and \([{{\varvec{u}}}]^-\triangleq \sup \{{u}_i\mid {y}_i=0\}\) for any \({m}\)-dimensional vector \({{\varvec{u}}}\).
1) Proving that \({{\varvec{v}}}\simeq {{\varvec{y}}}\) holds for any \(\alpha \in (0, 2^{-1})\). Recall that \({{\varvec{v}}}={\varphi }({{\varvec{y}}}, \alpha )\) is obtained from the iteration \({{\varvec{v}}}^{(t+1)} = \alpha {\textbf{P}}{{\varvec{v}}}^{(t)} + (1 - \alpha ) {{\varvec{y}}}\). If \({{\varvec{v}}}^{(t)}\in [0,1]^{m}\), then \({\textbf{P}}{{\varvec{v}}}^{(t)} \in [0, 1]^{m}\) and hence \({{\varvec{v}}}^{(t+1)}=\alpha {\textbf{P}}{{\varvec{v}}}^{(t)} + (1-\alpha ){{\varvec{y}}}\in [0,1]^{m}\); since \({{\varvec{v}}}^{(0)}={{\varvec{y}}}\), it follows that \({{\varvec{v}}}^{(t)}\in [0,1]^{m}\) for every \(t\ge 0\). Moreover, \({{\varvec{v}}}^{(t)}\in [0,1]^{m}\) implies \([\alpha {\textbf{P}}{{\varvec{v}}}^{(t)}]^+\ge 0\) and \([\alpha {\textbf{P}}{{\varvec{v}}}^{(t)}]^-\le \alpha\), so \([{{\varvec{v}}}^{(t+1)}]^+\ge 1-\alpha\) and \([{{\varvec{v}}}^{(t+1)}]^-\le \alpha\); since \(0<\alpha <2^{-1}\), we have \(1-\alpha >\alpha\) and thus \([{{\varvec{v}}}^{(t+1)}]^+>[{{\varvec{v}}}^{(t+1)}]^-\). Therefore, \([{{\varvec{v}}}^{(t)}]^+>[{{\varvec{v}}}^{(t)}]^-\) holds for every \(t\ge 0\), i.e., \({{\varvec{v}}}\simeq {{\varvec{y}}}\).
2) Proving that \({{\varvec{v}}}\simeq {{\varvec{y}}}\) holds for any \(\alpha \in [2^{-1},\alpha _{{\varvec{y}}}^\star )\). Suppose, for contradiction, that there exists \(\alpha _0\in [2^{-1}, \alpha _{{\varvec{y}}}^\star )\) such that \(v_i \le v_j\) for some \(i\in \{i\mid {y}_i=1\}\) and \(j\in \{j\mid {y}_j=0\}\). If \(v_i=v_j\), then \(\alpha _0\) is itself a root of \(v_i=v_j\) below \(\alpha _{{\varvec{y}}}^\star \); otherwise, since \(v_i > v_j\) holds for all \(0<\alpha <2^{-1}\) and \({{\varvec{v}}}={\varphi }({{\varvec{y}}},\alpha )\) is continuous in \(\alpha\), there must exist \(\alpha ^\prime \in [2^{-1}, \alpha _0)\) such that \(v_i=v_j\). Either case contradicts the fact that \(\alpha ^\star _{{\varvec{y}}}\) is the infimum of the roots in \([2^{-1},1)\) of all equations \(v_i=v_j\) with \((i,j)\in \{(i,j)\mid {y}_i=1\wedge {y}_j=0\}\). Therefore, \({{\varvec{v}}}\simeq {{\varvec{y}}}\) holds for any \(\alpha \in [2^{-1},\alpha _{{\varvec{y}}}^\star )\).
Therefore, \({{\varvec{v}}}\simeq {{\varvec{y}}}\) holds for any \(\alpha \in (0, \alpha _{{\varvec{y}}}^\star )\). \(\square\)
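The iteration at the heart of this proof, \({{\varvec{v}}}^{(t+1)} = \alpha {\textbf{P}}{{\varvec{v}}}^{(t)} + (1 - \alpha ) {{\varvec{y}}}\), is easy to check numerically. The Python sketch below is a toy illustration under our own assumptions (the nonnegative row-normalized matrix `P` is an arbitrary choice, not the paper's \({\textbf{P}}\)); it confirms that for \(\alpha < 2^{-1}\) every positive label keeps a strictly larger score than every negative label:

```python
def propagate(P, y, alpha, iters=200):
    """Iterate v <- alpha * P v + (1 - alpha) * y to (near) convergence."""
    v = list(y)
    m = len(y)
    for _ in range(iters):
        v = [alpha * sum(P[i][j] * v[j] for j in range(m)) + (1 - alpha) * y[i]
             for i in range(m)]
    return v

# A toy row-normalized, nonnegative propagation matrix (illustrative only).
P = [[0.0, 0.5, 0.5],
     [0.5, 0.0, 0.5],
     [0.5, 0.5, 0.0]]
y = [1, 0, 1]

v = propagate(P, y, alpha=0.4)
# Every positive label outranks every negative label, i.e. v ~ y.
assert min(v[i] for i in range(3) if y[i] == 1) > max(v[j] for j in range(3) if y[j] == 0)
```

For this symmetric toy matrix the fixed point can also be solved by hand, which makes it a convenient self-check.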
1.3 Example of calculating \(\alpha _{{\varvec{y}}}^\star\) in Theorem 1
Suppose that \({{\varvec{y}}}=[1,0,1]^\top\); the Laplacian matrix \({\textbf{P}}\) of the label co-occurrence graph can be factorized as:
Then, we have:
Finally, solving \(v_1=v_2\) and \(v_3 = v_2\) both yields no root with \(0.5\le \alpha <1\); hence \({\mathbb {A}} = \varnothing\) and \(\alpha _{{\varvec{y}}}^\star =\min ({\mathbb {A}}\cup \{1\})=1\).
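When closed-form roots are inconvenient, \(\alpha _{{\varvec{y}}}^\star\) can also be estimated numerically by scanning \(\alpha\) over \([0.5, 1)\) and looking for the first crossing between a positive-label and a negative-label score. The rough grid-search sketch below is our own illustration (the toy matrix `P` is not the factorized \({\textbf{P}}\) of this example):

```python
def alpha_star(P, y, grid=200, iters=300):
    """Grid-search estimate of alpha* from Theorem 1: scan alpha over
    [0.5, 1) and return the first value at which some positive-label
    score falls to (or below) a negative-label score; return 1.0 when
    the ranking survives the whole scan (i.e. the set A is empty)."""
    m = len(y)
    pos = [i for i in range(m) if y[i] == 1]
    neg = [j for j in range(m) if y[j] == 0]
    for k in range(grid):
        alpha = 0.5 + 0.5 * k / grid
        v = list(y)
        for _ in range(iters):
            v = [alpha * sum(P[i][j] * v[j] for j in range(m))
                 + (1 - alpha) * y[i] for i in range(m)]
        if min(v[i] for i in pos) <= max(v[j] for j in neg):
            return alpha  # approximate crossing point
    return 1.0

# Toy nonnegative propagation matrix (ours, purely illustrative).
P = [[0.0, 0.5, 0.5],
     [0.5, 0.0, 0.5],
     [0.5, 0.5, 0.0]]
y = [1, 0, 1]
assert alpha_star(P, y) == 1.0  # no crossing: A is empty for this toy P
```

The grid resolution and iteration count trade accuracy for runtime; they are arbitrary defaults here.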
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Lu, Y., Li, W., Li, H. et al. Ranking-preserved generative label enhancement. Mach Learn 112, 4693–4721 (2023). https://doi.org/10.1007/s10994-023-06388-9