1 Introduction

Multi-label learning deals with problems where each sample is represented by a feature vector and associated with multiple concepts or semantic labels. For example, in image annotation, an image may be annotated with several scenes, and in text categorisation, a document may be tagged with multiple topics. Formally, a data matrix \(X \in \mathbb {R}^{d \times n}\) is composed of n samples in a d-dimensional feature space. Each feature vector \(x_{i} \in X\) is associated with a label set \(Y_{i}=\left\{ y_{i1},y_{i2},\cdots ,y_{ik} \right\} \), where k is the number of labels and \(y_{ij} \in \left\{ 0,1 \right\} \) is a logical value indicating whether the j-th label is relevant to the instance \(x_{i}\). Over the past decade, many strategies have been proposed in the literature to learn from multi-labelled data. Initially, the problem was tackled by learning a binary classification model for each label independently [21]. However, this strategy ignores label correlation. Several methods [3, 17] have since shown that considering label correlation during multi-label learning improves classification performance. However, these methods use logical labels, in which no manifold exists, and apply traditional similarity metrics such as the Euclidean distance, which is mainly built for continuous data [12].

In this study, contrary to most methods, in addition to learning the mapping function from the feature space to the multi-label space, we explore the projection function from the label space back to the feature space to reconstruct the original feature representations. In image annotation, for example, our method can reconstruct the scene image from the projection function and the semantic labels. Doing so first requires exploring the natural structure of the label space in multi-labelled data. Existing datasets naturally contain logical label vectors indicating whether an instance is relevant to a specific label. For example, as shown in Fig. 1, both images are tagged with the label boat with the same weight of 1 (present). However, to describe the labels in both images accurately, we need to identify the importance of each label in each image: the label boat is clearly more important in the image in Fig. 1b than in the image in Fig. 1a. Furthermore, a zero value in the logical label vector can carry different meanings: the label may be irrelevant, unrepresented or missing. Using the same example in Fig. 1, the boat label in the image in Fig. 1a and the sun label in both images are not tagged because of their small contribution (unrepresented). Our method learns a numerical multi-label matrix in a semantic embedding space during optimisation, based on label dependencies. Replacing logical labels with numerical values that capture label importance can therefore improve the multi-label learning process.

Learning these numerical labels is essential to our approach, which is developed on the encoder–decoder deep learning paradigm [15]. Specifically, as an encoder step, the input training data in the feature space are projected into the learned semantic label space (the label manifold). Through a single optimisation problem, this step simultaneously learns the projection function and the semantic labels in Euclidean space. Significantly, we treat the reconstruction of the original feature representations, with the projection matrix as input, as a decoder. This imposes a constraint ensuring that the projection matrix preserves all the information in the original feature matrix: the decoder recovers the original features from the projection matrix and the learned semantic labels. In image annotation, this process is akin to combining puzzle pieces to recreate the picture. With logical labels, where a label either exists or does not, the original visual feature representations cannot be reconstructed. We show that the decoder's role in identifying relevant features improves multi-label classification performance, because the feature coefficients are estimated from actual numerical labels and, more importantly, the weights of the relevant features reduce the reconstruction error. The proposed method is visualised in Fig. 2. We test the proposed approach on various public multi-label datasets and verify that it favourably outperforms state-of-the-art methods in feature selection and data reconstruction.

We formulate the proposed approach as a constrained optimisation problem that projects feature representations into semantic labels under a reconstruction constraint. More precisely, the encoder and decoder are formulated as linear projections to and from the learned semantic labels. This design keeps the computational complexity low, making the approach suitable for large-scale datasets. To the best of our knowledge, this is the first attempt to learn a semantic label representation from the training data that can also be used for data reconstruction in multi-label learning. In summary, our contributions are: (1) a semantic encoder–decoder model that learns the projection matrix from original features to semantic labels and can be used for data reconstruction; (2) an extension of logical labels to numerical labels that describe the relative importance of each label for a specific instance; and (3) a novel Robust Multi-label Feature Learning based Dual Space (RMFDS) method that identifies discriminative features across multiple class labels.

Fig. 1 Two images annotated with labels: sea, sunset, and boat

2 Related work

Multi-label classification aims to predict the set of labels corresponding to each instance. It has been widely applied in several domains, including image recognition, and several methods have recently been proposed to predict image labels [18, 22]. This section briefly reviews work on label correlation, semantic labels, and autoencoders.

Label correlation   Over the past decade, label correlation has repeatedly been shown to improve the performance of multi-label learning methods. Correlation is considered either between pairs of class labels or among all class labels, known as second-order and high-order approaches, respectively [6, 23]. Recently, a method has been proposed to learn common label-specific features using correlation information from labels and instances [14]. Another study proposes feature selection based on label correlations [8]. In these models, however, the common learning strategy deals with logical labels, which only represent whether a label is relevant or irrelevant to an instance. The label matrix in the available multi-labelled datasets contains logical values that lack semantic information. Hence, a few works reveal that transforming the labels from logical into numerical values improves the learning process.

Semantic labels   A numerical value in the label space carries semantic information; for instance, the value may refer to the importance or the weight of an object in an image. The numerical label matrix in Euclidean space is not explicitly available in multi-labelled data. A few works have studied the multi-label manifold by transforming the logical label space into a Euclidean label space. For example, [9] explores the label manifold in multi-label learning and reconstructs the numerical label matrix using the instance smoothness assumption. Another work [4] incorporates feature manifold learning into multi-label feature selection, and [10] selects meaningful features using the constraint Laplacian score in manifold learning. Our proposed method differs from these by learning an encoder–decoder network that reconstructs the input data using the learned projection matrix while predicting the semantic labels.

Autoencoder   Several variants use an autoencoder for multi-label learning. [7] learns the unknown labels from the existing ones using an entropy measure, and the completed label matrix is then used as the input feature layer of an autoencoder architecture; our method, by contrast, reconstructs the original input data from the learned semantic labels in the decoder. Further, [13] proposes a stacked autoencoder for feature encoding together with an extreme learning machine to improve prediction capability; however, the authors did not take label correlation into consideration, and the original logical labels are used during learning. Another study learns low-dimensional manifolds that capture nonlinear features using autoencoders [11], and a denoising autoencoder has been used to cast medical image annotation as multi-label classification [5]. In this paper, we select the discriminative features that are important for detecting the objects' weights during the encoding phase and, simultaneously, significant for reconstructing the original data in the decoding phase.

3 The proposed method

In multi-label learning, as mentioned above, the training set of multi-labelled data can be represented by \(\left\{ x_{i} \in X \mid i=1,\cdots ,n \right\} \), where each instance \(x_{i} \in \mathbb {R}^{d}\) is a d-dimensional feature vector associated with the logical label vector \(Y_{i}=\left\{ y_{i1},y_{i2},\cdots ,y_{ik} \right\} \); here k is the number of possible labels, and the values 0 and \(+1\) indicate that a label is irrelevant or relevant to the instance \(x_{i}\), respectively.

3.1 Label manifold

To overcome the key challenges of logical label vectors, we first propose to learn a new numerical label matrix \(\widetilde{Y} \in \mathbb {R}^{k \times n}\) that contains labels with semantic information. Following the label smoothness assumption [19], which states that if two labels are semantically similar then their feature vectors should be similar, we initially exploit the dependencies among labels to learn \(\widetilde{Y}\) by multiplying the label correlation matrix \(C \in \mathbb {R}^{k \times k}\) with the original label matrix. Because the original label matrix contains logical values, we use the Jaccard index to compute the correlation matrix, which gives

$$\begin{aligned} \widetilde{Y}=CY \end{aligned}$$
(1)

where the element \(\widetilde{Y}_{j,i}=C_{j,1} \times Y_{1,i} + C_{j,2} \times Y_{2,i} + \cdots + C_{j,k} \times Y_{k,i}\) is the initial predictive numerical value of how relevant the j-th label is to the instance \(x_{i}\), based on the prior information of label dependencies. The following simple example illustrates the efficiency of using label correlation to learn semantic numerical labels. The original logical label vectors Y of the images in Fig. 1a and b are shown in Fig. 2. The zero values in the original label matrix carry three different types of information: the grey and red colours in Fig. 2 mark unrepresented and missing labels, respectively, while the white colour means that the images in Fig. 1a and b are simply not labelled as "Grass". The predictive label space \(\widetilde{Y}\) distinguishes between these three types of zero values and provides appropriate numerical values that carry semantic information. For example, owing to the correlations between "Boat" and "Ocean" and between "Sunset" and "Sun", the unrepresented label information for "Boat" and "Sun" in the image in Fig. 1a and for "Sun" in the image in Fig. 1b is learned. Further, the missing "Ocean" label of the image in Fig. 1b is predicted. Interestingly, the numerical values of the "Grass" label in both images are very small because "Grass" is not correlated with the other labels; this matches the information in the original label matrix that no "Grass" object exists in either image, as shown in Fig. 2. This example demonstrates that accurate numerical labels with semantic information can be learned. The completed numerical label matrix \(\widetilde{Y}\) of the training data is then refined through the optimisation of the encoder–decoder framework in the next section.
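To make this initialisation concrete, below is a minimal Python/NumPy sketch (the paper's own implementation is in MATLAB; the exact Jaccard normalisation is our assumption, since the paper names the index without spelling out the formula):

```python
import numpy as np

def jaccard_correlation(Y):
    """Jaccard index between every pair of rows (labels) of the logical
    label matrix Y (k x n): |intersection| / |union| of the instance
    sets tagged with each label."""
    inter = Y @ Y.T                            # k x k co-occurrence counts
    counts = Y.sum(axis=1, keepdims=True)      # number of instances per label
    union = counts + counts.T - inter
    return inter / np.maximum(union, 1e-12)    # guard against empty labels

# Toy logical label matrix: 3 labels x 2 instances.
Y = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [0.0, 1.0]])
C = jaccard_correlation(Y)
Y_tilde = C @ Y    # Eq. 1: initial numerical label matrix (k x n)
```

Since the Jaccard matrix C is symmetric, \(CY\) simply re-weights each instance's logical labels by the correlations among all k labels.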

Fig. 2 Encoder–decoder architecture of our proposed method RMFDS. A predicted numerical label matrix \(\widetilde{Y}\) is initialised by multiplying the correlation matrix with the original logical label matrix Y. The zero values of the images in red, grey and white boxes represent missing, unrepresented, and irrelevant labels, respectively

3.2 Approach formulation

Suppose the training data \(X \in \mathbb {R}^{d \times n}\) consist of n samples associated with the labels \(Y \in \mathbb {R}^{k \times n}\). The predictive numerical matrix \(\widetilde{Y}\) is initialised using Eq. 1. The intuition behind our idea is that the proposed method can capture the relationship between the feature space and the manifold label space. Inspired by the autoencoder architecture, we develop an effective method that integrates the characteristics of a low-rank coefficient matrix and a semantic numerical label matrix. Specifically, our method is composed of an encoder–decoder architecture: the encoder learns the projection matrix \(W \in \mathbb {R}^{k \times d}\) from the feature space X to the numerical label space \(\widetilde{Y}\), while the decoder projects back to the feature space with \(W^{T} \in \mathbb {R}^{d \times k}\) to reconstruct the input training data, as shown in Fig. 2. The objective function is formulated as

$$\begin{aligned} \min _{W} \left\| X - W^{T}WX \right\| ^{2}_{F} \quad \text {s.t. } WX=\widetilde{Y} \end{aligned}$$
(2)

where \(\left\| . \right\| _{F}\) is the Frobenius norm.

3.2.1 Optimisation algorithm

To optimise the objective function in Eq. 2, we first substitute \(\widetilde{Y}\) for WX in the reconstruction term. The hard constraint \(WX=\widetilde{Y}\) still makes Eq. 2 difficult to solve directly, so we relax it into a soft penalty and reformulate the objective function (2) as

$$\begin{aligned} \min _{W} \left\| X - W^{T}\widetilde{Y} \right\| ^{2}_{F} + \lambda \left\| WX - \widetilde{Y}\right\| ^{2}_{F} \end{aligned}$$
(3)

where \(\lambda \) is a parameter that controls the importance of the second term. The objective function in Eq. 3 is non-convex and contains two unknown variables, W and \(\widetilde{Y}\), so it is difficult to solve directly. We therefore iteratively update one variable while fixing the other. Since the objective function is convex in each variable separately, we compute the partial derivatives of Eq. 3 with respect to W and \(\widetilde{Y}\) and set each to zero.

Algorithm 1 Robust Multi-label Feature Learning based Dual Space (RMFDS)

  • Update W:

    $$\begin{aligned} \begin{aligned}&-\widetilde{Y}\left( X^{T} - \widetilde{Y}^{T}W \right) + \lambda \left( WX-\widetilde{Y} \right) X^{T}=0 \\&\Rightarrow \widetilde{Y} \widetilde{Y}^{T}W + \lambda WXX^{T}=\widetilde{Y}X^{T} + \lambda \widetilde{Y} X^{T} \\&\Rightarrow PW + WQ=R \end{aligned} \end{aligned}$$
    (4)

    where \(P=\widetilde{Y}\widetilde{Y}^{T}\), \(Q=\lambda XX^{T}\), and \(R=\left( \lambda + 1 \right) \widetilde{Y}X^{T}\)

  • Update \(\widetilde{Y}\):

    $$\begin{aligned} \begin{aligned}&-WX+WW^{T}\widetilde{Y} + \lambda \left( -WX + \widetilde{Y} \right) =0 \\&\Rightarrow WW^{T}\widetilde{Y} + \lambda \widetilde{Y} = WX + \lambda WX \\&\Rightarrow A\widetilde{Y}+\widetilde{Y}B=D \end{aligned} \end{aligned}$$
    (5)

    where \(A=WW^{T}\), \(B=\lambda I\), \(D=\left( \lambda + 1 \right) WX\) and \(I \in \mathbb {R}^{k \times k}\) is an identity matrix.

Equations 4 and 5 take the form of the well-known Sylvester equation \(MX+XN=O\): a matrix equation with given matrices M, N, and O whose goal is to find the unknown matrix X. The Sylvester equation can be solved efficiently and yields a unique solution; for further explanation and proofs, the reader can refer to [1]. Using Kronecker products and the vectorisation operator vec, Eqs. 4 and 5 can be written as linear equations, respectively.

$$\begin{aligned} \left( I_{d} \otimes P + Q^{T} \otimes I_{k} \right) vec(W)=vec(R) \end{aligned}$$
(6)

where \(I_{d} \in \mathbb {R}^{d \times d}\) and \(I_{k} \in \mathbb {R}^{k \times k}\) are identity matrices and \(\otimes \) is the Kronecker product.

$$\begin{aligned} \left( I_{n} \otimes A + B^{T} \otimes I_{k} \right) vec(\widetilde{Y})=vec(D) \end{aligned}$$
(7)

where \(I_{n} \in \mathbb {R}^{n \times n}\) is an identity matrix. Fortunately, this equation can be solved in MATLAB with a single call to the built-in sylvester function. The two unknown matrices W and \(\widetilde{Y}\) can now be iteratively updated using the optimisation rules above until convergence. The procedure is described in Algorithm 1.
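For illustration, the following Python/SciPy sketch mirrors the alternating updates of Algorithm 1 (the released code is in MATLAB; all names here are ours). It uses scipy.linalg.solve_sylvester, which solves \(MX+XN=O\) directly, for the W update, and exploits the fact that \(B=\lambda I\) reduces the \(\widetilde{Y}\) update in Eq. 5 to an ordinary linear system:

```python
import numpy as np
from scipy.linalg import solve_sylvester

def rmfds_fit(X, Y_tilde0, lam=1.0, max_iter=100, tol=1e-6):
    """Alternating optimisation of Eq. 3.
    X: d x n feature matrix; Y_tilde0: k x n initial numerical labels (Eq. 1)."""
    Y_tilde = Y_tilde0.copy()
    k = Y_tilde.shape[0]
    prev_obj = np.inf
    for _ in range(max_iter):
        # Update W (Eq. 4): solve P W + W Q = R with a Sylvester solver.
        P = Y_tilde @ Y_tilde.T                 # k x k
        Q = lam * (X @ X.T)                     # d x d
        R = (lam + 1.0) * (Y_tilde @ X.T)       # k x d
        W = solve_sylvester(P, Q, R)
        # Update Y_tilde (Eq. 5): since B = lam * I, the Sylvester equation
        # collapses to the linear system (W W^T + lam I) Y_tilde = (lam+1) W X.
        A = W @ W.T
        D = (lam + 1.0) * (W @ X)
        Y_tilde = np.linalg.solve(A + lam * np.eye(k), D)
        # Objective value of Eq. 3, used as the stopping criterion.
        obj = (np.linalg.norm(X - W.T @ Y_tilde, 'fro') ** 2
               + lam * np.linalg.norm(W @ X - Y_tilde, 'fro') ** 2)
        if abs(prev_obj - obj) < tol:
            break
        prev_obj = obj
    return W, Y_tilde
```

The W update costs \(\mathcal {O}(d^{3})\) per iteration in the worst case, consistent with the complexity analysis in Sect. 4.6.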

Our proposed method learns the encoder projection matrix W. We can therefore embed a new test sample \(x^{s}_{i}\) into the semantic label space by \(\widetilde{y}_{i}=Wx^{s}_{i}\) and, similarly, reconstruct the original features using the decoder projection matrix \(W^{T}\) by \(x^{s}_{i}=W^{T}\widetilde{y}_{i}\). Hence, W contains the discriminative features for predicting the real semantic labels. To identify these features, we rank each feature m according to the value of \(\left\| W_{:,m} \right\| _{2}\, \left( m=1,\cdots ,d \right) \), i.e. the norm of the corresponding column of W, in descending order and return the top-ranked features.
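A short sketch of how the learned W would be used at test time and for feature ranking (the random W here is a stand-in for a trained projection; feature m enters the encoder only through the m-th column of W):

```python
import numpy as np

rng = np.random.default_rng(0)
d, k = 100, 6
W = rng.standard_normal((k, d))     # stand-in for the learned k x d projection
x_test = rng.standard_normal(d)     # a new d-dimensional test sample

y_semantic = W @ x_test             # encoder: embed into the semantic label space
x_reconstructed = W.T @ y_semantic  # decoder: recover the original features

# Feature ranking: L2 norm of each column of W, in descending order.
scores = np.linalg.norm(W, axis=0)          # length-d score vector
top_features = np.argsort(-scores)[:50]     # indices of the top-50 features
```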

Table 1 Characteristics of the evaluated datasets
Table 2 Average results of the first 50 features on eight datasets with different missing label proportions

4 Experiments

4.1 Experimental datasets

We open-source our RMFDS code for reproducibility of our experiments. Our proposed method is coded in MATLAB, and the experiments ran on an Intel Core i5-8250U CPU at 1.80 GHz with 8 GB of memory. Experiments are conducted on eight public multi-label datasets, downloaded from the Mulan repository. The details of these datasets are summarised in Table 1. Among them, Scene consists of 2407 images, each associated with six scenes; Emotions contains 593 songs, each related to six emotions; and six Yahoo datasets from the text domain are used: Business, Computers, Entertainment, Health, Reference, and Science [16].

To evaluate the compared algorithms, we use six standard multi-label evaluation metrics, namely Hamming loss, Ranking loss, Coverage, Average precision, Micro-F1, and Macro-F1, as defined in [2]. For the first three, smaller values indicate better performance; for the other three, the reverse is true.
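For reference, these six metrics have standard scikit-learn counterparts; the sketch below is one way to compute them (the 0.5 threshold for binarising scores is our assumption, and [2] remains the authoritative definition):

```python
import numpy as np
from sklearn.metrics import (hamming_loss, label_ranking_loss, coverage_error,
                             label_ranking_average_precision_score, f1_score)

def evaluate(Y_true, Y_scores, threshold=0.5):
    """Y_true: n x k binary ground truth; Y_scores: n x k real-valued
    label scores (e.g. ML-KNN posterior probabilities)."""
    Y_pred = (Y_scores >= threshold).astype(int)
    return {
        "Hamming loss": hamming_loss(Y_true, Y_pred),
        "Ranking loss": label_ranking_loss(Y_true, Y_scores),
        "Coverage": coverage_error(Y_true, Y_scores),
        "Average precision": label_ranking_average_precision_score(Y_true, Y_scores),
        "Micro-F1": f1_score(Y_true, Y_pred, average="micro"),
        "Macro-F1": f1_score(Y_true, Y_pred, average="macro"),
    }
```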

4.2 Comparing methods and experiment settings

Multi-label feature selection methods have attracted the interest of researchers over the last decade. In this study, we compare our proposed method RMFDS against recent state-of-the-art multi-label feature selection methods, namely GLOCAL [23], MCLS [10], and MSSL [4]; MCLS and MSSL consider feature manifold learning. ML-KNN (K=10) [20] is used as the multi-label classifier to evaluate the performance of the identified features, with results reported for numbers of selected features varying from 1 to 100. PCA is applied as a preprocessing step, retaining \(95\%\) of the data variance. To ensure a fair comparison, the parameters of the compared methods are tuned to find their optimum values: for GLOCAL, the regularisation parameters \(\lambda _{3}\) and \(\lambda _{4}\) are tuned in \(\left\{ 0.0001,0.001,\dots , 1 \right\} \), the number of clusters is searched in \(\left\{ 4,8,16,32,64 \right\} \), and the latent dimensionality rank is tuned in \(\left\{ 5,10,\dots , 30 \right\} \); MCLS uses its default settings; and for MSSL, the parameters \(\alpha \) and \(\beta \) are tuned over the grid \(\left\{ 0.001,0.01,\dots ,1000 \right\} \). The parameter settings for RMFDS are described in Sect. 4.4.

Fig. 3 The classification results of the compared methods with different evaluation metrics on the Scene dataset

Fig. 4 The classification results of the compared methods with different evaluation metrics on the Business dataset

Fig. 5 The classification results of the compared methods with different evaluation metrics on the Computers dataset

Fig. 6 The classification results of the compared methods with different evaluation metrics on the Emotions dataset

Fig. 7 The classification results of the compared methods with different evaluation metrics on the Entertainment dataset

Fig. 8 The classification results of the compared methods with different evaluation metrics on the Health dataset

Fig. 9 The classification results of the compared methods with different evaluation metrics on the Reference dataset

Fig. 10 The classification results of the compared methods with different evaluation metrics on the Science dataset

4.3 Results

4.3.1 Classification results

Several experiments demonstrate the classification performance of RMFDS compared to state-of-the-art multi-label feature selection methods. Figures 3, 4, 5, 6, 7, 8, 9 and 10 show the results of the six multi-label evaluation metrics, namely Hamming loss, Ranking loss, Coverage, Average precision, Micro-F1, and Macro-F1, on the eight datasets. The classification results in these figures are based on the top-ranked 100 features (except for the Emotions dataset, which only has 72 features). The results show that our proposed method achieves a significant classification improvement as the number of selected features increases and then remains stable. This observation indicates that studying dimensionality reduction in multi-label learning is meaningful, and it highlights the stability and capability of RMFDS, which achieves good performance on all the datasets with few selected features.

Fig. 11 The convergence and computational time analysis for the RMFDS method on several datasets. a and b show the objective function values, and c shows the computational time

The proposed method is compared with the state of the art on each dataset. In Figs. 3, 4, 5, 6, 7, 8, 9 and 10, RMFDS achieves better results than MCLS, MSSL, and GLOCAL on almost all the evaluated datasets. Specifically, in terms of Hamming loss, Ranking loss, and Coverage, where smaller values mean better performance, RMFDS's features substantially improve the classification results compared to the state of the art. MCLS obtains the worst results, while MSSL and GLOCAL achieve comparable results, as shown in Figs. 6, 7, 8, 9 and 10. In terms of Average precision, Micro-F1, and Macro-F1, where larger values mean better performance, RMFDS generally achieves better results than the compared methods on all datasets. We note that RMFDS performs only slightly better than MSSL and GLOCAL on the Emotions and Entertainment datasets under these three metrics, as shown in Figs. 6 and 7, and that the state-of-the-art methods produce unstable results on the Business, Reference, and Computers datasets under the Micro-F1 metric. In general, our proposed method demonstrates the benefit of using the label manifold in an encoder–decoder architecture to identify discriminative features.

Fig. 12 RMFDS results on the Emotions and Reference datasets. a and b are the precision results w.r.t. different parameters

Furthermore, using the Friedman test, we investigate whether the results produced by RMFDS differ significantly from the state of the art. In particular, we run the Friedman test between RMFDS and each compared method for each evaluation metric on the eight datasets. The p value in all tests is less than 0.05, rejecting the null hypothesis that the proposed method and the compared methods perform equally. Finally, we explore the reconstruction capability of the decoder in RMFDS, which uses the projection matrix W to reconstruct the original data. Table 3 reports the reconstruction errors using the training logical labels (Y), the training predicted labels (\(\widetilde{Y}\)), and the logical testing labels. Across the eight datasets, the percentage reconstruction error of the original training matrix using the logical training labels ranges between \(4\%\) and \(8\%\), and drops dramatically, to between \(0.1\%\) and \(3\%\), when the predicted numerical label matrix is used. This reveals that the decoder plays an important role in selecting the features that can reconstruct the original matrix, and it supports our argument that visual images can be reconstructed from the semantic labels and the coefficient matrix. In addition, the testing data matrix can be reconstructed from the projection matrix and the testing labels with a small error of between \(4\%\) and \(8\%\).

Table 3 Reconstruction of different error values using the RMFDS decoder
Fig. 13 Results of the ablation experiments on multiple datasets for the top 100 features, with a step size of 10

4.3.2 RMFDS results for handling missing labels

The proposed method learns the semantic label matrix during the back-and-forth projections; the RMFDS method should therefore be able to recover missing labels in the original label matrix during optimisation. To investigate this ability, we randomly removed different proportions of the labels, from moderate to extreme levels: 20%, 40%, 60%, and 80%. Motivated by the RMFDS results in Figs. 3, 4, 5, 6, 7, 8, 9 and 10, which show stable classification accuracy above 40 features, we compute the average results of the first 50 features for each evaluation metric and report them in Table 2. The results indicate that our proposed method improves multi-label classification even in the presence of missing labels. Specifically, RMFDS significantly improves accuracy under extreme missing-label proportions (60% and 80%) on four datasets, namely Scene, Emotions, Science and Entertainment (Table 2).
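The removal protocol we assume can be simulated as below (a sketch; the paper does not specify the exact sampling scheme):

```python
import numpy as np

def drop_labels(Y, proportion, seed=0):
    """Flip a given proportion of the positive entries of the k x n
    logical label matrix Y to zero, simulating missing labels."""
    rng = np.random.default_rng(seed)
    Y_missing = Y.copy()
    positives = np.argwhere(Y_missing == 1)
    n_drop = int(proportion * len(positives))
    chosen = positives[rng.choice(len(positives), size=n_drop, replace=False)]
    Y_missing[chosen[:, 0], chosen[:, 1]] = 0
    return Y_missing

# Example: the extreme 80% missing-label setting.
# Y_80 = drop_labels(Y, 0.8)
```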

4.4 Parameter sensitivity

In this section, we study the influence of the proposed method's parameters, \(\lambda \) and MaxIteration, on the classification results. The first, \(\lambda \), controls the relative contribution of the decoder and encoder, while the second defines the number of iterations allowed for convergence. The parameters \(\lambda \) and MaxIteration are tuned by grid search over \(\left\{ 0.2,0.4,\dots ,2 \right\} \) and \(\left\{ 1,20,40,\dots ,100 \right\} \), respectively; a sketch of this search appears below. As shown in Fig. 12a and b, using the average precision metric on two datasets, the learning performance improves as \(\lambda \) increases.
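The tuning amounts to a simple grid search; a sketch, reusing rmfds_fit from the earlier sketch in Sect. 3.2.1 (X_train, Y_tilde0 and the validation scorer are hypothetical placeholders):

```python
import itertools
import numpy as np

lambdas = np.arange(0.2, 2.01, 0.2)    # {0.2, 0.4, ..., 2}
iterations = [1, 20, 40, 60, 80, 100]

best_params, best_score = None, -np.inf
for lam, max_iter in itertools.product(lambdas, iterations):
    W, Y_tilde = rmfds_fit(X_train, Y_tilde0, lam=lam, max_iter=max_iter)
    score = average_precision_on_validation(W)   # hypothetical scorer
    if score > best_score:
        best_params, best_score = (lam, max_iter), score
```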

4.5 Convergence analysis and computational time

We analyse the convergence of RMFDS on the eight datasets. Figure 11a and b shows the objective function values on the training datasets: our method converges rapidly, within around 10 iterations, demonstrating its efficacy and speed. Figure 11c shows the computational time of RMFDS for different numbers of iterations; the time increases linearly with the number of iterations.

4.6 Computational analysis

In this section, we analyse the computational cost of the key operations in Algorithm 1: the initialisation and the while loop.

Initialisation. Computing each entry \(C_{jk}\) involves several summations and multiplications, with computational complexity \(\mathcal {O}(n)\); calculating \(\widetilde{Y}\) has a time complexity of \(\mathcal {O}(dkn)\).

While Loop (MaxIteration). The loop runs at most MaxIteration times. Within each iteration, W and \(\widetilde{Y}\) are updated by solving the Sylvester equations, by vectorising the matrices and applying standard linear algebra operations; the worst case is \(\mathcal {O}(d^{3})\). These updates dominate the overall time complexity of the algorithm, which can therefore be approximated as \(\mathcal {O}(\text {MaxIteration} \times \max (d^{3}, \text {complexity of Eq.}\,7))\).

5 Ablation study

To assess the role of the constraint in our objective function (Eq. 2), we removed the constraint \(WX=\widetilde{Y}\) and repeated the experiments on all datasets with all evaluation metrics. As shown in Fig. 13, the results degrade across the various evaluation metrics for the top 100 features, with a step size of 10.

6 Conclusion

This paper proposes a novel semantic multi-label learning model based on an autoencoder. The proposed method learns a projection matrix that maps between the feature space and the semantic space back and forth. Since the semantic labels are not explicitly available in the training samples, they are predicted during optimisation. We further rank the feature weights in the learned projection matrix for feature selection. The proposed method is simple and computationally fast. Extensive experiments demonstrate that it outperforms the state of the art and that it efficiently reconstructs the original data using the predicted labels.