1 Introduction

With the global proliferation of COVID-19 and the widespread usage of the RT-PCR test, known for its limited accuracy, particularly in the early stages of the disease [1], there is an urgent need to employ machine learning algorithms on X-ray images for the detection and diagnosis of COVID-19 cases. The field of machine learning has seen remarkable progress, simplifying dataset handling. In contrast to the RT-PCR test, outcomes obtained from these images tend to be more precise.

Clustering, a critical area of research, involves categorizing data into distinct groups known as clusters. The nature and structure of datasets significantly impact the performance of clustering approaches. Consequently, identifying the most effective strategy for clustering a given dataset is of paramount importance. Utilizing different data perspectives can provide more comprehensive insights into cluster distributions, ultimately leading to more meaningful clustering outcomes. These various data perspectives should be integrated using a method that minimizes dissimilarity between them while emphasizing their commonalities.

In recent years, numerous methods have been proposed for multi-view clustering. Spectral clustering algorithms [2,3,4] are among the most widely used clustering techniques. These methods typically follow a three-step process: (1) creating a similarity matrix among data points, (2) computing a spectral projection matrix, and (3) generating clustering results using additional methods such as k-means, k-medoids, or spectral rotation. It is crucial to note that the final step has limitations, as it depends on the initialization phase and is vulnerable to noise and outliers.
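The first two steps of this pipeline can be sketched in a few lines of NumPy. The Gaussian kernel, the bandwidth `sigma`, the use of the normalized Laplacian, and the function names are illustrative assumptions, not choices made in the cited works. Step (3) would then run k-means (or a similar method) on the rows of the returned embedding.

```python
import numpy as np

def gaussian_similarity(X, sigma=1.0):
    """Step 1: dense similarity matrix from an (n, d) data matrix (assumed Gaussian kernel)."""
    sq_norms = np.sum(X ** 2, axis=1)
    sq_dists = sq_norms[:, None] + sq_norms[None, :] - 2.0 * X @ X.T
    sq_dists = np.maximum(sq_dists, 0.0)  # clip tiny negatives caused by round-off
    S = np.exp(-sq_dists / (2.0 * sigma ** 2))
    np.fill_diagonal(S, 0.0)  # no self-loops
    return S

def spectral_embedding(S, K):
    """Step 2: eigenvectors of the normalized Laplacian for the K smallest eigenvalues."""
    deg = S.sum(axis=1)
    inv_sqrt = 1.0 / np.sqrt(np.maximum(deg, 1e-12))
    L_sym = np.eye(S.shape[0]) - inv_sqrt[:, None] * S * inv_sqrt[None, :]
    _, eigvecs = np.linalg.eigh(L_sym)  # eigenvalues are returned in ascending order
    return eigvecs[:, :K]

# Toy data: two well-separated groups of five points each.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 0.1, (5, 2)), rng.normal(3.0, 0.1, (5, 2))])
P = spectral_embedding(gaussian_similarity(X), K=2)
```

The dependence on initialization mentioned above enters only in step (3), which is why one-step methods aim to read the cluster labels directly from a learned matrix.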

Subspace clustering algorithms [5,6,7] are used to establish a consistent graph from multiple views within the shared subspace of the data, enabling clustering using spectral clustering techniques. To address clusters with arbitrary shapes, multiple-kernel algorithms [8, 9] select the most suitable kernel for each view. Additionally, matrix factorization algorithms [10, 11] are widely adopted to reduce the number of features, offering a more computationally efficient alternative to other approaches, although they are less suited to handling nonlinear data.

In this paper, we introduce a novel approach, OCNE (One-step multi-view spectral clustering by learning Constrained Nonnegative Embedding), inspired by the “Multi-view spectral clustering via integrating nonnegative embedding and spectral embedding” (NESE) method presented in Hu et al.’s work [12]. Our approach aims to address certain limitations observed in previous methods.

OCNE capitalizes on the strengths of the NESE technique, particularly its ability to simultaneously construct nonnegative and spectral projection matrices. This simultaneous construction enables direct clustering without the need for additional steps such as k-means or the introduction of extra parameters. Additionally, our approach introduces two essential constraints to the nonnegative embedding matrix: the first constraint enforces the smoothness of cluster indices across the various graphs, while the second constraint ensures the orthogonality of columns in the nonnegative embedding matrix, thereby facilitating cluster separation. We provide an efficient optimization framework for optimizing the specified criteria. Furthermore, we put OCNE to the test using the COVIDx dataset, which aggregates data from multiple public datasets. This dataset comprises chest X-ray images categorized into three classes and offers three different views. (Each image is associated with three distinct sets of deep features.)

Diagnosing a specific disease typically falls within the realm of supervised learning. However, in practical scenarios, gathering a substantial and accurately labeled dataset for a disease like COVID-19 can be both financially demanding and time-intensive. Consequently, we have chosen to embrace an unsupervised approach primarily due to the practical complexities associated with acquiring a sufficiently large and meticulously labeled dataset suitable for supervised machine learning. By opting for an unsupervised approach, our aim is to make the most efficient use of the available data resources without the dependency on an extensive collection of labeled COVID-19 cases. This decision is grounded in the recognition that unsupervised methods, such as multiview clustering, possess the capacity to unveil underlying patterns and structures within the data, even in scenarios where labeled examples are scarce or entirely absent.

Moreover, the exploration of multi-view learning is notably absent in these areas. Multi-view learning techniques have the capacity to comprehensively exploit the informative aspects of multiple perspectives, thereby enhancing the predictive performance of data. The integration of various data views is additionally justified by the varying degrees of importance and prior information associated with each view. Consequently, a multi-view COVIDx dataset and unsupervised learning models are employed for COVID-19 diagnosis. While unsupervised clustering functions as a class-agnostic classifier, the resultant clusters can readily be correlated with actual labels. In addition, we extend to the exploration of diverse scenarios within the COVIDx dataset. By systematically removing specific views and examining different dataset subsets, we comprehensively assess the effectiveness of our approach across varying conditions.

Figure 1 shows examples of lung X-ray images with their corresponding labels.

Fig. 1 Lung X-ray images for the three classes: Normal, Pneumonia, and COVID-19

The following is a summary of the main contributions of the paper.

  1. The proposed approach amalgamates the strengths of both graph-based and matrix factorization-based methods. Notably, our method eliminates the need for post-processing steps like K-means, streamlining the clustering process for improved efficiency and effectiveness.

  2. Building upon this foundation, we introduce innovative elements into the clustering framework. Our approach incorporates a smoothing term for cluster indices and enforces an orthogonality constraint on the nonnegative embedding matrix. These constraints contribute to improved clustering outcomes compared to the NESE approach.

  3. Our work represents a pioneering effort in the application of multi-view clustering algorithms to the detection of COVID-19 cases. By seamlessly integrating information from diverse data sources, we unlock new potential for disease diagnosis and showcase the versatility of multi-view learning technologies.

  4. To substantiate the selection of the multi-view clustering algorithm for application to this dataset, we conducted tests on other datasets to demonstrate its efficiency. Additionally, we introduce an efficient optimization framework, utilizing an alternating minimization scheme, to optimize the specified criteria.

  5. Furthermore, we evaluate the performance of our method by examining various subsamples from the COVIDx dataset. These subsamples are generated by excluding a specific view from the COVIDx dataset, resulting in two distinct data subsets. We subject these samples to testing using the OCNE, NESE, and SC methods.

  6. Diagnosing a specific disease typically falls under the domain of supervised learning. However, in real-world applications, collecting a substantial amount of labeled COVID-19 data proves to be a costly and time-consuming endeavor. Furthermore, the exploration of clustering with multiple views, also referred to as multi-view clustering, has been notably absent in this context. Leveraging multi-view learning technology can offer a comprehensive understanding of the valuable insights across various perspectives, ultimately enhancing data prediction performance. The amalgamation of diverse data views is driven by the recognition that different views carry varying degrees of significance and prior information. Hence, we applied unsupervised learning models to the COVIDx dataset, which encompasses multiple views, for the purpose of COVID-19 diagnosis.

Collectively, these contributions underscore the significance and innovation of our work in the domain of COVID-19 diagnosis, offering a promising avenue for efficient, cost-effective, and accurate disease detection, particularly in resource-constrained environments.

The remainder of this paper is structured as follows: In Sect. 2, we delve into relevant work on multi-view clustering and provide an overview of the NESE approach introduced by Hu et al. in [12]. Section 3 offers a comprehensive exposition of our proposed approach and the associated optimization scheme. Section 4 presents our experimental findings, including a comparative analysis of our method with several state-of-the-art techniques. Finally, the paper concludes with Sect. 5.

2 Related work

2.1 Notations

In this study, matrices are denoted in bold uppercase, while vectors are represented in bold lowercase. Let \(\textbf{X}^{(v)}\) represent the data matrix of view v, with \(v = 1, \ldots , V\). \(\textbf{X}^{(v)}\) is equivalent to \((\textbf{x}_1^{(v)}, \textbf{x}_2^{(v)}, \ldots , \textbf{x}_n^{(v)}) \in {\mathbb {R}}^{n \times d^{(v)}}\), where n is the total number of samples, and \(d^{(v)}\) is the dimensionality of the data in view v. Our objective is to cluster the data into K clusters. Additionally, \(\textbf{x}_i^{(v)}\) signifies the i-th sample within the matrix \(\textbf{X}^{(v)}\). The trace of a matrix \(\textbf{M}\) is denoted by \(\textit{Tr}(\textbf{M})\), and the transpose is represented as \(\textbf{M}^T\). \(M_{ij}\) refers to the element in the i-th row and j-th column of the matrix \(\textbf{M}\). The Frobenius norm of this matrix is expressed as \(\Vert \textbf{M}\Vert _{2}\), and the \(l_2\)-norm of a vector \(\textbf{m}\) is given by \( \Vert \textbf{m}\Vert _{2}\). The primary matrices utilized in our work include:

The similarity matrix of each view is denoted as \(\textbf{S}_v\), with the corresponding spectral projection matrix and Laplacian matrix represented by \(\textbf{P}_v\) and \(\textbf{L}_v\), respectively. A diagonal matrix is symbolized by \(\textbf{D}\), and the identity matrix is represented as \(\textbf{I}\). The cluster index matrix (nonnegative embedding matrix) is denoted as \(\textbf{H}\). \({{\textbf {1}}}\) signifies a column vector in which all elements are set to one. The balance parameters employed in this article are denoted as \(\alpha \) and \(\lambda \).

The notations utilized in this paper are detailed in Table 1.

Table 1 Main notations used in this paper

2.2 Related work

Various multiview clustering methods have shown promising performance in different domains. In this paper, several of them are used to detect COVID-19 cases using chest X-ray datasets. This section explains the four main categories of multiview clustering methods: (1) spectral clustering, (2) graph-based methods, (3) matrix factorization, and (4) subspace clustering methods.

The goal of spectral clustering methods [13, 14] is to project data into a space where they are linearly separable. First, a similarity matrix is created between the data points for each view. Then, a unified similarity matrix is created by merging the different similarity matrices. Then, a spectral projection matrix corresponding to this similarity matrix is computed, whose rows lead to the final clustering results using the k-means method. The co-training method, as proposed in [3], is a widely recognized technique for multi-view spectral clustering. In this method, the same set of instances is clustered across multiple views. It achieves this by using the clustering result from the first view to adjust the affinity matrix of the second view, making it more similar to the first view. Co-regularized spectral clustering [4] is another prominent method that adaptively combines multiple similarity matrices from various views to achieve more accurate clustering results.

There are other multi-view clustering methods that take into account the importance of each view by assigning weights to each view, as outlined in references such as [2, 15]. While these approaches offer a practical solution by providing a common framework to combine different graphs, they do so at the expense of introducing extra weighting factors for each view. These weighting parameters can increase the computational complexity of the proposed methods.

To address this limitation, several methods have incorporated automatic weight learning, eliminating the need for manually set hyperparameters. Notable examples include [16,17,18,19,20], where the weight for each view is calculated automatically.

Recently, spectral clustering has gained prominence in clustering with multiple views [21,22,23]. Spectral-based methods create a similarity matrix across all views and generate the spectral projection matrix with K connected components from V graphs. Notably, the authors of [24] introduced the “Adaptively Weighted Procrustes” (AWP) method, a spectral-based clustering variant that utilizes spectral rotation to learn the cluster indicator matrix. Compared to previous graph-based approaches, AWP is characterized by lower computational complexity and higher precision. Furthermore, there is the Multi-View Subspace-based Clustering (MVSC) algorithm introduced by the authors in [5,6,7]. These methods aim to generate the most coherent representation matrix of the data in a low-dimensional space.

In the work presented in [25], the authors introduced the Multi-View Learning with Adaptive Neighbors (MLAN) algorithm, which can simultaneously learn the graph structure and perform the clustering step. However, these methods have a significant drawback in terms of computational cost due to matrix inversion and eigenvalue decomposition.

Another approach presented in [26] employs two weighting systems: one for each view and the other for the features within each view. This method selects both the best view and the most informative features for each view. The rationale behind this is that feature selection can enhance the quality of clustering. This algorithm is particularly useful when dealing with high-dimensional features, as it performs feature selection and multi-view clustering concurrently.

The approach detailed in [27] can concurrently handle three tasks: (1) learning the similarity matrix, (2) creating the unified graph matrix, and (3) making the final clustering assignment. It utilizes an innovative multi-view fusion technique that automatically assigns weights to the graphs from each view to construct the unified graph matrix. Additionally, this method enforces a rank constraint on the Laplacian matrix to guarantee the presence of exactly K clusters. In addition, the researchers in [22] introduced a comprehensive framework for a multi-view spectral clustering approach that provides various learned graphs from each view, the unified fused graph, and the spectral clustering matrix simultaneously. Furthermore, [16] presents two automatic weighted clustering algorithms specifically designed for multiple views.

To reduce the computation time of graph-based methods, matrix factorization methods [12, 28] have become popular approaches for multi-view clustering. The basic idea of these methods is to convert the initial matrix into a product of two smaller matrices, which allows them to handle large datasets. Moreover, in [28], the authors present a method for multi-view clustering by consensus graph learning and non-negative embedding (MVCGE). This method benefits from kernelized graph learning methods and matrix factorization methods. The kernel is used to project the data into a space where it can be linearly separated. Thus, it can represent the similarity between the data points and account for the nonlinearity of the data. Moreover, the nonnegative matrix factorization matrix calculated by this method is introduced to directly obtain the final clustering result. This method is also able to automatically compute robust similarity and spectral projection matrices as well as the weights of each view. The authors of [29] develop a new method called one-step multi-view spectral clustering with cluster-label correlation graph. This method represents a significant innovation over existing multiview clustering methods. In addition to maintaining the advantages of methods for factorizing nonnegative matrices, this approach introduces an innovation inspired by semi-supervised learning. An additional graph is created to represent the similarity of the predicted labels. This graph is called the cluster label graph and is used in addition to the graphs associated with the data space.

A recent method called Dual Shared-Specific Multi-view Subspace Clustering (DSS-MSC) was introduced in [30]. DSS-MSC employs a dual learning model to simultaneously explore the characteristics of each view in a low-dimensional space. This approach aims to leverage the valuable and precise information from each view, while also considering the relationships among the shared information across different views. Additionally, the authors of [31] propose an innovative approach for learning a unified consistent graph. They achieve this by jointly computing the self-expressive coefficients and affinity matrix derived from different kernels. This unified graph is then used for the final clustering process. Two comprehensive surveys on the subject of multiview clustering can be found in [32, 33]. These surveys offer an extensive overview of multiview clustering methods, encompassing both generative and discriminative approaches.

2.3 Review of the (NESE) method

The “Nonnegative Embedding and Spectral Embedding method” (NESE) is presented in [12]. NESE takes a distinctive approach by concurrently estimating the nonnegative embedding and spectral embedding matrices. This method aims to produce clustering results directly, obviating the necessity for supplementary clustering steps or additional parameters. The authors in [12] introduce a novel objective function for computing a unified nonnegative embedding matrix \(\textbf{H}\). This objective function draws inspiration from symmetric nonnegative matrix factorization and the relaxed continuous Ncut. The primary objective function of NESE is:

$$\begin{aligned} \min _{\textbf{H}, \; \textbf{P}_v} \sum _{v=1}^V \Vert \textbf{S}_v - \textbf{H}\, \textbf{P}_v^T\Vert _2 \quad \text {s.t.} \; \; \textbf{H}\ge 0, \; \; \textbf{P}_v^T \, \textbf{P}_v = \textbf{I}. \end{aligned}$$
(1)

In this method, \(\textbf{S}_v\) represents the similarity matrix for the respective view v, \(\textbf{P}_v\) denotes the spectral projection matrix, and \(\textbf{H}\) serves as the unified nonnegative embedding matrix used for clustering assignments (where each row corresponds to a sample). Notably, this approach removes the requirement for supplementary parameters or additional clustering steps, like k-means, which can be notably influenced by the selection of an initial configuration. The authors employ an iterative optimization technique to compute the results of their approach, which encompass the spectral projection matrices and the unified nonnegative embedding matrix. This iterative process ensures the accuracy of the method’s results.
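As a rough illustration of Eq. (1), the sketch below evaluates the objective and checks the two constraints for given matrices. The function names and the toy data are assumptions introduced here for illustration and are not taken from [12].

```python
import numpy as np

def nese_objective(S_list, H, P_list):
    """Sum over views of the Frobenius norm of S_v - H P_v^T, as in Eq. (1)."""
    return sum(np.linalg.norm(S - H @ P.T, ord="fro") for S, P in zip(S_list, P_list))

def is_feasible(H, P_list, tol=1e-8):
    """Check the constraints H >= 0 and P_v^T P_v = I."""
    K = H.shape[1]
    return bool(np.all(H >= 0)) and all(
        np.allclose(P.T @ P, np.eye(K), atol=tol) for P in P_list
    )

# Toy example: n = 4 samples, K = 2 clusters, one view that is reconstructed exactly.
H = np.array([[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]])
P = H / np.sqrt(2.0)   # columns are orthonormal
S = H @ P.T            # block-diagonal similarity matrix
value = nese_objective([S], H, [P])
```

In this toy case the similarity matrix is block diagonal, so the objective is zero and each row of `H` has a single nonzero entry that marks the cluster.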

3 Proposed approach

This article introduces a novel approach, an enhancement of the NESE method, known as “One-step multi-view spectral clustering by learning Constrained Nonnegative Embedding” (OCNE). Our method introduces additional constraints on the nonnegative embedding matrix \(\textbf{H}\) aimed at improving clustering quality. The key distinction between our approach and the NESE method is the inclusion of two constraints on the matrix \(\textbf{H}\): the view-based label-like smoothness constraint and the orthogonality constraint. Here, n represents the total number of samples, and with V views, the data for each view can be expressed as: \(\textbf{X}^{(v)} = (\textbf{x}_1^{(v)}, \textbf{x}_2^{(v)}, ..., \textbf{x}_n^{(v)})\). Similar to NESE, with the graph matrices of each view denoted as \(\textbf{S}_v \in {\mathbb {R}}^{n \times n}\) serving as input to the algorithm, the objective is to compute the spectral projection matrix \(\textbf{P}_v \in {\mathbb {R}}^{n \times K}\) and the coherent nonnegative embedding matrix \(\textbf{H}\in {\mathbb {R}}^{n \times K}\).

In the NESE method (as seen in Eq. (1)), only the nonnegative condition on the matrix \(\textbf{H}\) is imposed. To enhance clustering accuracy, we propose adding a set of additional constraints to the matrix \(\textbf{H}\). One of these constraints aims to ensure the smoothness of cluster labels across all views. This constraint implies that if the similarity between two data points \(\textbf{x}_i^{(v)}\) and \(\textbf{x}_j^{(v)}\) is high, then the vectors \(\textbf{H}_{i*}\) and \(\textbf{H}_{j*}\) should be similar. This is achieved by minimizing the following term:

$$\begin{aligned} \frac{1}{2}\, \sum \limits _{i} \sum \limits _{j} \Vert \textbf{H}_{i*} \,-\textbf{H}_{j*} \, \Vert _2^2 \,S_{ij}^{(v)}\, ={Tr \,\left( \textbf{H}^T \, \textbf{L}_v\,\textbf{H}\right) }, \end{aligned}$$
(2)

where \(\textbf{L}_v\) \(\in {\mathbb {R}}^{n \times n}\) is the Laplacian matrix of the similarity matrix \(\textbf{S}_v\). \(\textbf{L}_v\) is equal to \(\textbf{D}_v-\textbf{S}_v\) where \(\textbf{D}_v\) is a diagonal matrix whose i-th diagonal element in the v-th view is given by: \( D_{ii}^{(v)} = \sum \nolimits _{j=1}^{n}{{\frac{S_{ij}^{(v)} +S_{ji}^{(v)}}{2}}}\).
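The identity in Eq. (2) can be checked numerically. The sketch below builds the Laplacian with the degree definition given above and compares both sides on random data. The function names and the random test matrices are illustrative assumptions.

```python
import numpy as np

def laplacian(S):
    """L = D - S with D_ii = sum_j (S_ij + S_ji) / 2."""
    deg = 0.5 * (S.sum(axis=1) + S.sum(axis=0))
    return np.diag(deg) - S

def smoothness_pairwise(H, S):
    """Left-hand side of Eq. (2): 0.5 * sum_ij ||H_i* - H_j*||^2 S_ij."""
    diff = H[:, None, :] - H[None, :, :]   # (n, n, K) array of row differences
    return 0.5 * np.sum(np.sum(diff ** 2, axis=2) * S)

rng = np.random.default_rng(0)
S = rng.random((6, 6))
S = 0.5 * (S + S.T)                        # symmetric toy similarity matrix
H = rng.random((6, 3))
lhs = smoothness_pairwise(H, S)
rhs = np.trace(H.T @ laplacian(S) @ H)
```

The pairwise form makes the intent explicit (similar samples should have similar rows of \(\textbf{H}\)), while the trace form is the one that is convenient to differentiate.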

The authors of [34] have demonstrated that introducing an orthogonality constraint on the soft label matrix can lead to improved results in semi-supervised classification. We consequently impose orthogonality constraints on the columns of the nonnegative embedding matrix \(\textbf{H}\). This constraint is enforced by minimizing the following term.

$$\begin{aligned} ||\textbf{H}^T \, \textbf{H}\,-\textbf{I}\, ||^2_2 \, =\,Tr \,\left( ( \textbf{H}^T \, \textbf{H}\, - \,\textbf{I})^T\,( \textbf{H}^T \, \textbf{H}\, - \,\textbf{I}) \right) . \end{aligned}$$
(3)
Fig. 2 Illustration of the OCNE method

Finally, the objective function of the OCNE method is:

$$\begin{aligned} \begin{aligned}&\min _{\textbf{P}_v, \; \textbf{H}} \sum _{v=1}^V \, \Vert \textbf{S}_v \,-\textbf{H}\, \textbf{P}_v^T \Vert _2 \,+ \,\lambda \,\sum _{v=1}^V \,\sqrt{Tr \,\left( \textbf{H}^T \, \textbf{L}_v\,\textbf{H}\right) } \\&\qquad +\alpha \,Tr \,\left( ( \textbf{H}^T \, \textbf{H}\, - \,\textbf{I})^T\,( \textbf{H}^T \, \textbf{H}\, - \,\textbf{I}) \right) \\&\quad \text {s.t.} \; \; \textbf{H}\ge 0, \; \; \textbf{P}_v^T \, \textbf{P}_v = \textbf{I}, \end{aligned} \end{aligned}$$
(4)

where \(\lambda \) is a balance parameter, and \(\alpha \) is a positive scalar that ensures the orthogonality of the matrix \(\textbf{H}\). Furthermore, our approach integrates the benefits of specific methods, as observed in [16, 18,19,20], which utilize an auto-weighted scheme in their objective function to minimize the need for additional parameters. In our approach, we employ two sets of adaptive weights, corresponding to the first and second terms in our objective function (4). The first set of weights is defined as follows:

$$\begin{aligned} \delta _v=\frac{1}{ 2\,*\, \Vert \textbf{S}_v \,-\textbf{H}\, \textbf{P}_v^T \Vert _2} \hspace{1cm} v=1,...., V. \end{aligned}$$
(5)

The second set of weights is given by:

$$\begin{aligned} w_v=\, \frac{1}{2\,*\,\sqrt{Tr \,\left( \textbf{H}^T \, \textbf{L}_v\,\textbf{H}\right) } } \hspace{1cm} v=1,...., V. \end{aligned}$$
(6)
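A minimal sketch of how the two sets of weights in Eqs. (5) and (6) could be computed is given below. The function name and the small constant `eps`, which guards against division by zero when a residual vanishes, are assumptions added for illustration.

```python
import numpy as np

def adaptive_weights(S_list, L_list, H, P_list, eps=1e-12):
    """Return (delta, w) following Eqs. (5) and (6); eps is an added safeguard."""
    delta = np.array([
        1.0 / (2.0 * max(np.linalg.norm(S - H @ P.T, ord="fro"), eps))
        for S, P in zip(S_list, P_list)
    ])
    w = np.array([
        1.0 / (2.0 * max(np.sqrt(max(np.trace(H.T @ L @ H), 0.0)), eps))
        for L in L_list
    ])
    return delta, w

# Toy example with n = 2 samples, K = 1 cluster, and a single view.
S = np.array([[0.0, 1.0], [1.0, 0.0]])
L = np.diag(S.sum(axis=1)) - S
H = np.array([[1.0], [0.0]])
P = np.array([[1.0], [0.0]])
delta, w = adaptive_weights([S], [L], H, [P])
```

Views whose reconstruction error or smoothness term is small receive a large weight, so views that agree with the current embedding are emphasized without any manually tuned per-view parameter.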

In the end, the minimization problem corresponding to our method can be expressed as minimizing the following objective function:

$$\begin{aligned} \begin{aligned}&\min _{\textbf{P}_v, \; \textbf{H}} \, \sum _{v=1}^V \,\delta _v \, \Vert \textbf{S}_v \,-\textbf{H}\, \textbf{P}_v^T \Vert ^{2}_2 \,+ \,\lambda \,\sum _{v=1}^V \, w_v \,{Tr \,\left( \textbf{H}^T \, \textbf{L}_v\,\textbf{H}\right) } \\&\qquad +\alpha \,Tr \,\left( ( \textbf{H}^T \, \textbf{H}\, - \,\textbf{I})^T\,( \textbf{H}^T \, \textbf{H}\, - \,\textbf{I}) \right) \\&\quad \text {s.t.} \; \; \textbf{H}\ge 0, \; \; \textbf{P}_v^T \, \textbf{P}_v = \textbf{I}. \end{aligned} \end{aligned}$$
(7)
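The link between problems (4) and (7) can be sketched as follows, under the assumption that the weights are treated as constants while \(\textbf{H}\) and \(\textbf{P}_v\) are updated. For any differentiable \(g(\textbf{H}) > 0\), the chain rule gives \(\partial \sqrt{g} / \partial \textbf{H} = \frac{1}{2\sqrt{g}} \, \partial g / \partial \textbf{H}\). Applying this to the first two terms of (4), with the weights defined in (5) and (6), yields:

$$\begin{aligned} \frac{\partial }{\partial \textbf{H}} \Vert \textbf{S}_v -\textbf{H}\, \textbf{P}_v^T \Vert _2&= \delta _v \, \frac{\partial }{\partial \textbf{H}} \Vert \textbf{S}_v -\textbf{H}\, \textbf{P}_v^T \Vert ^{2}_2, \\ \frac{\partial }{\partial \textbf{H}} \sqrt{Tr \left( \textbf{H}^T \, \textbf{L}_v\,\textbf{H}\right) }&= w_v \, \frac{\partial }{\partial \textbf{H}} Tr \left( \textbf{H}^T \, \textbf{L}_v\,\textbf{H}\right) . \end{aligned}$$

Under this reading, the gradient of (7) with fixed weights coincides with the gradient of (4) at the point where the weights were evaluated, which is the usual motivation for alternating between solving (7) and refreshing \(\delta _v\) and \(w_v\).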

Once \(\textbf{H}\) is estimated, the cluster index for each sample is determined by the position of the highest value in the corresponding row of \(\textbf{H}\).
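The assignment rule is a row-wise argmax. The matrix below is a made-up toy embedding, not an output of the method.

```python
import numpy as np

# Toy nonnegative embedding with n = 4 samples and K = 3 clusters.
H = np.array([[0.9, 0.1, 0.0],
              [0.2, 0.7, 0.1],
              [0.0, 0.3, 0.8],
              [0.6, 0.4, 0.0]])
labels = np.argmax(H, axis=1)  # position of the largest entry in each row
```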

Figure 2 shows an illustration of our proposed multi-view clustering method.

3.1 Optimization

In this section, we will outline our approach to optimizing the objective function defined in (7). To update the matrices \(\textbf{H}\) and \(\textbf{P}_v\), we employ an alternating minimization approach. This approach entails updating one of these two matrices while keeping the other fixed, and this process is reiterated until convergence is achieved.

Initially, we set the parameters \(\lambda \) and \(\alpha \) to zero and subsequently solve the resultant minimization problem, which yields the results akin to the NESE method. Consequently, the matrix \(\textbf{H}\) estimated by NESE acts as the initial matrix for our optimization. We also implement the same methodologies described in [35] for computing the matrices \(\textbf{S}_v\) for each view v and initializing \(\textbf{P}_v\).

Update \(\textbf{P}_v\): Fixing \(\textbf{H}\), \(w_v\), and \(\delta _v\), the objective function of OCNE will be equivalent to:

$$\begin{aligned} {\mathop {\min _{\textbf{P}_v}}} \, \sum _{v=1}^V \, \delta _v \, \Vert \textbf{S}_v \,-\textbf{H}\, \textbf{P}_v^T \Vert ^{2}_2 \end{aligned}$$
(8)

Given that \(\textbf{P}_v^T \, \textbf{P}_v = \textbf{I}\), this problem is the well-known orthogonal Procrustes problem, and its solution can be obtained using the singular value decomposition of \(\textbf{S}_v^T \,\textbf{H}\). Let \(\textbf{O}\Sigma \textbf{Q}^T = \text {SVD}(\textbf{S}_v^T \,\textbf{H})\). The solution to Eq. (8) is then given by:

$$\begin{aligned} \textbf{P}_v =\textbf{O}\, \textbf{Q}^T \,\,\, {\text{ with }} \,\,\, \textbf{O}\Sigma \textbf{Q}^T= SVD \,(\textbf{S}_v^T \,\textbf{H}). \end{aligned}$$
(9)
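The update in Eq. (9) can be sketched with a thin SVD. The function name and the random test data are illustrative assumptions. The comparison against a random matrix with orthonormal columns is only a sanity check, not a proof of optimality.

```python
import numpy as np

def update_P(S, H):
    """P = O Q^T, where O Sigma Q^T is the thin SVD of S^T H, as in Eq. (9)."""
    O, _, Qt = np.linalg.svd(S.T @ H, full_matrices=False)
    return O @ Qt

rng = np.random.default_rng(0)
n, K = 8, 3
S = rng.random((n, n))
S = 0.5 * (S + S.T)
H = rng.random((n, K))
P = update_P(S, H)

# Compare against a random feasible P (orthonormal columns from a QR factorization).
P_rand, _ = np.linalg.qr(rng.normal(size=(n, K)))
err_opt = np.linalg.norm(S - H @ P.T)
err_rand = np.linalg.norm(S - H @ P_rand.T)
```

Because the sum in Eq. (8) separates over views once \(\textbf{H}\) and \(\delta _v\) are fixed, each \(\textbf{P}_v\) can be updated independently with one such call.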

Update H :

If we fix \(\textbf{P}_v\), \(w_v\), and \(\delta _v\), we calculate the derivative of the function in (7) w.r.t. \(\textbf{H}\):

$$\begin{aligned} \frac{\partial f}{\partial \textbf{H}} ={}&2\, \sum _{v=1}^V \,\delta _v\,(\,\textbf{H}\,-\,\textbf{S}_v \,\textbf{P}_v\,) \,+ \,2 \,\lambda \,\sum _{v=1}^V \,w_v \,\textbf{L}_v\,\textbf{H}\\&+ \, 4 \, \alpha \, \textbf{H}\, (\, \textbf{H}^T \, \textbf{H}\, - \,\textbf{I}). \end{aligned}$$

Any real matrix \(\textbf{T}\) can be written as the difference of two nonnegative matrices, i.e., \(\textbf{T}= \textbf{T}^{+} - \textbf{T}^{-}\) where \(\textbf{T}^{+} = \frac{1}{2} \left( | \textbf{T}| + \textbf{T}\right) \) and \(\textbf{T}^{-} = \frac{1}{2} \left( | \textbf{T}| - \textbf{T}\right) \). Accordingly, let \(\textbf{N}_v=\,\textbf{S}_v \textbf{P}_v\,=\,\textbf{N}_v^+ -\, \textbf{N}_v^-\) and \(\textbf{L}_v\,=\,\textbf{L}_v^+ -\, \textbf{L}_v^-\,\).

After some algebraic manipulations, the gradient matrix will be equivalent to:

\( \frac{\partial f}{\partial \textbf{H}} = 2\, (\,\varvec{\Delta ^-} \, -\, \varvec{\Delta ^+})\) where:

\(\varvec{\Delta ^-} \,= \,\sum _{v=1}^V \,\delta _v\,\textbf{H}\,+ \,\sum _{v=1}^V \,\delta _v\,\textbf{N}_v^-\,+\,\lambda \,\sum _{v=1}^V \,w_v \,\textbf{L}_v^+\,\textbf{H}\,+ \,2 \, \alpha \,\textbf{H}\, \textbf{H}^T \, \textbf{H}\, \).

\(\varvec{\Delta ^+}\,= \,\sum _{v=1}^V \,\delta _v\,\textbf{N}_v^+\,+\,\lambda \,\sum _{v=1}^V \,w_v \,\textbf{L}_v^-\,\textbf{H}\,+ \,2 \, \alpha \,\textbf{H}\, \).

The nonnegative embedding matrix \(\textbf{H}\) is updated by gradient descent. One step is given by:

$$\begin{aligned} H_{ij}\,\leftarrow \, H_{ij}\,-\,\mu _{ij}\,\frac{\partial f}{\partial H_{ij}}&= H_{ij}\,-\,\frac{H_{ij}}{2\,\Delta _{ij}^-}\, 2\,(\,\Delta _{ij}^- \,-\,\Delta _{ij}^+ \,) \\&= H_{ij}\, \frac{\Delta _{ij}^+}{\Delta _{ij}^-}. \end{aligned}$$
(10)

The learning parameter of the above equation \( \mu _{ij}\) is set to \(\frac{1}{2\,\Delta _{ij}^-}\,H_{ij}\). Therefore, the matrix \(\textbf{H}\) can be updated as follows:

$$\begin{aligned} H_{ij}\,\leftarrow \,H_{ij}\,*\, \frac{\Delta _{ij}^+}{\Delta _{ij}^-} \hspace{1cm} i=1,...., n; \; \; j=1,..., K. \end{aligned}$$
(11)
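One multiplicative step of Eq. (11) can be sketched as follows, with \(\varvec{\Delta ^+}\) and \(\varvec{\Delta ^-}\) assembled term by term from their definitions above. The function names, the toy inputs, and the small constant `eps` in the denominator are assumptions added for illustration.

```python
import numpy as np

def split_pos_neg(T):
    """Return (T_plus, T_minus), both nonnegative, with T = T_plus - T_minus."""
    return 0.5 * (np.abs(T) + T), 0.5 * (np.abs(T) - T)

def update_H(H, S_list, L_list, P_list, delta, w, lam, alpha, eps=1e-12):
    """One multiplicative step H <- H * Delta_plus / Delta_minus, Eq. (11)."""
    d_minus = np.zeros_like(H)
    d_plus = np.zeros_like(H)
    for S, L, P, dv, wv in zip(S_list, L_list, P_list, delta, w):
        N_plus, N_minus = split_pos_neg(S @ P)
        L_plus, L_minus = split_pos_neg(L)
        d_minus += dv * H + dv * N_minus + lam * wv * (L_plus @ H)
        d_plus += dv * N_plus + lam * wv * (L_minus @ H)
    d_minus += 2.0 * alpha * (H @ H.T @ H)
    d_plus += 2.0 * alpha * H
    return H * d_plus / (d_minus + eps)  # eps avoids 0/0 and is an added safeguard

# Toy run with a single view.
rng = np.random.default_rng(0)
n, K = 6, 2
S = rng.random((n, n))
S = 0.5 * (S + S.T)
L = np.diag(S.sum(axis=1)) - S
P, _ = np.linalg.qr(rng.normal(size=(n, K)))
H = rng.random((n, K))
H_new = update_H(H, [S], [L], [P], delta=[0.5], w=[0.5], lam=1.0, alpha=1.0)
```

Since every factor in the update is nonnegative, a nonnegative \(\textbf{H}\) stays nonnegative without any explicit projection step.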

Update \(w_v\) and \(\delta _v\): The weights are updated using Eqs. (6) and (5), respectively, after all the mentioned matrices have been updated.

A summarized procedure of our OCNE method can be found in Table 2.

Table 2 Algorithm 1 (OCNE)

4 Performance evaluation

4.1 Experimental setup

Five image datasets were utilized to assess the effectiveness of our approach: MSRCv1, Caltech101-7, MNIST-10000, NUS, and COVIDx (Table 3). The MNIST-10000 and COVIDx datasets are relatively large for graph-based multiview clustering approaches. The COVIDx dataset comprises 13,892 chest X-ray images categorized into three classes: COVID-19, normal, and pneumonia. Although this dataset is commonly employed for supervised classification, we used it to evaluate the proposed unsupervised method. For each X-ray image, three different deep CNNs provided three image descriptors: ResNet50, ResNet101 [41], and DenseNet169 [42], trained on the ImageNet dataset. The dimensions of these descriptors are 2048, 2048, and 1664, respectively.

Table 3 Description of the datasets used in the paper

Our method was compared with several other approaches, including:

  • Auto-weighted Multi-View Clustering via Kernelized graph learning (MVCSK).

  • Spectral Clustering applied on the average of all views’ affinity matrices (SC Fused).

  • Multi-view spectral clustering via integrating Non-negative Embedding and Spectral Embedding approach (NESE).

  • Multi-View Spectral Clustering via Sparse graph learning (S-MVSC).

  • Consistency-aware and Inconsistency-aware Graph-based Multi-View Clustering approach (CI-GMVC).

The evaluation compared the performance of these methods on the datasets mentioned earlier.

With respect to the dataset NUS, we compare the OCNE method with the following competing methods:

  • Multi-view clustering via Adaptively Weighted Procrustes (AWP)

  • Multi-view Learning with Adaptive Neighbors (MLAN) [25]

  • Self-weighted Multi-view Clustering with Multiple Graphs (SwMC) [18]

  • Parameter-free Auto-weighted Multiple Graph Learning (AMGL) [19]

  • Affinity Aggregation for Spectral Clustering (AASC) [36]

  • Graph Learning for Multi-View clustering (MVGL) [37]

  • Co-regularized Approach for Multi-View Spectral Clustering (CorSC) [4]

  • Co-training approach for multi-view Spectral Clustering (CotSC) [3].

For all these methods, we directly reproduce the best experimental results from the corresponding published paper [12]. The optimization procedure in our approach involves two hyperparameters: \(\alpha \) and \(\lambda \). We set \(\alpha \) to \(10^6\) to enforce the orthogonality constraint. The parameter \(\lambda \) is varied in the range from 10 to \(10^8\) in our experiments. To evaluate the performance of our method, we use four metrics: clustering accuracy (ACC), normalized mutual information (NMI), purity indicator, and adjusted rand index (ARI). These metrics are defined in [21]. Higher values of these metrics indicate better performance, meaning that the resulting clusters are more similar to the real clusters.
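As an illustration of one of these metrics, the sketch below computes purity using its usual definition (the fraction of samples that belong to the majority class of their cluster), which we assume matches the definition in [21]. The function name and the toy labels are illustrative.

```python
from collections import Counter

def purity(true_labels, cluster_labels):
    """Fraction of samples that belong to the majority class of their cluster."""
    if len(true_labels) != len(cluster_labels):
        raise ValueError("label sequences must have the same length")
    majority_total = 0
    for c in set(cluster_labels):
        members = [t for t, k in zip(true_labels, cluster_labels) if k == c]
        majority_total += Counter(members).most_common(1)[0][1]
    return majority_total / len(true_labels)

# Toy example: cluster 1 mixes two "normal" samples with one "pneumonia" sample.
truth = ["covid", "covid", "normal", "normal", "pneumonia"]
clusters = [0, 0, 1, 1, 1]
score = purity(truth, clusters)
```

Purity does not depend on how the clusters are numbered, which matters here because the cluster indices produced by an unsupervised method are arbitrary.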

4.2 Experimental results

In this section, we provide a detailed evaluation of the experimental results. The best-performing results are indicated in bold, and for methods requiring an additional clustering step such as K-means, the standard deviation of the indicator parameters over multiple trials is shown in parentheses.

Our approach is compared to several state-of-the-art methods, including SC Fused, MVCSK, NESE, S-MVSC, and CI-GMVC. We assess the performance of our method not only on the COVIDx dataset but also on other datasets, including NUS, MSRCv1, Caltech101-7, and MNIST-10000. The results are presented in Tables 4, 5, and 6.

From these tables, it is evident that OCNE outperforms most of the competing methods across the five datasets. Furthermore, OCNE demonstrates superior performance on large datasets such as MNIST-10000 and COVIDx, highlighting its effectiveness on datasets of varying sizes.

It’s worth noting that this work represents a significant step in applying clustering algorithms to detect COVID-19 cases in lung images. While the clustering results on this dataset may not be perfect, this study lays the groundwork for the use of unsupervised and semi-supervised learning algorithms in such scenarios. This approach enables the utilization of datasets with missing labels, which is particularly valuable in the context of medical image analysis.

4.3 Convergence study

Regarding the convergence of OCNE, Fig. 4 displays the value of the objective function as a function of the number of iterations on the MSRCv1 dataset. The figure shows that our method converges rapidly, typically in fewer than 10 iterations. This fast convergence is a valuable characteristic of OCNE, as it keeps the number of optimization iterations, and hence the running time, low.
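One common way to decide when such an iterative procedure has converged is to monitor the relative change of the objective between consecutive iterations. The sketch below illustrates this idea. The stopping rule, the tolerance, and the function names are assumptions made for illustration, not the criterion used in our implementation, and the objective values are toy numbers.

```python
def has_converged(objective_history, tol=1e-6):
    """True when the relative change between the last two objective
    values is at most `tol`."""
    if len(objective_history) < 2:
        return False
    prev, curr = objective_history[-2], objective_history[-1]
    # Guard against division-like blow-up when the objective is near zero.
    return abs(prev - curr) <= tol * max(abs(prev), 1e-12)


def iterations_to_converge(objective_values, tol=1e-6):
    """Return the 1-based iteration at which the criterion is first met,
    or None if it is never met."""
    history = []
    for iteration, value in enumerate(objective_values, start=1):
        history.append(value)
        if has_converged(history, tol):
            return iteration
    return None


# Toy objective values (illustrative, not measured on MSRCv1).
toy_objective = [100.0, 40.0, 25.0, 24.0, 24.0]
print(iterations_to_converge(toy_objective, tol=0.05))
```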

Table 4 Clustering performance on the COVIDx dataset
Table 5 Clustering performance on the NUS dataset
Table 6 Clustering performance on the MSRCv1, Caltech101-7 and MNIST-10000 datasets

4.4 Parameter sensitivity

In this section, we explore the sensitivity of the \(\lambda \) parameter. Figure 3 presents the values of the ACC and NMI indicators when varying the \(\lambda \) parameter from 10 to \(10^8\) for the MSRCv1 dataset. The \(\alpha \) parameter was fixed at \(10^6\).

From this figure, it is evident that OCNE achieves its best performance when the value of the \(\lambda \) parameter is set to \(10^5\). This result highlights the importance of parameter tuning for optimizing the performance of the OCNE method.
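A sensitivity study of this kind amounts to a one-dimensional grid search over \(\lambda \). The following sketch shows one way to organize it, assuming the grid steps through powers of ten (the text only gives the end points 10 and \(10^8\)). The helper names and the toy scoring function are hypothetical stand-ins for running OCNE and measuring ACC.

```python
import math


def lambda_grid(low_exp=1, high_exp=8):
    """Powers of ten from 10**low_exp to 10**high_exp (assumed spacing)."""
    return [10.0 ** e for e in range(low_exp, high_exp + 1)]


def best_lambda(score_fn, grid):
    """Evaluate score_fn on every grid value; return the best value
    together with all scores."""
    scores = {lam: score_fn(lam) for lam in grid}
    return max(scores, key=scores.get), scores


def toy_score(lam):
    """Hypothetical stand-in for 'run the method with this lambda and
    return ACC'; it simply peaks at 10**5 for demonstration."""
    return -abs(math.log10(lam) - 5)


best, scores = best_lambda(toy_score, lambda_grid())
print(best)
```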

Fig. 3 Clustering performance ACC (%) and NMI (%) as a function of \(\lambda \) on the MSRCv1 dataset

Fig. 4 Convergence of OCNE on the MSRCv1 dataset

4.5 Performance on different subsets of COVIDx

In this section, we analyze the performance of our approach on subsets of the COVIDx dataset of different sizes. The aim is twofold: to assess the scalability of our method and to examine its effectiveness under different class distributions. Each subset is clustered with the SC, NESE, and OCNE approaches.

First, a small sample of 2468 instances, referred to as COVIDx-2468, is extracted from the large dataset. It consists of 1000 instances of the Pneumonia class, 468 instances of the COVID-19 class, and 1000 instances of the Normal class. Although limited in size, this sample gives an initial view of our method's capabilities when data resources are constrained, and it requires the method to distinguish COVID-19 cases from other conditions in the presence of class imbalance. The second sample, referred to as COVIDx-6468, contains 6468 instances: 3000 instances of the Pneumonia class, 468 instances of the COVID-19 class, and 3000 instances of the Normal class. This dataset size better aligns with real-world medical datasets and poses a different set of challenges related to class balance and diversity in the data. The final sample is the large COVIDx dataset, which consists of 13,892 instances, as described previously. This extensive dataset covers a wide range of cases and allows our approach to be evaluated on a diverse set of instances.
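The construction of such subsets can be sketched as class-wise sampling with fixed per-class counts. The snippet below is a minimal illustration that assumes instances are drawn uniformly at random within each class, which is not stated above. The function name and the toy label list are hypothetical, while the per-class counts of the two subsets are those given in the text.

```python
import random
from collections import Counter

# Per-class counts of the two subsets, as described in the text.
COVIDX_2468 = {"Pneumonia": 1000, "COVID-19": 468, "Normal": 1000}
COVIDX_6468 = {"Pneumonia": 3000, "COVID-19": 468, "Normal": 3000}


def sample_subset(labels, per_class, seed=0):
    """Return sorted indices holding per_class[c] instances of each class c,
    drawn without replacement (random selection is an assumption)."""
    rng = random.Random(seed)
    chosen = []
    for cls, n in per_class.items():
        candidates = [i for i, y in enumerate(labels) if y == cls]
        if n > len(candidates):
            raise ValueError(f"class {cls!r} has only {len(candidates)} instances")
        chosen.extend(rng.sample(candidates, n))
    return sorted(chosen)


# Toy label list, much smaller than COVIDx, used only to exercise the function.
toy_labels = ["Pneumonia"] * 50 + ["COVID-19"] * 20 + ["Normal"] * 60
toy_spec = {"Pneumonia": 10, "COVID-19": 20, "Normal": 10}
subset = sample_subset(toy_labels, toy_spec)
print(Counter(toy_labels[i] for i in subset))
```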

First, the classical spectral clustering method SC [38], adapted for multi-view analysis, is tested on these samples. To assess the individual contribution of each view, we apply SC separately to each view, using the affinity matrix of the corresponding view, which can be obtained with the same method as in [12]. We refer to these variants as SC1, SC2, and SC3, corresponding to \(View_1\), \(View_2\), and \(View_3\), respectively. SC is also applied to all views together by taking the average of the affinity matrices of all views as the affinity matrix. Table 7 summarizes the corresponding experimental results. Examining the results for the individual views alongside their combination clarifies how much each view contributes and where multi-view fusion helps or hurts, which underscores the relevance of multi-view clustering techniques for this diagnostic task.
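The fusion step used for the multi-view SC baseline, an element-wise average of the per-view affinity matrices, can be sketched as follows. The snippet uses plain nested lists to stay dependency-free, the function name is hypothetical, and the 2×2 matrices are toy values. In practice an array library would normally be used.

```python
def average_affinity(affinities):
    """Element-wise mean of a list of square affinity matrices,
    each given as a list of rows of equal size."""
    n_views = len(affinities)
    n = len(affinities[0])
    for W in affinities:
        if len(W) != n or any(len(row) != n for row in W):
            raise ValueError("all affinity matrices must be n x n")
    return [
        [sum(W[i][j] for W in affinities) / n_views for j in range(n)]
        for i in range(n)
    ]


# Toy affinity matrices for three views (illustrative values only).
W1 = [[1.0, 0.25], [0.25, 1.0]]
W2 = [[1.0, 0.50], [0.50, 1.0]]
W3 = [[1.0, 0.75], [0.75, 1.0]]
print(average_affinity([W1, W2, W3]))
```

Because every view receives the same weight, a single uninformative view pulls the fused affinities away from those of the better views.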

Table 7 Clustering performance of the classic spectral method SC, applied to different views of the different subsets of the COVIDx dataset. Here, class 1 denotes the Pneumonia class, class 2 denotes the COVID-19 class, and class 3 denotes the Normal class
Table 8 Clustering performance of the NESE and OCNE methods applied to different views of the different subsets of the COVIDx dataset

Once we have a particular predicted clustering, we look for the best mapping between the three clusters obtained and the three classes. We can then compute the confusion matrices, from which we derive three evaluation metrics that measure the quality of the clustering results: Precision (the percentage of all items predicted as positive that are actually positive), Recall (the percentage of all actual positive samples that the model detects), and Selectivity (the percentage of all actual negative samples that the model correctly identifies as negative). All of these metrics have been widely used in previous studies, and their definitions can be found in [39]. In Table 7, we report these indicators for the three classes: (1) pneumonia, (2) COVID-19, and (3) normal, which are referred to as class 1, 2, and 3, respectively. Reporting the metrics per class gives a detailed assessment of the clustering quality for each class. For each metric, a higher value corresponds to a better clustering result.
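The three per-class indicators follow directly from the confusion matrix in a one-vs-rest fashion. The sketch below assumes that the predicted cluster ids have already been mapped to class labels by the best-mapping step. The function names and the toy labels are hypothetical.

```python
def confusion_matrix(y_true, y_pred, classes):
    """cm[i][j] = number of samples of true class i predicted as class j."""
    index = {c: i for i, c in enumerate(classes)}
    cm = [[0] * len(classes) for _ in classes]
    for t, p in zip(y_true, y_pred):
        cm[index[t]][index[p]] += 1
    return cm


def per_class_metrics(cm, k):
    """Precision, recall and selectivity of class k (one-vs-rest)."""
    total = sum(map(sum, cm))
    tp = cm[k][k]
    fn = sum(cm[k]) - tp                  # class k predicted as something else
    fp = sum(row[k] for row in cm) - tp   # other classes predicted as k
    tn = total - tp - fn - fp
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    selectivity = tn / (tn + fp) if tn + fp else 0.0
    return precision, recall, selectivity


# Toy labels: 1 = pneumonia, 2 = COVID-19, 3 = normal (not real results).
y_true = [1, 1, 1, 1, 2, 2, 3, 3, 3, 3]
y_pred = [1, 1, 1, 2, 2, 1, 3, 3, 3, 1]
cm = confusion_matrix(y_true, y_pred, classes=[1, 2, 3])
print(per_class_metrics(cm, k=1))  # index 1 corresponds to class 2
```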

NESE and OCNE are also evaluated on different combinations of views. First, these methods are applied to each pair of views: \(View_1\) & \(View_2\) (denoted \(v_1\) & \(v_2\)), \(View_1\) & \(View_3\) (denoted \(v_1\) & \(v_3\)), and \(View_2\) & \(View_3\) (denoted \(v_2\) & \(v_3\)). Then all views are considered. The experimental results are shown in Table 8.
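The view configurations evaluated in this protocol (every pair of views, followed by all views together) can be enumerated programmatically. This is a small sketch in which the function name and the view identifiers are hypothetical.

```python
from itertools import combinations


def view_combinations(views):
    """All pairs of views, followed by the full set of views."""
    pairs = list(combinations(views, 2))
    return pairs + [tuple(views)]


for combo in view_combinations(["v1", "v2", "v3"]):
    # In an experiment, each configuration would be passed to the
    # clustering method under test.
    print(" & ".join(combo))
```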

From Table 7, which presents the results of the classical spectral clustering method, we can clearly see that for the COVIDx-2468 subset, \(View_2\) is the most informative for detecting COVID-19 cases (class 2 in the table; see the Precision and Recall indicators). However, for the COVIDx-6468 and COVIDx datasets, \(View_3\) is the most informative view. The difference between the clustering results of the different classes can be interpreted as follows. Our dataset contains 468 COVID-19 samples. This number is small compared with the other two classes (pneumonia and normal) for all subsets considered. Therefore, our data are imbalanced, which may reduce the efficiency of clustering methods for COVID-19 case detection. This highlights the importance of addressing class imbalance in future research. In addition, because COVID-19 has symptoms similar to those of pneumonia, the algorithm may confuse images from these two classes, which can exhibit similar and overlapping patterns.

Table 7 also shows that, for all subsets, the result of multi-view spectral clustering (SC applied to all views) is generally lower than the result obtained by the best single view. This can be explained by the fact that one view gives significantly worse results than the other two, which distorts the overall result of the multi-view SC method. Furthermore, the graph fusion was performed by simply averaging the view graphs, which leaves room for further refinement of the fusion step.

In summary, the analysis of Table 7 shows how the informativeness of the individual views and the class imbalance affect multi-view clustering in the context of COVID-19 diagnosis, and it motivates future research on refining multi-view methods for medical diagnostics. Finally, we can observe that the clustering performance of the NESE and OCNE methods decreases as the imbalance of the ground-truth classes increases.

Regarding the results obtained by applying the NESE method to the three subsets of the COVIDx dataset, Table 8 shows that the combination of all views did not provide the best clustering results for the three subsets. For the NESE method, the clustering performance was relatively good for both the combination of \(View_1\) & \(View_2\) and the combination of \(View_2\) & \(View_3\).

From Table 8, it can be seen that the proposed OCNE method gives the best clustering results when we consider \(View_2\) & \(View_3\) for all three subsets. It can also be observed that the fusion of all views in both NESE and our proposed OCNE method helps to improve the recall indicator associated with COVID-19 (class 2). The recall associated with OCNE was also higher than that obtained with NESE. We emphasize that a high recall indicator is required for this type of problem. In conclusion, the performance of both clustering methods NESE and OCNE decreases with increasing imbalance of ground-truth classes.

The results of our experiments have promising practical implications for the field of medical diagnostics, especially for the detection and diagnosis of COVID-19 cases. Although our method is currently still in the research phase, its potential applications in clinical practice are remarkable. If further developed and rigorously validated, our approach could become a valuable complementary tool for medical professionals. In settings where access to extensive labeled COVID-19 data is difficult, such as in resource-limited regions, the ability of our method to work with unlabeled data demonstrates its resource efficiency. Moreover, the transparent and interpretable results of our approach are indispensable in the medical field, as they allow physicians to make informed decisions based on the clustering results. However, we recognize that our work is a step toward practical implementation. Future clinical trials and collaboration with healthcare institutions will be instrumental in validating the effectiveness and safety of our method in practice and ultimately establishing it in the healthcare system.

5 Conclusion

In this work, we extended the Nonnegative Embedding and Spectral Embedding method and applied it, for the first time, to the detection of COVID-19 cases, which we regard as a significant contribution to the field of data analysis and medical diagnostics. By adding constraints to the nonnegative embedding matrix, we enhanced the clustering performance and demonstrated the effectiveness of the approach on various datasets.

The potential extensions and applications of the method are promising. Using additional image datasets, such as CT images of patients, is a valuable direction to explore, as it can provide complementary information for diagnosis. Detecting diseases at an earlier stage and tailoring treatment plans are critical for improving patient outcomes, and our method could contribute to this area. Expanding this work to diagnose other types of diseases would also have a positive impact on the field of medical diagnostics.

Our proposed method offers several notable advantages in the context of COVID-19 diagnosis. By integrating multiple data views, it captures a comprehensive representation of the disease, which can improve diagnostic accuracy. Since it works in an unsupervised manner, it does not require extensive labeled data, which makes it applicable in real-world settings where labels are scarce. In addition, the introduction of constraints, including a smoothing term for cluster indices and an orthogonality constraint for the non-negative embedding matrix, improves the quality of clustering results, potentially outperforming other unsupervised methods. Rigorous tests on various datasets demonstrate its robustness and generalizability, suggesting potential broader applicability. However, challenges remain, such as the complexity introduced by data with multiple views, dependence on the availability and relevance of data views, sensitivity to hyperparameters, and interpretability of complex unsupervised models. Although our approach offers promising solutions, further research and validation are essential to evaluate its practical utility in the clinical setting and to effectively address these limitations.

In terms of future directions, we foresee the development of a scalable variant of the proposed approaches that can handle large datasets without incurring excessive computational costs. Additionally, similar to the CSNE method in [40], our methods could be extended to estimate two nonnegative embedding matrices: the joint and specific nonnegative embedding matrices, instead of estimating only a single nonnegative matrix. Furthermore, a promising avenue for future research involves the integration of clinical data alongside imaging data. This integration can enhance the diagnostic capabilities of our approach by combining radiological information with patient history, symptoms, and clinical outcomes.