1 Introduction

In today’s world, complex systems can be modeled by graphs. Complex networks such as social networks, traffic networks, and biological networks are examples of such applications [1,2,3]. A graph represents a network using nodes and edges, where the edges show the connections between the nodes. Networks can also be modeled by different types of graphs, such as static graphs, dynamic graphs, hypergraphs, and multi-view graphs. Analyzing networks yields valuable knowledge about complex network characteristics such as network structure, the type of communication among network members, and latent network communication. Community detection has proven to be one of the most common and effective tools for clustering, analyzing, and exploiting complex networks. Based on the connection patterns in the network, this tool shows which nodes are close to each other and can be grouped into a cluster, and which intermediate nodes connect two clusters. By clustering and shrinking networks, graph clustering makes analysis much easier [2]. In recent years, a number of methods have been proposed for clustering graphs, including modularity-based community detection [4, 5], non-negative matrix factorization (NMF) [6], label propagation [7, 8], random walk-based community detection [9, 10], evolutionary optimization-based community detection [11, 12], and multi-view semi-supervised community detection [13].

In NMF, a graph similarity matrix (such as the adjacency matrix) is factorized into several lower-dimensional matrices that are then used to obtain the final community detection and clustering. Different models and objective functions are used in this framework to extract clusters, such as symmetric NMF [15], robust NMF [16, 17], three-factor NMF [18], deep NMF [19,20,21,22], graph-based NMF [23, 24], multi-view clustering based on NMF [25], non-linear constrained NMF (NNMF) [26], graph regularized nonnegative matrix factorization (GNMF) [27], and NMF for dynamic graphs [28]. In the robust NMF models proposed in [16, 17], uncertainties such as incorrect prior knowledge and data-type combination errors are reduced by using \({l}_{2}\)-norms. In three-factor NMF [18], the modularity criterion is combined with the three-factor model to improve the performance of the final clustering step. Deep NMF [19,20,21] greatly reduces the error of the iterative solution method, and the added measurement criterion leads to a significant improvement in modeling. Multi-view clustering based on NMF [25] considers several types of data or features to better identify the low-dimensional matrices; however, it has flaws such as poor feature extraction, slow convergence speed, and low accuracy. In NNMF [26], the low-dimensional matrices form a non-linear model within the NMF method. Graph regularized nonnegative matrix factorization (GNMF) [27] uses the graph structure or data features to improve NMF models. Using Kalman filters and matching network features, [28] addresses the challenge of identifying community changes in dynamic graphs. Concept factorization (CF) [29] is an extension of NMF in which each cluster is a linear combination of data points, and each data point is a linear combination of cluster centers. Therefore, to reduce the effect of noise in the CF method, dual graph-regularized sparse concept factorization (DGSCF) is proposed in [30] by adopting an optimization framework based on \({l}_{1}\) and Frobenius norms.

In order to detect communities, the NMF method makes use of prior knowledge about the members of each cluster, making it one of the most common methods of semi-supervised community detection. It is important to note that all NMF-based community detection models are optimization problems; therefore, prior knowledge can be integrated into graph similarity matrices [31, 32] or added to the optimization problem as a separate term [33,34,35]. Incorporating the prior knowledge of each cluster can introduce errors, such as entering incorrect prior knowledge due to incorrect human decisions. Therefore, a new model based on a robust formulation is proposed in [16] to reduce this error. The model reduces the errors caused by incorrect prior knowledge compared to other semi-supervised methods and provides much better community detection quality even in the presence of these errors.

Detecting communities using NMF also has the advantage of being applicable to a variety of data types, such as audio, video, text, and graphics. A number of clustering problems have been investigated using this community detection technique, including audio source separation [36, 37], image processing [38], keyword and document extraction [39], network clustering for information graphs and data [17], and semi-supervised clustering using NMF models for all data types [40]. Although NMF is very general and can be applied to a wide range of data types, it cannot exploit the characteristics of a particular data type in a specialized manner; that is, the data type (audio, acoustic, text, network) does not affect NMF-based community detection. To address this limitation, modularity was added to NMF-based community detection in a linear form to improve the quality of clustering in complex networks in [14, 18, 21].

The concept of modularity was introduced in [4] as a criterion for validating graph partitions. To determine the accuracy of community detection, the criterion considers the density of connections between nodes within a cluster as well as the level of communication between groups. In other words, modularity is a popular criterion for community detection methods, specialized for clustering complex networks. It has been demonstrated in [4] that modularity-based community detection is an NP-hard problem. The pervasiveness of this criterion has led to the development of various modularity-based methods for community detection in complex networks in recent years, such as the greedy method [41], evolutionary optimization methods [42], and the spectral method [4]. Although modularity-based community detection is a commonly employed clustering method, it has limitations, including dependence on the total number of edges [43], the lack of prior clustering knowledge, and the complexity of the problem.

Accordingly, this paper aims to present a new solution combining the advantages of community detection based on NMF and community detection based on the modularity criterion. For this purpose, it is first demonstrated that modularity-based community detection has a structure similar to NMF-based community detection. With this advantage and innovation, robust community detection based on NMF and the modularity criterion can be combined using multi-view clustering methods. In addition, a robust multi-view clustering model is employed to reduce the error associated with the user’s input of prior knowledge. Consequently, in addition to providing a new way to reinforce semi-supervised community detection methods, this paper provides an adaptive method for clustering graphs that utilizes graph attribute information, prior knowledge of cluster members, and network structure properties, as well as an iterative solution and a convergence analysis for this method. To improve robust community detection based on NMF, the paper proposes a novel model for semi-supervised community detection called MRASNMF. The main contributions of this paper can be summarized as follows:

  1.

    It is shown that the structure of modularity-based community detection is similar to that of NMF-based community detection. Using this similarity, the prior knowledge of graph clustering is combined with a novel community detection algorithm.

  2.

    To use the proposed structure, a modularized robust adaptive semi-supervised NMF model (MRASNMF) is developed, and a multi-view robust combination is subsequently employed for detecting communities based on NMF in a semi-supervised manner. A convergent algorithm and an iterative solution method are provided for the model as well.

  3.

    The clustering method is applied to five real-world networks. The model’s efficiency is examined in relation to the quality of the solution, and it is demonstrated that the model is adaptively trained. Furthermore, this paper shows that the MRASNMF model can improve the algorithm’s efficiency in the presence of erroneous prior knowledge.

The remainder of this paper is structured as follows. Section 2 provides a brief overview of community detection based on the modularity criterion and a review of previous semi-supervised community detection methods based on NMF. The proposed robust semi-supervised adaptive algorithm, its iterative solution method, and its convergence analysis are discussed in Sect. 3. Section 4 introduces the evaluation criteria and data sets for testing; the required parameters are then extracted, and the results on the real data sets are presented. Section 5 concludes and summarizes the discussion.

2 Related Works

The literature on graph clustering is briefly discussed in this section; the proposed model is described in the next section. For clarity, the frequently used notations are first described in Table 1.

Table 1 Notations and descriptions

2.1 Community Detection Based on Modularity

Consider a graph \(G = \left(V, E\right)\), where \(V\) is a set of \(n\) nodes and \(E\) is a set of \(m\) edges between pairs of nodes. The modularity criterion determines the validity of community detection based on the density of edges within clusters and the level of intergroup communication. This criterion is also used independently to determine clusters of complex networks [4, 18]. Therefore, in general, community detection based on the modularity criterion can be written as an optimization problem with the modularity criterion as the objective function, subject to the following conditions [18]:

$$ \mathop {\max }\limits_{X} Q = \mathop {\max }\limits_{X} \frac{1}{2m}tr\left( {X^{T} BX} \right),\;B = A - B_{1} \;S.t.{ }X^{T} X = I, $$
(1)

where \(B\) is the modularity matrix, \(A\) is the graph adjacency matrix, \({k}_{i}\) is the degree of node \(i\), \(X=\left[{X}_{ij}\right]\in {R}^{n\times k}\) is the cluster membership matrix, \({\left({B}_{1}\right)}_{ij}=\frac{{k}_{i}{k}_{j}}{2m}\), and \(k\) is the number of communities in the graph.
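As a concrete illustration, the modularity matrix \(B=A-{B}_{1}\) can be assembled in a few lines; the toy adjacency matrix below is an assumption of this sketch, not taken from the paper:

```python
import numpy as np

# Adjacency matrix of a small undirected toy graph (assumed example).
A = np.array([[0, 1, 1, 0],
              [1, 0, 1, 0],
              [1, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)

k_deg = A.sum(axis=1)                  # node degrees k_i
m = A.sum() / 2                        # number of edges
B1 = np.outer(k_deg, k_deg) / (2 * m)  # (B1)_ij = k_i * k_j / (2m)
B = A - B1                             # modularity matrix

# Every row of B sums to zero, a standard property of the modularity matrix.
print(np.allclose(B.sum(axis=1), 0))   # True
```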

2.2 Community Detection Based on NMF

In NMF, the graph similarity matrix is decomposed into two reduced-dimension matrices, and the final clustering is then obtained from the decomposed matrices. In the decomposition, an attempt is made to preserve the latent properties of the original matrix in the reduced matrices. In other words, assuming \(k\) clusters, the similarity matrix \(Y\in {R}^{n\times n}\) is decomposed into two new matrices, namely \(H\in {R}^{n\times k}\) and \(X\in {R}^{n\times k}\) (\(Y\simeq H{X}^{T}\)), where \(H\) represents the community relationship matrix and \(X\) represents the community membership matrix. To obtain the reduced matrices, an optimization problem is introduced with the following cost function:

$$\underset{H,X}{{\text{min}}}{J}_{nmf}\left(H,X\right)={\left\Vert Y-H{X}^{T}\right\Vert }_{F}^{2}$$
(2)

where \({\Vert \cdot \Vert }_{F}\) is the Frobenius norm. This cost function is non-convex; several studies, such as [14], propose iterative solution methods based on Lagrangian functions to obtain the two matrices \(H\) and \(X\).
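For intuition, Eq. (2) is often minimized with the standard multiplicative update rules of Lee and Seung; the following sketch uses those generic updates on random synthetic data (the paper’s own Lagrangian-based solver may differ, and the method converges only to a local minimum):

```python
import numpy as np

rng = np.random.default_rng(0)
n, k = 6, 2
Y = rng.random((n, n))
Y = (Y + Y.T) / 2                    # symmetric similarity matrix
H = rng.random((n, k)) + 0.1         # community relationship matrix
X = rng.random((n, k)) + 0.1         # community membership matrix
eps = 1e-12                          # guard against division by zero

def cost(Y, H, X):
    return np.linalg.norm(Y - H @ X.T, 'fro') ** 2

c0 = cost(Y, H, X)
for _ in range(200):
    # Standard multiplicative updates for min ||Y - H X^T||_F^2, H, X >= 0.
    H *= (Y @ X) / (H @ X.T @ X + eps)
    X *= (Y.T @ H) / (X @ H.T @ H + eps)

print(cost(Y, H, X) < c0)            # the objective has decreased -> True
```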

Another model is symmetric NMF (SNMF), which reduces the modeling error by decreasing the variables. As expressed in [15], the optimization function is in the form of Eq. (3).

$$\underset{X}{{\text{min}}}{J}_{SNMF}\left(X\right)={\left\Vert Y-X{X}^{T}\right\Vert }_{F}^{2}$$
(3)

where \(X\in {R}^{n\times k}\) represents the membership of each cluster. Comparing Eqs. (2) and (3), the advantage of the model in Eq. (2) is the inclusion of community relationship information in \(H\), whereas SNMF reduces the number of variables.

2.3 Semi-Supervised Community Detection Based on NMF

Semi-supervised community detection methods are methods in which prior knowledge about cluster members improves community detection. Cluster members’ prior knowledge can be expressed in two ways: membership in the same cluster or membership in different clusters. Additionally, the prior knowledge of each cluster is quantified as the percentage of members whose labels are known. NMF-based community detection is one of the most common and comprehensive frameworks for semi-supervised community detection, and it usually follows two general approaches.

In the first approach, the similarity matrix used for NMF-based community detection is modified. In [31], Zhang combined the prior knowledge of node labels with the basic structure of the graph. As a result, a generalized adjacency matrix (\(\overline{A }\)) replaced the adjacency matrix. The generalized matrix is defined as follows:

$$ \overline{A}_{ij} = \left\{ {\begin{array}{*{20}l} {\alpha ,} \hfill & {if\;x_{i} ,\;x_{j} \;have\;same\;labels} \hfill \\ {0,} \hfill & {if\;x_{i} ,\;x_{j} \;have\;different\;labels} \hfill \\ {A_{ij} + I_{ij} ,} \hfill & {otherwise} \hfill \\ \end{array} } \right. $$
(4)

where \(\alpha \) is a positive constant, set to 2, and \(I\) is the identity matrix.
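A minimal sketch of building \(\overline{A}\) from Eq. (4); the label encoding (−1 for unlabeled nodes) and the toy graph are assumptions of this example:

```python
import numpy as np

# Partial labels: -1 means unlabeled (assumed encoding for this sketch).
labels = np.array([0, 0, 1, -1])
A = np.array([[0, 1, 1, 0],
              [1, 0, 0, 1],
              [1, 0, 0, 1],
              [0, 1, 1, 0]], dtype=float)
alpha = 2.0
n = len(labels)

A_bar = A + np.eye(n)                       # default case: A_ij + I_ij
for i in range(n):
    for j in range(n):
        if labels[i] >= 0 and labels[j] >= 0:
            # Both labels known: alpha if equal, 0 if different.
            A_bar[i, j] = alpha if labels[i] == labels[j] else 0.0

print(A_bar[0, 1], A_bar[0, 2])  # same labels -> 2.0, different labels -> 0.0
```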

In another study [32], the adjacency matrix of the graph was revised using prior knowledge of node pairs belonging to the same cluster and of group differences between nodes, resulting in the semi-supervised clustering algorithm based on semi-supervised NMF (SNMF-SS). The input matrix of SNMF-SS is defined as follows:

$$\overline{Y }=Y-\alpha {W}_{ML}+\beta {W}_{CL}$$
(5)

where \({W}_{ML}\) denotes the matrix of the relationship between members of each cluster and \({W}_{CL}\) represents the matrix of the relationship between members of different clusters.

In [44], by using the SNMF-SS algorithm model and substituting a random-walk matrix for the \(Y\) matrix, the community partition algorithm based on random walk and matrix factorization (SNMFRW) yielded a different semi-supervised community detection.

In the second approach, the prior knowledge is used by adding a new term to the NMF-based community detection optimization function; therefore, the cost function of community detection consists of the sum of an NMF-based clustering term and a term related to the clustering prior knowledge. Equivalently, Eq. (6) shows the total cost function of this approach:

$${J}_{PCNMF}={J}_{clustering}+{J}_{pairs}$$
(6)

where \({J}_{pairs}\) is the prior-knowledge term and \({J}_{clustering}\) is the community detection term. In the following, related literature is briefly discussed.

In [33], Yang proposed the Pairwise Constraints-guided Nonnegative Matrix Factorization (PCNMF) method, in which the prior knowledge is introduced into the NMF optimization as a new term. The optimization function is rewritten as follows:

$${J}_{PCNMF}={\left\Vert Y-H{X}^{T}\right\Vert }_{F}^{2}+\lambda tr\left({X}^{T}LX\right)$$
(7)

where \(Y\) is the adjacency matrix. Moreover, \(L=D-{A}_{1}\), where \(D\) is the diagonal matrix with \({D}_{ii}=\sum_{j=1}^{n}{A}_{{1}_{ij}}\), and \({A}_{1}\) is constructed from the prior knowledge as follows:

$$ {\text{A}}_{{1_{ij} }} = \left\{ {\begin{array}{*{20}l} {\alpha ,} \hfill & {if\;x_{i} ,\;x_{j} \;have\;same\;labels} \hfill \\ { - \left( {1 - \alpha } \right),} \hfill & {if\;x_{i} ,\;x_{j} \;have\;different\;labels} \hfill \\ {0,} \hfill & {otherwise} \hfill \\ \end{array} } \right. $$
(8)

In the above equation, \(\alpha \) is a positive constant set to 2.
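To make Eq. (8) and the resulting Laplacian concrete, the following sketch builds \(A_1\), \(D\), and \(L=D-{A}_{1}\) from a partial labeling; the −1 encoding for unlabeled nodes is an assumption of this example:

```python
import numpy as np

labels = np.array([0, 0, 1, -1])   # -1 = unlabeled (assumed encoding)
alpha = 2.0
n = len(labels)

A1 = np.zeros((n, n))
for i in range(n):
    for j in range(n):
        if labels[i] >= 0 and labels[j] >= 0:
            # alpha for same known labels, -(1 - alpha) for different ones.
            A1[i, j] = alpha if labels[i] == labels[j] else -(1 - alpha)

D = np.diag(A1.sum(axis=1))        # diagonal matrix of row sums of A1
L = D - A1                         # Laplacian used in the tr(X^T L X) term

print(np.allclose(L.sum(axis=1), 0))  # Laplacian rows sum to zero -> True
```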

Following PCNMF, several models were proposed in [45] by considering different choices of the similarity matrix \(Y\) in \({\Vert Y-H{X}^{T}\Vert }_{F}^{2}\), and the quality and efficiency of the resulting community detection methods were investigated. For instance, semi-supervised community detection based on the Laplacian matrix (\(S{C}_{LAP}\)) uses the Laplacian similarity matrix for the clustering term and presents the final semi-supervised community detection model as follows:

$${J}_{{{\text{SC}}}_{{\text{LAP}}}}=tr\left({X}^{T}\left(D-A\right)X\right)+\lambda tr\left({X}^{T}LX\right)$$
(9)

In another study [35], the prior knowledge about the members of each cluster is linearly combined with the clustering term in a new model. This algorithm encodes the prior knowledge about the members of each cluster as follows:

$$ \left( M \right)_{ij} = \left\{ {\begin{array}{*{20}l} {1,} \hfill & {i = j} \hfill \\ {\alpha ,} \hfill & {\left( {v_{i} ,v_{j} } \right) \in C_{ml} } \hfill \\ {0,} \hfill & {otherwise} \hfill \\ \end{array} } \right. $$
(10)

where the semi-supervised community detection model is considered as:

$${J}_{PNMF}={\left\Vert Y-\left(HX\right){\left(HX\right)}^{T}\right\Vert }_{F}^{2}+\alpha \left(tr\left({X}^{T}{D}_{M}X\right)-tr\left({X}^{T}MX\right)\right)$$
(11)

Here, \({D}_{M}\) is a diagonal matrix whose diagonal entries are the row sums of matrix \(M\).

Also, in [46], the SVDCNMF and SVDCSNMF methods extend the PCNMF models using the singular value decomposition algorithm and use the generalized adjacency matrix (\(\overline{A }\)) of Eq. (4). The SVDCNMF and SVDCSNMF algorithms are otherwise the same as Eq. (7) under the NMF and SNMF models, respectively. The only difference is in determining the Laplacian matrix \(L={D}_{{W}_{1}}-{W}_{1}\), where the \({W}_{1}\) matrix is determined as follows:

$$ \left( {W_{1} } \right)_{ij} = \left\{ {\begin{array}{*{20}l} {\alpha ,} \hfill & {if\;x_{i} ,\;x_{j} \;have\;same\;labels} \hfill \\ {0,} \hfill & {if\;x_{i} ,\;x_{j} \;have\;different\;labels} \hfill \\ {{\text{A}}_{{1_{ij} }} ,} \hfill & {otherwise} \hfill \\ \end{array} } \right. $$
(12)

Additionally, in [34], a Pairwise Constrained Symmetric Nonnegative Matrix Factorization (PCSNMF) was proposed, where the prior knowledge of members and of relationships within a group and between different clusters was combined with NMF. The proposed optimization function is as follows:

$${J}_{PCSNMF}={\left\Vert Y-X{X}^{T}\right\Vert }_{F}^{2}+\alpha \left(tr\left({X}^{T}MX{B}_{2}\right)+tr\left({X}^{T}CX\right)\right)$$
(13)

where \({B}_{2}={\mathbf{1}}_{k}{\mathbf{1}}_{k}^{T}-I\) is the \(k\times k\) matrix with zeros on the diagonal and ones elsewhere, and \(Y\) is the adjacency matrix defined earlier. The prior knowledge is summarized in the following matrices:

$$ M_{ij} = \left\{ {\begin{array}{*{20}l} {1,} \hfill & {if\;x_{i} ,\;x_{j} \;have\;same\;labels} \hfill \\ {0,} \hfill & {otherwise} \hfill \\ \end{array} } \right. $$
(14)

and

$$ C_{ij} = \left\{ {\begin{array}{*{20}l} {1,} \hfill & {if\;x_{i} ,\;x_{j} \;have\;different\;labels} \hfill \\ {0,} \hfill & {otherwise} \hfill \\ \end{array} } \right. $$
(15)
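The must-link matrix \(M\) of Eq. (14) and the cannot-link matrix \(C\) of Eq. (15) can be built directly from partially known labels; the −1 encoding for unlabeled nodes is an assumption of this sketch:

```python
import numpy as np

labels = np.array([0, 0, 1, -1])  # -1 = unlabeled (assumed encoding)
n = len(labels)
M = np.zeros((n, n))
C = np.zeros((n, n))
for i in range(n):
    for j in range(n):
        if labels[i] >= 0 and labels[j] >= 0:
            if labels[i] == labels[j]:
                M[i, j] = 1.0      # must-link: same known label
            else:
                C[i, j] = 1.0      # cannot-link: different known labels

print(int(M[0, 1]), int(C[0, 2]), int(M[0, 3]))  # 1 1 0
```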

Moreover, in [16], due to human errors in the initial labeling of nodes, the Robust Semi-Supervised Nonnegative Matrix Factorization (RSSNMF) method is proposed as a robust model that maintains efficiency under human errors and improves on the PCSNMF method. Its optimization cost function is as follows:

$${J}_{RSSNMF}={\left\Vert Y-H{X}^{T}\right\Vert }_{2}+\alpha tr\left({X}^{T}MX{B}_{2}\right)+\beta tr\left({X}^{T}CX\right)$$
(16)

where \({\Vert \cdot \Vert }_{2}\) is the second-order norm and \(Y\) is the adjacency matrix. Due to the inherent capability of second-order norms to suppress noise and uncertainties, the optimization function in Eq. (16) is robust to errors in the prior knowledge of the cluster members and enables better community detection in the presence of these uncertainties.

3 The Proposed Community Detection Algorithm

Based on a review of various models and requirements, including the customization of the NMF-based community detection method for complex graphs and the enhancement of the model’s robustness to human errors, this paper proposes an approach that improves NMF-based community detection by using a multi-view community detection method based on the modularity criterion. First, a common structure is derived for the two community detection methods based on the modularity criterion and on NMF. An iterative solution method is then presented for the combination of the robust semi-supervised NMF method with the modularity criterion. Finally, the convergence of the iterative solution method is demonstrated.

3.1 Relationship Between Modularity-Based Community Detection and NMF

As presented in Sects. 2.1 and 2.2, Theorem 1 shows the equivalence between the two community detection methods based on the modularity criterion and NMF.

Theorem 1

Modularity-based community detection in complex networks has a structure similar to community detection based on NMF, i.e.:

$$\underset{X}{\mathit{max}}Q=\underset{X}{\mathit{min}}{\left\Vert B-X{X}^{T}\right\Vert }_{F}^{2}$$
(17)

Proof of Theorem 1

The modularity criterion optimization function in Eq. (1) is rewritten as follows:

$$\underset{X}{{\text{max}}}\frac{1}{2m}tr\left({X}^{T}BX\right)\propto \underset{X}{{\text{min}}}\left(-\frac{1}{2m}tr\left({X}^{T}BX\right)\right)\propto \underset{X}{{\text{min}}}\left(-tr\left({X}^{T}BX\right)\right)$$
(18)

Since \(B\) is a constant matrix and \({X}^{T}X=I\), adding constant terms does not change the minimizer, so

$$\underset{X}{{\text{max}}}\frac{1}{2m}tr\left({X}^{T}BX\right)\propto \underset{X}{{\text{min}}}\left(tr\left({X}^{T}X{X}^{T}X\right)-2tr\left({X}^{T}BX\right)+tr\left(B{B}^{T}\right)\right)$$
(19)

Equation (19) is rewritten as follows, given trace properties such as \({tr(X}^{T}BX) =tr\left(BX{X}^{T}\right)\) and \(tr\left({X}^{T}X{X}^{T}X\right)=tr\left(X{X}^{T}X{X}^{T}\right)\):

$$\underset{X}{{\text{max}}}\frac{1}{2m}tr\left({X}^{T}BX\right)\propto \underset{X}{{\text{min}}}tr\left(X{X}^{T}X{X}^{T}-2BX{X}^{T}+B{B}^{T}\right)\propto \underset{X}{{\text{min}}}{\left\Vert B-X{X}^{T}\right\Vert }_{F}^{2}$$
(20)

Combining Eqs. (1) and (3) with Eq. (20), the optimization cost function based on the modularity matrix has a structure similar to the optimization cost function of NMF. In other words:

$$\underset{X}{{\text{max}}}Q=\underset{X}{{\text{max}}}\frac{1}{2m}\mathit{tr}\left({X}^{T}\mathit{BX}\right)\propto \underset{X}{{\text{min}}}{\left\Vert B-X{X}^{T}\right\Vert }_{F}^{2}$$
(21)

Therefore, community detection based on the modularity criterion has a structure similar to community detection based on NMF, which completes the proof.
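Theorem 1 can be checked numerically: for any \(X\) with orthonormal columns, \({\Vert B-X{X}^{T}\Vert }_{F}^{2}={\Vert B\Vert }_{F}^{2}-2\,tr\left({X}^{T}BX\right)+k\), so maximizing the trace is equivalent to minimizing the Frobenius objective. A quick verification on random data (the sizes are arbitrary assumptions):

```python
import numpy as np

rng = np.random.default_rng(1)
n, k = 8, 3
B = rng.standard_normal((n, n))
B = (B + B.T) / 2                                 # symmetric "modularity-like" matrix
X, _ = np.linalg.qr(rng.standard_normal((n, k)))  # orthonormal columns: X^T X = I

lhs = np.linalg.norm(B - X @ X.T, 'fro') ** 2
rhs = np.linalg.norm(B, 'fro') ** 2 - 2 * np.trace(X.T @ B @ X) + k
print(np.isclose(lhs, rhs))  # True: the two objectives differ only by constants
```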

3.2 MRASNMF Algorithm

As stated before, due to its widespread application to all types of data, the NMF method requires specialization for complex graphs [14, 18, 21]. Considering the high efficiency of the modularity criterion in this context, it makes sense to use it to customize NMF-based community detection for complex networks. Theorem 1 states that community detection based on the modularity criterion is equivalent to community detection based on NMF when the similarity matrix is the modularity matrix. A unique feature of this equivalence is that it allows the modularity criterion to be used within NMF-based community detection. However, uncertainties in the prior knowledge cause errors in semi-supervised algorithms based on NMF. One method of reducing this uncertainty error is to use the robust NMF-based community detection model in linear combination with the prior-knowledge terms, as in Eq. (16). For this reason, the modularity-based method obtained from Theorem 1 is also used in its robust form to reduce uncertainty errors. Therefore, to present an efficient adaptive robust method for semi-supervised community detection, several components, including NMF-based robust community detection, modularity-based robust community detection, and prior knowledge, must be incorporated into the optimization cost function simultaneously. Utilizing and combining all of these components is one of the primary challenges in this area. To resolve this difficulty, a multi-view robust clustering approach is proposed in accordance with the optimization structure of the robust NMF-based community detection model and robust modularity-based clustering. To this end, the final model of adaptive robust community detection, which incorporates all terms pertaining to community detection based on the modularity criterion, NMF, and prior knowledge [47], is presented as follows:

$${\underset{X,W,\gamma }{{\text{min}}} J}_{{\text{MRASNMF}}}=\gamma {\left\Vert A-W{X}^{T}\right\Vert }_{2}+\left(1-\gamma \right){\left\Vert B-X{X}^{T}\right\Vert }_{2}+\alpha \left(tr\left({X}^{T}MX{B}_{2}\right)+tr\left({X}^{T}CX\right)\right), S.t. W, X>0 , \sum_{r=1}^{k}{X}_{ir}=1$$
(22)

where \({B}_{2}={\mathbf{1}}_{k}{\mathbf{1}}_{k}^{T}-I\) (the \(k\times k\) matrix with zeros on the diagonal and ones elsewhere) and \(W\) is the community relationship matrix. Besides, \(M\) and \(C\) are selected based on [16] as follows:

$$ M_{ij} = \left\{ {\begin{array}{*{20}l} {1,} \hfill & {if\;{ }x_{i} ,{ }x_{j} \;have\;same\;labels\;or\;i = j{ }} \hfill \\ {0,} \hfill & {otherwise} \hfill \\ \end{array} } \right. $$
(23)

and,

$$ C_{ij} = \left\{ {\begin{array}{*{20}l} {1,} \hfill & {if\;x_{i} ,\;x_{j} \;have\;different\;labels} \hfill \\ {0,} \hfill & {otherwise} \hfill \\ \end{array} } \right. $$
(24)

As a result, a linear combination of two robust NMF models, one with the adjacency similarity matrix and one with the modularity similarity matrix, is presented as the semi-supervised NMF community detection model, where \(\gamma \) is an adaptive parameter that balances the two community detection methods based on modularity and NMF.

3.3 The First Iterative Function of the MRASNMF Optimization Function

Since matrix \(X\) in (22) is constrained to be nonnegative, the final community detection is extracted through a Lagrangian-based solution method. Accordingly, Eq. (22) is rewritten in its trace form as Eq. (25). Taking the matrices \(\varphi \) and \(\psi \) and the scalar \(\phi \) as the Lagrange multipliers, the Lagrangian function is defined as follows:

$$ \begin{aligned} \mathop {\min }\limits_{X,W,\gamma } J_{{{\text{MRASNMF}}}} = & \gamma \left( {tr\left( {AD_{1} A^{T} } \right) - 2tr\left( {AD_{1} WX^{T} } \right) + tr\left( {WX^{T} D_{1} XW^{T} } \right)} \right) \\ & \; + \left( {1 - \gamma } \right) \times \left( {tr\left( {BD_{2} B^{T} } \right){ } - 2tr\left( {BD_{2} XX^{T} } \right) + tr\left( {XX^{T} D_{2} XX^{T} } \right)} \right) \\ & \;\; + \alpha tr\left( {X^{T} MXB_{2} } \right) + \alpha tr\left( {X^{T} CX} \right),\;S.t.{ }X > 0{ },{ }\mathop \sum \limits_{r = 1}^{k} X_{ir} = 1 \\ \end{aligned} $$
(25)

Then, the corresponding Lagrangian function would be:

$$L\left(X,W,\gamma \right)={ J}_{\mathrm{MRASNMF }}+tr\left(\varphi X\right)+tr\left(\psi W\right)+\phi \gamma $$
(26)

Therefore, the derivatives of the Lagrangian function concerning the variables are as follows:

$$\frac{\partial {L}_{{\text{MRASNMF}}}}{\partial X}=\varphi -A{D}_{1}W+{D}_{1}X{W}^{T}W-B{D}_{2}X+2{D}_{2}X{X}^{T}X+2\alpha MX{B}_{2}+2\alpha CX$$
(27)
$$\frac{\partial {L}_{{\text{MRASNMF}}}}{\partial W}=\psi -{A}^{T}{D}_{1}X+W{X}^{T}{D}_{1}X$$
(28)
$$ \begin{aligned} \frac{{\partial L_{{{\text{MRASNMF}}}} }}{\partial \gamma } = & \;\phi + tr\left( {AD_{1} A^{T} } \right) - 2tr\left( {AD_{1} WX^{T} } \right) + tr\left( {WX^{T} D_{1} XW^{T} } \right) - tr\left( {AD_{2} A^{T} } \right) \\ & \; + 2tr\left( {AD_{2} XX^{T} } \right) - tr\left( {XX^{T} D_{2} XX^{T} } \right) + tr\left( {B_{1} D_{2} B_{1}^{T} } \right) - 2tr\left( {B_{1} D_{2} XX^{T} } \right) \\ \end{aligned} $$
(29)

From Eqs. (27) to (29), since \(B=A-{B}_{1}\) and considering the Karush–Kuhn–Tucker (KKT) conditions \({\varphi }_{ij}{X}_{ij}=0\), \({\psi }_{ij}{W}_{ij}=0\), and \(\phi \gamma =0\), the MRASNMF iterative rules are generated as in Eqs. (30) to (32):

$$ X_{ij} : = X_{ij} \frac{{\frac{1}{2}\left( {A^{T} D_{1} W} \right)_{ij} + \frac{1}{2}\left( {AD_{2} X} \right)_{ij} }}{{\frac{1}{2}\left( {D_{1} XW^{T} W} \right)_{ij} + \frac{1}{2}\left( {B_{1} D_{2} X} \right)_{ij} + \left( {D_{2} XX^{T} X} \right)_{ij} + \alpha \left( {MXB_{2} } \right)_{ij} + \alpha \left( {CX} \right)_{ij} }} $$
(30)
$$ W_{ij} : = W_{ij} \frac{{\left( {A^{T} D_{1} X} \right)_{ij} }}{{\left( {WX^{T} D_{1} X} \right)_{ij} }} $$
(31)
$$ \gamma : = \gamma \frac{{2tr\left( {AD_{1} WX^{T} } \right) + tr\left( {AD_{2} A^{T} } \right) + tr\left( {XX^{T} D_{2} XX^{T} } \right) + 2tr\left( {B_{1} D_{2} XX^{T} } \right)}}{{tr\left( {AD_{1} A^{T} } \right) + tr\left( {WX^{T} D_{1} XW^{T} } \right) + 2tr\left( {AD_{2} XX^{T} } \right) + tr\left( {B_{1} D_{2} B_{1}^{T} } \right)}} $$
(32)

From [19], the condition \(\sum_{r=1}^{k}{X}_{ir}=1\) is applied as follows:

$$ X_{ij} : = \frac{{X_{ij} }}{{\mathop \sum \nolimits_{r = 1}^{k} X_{ir} }} $$
(33)
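A heavily simplified sketch of the update loop in Eqs. (30), (31), and (33) on synthetic data. The weighting matrices \(D_1, D_2\) are set to the identity, the \(\gamma \) update of Eq. (32) is omitted, and the priors \(M, C\) are trivial; all of these are assumptions for illustration only, not the paper’s algorithm:

```python
import numpy as np

rng = np.random.default_rng(2)
n, k, alpha, eps = 8, 2, 0.1, 1e-12

A = rng.random((n, n)); A = (A + A.T) / 2
np.fill_diagonal(A, 0)                 # toy weighted adjacency matrix
deg = A.sum(axis=1); m = A.sum() / 2
B1 = np.outer(deg, deg) / (2 * m)      # (B1)_ij = k_i k_j / (2m)
M = np.eye(n)                          # no must-link priors in this sketch
C = np.zeros((n, n))                   # no cannot-link priors
B2 = np.ones((k, k)) - np.eye(k)       # ones off the diagonal, zeros on it
D1 = D2 = np.eye(n)                    # simplifying assumption: weights = I

X = rng.random((n, k)) + 0.1
W = rng.random((n, k)) + 0.1
for _ in range(100):
    num = 0.5 * (A.T @ D1 @ W) + 0.5 * (A @ D2 @ X)
    den = (0.5 * (D1 @ X @ W.T @ W) + 0.5 * (B1 @ D2 @ X)
           + D2 @ X @ X.T @ X + alpha * (M @ X @ B2) + alpha * (C @ X) + eps)
    X *= num / den                                   # Eq. (30)
    W *= (A.T @ D1 @ X) / (W @ X.T @ D1 @ X + eps)   # Eq. (31)
    X /= X.sum(axis=1, keepdims=True)                # Eq. (33): rows sum to 1

print(np.all(X >= 0) and np.allclose(X.sum(axis=1), 1))  # True
```

The final clustering would then assign each node to the community with the largest entry in its row of \(X\).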

Eventually, according to Eqs. (30), (31), (32), and (33), the MRASNMF model is proposed as in Algorithm 1.

Algorithm 1 MRASNMF model

3.4 Proof of Convergence

For the proposed MRASNMF model, the convergence of the update rules for the parameters \(X\), \(W\), and \(\gamma \) in Eqs. (30) to (32) is established by the following theorems.

Theorem 2

Updating parameter \(X\), assuming that the other parameters in Eq. (30) are constant, monotonically reduces the cost function of Eq. (22).

Theorem 3

Updating the \(W\) parameter, assuming that the other parameters in Eq. (31) are constant, monotonically reduces the cost function of Eq. (22).

Theorem 4

Updating the \(\gamma \) parameter, assuming that the other parameters in Eq. (32) are constant, monotonically reduces the cost function of Eq. (22).

Equations (30) to (32) reduce the MRASNMF cost function in Eq. (22) to a minimum. To show this, the auxiliary functions of [48] are used. If \(F\left(X\right)\) satisfies \(F\left(X\right)\le G\left(X,{X}^{t}\right)\) and \(F\left(X\right)=G\left(X,X\right)\), then \(G\left(X,{X}^{t}\right)\) is an auxiliary function for \(F\left(X\right)\); hence:

$$F\left({X}^{t+1}\right)\le G\left({X}^{t+1},{X}^{t}\right)\le G\left({X}^{t},{X}^{t}\right)=F\left({X}^{t}\right)$$
(34)

Equivalently, according to [47], Eq. (34) can be rewritten as follows:

$${X}^{t+1}={\text{arg}}\underset{X}{{\text{min}}}G\left(X,{X}^{t}\right)$$
(35)

Therefore, by the definition of auxiliary functions, if a suitable auxiliary function is found for the MRASNMF model, the cost function decreases monotonically and converges to a fixed value. Hence an auxiliary function must be constructed for the MRASNMF model. Since the proof of Theorem 2 is the most involved, it is presented in detail; Theorems 3 and 4 are proved analogously.
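The monotone-decrease chain of Eq. (34) can be checked numerically on the classical multiplicative NMF updates of [48], whose convergence proof rests on the same auxiliary-function construction; the matrix sizes, rank, and seed below are arbitrary assumptions for the check:

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.random((20, 20))          # nonnegative data matrix
k = 4                             # factorization rank
W = rng.random((20, k)) + 0.1
H = rng.random((k, 20)) + 0.1
eps = 1e-12                       # guards against division by zero
errs = []
for _ in range(100):
    # classical multiplicative updates for min ||A - WH||_F
    H *= (W.T @ A) / (W.T @ W @ H + eps)
    W *= (A @ H.T) / (W @ H @ H.T + eps)
    errs.append(np.linalg.norm(A - W @ H))
# Eq. (34): the cost never increases from one iteration to the next
assert all(errs[i + 1] <= errs[i] + 1e-10 for i in range(len(errs) - 1))
```

The same argument, with the auxiliary function of Lemma 1 in place of the classical one, yields the monotonicity claimed in Theorems 2-4.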

Proof of Theorem 2

According to the community detection model in Eq. (25), assuming that only the parameter \(X\) is updated and using the auxiliary-function concepts of [48], the objective function and its first and second derivatives are defined as follows.

$$ \check{F} \left( X \right) = J_{{{\text{MRASNMF}}}} $$
(36)
$$ \check{F}_{{X_{ij} }}^{\prime } = \frac{{\partial J_{{\text{MRASNMF }}} }}{{\partial X_{ij} }} = 2\left( { - \frac{1}{2}A^{T} D_{1} W + \frac{1}{2}D_{1} XW^{T} W - \frac{1}{2}BD_{2} X + D_{2} XX^{T} X + \alpha MXB_{2} + \alpha CX} \right)_{ij} $$
(37)
$$ \check{F}_{{X_{ij} }}^{\prime \prime } = \frac{{\partial^{2} J_{{\text{MRASNMF }}} }}{{\partial \left( {X_{ij} } \right)^{2} }} = 2\left( {\frac{1}{2}D_{1} WW^{T} - \frac{1}{2}BD_{2} + 3D_{2} XX^{T} + \alpha B_{2} \left( M \right) + \alpha C} \right)_{ij} = 2G_{{2_{ij} }} $$
(38)

where \({G}_{2}\) is:

$${G}_{2}=\frac{1}{2}{D}_{1}{WW}^{T}-\frac{1}{2}B{D}_{2}+3{D}_{2}X{X}^{T}+\alpha {B}_{2}\left(M\right)+\alpha C$$
(39)

Since the parameter \(\alpha \) is adjustable, Eq. (38) can be made positive by an appropriate choice of \(\alpha \) [49]. Therefore, the function \(\check{F} \left( X \right)\) has a local minimum. To prove that Eq. (25) is monotonically decreasing, it suffices to find an auxiliary function for \(\check{F} \left( X \right)\) and show, using Eqs. (34) and (35), that the function is decreasing. Lemma 1 proposes such an auxiliary function for \(\check{F} \left( X \right)\) and is needed to prove Theorem 2.

Lemma 1

The function \({G}_{MRASNMF-Q}\), defined below, is an auxiliary function for \(\check{F} \left(X \right)\):

$$ \begin{aligned} G_{{{\text{MRASNMF }} - Q}}\left( {X_{ij} ,X_{ij}^{t} } \right) = & \check{F}_{{X_{ij} }}\left( {X_{ij}^{t} } \right) + \check{F}_{{X_{ij} }}^{\prime }\left( {X_{ij}^{t} } \right)\left( {X_{ij} - X_{ij}^{t} } \right) \\& \; + 2\frac{{\left( {\frac{1}{2}\left( {A^{T} D_{1} W}\right)_{ij} + \frac{1}{2}\left( {AD_{2} X} \right)_{ij} } \right) -\left( {\left( {G_{1} } \right)_{{{\text{ij}}}} } \right)}}{{\left({\frac{1}{2}\left( {A^{T} D_{1} W} \right)_{ij} + \frac{1}{2}\left({AD_{2} X} \right)_{ij} } \right)^{\beta } - \left( {\left( {G_{1} }\right)_{{{\text{ij}}}} } \right)^{{\beta { }}} }}\\& \times \frac{{\left( {\left( {G_{1} } \right)_{{{\text{ij}}}} }\right)^{{\beta { }}} }}{{X_{ij}^{t} }}\left( {X_{ij} - X_{ij}^{t} }\right)^{2} \\ \end{aligned} $$
(40)

where \({G}_{1}\)is defined as:

$$ \left( {G_{1} } \right)_{{{\text{ij}}}} =\frac{1}{2}\left( {D_{1} XW^{T} W} \right)_{ij} + \frac{1}{2}\left({B_{1} D_{2} X} \right)_{ij} + \left( {D_{2} XX^{T} X} \right)_{ij}+ \alpha \left( {MXB_{2} } \right)_{ij} + \alpha \left( {CX}\right)_{ij} $$
(41)

Proof of Lemma 1

First, note that for \({X}_{ij}={X}_{ij}^{t}\), \(\check{F}_{{X_{ij} }} \left( {X_{ij} } \right) = G_{MRASNMF - Q} \left( {X_{ij} ,X_{ij} } \right)\). To verify that \(\check{F}_{{X_{ij} }} \left( {X_{ij} } \right) \le G_{MRASNMF - Q} \left( {X_{ij} ,X_{ij}^{t} } \right)\) [45], the nonlinear function \(\check{F}_{{X_{ij} }} \left( {X_{ij} } \right)\) is linearized around the point \(X_{ij}^{t}\) by the Taylor series as in Eq. (42); hence:

$$ \check{F}_{{X_{ij} }} \left( {X_{ij} } \right) \approx \check{F}_{{X_{ij} }} \left( {X_{ij}^{t} } \right) + \check{F}_{{X_{ij} }}^{\prime } \left( {X_{ij}^{t} } \right)\left( {X_{ij} - X_{ij}^{t} } \right) + \check{F}_{{X_{ii} }}^{\prime \prime } \left( {X_{ij} - X_{ij}^{t} } \right)^{2} $$
(42)

From Eqs. (42) and (40), verifying \(\check{F}_{{X_{ij} }} \left( {X_{ij} } \right) \le G_{MRASNMF - Q} \left( {X_{ij} ,X_{ij}^{t} } \right)\) reduces to the following relation:

$$ \check{F}_{{X_{ij} }} \left( {X_{ij} } \right) \le G_{{\text{MRASNMF }}} \left( {X_{ij} ,X_{ij}^{t} } \right) \Leftrightarrow \frac{{\left( {\frac{1}{2}\left( {A^{T} D_{1} W} \right)_{ij} + \frac{1}{2}\left( {AD_{2} X} \right)_{ij} } \right) - \left( {\left( {G_{1} } \right)_{{{\text{ij}}}} } \right)}}{{\left( {\frac{1}{2}\left( {A^{T} D_{1} W} \right)_{ij} + \frac{1}{2}\left( {AD_{2} X} \right)_{ij} } \right)^{\beta } - \left( {\left( {G_{1} } \right)_{{{\text{ij}}}} } \right)^{{\beta { }}} }} \times \frac{{\left( {\left( {G_{1} } \right)_{{{\text{ij}}}} } \right)^{{\beta { }}} }}{{X_{ij}^{t} }} \ge \left( {G_{2} } \right)_{ii} $$
(43)

Since \({X}_{ij}^{t}>0\), then:

$$\frac{\left(\frac{1}{2}{\left({A}^{T}{D}_{1}W\right)}_{ij}+\frac{1}{2}{\left(A{D}_{2}X\right)}_{ij}\right)-\left({\left({G}_{1}\right)}_{{\text{ij}}}\right)}{{\left(\frac{1}{2}{\left({A}^{T}{D}_{1}W\right)}_{ij}+\frac{1}{2}{\left(A{D}_{2}X\right)}_{ij}\right)}^{\beta }-{\left({\left({G}_{1}\right)}_{{\text{ij}}}\right)}^{\beta }}\times {\left({\left({G}_{1}\right)}_{{\text{ij}}}\right)}^{\beta }\ge {\left({G}_{2}\right)}_{ii}{X}_{ij}^{t}$$
(44)

Next, we rewrite the left side of Eq. (44) as:

$$\begin{aligned} &\frac{{\left({\frac{1}{2}\left( {A^{T} D_{1} W} \right)_{ij} + \frac{1}{2}\left({AD_{2} X} \right)_{ij} } \right) - \left( {\left( {G_{1} }\right)_{{{\text{ij}}}} } \right)}}{{\left( {\frac{1}{2}\left({A^{T} D_{1} W} \right)_{ij} + \frac{1}{2}\left( {AD_{2} X}\right)_{ij} } \right)^{\beta } - \left( {\left( {G_{1} }\right)_{{{\text{ij}}}} } \right)^{{\beta { }}} }} \times \left({\left( {G_{1} } \right)_{{{\text{ij}}}} } \right)^{{\beta { }}}\\&\quad =\frac{{\left( {\left( {\frac{1}{2}\left( {A^{T} D_{1} W}\right)_{ij} + \frac{1}{2}\left( {AD_{2} X} \right)_{ij} } \right) -\left( {\left( {G_{1} } \right)_{{{\text{ij}}}} } \right)}\right)}}{{\left( {\left( {\frac{1}{2}\left( {A^{T} D_{1} W}\right)_{ij} + \frac{1}{2}\left( {AD_{2} X} \right)_{ij} }\right)^{\beta } - \left( {\left( {G_{1} } \right)_{{{\text{ij}}}} }\right)^{{\beta { }}} } \right)\left( {\left( {G_{1} }\right)_{{{\text{ij}}}} } \right)^{{ - \beta { }}} }}\; \\ &\quad=\frac{{\left( {\left( {\frac{{G_{3} }}{{G_{1} }}}\right)_{{{\text{ij}}}} - 1} \right)\left( {G_{1} }\right)_{{{\text{ij}}}} }}{{\left( {\left( {\frac{{G_{3} }}{{G_{1}}}} \right)_{ij} } \right)^{\beta } - 1}} \\ \end{aligned}$$
(45)

where \({G}_{3}\) is defined as follows.

$$ \left( {G_{3} } \right)_{ij} = \frac{1}{2}\left( {A^{T}D_{1} W} \right)_{ij} + \frac{1}{2}\left( {AD_{2} X} \right)_{ij} $$
(46)

Considering Eqs. (44) and (45), then, we have:

$$ \frac{{\left( {\frac{{G_{3} }}{{G_{1} }}} \right)_{{{\text{ij}}}} - 1}}{{\left( {\left( {\frac{{G_{3} }}{{G_{1} }}} \right)_{ij} } \right)^{\beta } - 1}}\left( {G_{1} } \right)_{{{\text{ij}}}} > \left( {G_{2} } \right)_{ii} X_{ij}^{t} $$
(47)

Moreover, since all parameters are positive and \(0<\beta \le 1\), the following inequality holds:

$$ \frac{{\left( {\frac{{G_{3} }}{{G_{1} }}} \right)_{{{\text{ij}}}} - 1}}{{\left( {\left( {\frac{{G_{3} }}{{G_{1} }}} \right)_{ij} } \right)^{\beta } - 1}} \ge 1 \Rightarrow \frac{{\left( {\frac{{G_{3} }}{{G_{1} }}} \right)_{{{\text{ij}}}} - 1}}{{\left( {\left( {\frac{{G_{3} }}{{G_{1} }}} \right)_{ij} } \right)^{\beta } - 1}}\left( {G_{1} } \right)_{{{\text{ij}}}} > \left( {G_{1} } \right)_{{{\text{ij}}}} $$
(48)

Given Eqs. (44), (47), and (48), it is sufficient to show that \({\left({G}_{1}\right)}_{{\text{ij}}}>{\left({G}_{2}\right)}_{ii}{X}_{ij}^{t}\). Considering the definitions of \({G}_{1}\) and \({G}_{2}\) in Eqs. (41) and (39) and \({B}_{1}=A-B\), we have:

$$ \begin{aligned} \left( {G_{1} } \right)_{ij} = & \frac{1}{2}\left( {D_{1} XW^{T} W} \right)_{ij} + \frac{1}{2}\left( {B_{1} D_{2} X} \right)_{ij} + \left( {D_{2} XX^{T} X} \right)_{ij} + \alpha \left( {MXB_{2} } \right)_{ij} + \alpha \left( {CX} \right)_{ij} \\ & \; = \mathop \sum \limits_{f = 1,f \ne i}^{n} G_{{2_{ii} }} X_{ij}^{t} + \frac{1}{2}\left( {AD_{2} X} \right)_{ij} \ge \left( {G_{2} } \right)_{ii} X_{ij}^{t} \\ \end{aligned} $$
(49)

As a result, \({\left({G}_{1}\right)}_{{\text{ij}}}>{\left({G}_{2}\right)}_{ii}{X}_{ij}^{t}\).

Therefore, Lemma 1 follows by tracing the chain of inequalities from Eq. (49) back to Eq. (43).

From Lemma 1, \({G}_{MRASNMF-Q}\) is an auxiliary function for \(\check{F}_{{X_{ij} }} \left( {X_{ij} } \right).\) Accordingly, given Eq. (35), the closed-form solution is as follows:

$$ X^{t + 1} = \arg \mathop {\min }\limits_{X} G\left( {X,X^{t} } \right) \to \check{F}_{{X_{ij} }}^{\prime } \left( {X_{ij}^{t} } \right) + 2\frac{{\left( {G_{3} } \right) - \left( {G_{1} } \right)}}{{\left( {G_{3} } \right)^{\beta } - \left( {G_{1} } \right)^{{\beta { }}} }} \times \frac{{\left( {G_{1} } \right)^{{\beta { }}} }}{{X_{ij}^{t} }}\left( {X_{ij} - X_{ij}^{t} } \right) = 0 $$
(50)

Rewriting \(\check{F}_{{X_{ij} }}^{\prime }\) respecting \(G_{1}\) and \(G_{3}\) as \(\check{F}_{{X_{ij} }}^{\prime } \left( {X_{ij}^{t} } \right) = 2\left( { - \left( {G_{3} } \right) + \left( {G_{1} } \right)} \right)\), Eq. (50) is developed as:

$$\frac{2\left(\left({G}_{3}\right)-\left({G}_{1}\right)\right)}{{(\left({G}_{3}\right)}^{\beta }-{\left({G}_{1}\right)}^{\beta }){X}_{ij}^{t}}\times \left(-{(\left({G}_{3}\right)}^{\beta }-{\left({G}_{1}\right)}^{\beta }){X}_{ij}^{t}+{\left({G}_{1}\right)}^{\beta }\left({X}_{ij}-{X}_{ij}^{t}\right)\right)=0\to {X}_{ij}=\frac{{\left({{\text{G}}}_{3}\right)}^{\beta }}{{\left({G}_{1}\right)}^{\beta }}{X}_{ij}^{t}$$
(51)

Therefore, by Eq. (35), the closed-form update is:

$${X}_{ij}^{t+1}=\frac{{\left({{\text{G}}}_{3}\right)}^{\beta }}{{\left({G}_{1}\right)}^{\beta }}{X}_{ij}^{t}={\left(\frac{\frac{1}{2}{\left({A}^{T}{D}_{1}W\right)}_{ij}+\frac{1}{2}{\left(A{D}_{2}X\right)}_{ij}}{\frac{1}{2}{\left({D}_{1}X{W}^{T}W\right)}_{ij}+\frac{1}{2}{\left({B}_{1}{D}_{2}X\right)}_{ij}+{\left({D}_{2}X{X}^{T}X\right)}_{ij}+\alpha {\left(MX{B}_{2}\right)}_{ij}+\alpha {\left(CX\right)}_{ij}}\right)}^{\beta }{X}_{ij}^{t}$$
(52)

Ultimately, the iterative update in Theorem 2 follows by setting \(\beta =1\).

3.5 Complexity Analysis

In the MRASNMF model, the computational complexity of the update rules is \({\rm O}\left(k{n}^{2}\right)+O\left({n}^{2}\right)+{\rm O}\left({k}^{2}n\right)+{\rm O}\left({k}^{2}\right)+O\left(k\right)+{\rm O}\left(kn\right)\), \({\rm O}\left(k{n}^{2}\right)+O\left({n}^{2}\right)+{\rm O}\left({k}^{2}n\right)\), and \({\rm O}\left(nk\right)+{\rm O}\left({n}^{2}\right)+O\left(n\right)+O\left(k\right)\) for \(W\), \(X\), and \(\gamma \), respectively. Further, the computational complexity of the MMNMF models is \(O\left({n}^{2}k\right)+O\left({k}^{2}n\right)+O\left(n\right)\). Since \(k\ll n\), the total computational complexity over \({I}_{t}\) iterations is \({\rm O}\left({I}_{t}k{n}^{2}\right)+{\rm O}\left({I}_{t}{k}^{2}n\right)+{\rm O}\left({{I}_{t}k}^{2}\right)+{\rm O}\left({I}_{t}kn\right)+O\left({I}_{t}k\right)+O\left({I}_{t}n\right)+{\rm O}\left({I}_{t}{n}^{2}\right)\approx {\rm O}\left({I}_{t}k{n}^{2}\right)\).

4 Results and Comparative Analysis

Knowledge-priority errors, the application of NMF methods to all data types, and the lack of prior knowledge in modularity-based clustering motivated the MRASNMF method. Therefore, in this section, we first introduce the assessment standards; then we explain how the basic knowledge priority, and errors in it, for the communities of some nodes are taken into account, and introduce the set of networks used to evaluate the methods. Next, the tuning procedure for the \(\alpha \) parameter is explained. Finally, we compare the performance of the MRASNMF method with a variety of semi-supervised and unsupervised methods.

4.1 Assessment Standards

The modularity criterion (\(Q\)) and normalized mutual information (\(NMI\)) are applied to test the community detection models. Mutual information is a comprehensive measure for evaluating community detection models; it exploits the ground-truth partition label of each node and the limited number of clusters. The normalized mutual information criterion is defined as follows:

$$NMI\left(C,{C}^{\mathrm{^{\prime}}}\right)=\frac{-2{\sum }_{i=1}^{\left|C\right|}{\sum }_{j=1}^{\left|{C}^{\mathrm{^{\prime}}}\right|}{n}_{{C}_{i}\cap {{C}^{\mathrm{^{\prime}}}}_{j}}{\text{log}}\left(\frac{{n}_{{C}_{i}\cap {{C}^{\mathrm{^{\prime}}}}_{j}}n}{{n}_{{C}_{i}}{n}_{{{C}^{\mathrm{^{\prime}}}}_{j}}}\right)}{{\sum }_{i=1}^{\left|C\right|}{n}_{{C}_{i}}{\text{log}}\left(\frac{{n}_{{C}_{i}}}{n}\right)+{\sum }_{j=1}^{\left|{C}^{\mathrm{^{\prime}}}\right|}{n}_{{{C}^{\mathrm{^{\prime}}}}_{j}}{\text{log}}\left(\frac{{n}_{{{C}^{\mathrm{^{\prime}}}}_{j}}}{n}\right)}$$
(53)

where \({n}_{{C}_{i}}\) is the number of members in \({C}_{i}\) and \(\left|C\right|\) is the total number of clusters. The mutual information criterion equals one if the clustering \(C\) is identical to \({C}^{\mathrm{^{\prime}}}\), and it equals zero if \(C\) and \({C}^{\mathrm{^{\prime}}}\) are completely independent of each other.
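A direct transcription of Eq. (53) can be sketched as follows; the function name and the input format (two lists of cluster labels) are assumptions chosen for illustration, not part of the original evaluation code:

```python
import numpy as np
from collections import Counter

def nmi(labels_a, labels_b):
    """Normalized mutual information as in Eq. (53), computed from the
    cluster-overlap counts n_{C_i ∩ C'_j} of two partitions."""
    n = len(labels_a)
    ca = Counter(labels_a)                 # cluster sizes n_{C_i}
    cb = Counter(labels_b)                 # cluster sizes n_{C'_j}
    joint = Counter(zip(labels_a, labels_b))  # overlaps n_{C_i ∩ C'_j}
    num = 0.0
    for (i, j), nij in joint.items():
        num += nij * np.log(nij * n / (ca[i] * cb[j]))
    num *= -2.0
    den = sum(ni * np.log(ni / n) for ni in ca.values()) \
        + sum(nj * np.log(nj / n) for nj in cb.values())
    # den == 0 only if both partitions are a single trivial cluster
    return num / den if den != 0 else 1.0
```

As a sanity check, two identical partitions (even with permuted label names) score 1, while two independent partitions score 0.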

4.2 Knowledge Priority and Knowledge Priority Errors

Two important features of this paper's evaluation are the use of basic knowledge about the communities of some nodes and the introduction of basic knowledge errors into each algorithm. To use initial knowledge, the \(\frac{n\left(n-1\right)}{2}\) pairs of nodes of each undirected graph must first be categorized. Random node pairs are chosen according to the percentages set for the initial knowledge (1%, 5%, 10%, 20%, and 30%). Based on the ground-truth community labels, each selected pair is classified as belonging to the same cluster or to separate clusters, and each algorithm builds its prior-knowledge matrices from this information. To introduce basic knowledge errors, pairs of nodes that do not belong to the same cluster are labeled as members of one cluster, and pairs that do belong to the same cluster are labeled as members of different ones. In this way, different error percentages (2%, 4%, 6%, 8%, and 10%) are injected into the initial knowledge.
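The pair-sampling and error-injection procedure described above can be sketched as follows; `make_prior_pairs`, its arguments, and the dictionary return format are hypothetical names for illustration only:

```python
import random

def make_prior_pairs(labels, pct_known, pct_error, seed=0):
    """Sample node pairs as prior knowledge (True = same cluster,
    False = different clusters) and flip a fraction of them to
    simulate knowledge-priority errors."""
    rng = random.Random(seed)
    n = len(labels)
    # all n(n-1)/2 unordered pairs of the undirected graph
    pairs = [(i, j) for i in range(n) for j in range(i + 1, n)]
    chosen = rng.sample(pairs, max(1, int(pct_known * len(pairs))))
    # label each chosen pair from the ground-truth communities
    prior = {(i, j): labels[i] == labels[j] for (i, j) in chosen}
    # inject errors: flip the membership label of a fraction of pairs
    for p in rng.sample(chosen, int(pct_error * len(chosen))):
        prior[p] = not prior[p]
    return prior
```

The semi-supervised methods would then translate this dictionary into their pairwise-constraint matrices.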

4.3 Network Sets

In this article, five real data sets are considered in such a way that both simple (karate network) and complex (other networks) data types are considered so that the efficiency of the proposed new algorithm can be checked for both types of data. Additional information on these five datasets is shown and explained in Table 2. In this table, the parameters \(m\) and \(n\), respectively, are the number of edges and nodes in each network, which are assumed to be known. As can be seen, complex networks have more nodes and edges. Therefore, by choosing these networks, it is possible to check how the proposed algorithm works for networks with different dimensions.

Table 2 Real-world networks information

4.4 Internal Parameter Adjustment

In the proposed MRASNMF model, \(\alpha \) is a weighting parameter that controls the influence of the initial knowledge about whether pairs of nodes belong to the same group. This parameter can have a significant effect on the optimization and on reaching the best solution. If \(\alpha \) is very large, the community identification part based on non-negative matrix factorization becomes ineffective, and the best clustering is not achieved. Conversely, if \(\alpha \) is very small (close to zero), the initial knowledge about the clusters of some nodes has no effect on the community identification algorithm. Consequently, evaluating and selecting the best value of \(\alpha \) can lead to the best clustering of complex network nodes.

Figure 1 shows, as an example, the values of the evaluation criteria for the political books dataset under the assumption of 10% basic knowledge. As seen in Fig. 1, different values of the \(\alpha \) parameter affect the quality of the result. Therefore, the best values of \(\alpha \) for the datasets in Table 2, for different initial knowledge percentages, are recorded in Table 3 and used in the final tests. To adjust the parameter of the proposed MRASNMF model, we employed the method proposed in [16] for all the datasets in Table 2: several runs were performed for values of \(\alpha \) in the range 0.5–10 with a step of 0.5, and at each step the mutual information and modularity criteria were measured. Then, taking the mutual information criterion as the main selection criterion, the best \(\alpha \) values were obtained as reported in Table 3.
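The tuning loop described above (α from 0.5 to 10 in steps of 0.5, selected by the mutual information criterion) can be sketched generically; `run_model` and `score_nmi` are placeholder callables standing in for the MRASNMF fit and the NMI evaluation, not functions from the paper:

```python
import numpy as np

def tune_alpha(run_model, score_nmi, alphas=np.arange(0.5, 10.5, 0.5)):
    """Grid search over the weighting parameter α: run the model for
    each candidate value and keep the α with the best NMI score."""
    best_alpha, best_score = None, -np.inf
    for a in alphas:
        score = score_nmi(run_model(a))
        if score > best_score:
            best_alpha, best_score = float(a), score
    return best_alpha, best_score
```

In practice, `run_model` would return the clustering produced by MRASNMF for a given α, and `score_nmi` would compare it with the ground-truth communities.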

Fig. 1
figure 1

A representation of the \(\alpha \) according to the modularity criterion (left) and the mutual information criterion (right) in political books with 10% knowledge priority

Table 3 Selection of parameter α for different percentages of initial knowledge

4.5 Comparative Performance Analysis of MRASNMF Method

In this section, several unsupervised and semi-supervised community detection algorithms are evaluated in comparison with the proposed MRASNMF method. The semi-supervised methods PCNMF [33], PCSNMF [34], RSSNMF [16], SVDCSNMF [35], SNMF-SS [31], NMF_LSE [32], and MRASNMF use linear combinations of knowledge priority or combine the knowledge into the similarity matrix. The recent unsupervised community detection algorithms considered for comparison are DPNMF [54], NMFGAAE [55], and NCNMF [56]. The DPNMF method improves density peak clustering to determine the number of cluster centers; to reduce the approximation error, it also employs non-negative double singular value decomposition initialization. The NMFGAAE method is composed of two major modules: NMF and a graph attention auto-encoder. The NCNMF method makes use of node centrality as well as a new similarity measure that takes into account the proximity of higher-order neighbors.

Table 4 presents the experimental results of the semi-supervised and unsupervised algorithms on five real-world networks. In the absence of knowledge priority, unsupervised clustering methods such as DPNMF and NCNMF achieve better clustering than the semi-supervised algorithms. Among the semi-supervised algorithms, MRASNMF obtains better communities than the others. In the following, the performance of the semi-supervised methods is analyzed and compared for different knowledge priority ratios.

Table 4 Comparison of modularity information (\(Q\)) for different methods and sets

Tables 5, 6, 7, 8, 9, and 10 present the experimental results of the semi-supervised algorithms in comparison to MRASNMF for various ratios of knowledge priority. The parameters of the MRASNMF algorithm are set per dataset as shown in Table 3. For the other algorithms, the required configurable parameters have been adjusted according to their underlying references. The following conclusions are drawn from Tables 5, 6, 7, 8, and 9:

  • According to Table 5, for a simple dataset such as karate, the algorithm presented in this article achieves the best performance without using any initial knowledge, unlike the other methods.

  • Based on Tables 5, 6, 7, 8, and 9, adding different percentages of basic knowledge lets all the algorithms achieve clustering closer to the real communities, i.e., an improved mutual information criterion. In some cases, however, such as Tables 7 and 9, community detection deteriorates with respect to the modularity criterion. The reason is that semi-supervised community detection based on non-negative matrix factorization tries to recover the real communities by introducing initial knowledge of the cluster of each node, and therefore does not consider the type and characteristics of the graph. Methods that aim at recovering the real communities improve the mutual information criterion but do not guarantee improved modularity.

  • Comparing the results for the complex datasets in Tables 5, 6, 7, 8, and 9 shows that the algorithm proposed in this article performs better than the other algorithms. In other words, MRASNMF reaches the maximum value of the mutual information criterion while exploiting less initial knowledge. On average, our algorithm provides better clustering than the other state-of-the-art algorithms with less initial knowledge.

  • According to the results in Tables 5, 7, 8, and 9, the algorithms based on a linear combination of basic knowledge, such as SVDCSNMF, respond similarly to the algorithms that combine the knowledge into the similarity matrix, such as SNMF-SS. However, as the percentage of basic knowledge increases, the methods that combine the knowledge into the similarity matrix improve considerably faster.

Table 5 The results of the methods on Karate networks
Table 6 The results of the methods on Football networks
Table 7 The results of the methods on Dolphins networks
Table 8 The results of the methods on Political books networks
Table 9 The results of the methods on Polblogs networks
Table 10 The run time of the community detection methods for real–world networks

4.6 Robustness of Algorithms

The robustness of a method refers to how well it maintains its clustering efficiency and quality in the presence of erroneous information in the initial knowledge, and how well it corrects that incorrect information. To evaluate the algorithms, assuming 5% correct initial knowledge, wrong information about the members of each cluster is injected at random with different percentages (1–10%), and the result of each algorithm is then assessed with the mutual information criterion. The results are shown as diagrams in Fig. 2. Table 10 also reports the execution time of each algorithm on each network. The following conclusions can be drawn from Fig. 2 and Table 10.

  • In all algorithms, adding initial knowledge errors with different percentages decreases the quality and efficiency of community identification.

  • Robust algorithms such as MRASNMF and RSSNMF show a smaller decline in efficiency than the other algorithms.

  • The algorithm presented in this article has the smallest decline in efficiency and the best performance compared to the other methods; it is also more resistant to uncertainties at larger error percentages.

  • Notably, as the initial knowledge percentage increases, the algorithms based on a linear combination of the initial knowledge (such as SNMF-SS) accumulate error faster than the algorithms that combine the initial knowledge into the similarity matrix (such as PCNMF and SVDCSNMF).

  • While the execution time of the proposed algorithm is comparable to that of the other algorithms, it achieves better clustering quality in handling uncertainty in the initial knowledge of the cluster members.

Fig. 2
figure 2

Results of the robustness of the methods for different networks against various error percentages of knowledge priority (noise)

5 Conclusion

Complex networks are an approach for modeling complex systems. Various methods, such as detecting communities and predicting hidden edges, are used to analyze and evaluate these networks. Community detection, or clustering, is among the widely used tools for classification, for analyzing the hidden parts of a network, and for achieving predetermined goals. Community detection based on non-negative matrix factorization is a common method for semi-supervised community detection in graphs, but it faces challenges such as a lack of specialization for the data and reduced efficiency in the presence of errors in the initial knowledge about the members of each cluster. The modularity criterion is another common basis for detecting graph communities. To address these challenges, this paper proved that community detection based on the modularity criterion has a structure similar to community detection based on non-negative matrix factorization. Then, using the new modularity-based community detection model, a new robust semi-supervised community detection model was constructed in such a way that it can take advantage of basic knowledge, use the modularity criterion, and improve the non-negative matrix factorization. The problem is solved by an iterative method, yielding the MRASNMF algorithm, whose convergence has also been verified. Finally, the algorithms were evaluated on five different datasets, and the results demonstrate the efficiency, effectiveness, and robustness of the proposed algorithm to initial knowledge errors.