1 Introduction

In recent years, the use of social media has grown rapidly, leading to the emergence of numerous social networking sites and applications. Social media enables people to express and share views on social platforms (Maulana and Atzmueller 2020). The data obtained from social networks is a rich source of information that helps users quickly find content of interest amid a sea of information. It also reveals strong emotions in social users, such as joy, compassion, anger, disgust, or hate, and is useful for solving real-world problems in fields such as psychology, marketing, sociology, and politics (Nawaz et al. 2023). One of the biggest advantages of social media is that it connects people across the globe and gives everyone the freedom to create and share ideas, opinions, career interests, photos, videos, and more. People can communicate with each other via multiple social networking platforms such as Twitter, Facebook, Instagram, LinkedIn, YouTube, and Google Plus (Neuschmied et al. 2021). The growing opportunities of social networks and their popularity have attracted many users. At the same time, however, the number of spammers on these networks has also increased (Wazarkar and Keshavamurthy 2019).

Any significant deviation between the observed behaviour and the model of the social network is regarded as an anomaly, also termed an intrusion (Breuer et al. 2020). Although not all anomalies are malicious, they can indicate irregular behaviour such as credit card fraud, campaign donation irregularities, electronic auction fraud, email spam phishing, and many others; hence, it is extremely important to detect these irregular behaviours. Detecting malicious activity in social networks refers to the process of identifying and analyzing behaviour or content that violates platform policies, poses security threats, or intends to harm users or the network itself. This task encompasses the creation of algorithms, models, and systems that analyze network data, such as user interactions, content propagation patterns, and network topology, to automatically identify and categorize suspicious or malicious activities. The goal is to proactively identify and mitigate risks such as cyberbullying, fraud, spam, misinformation, account takeover, and other forms of malicious activity that can undermine the safety, trustworthiness, and integrity of the social network ecosystem. Earlier, anomaly detection was performed on single-layer networks; however, determining anomalies from nodal behaviour in a single layer reduces detection efficiency in a multilayer network (Meel and Vishwakarma 2020).

A multi-layer network comprises nodes and edges, where nodes correspond to the users in the network. All the users in the network are inter-connected with each other, and some of them have accounts on multiple sites; these are called the coupling nodes of the social network (Li et al. 2021). The importance of any node in the network is determined by the participation and behaviour of the node in the social network. Various machine-learning techniques have been used for the efficient detection of anomalies in social networks (Chaabene et al. 2020). As the entities in a social network are highly interrelated, anomaly detection methods have to examine the interactions in the network to identify anomalous units. Hence, traditional anomaly detection techniques developed for multi-dimensional points cannot be directly applied to social networks (Wu et al. 2023). Moreover, anomaly detection techniques on social networks have to analyze the whole network data to reveal the anomalies, which makes the detection process complex in terms of both computational time and space requirements. Thus, a graph-based approach is utilized to ease the analysis process (Karageorgiou and Karyotis 2022).

2 Problem statement

A Multi-Layer Social Network (MSN) is a complex network structure that comprises multiple interconnected layers, each representing a different type of interaction or relationship among individuals. Mathematically, an MSN can be represented as a tuple \(G = \left\{ {L,E,V} \right\}\), where \(L = \left\{ {L_{1} ,L_{2} ,L_{3} , \ldots } \right\}\) represents the set of layers, each layer being a distinct graph representing a specific type of social interaction; \(E = \left\{ {E_{1} ,E_{2} ,E_{3} , \ldots } \right\}\) denotes the set of inter-layer edges connecting nodes across different layers, capturing the relationships between individuals in different contexts; and \(V\) represents the set of nodes, where each node corresponds to an individual in the social network.
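As a concrete toy illustration, the tuple \(G = \{L, E, V\}\) defined above can be sketched in Python; the layer names, user names, and edges below are invented placeholders, not data from any real network:

```python
# Toy sketch of an MSN G = {L, E, V}; all names below are illustrative.

# V: the set of nodes (one per individual)
V = {"alice", "bob", "carol"}

# L: each layer is a distinct graph, here stored as a set of intra-layer edges
L = {
    "twitter":  {("alice", "bob")},
    "facebook": {("bob", "carol")},
}

# E: inter-layer edges coupling the same individual's accounts across layers
# (here, "bob" is a coupling node present in both layers)
E = {
    (("twitter", "bob"), ("facebook", "bob")),
}

def degree_in_layer(layer, node):
    """Number of intra-layer edges touching `node` within `layer`."""
    return sum(node in edge for edge in L[layer])

print(degree_in_layer("twitter", "alice"))  # 1
```

The same individual can thus have different connectivity in each layer, which is what makes single-layer analysis insufficient for MSNs.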

Some of the existing challenges of traditional models in detecting malicious activity in social networks include:

  • One of the most significant problems in analyzing MSN is community detection.

  • In existing techniques, Mean Absolute Error (MAE)-based feature reconstruction was not robust, as it exhibited a high degree of feature corruption.

  • Scalability is also a challenge: the large volume of user activity data in an MSN leads to a low signal-to-noise ratio, reduced detection accuracy, and larger memory requirements.

Hence, to alleviate the aforementioned issues, an efficient multilayer anomaly detection technique is proposed and its main contributions are,

  • Utilization of a time-aware attention module for efficient detection of node communities.

  • Graph Auto Encoders (GAE) mitigate feature corruption by performing encoding operations.

  • Incorporation of clustering techniques in the proposed methodology enhances the efficiency of misbehaviour detection in MSN.

The remainder of the paper is organized as follows: Sect. 3 reviews state-of-the-art methods related to malicious activity detection in multi-layer social networks, Sect. 4 details the proposed methodology for detecting malicious users in multilayer networks, Sect. 5 presents the experimental analysis and discussion, and Sect. 6 concludes the paper with future enhancements.

3 Related work

The study conducted by Rahman et al. (2021) aimed to classify normal and abnormal users within social networks through the development of a hybrid anomaly detection method. This method utilized a cascade of the Naive Bayesian Classifier (NBC) and Support Vector Machine (SVM) machine learning algorithms. Training of the cascaded model was performed using two datasets, incorporating features extracted from both user profiles and contents. Performance analysis conducted on the system demonstrated its efficiency. However, the study identified a limitation in the method’s ability to accurately predict bots, attributing this to its focus on processing sequence data rather than structural data. While our focus lies on community detection within MSNs, Rahman et al.’s hybrid anomaly detection method sheds light on the importance of leveraging machine learning techniques to identify abnormal behaviours within social networks, thus providing valuable insights into enhancing overall network analysis and security measures.

In their study, Mendonça et al. (2020) utilized computational ontologies and machine learning techniques for the automatic classification of social network posts. The method selected suspicious posts and deciphered them by representing crime concepts and associating them with criminal slang expressions. The obtained results showed the effectiveness of detecting the user’s intention in written criminal posts. However, the lack of data degraded the system’s results, as the machine learning model required a large amount of training data. This work provides valuable insights into social media analysis, which may inform aspects of community detection in multi-layered social networks by understanding patterns of behaviour and communication dynamics within online communities.

The study conducted by Choi and Jeon (2021) proposed a collaborative spam-detection model in which experts and machines detect spam in conjunction. A primary machine learning filter classified messages as normal or suspicious to avoid the fatal error of misidentifying spam messages. The suspicious messages were flagged and analyzed by the experts. The model delivered high performance in detecting suspicious social network messages. However, the model was limited with respect to current trends, as it relied on spam detection studies released in 2015. While our focus lies on incorporating a clustering technique for improved misbehaviour detection, this study on a cost-based heterogeneous learning framework for real-time spam detection in social networks demonstrates the importance of leveraging expert decisions and real-time data for efficient detection of malicious activities, thus aligning with our objective of enhancing detection efficiency within MSNs.

Shukla et al. (2022) explored an AI-driven social media bot identification model for identifying fraudulent Twitter bots. Analyzing user profile-centric features and activity-centric characteristics, constructing filtering criteria, and performing language-based processing were the steps executed by the detection model. The analysis concluded that the model was highly capable of revealing suspicious activity on the social media platform Twitter. However, the analysis of only one-mode interactions limited its bot-prediction capability. This provides insights into AI-driven algorithms for identifying online media bots in social networks, which could be relevant for understanding anomaly detection in networks, thus adding contextual understanding to the broader landscape of social network analysis. The approach of analyzing user profile-centric features and activity-centric characteristics aligns with the objective of leveraging attention mechanisms for community detection, emphasizing the importance of temporal dynamics in network analysis.

Ni et al. (2021) introduced a Multi-View Attention Network (MVAN) to detect fake news on social media. Text semantic attention and propagation structure attention captured the source tweet content and propagation structure, respectively. The attention mechanisms found key clue words in fake news texts and suspicious users in the propagation structure. The experimental results demonstrated that MVAN significantly outperformed state-of-the-art methods. However, the model failed to incorporate information on users’ replies, which limits its performance. This research offers insights into attention mechanisms and network analysis techniques that could inform the development of more sophisticated algorithms for detecting misbehaviour or anomalies. While it captures source tweet content and propagation structure through attention mechanisms, its limitation in incorporating information on user replies parallels the challenge of efficiently detecting node communities within multi-layered social networks, emphasizing the need for more comprehensive attention mechanisms to improve detection accuracy; it thus contributes to our work on time-aware attention module utilization in node community detection.

Zhang et al. (2020) designed a spam detection algorithm named Improved Incremental Fuzzy-kernel-regularized Extreme Learning Machine (I2FELM). The algorithm was based on a regularized extreme learning machine and analyzed Twitter spam characteristics, such as user attributes, content, activity, and relationships, to detect Twitter spam accurately. As revealed by the experimental validation results, the model efficiently identified Twitter spam. However, the model suffers from the problems of insufficient labeled data and of selecting the proper feature set for improving model performance. This research contributes to our work on the efficient detection of node communities within multi-layered social networks. Although it focuses on social network spam detection using an improved extreme learning machine, the methodology and techniques employed provide insights into enhancing detection efficiency, aligning with our objective of employing advanced techniques like time-aware attention modules for community detection.

Tandon et al. (2022) presented a social network spammer detection technology based on Graph Convolution Networks (GCN). The local structural features of the network were extracted and combined with GCN to obtain the global structural features, which were then used for detecting spam. Experiments conducted on data from a social networking site demonstrated the method’s efficiency. However, the GCN method was unable to distinguish between nodes based on the messages received from other nodes. This research’s exploration of a graph-based CNN algorithm for detecting spammer activity over social media provides valuable insights into leveraging graph-based approaches for efficient detection of malicious behaviour within social networks, aligning with our objective of improving detection efficiency in MSN.

Dokuz (2022) discovered anomalous daily activities using social media datasets. An interest measure was developed based on the spatial and temporal differences of successive posts and was used by the Naive algorithm and the Social Velocity-based Anomalous Daily Activity Discovery (SV-ADAD) algorithm to find anomalous activity. The results showed that the algorithms were successful in discovering anomalous activities of social media users. However, ignoring the content of social media posts made the detection of users exhibiting suspicious behaviour inconsistent. This research provides insights into time-aware attention modules, which are crucial for efficient node community detection within multi-layered social networks, aligning with our objective of employing advanced techniques for detecting community structures.

Wanda and Jie (2021) classified the abnormal nodes of social networks using a supervised neural network called DeepFriend. The dynamic deep learning model, constructed using the WalkPool pooling function, was trained on extensive features to classify malicious vertices using the nodes’ link information. The model gained higher accuracy than standard learning algorithms in classifying abnormal social network nodes. However, the model needs to explore the handling of accounts and malware hierarchy links to achieve better prediction. This research aligns with the utilization of time-aware attention modules for efficient community detection within social networks. Although the approach focuses on classifying abnormal nodes rather than detecting communities directly, the utilization of supervised neural networks, such as DeepFriend, with dynamic deep learning models trained on extensive features, demonstrates the potential for incorporating time-aware attention mechanisms to enhance community detection in multi-layered social networks.

Wanda and Jie (2020) dealt with fake account detection in social networks using a Deep Neural Network (DNN). A dynamic Convolutional Neural Network (CNN) was constructed using a pooling layer to optimize the neural network’s performance during training. The experiments demonstrated that the model achieved promising results, with better accuracy and lower loss than common learning algorithms, in the malicious account classification task. However, the computational complexity of the model was high due to the adoption of an ordinary loss function. This research’s exploration of dynamic neural networks and classification tasks sheds light on potential techniques for enhancing efficiency in community detection and misbehaviour detection within multi-layered social networks.

The study by Pham et al. (2022) introduced an approach called Bot2vec for bot/spammer detection, applying Network Representation Learning (NRL). The model automatically preserved the local neighbourhood relations and the intra-community structure of user nodes for learning. The user node embedding outputs for the detection tasks were obtained using an intra-community random walk strategy. Extensive experiments on two different types of real-world social networks demonstrated the effectiveness of the model. However, the random walk strategy was time-consuming in generating the walks of each node for large-scale social networks. This research highlights the importance of preserving local neighbourhood relations and intra-community structures, which aligns with the goals of leveraging time-aware attention modules and clustering techniques for efficient community detection and misbehaviour detection within MSNs.

The literature survey encompasses a range of studies that collectively provide valuable insights into various aspects of social network analysis, spanning from anomaly detection and spam detection to community detection and misbehaviour detection within multi-layered social networks. These studies employ diverse methodologies, including machine learning techniques, attention mechanisms, graph-based algorithms, and dynamic neural networks, highlighting the importance of leveraging advanced techniques for enhancing detection efficiency and accuracy within MSNs.

4 Proposed suspicious behaviour prediction model

The proposed model includes an efficient EDER-based graph construction and PF-KMA-based clustering strategies to spot anomalous users in the multilayer social network. The main goal of the proposed system is to accurately classify anomalous users by learning the network structure features and ranking the nodes according to the score level of information diffusion. Figure 1 depicts the proposed methodology.

Fig. 1
figure 1

Block diagram of the proposed methodology

4.1 Preprocessing

Preprocessing handles and prepares the input data so that the collected data is useful for further processing. Data collection in the proposed methodology is achieved via a dataset of tweets and their related information. The dataset provides information regarding multi-layer social networking websites, such as Facebook, Twitter, and Instagram, which could be noisy, inconsistent, and contain some parts irrelevant to processing.

Let \(\left( W \right)\) be the dataset containing \(N\) data items \(\left( {W_{i(n)} } \right)\), \(n = 1\,to\,N\), collected from \(\left( i \right)\) network websites. The collected data is mathematically expressed as,

$$W = \left\{ {W_{i(n)} |i = 1,\;2\;,\;3,\;n = 1,\;2,\;3,....,\;N} \right\}$$
(1)

Hence, the collected dataset is pre-processed to keep only the relevant data by employing the following steps,

Removal of Redundant data: This step avoids data duplication by promptly removing the pieces of data that are no longer required in the training process. Data duplication is the occurrence of the same piece of data in two or more locations.

$$W_{{red\left( {W_{i(n)} } \right)}} = \Im_{rem} \left\{ {W_{i(n)} } \right\}\,\,\,\,\,\,\,\,\,\,\,\,\,Where,W_{i(n)} = W_{i(n + 1)}$$
(2)

Null Value Removal: This step removes the rows that contain null values, which are not useful for detecting suspicious behaviour.

$$W_{nr} = \Im_{rem} \left\{ {W_{{red\left( {W_{i(n)} } \right)}} } \right\}\,\,\,\,\,\,\,\,\,\,\,\,\,Where,W_{i(n)} = null$$
(3)

where \(\Im_{{{\text{rem}}}}\) is the function for removing redundant data \(\left( {W_{i(n)} = W_{i(n + 1)} } \right)\) and null values \(\left( {W_{i(n)} = null} \right)\), and \(W_{{red\left( {W_{i(n)} } \right)}}\) and \(W_{nr}\) are the resultant datasets after removing the irrelevant data.
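As a sketch, the two removal steps of Eqs. (2) and (3) can be implemented as follows; the records and field names are illustrative assumptions, not the actual dataset:

```python
# Toy records standing in for the collected dataset W (fields are assumptions).
W = [
    {"user": "u1", "tweet": "hello"},
    {"user": "u1", "tweet": "hello"},  # duplicate record
    {"user": "u2", "tweet": None},     # record with a null value
    {"user": "u3", "tweet": "hi"},
]

# Eq. (2): remove redundant (duplicate) records, preserving order
seen, W_red = set(), []
for rec in W:
    key = tuple(sorted(rec.items()))
    if key not in seen:
        seen.add(key)
        W_red.append(rec)

# Eq. (3): drop records containing null values
W_nr = [rec for rec in W_red if all(v is not None for v in rec.values())]

print(len(W_nr))  # 2
```

After both steps only the unique, fully populated records remain for graph construction.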

4.2 Graph construction

In this section, the multiple social networks in the pre-processed dataset \(\left( {W_{nr} } \right)\) are represented as a graph structure in which users are considered as nodes and the interactions among them are considered as edges, providing meaningful analysis. The EDER random graph model is used for graph construction because of its ability to generate graphs at the time a node is included in the social network. However, the randomized nature of the graph-generation probability increases the divergence of the algorithm. Hence, the exponential distribution is utilized to generate the graphs continuously at the time of occurrence.

The method instantiates the graph by setting certain parameters, such as the nodes, edges, and connection probability. In this model, the graph is constructed by connecting random nodes, where the edges are selected according to an exponential distribution with the given probability.

$$W_{nr} (H,R,\rho ) = \left\{ \begin{gathered} \left\{ {H_{d} } \right\}_{d = 1,2,3,.....,D} \hfill \\ \left\{ {R_{e} } \right\}_{e = 1,2,3,.....,E} \hfill \\ \rho^{E} \left( {1 - \rho } \right)^{{\left( \begin{subarray}{l} D \\ 2 \end{subarray} \right) - E}} \hfill \\ \end{gathered} \right.$$
(4)
$$S(R) = \left\{ \begin{gathered} \varepsilon \exp \left( { - \varepsilon * R} \right),\,\,\,\,\,\,\,\,\,\,\,\,\,{\text{if}}\left( {\rho \ge 0} \right) \hfill \\ 0,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,otherwise \hfill \\ \end{gathered} \right.$$
(5)

Here, \(H\) is the set of nodes \(\left\{ {H_{d} } \right\}\), \(D\) is the number of nodes, \(R\) contains \(E\) edges \(\left\{ {R_{e} } \right\}\), \(\rho \in \left( {0,1} \right)\) is the probability, \(S(R)\) is the exponentially distributed random variable, and \(\varepsilon\) is the scaling parameter. The distribution of edges is binomial with the nodes and probability measures as,

$$\rho \left( R \right) = \left( \begin{gathered} \left( \begin{subarray}{l} D \\ 2 \end{subarray} \right) \hfill \\ E \hfill \\ \end{gathered} \right)\rho^{E} \left( {1 - \rho } \right)^{{\left( \begin{subarray}{l} D \\ 2 \end{subarray} \right) - E}}$$
(6)

Further, each edge is assigned an independent weight to include graphs with more or fewer edges. As a result of this procedure, the graph is defined by the triple \((H,R,\rho )\) as,

$$T = W_{nr} (H,R,\rho ),\,\,\,H_{d} = H \times L_{i} ,\,\,\,\,\,\,R = H_{d} \times H_{d + 1}$$
(7)

here \(T\) denotes the generated graph and \(L_{i}\) denotes the number of layers.
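Under the assumption that EDER follows an Erdős–Rényi-style edge selection with exponentially distributed edge weights, as in Eqs. (4)–(5), the construction can be sketched as:

```python
import itertools
import random

def eder_graph(D, rho, eps, seed=0):
    """Sketch of the EDER-style construction: each of the C(D, 2) possible
    edges is kept with probability rho and, if kept, assigned a weight drawn
    from an exponential distribution with rate eps (cf. Eqs. 4-5).
    All parameter values used below are illustrative."""
    rng = random.Random(seed)
    H = list(range(D))                    # node set {H_d}
    R = {}                                # edge set {R_e}, mapped to weights
    for u, v in itertools.combinations(H, 2):
        if rng.random() < rho:
            R[(u, v)] = rng.expovariate(eps)
    return H, R

H, R = eder_graph(D=6, rho=0.4, eps=1.5)
print(len(H), len(R))
```

The expected number of edges is \(\rho \binom{D}{2}\), and each included edge carries a positive exponential weight.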

4.2.1 Feature extraction

From the network structural graph \(\left( T \right)\), the important features that describe the user characteristics and indicate the personal importance of individuals within the social networks are extracted. The features extracted are,

Degree Centrality (DC): DC \(\left( {\varphi_{\deg \_cen} } \right)\) counts the ties in the network that directly connect a node with others. Here, the node’s degree is a count of the social connections held by each node. It is divided into two measures, in-degree centrality and out-degree centrality.

$$\varphi_{\deg \_cen} = \frac{\deg \left( H \right)}{{D - 1}}$$
(8)

The in-degree centrality \(\left( {\varphi_{i\;\deg \_cen} } \right)\) measures the number of connections toward the focal node and the out-degree centrality \(\left( {\varphi_{o\;\deg \_cen} } \right)\) measures the number of connections sent from the focal node. It can be expressed as,

$$\varphi_{i\;\deg \_cen} = \frac{i\;\deg \;(H)}{{D - 1}}$$
(9)
$$\varphi_{o\;\deg \_cen} = \frac{o\;\deg \;(H)}{{D - 1}}$$
(10)

where \(\deg\) is the number of nodes directly connected to the node \(\left( H \right)\), \(i\;\deg\) is the number of connections inward to the node \(\left( H \right)\), and \(o\;\deg\) is the number of connections outward to other nodes.

Closeness Centrality (CC): CC signifies how quickly information spreads from a given node to all other nodes in the network. This measure emphasizes the position in the network by calculating the average of the shortest paths from the focal node to the others. It can be derived as,

$$\varphi_{{{\text{cls\_cen}}}} = \frac{D - 1}{{\sum\limits_{\begin{subarray}{l} d + 1 \ne d \\ d,\;d + 1 \in H \end{subarray} } {{\text{cls}}\;\left( {H_{d} ,H_{d + 1} } \right)} }}$$
(11)

where \(\varphi_{cls\_cen}\) is the closeness centrality and \(cls\) is the closeness of two nodes \(\left( {H_{d} ,\;H_{d + 1} } \right)\).

Betweenness Centrality (BC): BC identifies the nodes that act as bridges between other nodes in the network. This measure indicates how many shortest paths pass through the given node, i.e., the number of times the node lies between two other nodes.

$$\varphi_{bw\_cen} = \frac{{\sum\limits_{\begin{subarray}{l} p\; \ne \;H\; \ne \;q \\ p,\;q \in H \end{subarray} } {\frac{{bw_{pq} \left( H \right)}}{{bw_{pq} }}} }}{D - 1}$$
(12)

where \(\varphi_{{{\text{bw\_cen}}}}\) is the betweenness centrality and \(bw_{pq}\) is the number of shortest paths between \(\left( {p,\;q} \right)\) passing through \(\left( H \right)\).
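The degree and closeness measures of Eqs. (8)–(11) can be sketched on a toy undirected graph as follows (betweenness, Eq. (12), follows the same pattern but additionally counts the shortest paths passing through each node); the adjacency list is an illustrative assumption:

```python
from collections import deque

# Small undirected toy graph as an adjacency list (illustrative only).
adj = {
    "a": {"b", "c"},
    "b": {"a", "c", "d"},
    "c": {"a", "b"},
    "d": {"b"},
}
D = len(adj)  # number of nodes

def degree_centrality(node):
    """Eq. (8): fraction of the other D-1 nodes directly connected."""
    return len(adj[node]) / (D - 1)

def closeness_centrality(node):
    """Eq. (11): (D-1) divided by the sum of shortest-path lengths,
    computed here with a breadth-first search."""
    dist = {node: 0}
    queue = deque([node])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return (D - 1) / sum(dist[v] for v in adj if v != node)

print(degree_centrality("b"))     # 3/3 = 1.0
print(closeness_centrality("d"))  # 3/(1+2+2) = 0.6
```

Node "b" touches every other node directly, hence its maximal degree centrality, while the peripheral node "d" has lower closeness.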

Degree Prestige (DP): DP calculates the structural prestige of a node from its social connections. Counting the number of connections directed to the given node signifies the popularity of that node in the social network.

$$\varphi_{{\deg \_{\text{pre}}}} = \frac{{C_{n\;(H)} }}{{C_{n} }}$$
(13)

here \(\varphi_{\deg \_pre}\) is the degree prestige of the given node, \(C_{n(H)}\) is the number of incoming connections to \(\left( H \right)\), and \(C_{n}\) is all the possible connections in the network.

Social Position (SP): This evaluates the importance of an individual in the community using the position values of the node’s direct contacts and their activities towards that node. This computation is performed iteratively as follows,

$$\varphi_{SP\;(k + 1)} \left( {H_{d} } \right) = \left( {1 - \beta } \right) + \beta \sum\limits_{{H_{d + 1} \in H}} {\varphi_{SP\;(k)} \left( {H_{d + 1} } \right)} \Re_{{H_{d + 1} \to H_{d} }}$$
(14)

where \(\beta\) is the fixed coefficient, the strength of the relation between \(\left( {H_{d} } \right)\) and \(\left( {H_{d + 1} } \right)\) is represented by the community function \(\Re\), and \(\varphi_{SP}\) is the social position of member \(\left( {H_{d} } \right)\) after \(k\) and \(k + 1\) iterations.
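The iterative update of Eq. (14) can be sketched as follows; the relation strengths standing in for \(\Re\) and the value of \(\beta\) are illustrative assumptions:

```python
beta = 0.85   # fixed coefficient (illustrative value)

# strength[u][v]: strength of the relation directed from u to v, standing in
# for the community function R in Eq. (14); each node's outgoing strengths
# sum to 1 here, so the total social position is conserved across iterations.
strength = {
    "a": {"b": 1.0},
    "b": {"a": 0.5, "c": 0.5},
    "c": {"a": 1.0},
}
nodes = list(strength)
sp = {u: 1.0 for u in nodes}   # initial social positions

for _ in range(50):            # iterate Eq. (14) to (near) convergence
    sp = {
        u: (1 - beta) + beta * sum(
            sp[v] * strength[v].get(u, 0.0) for v in nodes
        )
        for u in nodes
    }

print({u: round(s, 3) for u, s in sp.items()})
```

The update has the same fixed-point structure as PageRank: each node's position mixes a constant term with the strength-weighted positions of the nodes pointing at it.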

4.2.2 Time aware attention module

This module improves prediction performance by exploiting the structure of the social network \(\left( T \right)\) and gathering social influence information about users of the social network from past records. Collecting user representations involves a wide range of historical user interaction information, such as comments, likes, re-tweets, mentions, follows, etc., together with the respective time stamp recording the date and time of the post. Two different strategies employed for this purpose are as follows,

Hard Selection Strategy: This strategy aims to find the time stamp of the user’s post and to determine the time interval in which the time stamp lies. Let \(\tau_{m} = \left[ {\tau_{1} ,\tau_{2} ,\tau_{3} ,.....,\tau_{M} } \right]\) be the set of time intervals and \(H_{d(\tau )}^{\left( i \right)} = \left[ {H_{{1\left( {\tau_{m} } \right)}}^{\left( i \right)} ,H_{{2\left( {\tau_{m} } \right)}}^{\left( i \right)} ,H_{{3\left( {\tau_{m} } \right)}}^{\left( i \right)} ,.......,H_{{D\left( {\tau_{m} } \right)}}^{\left( i \right)} } \right]\) be the set of all user representations obtained from the graph structure \(T\). When the user \(H_{1}\) reposts message \(\left( {\phi_{m} } \right)\) at a certain time stamp \(\left( {\tau_{m} } \right)\), the user representation is recorded as,

$$H_{1(\tau ,\phi )} = \left[ {H_{{1\left( {\phi_{m} ,\tau_{1} } \right)}}^{\left( i \right)} ,H_{{1\left( {\phi_{m} ,\tau_{2} } \right)}}^{\left( i \right)} ,H_{{1\left( {\phi_{m} ,\tau_{3} } \right)}}^{\left( i \right)} ,.......,H_{{1\left( {\phi_{m} ,\tau_{M} } \right)}}^{\left( i \right)} } \right]$$
(15)

where \(H_{1(\tau ,\phi )}\) contains \(M\) representations of user \(H_{1}\) for a particular post \(\left( {\phi_{m} } \right)\), ordered by timestamp. If the time stamp of the post from one layer to another belongs to the time interval \(\left( {\tau_{m} ,\tau_{m + 1} } \right)\), the final user representation is taken as \(H_{{1\left( {\phi_{m} ,\tau_{m} } \right)}}^{\left( i \right)}\).

Soft Selection Strategy: Building on the hard selection strategy, this strategy fuses the user representations from the historical information. For a given user id \(\left( {H_{1} } \right)\), the user representations are fetched from all user representations, and the final user representation is derived as,

$$H_{(fr)} = \sum\limits_{{d \in H_{{1\left( {\tau ,\phi } \right)}} }} {\vartheta_{d} H_{{d(\tau_{m} )}} }$$
(16)
$$\vartheta_{d} = \varsigma_{sfm} \left( {\frac{{H_{d}^{\left( i \right)} \tau }}{\sqrt h } + J} \right)$$
(17)
$$\tau = \chi_{lu} \left( {\tau_{m} } \right)$$
(18)

where \(H_{(fr)}\) is the final representation fusing the historical user representations, \(\chi_{lu}\) is the function to transform time interval into time embedding, \(J\) is the mask matrix, \(\vartheta_{d}\) is the attention weight, \(\varsigma_{sfm}\) is the softmax function, and \(h\) is the dimensionality of user embeddings. Hence, \(H_{(fr)}\) comprises the social interaction information among users in the social network for a particular post.
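The soft-selection fusion of Eqs. (16)–(18) can be sketched as follows; the embeddings and raw scores are illustrative assumptions, and the mask matrix \(J\) is taken as zero:

```python
import math

h = 4                                 # dimensionality of user embeddings
H_hist = [                            # historical representations of one user
    [0.2, 0.1, 0.0, 0.3],             # at interval tau_1
    [0.5, 0.4, 0.1, 0.0],             # at interval tau_2
    [0.1, 0.9, 0.2, 0.1],             # at interval tau_3
]
raw = [0.7, 1.2, 0.3]                 # raw scores H_d^(i) . tau (illustrative)

# Eq. (17): attention weights via softmax of the scaled scores (mask J = 0)
exp_s = [math.exp(s / math.sqrt(h)) for s in raw]
weights = [e / sum(exp_s) for e in exp_s]

# Eq. (16): fuse the historical representations into H_(fr)
H_fr = [sum(w * rep[k] for w, rep in zip(weights, H_hist)) for k in range(h)]

print([round(x, 3) for x in H_fr])
```

The representation with the highest time-aware score receives the largest attention weight, so \(H_{(fr)}\) leans toward the user's most relevant historical interval.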

4.3 Graph encoding

The encoding stage encodes the graph data, i.e., the features of the structural graph and the user representation information obtained from the time-aware module. In this model, the Soft Sign activated Graph Auto Encoder (SS-GAE), which converts input data into an abstract representation, is used for encoding. Graph auto encoders are probabilistic graphical models that learn a probability distribution over their set of inputs. The conventional GAE method is selected because of its great potential in dimensionality reduction and attention to graph embedding. However, the compression performed by the ReLU activation function in GAE is lossy, so it is replaced with the soft sign activation function. The structure of SS-GAE is shown in Fig. 2.

Fig. 2
figure 2

Architecture of SS-GAE

The graph autoencoder uses a Graph Convolution Network (GCN) as the graph encoder, which represents graph information and feature content in a unified representation. The feature matrix captures additional information associated with each node in the graph, such as node attributes or any other relevant information. Its size depends on the number of features and the number of nodes in the network: with \(l\) features for each node and \(n\) nodes in total, the feature matrix is of size \(l \times n\). The information matrix represents the structure of the graph, where each entry indicates the presence or absence of a connection between nodes. Given that the social network is very large, the information matrix is assumed to be large as well; its size depends on the number of nodes and the density of connections in the network. With \(n\) nodes, the information matrix is of size \(n \times n\). The input graph data \(G_{inp}\) can be represented by,

$$G_{inp} = \left\{ {F_{tam(n \times n)} ,T_{fea(l \times n)} } \right\}$$
(19)

where \(T_{fea(l \times n)} \in \left\{ {\varphi_{\deg \_cen} ,\varphi_{cls\_cen} ,\varphi_{bw\_cen} ,\varphi_{\deg \_pre} ,\varphi_{sp} } \right\}\) is the feature matrix of \(l\) features and \(F_{tam(n \times n)} \in \left\{ {H_{{\left( {fr} \right)}} ,H_{{d\left( {\phi_{m} ,\tau_{m} } \right)}}^{\left( i \right)} } \right\}\) is the information matrix of the time-aware module.

GCN applies the convolution operation to the input data in the spectral domain and learns a layer-wise transformation of the convolution operation. The output of the \(q^{th}\) layer is computed as,

$$O_{\left( q \right)} = \Phi_{con} \left( {G_{inp} |\theta_{\left( q \right)} } \right)$$
(20)

where \(O_{\left( q \right)}\) is the output after convolution, \(\Phi_{con}\) is the convolution operation, and \(\theta_{\left( q \right)}\) is the weight matrix of \(q^{th}\) layer. The output of each layer is represented by the function as,

$$O_{\left( q \right)} = \Phi_{af(q)} \left( {W^{ - 0.5} F^{\prime }_{tam} W^{ - 0.5} T^{\prime }_{{fea\left( {q - 1} \right)}} \theta_{\left( q \right)} } \right)$$
(21)

where \(W = \sum {F^{\prime }_{tam} }\) is the degree matrix formed using \(F^{\prime }_{tam} = F_{tam} + IM\), in which \(IM\) is the identity matrix. The softsign activation function \(\left( {\Phi_{af} } \right)\) is computed as,

$$\Phi_{af} = \frac{E}{1 + \left| E \right|},\quad \text{where } E = W^{ - 0.5} F^{\prime }_{tam} W^{ - 0.5} T^{\prime }_{fea\left( {q - 1} \right)} \theta_{\left( q \right)}$$
(22)
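Equations (21)–(22) can be sketched as a single GCN layer. The NumPy illustration below assumes the feature matrix is stored transposed (\(n \times l\)) so the matrix products line up, and uses randomly initialized weights in place of the learned \(\theta\); it is a minimal sketch, not the paper's implementation.

```python
import numpy as np

def softsign(E):
    # Eq. (22): Phi_af = E / (1 + |E|)
    return E / (1.0 + np.abs(E))

def gcn_layer(F_tam, H, theta):
    """One spectrally normalized graph convolution (Eq. 21).

    F_tam : (n, n) information (adjacency) matrix
    H     : (n, l) node representations from the previous layer
    theta : (l, l_out) weight matrix of this layer
    """
    n = F_tam.shape[0]
    F_prime = F_tam + np.eye(n)               # self-loops: F' = F + IM
    deg = F_prime.sum(axis=1)                 # degree matrix W (diagonal)
    W_inv_sqrt = np.diag(1.0 / np.sqrt(deg))  # W^{-0.5}
    E = W_inv_sqrt @ F_prime @ W_inv_sqrt @ H @ theta
    return softsign(E)

rng = np.random.default_rng(0)
A = np.array([[0, 1, 1], [1, 0, 0], [1, 0, 0]], dtype=float)
H0 = rng.normal(size=(3, 5))     # 3 nodes, 5 input features
theta = rng.normal(size=(5, 2))  # project to a 2-dim embedding
O1 = gcn_layer(A, H0, theta)     # entries lie in (-1, 1) due to softsign
```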

Finally, the graph encoder encodes both the feature data and the graph content, yielding the representation,

$$O_{enc} = \lambda \left( {O_{\left( q \right)} |G_{inp} } \right)$$
(23)

where \(O_{enc}\) gives the encoder output with the function \(\lambda \left( {O_{\left( q \right)} |G_{inp} } \right)\).

Algorithm 1
figure a

SS-GAE-based graph encoding

4.4 Graph decoding

Graph decoding reconstructs the encoded data \(\left( {O_{enc} } \right)\) and predicts the information diffusion level. Information diffusion is the process by which information spreads from one place to another through interaction. Thus, decoding both the graph data and the feature set helps rate the intensity of information diffusion by tracing the spread of information among interconnected nodes. Additionally, the decoded information, particularly the learned user representation, is used to output the diffusion sequence, enabling the estimation of the diffusion probability to analyze information spread and detect anomalous accounts based on users' personal preference influences.

Initially, the decoder attains the output of the encoder \(O_{enc}\) and focuses on reconstructing the input data using the following computation as,

$$O_{dec} = \xi_{PR} \left( {O_{enc} } \right)$$
(24)
$$\xi_{PR} = \begin{cases} O_{enc} & if\left( {O_{enc} > 0} \right) \\ bO_{enc} & otherwise \end{cases}$$
(25)

where \(O_{dec}\) is the decoded output, \(\xi_{PR}\) is the activation function and \(b\) is the learnable parameter.
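The decoder activation in Eqs. (24)–(25) is a parametric ReLU. A minimal sketch, with the learnable parameter \(b\) fixed at an illustrative 0.25, is:

```python
import numpy as np

def prelu_decode(O_enc, b=0.25):
    """xi_PR from Eqs. (24)-(25): identity for positive entries,
    slope b for negative ones. In the model, b is learned; it is
    fixed here purely for illustration."""
    O_enc = np.asarray(O_enc, dtype=float)
    return np.where(O_enc > 0, O_enc, b * O_enc)

O_enc = np.array([[-2.0, 0.5], [1.0, -0.4]])
O_dec = prelu_decode(O_enc, b=0.25)  # negatives scaled by 0.25
```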

From the decoded information, the learned user representation is used to output the diffusion sequence. Then, the diffusion probability is estimated to analyze the information spread and detect anomalous accounts by inferring users' personal preference influences. The probability is computed as,

$$\aleph_{{O_{dec} }} = - \sum\limits_{n} {O_{dec} \log \left( {O_{dec} } \right)}$$
(26)

where \(\aleph_{{O_{{{\text{dec}}}} }}\) is the probability to indicate whether the diffusion behavior happened or not. Based on this probability, the anomaly score for each node is generated as,

$$\partial_{{as(H_{d} )}} = \eta_{loc} \left( {\aleph_{{O_{dec} }} ,\;F_{tam(n \times n)} } \right)$$
(27)

where \(\partial_{{as\;(H_{d} )}}\) is the generated anomaly scores for each node \(\left( {H_{d} } \right)\) in the multilayer social network and \(\eta_{loc}\) is a measure to assign the anomaly score of each data point.
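Equations (26)–(27) can be sketched as follows. Since the paper does not spell out the locality measure \(\eta_{loc}\), the neighbourhood-deviation score below is an assumption for illustration only.

```python
import numpy as np

def diffusion_entropy(O_dec):
    """Eq. (26): per-node entropy of the decoded diffusion
    distribution, indicating whether diffusion behaviour happened.
    Rows are first normalized into probabilities."""
    P = np.clip(O_dec, 1e-12, None)
    P = P / P.sum(axis=1, keepdims=True)
    return -(P * np.log(P)).sum(axis=1)

def anomaly_scores(entropy, F_tam):
    """Sketch of eta_loc (Eq. 27). As an assumption, each node is
    scored by how far its entropy deviates from the mean entropy of
    its neighbours in the information matrix F_tam."""
    deg = F_tam.sum(axis=1)
    neigh_mean = (F_tam @ entropy) / np.maximum(deg, 1.0)
    return np.abs(entropy - neigh_mean)

A = np.array([[0, 1, 1], [1, 0, 0], [1, 0, 0]], dtype=float)
O_dec = np.array([[0.7, 0.3], [0.5, 0.5], [0.9, 0.1]])
ent = diffusion_entropy(O_dec)
scores = anomaly_scores(ent, A)
```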

4.5 Ranking

This section ranks the nodes of the multilayer network with respect to their anomaly scores \(\left( {\partial_{{as\;(H_{d} )}} } \right)\) together with the decoded output \(\left( {O_{dec} } \right)\). Hence, the Lurker-Ranking (LR) method is used, which ranks users according to their degree of lurking on social networks. In the proposed system, this is achieved using the anomaly score obtained by each node in the social network.

Given a node \(H_{d} \in T\), the multilayer LR is defined by,

$$\hbar_{{LR\;(H_{d} )}} = \gamma \left[ {\delta_{{in\;\left( {O_{{dec\;\left( {H_{d} } \right)}} } \right)}} \left( {1 + \delta_{{out\;\left( {O_{{dec\;\left( {H_{d} } \right)}} } \right)}} } \right)} \right] + \left( {1 - \gamma } \right)\partial_{{as\;(H_{d} )}}$$
(28)
$$\delta_{{in\;\left( {O_{{dec\;\left( {H_{d} } \right)}} } \right)}} = \sum\limits_{i} {\frac{{\varpi_{i} }}{{\left| {X_{i\;(out)} (H_{d} )} \right|}}} \sum\limits_{{H_{d + 1} \in X_{i\;(in)} (H_{d} )}} {\frac{{\left| {X_{i\;(out)} (H_{d + 1} )} \right|}}{{\left| {X_{i\;(in)} (H_{d + 1} )} \right|}}} \hbar_{{LR\;(H_{d + 1} )}}$$
(29)
$$\delta_{{out\;\left( {O_{{dec\;\left( {H_{d} } \right)}} } \right)}} = \sum\limits_{i} {\frac{{\varpi_{i} \left| {X_{i\;(in)} (H_{d} )} \right|}}{{\sum\limits_{{H_{d + 1} \in X_{i\;(out)} (H_{d} )}} {\left| {X_{i\;(in)} (H_{d + 1} )} \right|} }}} \sum\limits_{{H_{d + 1} \in X_{i\;(out)} (H_{d} )}} {\frac{{\left| {X_{i\;(in)} (H_{d} )} \right|}}{{\left| {X_{i\;(out)} (H_{d} )} \right|}}} \hbar_{{LR\;(H_{d} )}}$$
(30)

where \(\delta_{{in\;\left( {O_{{dec\;\left( {H_{d} } \right)}} } \right)}}\) is the in-neighbours driven lurking function, \(\delta_{{out\;\left( {O_{{dec\;\left( {H_{d} } \right)}} } \right)}}\) is the out-neighbours driven lurking function, \(\hbar_{{LR(H_{d} )}}\), \(\hbar_{{LR(H_{d + 1} )}}\) are the ranking scores of two nodes \(\left( {H_{d} ,H_{d + 1} } \right)\), \(X_{i\;(in)}\), \(X_{i\;(out)}\) are the sets of in-neighbours and out-neighbours of a particular node in layer \(i\), \(\varpi_{i}\) is the layer weight, and \(\gamma\) is the damping factor.

Regarding in-neighbours, the ranking score of a node increases with the number of its in-neighbours and with their likelihood of being anomalous, i.e., their relatively high anomaly scores. Concerning out-neighbours, the ranking score of a node increases with the tendency of its out-neighbours to be anomalous.

Thus, by incorporating both the contribution of in-neighbours and out-neighbours and their anomalous scores, all the nodes in the multilayer social network are ranked as follows,

$$H_{LR\;(d)} = \left\{ {H_{LR(1)} ,\;H_{LR(2)} ,\;H_{LR(3)} ,\; \ldots ,\;H_{LR(N)} } \right\}$$
(31)

where \(H_{LR\;(d)}\) contains the set of ranked nodes.
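A simplified, single-layer sketch of the Lurker-Ranking update in Eqs. (28)–(30) is shown below. The per-layer weights \(\varpi_{i}\) and the exact normalizations are collapsed, so this illustrates the idea rather than reproducing the paper's precise recurrence.

```python
import numpy as np

def lurker_rank(A, anomaly, gamma=0.85, iters=100):
    """Simplified LR iteration: a node's score grows with the
    out/in-degree ratios of its in-neighbours (Eq. 29) and with an
    out-neighbour term (Eq. 30), damped by gamma and blended with
    the node's anomaly score (Eq. 28).

    A       : (n, n) directed adjacency, A[u, v] = 1 for edge u -> v
    anomaly : (n,) anomaly scores from the decoder stage
    """
    n = A.shape[0]
    out_deg = np.maximum(A.sum(axis=1), 1.0)
    in_deg = np.maximum(A.sum(axis=0), 1.0)
    ratio = out_deg / in_deg
    score = np.full(n, 1.0 / n)
    for _ in range(iters):
        # in-neighbour driven term (cf. Eq. 29)
        delta_in = (A.T @ (ratio * score)) / out_deg
        # out-neighbour driven term (cf. Eq. 30), heavily simplified
        delta_out = (in_deg / (A @ in_deg + 1.0)) * (A @ score)
        new = gamma * delta_in * (1.0 + delta_out) + (1.0 - gamma) * anomaly
        score = new / new.sum()  # normalize each sweep
    return score

A = np.array([[0, 1, 1], [0, 0, 1], [0, 0, 0]], dtype=float)
anomaly = np.array([0.1, 0.2, 0.7])
score = lurker_rank(A, anomaly)
```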

4.6 Clustering

This section clusters the ranked users of the multilayer social networks \(\left( {H_{LR\;(d)} } \right)\) into different groups using PF-KMA. KMA is one of the simplest unsupervised learning algorithms for the well-known clustering problem and was selected for the proposed objective because of its ability to scale to large data. Although K-means provides good clustering, it converges slowly; hence, the Pareto front (PF) approach is used to provide an optimal solution for centroid selection.

The method initially takes the ranked users of the social network together with their rank values. Then, the number of clusters and the centroids are initialized, and the method partitions the users into clusters such that each user belongs to the cluster of greatest similarity. The steps involved in PF-KMA are as follows,

  • Specify the number of clusters \(Z\) and then initialize the \(d\) number of users with their rank values as \(H_{LR\;(d)} = \left\{ {H_{LR(1)} ,\;H_{LR(2)} ,\;H_{LR(3)} ,\; \ldots ,\;H_{LR(N)} } \right\}\) and the set of cluster centroids as \(G_{i} = \left\{ {G_{1} ,\;G_{2} ,\; \ldots ,\;G_{Z} } \right\}\).

  • Select the number of cluster centres using the PF technique to cluster the ranked users.

    $$G_{i} = \arg \min \left( {\Omega_{cl\;(1)} ,\;\Omega_{cl\;(2)} } \right)$$
    (32)

    where \(\Omega_{cl\;(1)}\) and \(\Omega_{cl\;(2)}\) are the objective functions of two different clusters.

  • Compute the distance between each data point and all centroids, and assign each point to the cluster centre at the minimum distance.

  • The distance between data points is calculated using the Euclidean distance (ED). The ED between data point \(H_{LR(d)}\) and centre \(G_{i}\) is defined by,

    $$r_{eld} = \sqrt {\sum\limits_{i = 1}^{Z} {\left( {G_{i} - H_{LR(d)} } \right)^{2} } }$$
    (33)

    here \(r_{eld}\) is the ED between \(H_{LR(d)}\) and \(G_{i}\).

  • Recompute the centroid of each cluster by taking the average of all data points that belong to it.

  • The process continues until the cluster assignment of the data points no longer changes.

  • Finally, the ranked users are clustered into five different classes, namely active, star, clique, passive, and newcomer users. The obtained clusters are,

    $$Cl = \left\{ {cl_{K} |K = 1,\;2,\;3,\;4,\;5} \right\}$$
    (34)

    where \(Cl\) denotes five different groups of clusters \(cl_{K}\).
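The PF-KMA steps above can be sketched as a 1-D K-means over the rank values. Because the Pareto-front centroid selection of Eq. (32) is not fully specified, the deterministic quantile seeding below is an assumed stand-in for it.

```python
import numpy as np

def pf_kmeans(ranks, k=5, iters=100):
    """K-means sketch over 1-D ranking scores using Euclidean
    distance (Eq. 33). Centroids are seeded at evenly spaced
    quantiles of the rank values -- an assumption replacing the
    paper's Pareto-front selection (Eq. 32)."""
    ranks = np.asarray(ranks, dtype=float)
    centroids = np.quantile(ranks, np.linspace(0, 1, k))  # deterministic seed
    for _ in range(iters):
        # assign each user to the nearest centroid
        dist = np.abs(ranks[:, None] - centroids[None, :])
        labels = dist.argmin(axis=1)
        new_centroids = centroids.copy()
        for j in range(k):
            members = ranks[labels == j]
            if members.size:
                new_centroids[j] = members.mean()  # cluster average
        if np.allclose(new_centroids, centroids):
            break  # assignments have stabilized
        centroids = new_centroids
    return labels, centroids

rng = np.random.default_rng(1)
ranks = rng.random(50)                     # 50 users' rank values
labels, centroids = pf_kmeans(ranks, k=5)  # five classes, per Eq. (34)
```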

Algorithm 2
figure b

PF-KMA-based Clustering

The nodes are clustered based on the degree of deviation indicated by their anomaly ranking. Thus, anomaly ranking is advantageous for classifying the different types of users in social networks, and through efficient clustering the malicious users are determined from multilayer social networks.

5 Results and discussion

Numerous experiments, conducted in Python, are presented in this section to evaluate the proposed model. The proposed malicious behaviour detection model used a synthetic dataset to evaluate model performance. Due to the network's large size, performing graph convolution becomes time-consuming; the time consumed at this stage is influenced by several factors, including the number of nodes and edges in the network, the complexity of the graph convolution operation, and the available computational resources. Consider a simplified dataset representing interactions among users across three social networking platforms: Facebook (FB), Instagram (IG), and Twitter (TW). Each row in the dataset represents an interaction between two users. In this synthetic dataset:

  • Users interact with each other across different social networking platforms (FB, IG, TW).

  • Each row represents an interaction between two users, with the corresponding platform where the interaction occurred.

  • Users and platforms are represented using alphanumeric identifiers for simplicity.

For the experiment, the synthetic dataset representing interactions among users on different social networking platforms is created by first defining the set of unique users across these platforms, denoted by \(N\). Each user is assigned a unique alphanumeric identifier. Interactions between users are then simulated by generating rows in the dataset, with each row representing an interaction between two users along with the platform where the interaction occurred. For instance, a row might indicate that user "A" interacted with user "B" on Instagram. The number of edges in the dataset, denoted by \(E\), corresponds to the total interactions recorded. Malicious edge injection is incorporated by introducing deceptive interactions between users, guided by predefined models or statistical patterns derived from real-world data. This ensures a diverse and realistic representation of user behavior, including both normal interactions and malicious activities. Approximately 70% of the dataset was allocated for training the model; this set included a representative sample of interactions from all platforms to capture diverse behaviors. The remaining 30% of the dataset was reserved for testing the model's effectiveness. Care was taken to ensure that the test set maintained statistical properties and distributions similar to the training set to avoid biased evaluation results. The resulting synthetic dataset serves as a valuable resource for evaluating malicious behavior detection models in social networks across multiple platforms while considering the complexities of real-world interaction patterns.
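The dataset construction described above can be sketched as follows. The user-id format, edge counts, and the random flagging of malicious edges are illustrative assumptions; they do not reproduce the paper's exact injection models.

```python
import random

def make_synthetic_dataset(n_users=100, n_edges=500, malicious_frac=0.05,
                           platforms=("FB", "IG", "TW"), seed=42):
    """Generate rows of (user_a, user_b, platform, is_malicious)
    interactions across three layers, then split 70/30 into train
    and test sets. Malicious edges are flagged at random here purely
    for illustration."""
    rng = random.Random(seed)
    users = [f"U{i:03d}" for i in range(n_users)]  # alphanumeric ids
    rows = []
    for _ in range(n_edges):
        a, b = rng.sample(users, 2)                # two distinct users
        platform = rng.choice(platforms)
        malicious = rng.random() < malicious_frac  # injected edge flag
        rows.append((a, b, platform, malicious))
    rng.shuffle(rows)
    split = int(0.7 * len(rows))                   # 70% train / 30% test
    return rows[:split], rows[split:]

train, test = make_synthetic_dataset()
```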

This dataset serves as input to the proposed model for detecting malicious users in the multilayer social network. The model undergoes various steps, including preprocessing, graph construction, encoding, decoding, ranking, and clustering, to identify and rank potentially malicious users based on their interactions across different platforms.

5.1 Performance analysis of clustering

In this section, the performance of the proposed PF-KMA is analyzed against the existing KMA, Fuzzy C-means (FCM), Clustering LARge Applications (CLARA), and Farthest First Clustering (FFC) methods based on clustering time and clustering efficiency.

Table 1 presents the clustering time taken by the proposed and existing techniques for partitioning the users into groups based on similarity. When analyzing Table 1, the time taken by the existing KMA is 2291 ms more than that of the proposed method, and the clustering times of the existing FCM, CLARA, and FFC are higher by 6130 ms, 8451 ms, and 11,637 ms, respectively. Hence, the analysis concludes that avoiding the random selection of cluster centroids using PF reduced the clustering time significantly compared with the existing clustering techniques.

Table 1 Analysis of clustering time

Figure 3 analyses the clustering efficiency of the proposed PF-KMA and the existing KMA, FCM, CLARA, and FFC methods. The proposed method showed improved results over the existing clustering methods: the existing KMA, FCM, CLARA, and FFC attained clustering efficiency lower by 2.24%, 4.3%, 4.69%, and 6.88%, respectively. Hence, the analysis shows that selecting cluster centroids using PF improved the efficiency of clustering network users.

Fig. 3
figure 3

Analysis of clustering efficiency

5.2 Performance analysis of graph construction

The performance of the proposed EDER graph construction model is contrasted with the existing ER, Watts–Strogatz (WS), Barabási–Albert (BA), and Exponential Random Graph (ERG) methods with reference to the graph construction time.

Figure 4 analyses the graph construction time of the proposed and existing methods. The proposed EDER method takes much less time for graph construction: 318 ms less than ER, 594 ms less than WS, 858 ms less than BA, and 979 ms less than ERG. This indicates that selecting edges using the ED technique greatly benefited the proposed graph construction model in terms of reduced construction time.

Fig. 4
figure 4

Graph construction time analysis

5.3 Performance analysis of graph encoding

This section analyses the performance of the proposed SS-GAE method and existing GAE, Variational GAE (VGAE), Adversarial Regularized GE (ARGE), and Adversarial Regularized Variational GE (ARVGE) methods with respect to Area Under Curve (AUC), Average Precision (AP), computational complexity, loss, and Normalized Mutual Information (NMI) measure. Further, the analysis of diffusion rate and computation complexity is carried out.

Table 2 presents the AUC analysis of proposed and existing graph encoding models. From Table 2, it can be said that the AUC of the proposed method is improved by 1.85% (GAE), 2.41% (VGAE), 4.05% (ARGE), and 5.94% (ARVGE) than the existing methods. Hence, the analysis concludes that the SS technique used in the proposed method consistently improved the encoding performance of the model.

Table 2 AUC analysis

The AP attained by the proposed and existing graph encoding methods is compared in Fig. 5. In contrast to the existing methods, the AP of the proposed method is enhanced to 94.71%, which is 2.05% higher than that of the existing GAE method. Compared with the other methods as well, the proposed SS-GAE technique offers better graph-encoding performance. The analysis shows that using the proposed SS-GAE method for graph encoding is advantageous over all existing methods.

Fig. 5
figure 5

Analysis of AP

The computational complexity of the proposed SS-GAE is analyzed in comparison with the baseline autoencoder models, as shown in Table 3. Here, retrieval of the encoded data by the proposed model takes 1684 ms, which is higher than for GAE, VGAE, ARGE, and ARVGE. This shows that the graph generated by the proposed model is more complex to retrieve than those generated by the baseline techniques.

Table 3 Computation complexity analysis

Figure 6 analyses the loss, i.e., the efficiency of the model in producing the encoded output without loss of data. In Fig. 6, the loss decreases consistently across all epochs, as expected. This indicates that the proposed SS-GAE is effectively lossless and can frame the encoded output without data loss.

Fig. 6
figure 6

Loss vs. Epochs

Figure 7 evaluates the quality of the encoded outcomes by measuring the NMI score, which reflects the efficiency of the proposed model in correlating the mutual information between the user representation data and the features extracted from the graph structure. On analyzing Fig. 7, the NMI score of the model increases slightly from 0.65 to 0.85 between epochs 0 and 100 and stabilizes within 200 epochs. This shows the learning capacity of the proposed model in reconstructing the encoded data efficiently.

Fig. 7
figure 7

NMI vs. Epochs

Figure 8 analyses the diffusion rate predicted by the proposed model for a particular post in different layers of the network, indicating how information spreads in social community networks when a node from each category is selected. From Fig. 8, the communication interaction is highest in the Instagram layer compared with Facebook and Twitter.

Fig. 8
figure 8

Diffusion rate analysis

5.4 Comparative analysis

This section evaluates the performance of the proposed EDER-PFKMA-based malicious detection model against the existing works discussed in Sect. 2.

Table 4 presents the efficiency analysis of the proposed framework and recent related works. Even though Rahman et al. (2021) and Mendonça et al. (2020) used machine learning and filtering processes, they did not use anomaly-score ranking for clustering malicious users in the network; thus, lower efficiency was attained. Although Choi and Jeon (2021) utilized the attention mechanism to detect fake news in social networks, it still attained 6.27% lower efficiency than the proposed EDER-PFKMA framework. This shows that learning the graph-based network structure and clustering malicious users based on anomaly-score ranking improved the efficiency and scalability of the proposed model in detecting malicious users in multilayer social networks compared with the existing techniques.

Table 4 Comparative analysis

6 Conclusion

This paper proposed an efficient ED-ER and PFKMA based detection model for discovering malicious users in multilayer social networks. The model comprises six major steps: preprocessing, graph construction, graph encoding, graph decoding, ranking, and clustering. To address challenging issues, the model devised a graph-based learning strategy employing a time-aware attention module for gathering user representation data, SS-GAE for avoiding feature corruption, and PF-based centroid selection for reducing clustering time and improving network efficiency. In the performance analysis, the proposed ED-ER, SS-GAE, and PFKMA methods were compared with the existing methods. The proposed PF-KMA method obtained a clustering efficiency of 97.51% in clustering network users, which is higher than that of the existing methods. Hence, the analysis concludes that the proposed model is highly capable of revealing malicious users in the multilayer social network. As the method is limited to only three layers (Facebook, Instagram, and Twitter), future work will consider more layers and capture further potential anomalies for better detection performance.