Introduction

Recently, the extensive use of the internet coupled with the convenience offered by mobile devices has positioned social networks as the nexus for interpersonal connections [1]. We can now share our ideas, status, location, etc. on social networks in real time and in ways not previously available to us [2]. Their capacity to provide immediate information has made social networks significant instruments for interpersonal communication, entertainment, and marketing [3]. As the number of social network users surges [4], infiltration by malign individuals who seek benefit also rises. These individuals distribute phishing links, propagate misinformation, and sway public discourse. Hence, detecting abnormal behaviors in social networks involves recognizing and countering potential detrimental or malevolent actions, such as web-based scams, impersonation, cyber harassment, and spreading disinformation. Thus, monitoring user activities creates more secure settings for interactions, which is crucial for enhancing user satisfaction, retention rates, and trustworthiness. Moreover, detecting anomalous social behavior ensures adherence to legal stipulations related to user privacy and online security.

Detecting anomalous behavior in social networks often involves analyzing large amounts of user activity data through machine learning and pinpointing patterns that could reveal potentially harmful or malevolent actions. Resolving this issue holds profound implications in many aspects:

1) Identifying harmful content: Recognizing hate speech, extremist content, and online violence in social networks is essential for ensuring a safe communicative environment for users [5,6,7,8].

2) Fraud detection: Anomaly detection applies to various fields, including credit card fraud, insurance deception, and internal threat identification. It facilitates the identification of potentially fraudulent accounts in social networks by scrutinizing deviations in transaction patterns and user access behaviors. This allows for additional investigation or preemptive actions to secure user privacy and protect their assets [9,10,11,12,13,14,15].

3) Misinformation reduction: Applying anomaly detection to social networks is crucial for recognizing and restricting the broadcast of false or misleading information by labeling suspicious content before it spreads to an expansive circle of users [16,17,18,19,20,21]. This improves information accuracy and the reliability of social platforms. It also reduces the adverse effects of false information at personal, organizational, and societal levels.

4) Detecting irregular events: Amalgamating various anomaly detection strategies provides insight into the processes by which anomalous behavioral patterns contribute to abnormal event detection. For instance, the presence of highly provocative language or interactions that specifically target certain groups or issues may be indicators of anomalous events. These interactions can be tracked and examined to reduce their impact [22,23,24,25,26]. Social media platforms are also frequent targets of denial-of-service (DoS) attacks, in which an individual or organization attempts to disrupt services by overloading the network resources of the target system; anomaly detection allows such attacks to be detected and responded to in a timely manner [27].

In this paper, we select high-quality papers from the last decade that are directly related to detecting anomalous behaviors in social networks, covering both classic and recent results to reflect current research trends and advances. We summarize new advances and challenges in anomalous behavior detection methods in social networks by categorizing the different types of detection techniques and analyzing them to reveal open research space and potential innovations. In addition, the paper provides a theoretical basis for developing more effective detection tools and strategies to address security threats in increasingly complex and diverse network environments. The contributions of this paper are as follows:

1) This study presents a hierarchical three-layer categorization scheme, based on the properties of the underlying detection technologies and data types, to explore anomaly detection in social networks. The main advantage of this classification is that it provides a clear, hierarchical framework for research and analysis. The categorization helps to explore in depth and identify nuances and linkages between levels.

2) Existing abnormal behavior detection methods are classified into three categories: anomaly detection based on user behavior characteristics, on network topological structure, and on collaborative fusion. This classification enables the design and implementation of more specialized detection strategies for different types of anomalous behavior, while facilitating a deeper understanding and analysis of anomalous behavior.

3) Analyzing the existing technologies offers insights into the current technological status, challenges, and future opportunities.

This paper is structured as follows: “Background” provides background information regarding anomaly detection within social networks. In “Anomaly detection based on user behavior characteristics”, “Anomaly detection based on network topological structure”, and “Anomaly detection based on collaborative fusion”, we review the hierarchical three-layer categorization scheme, discussing existing studies and the various dimensions of anomaly detection technologies. “Challenges and Opportunities” describes the challenges and opportunities. Finally, in “Conclusion”, we summarize the whole article.

Background

Statistical analysis, machine learning algorithms, and data mining techniques are used for anomaly detection in social networks to perform integrated analyses of user interaction behavior, content distribution, and network structure. These methods autonomously detect abnormal patterns and potential threats. In addition, anomaly detection includes single-user actions, broader collective behaviors, and irregular phenomena manifesting across the network layer. This section explores the background knowledge of anomaly detection.

Social networks

Social networks are a type of social structure that reflects the interactions and relational dynamics between societal entities, such as friends and colleagues. The introduction of electronic mail allowed users to send messages through computer networks; subsequent developments have included social networking sites and instant messaging [28]. Social networks have undergone significant development since their inception. What started as primitive online bulletin boards and email systems has transformed into a complex digital terrain of interconnected platforms. From SixDegrees.com, the original social networking site, through the popularity of MySpace to the widespread dominance of Facebook, social networks have become an essential part of global communication. With the advent of mobile technologies, they have undergone further revolution, enabling real-time connections and catalyzing the emergence of platforms such as Instagram, Twitter, and Snapchat. Different social networks offer their users distinctive functions [29]. Recent developments include platforms like the video-sharing network TikTok, the professional networking site LinkedIn, and the specialized communities found on Reddit. Moreover, social networks have influenced how people forge connections, disseminate information, and participate online. The evolution of social networks is propelled by new functionalities, privacy considerations, and the development of innovative platforms. These networks are currently enhancing online social engagement, establishing connections among people, and fostering digital communities. Furthermore, they determine the modes of personal communication and information sharing in the digital era.

Anomalous behavior

Anomalous behavior is a complex concept for which there is currently no universally accepted definition. It is commonly understood by researchers as actions or action patterns within a network setting that are significantly noticeable and different from how most users engage with the network. These behaviors include cyberbullying, phishing, hate speech, stalking, circulating false information or rumors, and involvement in illegal acts. According to the behavioral origins within online social networks, anomalous activities can be classified into internal traffic anomalies and external intrusions:

Internal traffic anomalies

Internal traffic refers to activities occurring within the social network, such as user interactions, content uploads, and data transmission. The presence of significant internal traffic is commonplace for social networks due to user interactions with various platform features and content. For instance, users may upload photos, share posts, send messages, and watch videos. However, internal traffic that is excessive or associated with questionable actions can indicate the occurrence of fraudulent or harmful conduct. For example, a small group of users generating a disproportionately high level of internal traffic could be a harbinger of spam-related or other abusive actions. Similarly, generating a large segment of internal traffic using bots or automated scripts indicates fraudulent behavior. To identify and mitigate these activities, researchers have implemented several technological approaches, notably user log analysis and anomalous behavior detection.

User log analysis is the process of examining and interpreting the data generated from user interactions with a system, application, or website. It entails a detailed examination of user activity patterns to identify deviations from established behavioral norms. The log data is scrutinized to discern patterns and anomalies, thereby providing insights into user behavior and potential irregularities. In contrast, anomalous behavior detection involves applying machine learning algorithms to detect statistically significant deviations in behavioral patterns. This process entails analyzing data to identify irregularities that diverge from established behavioral norms and utilizing advanced algorithmic techniques. In summary, although internal traffic is a common social network aspect, disproportionate or abnormal internal traffic patterns indicate potential fraudulent or malicious activities.
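As a minimal sketch of the statistical side of this idea, the snippet below flags users whose activity volume deviates strongly from the population, using a plain z-score on per-user event counts. The function name `flag_high_traffic_users`, the threshold of 3.0, and the sample counts are illustrative assumptions and are not taken from any of the surveyed systems.

```python
from statistics import mean, pstdev

def flag_high_traffic_users(event_counts, z_threshold=3.0):
    """Return users whose event count lies more than z_threshold
    population standard deviations above the mean count."""
    counts = list(event_counts.values())
    mu = mean(counts)
    sigma = pstdev(counts)
    if sigma == 0:  # every user behaves identically, nothing stands out
        return []
    return sorted(
        user for user, count in event_counts.items()
        if (count - mu) / sigma > z_threshold
    )

# Hypothetical per-user event counts aggregated from one day of logs.
event_counts = {f"user{i}": 20 + (i % 5) for i in range(30)}
event_counts["bot_account"] = 900  # scripted account with extreme volume

flagged = flag_high_traffic_users(event_counts)
print(flagged)
```

A z-score is sensitive to the very outliers it is meant to find, so a production system would more likely rely on robust statistics or a learned model; the sketch only shows the notion of a deviation from an established norm.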

External intrusions

In the context of social networks, external behavioral anomalies (i.e., intrusion issues) involve systematic network or system surveillance to reveal signs of illicit access, abuse, or other forms of nefarious conduct. Intrusion detection systems (IDS), designed for real-time detection and response to security threats, are predominantly categorized into two types: network- and host-based IDS. Network-based IDS monitors network traffic to detect suspicious activities, such as atypical traffic patterns or recognized attack signatures. These IDSs are strategically positioned at critical junctures within the network structure, such as perimeters, to facilitate the analysis of all incoming and outgoing traffic. Furthermore, network-based IDS can be passive or active. Passive IDS only monitors network traffic and generates alerts when suspicious activity is detected. Alternatively, active IDS can enact measures to block or mitigate the identified threat. In comparison, host-based IDS is designed to monitor individual systems, scanning for intrusion signatures such as unauthorized access attempts or system file alterations. It is commonly installed on separate hosts and monitors their activities, including file access, system calls, and user activity. In addition, host-based IDS can detect intrusions that may evade network-based IDS, such as attacks from internal network sources. To identify potential security threats, IDSs utilize several detection techniques, including signature-, anomaly-based, and stateful methods. Signature-based detection involves matching network or system activities to a pre-defined database of known attack signatures, enabling the efficient identification of known threats. However, it may overlook new or unidentified threats.
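Signature-based detection can be illustrated with a small sketch that matches incoming request lines against a database of known attack patterns. The two regular expressions and the names `SIGNATURES` and `match_signatures` are hypothetical examples chosen for illustration; real IDS rule sets are far larger and use their own rule languages.

```python
import re

# Hypothetical signature database: signature name -> compiled pattern.
SIGNATURES = {
    "sql_injection": re.compile(r"\bunion\s+select\b", re.IGNORECASE),
    "path_traversal": re.compile(r"\.\./\.\./"),
}

def match_signatures(request_line):
    """Return the names of all known signatures found in the request."""
    return sorted(
        name for name, pattern in SIGNATURES.items()
        if pattern.search(request_line)
    )

print(match_signatures("GET /search?q=1 UNION SELECT password FROM users"))
print(match_signatures("GET /static/../../etc/passwd"))
print(match_signatures("GET /profile/alice"))
```

The last call returns an empty list, which also shows the limitation noted above: any request that does not match a stored pattern passes unnoticed, whether it is benign or a previously unseen attack.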

Anomaly detection

Social network anomaly detection is the process of identifying atypical or suspicious behavioral patterns within social networks. This process primarily focuses on discerning and highlighting behaviors that significantly diverge from the normal patterns of the user majority. These behaviors often raise concerns and are indicators of potentially fraudulent and malicious activities. Anomaly detection within social networks is essential for ensuring user safety, preserving user data integrity, and safeguarding privacy. By integrating data analytics, machine learning algorithms, and statistical models, anomaly detection frameworks can identify unconventional behaviors, patterns, or events on social media platforms. These platforms employ these models to reduce risks and provide a secure and trustworthy social environment for users.

Significant progress has been made in the anomalous behavior detection field. Advanced algorithms using machine and deep learning, natural language processing, and graph neural networks have been developed for detecting anomalous behavior patterns in social networks. These algorithms have proven their accuracy in identifying various harmful behaviors. Additionally, some researchers have developed real-time anomaly detection systems that can intervene and respond to potential risks promptly. Operating in real-time is critical for effectively handling scenarios that require urgent attention and restricting detrimental activities. In addition, some researchers are exploring the inclusion of multimodal analysis in anomalous behavior detection. This method, which integrates images, videos, and user interactions, aims to provide a more exhaustive analysis of user behavior. Incorporating diverse data sources and patterns markedly improves anomaly detection accuracy and efficiency. Collaboration among researchers, social media entities, law enforcement, and policymakers is also essential for enhancing anomalous behavior detection. This joint effort, leveraging diverse expertise, aids in continuously refining detection technologies and harmonizing approaches across different platforms. Furthermore, user feedback systems and reporting tools empower users to actively engage in anomalous behavior detection by reporting suspicious activities, thereby aiding in the recognition and analysis of emerging threats. Following the successful identification of anomalies, early interventions and preventive actions are crucial for mitigating potential risks and enabling effective and timely responses to harmful behaviors. Anomalous behavior detection is continuously progressing, driven by advancements aimed at enhancing detection capabilities. The overarching objective is to cultivate social networks that offer increased safety and inclusivity for the global user base.

In social networks, detecting anomalous behaviors is a research topic of widespread importance, especially in vital sectors such as fraud detection [30,31,32], sentiment analysis [33, 34], and early warning systems. Hence, many scholars have conducted extensive research on this issue. This paper divides the existing anomaly detection approaches into three primary categories, reflecting the characteristics of the detection technologies and the diversity of the data. These categories are methods based on user behavior characteristics, network topological structure, and collaborative fusion.

These three methods have distinct technical characteristics, advantages, and disadvantages. First, the method based on user behavior characteristics is suitable for scenarios with a large amount of text data, such as false information dissemination, spam detection, and cyberbullying. It offers high accuracy and low computational complexity but has poor generalizability. Second, the approach based on analyzing the network’s topological structure excels in settings with minimal noise and substantial user interaction data. Its strengths include high accuracy, adequate scaling capacity, and interpretation clarity. However, its effectiveness is diminished by a lack of contextual understanding and a susceptibility to noise. Finally, the collaborative method based on the first two approaches captures and analyzes anomalous behavior from different perspectives, offering advantages such as high accuracy, scalability, interpretability, and adequate generalizability. Yet, reducing the computational burden and noise are pressing challenges that should be addressed in future studies. This paper presents an overview of these three categories, covering their key concepts and typical methodologies. It evaluates their strengths and limitations, discussing the prospective research avenues in anomalous behavior detection.

Anomaly detection based on user behavior characteristics

Related concepts

User behavior within social networks encompasses the array of activities undertaken by users during their interactions with the platform. These include communicative actions such as publishing textual content and posting comments. The anomaly detection method based on user behavior extracts statistical features from various data sources, utilizing them to develop machine learning models that can identify anomalous behaviors [35]. With this method, features are extracted from the users’ textual posts and related information to distinguish anomalous behaviors from standard ones in social networks. It is based on feature formulation, converting various observed data into a set of feature vectors, and using them on pre-existing models or creating new classifiers, as demonstrated in Fig. 1. Starting from data collection, the data is subjected to data preprocessing and structuring for better analysis. Subsequently, features are selected from the collated data and used to train machine learning models. Eventually, the anomalies in the behavioral patterns are depicted. Employing feature-based strategies can substantially enhance model performance and interpretability. This article offers a summary focusing on two primary aspects: features from textual information and interactive behaviors. The following are related concepts of user behavior characteristics:

1) User text information: User text information refers to the range of text content disseminated by users, including articles and comments. The text is typically structured as a linear sequence of distinct symbols. In the case of Chinese, these symbols comprise Chinese characters, punctuation, and additional elements like numerals, pinyin, and mathematical operators [36]. These symbols, when arranged in sequences, display semantic significance. Thus, analyzing this semantic information facilitates a deeper understanding of the users’ cognitive and behavioral patterns, including thoughts and interactions. In addition, analyzing the textual content created by users is crucial in uncovering anomalous behaviors, predicting user inclinations, and comprehending the broad dynamics characterizing social networks. The findings from this analytical process are vital for an in-depth understanding of user conduct.

2) User interaction information: User interaction information on social media platforms is defined by how users interact and communicate with one another. This involves various activities such as content posting and responses, liking, sharing, following other users, direct message exchanges, joining groups or communities, and participating in discussions and interactive engagements within the network. By analyzing user interaction patterns and employing social network analysis techniques, graph theory, and statistical modeling, meaningful information can be extracted from user interaction data. This provides deeper insight into the structure, dynamics, and user behaviors in social networks.

3) Feature extraction: Feature extraction is a critical process in machine learning and pattern recognition that transforms raw data into a compact and information-rich representation. Moreover, essential traits required for future analysis and modeling are obtained through feature extraction. The features become the cornerstone for anomaly detection, enhancing the detection and forecasting precision.
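To make the feature extraction step concrete, the following minimal sketch converts a raw post into a fixed-length feature vector that a downstream classifier could consume. The five hand-crafted features and the names `extract_text_features` and `FEATURE_NAMES` are illustrative assumptions; the surveyed studies use richer feature sets and often learned embeddings.

```python
import re

URL_RE = re.compile(r"https?://\S+")
FEATURE_NAMES = ["n_tokens", "n_urls", "upper_ratio", "n_exclaim", "type_token_ratio"]

def extract_text_features(post):
    """Map a raw post to a dictionary of simple statistical features."""
    tokens = re.findall(r"\w+", post.lower())
    letters = [ch for ch in post if ch.isalpha()]
    return {
        "n_tokens": len(tokens),
        "n_urls": len(URL_RE.findall(post)),
        # Share of alphabetic characters written in upper case.
        "upper_ratio": sum(ch.isupper() for ch in letters) / len(letters) if letters else 0.0,
        "n_exclaim": post.count("!"),
        # Vocabulary diversity: distinct tokens divided by all tokens.
        "type_token_ratio": len(set(tokens)) / len(tokens) if tokens else 0.0,
    }

post = "WIN a FREE phone!!! Click http://example.test/win now"
features = extract_text_features(post)
vector = [features[name] for name in FEATURE_NAMES]  # fixed feature order
print(vector)
```

Keeping the feature order in one list makes the vectors of different posts comparable column by column, which is what a classifier trained on them requires.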

Fig. 1 Anomaly detection based on user behavior characteristics

Typical method analysis

Feature selection based on user textual information

Feature selection based on user textual information mainly involves extracting statistical features from the text content published by users. This process classifies and arranges the textual content or tags in order [37] according to the information’s subject, enabling textual feature analysis for unusual social behavior detection [38]. The most widely used methods for feature selection include natural language processing, data mining, and dimensionality reduction. These methods are important for detecting cyberbullying, hate speech, misinformation dissemination, and spam [39]. Hao et al. utilized user text information to investigate anomalous behaviors [33]. They defined four abnormality types based on data mining and information extraction: behavior displaying aggression, behavior resulting in injury, arresting action, and behavior with fatal consequences. In addition, support vector machines (SVM) and trigger words were used to filter sentences describing abnormal behaviors from massive data and to extract high-dimensional features. Using the extracted features, a behavioral co-construction network was then designed. On this basis, they proposed an anomalous behavior detection method for online public opinion data.

Analyzing user posts makes it possible to detect false information in social networks. To address the problem of missing texts in the massive user-published text data, Mu et al. constructed an attention self-encoder from existing context vector data descriptions (CVDD) [40]. They adjusted the encoder structure and added a decoder with a multi-head attention mechanism to capture the text distribution. In addition, various dimensional transformation matrices were employed to learn dimensional attribute information. Also, a central clustering loss function was designed to update the attention encoder’s data and text anomaly detection module. These functions were used to fuse normal text embeddings with the embedding space, enabling anomaly detection from normal sentences. Moreover, Al et al. suggested analyzing user-generated content and profile information for anomaly detection research in social networks, using data mining techniques for feature extraction from user text data [41]. The study employed the latent Dirichlet allocation (LDA) model for text data mining, formulating a probability-based method for generating explicit features while utilizing the collapsed Gibbs sampling technique to reveal hidden parameters within text features. The paper also ranked text-related feature themes to establish an appropriate data analysis model and construct an integrated social media content analysis platform. The platform was designed to detect large-scale malicious activities within social networks and predict future extensive malicious actions.
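The collapsed Gibbs sampling procedure mentioned above can be sketched in a few lines. The code below is a generic textbook-style sampler for LDA on documents encoded as lists of word ids; it is not the implementation used in [41], and the function name `lda_gibbs`, the hyperparameter defaults, and the toy corpus are assumptions made for illustration.

```python
import random

def lda_gibbs(docs, n_topics, vocab_size, alpha=0.1, beta=0.01, n_iter=200, seed=0):
    """Collapsed Gibbs sampling for LDA; docs are lists of integer word ids."""
    rng = random.Random(seed)
    doc_topic = [[0] * n_topics for _ in docs]                # n_{d,k}
    topic_word = [[0] * vocab_size for _ in range(n_topics)]  # n_{k,w}
    topic_total = [0] * n_topics                              # n_k
    assignments = []

    # Give every token a random initial topic and record the counts.
    for d, doc in enumerate(docs):
        z_doc = []
        for w in doc:
            z = rng.randrange(n_topics)
            z_doc.append(z)
            doc_topic[d][z] += 1
            topic_word[z][w] += 1
            topic_total[z] += 1
        assignments.append(z_doc)

    for _ in range(n_iter):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                z = assignments[d][i]
                # Remove the token's current assignment from all counts.
                doc_topic[d][z] -= 1
                topic_word[z][w] -= 1
                topic_total[z] -= 1
                # p(z = k | rest) is proportional to
                # (n_{d,k} + alpha) * (n_{k,w} + beta) / (n_k + V * beta).
                weights = [
                    (doc_topic[d][k] + alpha)
                    * (topic_word[k][w] + beta)
                    / (topic_total[k] + vocab_size * beta)
                    for k in range(n_topics)
                ]
                z = rng.choices(range(n_topics), weights=weights)[0]
                assignments[d][i] = z
                doc_topic[d][z] += 1
                topic_word[z][w] += 1
                topic_total[z] += 1
    return doc_topic, topic_word

# Toy corpus: word ids 0-2 and 3-4 tend to occur in different documents.
docs = [[0, 1, 0, 1, 2], [3, 4, 3, 4, 4], [0, 1, 1, 0], [4, 3, 3]]
doc_topic, topic_word = lda_gibbs(docs, n_topics=2, vocab_size=5)
print(doc_topic)
```

The returned count matrices can be normalized (after adding `alpha` and `beta`) to obtain per-document topic proportions and per-topic word distributions, which then serve as text features.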

Furthermore, for anomaly detection in text information, researchers have only collected a few key datasets for feature extraction. This limits the ability to comprehensively extract and detect text information. Qasim et al. utilized three levels of features, including user-generated content, social graph connections, and user profile activities, to analyze and detect anomalous behaviors that deviate significantly from the norm in large-scale social networks [42]. Drif et al. introduced a method for computing the similarity between extracted entities based on designated keywords and web page titles, aiming to reduce false positives [43]. Existing feature selection research for text data cannot comprehensively extract the core attributes of the texts published by users [44, 45]. Despite the widespread linguistic feature research for facilitating common natural language processing endeavors, such as text classification and clustering, the essential anomaly detection features in text-based contexts have not been fully determined [46]. Notably, word embedding approaches and deep neural networks generate excellent feature representations, improving feature extraction accuracy [42].

A primary benefit of using text information for anomaly detection lies in its scalability, automation potential, and ability to detect subtle behaviors that may not be apparent to human evaluators. In this field, many challenges persist, including interpreting complex and nuanced language, resolving issues posed by cultural and linguistic diversity, and achieving an equilibrium between privacy protection and anomaly detection.

Feature selection based on interactive behaviors

User interaction behaviors on social networks represent the various activities generated when users engage with a social networking platform [47], including comments on other users [48]. Feature selection based on interactive behaviors focuses on discerning the most pertinent features or variables from an array of behaviors that users exhibit while interacting with a system or social media platform. This method includes analyzing user interaction data, such as post publishing frequency, number of likes received, comment quantity, and discussion topic diversity. This helps to pinpoint the features that have the strongest correlation with unusual user behaviors.

Clustering analysis presents a challenge due to the heterogeneous nature of user interactions in social networks. Aljably et al. addressed this by defining behavioral features that are indicative of anomalies, including post frequency, timing, the number of likes and comments, and topic diversity [49]. For post-data alignment, users were grouped using clustering techniques based on these features. Then the models were trained to detect abnormal behaviors. A combined clustering and classification approach was employed to effectively detect users who exhibit anomalous behavior patterns.
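The clustering step of such a pipeline can be sketched as follows. This is a plain k-means with a deterministic farthest-point initialization, followed by a rule that flags users who fall into very small clusters; it is not the method of [49], and the two features (posts per day, likes per post), the `max_share` threshold, and the sample users are hypothetical.

```python
import math
from collections import Counter

def kmeans(points, k, n_iter=20):
    """Plain k-means with farthest-point initialization (deterministic)."""
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(
            max(points, key=lambda p: min(math.dist(p, c) for c in centroids))
        )
    labels = [0] * len(points)
    for _ in range(n_iter):
        # Assignment step: each point joins its nearest centroid.
        labels = [
            min(range(k), key=lambda j: math.dist(p, centroids[j]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, label in zip(points, labels) if label == j]
            if members:  # keep the old centroid if a cluster is empty
                centroids[j] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels, centroids

def flag_small_clusters(user_ids, labels, max_share=0.25):
    """Flag users whose cluster holds less than max_share of all users."""
    sizes = Counter(labels)
    return sorted(
        u for u, label in zip(user_ids, labels)
        if sizes[label] / len(labels) < max_share
    )

# Hypothetical features per user: (posts per day, likes received per post).
users = {
    "u1": (4.0, 10.0), "u2": (5.0, 12.0), "u3": (6.0, 9.0), "u4": (5.0, 11.0),
    "u5": (4.0, 13.0), "u6": (6.0, 10.0), "u7": (5.0, 8.0), "u8": (3.0, 11.0),
    "bot1": (200.0, 0.0), "bot2": (210.0, 1.0),
}
labels, centroids = kmeans(list(users.values()), k=2)
flagged = flag_small_clusters(list(users), labels)
print(flagged)
```

In practice the features would be scaled before clustering, and the cluster labels would feed a trained classifier rather than a fixed size rule.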

While analyzing individual user interaction data, privacy protection issues inevitably arise. To tackle this, Persia et al. introduced a framework that uses a modified interactive structured query language (ISQL) algorithm for detecting advanced events on social networks [25]. First, this model applies local differential privacy (LDP) to create protected synthetic data replicas, including user data cleansing and reconstructed experimental data categorization. Then, it extends a sequential pattern mining technique (i.e., the ISQL algorithm) within the data sequences to synthesize social network data, revealing frequent patterns in events or activities for anomaly detection. This assists with predicting trends or major changes in abnormal user behavior. However, the current anomaly prediction methodologies overlook aggressive behaviors and fail to merge user interaction behaviors with anomalous content, limiting our understanding of the true intentions behind the dissemination of false information in social networks [50].
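Local differential privacy can be illustrated with randomized response, a classic LDP mechanism for a single binary attribute. The sketch below is not the framework of [25]; it only shows how each user can perturb a private bit locally while the aggregate rate remains estimable. The function names, the privacy budget of 1.0, and the synthetic population are assumptions.

```python
import math
import random

def truth_probability(epsilon):
    """Probability of reporting the true bit under epsilon-LDP (epsilon > 0)."""
    return math.exp(epsilon) / (math.exp(epsilon) + 1.0)

def randomized_response(bit, epsilon, rng):
    """Report the true bit with probability p, otherwise the flipped bit."""
    return bit if rng.random() < truth_probability(epsilon) else 1 - bit

def estimate_true_rate(reports, epsilon):
    """Unbiased estimate of the share of users whose true bit is 1.

    E[observed] = rate * p + (1 - rate) * (1 - p), solved for rate.
    """
    p = truth_probability(epsilon)
    observed = sum(reports) / len(reports)
    return (observed - (1.0 - p)) / (2.0 * p - 1.0)

rng = random.Random(42)
# Hypothetical population: 30% of users performed some sensitive action.
true_bits = [1] * 6000 + [0] * 14000
reports = [randomized_response(b, 1.0, rng) for b in true_bits]
estimate = estimate_true_rate(reports, 1.0)
print(round(estimate, 3))
```

No individual report reveals a user's true value with certainty, yet the aggregate estimate stays close to the true rate; a smaller `epsilon` gives stronger privacy at the cost of a noisier estimate.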

Utilizing user behavior characteristics for anomaly detection is advantageous due to its deep exploration of essential data attributes and precise tracking of minor behavior changes. This is largely based on real user activities, thereby improving detection accuracy and reliability. However, the disadvantages include the massive user behavior data requirement, reduced model efficacy with noisy or incomplete data, dependency on predefined behavioral features, and substantial computational and storage demands. It may also trigger privacy and data protection issues.

These methods are comparatively analyzed in Table 1. The table summarizes research on anomaly detection based on user behavioral features, showing the core algorithms, datasets used, and evaluation criteria for each study. Most researchers adopt Accuracy (ACC), Recall, Precision, and F1-score as the key metrics for model performance evaluation. Accuracy is the overall proportion of correctly classified samples; Recall measures the model’s ability to capture the samples that are actually positive; Precision is the proportion of samples predicted as positive that are actually positive; and F1-score balances Precision and Recall, which makes it particularly suitable for datasets with an uneven category distribution, where it gives a more comprehensive assessment of model performance and avoids bias towards any one category. In addition, some researchers have included APs, FPs, PR curves, ROC curves, AUC, etc. in the evaluation metrics to demonstrate model usability.
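The four core metrics follow directly from the confusion-matrix counts, as the short sketch below shows for binary labels (1 = anomalous, 0 = normal). The function name `classification_metrics` and the example label vectors are illustrative assumptions.

```python
def classification_metrics(y_true, y_pred):
    """Compute accuracy, precision, recall, and F1 for binary labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }

# Balanced-looking example: 2 true positives, 1 miss, 1 false alarm.
metrics = classification_metrics(
    [1, 1, 1, 0, 0, 0, 0, 0, 0, 0],
    [1, 1, 0, 1, 0, 0, 0, 0, 0, 0],
)

# Imbalanced example: a model that never predicts "anomalous".
trivial = classification_metrics([1] + [0] * 99, [0] * 100)
print(metrics, trivial)
```

The second case has an accuracy of 0.99 but a recall and F1-score of 0, which illustrates why accuracy alone is misleading when anomalies are rare.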

Table 1 Anomaly detection based on user behavior characteristics

Anomaly detection based on network topological structure

Related concepts

Malicious users disguise their account details to assimilate with genuine ones while engaging in progressively surreptitious behaviors. Consequently, the conventional methodology of deriving features from textual data to construct classifiers cannot precisely identify these covert anomalies [51, 52]. Thus, the anomaly detection method based on the network’s topological structure utilizes its characteristics to detect anomalies. Network topology is the connection arrangement among nodes or entities. Anomalous behavior is likely to produce topologies that deviate from conventional patterns, which can be observed through irregular linkages, clustering phenomena, or atypical centrality measures.

Anomaly detection through network topology focuses on detecting irregular nodes, edges, or subgraphs within graph-based data structures [28]. Social networks are modeled as graphs \(G=(V,E)\), where \(V\) is the set of users and \(E\subseteq \{\{u,v\}\mid u,v\in V\}\) is the set of edges, so that each edge \(e\in E\) is an unordered pair \(e=\{u,v\}\) of users. The network topology feature extraction process is shown in Fig. 2. This figure shows how information is extracted from a graph database, how features are extracted via a graph convolutional network, and how vector representations are generated and ultimately used for anomalous behavior detection. The following concepts are related to anomalies based on network topology:

1) Node anomalies: When monitoring for unusual behavior in social networks, users whose conduct substantially strays from the network’s standard behavioral patterns are typically identified as anomalies. Anomalous individuals send spam or perpetrate other harmful activities that are detrimental to other network users.

2) Group anomalies: Group anomaly detection in graph-structured data involves identifying subgraphs where the node interaction patterns are notably chaotic or abnormal compared to the broader network. This includes organized groups in social networks that engage in paid-poster (“water army”) campaigns or collaborative fraud schemes.

3) Interaction anomalies: User interaction anomalies within networks can be detected by considering certain edges in the graph data as irregular. An edge is likely anomalous if it has a low probability of appearance or its weight changes significantly over time. The frequency of these edges can correspond to interaction levels among users in the network. For instance, observable shifts, such as changes in communication frequency or notable differences in interaction patterns compared to usual behaviors, are indicative of anomalies in social networks.
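The interaction-anomaly idea of an edge whose weight changes sharply over time can be sketched by comparing edge weights between two time windows. The thresholds `min_ratio` and `min_count`, the helper `edge`, and the interaction counts below are hypothetical values chosen for illustration.

```python
def edge(u, v):
    """Undirected edge {u, v} represented as a hashable frozenset."""
    return frozenset((u, v))

def flag_edge_changes(prev_weights, curr_weights, min_ratio=5.0, min_count=10):
    """Flag edges whose interaction count is at least min_count and has
    grown by a factor of min_ratio or more since the previous window.
    Edges absent from the previous window are compared against 1."""
    flagged = []
    for e, curr in curr_weights.items():
        prev = prev_weights.get(e, 0)
        if curr >= min_count and curr >= min_ratio * max(prev, 1):
            flagged.append(tuple(sorted(e)))
    return sorted(flagged)

# Hypothetical message counts per user pair in two consecutive weeks.
last_week = {edge("a", "b"): 4, edge("b", "c"): 3, edge("a", "c"): 2}
this_week = {
    edge("a", "b"): 5,    # normal fluctuation
    edge("b", "c"): 40,   # sudden burst on an existing edge
    edge("c", "d"): 25,   # heavy traffic on a previously unseen edge
    edge("a", "c"): 2,
}
flagged_edges = flag_edge_changes(last_week, this_week)
print(flagged_edges)
```

A fixed ratio is a crude stand-in for the probabilistic edge models used in the literature, but it captures both cases named above: unlikely new edges and existing edges whose weight shifts markedly.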

Fig. 2 Network topology feature extraction

Typical method analysis

Node anomalies

Social network analyses that employ network topological structure regard “points” as nodes or vertices, each representing a single user or entity. Node anomalies are identified when a node’s behavior noticeably diverges from the normal behavioral patterns of the majority. These anomalies are flagged when a user or node shows distinctive behaviors compared to others, signaling a point anomaly. These include unique user behaviors, such as generating unusually large amounts of internal traffic, spamming, or exhibiting abnormal behavioral patterns [35, 53, 54].
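As a rough illustration of a point anomaly, the sketch below flags nodes whose degree lies far from the mean degree, using a z-score. This is a generic baseline under stated assumptions, not a method from the cited works; the function name `degree_outliers`, the threshold of 2.0, and the degree values are all hypothetical.

```python
from statistics import mean, pstdev


def degree_outliers(degrees, threshold=2.0):
    """Return nodes whose degree z-score exceeds `threshold` in absolute value."""
    values = list(degrees.values())
    mu = mean(values)
    sigma = pstdev(values)  # population standard deviation
    if sigma == 0:
        return []  # all degrees identical: nothing stands out
    return sorted(n for n, d in degrees.items() if abs(d - mu) / sigma > threshold)


# Hypothetical degrees: u8 contacts far more users than anyone else.
degrees = {"u1": 3, "u2": 4, "u3": 2, "u4": 3, "u5": 4, "u6": 3, "u7": 2, "u8": 40}
flagged = degree_outliers(degrees)
```

A single global statistic like this ignores context, so a legitimately popular account would also be flagged. That is one reason the methods discussed below combine structural features with additional evidence.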

The lack of transparency during the data collection phase of analyzing real social network data can lead to inaccurate results. Therefore, Liu et al. proposed a detection method that uses blockchain and smart contract technology to isolate anomalous behaviors at the level of individual points [55]. The smart contracts are used to define anomalous behavior detection rules. Behaviors that match these rules are identified, tagged, and stored in the anomaly chain, and they automatically trigger alerts. The authors also highlighted the benefits of using blockchain and smart contract technology to improve the transparency and accountability of anomaly detection systems, such as ensuring that the data is not tampered with and providing a clear audit trail of the detection process. In addition, the study addressed the limitations of some traditional approaches.

Jia et al. explored the use of knowledge graph representations for detecting point anomalies in structured data within social networks [56]. This approach utilizes clustering algorithms to assemble nodes with similar embeddings into groups and examines the relationship patterns among these groups. Nodes that significantly stray from these relational patterns are identified using outlier detection techniques and are classified as anomalies. The paper emphasized the efficacy of knowledge graphs in enhancing anomaly detection precision and introduced a novel anomaly detection methodology for structured data encapsulated by knowledge graphs.

This method bases its detection on differences in data patterns and lacks contextual consideration. It is restricted to identifying conventional deviations in group or systemic behaviors and struggles with novel anomalous behavior types. Moreover, it calculates pairwise distances between data points, leading to high computational complexity. To address these challenges, Han et al. presented an incremental parallel algorithm that combines graph partitioning, feature extraction, and anomaly detection to effectively spot anomalies in graph nodes and edges [57]. Notably, the work emphasizes the algorithm’s parallelization to boost its scalability and efficiency in handling expansive and dynamic graphs. During the initial stage, the algorithm partitions the graph using a sliding window approach and simultaneously identifies standard and anomalous structures within various subgraphs.

The limitations of node anomaly detection algorithms lie in their sensitivity to the selected local neighborhood’s size. Small neighborhood sizes may overlook important network structures essential to anomaly detection, while overly large neighborhoods may include irrelevant or noisy data, thereby compromising the algorithm’s anomaly identification accuracy. Additionally, even nodes without genuine anomalies may display anomalous statistical patterns due to random fluctuations or measurement errors, potentially leading to false detections. To address the high time complexity of individual node analysis in social networks, researchers often resort to approaches such as clustering for group-based anomaly detection in local network areas, enhancing the overall detection efficiency.

Group anomalies

Group anomalies in social networks describe situations where the behavior of a user group deviates from standard or anticipated behavioral patterns. Techniques like clustering and centrality analysis are employed to identify such anomalies. In social networks, groups can be formed based on different standards, including shared interests, joint affiliations, or similar behavior patterns. Anomalies involving a group of users could indicate uncommon collective behaviors [22], such as substantial internal communication, spamming, or displaying behavioral patterns that differ from the norm.

Zhu et al. presented a novel clustering methodology utilizing anomaly density search for detecting multi-group coordinated fraudsters within social media [32]. Network embedding is a technique for representing nodes and edges in a network as low-dimensional vectors, which can be used for various tasks such as node classification and link prediction. In this study, a bipartite network embedding approach was adopted to model the nodes and edges within the social networks. Then, an anomaly density search was used to detect node clusters that demonstrate unusual behavioral patterns that may be indicative of fraudulent activities. Future research should focus on improving the security of social media platforms to protect users from scams or other fraudulent activities.
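The intuition that coordinated groups form unusually dense regions can be sketched as follows. This is a simplified stand-in and not the anomaly density search of Zhu et al.: it assumes candidate groups are already given and simply compares each group's edge density with the density of the whole graph. The names `density` and `suspicious_groups`, the `ratio` parameter, and the toy data are hypothetical.

```python
def density(nodes, edges):
    """Edge density of the subgraph induced by `nodes` (undirected, no self-loops)."""
    nodes = set(nodes)
    n = len(nodes)
    if n < 2:
        return 0.0
    inside = {
        frozenset((u, v))
        for u, v in edges
        if u in nodes and v in nodes and u != v
    }
    return len(inside) / (n * (n - 1) / 2)


def suspicious_groups(groups, edges, ratio=2.0):
    """Return names of groups at least `ratio` times denser than the whole graph."""
    all_nodes = {n for e in edges for n in e}
    base = density(all_nodes, edges)
    if base == 0:
        return []
    return [name for name, members in groups.items()
            if density(members, edges) >= ratio * base]


# Toy data: f1..f4 are fully connected; u1..u4 form a sparse chain.
edges = [
    ("f1", "f2"), ("f1", "f3"), ("f1", "f4"),
    ("f2", "f3"), ("f2", "f4"), ("f3", "f4"),
    ("u1", "u2"), ("u2", "u3"), ("u3", "u4"), ("u4", "f1"),
]
groups = {"ring": ["f1", "f2", "f3", "f4"], "regular": ["u1", "u2", "u3", "u4"]}
flagged = suspicious_groups(groups, edges)
```

In practice the candidate groups would come from a clustering step over learned embeddings rather than being supplied by hand, and density alone would not separate fraud rings from tightly knit legitimate communities.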

Moreover, Hc et al. introduced a technique that employs big data analytics to identify anomalies across multiple communities within social networks [58]. The authors proposed utilizing big data technologies, such as distributed computing and parallel processing, to address the limitations of traditional anomaly detection methods in social networks, particularly regarding large-scale data handling and complex user relationship analysis. This solution focuses on large dataset management and multi-community anomaly detection, incorporating graph partitioning, feature extraction, and anomaly detection. This method’s anomaly detection efficiency was validated using several real-world datasets, proving its superior performance in multi-community anomaly detection compared to leading techniques. The results demonstrated the method’s capacity to process extensive datasets and complex user dynamics in social networks, improving anomaly detection accuracy. A notable advantage of this approach is its proficiency in managing large datasets and intricate user relationships in social networks by integrating big data technology. It further enhances the detection precision by including multiple communities.

Node centrality analysis within network topologies is another widely used technique. Wu et al. proposed using trust as a metric for assessing user similarities, contributing to the prevention of anomalous behaviors [59]. Additionally, Wu et al. proposed the collaborative public opinion fraud detection (CPOFD) method, which employs a distinctive metric known as “contrastive suspicion” [60]. This metric accentuates the dynamic differences between fraudsters and normal users, focusing on variations in network topology, temporal peak occurrences, and ranking deviations. By applying a density-based subgraph clustering algorithm and decision tree classification, the method effectively segments users in social networks. Furthermore, classifying these clusters improved the simulated annealing pruning technique, facilitating a more efficient search for near-optimal solutions. This method mainly focuses on utilizing topological connections to detect groups engaged in fraudulent activities more thoroughly.

The methods outlined above enable early anomalous behavior detection and reveal undisclosed patterns within social networks, providing novel insights into the network’s underlying dynamics and structures. These approaches find extensive use in various sectors, including network security, fraudulent activity detection, water army (paid poster) identification, and social network analysis. However, they are less effective when dealing with large, dynamic networks and do not support the continuous analysis of user behavior within social networks. To overcome these challenges, researchers have reframed user interactions as link-related issues within network topology and have focused on analyzing these interactions accordingly.

Interaction anomalies

Within social networks, user interaction includes actions such as liking, commenting, sharing, and messaging. User interaction-related anomalies may include instances where the behavioral patterns significantly deviate from the usual or expected interaction norms within the network. These instances may include an abrupt rise in likes or comments on a post, or a sudden increase in the frequency of messages shared among users. In network topology, these user interactions are represented as linkages among nodes, where the edges are regarded as their embodiment. Consequently, anomalies in node connections correspond to anomalies in user interactions.
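A minimal sketch of this edge-level view, assuming interactions are aggregated into weighted edges per time window: an edge is flagged when its weight grows sharply between two consecutive snapshots or when it appears for the first time with a high weight. The function name `edge_changes`, the parameters `factor` and `min_weight`, and the snapshot data are hypothetical choices for illustration.

```python
def edge_changes(prev, curr, factor=3.0, min_weight=5):
    """Flag edges whose weight grew at least `factor`-fold between snapshots,
    or that newly appear with weight >= `min_weight`."""
    flagged = []
    for edge, weight in curr.items():
        old = prev.get(edge, 0)
        if old == 0:
            # Edge did not exist before: only flag if it starts out heavy.
            if weight >= min_weight:
                flagged.append(edge)
        elif weight / old >= factor:
            flagged.append(edge)
    return sorted(flagged)


# Interaction counts (e.g., messages) per user pair in two consecutive windows.
prev = {("a", "b"): 4, ("b", "c"): 2, ("c", "d"): 5}
curr = {("a", "b"): 5, ("b", "c"): 30, ("c", "d"): 4, ("e", "a"): 12, ("d", "e"): 1}
flagged = edge_changes(prev, curr)
```

Fixed thresholds like these are brittle in networks whose overall activity level drifts, which motivates the learned, time-aware models discussed later in this section.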

The similarity measurement technique between connections is employed in the initial stages of anomaly examination in network topology linkages. Jin et al. described a novel approach for detecting anomalous user behaviors in social networks using graph similarity [48]. This method begins by generating node embeddings via a graph embedding algorithm, capturing the structural nuances of the social network graph. Subsequently, similarity measurements are used to compare a user’s social network graph with those of regular users, thereby detecting anomalous interactions. Moreover, various graph similarity metrics, such as Jaccard or cosine similarity, can be utilized for these comparisons. Deepak et al. detailed the development of a novel algorithm, the enhanced graph-based supervised learning algorithm (EGSLA), which was designed to identify false information by measuring similarity using pivotal features extracted from prominent entities in weighted graphs [61].
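The two similarity metrics named above can be written down directly. The sketch assumes that Jaccard similarity is applied to neighbor sets and cosine similarity to embedding vectors; the example users and vectors are hypothetical, and the code does not reproduce the pipeline of Jin et al. or EGSLA.

```python
from math import sqrt


def jaccard(a, b):
    """Jaccard similarity of two neighbor sets: |A ∩ B| / |A ∪ B|."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cosine(x, y):
    """Cosine similarity of two equal-length embedding vectors."""
    dot = sum(p * q for p, q in zip(x, y))
    norm_x = sqrt(sum(p * p for p in x))
    norm_y = sqrt(sum(q * q for q in y))
    if norm_x == 0 or norm_y == 0:
        return 0.0
    return dot / (norm_x * norm_y)


# Hypothetical neighbor sets of two users and two toy embeddings.
alice_neighbors = {"b", "c", "d"}
bob_neighbors = {"c", "d", "e"}
neighbor_sim = jaccard(alice_neighbors, bob_neighbors)
embedding_sim = cosine([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])
```

A user whose similarity to every regular user falls below a chosen cutoff would be treated as a candidate anomaly. The cutoff itself is a tunable parameter that is not specified here.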

While similarity measurement-based methods are effective for pairwise comparisons between individual nodes or links, they fail to provide a holistic, global overview of the network’s structure and dynamics. Lin et al. addressed network dynamism by creating a probabilistic graph model that represents the behavior and interaction sequences of source and target users within a dynamic interaction network [31]. The nodes in this graph symbolize individual entities, while the edges reflect their interactions. The study utilizes a temporal graph convolutional network (TGCN) to produce embeddings for these nodes by combining graph embedding and anomaly detection. Additionally, a reconstruction loss function is employed to characterize the structural and relational aspects of user behavior, thereby capturing the intricate local and global interaction patterns among users.

Zheng et al. emphasized the significance of time as a distinct parameter in graph modeling [62]. The study harnesses the capabilities of attention-based temporal graph convolutional networks (TGCNs) combined with a reconstruction loss function to detect node and edge anomalies within graphs. This dual approach identifies stable and irregular patterns in dynamic graph environments. Furthermore, the paper proposes a versatile and comprehensive end-to-end framework for anomalous edge detection, catering to various graph-based anomaly detection scenarios. Addressing the data gap challenge in node linkages during end-to-end anomaly detection, Zheng et al. utilized knowledge graphs to fill in the missing connections between nodes [63]. The research introduced “RegPattern2Vec,” a novel method for link prediction within knowledge graphs. It is also an unsupervised learning framework for temporal node behavior modeling and instantaneous anomaly detection. This approach utilizes regular expression patterns to identify significant subgraphs within knowledge graphs. Then, it uses graph embedding algorithms to produce node embeddings that encapsulate the subgraph’s structural and relational details. Finally, these embeddings are used to predict missing links among entities in the knowledge graph.

Anomaly detection based on network topology focuses on analyzing individual-to-individual relationships and the overarching network structure to identify possible anomalous patterns in community structures. This method’s strength lies in its acute sensitivity to group anomalies in large-scale social networks. It can also uncover intricate attack and fraud strategies that anomaly detection techniques focused on user behavior may have overlooked. Notably, this technique does not necessitate pre-training and is capable of detecting new, unknown anomalous behavior types, eliminating the need for manually defining specific rules. The sensitivity of topology-based anomaly detection methods to changes in the network structure is a key disadvantage, as it can lead to false alarms in dynamic and constantly evolving social environments. In addition, these methods require extensive computational resources to handle the complexities of network structures in real-world detection settings. An analytical comparison of these methods, including their various features and limitations, is provided in Table 2. This table provides an overview of network topology-based anomaly detection models, including the name of the model, the core algorithm, the applied dataset, and the evaluation criteria.

Table 2 Anomaly detection based on network topology

Anomaly detection based on collaborative fusion

Related concepts

When social networks are represented as graph-structured data, attributes related to nodes or their interconnections tend to be neglected if only network topology methods are used. For instance, individuals may have various distinguishing attributes such as their residence, workplace, and age. Similarly, the nature of their interactions or the edges linking users may vary in type, duration, and frequency. The collaborative fusion method involves incorporating these attributes into the analysis and merging them with the network’s topological structure. This enables a more extensive examination of anomalous behaviors. Furthermore, metadata linked to nodes and their connections provides essential ancillary details that assist with discerning normal from abnormal behaviors, substantially boosting anomaly detection accuracy. Although user behavior characteristic-based methods are known for their quick detection capabilities, high accuracy, and sophisticated algorithm structures, they depend on pre-training classifiers and are limited in their ability to detect unknown anomaly types. Network topology-based methods provide a viable solution for this issue, yet certain challenges remain, such as relatively lower accuracy, a sparse network topology, and an oversight of the evolving nature of user behavior characteristics. The process of collaborative fusion is shown in Fig. 3. The figure explains how to start with graph data collection, encode the data through the attention mechanism, and then use the encoder-decoder architecture to further process the data and ultimately obtain a synthesized vector representation to be used in subsequent machine learning tasks.

Fig. 3 Anomaly detection based on collaborative fusion

Typical method analysis

As graph data volumes expand, the graph structure’s integrity becomes increasingly critical to the efficacy of graph-based anomaly detection algorithms. Incomplete or noisy graph data can adversely affect the algorithm’s ability to detect anomalies effectively. Hence, anomaly detection methodologies dependent on graph structures often fail to effectively discern complex anomalies involving numerous nodes. Thus, an in-depth comprehension of the underlying network dynamics is necessary. To address this issue, certain researchers constructed social network graphs using user-centric information from knowledge graphs. They further enhanced the graphs’ robustness by incorporating behavioral data, such as user text publications, to address the incompleteness of graph data, augmenting the algorithms’ performance [64].

Table 3 Anomaly detection based on collaborative fusion

The network topology measurement approach that relies solely on similarity considerations overlooks node attribute information. This oversight limits the efficiency of anomalous behavior detection in the network. A novel anomaly detection approach for attribute networks that employs a dual autoencoder framework was introduced by Fan et al. [65]. This methodology focuses on learning standard behavioral patterns within the network using the dual autoencoder. Then, deviations from these established patterns are identified as potential anomalies. The framework advances the field by effectively capturing the intricate relationship between network structure and node attributes, while simultaneously ensuring high-quality node and attribute embeddings. This significantly improves the anomaly detection precision.
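The idea of scoring nodes by how poorly both their structure and their attributes are reconstructed can be sketched as below. A weighted sum of the two reconstruction errors is a common way to combine them; whether Fan et al. use exactly this form is an assumption. The reconstructions are hand-written placeholders standing in for autoencoder outputs, and `alpha`, `l2`, and `fused_scores` are hypothetical names.

```python
from math import sqrt


def l2(u, v):
    """Euclidean distance between two equal-length vectors."""
    return sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def fused_scores(adj, adj_hat, attrs, attrs_hat, alpha=0.5):
    """Per-node score: alpha * structure error + (1 - alpha) * attribute error."""
    return [
        alpha * l2(a, a_hat) + (1 - alpha) * l2(x, x_hat)
        for a, a_hat, x, x_hat in zip(adj, adj_hat, attrs, attrs_hat)
    ]


# Three nodes: adjacency rows and 2-d attribute vectors (toy values).
adj = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
attrs = [[1.0, 0.0], [0.0, 1.0], [3.0, 4.0]]

# Placeholder reconstructions (assumed): nodes 0 and 1 are reproduced
# exactly, node 2 is reproduced poorly in both views.
adj_hat = [[0, 1, 0], [1, 0, 1], [0, 0, 0]]
attrs_hat = [[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]]

scores = fused_scores(adj, adj_hat, attrs, attrs_hat)
```

Nodes that fit the learned normal patterns receive scores near zero, while nodes that deviate in structure, attributes, or both receive high scores. The weight `alpha` controls how much each view contributes.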

Yang et al. introduced an approach that trains a graph neural network (GNN) model using a combination of node- and graph-level features to effectively model social networks [52]. This approach emphasizes representation learning for users and their interactions. At the node level, the model incorporates features specific to users, such as the number of followers, post volume, and the sentiment encapsulated in the posts. At the graph level, it considers the social network’s global structural properties, including metrics like graph density and centrality.

A novel dynamic anomaly detection framework that employs residual analysis for time-evolving attribute networks was presented by Xue et al. [66]. This framework focuses on networks that fluctuate over time, such as social networks where new users are continuously added and additional connections are established. This framework defines attributes as supplementary information related to each node or edge within the network, such as the user interaction frequency. The study leveraged minor disturbances between consecutive time points to represent the network’s ongoing evolution for progressive updates. This method is essential for pinpointing behavioral patterns that deviate from the norm, such as unexpected increases in connections or changes in node attributes.

Yasami et al. presented a feature diffusion-aware model that accounts for the spread of features among network nodes, merging this aspect with machine learning algorithms to forecast anomalous activities [67]. The model delineates anomalous behavior in social networks through nonlinear interactions between nodes. They captured this process via feature diffusion among these nodes. Utilizing a diffusion matrix, the model maps feature transmission between nodes and integrates this with machine learning algorithms, enhancing chaotic behavior prediction. This results in a probabilistic graph model tailored to dynamic and complex social networks, enabling anomaly detection through deviation analysis. Furthermore, the model’s performance was assessed using real-world social network datasets. It demonstrated its effectiveness in identifying chaotic behaviors, marking a significant advancement in behavior detection within complex social networks and contributing to the broader understanding and prediction of complex system behaviors.

In addition, Zhu et al. proposed a forensic model that utilizes the Naive Bayes algorithm to analyze user behavior indicating intrusions [68]. This model bolsters the precision and efficiency of intrusion detection and investigations in social networks. It comprehensively assesses various factors such as user engagement, network structural dynamics, and content features. The model significantly enhances the cybersecurity domain, particularly in the areas of detecting and investigating network intrusions. It also serves as a valuable asset for cybercrime investigations. This method employs a collaborative fusion-based approach, utilizing the strengths of node- and edge-related raw data to improve the detection process. This offers a comprehensive and precise mechanism for identifying anomalies. Furthermore, the forensic model considers the individual user’s behaviors and their structural relationships within social networks, efficiently pinpointing malicious activities driven by complex strategies and identifying potential fraudulent groups. This approach enhances the model’s capability to detect a wide array of anomaly types. However, the model faces several challenges, including effectively integrating and balancing these dual feature types, addressing the increased computational demands generated by the fusion, and ensuring that real-time performance is not adversely affected. Table 3 provides a comparative analysis of the social network anomaly detection methods based on collaborative fusion. The table summarizes different social network-based anomaly detection models, listing their core algorithms and evaluation metrics.

Compared to anomaly detection based on user behavior characteristics and anomaly detection based on network topological structure, anomaly detection based on collaborative fusion reduces data noise and dimensionality by selecting relevant features while using network topology analysis to reveal complex relationships between entities, thus locating and identifying anomalous patterns more precisely. This not only enhances the generalization ability of the model to adapt to new data environments but also reduces the computational complexity and increases the processing speed.

Challenges and opportunities

When addressing anomaly detection in social networks, the process can typically be divided into three key stages: data processing, feature extraction, and anomaly detection. The challenges in anomaly detection are becoming more formidable due to the variability in user behavior, network dynamics, the multimodal nature of the content, and the rapid update of anomalous behaviors. To cope with these challenges, future development trends will mainly focus on four aspects: optimizing model parameters, incorporating temporal constraints, realizing deep fusion of multimodal information, and constructing detection frameworks that can be universally adapted to diverse anomalous behavior types. Through the implementation of these strategies, we expect to improve the accuracy and efficiency of anomaly detection, adapt to the dynamic changes in the social network environment, and effectively identify and respond to various anomalous behaviors.

Optimizing model parameters to refine feature selection

Optimizing parameters (i.e., variables in an algorithm or model) is a critical step in designing effective anomaly detection methodologies for social networks, as these parameters largely determine the results. Adjusting them reduces the likelihood of false positives and false negatives. However, optimization presents challenges. Describing a general optimization procedure for a specific parameter is difficult because each parameter’s efficacy depends on the research questions and datasets employed. For example, the best parameter values tend to be highly specific to the features of the data being analyzed, and optimization techniques may fail to reach the global optimum. Therefore, carefully selecting the optimization methodologies and validating their efficacy through relevant evaluation criteria are essential steps before the solution is applied. Future research should focus on the efficient refinement of control parameters and enhancing the selection of classification features to boost the predictive performance of classifiers.
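A simple instance of such tuning is choosing the decision threshold of an anomaly score by grid search against an evaluation criterion. The sketch below assumes labeled validation data and uses the F1 score as the criterion; the function names, candidate thresholds, and data are hypothetical.

```python
def f1_at(scores, labels, threshold):
    """F1 score when every item with score >= threshold is predicted anomalous."""
    pairs = list(zip(scores, labels))
    tp = sum(1 for s, y in pairs if s >= threshold and y == 1)
    fp = sum(1 for s, y in pairs if s >= threshold and y == 0)
    fn = sum(1 for s, y in pairs if s < threshold and y == 1)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def best_threshold(scores, labels, candidates):
    """Return the candidate threshold with the highest F1 on the given data."""
    return max(candidates, key=lambda t: f1_at(scores, labels, t))


# Hypothetical validation set: anomaly scores with ground-truth labels (1 = anomalous).
scores = [0.1, 0.2, 0.35, 0.4, 0.8, 0.9]
labels = [0, 0, 0, 1, 1, 1]
chosen = best_threshold(scores, labels, [0.3, 0.4, 0.5, 0.85])
```

Because the selected value depends on the validation data, a threshold tuned on one dataset should be re-validated before being applied to another. This mirrors the data-specificity concern raised above.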

Incorporating temporal constraints to accommodate network dynamics

The increasing complexity of social networks has intensified the requirement for real-time or near-real-time anomaly detection. Adding temporal constraints into the network means that time is included as an explicit parameter in the model, allowing the modeled network to evolve. Incorporating temporal constraints makes it feasible to capture temporal dependencies and trends between nodes and to accurately reflect the network’s dynamic characteristics. Time constraints can also serve to indicate the causal linkages between events. The methods used by malicious users to spread false information, dispatch spam, or conduct other harmful activities are continually updated. Thus, anomaly detection systems must rapidly adjust. To ensure the network’s dynamics are promptly monitored while avoiding time delays [69], anomaly detection methods must keep pace with network changes, internalize more complex operational environments, and integrate advanced technologies like stream learning or data streaming to guarantee real-time detection within the time constraints. Therefore, rapid and flexible anomaly detection methods are necessary for staying up-to-date with the social networks’ evolving dynamics. Furthermore, adding time constraints is critical to maintaining network dynamism and achieving real-time or near-real-time anomalous behavior detection.
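One lightweight way to detect anomalies on a stream is to keep exponentially weighted running estimates of the mean and variance and to compare each new observation against them before updating. The sketch below is a generic illustration under stated assumptions, not a method from [69]; the class name, the smoothing factor `alpha`, the multiplier `k`, the warm-up length, and the floor of 1.0 on the standard deviation are all hypothetical choices.

```python
class StreamingDetector:
    """Flags values that deviate strongly from an exponentially weighted baseline."""

    def __init__(self, alpha=0.3, k=3.0, warmup=3):
        self.alpha = alpha      # weight given to the newest observation
        self.k = k              # allowed deviation in standard deviations
        self.warmup = warmup    # observations to absorb before flagging
        self.mean = None
        self.var = 0.0
        self.seen = 0

    def update(self, x):
        """Return True if `x` is anomalous, then fold `x` into the baseline."""
        self.seen += 1
        if self.mean is None:
            self.mean = float(x)
            return False
        deviation = x - self.mean
        std = max(self.var ** 0.5, 1.0)  # floor avoids a zero-width band (assumed)
        is_anomaly = self.seen > self.warmup and abs(deviation) > self.k * std
        # Incremental exponentially weighted mean and variance updates.
        self.mean += self.alpha * deviation
        self.var = (1 - self.alpha) * (self.var + self.alpha * deviation ** 2)
        return is_anomaly


# Hypothetical per-minute message counts for one user; 60 is a sudden burst.
detector = StreamingDetector()
stream = [10, 11, 9, 10, 12, 10, 60, 11]
flags = [detector.update(x) for x in stream]
anomalous_positions = [i for i, flag in enumerate(flags) if flag]
```

Each update costs constant time and memory, which suits the real-time setting. The trade-off is that the baseline also absorbs the anomalous value, so the band widens for some time after a burst.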

Fusion of multimodal information to reduce data limitations

Multimodal data-based user anomaly detection involves using multiple data types (i.e., two or more) to identify unusual behaviors [70]. For specific behaviors that require high detection accuracy, using data from a single modality to identify anomalies presents certain restrictions. Thus, combining multimodal data can provide additional contextual information, improving anomaly detection accuracy and reliability [71, 72]. Furthermore, a complete interpretation of user posts can be achieved by integrating text with visual data, making it easier to identify abnormal situations, such as online harassment, hate speech, or disinformation. In multimodal fusion, data from different modalities are integrated into a cohesive representation or are modeled in sequence. The data from each modality are processed according to a designated sequence or time series. Future investigations should combine an expanded array of multimodal user information to increase detection accuracy when data are constrained or sparse.
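Two simple fusion strategies can be sketched as follows. Early fusion concatenates per-modality feature vectors into one representation. Late fusion, shown here as a commonly used alternative that the text above does not name explicitly, averages per-modality anomaly scores with weights. The function names, feature values, and weights are hypothetical.

```python
def early_fusion(text_vec, image_vec):
    """Concatenate per-modality feature vectors into one representation."""
    return list(text_vec) + list(image_vec)


def late_fusion(scores, weights):
    """Weighted average of per-modality anomaly scores.

    Only modalities present in `scores` contribute, so a post with a
    missing modality (e.g., no image) can still be scored.
    """
    total = sum(weights[m] for m in scores)
    if total == 0:
        return 0.0
    return sum(scores[m] * weights[m] for m in scores) / total


# Hypothetical features and scores for a single post.
fused_vector = early_fusion([0.2, 0.7], [0.9])
weights = {"text": 1.0, "image": 3.0}
both = late_fusion({"text": 0.2, "image": 0.8}, weights)
text_only = late_fusion({"text": 0.2}, weights)
```

Renormalizing over the available modalities is one way to cope with sparse data, although the resulting scores are then based on less evidence and may warrant a more conservative threshold.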

Designing a versatile anomalous behavior detection framework

As the number of users and the variety of anomalous behavior types increase, detection efficacy will likely decline. Existing anomaly detection technologies are effective in identifying known anomaly types. However, they struggle to detect and classify anomalies that have not yet occurred. In subsequent research, an automated mechanism could be developed to gather information from reliable sources to verify the existence of potential anomalous behavior. Therefore, integrating a general framework that consolidates various anomaly detection approaches has the potential to identify different anomalous activities. Applying this framework to social networks is a promising domain for future investigation.

Future research opportunities in social network anomaly detection lie in the effective refinement of control parameters and selection of classification features to improve the predictive performance of classifiers and in the development of detection methods that can quickly adapt to changes in the network, while incorporating advanced techniques to ensure real-time detection within time constraints. In addition, extending the combination of multimodal user information to enhance detection accuracy in the presence of limited or sparse data is also an important research direction.

Conclusion

Anomaly detection in social networks is moving towards methods that integrate user behavioral features with network topological structures through the collaborative fusion approach. This paper discussed and reviewed abnormal behavior detection in social networks. Our exposition began with an introduction to the principal anomaly detection methods. This was followed by the classification and analysis of each respective approach. By categorizing these methods, we provided a detailed comparative evaluation based on the strengths and weaknesses of each approach, enabling the efficient determination of an appropriate detection strategy. Furthermore, the increasing complexity and dynamics of social networks pose significant challenges to anomaly detection, making it a highly demanding and computationally intensive task. In conclusion, we explored the challenges encountered by anomaly detection within social networks and determined several relevant research paths for further investigation.