Community deception: from undirected to directed networks

Community deception is about hiding a target community that wants to remain below the radar of community detection algorithms. The goal is to devise algorithms that, given a maximum number of updates (e.g., edge additions and removals), find the best way to perform such updates in order to hide the target community inside the community structure found by a detection algorithm. So far, community deception has only been studied for undirected networks, although many real-world networks (e.g., Twitter) are directed. One way to overcome this problem would be to treat the network as undirected. However, this approach discards potentially helpful information in the edge directions (e.g., A follows B does not imply that B follows A). The aim of this paper is threefold. First, to give an account of the state-of-the-art community deception techniques in undirected networks, underlining their peculiarities. Second, to investigate the community deception problem in directed networks and to show how deception techniques proposed for undirected networks should be modified and adapted to work on directed networks. Third, to evaluate deception techniques in both undirected and directed networks. Our experimental evaluation on a variety of (large) directed networks shows that techniques that work well for undirected networks fall short when directly applied to directed networks, thus underlining the need for specific approaches.


Introduction
Complex network analysis is a powerful technique to model and analyze interactions between entities in complex systems (e.g., protein networks, social networks, signaling networks) (Strogatz 2001). One of the major tasks that can be performed over these networks is community detection, that is, the task of identifying a (non-overlapping) partition of the nodes of the network, which provides some insights about its structure (Fortunato and Hric 2016). Network analysis tools are routinely used by a variety of actors, such as data analysts interested, for instance, in suggesting items to buy to the users of a network. A problem arises when spontaneously shared pieces of information are improperly used, as in the Cambridge Analytica case, where private personal information about users and their social relationships was used without their consent, or when information about communities is used to block forms of self-organization (King et al. 2013). Another example is the case of Bitcoin trading, where communities were used to identify multiple addresses belonging to the same user (Remy et al. 2017).
Hence, although community detection is an essential tool for discovering functional building blocks within networks and for providing insights into the dynamics or modes of formation of networks (Leicht and Newman 2008), the question of what disclosing the community structure of a network can cause to its users remains largely unanswered. The research community has started to look into this problem, giving rise to a new strand of research dubbed community hiding (Waniek et al. 2018) or community deception (Fionda and Pirrò 2018). The general idea is to promote (simple) techniques that can be used by the members of a community that wants to remain below the radar of network analysis techniques like community detection. This problem is particularly critical when those who want to evade community detection tools are malevolent users (e.g., criminals or terrorists) and those who want to identify the communities are law enforcement agencies. More formally, given a target community C

Contributions and outline
This paper studies the community deception problem from two different angles. On the one hand, we provide a systematic analysis of the state-of-the-art community deception techniques in undirected networks. On the other hand, we study the novel problem of community deception in directed networks. Specifically, we make the following main contributions:
1. A comprehensive overview of deception techniques in undirected networks under a common framework.
2. A study of community deception in directed networks. This problem has not been studied before. We show that dealing with edge directions brings some non-trivial issues, since it becomes more involved to tell whether a certain category of edge update is convenient deception-wise. In particular, edge directions bring a further intrinsic difficulty toward deception since, in directed networks, only one direction of edges can be managed, that is, edges that C's members can directly add or delete.
3. An experimental evaluation along three main dimensions: performance in terms of deception score, preservation of the community structure, and running time.
As a by-product, we make available a modular Python library where new deception techniques can be easily plugged in.
This paper extends a previous paper published in CNA 2021 (Fionda and Pirrò 2022). The present paper substantially differs in the following main respects. We expanded the introduction to the community deception problem. We introduce the novel problem of community deception in directed networks (Sect. 4). We conducted a completely new experimental evaluation for directed networks, including three novel deception techniques as well as community detection algorithms specifically devised for directed networks. The remainder of the paper is organized as follows. Section 2 introduces the community deception problem. Section 3 reviews the state-of-the-art community deception techniques in undirected networks. Section 4 introduces the community deception problem in directed networks. Section 5 reports on an experimental evaluation. We conclude in Sect. 6.

Background
The goal of community deception is to design algorithms to deceive community detection algorithms. In particular, given a community $C$, the goal is to determine a set of edge updates so that $C$ will not be discovered by community detection algorithms. A network $G = (V, E)$ is an undirected graph that includes a set of $n := |V|$ vertices and $m := |E|$ edges. We denote by $\deg(u) = |\{(u, v) \in E\}|$ the degree of $u$. The set of communities (i.e., a community structure) discovered by some community detection algorithm $A_D$ is denoted by $\bar{C} = \{C_1, C_2, \ldots, C_k\}$; $C_i \in \bar{C}$ denotes the $i$-th community.
Given a community $C_i$, we distinguish between intra-community edges and inter-community edges. The set of intra-community edges $E(C_i)$ is the set of edges of the form $(u, v)$ with $u, v \in C_i$, where both endpoints are members of $C_i$. The set of inter-community edges $\tilde{E}(C_i)$ is the set of edges of the form $(u, v)$ with $u \in C_i$ and $v \notin C_i$, where one of the endpoints is external to $C_i$. Given a community $C_i$ and a node $u \in C_i$, we indicate by $E(C_i, u)$ (resp., $\tilde{E}(C_i, u)$) the set of intra-community (resp., inter-community) edges of $u$. The degree of a community is denoted by $\deg(C_i) = \sum_{u \in C_i} \deg(u)$, where $\deg(u)$ is the degree of node $u$. Given a network $G = (V, E)$, we indicate by $E^+$ and $E^-$ the sets of edge additions and deletions, respectively, to be applied on $G$. Table 1 summarizes the notation discussed above.
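The notation above can be made concrete with a minimal Python sketch. The edge-set representation (each undirected edge stored once as a tuple), the helper names, and the toy network of two triangles joined by one edge are all assumptions made for illustration; they are not taken from the paper's library.

```python
from typing import Iterable, Set, Tuple

Edge = Tuple[int, int]


def intra_edges(edges: Iterable[Edge], community: Set[int]) -> Set[Edge]:
    """Edges whose two endpoints both belong to the community."""
    return {(u, v) for u, v in edges if u in community and v in community}


def inter_edges(edges: Iterable[Edge], community: Set[int]) -> Set[Edge]:
    """Edges with exactly one endpoint inside the community."""
    return {(u, v) for u, v in edges if (u in community) != (v in community)}


def degree(edges: Iterable[Edge], u: int) -> int:
    """Number of edges incident to u (the toy data has no self-loops)."""
    return sum(1 for a, b in edges if u in (a, b))


def community_degree(edges: Set[Edge], community: Set[int]) -> int:
    """Sum of the degrees of the members of the community."""
    return sum(degree(edges, u) for u in community)


# Hypothetical toy network: two triangles joined by the edge (2, 3).
EDGES = {(0, 1), (0, 2), (1, 2), (2, 3), (3, 4), (3, 5), (4, 5)}
C1 = {0, 1, 2}

print(sorted(intra_edges(EDGES, C1)))
print(sorted(inter_edges(EDGES, C1)))
print(community_degree(EDGES, C1))
```

Note that an intra-community edge contributes twice to the community degree, while an inter-community edge contributes once.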

Problem statement
Figure 1 reports a general deception framework. Given a network G, the Detector module (implementing a community detection algorithm) analyzes G to discover communities. The underlying assumption that stresses the need for deception techniques is that disclosing (part of) C leads to privacy leaks and should be avoided. The Deceptor module (implementing a community deception algorithm) analyzes the network G and suggests a set of edge rewirings involving nodes in C that help C's members to be hidden as a group. To find the best set of edge updates, the Deceptor relies on some function to be optimized, such as modularity (minimization) as in the case of DICE (Waniek et al. 2018), node safeness (maximization) as for SAFDEC (Fionda and Pirrò 2018), or permanence (maximization) as for NEURAL (Mittal et al. 2021). After applying the modifications suggested by the Deceptor and obtaining a new network G′, the desideratum is that the Detector, by analyzing G′, is no longer able to discover C, ideally because C's members are scattered among different communities. In order to quantify the privacy leak

Table 1 Summary of the notation (meanings as they survive in the source): product of the squared total degree of a community structure; set of nodes reachable from u passing only via nodes in C, excluding u itself; set of intra-community edges of node u belonging to the community; set of inter-community edges of node u belonging to the community.
One way to approach the community deception problem would be to work directly with the deception score H. However, this would require knowing how the community detection algorithm $A_D$ that generated the community structure works. What is needed is a way to increase H by treating a community detection algorithm $A_D$ as a black box. To tackle this challenge, one can model community deception in terms of the following optimization problem.
Problem 2 [Community Deception] Given a network $G = (V, E)$, a target community $C \subseteq V$ and a budget $\beta$ of updates, solving the community deception problem amounts to solving the following optimization problem: $\arg\max_{G'} \phi(G, G', C)$ subject to $|E^+| + |E^-| \le \beta$, where $G' = (V, (E \cup E^+) \setminus E^-)$. In the above formulation, $\phi(G, G', C)$ is a function that models a community deception algorithm, while the budget $\beta$ limits the number of possible updates. In particular, the function $\phi(G, G', C)$ computes a numerical value indicating the improvement in the network $G'$ (obtained by applying modifications) in terms of the hiding of nodes in $C$. Ideally, the argmax function selects the network $G'$ (and thus a set of modifications) where the level of hiding is maximized. The crucial difference between the deception function $\phi$ and the deception score H is that the former picks the changes that maximize $\phi$, while H quantifies (in an axiomatic way) the desirable property that the target community $C$ is hidden inside the community structure $\bar{C} = \{C_1, C_2, \ldots, C_k\}$ (Fionda and Pirrò 2018).
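One simple way to approximate the optimization problem above is a greedy loop that spends the budget one update at a time. The sketch below is an illustration under stated assumptions, not the paper's algorithm: the names `greedy_deception`, `candidates` and `gain` are hypothetical, the deception function is passed in as a black-box callable, and a greedy search is not guaranteed to find the true optimum.

```python
from typing import Callable, FrozenSet, Iterable, List, Tuple

Edge = Tuple[int, int]
Update = Tuple[str, Edge]  # ("add", edge) or ("del", edge)


def apply_update(edges: FrozenSet[Edge], update: Update) -> FrozenSet[Edge]:
    """Return a new edge set with a single update applied."""
    kind, edge = update
    return edges | {edge} if kind == "add" else edges - {edge}


def greedy_deception(
    edges: Iterable[Edge],
    candidates: Callable[[FrozenSet[Edge]], Iterable[Update]],
    gain: Callable[[FrozenSet[Edge], FrozenSet[Edge]], float],
    budget: int,
) -> Tuple[FrozenSet[Edge], List[Update]]:
    """Pick, at most `budget` times, the update with the highest positive gain."""
    current = frozenset(edges)
    chosen: List[Update] = []
    for _ in range(budget):
        best, best_gain = None, 0.0
        for update in candidates(current):
            g = gain(current, apply_update(current, update))
            if g > best_gain:  # ties keep the first candidate seen
                best, best_gain = update, g
        if best is None:  # no update improves the hiding: stop early
            break
        current = apply_update(current, best)
        chosen.append(best)
    return current, chosen
```

Any of the deception functions discussed in the paper (modularity loss, safeness gain, permanence loss) could be passed as `gain`, as long as it compares the network before and after one update.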

Related work
Community deception (Fionda and Pirrò 2018) or hiding (Waniek et al. 2018) studies how to hide a target community C inside a community structure from community detection algorithms. The idea is to find the best (deception-wise) set of edge updates by optimizing some function. In what follows, we review the state of the art. The notation for the various deception techniques discussed in this section is summarized in Table 1. Waniek et al. (2018) and Fionda and Pirrò (2018) devise deception optimization functions based on modularity (Newman 2006).

Modularity-based deception
Definition 3 [Modularity] Given a network $G$, the modularity of the partition of this network into communities $\bar{C} = \{C_1, C_2, \ldots, C_k\}$ is given by: $M(G, \bar{C}) = \frac{\eta}{m} - \frac{\delta}{4m^2}$, where $\eta = \sum_{C_i \in \bar{C}} |E(C_i)|$ and $\delta = \sum_{C_i \in \bar{C}} \deg(C_i)^2$.
The intuition behind using modularity for deception can be summarized as follows: community detection quality is related to the value of modularity; the higher, the better. Then, minimizing modularity with respect to the edge updates performed by C's members should lead community detection algorithms astray. In particular, modularity-based deception maximizes the modularity loss $ML = M(G, \bar{C}) - M(G', \bar{C})$, that is, the difference between the modularity before and after the updates. In what follows, we focus on the approach described in Fionda and Pirrò (2018) since Waniek et al.'s strategy does not always bring a modularity loss and thus can fail to contribute to the hiding of the members of C inside the community structure.
Intra-edge addition. The modularity loss of an intra-community edge addition $(u, w)$ such that $u, w \in C_i$ and $\{u, w\} \cap C \neq \emptyset$, giving $G' = (V, E \cup \{(u, w)\})$, is the following:
Inter-edge addition. An inter-community edge addition $(u, w)$ brings the following potential modularity loss:
Intra-edge deletion. The modularity loss of an intra-edge deletion $(u, w)$
Inter-edge deletion. The modularity loss of an inter-community edge deletion $(u, w)$:
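The effect of each kind of update on modularity can also be checked numerically. The following sketch assumes the standard (Newman 2006) form of modularity, the sum over communities of $|E(C_i)|/m - (\deg(C_i)/2m)^2$, and keeps the community structure fixed when evaluating an update, since the Deceptor reasons against the structure it already knows. Function names and the toy network are hypothetical.

```python
from typing import Iterable, List, Set, Tuple

Edge = Tuple[int, int]


def modularity(edges: Set[Edge], communities: List[Set[int]]) -> float:
    """Newman modularity of a partition; each undirected edge is stored once."""
    m = len(edges)
    total = 0.0
    for comm in communities:
        internal = sum(1 for u, v in edges if u in comm and v in comm)
        # Each endpoint inside the community adds one to its total degree.
        deg_sum = sum((u in comm) + (v in comm) for u, v in edges)
        total += internal / m - (deg_sum / (2 * m)) ** 2
    return total


def modularity_loss(
    edges: Set[Edge],
    communities: List[Set[int]],
    add: Iterable[Edge] = (),
    remove: Iterable[Edge] = (),
) -> float:
    """Modularity before the updates minus modularity after them."""
    updated = (set(edges) | set(add)) - set(remove)
    return modularity(edges, communities) - modularity(updated, communities)


# Hypothetical toy network: two triangles joined by the edge (2, 3).
EDGES = {(0, 1), (0, 2), (1, 2), (2, 3), (3, 4), (3, 5), (4, 5)}
PARTITION = [{0, 1, 2}, {3, 4, 5}]

print(modularity(EDGES, PARTITION))
print(modularity_loss(EDGES, PARTITION, add=[(0, 4)]))     # inter-edge addition
print(modularity_loss(EDGES, PARTITION, remove=[(2, 3)]))  # inter-edge deletion
```

On this toy network, adding an inter-community edge yields a positive loss while deleting the only inter-community edge yields a negative one.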

Safeness-based deception
Safeness-based deception (Fionda and Pirrò 2018) has been introduced to correct for some drawbacks of modularity-based deception. In particular, with modularity-based deception, one needs to know the entire community structure to pick the best edge update (which depends on the degree of the community toward which a new edge should be inserted). Safeness-based deception only requires information that can be obtained from C's members.
Definition 4 (Node Safeness) Let $G = (V, E)$ be a network, $C \subseteq V$ a community, and $u \in C$ a member of $C$. The safeness of $u$ in $G$ is defined as: $\sigma(u, C) = \tau \frac{|V_u(C)| - |E(u, C)|}{|C| - 1} + \chi \frac{|\tilde{E}(u, C)|}{\deg(u)}$, where $V_u(C) \subseteq C$ is the set of nodes reachable from $u$ passing only via nodes in $C$, $E(u, C)$ (resp., $\tilde{E}(u, C)$) is the set of intra-$C$ (resp., inter-$C$) edges of $u$, $\tau, \chi > 0$, and $\tau + \chi = 1$.
Definition 5 (Community Safeness) Given a network $G = (V, E)$ and a community $C \subseteq V$, the safeness of $C$ is defined as: $\sigma(C) = \sum_{u \in C} \frac{\sigma(u, C)}{|C|}$. This approach instantiates the deception function to be the safeness gain $\xi_C = \sigma(C') - \sigma(C)$. It adopts a greedy strategy that, at each step, chooses the edge update that gives the highest $\xi_C$. Therefore, the goal is to understand what kind of update is more profitable safeness-wise (Fionda and Pirrò 2018).
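As an illustration, node and community safeness can be computed with a few lines of Python. The sketch assumes that node safeness has the form $\tau (|V_u(C)| - |E(u, C)|)/(|C| - 1) + \chi |\tilde{E}(u, C)|/\deg(u)$ with $\tau = \chi = 1/2$ by default. The toy edge list is hypothetical: it only mimics the neighborhood of node 1 described for the undirected network of Fig. 2 (three intra-community and two inter-community edges) and is not the actual figure.

```python
from collections import deque
from typing import Set, Tuple

Edge = Tuple[int, int]


def neighbors(edges: Set[Edge], u: int) -> Set[int]:
    """Nodes adjacent to u in an undirected edge set."""
    return {b if a == u else a for a, b in edges if u in (a, b)}


def reachable_within(edges: Set[Edge], community: Set[int], u: int) -> Set[int]:
    """Members reachable from u via paths that stay inside the community (u excluded)."""
    seen, queue = {u}, deque([u])
    while queue:
        x = queue.popleft()
        for y in neighbors(edges, x):
            if y in community and y not in seen:
                seen.add(y)
                queue.append(y)
    return seen - {u}


def node_safeness(edges, community, u, tau=0.5, chi=0.5) -> float:
    """Assumed form of node safeness; needs |C| > 1 and deg(u) > 0."""
    nbrs = neighbors(edges, u)
    intra = len(nbrs & community)
    inter = len(nbrs - community)
    reach = len(reachable_within(edges, community, u))
    return tau * (reach - intra) / (len(community) - 1) + chi * inter / len(nbrs)


def community_safeness(edges, community, tau=0.5, chi=0.5) -> float:
    """Average node safeness over the members of the community."""
    return sum(node_safeness(edges, community, u, tau, chi) for u in community) / len(community)


# Hypothetical toy data: node 1 has intra edges to 0, 2, 3 and inter edges to 5, 8.
EDGES = {(0, 1), (1, 2), (1, 3), (3, 4), (1, 5), (1, 8)}
C1 = {0, 1, 2, 3, 4}

print(node_safeness(EDGES, C1, 1))
print(community_safeness(EDGES, C1))
```

Under these assumptions node 1 reaches the other four members through three intra-community edges, and two of its five edges leave the community, which gives 0.5 · 1/4 + 0.5 · 2/5 = 0.325.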
Intra-edge addition. An intra-$C$ edge addition $(u, w)$ such that $\{u, w\} \subset C$ can increase the safeness of the community only if the edge connects previously disconnected portions of $C$. The possible safeness gain is: where $C_u$ ($C_w$, respectively) is the connected component of $C$ to which $u$ ($w$, respectively) belongs before the edge addition.
Inter-edge addition. For the best inter-$C$ edge addition $(u, w)$, the safeness gain is:
Intra-edge deletion. Intra-$C$ edge deletions do not always correspond to a safeness gain. The best possible safeness gain for an intra-$C$ edge deletion $(u, w)$ with $u, w \in C$ occurs when the value of the following formula is maximum.
Inter-edge deletion. An inter-$C$ edge deletion $(u, w)$ with $u \in C$ and $w \notin C$ always corresponds to a safeness decrease.

Permanence-based deception
Mittal et al. (2021) devised NEURAL, a permanence-based deception strategy, which aims at reducing the permanence of the network with respect to $C$. Permanence (Chakraborty et al. 2016) is a vertex-centric metric that quantifies the containment of a node $u$ in a network community $C$: $Perm(u) = \frac{I(u)}{E_{max}(u)} \times \frac{1}{\deg(u)} - (1 - C_{in}(u))$, where $I(u)$ is the number of internal connections of $u$ within its community, $E_{max}(u)$ is the maximum number of connections of $u$ to a single neighboring community, and $C_{in}(u)$ is the ratio between the actual and the possible number of edges among the internal neighbors of $u$. The permanence of a network $G$ is then defined as $Perm(G) = \frac{\sum_{u \in V} Perm(u)}{|V|}$. NEURAL instantiates the deception function to be the permanence loss $P_l$, that is, the permanence before the update minus the permanence after it.
Intra-edge addition. An intra-community edge addition $(u, w)$ such that $u, w \in C_i$ and $\{u, w\} \cap C \neq \emptyset$ does not always ensure $P_l > 0$. The possible permanence loss for node $u$ (a similar loss can also be computed for node $w$) is:
Inter-edge addition. Adding an inter-community edge $(u, w)$ where $u \in C_i \cap C$ and $w \in C_j$, such that $C_j \neq C_i$, always results in $P_l > 0$. The loss is larger if $C_j$ is the community that provides the maximum external pull for node $u$. In such a case, the permanence loss is:
Intra-edge deletion. An intra-community edge deletion $(u, w)$ such that $u, w \in C_i$ and $\{u, w\} \cap C \neq \emptyset$ always gives $P_l > 0$.
Inter-edge deletion. Deleting an inter-community edge $(u, w)$ where $u \in C_i \cap C$ and $w \in C_j$, such that $C_j \neq C_i$, never results in $P_l > 0$.
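Permanence can be computed directly from the community membership of each node. The sketch below assumes the usual form $Perm(u) = \frac{I(u)}{E_{max}(u)} \cdot \frac{1}{\deg(u)} - (1 - C_{in}(u))$, with the conventions that $Perm(u) = C_{in}(u)$ when $u$ has no external connections and $C_{in}(u) = 0$ when $u$ has fewer than two internal neighbors. Function names and the toy network are hypothetical.

```python
from itertools import combinations
from typing import Dict, Set, Tuple

Edge = Tuple[int, int]


def neighbors(edges: Set[Edge], u: int) -> Set[int]:
    """Nodes adjacent to u in an undirected edge set."""
    return {b if a == u else a for a, b in edges if u in (a, b)}


def permanence(edges: Set[Edge], membership: Dict[int, int], u: int) -> float:
    """Permanence of node u given a node -> community-id mapping."""
    nbrs = neighbors(edges, u)
    internal = {v for v in nbrs if membership[v] == membership[u]}
    # Number of edges from u toward each external community.
    pull: Dict[int, int] = {}
    for v in nbrs - internal:
        pull[membership[v]] = pull.get(membership[v], 0) + 1
    # Clustering coefficient among the internal neighbors of u.
    if len(internal) < 2:
        c_in = 0.0
    else:
        pairs = list(combinations(sorted(internal), 2))
        linked = sum(1 for a, b in pairs if (a, b) in edges or (b, a) in edges)
        c_in = linked / len(pairs)
    if not pull:  # no external connections
        return c_in
    e_max = max(pull.values())
    return len(internal) / e_max / len(nbrs) - (1.0 - c_in)


def network_permanence(edges: Set[Edge], membership: Dict[int, int]) -> float:
    """Average node permanence over all nodes."""
    return sum(permanence(edges, membership, u) for u in membership) / len(membership)


# Hypothetical toy network: two triangles joined by the edge (2, 3).
EDGES = {(0, 1), (0, 2), (1, 2), (2, 3), (3, 4), (3, 5), (4, 5)}
MEMBERSHIP = {0: 0, 1: 0, 2: 0, 3: 1, 4: 1, 5: 1}

print(permanence(EDGES, MEMBERSHIP, 2))
print(network_permanence(EDGES, MEMBERSHIP))
```

Node 2 has two internal neighbors that are linked to each other and one external edge, so its permanence is 2/(1 · 3) − 0 = 2/3, while nodes without external edges keep a permanence of 1.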
Other pieces of work (e.g., Nagaraja 2010; Magelinski et al. 2021; Liu et al. 2019) have studied a different problem, that is, hiding (or at least changing) the whole community structure instead of a target community. Another line of research (e.g., Jia et al. 2020) has studied countermeasures, that is, the robustness to attacks.
We observe that all these pieces of related work have focused on undirected networks, although many real-world networks (e.g., social networks like Twitter) are directed. As pointed out by fundamental studies (e.g., Leicht and Newman 2008; Malliaros and Vazirgiannis 2013; Fortunato 2010), edge direction can play a fundamental role in revealing more accurate communities in networks. Moreover, devising direction-oblivious deception techniques (e.g., treating directed networks as undirected) to escape from direction-aware detection algorithms like Leiden (Traag et al. 2019) may undermine the overall goal of protecting a community in a directed network.

Deception in directed networks
In this section, we study the community deception problem in directed networks. Although many real-world networks (e.g., Twitter) are intrinsically directed, state-of-the-art deception techniques have only focused on undirected networks. One way to solve the problem would be to simply ignore edge directions. However, this is limiting for at least two reasons. First, meaningful information about edge direction is discarded. In a network like Twitter, the fact that A follows B does not imply that B follows A. Second, as community detection has evolved to take into account edge directions (e.g., Leicht and Newman 2008; Traag et al. 2019), we believe that community deception should evolve to play a fairer game.
By referring to the framework reported in Fig. 1, in this new setting, the input network G is now a directed network, the Detector module can implement specific community detection algorithms proposed to work on directed networks, and the Deceptor will take edge direction into account when suggesting a set of directed edge updates. Since C's members can directly add or delete only the edges they originate, the Deceptor can only consider edge additions and deletions whose source node is in the target community.
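The restriction on admissible updates can be sketched as follows. The function name `candidate_updates` and the toy data are hypothetical; the only rule encoded is the one stated above, namely that the source node of every added or deleted directed edge must belong to the target community.

```python
from typing import Set, Tuple

Edge = Tuple[int, int]  # directed edge (source, destination)


def candidate_updates(
    nodes: Set[int], edges: Set[Edge], target: Set[int]
) -> Tuple[Set[Edge], Set[Edge]]:
    """Directed edge additions and deletions whose source is in the target community."""
    additions = {
        (u, w) for u in target for w in nodes if u != w and (u, w) not in edges
    }
    deletions = {(u, w) for (u, w) in edges if u in target}
    return additions, deletions


# Hypothetical toy data: a directed 3-cycle with target community {0}.
NODES = {0, 1, 2}
EDGES = {(0, 1), (1, 2), (2, 0)}
additions, deletions = candidate_updates(NODES, EDGES, {0})
print(additions)  # only (0, 2) can be added
print(deletions)  # only (0, 1) can be deleted; (2, 0) is controlled by node 2
```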
We show how to derive the counterpart for each of the three main deception strategies available for undirected networks in the directed case. Moreover, to understand the importance of taking edge directions into account and the need for introducing deception approaches specifically tailored to work on directed networks, we will discuss the network reported in Fig. 2. The figure reports the same network when edge direction is considered (Fig. 2(a)) and when it is neglected (Fig. 2(b)). We suppose that the Detector has identified two communities, that the target community is $C_1 = \{0, 1, 2, 3, 4\}$, and that it has been completely disclosed.

Directed modularity-based deception
In this section, we investigate how the modularity-based deception analyzed in Section 3.1 can be adapted to work on directed networks. In particular, we consider a slightly modified version of the metric described by Leicht and Newman (2008). The notation used to define directed modularity is summarized in Table 2.
Definition 6 Let $G = (V, E)$ be a directed network. The directed modularity of the partition of this network into communities $\bar{C} = \{C_1, C_2, \ldots, C_k\}$ is given by:
In terms of the general deception formulation (see Problem 2), the deception function can be instantiated to be the directed modularity loss $\overrightarrow{ML}$. We will analyze the impact of the different types of edge updates on the directed modularity loss.
Example 7 Consider the network in Fig. 2. The directed modularity of the network in Fig. 2(a) is: If we consider now its undirected version, reported in Fig. 2(b), we obtain the following modularity score: From this simple example, it is clear how neglecting edge directions can significantly change the sense of "goodness" assigned to a community structure. Indeed, in the case of directed modularity, the obtained value is lower than 0, indicating the absence of a community structure. In contrast, in the undirected case, the value is higher than 0, indicating the possible presence of a community structure.
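For reference, the directed modularity of Leicht and Newman (2008) can be written per community as $|E(C_i)|/m - \deg_o(C_i) \cdot \deg_i(C_i)/m^2$, where $\deg_o(C_i)$ and $\deg_i(C_i)$ are the total out-degree and in-degree of the community. The sketch below implements this standard form only; the paper uses a slightly modified version that is not reproduced here, and the toy network is hypothetical (it is not the network of Fig. 2).

```python
from typing import List, Set, Tuple

Edge = Tuple[int, int]  # directed edge (source, destination)


def directed_modularity(edges: Set[Edge], communities: List[Set[int]]) -> float:
    """Leicht-Newman directed modularity, computed community by community."""
    m = len(edges)
    total = 0.0
    for comm in communities:
        internal = sum(1 for u, v in edges if u in comm and v in comm)
        out_deg = sum(1 for u, _ in edges if u in comm)  # edges leaving members
        in_deg = sum(1 for _, v in edges if v in comm)   # edges entering members
        total += internal / m - (out_deg * in_deg) / m ** 2
    return total


# Hypothetical toy network: two directed 3-cycles joined by the edge 2 -> 3.
EDGES = {(0, 1), (1, 2), (2, 0), (2, 3), (3, 4), (4, 5), (5, 3)}
PARTITION = [{0, 1, 2}, {3, 4, 5}]

print(directed_modularity(EDGES, PARTITION))
```

On this toy network the value is 6/7 − 24/49 = 18/49, which differs from the undirected modularity (5/14) of the same edges with directions ignored.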

Intra-edge addition
An intra-community edge addition $(u, w)$ such that $u, w \in C_i$ and $\{u, w\} \cap C \neq \emptyset$, giving an updated network $G' = (V, E \cup \{(u, w)\})$, does not always correspond to a directed modularity loss. Indeed, the modularity loss is: With a few algebraic manipulations, we obtain the following inequality: $\overrightarrow{ML}$ Since the last term in the bracket is negligibly small, we can conclude that $\overrightarrow{ML}$

Inter-edge addition
An inter-community edge addition $(u, w)$ does not always correspond to a directed modularity loss. Indeed, the directed modularity loss is: By some algebraic manipulation, this can be reduced to: Then, we can conclude that $\overrightarrow{ML}$ is positive if and only if the term inside the square brackets is positive, that is, if the following inequality holds:

Intra-edge deletion
An intra-community edge deletion $(u, w)$ such that $u, w \in C_i$ and $\{u, w\} \cap C \neq \emptyset$, giving an updated network $G' = (V, E \setminus \{(u, w)\})$, does not always correspond to a directed modularity loss. Indeed, the modularity loss is: With a few algebraic manipulations, we can conclude that Such an inequality shows that $\overrightarrow{ML}$ will be positive if the term inside the brackets is positive and thus

Inter-edge deletion
An inter-community edge deletion $(u, w)$ does not always correspond to a directed modularity loss. Indeed, the modularity loss is: By looking at the above formula, we can conclude that $\overrightarrow{ML}$ is negative if the term inside the square brackets is negative, that is:

Directed Safeness-Based Deception
Starting from the safeness defined for undirected networks (see equation 2 and equation 5), we can study safeness in directed networks. The notation used to define directed safeness is summarized in Table 3.
We start by defining the safeness of a node in the directed case.
Definition 8 Consider a directed network $G = (V, E)$, a community $C \subseteq V$, and a node $u \in C$. The directed safeness of $u$ in $G$ is defined as: The above formula splits the two components of undirected safeness to take into account edge directions. Indeed, the $V_u(C)$ of the undirected case is split into two terms, $V_o^u(C)$ and $V_i^u(C)$, that consider the portion of nodes in $C$ that can be reached from $u$ via directed paths originating from $u$ and directed paths terminating in $u$, respectively. Similarly, $E_o(u, C)$ ($E_i(u, C)$, respectively) indicates the set of outgoing (incoming, respectively) edges linking $u$ to other members of $C$; $\tilde{E}_o(u, C)$ ($\tilde{E}_i(u, C)$, respectively) indicates the set of outgoing (incoming, respectively) edges linking $u$ to nodes not in $C$; and $\deg_o(u)$ ($\deg_i(u)$, respectively) indicates the outgoing (incoming, respectively) degree of $u$.
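The two reachability sets can be obtained with a breadth-first search restricted to the members of the community, run once along edge directions and once against them. The sketch below is a minimal illustration; the function name `reach` and the toy edges are hypothetical.

```python
from collections import deque
from typing import Set, Tuple

Edge = Tuple[int, int]  # directed edge (source, destination)


def reach(edges: Set[Edge], community: Set[int], u: int, forward: bool = True) -> Set[int]:
    """Members of the community connected to u by directed paths inside the community.

    forward=True follows paths originating from u (outgoing reachability);
    forward=False follows paths terminating in u (incoming reachability).
    The node u itself is excluded from the result.
    """
    seen, queue = {u}, deque([u])
    while queue:
        x = queue.popleft()
        for a, b in edges:
            src, dst = (a, b) if forward else (b, a)
            if src == x and dst in community and dst not in seen:
                seen.add(dst)
                queue.append(dst)
    return seen - {u}


# Hypothetical toy data: node 9 is outside the community, so the path
# 1 -> 2 -> 9 -> 3 does not make node 3 reachable from node 1.
EDGES = {(0, 1), (1, 2), (3, 1), (2, 9), (9, 3)}
C = {0, 1, 2, 3}

print(reach(EDGES, C, 1, forward=True))   # nodes reached from node 1
print(reach(EDGES, C, 1, forward=False))  # nodes that can reach node 1
```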
Example 9 Consider again the network reported in Fig. 2 and let the two weights in the safeness definition both be equal to $\frac{1}{2}$. To discuss safeness, we will focus on node 1; in the directed network, it is connected by two outgoing edges to nodes 2 and 8 and by three incoming edges from nodes 0, 3, and 5. Then, if we compute the directed safeness score of node 1, we obtain 0.73. If we compute the safeness score of node 1 on the undirected version of the network, we obtain 0.325. On the one hand, in the directed version of the network, node 1 can reach all the community nodes via outgoing and incoming paths. On the other hand, one of the three incoming edges of node 1 is an inter-community edge, and one of the two outgoing edges is an inter-community edge.
When it comes to the undirected version of the network, the direction in which the information can be transmitted does not matter; all edges are treated in the same way, meaning that node 1 can reach all the other four nodes in $C_1$ by means of its three intra-community edges and that two of its five edges are inter-community edges. This causes the decrease of the safeness score from 0.73 to 0.325, meaning that node 1 is more exposed to being discovered as a member of $C_1$ in the undirected version than in the directed one.
Similar to the case of undirected networks, the directed community safeness can be defined by averaging the directed node safeness of the nodes in $C$.
Definition 10 Given a directed network $G = (V, E)$ and a community $C \subseteq V$, the directed safeness of $C$ in $G$ is defined as the average of the directed safeness of its members. In terms of the general deception formulation (see Problem 2), this approach instantiates the deception function to be the directed safeness gain. In the following, we will discuss the impact of the different types of edge updates on the directed safeness score.

Intra-edge addition
An intra-$C$ edge addition $(u, w)$ such that $u, w \in C$, giving an updated network $G' = (V, E \cup \{(u, w)\})$, does not always introduce a directed safeness gain. Indeed, after the addition of the edge $(u, w)$, if $w \notin V_o^u(C)$, $u$ will be able to reach $w$ and all the nodes in $V_o^w(C) \setminus V_o^u(C)$. Symmetrically, $w$ will be reachable from $u$ and from all the nodes in $V_i^u(C) \setminus V_i^w(C)$. Then, the possible increase of the safeness score for the nodes $u$ and $w$ is the following: Obviously, such an edge addition always results in a safeness decrease if $w \in V_o^u(C)$ and $u \in V_i^w(C)$.

Inter-edge addition
Any inter-$C$ edge addition $(u, w)$ such that $u \in C$ and $w \notin C$, giving an updated network $G' = (V, E \cup \{(u, w)\})$, always corresponds to a directed safeness increase. Indeed, the increase in the directed safeness of node $u$ is always greater than or equal to 0 since $\deg_o(u) \ge |\tilde{E}_o(u, C)|$. Note that the maximum increase in directed safeness happens for all the nodes $u$ such that u ∈ argmin{

Intra-edge deletion
An intra-$C$ edge deletion $(u, w)$ such that $u, w \in C$, giving an updated network $G' = (V, E \setminus \{(u, w)\})$, does not always bring a directed safeness gain. Indeed, let $V_o^{u-}(C)$ ($V_i^{w-}(C)$, respectively) be the nodes of $C$ that can no longer be reached by following directed paths originating from $u$ (ending in $w$, respectively) after the deletion of the edge $(u, w)$. Then, the possible increase of the safeness score for the nodes $u$ and $w$ is the following: Obviously, such an edge deletion always results in a safeness increase if $V_o^{u-}(C) = \emptyset$ and $V_i^{w-}(C) = \emptyset$, that is, if $u$ and $w$ keep exactly the same reachability among $C$'s members after $(u, w)$ is deleted.

Inter-edge deletion
Any inter-$C$ edge deletion $(u, w)$ such that $u \in C$ and $w \notin C$, giving an updated network $G' = (V, E \setminus \{(u, w)\})$, always corresponds to a directed safeness decrease. Indeed, the change in the directed safeness of node $u$ is always lower than or equal to 0 since $\deg_o(u) \ge |\tilde{E}_o(u, C)|$.

Directed permanence-based deception
This section shows how permanence can be adapted to directed networks. The notation used in this section is summarized in Table 4. We start by defining directed node permanence.
Definition 11 Let $G = (V, E)$ be a directed network, and $u \in V$ a node in $G$. The directed node permanence of $u$ is defined as: where $E_o(u, C_u)$ and $E_i(u, C_u)$ denote the internal outgoing and incoming connections of $u$ within its own community $C_u$, respectively; $E_o^{max}(u)$ and $E_i^{max}(u)$ denote the maximum number of outgoing and incoming connections of $u$ to its neighboring communities, respectively; and $C_{in}(u)$ is the ratio between the actual and the possible number of edges among the internal neighbors of $u$ (i.e., the clustering coefficient among the internal neighbors of $u$). As in the case of undirected networks, for all vertices $u$ that do not have any inter-community outgoing and/or incoming connections, permanence is considered equal to the clustering coefficient, i.e., $\overrightarrow{Perm}(u) = C_{in}(u)$. Moreover, if the total number of internal outgoing and incoming connections of $u$ is less than 2, the clustering coefficient $C_{in}(u)$ is set to 0.
Example 12 Consider again the directed network reported in Fig. 2(a). The directed permanence of node 1 is the following: If we consider the undirected version of the same network, the permanence of node 1 will be the following: Then, for this example, the permanence of node 1 in the directed version is higher than the permanence in the undirected case, meaning that in the undirected version, node 1 is less committed to staying in $C_1$ than in the directed version.
The directed permanence with respect to a target community is defined as follows.
Definition 13 Given a directed network $G = (V, E)$ and a target community $C$, the directed permanence of $C$ in $G$ is defined as: In terms of the general deception formulation (see Problem 2), this approach instantiates the deception function to be the directed permanence loss $\overrightarrow{P_l}$. In the following, we will discuss the impact of the different types of edge updates on the directed permanence loss.

Intra-edge addition
An intra-community edge addition $(u, w)$ such that $u, w \in C_i$ and $\{u, w\} \cap C \neq \emptyset$, giving an updated network $G' = (V, E \cup \{(u, w)\})$, does not always correspond to a directed permanence loss. In the following, we will analyze the directed permanence loss for $u$; a similar reasoning applies to $w$. Then, the directed permanence loss of $u$ is: The first term is always lower than or equal to zero since $\deg_o(u) \ge I_o(u)$. The second term (i.e., $C_{in}(u) - C'_{in}(u)$) can be lower or greater than 0 depending on how the clustering coefficient changes after the edge addition.
Note that the addition of the edge $(u, w)$ will also increase the directed permanence of all the nodes $v$ that have both $u$ and $w$ in their neighborhood, since their clustering coefficient $C_{in}(v)$ will increase.

Inter-edge addition
Any inter-community edge addition $(u, w)$ such that $u \in C_i \cap C$ and $w \in C_w \neq C_i$, giving an updated network $G' = (V, E \cup \{(u, w)\})$, always results in a directed permanence loss. Consider first the case in which $E_o^{max}(u)$ does not change after the edge addition; then the directed permanence loss of $u$ is always greater than or equal to 0 since $I_o(u)$ is at least 0.
Consider now the case in which $E_o^{max}(u)$ changes after the edge addition; then the new value will be $E_o^{max}(u) + 1$. In this case, the directed permanence loss of $u$ is always greater than 0. Moreover, note that the maximum permanence loss is obtained in the second case, when the edge is added to the community toward which $u$ already has the maximum number of edges.

Intra-edge deletion
An intra-community edge deletion $(u, w)$ such that $u, w \in C_i$ and $\{u, w\} \cap C \neq \emptyset$, giving an updated network $G' = (V, E \setminus \{(u, w)\})$, always results in a directed permanence loss. Indeed, after the deletion of the edge $(u, w)$, the number of internal outgoing connections and the outgoing degree of $u$, as well as the number of internal incoming connections and the incoming degree of $w$, will be decreased by 1. Consider first the directed permanence loss of node $u$ (a similar reasoning applies to node $w$). We will restrict the analysis to edges whose deletion does not increase $C_{in}(u)$, that is, to the case $C'_{in}(u) \le C_{in}(u)$. Then the directed permanence loss of $u$ is always greater than or equal to 0 since $\deg_o(u) \le \deg(u)$. Note that the deletion of the edge $(u, w)$ will also decrease the directed permanence of all the nodes $v$ that have both $u$ and $w$ in their neighborhood, since their clustering coefficient $C_{in}(v)$ will decrease.

Inter-edge deletion
Any inter-community edge deletion $(u, w)$ such that $u \in C_i \cap C$ and $w \in C_w \neq C_i$, giving an updated network $G' = (V, E \setminus \{(u, w)\})$, never brings a directed permanence loss. Consider first the case in which $E_o^{max}(u)$ does not change after the edge deletion; then the directed permanence loss of $u$ is always lower than or equal to 0 since $|E_o(u, C_i)|$ is at least 0.
Consider now the case in which E max o (u) changes after the edge deletion, then the new value will be E max o (u) − 1 .In this case the directed permanence loss of u is:

Directed deception in practice
In this section, we will analyze the behavior of the different deception strategies on the synthetic network reported in  In the following, we suppose that the target community is C 1 = { 0, 1, 3, 4, 5 } and it has been completely disclosed.Moreover, we will consider a budget of update = 4.Effect of directed modularity deception.The modularitybased deceptor on the directed network in Fig. 3(a) will suggest, in order, the following edge updates: 1. Inter-community edge addition (0, 8); 2. Inter-community edge addition (3, 6); 3. Inter-community edge addition (1, 7); 4. Inter-community edge addition (4, 6); If we run the same detector on the network obtained after applying the four modifications suggested, we obtain the network reported in Fig 3(b).As it can be noted, the detector identifies three communities, and the target community's members are spread between C 1 and C 2 .
Effect of directed safeness deception.
The safeness-based deceptor on the directed network in Fig. 3(a) will suggest, in order, the following edge updates: 1. Inter-community edge addition (4, 10); 2. Inter-community edge addition (3, 6); 3. Inter-community edge addition (1, 10); 4. Inter-community edge addition (4, 9); If we run the detector on the network obtained after applying the four modifications suggested, we obtain the network reported in Fig 3(c).As it can be noted, the detector identifies two communities with the target community's members spread in both of them.
Effect of directed permanence deception.
The permanence-based deceptor on the directed network in Fig. 3(a) will suggest, in order, the following edge updates:
1. Intra-community edge deletion (1, 2);
2. Intra-community edge deletion (3, 1);
3. Inter-community edge addition (0, 5);
4. Intra-community edge deletion (0, 1).
If we run the detector on the network obtained after applying the four suggested modifications, we obtain the network reported in Fig. 3(d). As can be noted, the detector identifies three communities, with the target community's members spread across two of them.
We want to point out that there is a slight difference in the interpretation of intra- and inter-community edge updates among the three deception strategies described in the previous section. Consider the case in which the target community in the network in Fig. 3 is C = {0, 1, 3, 5, 8}. Directed modularity and directed permanence categorize intra- and inter-edge updates by considering the community structure identified by the detector. This means that, for example, the addition of the edge (1, 5) would be considered an inter-edge addition that could decrease both modularity and permanence, while, of course, it is not a good update w.r.t. the target community C. Directed safeness, instead, does not suffer from this problem since it always looks at intra-C and inter-C edge updates. Thus, when applying modularity and permanence deception algorithms, one should also check that the suggested intra- and inter-community edge updates meet the requirement related to the identity of the target community C. In the example network reported in Fig. 3, such differences cannot be appreciated since the target community is completely revealed.
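The two interpretations can be contrasted with a short sketch. The partition below is hypothetical (the communities actually found on the network of Fig. 3 are not reproduced here); it is only chosen so that nodes 1 and 5 fall into different detected communities. The function names are likewise illustrative.

```python
def community_of(node, partition):
    """Return the index of the detected community that contains `node`."""
    for idx, community in enumerate(partition):
        if node in community:
            return idx
    raise ValueError(f"node {node} is not covered by the partition")


def classify_by_partition(edge, partition):
    """Label an edge as intra/inter w.r.t. the communities found by the detector."""
    u, w = edge
    same = community_of(u, partition) == community_of(w, partition)
    return "intra" if same else "inter"


def classify_by_target(edge, target):
    """Label an edge as intra-C, inter-C, or external w.r.t. the target community C."""
    u, w = edge
    inside = (u in target) + (w in target)  # number of endpoints that belong to C
    return {2: "intra-C", 1: "inter-C", 0: "external"}[inside]


# Hypothetical detector output in which 1 and 5 end up in different communities.
partition = [{0, 1, 2, 3}, {4, 5, 6}, {7, 8, 9, 10}]
target = {0, 1, 3, 5, 8}

by_partition = classify_by_partition((1, 5), partition)  # detector's point of view
by_target = classify_by_target((1, 5), target)           # target community's point of view
```

A deceptor that works on the detector's partition could add a filter based on `classify_by_target` before accepting a suggested update, which is one way to perform the extra check mentioned above.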

Experimental evaluation
This section reports on an experimental evaluation of the community deception approaches devised for directed networks. We set three main goals. The first is to assess the feasibility of community deception approaches in directed networks. In particular, we want to gain some insight into how effective our approaches, which rework the state of the art in a directed network context, are. The second goal is to compare community deception approaches for directed networks with the state of the art, which has focused on undirected networks. The comparison will shed further light on the hiding capabilities of our novel techniques. The third goal is to assess the scalability, in terms of running time, of our novel approaches and to draw a parallel with deception in undirected networks. We also measure the impact of the deception strategies on the whole community structure by measuring the similarity between communities before and after deception. In what follows, we describe the experimental setting (Sect. 5.1) and the datasets (Sect. 5.2), and then report on the experimental results. The algorithms have been implemented in Python. Code and datasets are available online.

Experimental setting
To introduce the experimental setting, we refer to the general framework outlined in Sect. 1 and provide details about the actors involved.

Detectors
We considered a variety of community detection algorithms (detectors) that act as adversaries to the deception techniques. To make the comparison meaningful for our context, we focus on approaches that work on directed networks. We considered the following algorithms available in the cdlib library:
• Leiden (Traag et al. 2019) (leiden): a community detection algorithm that corrects for some issues of the Louvain algorithm (Blondel et al. 2008) and can work on directed networks.
• Directed modularity (Leicht and Newman 2008) (dm): this algorithm is an extension of the modularity maximization algorithm devised for undirected networks (Newman 2004).
• Surprise community (Traag et al. 2015) (surprise): this algorithm uses the notion of asymptotic surprise, which assesses the quality of the partition of a network into communities.
• InfoMap (Rosvall and Bergstrom 2008) (infomap): a detection algorithm that leverages information theory (the shortest description length for a random walk) to return a community structure.
• Gemsec (Rozemberczki et al. 2019) (gemsec): an approach that leverages random walks to approximate the point-wise mutual information matrix obtained by pooling normalized adjacency matrix powers. This matrix is decomposed by an approximate factorization technique, which is combined with a k-means-like clustering cost.

Deceptors
To tackle community deception in directed networks, we considered the following two categories of deceptors:
• Approaches devised for undirected networks: to make the experiments possible for these approaches, we treat the directed network under consideration as undirected. We considered:
- Delete Internal Connect External (Waniek et al. 2018) (DICE): this community deception algorithm is based on the heuristic of deleting intra-community edges and adding inter-community edges. DICE is based on the assumption that such kinds of edge updates always minimize modularity.
- Modularity Minimization (Fionda and Pirrò 2018) (modMin): this approach corrects for some issues with DICE; the authors of modMin showed that, in some cases, DICE fails to perform edge updates that minimize modularity.
- Safeness-based deception (Fionda and Pirrò 2018) (SAF): this approach introduces safeness maximization for community deception.
- Permanence-based deception (Mittal et al. 2021) (NEUR): this approach is based on permanence minimization.
- Random edge updates (RND): we consider an approach that randomly selects both the type of update and the endpoints of the edge addition/deletion.
• Approaches devised for directed networks: for this category of deceptors, we consider edge direction. In this case, we considered all the novel approaches described in the present paper.
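To make the random baseline concrete, the following is a minimal sketch of a single RND-style update on an undirected edge set. It assumes that each update starts from a randomly chosen member of C and that additions and deletions are equally likely; these choices, the function name `random_update`, and the representation of edges as `frozenset` pairs are illustrative assumptions, not the implementation used in the experiments.

```python
import random


def random_update(edges, nodes, target, rng):
    """Apply one random edge addition or deletion that touches a member of `target`.

    `edges` is a set of frozenset({u, w}) pairs and is modified in place.
    Returns (kind, edge), or None when the chosen update type has no candidate.
    """
    u = rng.choice(sorted(target))
    neighbours = {w for e in edges if u in e for w in e if w != u}
    if rng.random() < 0.5:
        kind, candidates = "add", sorted(set(nodes) - neighbours - {u})
    else:
        kind, candidates = "del", sorted(neighbours)
    if not candidates:
        return None
    edge = frozenset((u, rng.choice(candidates)))
    if kind == "add":
        edges.add(edge)
    else:
        edges.remove(edge)
    return kind, edge


# Hypothetical toy input.
rng = random.Random(0)
edges = {frozenset((0, 1)), frozenset((1, 2)), frozenset((2, 3))}
update = random_update(edges, nodes={0, 1, 2, 3, 4}, target={0, 1}, rng=rng)
```

Because the update type and the endpoints are picked without looking at any objective, such a baseline gives a lower bound against which the guided deceptors can be compared.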

Datasets
As this paper aims to introduce deception for directed networks, we focused on various real directed social networks from a wide range of domains. These networks are available online. Table 5 gives an overview of the networks considered. The table also reports, for each network, the number of communities found by the detectors considered. We note that some of the detectors could not complete community detection on the larger networks within a timeout of 3 h.

Evaluation methodology
To test deception algorithms, we refer to the methodology introduced in our previous work (Fionda and Pirrò 2018). As indicators of performance, we measure:
• Deception Score: this score, which ranges between 0 and 1, combines a measure of reachability preservation among C's members, community spread (in how many communities C's members are spread), and community hiding (C's members should be included in the largest communities) (Fionda and Pirrò 2018).
• Normalized Mutual Information (NMI) (Danon et al. 2005): this is a measure that we use to check how deception affects the original community structure. In particular, given the community structure before deception C and the community structure after deception C′, we have NMI(C, C′) ∈ [0, 1].
• Running time: we also measured the deception running time for the various algorithms, without considering the time to find communities.
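For reference, NMI between two partitions of the same node set can be computed as twice the mutual information divided by the sum of the two entropies, which is the normalization used by Danon et al. (2005). The sketch below is a plain-Python illustration under the assumption that both partitions are lists of non-empty, non-overlapping node sets covering the same nodes; the function name `nmi` is illustrative.

```python
from math import log


def nmi(partition_a, partition_b):
    """Normalized mutual information between two partitions (lists of node sets)."""
    n = sum(len(c) for c in partition_a)
    # Entropy of each partition, seen as a distribution over community labels.
    h_a = -sum(len(c) / n * log(len(c) / n) for c in partition_a)
    h_b = -sum(len(c) / n * log(len(c) / n) for c in partition_b)
    mutual = 0.0
    for a in partition_a:
        for b in partition_b:
            overlap = len(a & b)
            if overlap:
                mutual += overlap / n * log(overlap * n / (len(a) * len(b)))
    if h_a + h_b == 0:
        return 1.0  # both partitions consist of a single community
    return 2 * mutual / (h_a + h_b)


before = [{0, 1}, {2, 3}]
same_structure = nmi(before, [{0, 1}, {2, 3}])   # identical partitions
unrelated = nmi(before, [{0, 2}, {1, 3}])        # statistically independent partitions
```

Values close to 1 indicate that deception left the community structure largely intact, while values close to 0 indicate that it was substantially reshuffled.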
Related pieces of work (Mittal et al. 2021) considered community spread and community hiding separately. However, we believe that reachability is also relevant and that a good deceptor should be evaluated on all the above components simultaneously.
To pick the target community C, we looked at the distribution of the size of the communities. For each detection algorithm, we considered different C (one for each experiment round) having sizes close to the center of the distribution. Experiments have been conducted on a PC with an i5 CPU at 3.0 GHz (4 cores) and 16 GB of RAM. The results reported are the average (with 95% confidence interval) of 5 runs.
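The text does not state how the 95% confidence interval is obtained. One common choice for a small number of runs, assumed here purely for illustration, is a Student-t interval around the sample mean; with 5 runs (4 degrees of freedom) the two-sided 95% critical value is approximately 2.776. The function name and the example scores are hypothetical.

```python
from statistics import mean, stdev

# Two-sided 95% Student-t critical value for 4 degrees of freedom (5 runs).
T_975_DF4 = 2.776


def mean_ci95_five_runs(scores):
    """Return (mean, half-width of the 95% confidence interval) for exactly 5 runs."""
    if len(scores) != 5:
        raise ValueError("this sketch hard-codes the critical value for 5 runs")
    half_width = T_975_DF4 * stdev(scores) / len(scores) ** 0.5
    return mean(scores), half_width


# Hypothetical deception scores from five experiment rounds.
avg, half = mean_ci95_five_runs([0.1, 0.2, 0.3, 0.4, 0.5])
```

The interval is then reported as `avg ± half`; a different number of runs would need the corresponding critical value.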

Evaluating directed community deception
We start with a discussion about the performance of the novel community deception approaches for directed networks presented in terms of deception score and NMI.

Deception score
Figure 4 shows the results in terms of deception score for medium-size networks. The figure reports, for each column, the network considered, and for each row, the deception score measured as the capability of deceiving a specific detection algorithm. By looking at this figure, we make the following observations.
On small networks (e.g., Facebook), obtaining larger values for the deception score seems easier. In general, the deception score always reaches a value greater than 0.5. This can be considered a reasonably good value because the initial deception score was 0 (that is, C was completely revealed). The deception score increases as the number of edge updates increases; this is consistent for all deception algorithms, detection algorithms, and networks. dsaf performs better than the other algorithms in almost all settings. One exception is the leiden detection algorithm, where dmod seems to perform slightly better than dsaf. dper seems to be the least performing deception algorithm. We looked into the deception score's community spread and reachability components to shed light on this behavior. In several cases, the edge updates suggested by dper are internal edge deletions that end up disconnecting the community when the number of edge updates increases. gemsec seems to be a relatively robust detection algorithm against all three deception approaches. This is especially true in the Anybeat network where, when the number of edge updates is below 60% of the number of edges in C, the deception score remains quite low.
Figure 5 shows the results for the largest networks considered. We note that only leiden and dm were able to complete the community detection task within a timeout of 3 h. Even in larger networks, it can be observed that larger budget values (x-axis) correspond to larger values of the deception score. In particular, with a budget equal to 60% of the total number of edges in the community, the deception score is greater than 0.5 in all cases. We recall that experiments were conducted in the worst-case scenario, with a pre-deception deception score equal to 0 (C completely revealed). However, it is reasonable to assume that the initial deception score is larger; in reality, when deception algorithms are applied, C is not completely revealed. The larger the network, the more difficult it becomes to hide C: with the same budget percentage, the deception scores are lower. Digging further into the results, we observe that for Epinions the number of communities found by leiden and dm is 795 and 896, respectively. The average size of the C considered in the experiments is around 400; hence, with 160 updates, dsaf can achieve a score greater than 0.5.
On the largest network, google, the number of communities found by leiden is 2105, and the average size of the C considered in the experiments is 500. In this case, with 300 updates, dsaf can reach a deception score greater than 0.5. We again note that leiden was the only algorithm able to complete the detection task within a 3 h timeout.
Even in this case, we observe that the least effective approach is dper, which performs around 20% and 15% worse than dsaf and dmod, respectively. One interesting case is the Academia social network, where nodes represent members that follow other members (hence the network is directed). On the one hand, we observe that dsaf, even with lower budget values, can achieve a more significant result than the other approaches. On the other hand, we observe that dsaf seems to reach a saturation point where further edge updates do not add any benefit. The same is not true for dmod and dper, the performance of which increases significantly when moving from a 30% to a 60% budget. Here, the average size of C is 100.
By considering the results from the perspective of detection algorithms, we observe that leiden is more robust to the deception strategies than dm. The reason for this can be found in the fact that leiden finds communities by ensuring that they are well-connected, while dm is an adaptation of modularity optimization to the directed case. Interestingly, the direct competitor of dm would be leiden, also based on directed modularity. However, dsaf consistently performs better both on medium and large networks.
One final observation concerns the characteristics of the community C considered and the category of edge updates most performed by the deception strategies. We note that the lower the number of intra-C edges, the easier it becomes to hide C. This sounds natural, as more intra-C edges reinforce the notion of community itself, thus making it more challenging to separate the nodes within it and, in turn, to increase the deception score.

Normalized mutual information (NMI)
The second dimension of the evaluation concerns the impact that community deception has on the original pre-deception communities. Figure 6 reports the values of the NMI. Each column represents a deception algorithm, where the x-axis represents one of the networks and the y-axis the value of NMI.
The values of NMI for medium networks are always around 0.8, meaning that most of the community structure is preserved after applying deception. More specifically, dsaf appears to be the deception algorithm that best preserves the community structure, followed by dmod. The detection algorithms that seem to suffer less from deception in terms of NMI are surprise and infomap. On larger networks (Fig. 6(b)), the NMI values are a bit higher in general. Still, dper is the deception approach that most changes the community structure. To shed more light on the results, we investigated the relationship between the number of communities before and after deception (referred to as Δ). Figure 7 reports the results for medium-size networks when the budget is set to 60% of the number of edges in C. We note that the number of communities after deception decreases; this is always true for the leiden, infomap and gemsec detection algorithms.
By relating the Δ reported in Fig. 7 to the initial number of communities found by each detection algorithm and reported in Table 5, we observe that for the Anybeat network, the number of communities significantly increases (the initial number was 126 for leiden and becomes 167). For the surprise detection algorithm, the largest increase in the number of communities is observed in the email network (almost 100 additional communities) for dsaf. In this case, dmod and dper decreased the overall number of communities. By looking at the deception score related to this case (see Fig. 4), we note that dsaf obtained a higher score than dmod and dper. The explanation for this improvement is the significant change in the number of communities, which in turn corresponds to an increase in the community spread, that is, the number of communities across which C's members are scattered; indeed, this value went from 1 (the initial setting) to 67. A similar observation can be made for the WikiVote network and the leiden detection algorithm; here, dsaf added a larger number of communities, which resulted in a more significant deception score than dmod and dper.
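The two quantities used in this analysis are simple to compute from the detector's output. The sketch below assumes that a community structure is a list of node sets and that Δ is taken as the number of communities after deception minus the number before; the sign convention, the function names, and the toy partitions are assumptions for illustration.

```python
def community_spread(target, partition):
    """Number of detected communities that contain at least one member of `target`."""
    return sum(1 for community in partition if community & target)


def delta_communities(before, after):
    """Variation in the number of communities (after minus before)."""
    return len(after) - len(before)


# Hypothetical toy example: C is fully revealed before deception.
target = {0, 1, 2, 3}
before = [{0, 1, 2, 3}, {4, 5, 6}]
after = [{0, 1}, {2, 4}, {3, 5, 6}]

spread_before = community_spread(target, before)
spread_after = community_spread(target, after)
delta = delta_communities(before, after)
```

In this toy case the spread grows from one community to three while the structure gains one community, mirroring on a small scale the relation between Δ and community spread discussed above.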
Figure 8 reports the community variation for larger networks. We observe a similar behavior: the larger the number of new communities after deception, the larger the deception score. This is especially true for Epinions and Social-Net with the leiden detection algorithm and the dsaf deception algorithm. We note that dsaf reaches a deception score of 0.7 (see Fig. 5).

Comparison with undirected deception
We now compare our novel approaches for community deception in directed networks with the state of the art. These approaches were not designed to work on directed networks, which sounds like a limitation. Indeed, several real-world networks, such as those considered in our evaluation, are directed, which underlines the importance of adding directions in social network relations. For example, the Academia network represents follower-followee relations that naturally carry a direction. To run the experiments with the state-of-the-art deception algorithms, we treated the networks as undirected and considered the same C. In these experiments, we focus on a budget of updates equal to 60% of the edges of C, as this configuration worked best for all approaches. In what follows, we report on the comparison in terms of deception score and running time.
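As a small worked example of how such a budget can be derived, the sketch below counts the directed edges with both endpoints in C and takes a fraction of that count. Interpreting "the edges of C" as intra-C edges and rounding up to the next integer are assumptions made for illustration, as are the function names and the toy network.

```python
from math import ceil


def intra_edges(target, edges):
    """Directed edges whose source and destination both belong to `target`."""
    return {(u, w) for (u, w) in edges if u in target and w in target}


def update_budget(target, edges, fraction):
    """Number of allowed updates as a fraction of the intra-target edges (rounded up)."""
    return ceil(fraction * len(intra_edges(target, edges)))


# Hypothetical toy network: three of the five edges lie inside C = {0, 1, 2}.
edges = {(0, 1), (1, 0), (1, 2), (2, 3), (3, 0)}
budget = update_budget({0, 1, 2}, edges, fraction=0.6)
```

Note that (0, 1) and (1, 0) count as two distinct edges here; after collapsing the network to its undirected version they would count as one, so the same percentage can translate into a different absolute budget.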

Deception score
We compare the deception score of directed and undirected community deception approaches on both medium and large networks. Figures 9 and 10 report results on medium-size networks. Each figure dedicates a column to a detection algorithm. Moreover, in each subfigure, the x-axis represents a network while the y-axis is the deception score.
In almost all networks, the directed approaches perform better than the undirected approaches. This is true for all detectors but leiden. Here, we observe that for the Facebook network, the undirected approaches (excluding the random edge update approach) perform better. To shed more light on this aspect, we looked into the difference between the number of communities after and before deception. In this case, the undirected approaches introduced a larger number of communities than the directed ones. This relation between the number of communities after deception and the deception score was also observed when focusing on directed approaches alone (see Sect. 5.4.2).
When moving to the large networks (Fig. 11), we note a clear superiority of the directed approaches. This is especially true for the Epinions network. When considering leiden, the best performing approach was dmod while with leiden, dsaf obtained slightly better results. One crucial observation is that the undirected algorithms performed significantly worse, reaching a deception score greater than 0.5 in only a few cases. As one would expect, the worst-performing deceptor is RND, which adds/removes edges randomly starting from C's members. Also, NEUR seems to perform worse than the other undirected approaches. To shed more light on this behavior, we looked again at the changes in the community structure and the structure of C; even in this case, we observed that NEUR frequently performs internal edge deletions that disconnect C.

Deception with ground truth communities
In this section, we investigate the impact of deception and detection techniques on networks for which the ground-truth communities are available. To do so, we do not generate artificial networks and communities but resort to a real-world network of emails for which the communities are available. We are aware that Peel et al. (2017) observed that working with planted communities does not reflect the true data-generating process for real networks, which is typically unknown. However, we still believe that the analysis can shed light on how detection algorithms approach these communities before and after applying deception. We considered the email network available from the SNAP repository, which represents communication between members of an
For this experiment, we proceeded as follows. Given a community detection algorithm D, we considered each of the communities returned as the target community C; the number of communities is reported in Table 5. Then, we considered a budget of updates equal to 50% of the number of nodes in C. In this setting, we have three sets of communities: (i) C G : ground-truth communities; (ii) C B : communities returned by D before applying a deception algorithm; (iii) C A : communities returned by D after applying the deception algorithm. We measured the average deception score and NMI values.
Table 6 reports results in terms of deception score. From the table, it emerges that the performance of deception algorithms is consistent with the results observed in Fig. 10, where a smaller number of communities (one for each of the 5 experiment rounds) were tested as C. Even in this case, dsaf outperforms all the competitors, with approaches devised for undirected networks offering inferior performance. An interesting case is RND, which performs worse than before. This indicates that the deception strategies do not heavily depend on the particular community chosen. However, we noticed that when the size of C is small, it is, in general, easier to obtain larger values of deception.
We now discuss the different values of the NMI score, starting from the analysis of the difference between ground-truth communities and communities returned by a detection algorithm, that is, communities before applying deception algorithms.
Table 7 shows that NMI values are above 0.75, witnessing a quite high level of similarity between the ground-truth communities and the communities found by each detection algorithm. This experiment provides insights into the performance of detection algorithms on this particular network, with leiden being the best-performing one. We now move to the analysis of the NMI values obtained by comparing ground-truth communities and communities after applying community deception techniques.
We observe from Table 8 that NMI values are much lower than those obtained when comparing ground-truth communities with communities returned by a detection algorithm before applying deception techniques. As an example, for the leiden detection algorithm and the dsaf deception algorithm, which was the best performing deception algorithm, we note that values of NMI drop from 0.891 to 0.585 on average. This means that no matter which of the 42 ground-truth communities we choose as C, there will be a significant difference between the ground-truth communities and the communities returned after applying dsaf. The same reasoning applies to all other deception techniques. However, we have two observations. First, lower values of NMI do not always correspond to higher values of the deception score, which is what community deception ultimately strives to obtain. As an example, although the value of NMI for the DICE deception algorithm when considering communities returned by the dm detection algorithm is lower than that of dsaf, we observe that with the latter, a much larger value of the deception score was obtained (see Table 6). This reasoning is evident when considering the RND deception strategy, which adds and removes edges without any clear objective. In fact, while the NMI values are always above 0.5, the corresponding deception score values are very low.
The second observation is that the low NMI values after applying deception show that, although not specifically designed to hide the whole community structure, deception techniques have effects not only on C but also on the other communities. This comes as no surprise since hiding C, that is, moving its members across communities, changes the structure of each community that releases or receives C's members.

Deception running time
Our last set of experiments was devoted to investigating the running times of both directed and undirected deception approaches. This will indicate whether considering edge directions brings an additional cost. Moreover, we also provide an asymptotic complexity analysis of our novel deception strategies. In this set of experiments, for all approaches, we considered a budget of updates equal to 60% of the number of edges in C. Figures 12 and 13 show the running time for medium networks. Each column of the figures considers a detection approach. Moreover, in each chart, the x-axis represents a network. We observe that all approaches, both directed and undirected, run in a few seconds on the smaller networks (e.g., Freeman, Facebook). An exception is the modMin algorithm in the Facebook and Email networks when considering leiden and dm, respectively. We further investigated this behavior and hypothesized that it is difficult to exclude bridge edges from possible edge deletions. The same happens in the email network when considering leiden. We recall that both SAF and modMin try to exclude the deletion of internal bridge edges that would disconnect C.
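The bridge check mentioned above can be sketched as a connectivity test on the subgraph induced by C before and after a candidate deletion. The version below ignores edge direction (weak connectivity) and rebuilds the adjacency from scratch for each candidate, which is simple but costly when repeated for many candidates; both choices, and the function names, are assumptions for illustration rather than the strategy implemented by SAF or modMin.

```python
from collections import deque


def is_weakly_connected(nodes, edges):
    """True if the subgraph induced by `nodes` is connected when direction is ignored."""
    nodes = set(nodes)
    if not nodes:
        return True
    adjacency = {v: set() for v in nodes}
    for u, w in edges:
        if u in nodes and w in nodes:
            adjacency[u].add(w)
            adjacency[w].add(u)
    start = next(iter(nodes))
    seen, queue = {start}, deque([start])
    while queue:  # breadth-first search from an arbitrary member
        v = queue.popleft()
        for nb in adjacency[v]:
            if nb not in seen:
                seen.add(nb)
                queue.append(nb)
    return seen == nodes


def deletion_disconnects(target, edges, edge):
    """True if removing `edge` breaks a previously connected target community."""
    return is_weakly_connected(target, edges) and not is_weakly_connected(
        target, edges - {edge}
    )


# Hypothetical toy inputs: a directed cycle has no bridge, a directed path does.
cycle = {(0, 1), (1, 2), (2, 0)}
path = {(0, 1), (1, 2)}
cycle_ok = deletion_disconnects({0, 1, 2}, cycle, (0, 1))
path_broken = deletion_disconnects({0, 1, 2}, path, (1, 2))
```

A deceptor that wants to preserve reachability among C's members could discard any candidate deletion for which `deletion_disconnects` returns True.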
In general, attacks on the output of leiden and dm appear to be the most costly in terms of running time. We also observe that directed approaches, especially dper, require more time than undirected ones in most cases. This comes as no surprise since, from the analysis conducted in this paper (Sect. 4), finding the best edge updates requires a more involved formula where the edge direction and in/out node degrees play a significant role.
Figure 14 reports the running time on the largest networks. We observe that here the running time significantly increases; this is especially true for the GooglePlus network and the leiden algorithm, where the dper algorithm required almost 400 seconds to perform the updates. By further analyzing the numbers, we observed that for the largest network, that is, GooglePlus, the number of updates to be performed was on the order of hundreds. However, in reality, one can expect that communities that want to implement deception strategies would have a smaller size. This would be reasonable since coordinating among a large group of people can quickly become problematic; in fact, in a real scenario, edge updates need to be performed by friending/unfriending or following/unfollowing other nodes.
Asymptotic complexity analysis.For the sake of completeness, Table 9 reports the asymptotic complexity of the deception approaches analyzed in the paper.Note that the asymptotic complexity is the same for the approaches running on directed and undirected networks.The complexity of DICE and RND is not reported in the table since, at each iteration, the update is selected randomly.Thus, the theoretical complexity only depends on the number of updates ( ) that have to be performed.In the case of modularity minimization (row 1 in Table 9) the initialization (corresponding to the computation of , , and the (input/output) total degrees of the communities.Then, the best update to

Conclusions
Despite the plethora of approaches to discovering communities, there is not enough awareness that people can act strategically to evade such network analysis tools. This is particularly critical if those who want to evade such tools are malevolent users and those who run the tools are law enforcement. We introduced the problem of hiding a target community C from detection algorithms in directed networks. This problem is interesting for two main reasons. First, several real-world networks have edge directions. Therefore, discarding the directions would necessarily result in an information loss. This loss may affect community detection algorithms. This is why specific approaches to finding communities in directed networks have been devised. Second, community deception was only studied in undirected networks. We showed that when throwing out edge direction information, the state of the art fails to reach a reasonable level of hiding of C inside a community structure. We also showed that it is possible to restore performance similar to that obtained in the undirected scenario when considering direction-aware deception. Specifically, we presented three novel deception strategies. Our theoretical analysis shows that finding the best deception strategy in terms of edge updates is more involved because of the need to distinguish between incoming and outgoing edges for each node. Our extensive experimental evaluation indicates that deception in the directed case is feasible and strictly related to the number of novel communities introduced after applying a deception strategy. Moreover, directed deception is a bit more expensive but still scalable with the size of the network in terms of running time.
There are a number of future research directions. The first is studying deception in the context of network embeddings. Indeed, besides traditional community detection techniques, several approaches perform community discovery via (node and possibly edge) embeddings. Existing deception techniques are not suitable to work in such a setting. The main challenge here is that, while in a non-embedding setting one can study the impact of edge updates on some optimization functions (i.e., modularity minimization), understanding how updates are reflected in the embedding space is not trivial.
Another exciting line of future research is the investigation of how deception and social bots (Khaund et al. 2022) can benefit from one another. Since social bots mimic the social behaviors of humans, one could think of using social bots to automate the deception process. The analysis of deception as a cooperative and collective action (Yuce et al. 2014) is also worthy of investigation.
Moreover, we are also interested in applying deception in practice. Indeed, our algorithmic techniques need to be mapped into real-world networks like Facebook or Twitter. The challenge here is how to turn community deception into a collective effort from C's members that, instructed by deception algorithms, rewire updates according to a deception function. Note that while community detection algorithms require complete network knowledge, deception algorithms should ideally only need to know C's members and their links. In a network like Facebook, intra-C (resp., inter-C) edge deletions can be simply implemented by "Unfriending" some of C's members (resp., external members). In Twitter, the same behavior can be achieved by "Unfollowing" some of C's members (resp., external members). As for additions, in Facebook, which requires the acceptance of friendship requests, an intra-C edge addition would not represent a problem. Conversely, an inter-C edge addition, which requires discovering new network members, can be implemented by picking the target node among colleagues, famous people, classmates, or even random people (by sending several friendship requests). On Twitter, this would translate into just "Following" some network members. Understanding how to implement these policies "silently" is undoubtedly challenging.

Fig. 1 Community Deception: a general framework
Fig. 2 Example of a directed network (a) and its undirected version (b)
Fig. 4 Directed deception on medium-size networks
Fig. 8 Variation of the number of communities on large networks
Fig. 11 Comparison between directed and undirected deception approaches on large networks
Fig. 12 Running time (s) in medium networks
Fig. 13 Running time (s) in medium networks
Table 1 Notation table for undirected deception techniques
Table 2 Notation table for directed modularity
Table 3 Notation table for directed safeness
Table 4 Notation table for directed permanence
Table 6 Average deception score with ground-truth communities