Community detection in directed acyclic graphs

Some temporal networks, most notably citation networks, are naturally represented as directed acyclic graphs (DAGs). To detect communities in DAGs, we propose a modularity for DAGs by defining an appropriate null model (i.e., randomized network) respecting the order of nodes. We implement a spectral method to approximately maximize the proposed modularity measure and test the method on citation networks and other DAGs. We find that the attained values of the modularity for DAGs are similar for partitions that we obtain by maximizing the proposed modularity (designed for DAGs), the modularity for undirected networks and that for general directed networks. In other words, if we neglect the order imposed on nodes (and the direction of links) in a given DAG and maximize the conventional modularity measure, the obtained partition is close to the optimal one in the sense of the modularity for DAGs.


Introduction
Temporality and community structure are two common features present in various types of network data.Temporality of networks refers to nodes and links that vary over time.For example, a friendship link between a given pair of individuals is not always used even if they are close friends of each other.The link would be only occasionally active as the two individuals meet and then separate.Temporally varying networks are collectively called temporal networks [1].Community structure posits that nodes or links in networks can be classified into groups, called communities [2].Typically, a community is defined such that links are dense within a community and relatively sparse across different communities.Many networks in different domains have community structure.
The two features can be naturally combined into community detection in temporal networks, and several algorithms have been proposed to this aim.Examples include cost minimization when temporal non-smoothness is a part of the cost function [3,4], optimization under temporal smoothness constraints [5], methods based on the Potts model [6,7], clique percolation [8], decomposition of adjacency tensors [9], generalization of modularity for adjacency tensors [10][11][12], link clustering [13], and the minimum description length principle [14].
a The two authors contributed equally to the work.b e-mail: naoki.masuda@bristol.ac.ukRelated to temporal networks is the concept of directed acyclic graph (DAG).DAGs are directed networks without directed cycles.In DAGs, nodes can be positioned within a layer structure such that links only emanate from a node in a higher layer to a node in a lower layer (Fig. 1).DAGs have been common as a tool for statistical inference for decades [15,16].Equally importantly, we find various instances of DAGs in the real world such as some food webs [17], some dominance hierarchy networks [18,19], citation networks [20][21][22][23], family trees [24], and phylogenetic networks [25].
Temporal networks can be mapped to DAGs in at least two ways.First, citation networks, a type of temporal network, can be naturally mapped to DAGs.A citation network is a directed network in which a node represents an article such as a scientific paper, patent, or court decision, depending on the network, and a link is directed from the citing to the cited nodes [26].It is temporal in the sense that it grows over time due to the addition of new nodes and links [1].In principle, the links are directed backwards in time because newer nodes can cite older nodes but not vice versa, which makes the network a DAG.There may be links contradicting the arrow of time in real data sets, such as mutual citations [26], which would violate the definition of a DAG.However, these links are relatively few (see section 5.1 for exemplar numbers).
Second, a family of temporal networks can be mapped to DAGs, as schematically shown in Fig. 2. Consider a sequence of adjacency matrices indexed by time, i.e., (A (t) ) t=1,...,T , where t is discrete time, (A (t) ) ij = 1 if i and j are adjacent to each other in the tth snapshot, and (A (t) ) ij = 0 otherwise.By definition, (A (t) ) ij = 1 implies that i and j contacted each other at some (continuous) time contained in the time interval [t, t + 1).Such a representation of temporal networks as sequences of matrices can be induced by the temporal resolution of the recording device or by the aggregation of continuous-time temporal network processes over a finite time window to create a snapshot [1].In Fig. 2, each node i is duplicated T times, and each duplicated node is labelled (i, t), where 1 ≤ t ≤ T .Therefore, there are N T nodes in total in the representation shown in Fig. 2, where N is the number of nodes in the original temporal network.We draw a link from (i, t) to (i, t+1) for every node i (1 ≤ i ≤ N ) and 1 ≤ t < T .We also draw a link between two nodes in subsequent layers if they are connected in the corresponding time interval.In other words, we draw a link from (j, t) to (i, t + 1) if (A (t) ) ij = 1.In this way, a temporal network given as a sequence of adjacency matrices is uniquely mapped to a DAG in which links only span between two subsequent layers.This type of representation and its variants have been used in the analysis of temporal networks [27][28][29].
Community detection methods for temporal networks mentioned earlier in the present section are designed for the latter type of representation.However, to the best of our knowledge, community detection methods explicitly designed for the former type of temporal networks, or more generally DAGs, have not been proposed so far.Community structure in the former type of temporal networks has been studied using methods such as agglomerative hierarchical clustering [30], conventional undirected and directed modularity [21,31], and the Infomap method [32].These detection methods are designed for static networks.They neither incorporate the acyclic nature of citation networks nor temporal information such as the publication dates of articles.In these studies, temporal dynamics were analyzed once communities were obtained by the static methods.Therefore, if we apply existing methods, we have to map a temporal network of the former type to a static network by discarding the temporal information or to a snapshot representation.Both types of mapping seem to be suboptimal given the natural representation of the original network as a DAG.
In the present study, we develop a community detection method which exploits the intrinsic temporality in the former type of temporal networks.Technically, we propose a community detection method for general DAGs.We do so by developing a modularity measure and a maximization method for DAGs.A key to this development is the observation that the choice of a null model network characterizes communities to be detected.Null models randomize links while preserving some properties of the original network.Communities obtained by modularity maximization therefore are structures that are statistically surprising relative to the null model.Depending on the class of networks, specific null models have been proposed.Examples include networks without multilinks and selfloops [33], weighted networks [34], directed net-works [35,36], multi-partite networks [37][38][39], spatial networks [40], networks with a similarity measure imposed between nodes [41], networks directly formed by correlation matrices [42], and multilayer networks [10][11][12] in which temporal networks are included as a special case.

DAG
Let us denote a directed network by G = (V, E) with |V | = N nodes and |E| = M directed links.We assume that G is a DAG.We also assume that G has L layers, that node i (1 ≤ i ≤ N ) belongs to layer i (1 ≤ i ≤ L), and that any directed link from j to i satisfies j > i (Fig. 1).In words, any link emanates from a node in the upper layer (i.e., larger layer number) to a node in the lower layer (i.e., smaller layer number).No directed link exists between nodes in the same layer.A layer may include multiple nodes.For example, in Fig. 1, layers −1, , and +1 contain three, two, and three nodes, respectively.No order is imposed between nodes in the same layer.
Our definition is motivated by empirical observations that it is often more appropriate not to impose an order in subsets of nodes.For example, in citation networks, articles published on the same date cannot cite each other unless the two papers coordinate their citations [21][22][23]26].In dominance hierarchy networks of animals, it may be difficult to rank individual animals (i.e., nodes) if they have similar physical strengths and no prior interaction, which would be the case in large colonies.
We can always give such a layering to a given DAG although layering is not unique in general [43].In the remainder of the present article, we simply refer to a DAG supplied with a proper layering { 1 , . . ., N } as a DAG.

Modularity
Communities detected through modularity maximization depend on the null network model that serves as a baseline for defining the modularity.In this section, we first briefly review modularity measures for undirected and directed networks allowing for cycles.Then, we formulate modularity for DAGs by proposing a new null model tailored to the case of DAGs.

Modularity for undirected networks
The modularity for undirected networks is defined by where A ij is the (i, j) element of the adjacency matrix, and 2,44].The adjacency matrix is defined by A ij = 1 if nodes i and j are adjacent by a link and A ij = 0 otherwise.It is symmetric 1. Schematic of a DAG magnified around layer .The indices − 1, , + 1 represent three subsequent layers from lower to upper.The filled circles represent the nodes in layer .The quantities µ and λ are given by the number of links passing through the corresponding vertical lines.(i.e., A ij = A ji ) for undirected networks.Function δ is the Kronecker delta, and c i represents the index of the community to which node i belongs.In Eq. ( 1), the frequency of links within communities, corresponding to A ij δ ci,cj in Eq. ( 1), is compared against that for the null model, corresponding to k i k j δ ci,cj /2M .Under the null model, the expected number of links between nodes i and j is equal to . In other words, the null model is the configuration model in which each node has the same degree as in the original network.

Modularity for directed networks
The modularity for directed networks is defined by where k in i and k out j are the in-degree of node i and the out-degree of node j, respectively [36,45].In general, A is allowed to be asymmetric, and A ij = 1 if there is a link from node j to i and A ij = 0 otherwise.In Q dir , the number of intra-community links is compared against the null model in which the expected number of links emanating from node j to i is equal to P dir ij = k in i k out j /M .In other words, the null model is the directed variant of the configuration model in which each node has the same in-and out-degrees as in the original network.

Modularity for DAGs
Although it is possible to apply Q und and Q dir to detect communities in DAGs, neither of them incorporates the partial order imposed on nodes in DAGs.Therefore, we propose a null model for DAGs by generalizing the random DAG model proposed in Refs.[22,23] (for other null models of DAGs, see Ref. [46]).In the random DAG model, links are randomized while preserving the in-and outdegree of each node and the order of nodes as specified by i .The difference between the present definition and that in Refs.[22,23] is that the former allows a layer to contain multiple nodes, whereas each node constitutes a single layer in the latter case.
Following the derivation in Ref. [22,23], we calculate the expected number of links P dag ij from node j to node i.We start by defining where 1 ≤ ≤ L. In words, µ is equal to the number of links from nodes in layers { , + 1, . . ., L} to nodes in layers {1, 2, . . ., − 1}, and λ is equal to the number of links from nodes in layers { + 1, + 2, . . ., L} to nodes in layers {1, 2, . . ., − 1}.Graphically, λ and µ are equal to the number of links crossing a section at (dotted line in Fig. 1) and one before layer (dashed line in Fig. 1), respectively.In the example shown in Fig. 1, λ = 3 and µ = 6.The in-degree of node i, k in i , can be visualized as k in i in-stubs attached to node i, where a stub is a half-link.Likewise for the out-degree.Joining an in-stub and an outstub yields a link.We calculate the expected number for an in-stub attached to node i to connect to node j, which we denote by p ij .Let us first consider the case j = i + 1.
In this case, we obtain because j possesses k out j out-stubs and there are µ i+1 out-stubs from which an in-stub attached to node i can choose.The probability that an in-stub of node i does not connect to a node in layer i + 1 is given by Now we assume in general that node j belongs to any layer j > i .Then, an in-stub of node i can connect to node j only if it does not connect to a node in layers i < < j .By iteratively applying the argument leading to Eq. ( 6), we find that the probability that an in-stub of node i does not to connect to any node in layers i < < j is given by Provided that the in-stub of node i does not connect to a node in layers i < < j , the probability that it connects to node j is given by Using Eqs. ( 7) and ( 8), we obtain Because node i has k in i in-stubs, the expected number of links from j to i is given by Using P ord ij , we define the modularity for DAGs, denoted by Q dag , by where P dag ij is given by 4 Spectral methods for modularity maximization Because maximization of modularity is a combinationally hard problem, many approximate algorithms have been proposed for maximizing Q und [2], whose variants are also used to maximize Q dir [2,36,47].Here we adapt the spectral method [48,49] to the case of Q dag .

Spectral method for undirected networks
Before describing the spectral method for Q dag , we review the method for Q und [48,49].The spectral method realizes community detection by iteratively bipartitioning a tentative community.
We initially partition the entire network into two groups of nodes.To this end, we use the fact that Q und is written in matrix form as where denotes the transposition, s = (s 1 • • • s N ) , s i ∈ {−1, 1}, and B und is an N × N symmetric matrix whose elements are given by B und ij Node i is classified to the first and second groups if s i = −1 and s i = 1, respectively.
Finding the vector s that maximizes Q und is an NPcomplete problem [50].A commonly applied heuristics is to relax elements of s to take continuous values and impose the normalization constraint s s = N .Then, Q und is maximized when s is the eigenvector associated with the largest eigenvalue of B und , which we denote by u.We carry out bipartition by setting The spectral method takes advantage of the fact that we can rapidly calculate u using the power method.In fact, the power method requires calculation of the product B und x for changing x.Although this computation may look computationally costly because B und is a dense matrix, B und x can be expressed as where k ≡ (k 1 , k 2 , . . ., k N ) [48,49].The inner product k x is calculated in time O(N ), while the calculation of Ax requires time O(N + M ).For sparse graphs, M = O(N ) such that B und x can be calculated in O(N ) time, accelerating the power method.
Once the network is partitioned into two tentative communities, we repeat selecting and bipartitioning one tentative community until Q und stops increasing.To decide whether to approve a proposed bipartition of a tentative community C into two groups of nodes C 1 and C 2 , one needs to calculate the change in modularity by the partitioning, which is given by In matrix form, ∆Q und is written as where s is a column vector with length |C|, and Bund is a |C| × |C| symmetric matrix whose elements are defined by Bund [48].When C is the entire network, ∆Q und , Bund , and s are equal to Q und , B und , and s, respectively.Under the constraint s s = |C|, Eq. ( 16) is maximized if s is the eigenvector associated with the largest eigenvalue of Bund , which we denote by ũ.Finally, we bipartition C by setting si = sgn(ũ i ) (i ∈ C).

Spectral method for directed networks
The spectral method for directed networks was implemented in Ref. [36].Here we briefly recapitulate their arguments.We can rewrite the modularity for directed network as ) is a symmetric matrix, we can maximize Q dir by replacing B und in Eq. ( 14) by B dir + (B dir ) and applying the same spectral method as that for Q und .
To calculate the change in modularity ∆Q dir caused by bipartitioning, we replace B und by B dir + (B dir ) in Eqs. ( 15) and ( 16).Then, we maximize ∆Q dir by following the same steps as for maximizing ∆Q und .

Spectral method for DAGs
The application of the spectral method to Q dag is straightforward.For Q dag , a single step in the power method can be accelerated as in the case of Q und and Q dir as follows.
First, we write where To find a decomposition of B dag + (B dag ) x similar to Eq. ( 14), we write where As discussed in detail in Ref. [23], Eq. ( 19) when i < j is identical to where, low (i) is the largest value of (≤ i ) such that λ = 0, and up (j) is the smallest value of (≥ j ) such that λ = 0.If there is no (≤ i ) or (≥ j ) such that λ = 0, we obtain low (i) = 1 and up (j) = L, respectively.In words, f ( i , j ) > 0 if and only if for every layer between i and j , we find at least one link penetrating the layer.Using Eq. ( 20), we decompose P dag ij in the case i < j and λ i+1 , . . ., λ j −1 = 0 as It should be noted that P dag ij = 0 otherwise.Because λ i+1 , . . ., λ j −1 = 0 if and only if low (i) = low (j) and up (i) = up (j), we can rewrite Eq. ( 21) as For each layer , we define and respectively.By combining Eqs. ( 22), ( 23) and ( 24), we obtain Finally, we decompose B dag x as Therefore, once κ in ( ), κ out ( ) 1≤ ≤L have been calculated beforehand, the computation of B dag x takes O(LN ) time, and likewise for (B dag ) x.
To implement repeated bipartitioning, we maximize the change in modularity, ∆Q dag , which a biparition of a community causes.We calculate ∆Q dag by replacing B und by B dag + (B dag ) in Eqs. ( 15) and ( 16).Then, we follow the same steps as for maximizing ∆Q und .

Results
In this section, we test the proposed spectral method for maximizing Q dag for several DAG data.We test the method against the spectral method maximizing Q und and Q dir and the so-called Louvain method, which is a non-spectral algorithm that heuristically maximizes Q und [51].We use the implementation of the Louvain method in the igraph package of R [52].In our implementation of the spectral method, we add a fine tuning step after every bipartitioning in a similar fashion to Refs.[48,49] (Appendix A).We do not apply this fine tuning to the Louvain method because this fine tuning is specialized to the spectral method.We also apply a postprocessing procedure after the spectral method or the Louvain method has terminated (Appendix B).To apply the spectral method and the Louvain method to maximize Q und , and also to calculate Q und for the final partition obtained by various methods, we ignore link direction, or equivalently, use A+A as the adjacency matrix of the undirected network.
To quantify similarity between two partitions of the same network, we calculate the Jaccard index for each pair of partitions [2].The Jaccard index is defined by where a 0 and a 1 are the number of node pairs that are classified to the same community in only one partition and in both partitions, respectively.If J = 1, the two partitions are exactly the same.If J = 0, they completely disagree.

Citation networks
As a representative example of temporal networks represented by DAGs, we study two citation networks of articles posted on the e-print archive arXiv.org.We use data on the High Energy Physics Phenomenology (HepPh) and High Energy Physics Theory (HepTh) sections of arXiv, which exhaustively cover the citations between January 1993 and April 2003 [53,54] (http://snap.stanford.edu/data/cit-HepPh.html(Accessed: February 20, 2015)).
The HepPh and HepTh data contain 34,546 and 27,770 nodes, and 421,578 and 352,807 links, respectively.There are four mutually exclusive types of nodes and links excluded from the following analysis.First, we discard 44 and 39 self-loops in the HepPh and HepTh networks, respectively.Second, we discard 2,622 and 1,475 links in HepPh and HepTh networks, respectively, which contradict the arrow of time (i.e., articles citing others newer than themselves).Third, in the publicly available data of the HepPh network, the information about the publication date is not provided for the articles posted after 11 March 2002.Because we cannot assign a date-based layer i to these articles, we remove 3,985 such articles.Fourth, we remove 74,246 links in the HepPh network that are incident to at least one node whose date information is missing.Although we have removed a fraction 74, 276/421, 578 = 0.176 of the links according to the fourth criterion, all removed links are those incident to the newest nodes.Therefore, the network restricted to the dates before 11 March 2002, which we effectively consider in the following, is not affected by the removal of these links.
After removing self-loops, links contradicting the arrow of time, nodes without the information about the Table 2. Modularity values for the citation networks.The results obtained by maximization of Q und , Q dir , and Q ord using the spectral method are shown in the rows labelled "sund", "s-dir", and "s-dag", respectively.The results obtained by maximization of Q und using the Louvain method are shown in the rows labeled "L-und".Nc denotes the number of communities.

Data
Method It should be noted that some layers had no node because no article was published on that date.This fact does not change the following results because Q dag and its optimization procedure described in the previous sections are not affected by empty layers.Some basic quantities of the two networks (i.e., largest weakly connected components) are summarized in Table 1.Modularity values obtained after the maximization procedure are summarized for the two networks in Table 2.A row corresponds to a combination of the maximized modularity measure (i.e., Q und , Q dir , or Q dag ) and the maximization method (i.e., spectral or Louvain).For example, s-und represents the spectral method for maximizing Q und .The columns correspond to the three modularity values measured at the end of the maximization procedure, given a modularity maximization method corresponding to a row.
The modularity values shown in Table 2 seem to be large.It turns out that, in both networks, a partitioning method designed for Q und (i.e., L-und for the HepPh network and s-und for the HepTh network) has yielded the largest value for not only Q und but also Q dir and Q dag .Therefore, for these networks, it is better to use a method designed for maximizing Q und to (also) maximize Q dag than a method designed for maximizing Q dag .Table 2 also indicates that the spectral method designed for maximizing Q dag (s-dag) does not fall far behind s-und or L-und in terms of the obtained Q dag value.In fact, regardless of the method, we obtained similar values of Q und , Q dir , and The similarity between each pair of the four optimized partitions for each network (Table 2) is shown in Table 3, where similarity is measured in terms of the Jaccard index.We find that the Jaccard indices are not very large, in particular for the HepTh network, although the different partitioning methods have yielded close modularity values, as shown in Table 2. Nevertheless, the Jaccard index may not be very large even between partitions with close modularity values obtained from the same stochastic partitioning algorithm applied to the same network.For example, the Jaccard index values between 0.5 and 0.9 reported in Ref. [55] are larger than but comparable with the present values.It has also been reported that partitioning results obtained from the same network yielding close modularity values can be fairly different [2,56].

Transcriptional regulation network
We study the compartmentalization of the transcriptional regulation network of the bacteria Escherichia coli (E. coli ) using publicly available data [57] (http://wws.weizmann.ac.il/mcb/UriAlon/index.php?q=e-coli-transcriptionnetwork (Accessed: February 17, 2015)).Nodes of this network are operons.Links are directed from the operon that encodes a transcription factor to an operon regulated by the transcription factor.The network, which is a DAG, contains 423 nodes and 519 directed links without self loops.We extracted the largest weakly connected component, which consisted of 328 nodes and 456 links.Layers were determined by a leaf-removal algorithm [43].In this algorithm, the nodes with out-degree zero are classified Table 6.Modularity values for five dominance networks.The results obtained with the optimal modularity maximization method are shown in rows labeled "o-und", "o-dir", and "odag".In the table, "-" indicates that the optimal method has not terminated.See the caption of Table 2 for the other legends.1.
The modularity values obtained from the four modularity maximization algorithms are shown in Table 4. Algorithm s-und produces the largest Q und , Q dir , and Q dag , similarly to the results for the HepTh network (Table 2).The Jaccard indices are shown in Table 5.They are much larger than those for the citation networks (Table 3), suggesting that the partitions obtained by the different meth-works) and do not solely depend on the size of the network.Therefore, different colonies may have different degrees of community structure.The table also indicates that s-dag realizes the largest Q dag values in all colonies except C1, among the four heuristic methods (i.e., s-und, s-dir, sdag, and L-und).Algorithm s-dag realizes the optimal Q dag value obtained by o-dag for C2 and C3.The Q dag value obtained by s-dag is also close to that obtained by odag for C6, whereas this is not the case for C1.Similarity among the partitions obtained by the different algorithms depends on colonies without clear patterns (Table 7).

Conclusions
We have proposed a modularity measure for DAGs and a spectral algorithm to maximize it by extending the spectral algorithm developed for undirected networks.We found that the obtained modularity values are rather independent of whether we used the proposed algorithm or other algorithms known for less restricted networks such as undirected or directed networks allowing for cycles.
The spectral method was presented as an example heuristic for maximizing the proposed modularity measure.The proposed modularity measure can be also maximized by other approximate methods, which may surpass the performance of the spectral method.
We acknowledge Steve Gregory for discussion and careful reading of the manuscript.L.S. acknowledges the support provided through DAAD.N.M. acknowledges the support provided through JST, CREST, and JST, ER-ATO, Kawarabayashi Large Graph Project.

A Fine tuning after bipartitioning
In the spectral methods for maximizing Q und , Q dir , and Q dag , we carried out the following fine tuning procedure every time after a community was bipartitioned.Suppose that we have bipartitioned a community C into two communities C 1 and C 2 .For every node in C, we tentatively moved the node to the opposite community and calculated the change in the targeted modularity value.Then, we adopted the attempted move that maximized the modularity under the condition that the modularity increased by the move.We repeated this procedure under the constraint that each node was moved at most once, until no further increase in modularity was possible.This fine tuning procedure slightly modifies that proposed for Q und [48,49].We modified the original procedure because it involved repetition of a procedure similar to the aforementioned one and did not terminate on our data when we attempted to maximize Q dag .

B Postprocessing
After the completion of modularity maximization with the spectral or Louvain method, we carried out the following postprocessing algorithm to enhance the targeted modularity value.Our postprocessing algorithm resembles that employed in other modularity maximization algorithms [51,58].
Step 1: For a given node i (1 ≤ i ≤ N ), we tentatively moved it to the community to which each adjacent node of i belonged.The move attempt that increased the modularity by the largest amount was adopted.We scanned the N nodes in random order.We neglected the direction of link when judging adjacency in the case of directed networks including DAGs.
Step 2: First, we selected the smallest community in terms of the number of nodes, called the seed community.Second, we tentatively merged the seed community with each of the remaining communities to which the seed community is directly connected by a link regardless of the direction.Third, we measured the modularity value.We adopted the merge attempt that yielded the largest modularity but only when the modularity increased by the merge.Then, among the community that had never been selected as seed community, we selected the smallest community as seed community and attempted merger.We repeated the same procedure until no further increase in the modularity occurred.
We applied step 1 and then step 2. We repeated the combination of the two steps ten times.We also verified that swapping the order of the two steps had little impacts on the final modularity value.

Fig. 2 .
Fig. 2. Schematic of the DAG representation of a temporal network, in which the link configurations between the same set of nodes change over time.The lower row shows snapshots of the network taken at times in [t−1, t) and [t, t+1), respectively.

Table 1 .
Properties of the networks analyzed in section 5.The number of nodes, that of links, and that of layers are denoted by N , M , and L, respectively.These values refer to those of the largest weakly connected component of each network.

Table 3 .
Jaccard indices between the partitions obtained for the citation networks.

Table 4 .
Modularity values for the E. coli transcriptional regulation network.See the caption of Table2for legends.

Table 5 .
Jaccard indices between the partitions obtained for the E. coli transcriptional regulation network.
network.We use the largest weakly connected component of each network.Layers are defined by the date of publication.