First Betti number of the path homology of random directed graphs

Path homology is a topological invariant for directed graphs, which is sensitive to their asymmetry and can discern between digraphs which are indistinguishable to the directed flag complex. In Erdős-Rényi directed random graphs, the first Betti number undergoes two distinct transitions, appearing at a low-density boundary and vanishing again at a high-density boundary. Through a novel, combinatorial condition for digraphs we describe both sparse and dense regimes under which the first Betti number of path homology is zero with high probability. We combine results of Grigor'yan et al., regarding generators for chain groups, with methods of Kahle and Meckes in order to determine regimes under which the first Betti number is positive with high probability. Together, these results describe the gradient of the lower boundary and yield bounds for the gradient of the upper boundary. With a view towards hypothesis testing, we obtain tighter bounds on the probability of observing a positive first Betti number in a high-density digraph of finite size. For comparison, we apply these techniques to the directed flag complex and derive analogous results.


Introduction
In applications, networks often arise with asymmetry and directionality. Chemical synapses in the brain have an intrinsic direction (see [20, §5]); gene regulatory networks record the causal effects between genes (e.g. [1]); communications in social networks have a sender and a recipient (e.g. [17]). A common hypothesis is that the structure of a network determines its function [13, 21], at least in part. In order to investigate such a claim, one requires a topological invariant which describes the structure of the network. To obtain such a summary for a digraph, one often symmetrises to obtain an undirected graph, before applying traditional tools from topological data analysis (TDA) (e.g. [12]). This potentially inhibits the predictive power of the descriptor, since the pipeline becomes blind to the direction of edges. In recent years, particularly in applications related to neuroscience (e.g. [2, 21]), researchers have explored the use of topological methods which are sensitive to the asymmetry of directed graphs.
A much-studied construction, for undirected graphs, is the clique complex (or flag complex): a simplicial complex in which the $k$-simplices are the $(k+1)$-cliques in the underlying graph. An obvious extension to the case of directed graphs is the directed flag complex [18]. This is an ordered simplicial complex in which the ordered $k$-simplices are the $(k+1)$-directed cliques: $(k+1)$-tuples of distinct vertices $(v_0, \dots, v_k)$ such that $v_i \to v_j$ whenever $i < j$. An important property of this construction is that it is able to distinguish between directed graphs with identical underlying, undirected graphs; it is sensitive to the asymmetry of the digraph.
Path homology (first introduced by Grigor'yan et al. [9]) provides an alternative construction which, while more computationally expensive, is capable of distinguishing between digraphs which are indistinguishable to the directed flag complex (e.g. Figure 1, cf. [5]). Moreover, the non-regular chain complex, from which path homology is defined, contains the directed flag complex as a subcomplex. Intuitively, the generators of the $k$th chain group of the directed flag complex are all the directed paths, of length $k$, such that all shortcut edges are present in the graph. By contrast, the $k$th chain group of the non-regular chain complex consists of all linear combinations of directed paths, of length $k$, such that any missing shortcuts of length $(k-1)$ are cancelled out.
Other desirable features of path homology include good functorial properties in an appropriate digraph category [10, 8] and invariance under an appropriate notion of path homotopy [10, Theorem 3.3]. Furthermore, path homology is a particularly novel method since it operates directly on directed paths within the digraph, rather than first constructing a simplicial complex. Rather than being freely generated by distinguished motifs, the chain groups for path homology are formed as the pre-images of the boundary maps. As such, finding a basis for the chain groups is often non-trivial, which complicates the understanding of how homology arises in a random digraph. Hence, it is desirable to develop an understanding of the statistical behaviour of path homology, both from an applied perspective and for its independent interest. Key questions include (as discussed for the clique complex by Kahle and Meckes [16]): when should one expect homology to be trivial or non-trivial; when homology is non-trivial, what are the expected Betti numbers; and how are the Betti numbers distributed? To date, traditional topological invariants enjoy a greater statistical understanding in the context of basic null models. In particular, Kahle showed the following:
Theorem 1.1 (Kahle [14, 15]). For an Erdős-Rényi random undirected graph $G \sim G(n, p)$, denote the $k$th Betti number of its clique complex $X(G)$ by $\beta_k$. Assume $p = n^{\alpha}$, then
In essence, this characterises the understanding that, in any given degree, random graphs only have non-trivial clique complex homology in a 'goldilocks' region, wherein graph density is neither too big nor too small. Moreover, the boundaries of this region are dependent on the number of nodes in the graph, scaling as a power law. Our primary contribution is a similar description for two different flavours of path homology, in degree 1.

Summary of results
As seen in Theorem 1.1, in order to obtain concise, qualitative descriptions, one often makes assumptions about how null model parameters depend on the number of nodes $n$. Then, one can show that a property $P$ holds with probability tending to 1 as $n \to \infty$. Under these conditions, we say that the property $P$ holds with high probability [15]. Moreover, in order to derive useful probability bounds, it is often necessary to prescribe a null model which is highly symmetric and depends on few parameters. Therefore, throughout this paper we will be focusing on an Erdős-Rényi random directed graph model, in which the number of nodes is fixed (at $n$) and each possible directed edge appears independently, with some probability $p$. Note, this model allows for the existence of a reciprocal pair of directed edges.
Although individual results are potentially stronger, the following theorems characterise the theoretical understanding that we will develop. Firstly, as with conventional homologies, the bottom Betti number of path homology, $\vec{\beta}_0$, captures the connectivity of the directed graph. Thus, we use a standard result due to Erdős and Rényi [7, 14] to prove the following.
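The null model described above can be sampled directly. The following is a minimal sketch using only the Python standard library; the function name `sample_er_digraph` and the representation of a digraph as a set of ordered pairs are assumptions made for illustration and are not taken from the accompanying MATLAB code.

```python
import random

def sample_er_digraph(n, p, seed=None):
    """Sample the edge set of a simple Erdős-Rényi digraph on nodes 0..n-1."""
    rng = random.Random(seed)
    edges = set()
    for i in range(n):
        for j in range(n):
            # No self-loops. The edges (i, j) and (j, i) are decided
            # independently, so reciprocal pairs (double edges) can occur.
            if i != j and rng.random() < p:
                edges.add((i, j))
    return edges
```

Each of the $n(n-1)$ ordered pairs is visited exactly once, so the number of edges is a binomial random variable on $n(n-1)$ trials.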

Theorem 1.2. For an Erdös
The same result holds for regular path homology.
Our primary contribution identifies a similar 'goldilocks' region for the first Betti number of path homology, $\vec{\beta}_1$.
The same result holds for regular path homology.
By way of justifying the assumption p(n in colour, against $\log(n)$ and $\log(p)$ along the two axes. We observe two transitions between three distinct regions in parameter space. There is an interim region, in which we observe mostly $\vec{\beta}_1 > 0$; when $p$ becomes too small we suddenly observe mostly $\vec{\beta}_1 = 0$, and likewise when $p$ becomes too large. On this plot, the boundaries between the three regions appear as straight lines. Hence a reasonable conjecture is that these boundaries follow a power-law relationship $\log(p) = \alpha \log(n) + c$. Therefore, following power-law trajectories through parameter space will allow us to derive either
Turning our attention to higher degrees, we provide weak guarantees for the asymptotic behaviour of $\vec{\beta}_k$, for arbitrary $k \geq 1$, at low densities. For comparison, in Section 5, we apply the techniques used to prove Theorem 1.3 in order to obtain analogous results for the directed flag complex. In Section 6, we summarise these results and compare path homology and the directed flag complex to more traditional symmetric methods. We provide Table 1 in which we record, for each of the homologies under consideration, the $\alpha$-region in which we know $\beta_1$ is either zero or positive, with high probability (assuming $p = n^{\alpha}$).
In Appendix A, with a view towards hypothesis testing, we derive a tighter explicit bound for $P(\vec{\beta}_1(G) > 0)$, which becomes useful when $p$ is large. In order to identify a given Betti number as statistically significant, against an Erdős-Rényi null model, one would usually resort to a Monte Carlo permutation test (e.g. [6]). This would require the computation of path homology for a large number of random graphs. For large graphs ($n \geq 100$ nodes), this is often infeasible, due to the computational complexity of path homology. However, if graph density falls into one of the regions identified by the results in Appendix A, one can potentially circumvent this costly computation.

Acknowledgements
The author would like to thank his supervisors Ulrike Tillmann and Heather Harrington for their support and guidance throughout this project. The author would also like to thank Gesine Reinert and Vidit Nanda for their helpful feedback on a prior draft. The author is a member of the Centre for Topological Data Analysis, which is funded by the EPSRC grant 'New Approaches to Data Science: Application Driven Topological Data Analysis' EP/R018472/1.

Data Availability
The code and data for the experiments of Appendix B, as well as an implementation of the algorithm described in Lemma A.7, are available at [3]. A copy of this repository is also included in the ancillary files of this arXiv submission. All code is written in MATLAB and data from the experiments is available in the .mat format.

Graph theory definitions and assumptions
For clarity, we present a number of standard definitions and assumptions that we will use throughout this paper. First, we fix our notation for graphs.
, where $V$ is an arbitrary set and
The density of a simple digraph $G = (V, E)$ is the ratio of edges present, relative to the maximum number of possible edges:
$\frac{\#E}{\#V\,(\#V - 1)}$. (2.1)
Assumption 2.2. Throughout this paper, unless stated otherwise, we assume that all digraphs $G = (V, E)$ are simple. This means that they contain no self-loops and contain at most one edge between any ordered pair of vertices.
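As a small illustration of the density of a simple digraph, the sketch below divides the number of edges by $n(n-1)$, the number of ordered pairs of distinct vertices. The function name `density` and the edge-set representation are hypothetical choices for this example.

```python
def density(n, edges):
    """Density of a simple digraph on n >= 2 nodes: |E| / (n (n - 1))."""
    if n < 2:
        raise ValueError("density needs at least two nodes")
    # A simple digraph has at most one edge per ordered pair of distinct
    # vertices, hence at most n (n - 1) edges in total.
    return len(set(edges)) / (n * (n - 1))
```

For instance, a digraph on 3 nodes with edges (0, 1), (1, 0) and (1, 2) has 3 of the 6 possible edges.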
Given a directed graph G, we make the following definitions to refer to subgraphs within G.
Definition 2.3. Given a digraph $G = (V, E)$, we make the following definitions.
(a) A subgraph is another graph
and node-set $V(G_1 \cup E_2)$, the smallest superset of $V(G_1)$ that contains all endpoints of edges in $E_2$.
(c) A (combinatorial) undirected walk is an alternating sequence of vertices and edges such that edges connect adjacent vertices, in either direction. That is, for each $i$, either
(d) A (combinatorial) directed walk is an undirected walk such that all edges are forward edges, that is $e_i = (v_{i-1}, v_i)$ for every $i$.
(e) A (combinatorial) directed/undirected path is a directed/undirected walk which never repeats vertices or edges, that is $v_i = v_j$ or $e_i = e_j$ implies $i = j$.
(f) A (combinatorial) directed/undirected cycle is a directed/undirected walk such that (2.4)
(g) The length of a walk is the number of edges it traverses, e.g. the length of $\rho$ in equation (2.3) is $n$.
(h) A double edge is an unordered pair of vertices $\{i, j\} \subseteq V$ such that both directed edges are in the graph, i.e. $(i, j), (j, i) \in E$.
Notation 2.4. (a) For vertices $i, j \in V$, we write i
Remark 2.5. Assumption 2.2 allows for the existence of double edges.

Analytic and algebraic definitions
Next, we provide definitions of 'Landau symbols', which we use to describe the asymptotic behaviour of two functions, relative to one another.
Definition 2.6. Given two functions $f, g : \mathbb{R} \to \mathbb{R}$ we write
Finally, we make a formal, algebraic definition, which will be required later in order to define path homology.
Definition 2.8. Given a ring $R$ and a set $V$, we let R V denote the $R$-module of formal $R$-linear combinations of elements of $V$. That is,
where $\{e_v \mid v \in V\}$ are formal symbols which form a basis of the free $R$-module R V.

Erdős-Rényi random graphs
Throughout this paper, we will primarily be investigating random directed graphs under an Erdős-Rényi model.
G as topological spaces by giving them the natural structure of a simplicial complex and delta complex respectively. Both of these structures have no simplices above dimension 1, so clearly
Lemma 2.12. Given a random directed graph $G \sim \vec{G}(n, p)$, the flat symmetrisation is distributed as $\bar{G} \sim G(n, \bar{p})$ where
$\bar{p} := 1 - (1 - p)^2$. (2.9)
Proof. A given undirected edge $\{i, j\}$ appears in $\bar{G}$ if and only if at least one of $(i, j)$ or $(j, i)$ is in $G$. Therefore
$P(\{i, j\} \notin \bar{E}) = P\big((i, j) \notin E \text{ and } (j, i) \notin E\big) = (1 - p)^2$. (2.10)
Hence the undirected edge appears with probability $1 - (1 - p)^2$. The existence of each undirected edge depends on the existence of a distinct pair of directed edges. Hence each undirected edge appears independently.
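The statement of Lemma 2.12 can be sketched in a few lines. The helper names `flat_symmetrisation` and `p_bar` are hypothetical, and edges are again assumed to be stored as ordered pairs.

```python
def flat_symmetrisation(edges):
    """Undirected edge set: {i, j} is present iff (i, j) or (j, i) is present."""
    # A double edge collapses to a single undirected edge.
    return {frozenset(e) for e in edges}

def p_bar(p):
    """Probability that a given undirected edge appears: 1 - (1 - p)^2."""
    return 1 - (1 - p) ** 2
```

For example, `p_bar(0.5)` is 0.75: an undirected edge is absent only when both directed edges are absent, which has probability 0.25.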
Definition 2.13. Throughout this paper, we define $\bar{p}$ as in (2.9), whenever the underlying $p$ is clear from context.
Note that asymptotic conditions on $\bar{p}$ do not differ significantly from asymptotic conditions on $p$.
Definition 2.15. Given an undirected graph $G = (V, E)$, (a) a $k$-clique is a subset of vertices $V' \subseteq V$, such that $\#V' = k$ and for any two distinct vertices $i, j \in V'$, the edge between them is present, i.e. $\{i, j\} \in E$; (b) the clique complex, $X(G)$, is a simplicial complex where the $k$-simplices are the $(k+1)$-cliques in $G$.
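Definition 2.15 translates into a brute-force enumeration. This is a sketch for small graphs only, assuming vertices are sortable labels and undirected edges are given as pairs; the name `k_cliques` is hypothetical.

```python
from itertools import combinations

def k_cliques(vertices, undirected_edges, k):
    """All k-cliques of an undirected graph, as sorted vertex tuples."""
    edge_set = {frozenset(e) for e in undirected_edges}
    return [
        subset
        for subset in combinations(sorted(vertices), k)
        # Every pair of distinct vertices in the subset must be joined.
        if all(frozenset(pair) in edge_set for pair in combinations(subset, 2))
    ]
```

The $k$-simplices of the clique complex are then given by `k_cliques(vertices, undirected_edges, k + 1)`.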
We now investigate the behaviour of these 'symmetric methods' on random directed graphs. Since the flat symmetrisation of a random digraph $\vec{G}(n, p)$ is a random graph $G(n, \bar{p})$ (by Lemma 2.12) and the asymptotics of $\bar{p}$ do not differ greatly from those of $p$ (by Lemma 2.14), Theorem 1.1 has an immediate corollary.

Corollary 2.16. For an Erdös
Next, we prove that if $p = p(n)$ shrinks too quickly then $\beta_1$ will vanish for $\bar{G}$ and $\overset{\bullet}{G}$, with high probability. This is a special case of the proof given by Kahle [14, Theorem 2.6]. We repeat the proof to illustrate that it can be applied to
Theorem 2.17. (2.12)
Proof. Note that the existence of an undirected cycle in $\bar{G}$ is a necessary condition for $\vec{\beta}_1(\bar{G}) > 0$. When taking the flat symmetrisation, any undirected cycle of length 2 in $G$ becomes a single edge, so the minimum cycle length is 3. Moreover, the vertices of a cycle must be distinct so the maximum length of a cycle is $n$. Therefore, it suffices to show that the probability that there exists an undirected cycle of any length $L \in [3, n]$ tends to 0. For each $L$, by a union bound, the probability of there being an undirected cycle of length $L$ is at most
Hence, the probability that there is an undirected cycle of any length is at most
By Lemma 2.14, the assumption $p = o(n^{-1})$ implies that $\lim_{n \to \infty} (n\bar{p}) = 0$. This ensures that the series (2.14) converges (at least eventually in $n$) and moreover the bound converges to 0 as $n \to \infty$.
To prove $\vec{\beta}_1(\overset{\bullet}{G}) = 0$ with high probability, all that remains is to bound the probability of there being an undirected cycle on 2 nodes (i.e. a double edge) in $G$. The probability that there is some double edge is at most (2.15)
The assumption $p = o(n^{-1})$ ensures that $n^2 p^2 \to 0$.
Finally, we investigate conditions under which we expect $\beta_1(\bar{G}) > 0$ and $\beta_1(\overset{\bullet}{G}) > 0$ with high probability, and determine the growth rate of
Moreover, (2.17)
Proof. Denoting the original digraph $G = (V, E)$, we deal with the flat symmetrisation first.
For convenience, we define $n_1 := \#\bar{E}$. The Euler characteristic of $\bar{G}$ can be computed either via the alternating sum of the Betti numbers or of the number of simplices [11] and hence we have an equation since there are no 2-dimensional simplices. This implies that Thanks to the condition on $p(n)$, using Lemma 2.14, we can see which rearranges to show > n and hence eventually we can use the inequalities (2.19) to conclude (2.23)
Since $n_1$ is a binomial random variable, on $n(n-1)$ trials each with probability $p$, we can bound the second moment Therefore we can bound further (2.24)
Taking the limit $n \to \infty$, we have seen that the first term tends to 1. The second term also tends to 1, since The case for the weak symmetrisation has an identical proof, except that E[#

Path homology of directed graphs

Definition
Path homology was first introduced by Grigor'yan et al. [9, 8]. The key concept behind path homology is that, in order to capture the asymmetry of a digraph, we should not construct a simplicial complex, but instead a path complex. In a simplicial complex, one can remove any vertex from a simplex and obtain a new simplex in the complex. This property may not hold for directed paths in digraphs; if we bypass a vertex in the middle of a path then we may not obtain a new path. However, we can always remove the initial or final vertex of a path and obtain a new path. This is the defining property of a path complex [9, §1]. Path homology can be defined on any path complex but for this paper we focus on the natural path complex associated to a digraph. Throughout this section we fix a ring $R$ and a simple digraph $G = (V, E)$.
Definition 3.1. We make the following definitions to classify sequences of vertices in $V$:
(a) Any sequence $v_0 \dots v_p$ of $(p+1)$ vertices $v_i \in V$ is an elementary $p$-path.
(b) An elementary path is regular if no two consecutive vertices are the same, i.e. $v_i \neq v_{i+1}$ for every $i$.
(c) If an elementary path is not regular then it is called non-regular or irregular.
(d) An elementary path is allowed if subsequent vertices are joined by a directed edge in the graph, i.e. $(v_i, v_{i+1}) \in E$ for every $i$.
Remark 3.2. An allowed path coincides with a combinatorial, directed walk.
Definition 3.3. The following $R$-modules are defined to be freely generated by the generators specified, for $p \geq 0$:
we define $e_{\tau} := e_{ab}$ as an alias for the basis element of $A_1$.
We can construct homomorphisms between the $\Lambda_p$.
where $v_0 \dots \hat{v}_i \dots v_p$ denotes the elementary $(p-1)$-path $v_0 \dots v_p$ with the vertex $v_i$ omitted. This defines $\partial_p$ on a basis of $\Lambda_p$, from which we extend linearly. In the case $p = 0$, we define which yields an element of $R$.
Lemma 2.4] and hence $\{\Lambda_p, \partial_p\}$ forms a chain complex. (b) Since we assume all digraphs are simple, there are no self-loops. Therefore, any allowed path must be regular and hence
In order to incorporate information about paths in the graph we would like a boundary operator between the $A_p$. However, the boundary of an allowed path may not itself be allowed, because it involves removing vertices from the middle of paths. To resolve this, we define an $R$-module, for each $p \geq 0$, called the space of $\partial$-invariant $p$-paths
Hence, we can make the following construction.
Definition 3.6. The non-regular chain complex is
where each $\partial_p$ is the restriction of the non-regular boundary map to $\Omega_p$.
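The classification of elementary paths in Definition 3.1 amounts to two simple predicates. In this sketch a path is a tuple of vertices and the edge set is a set of ordered pairs; the names `is_regular` and `is_allowed` are hypothetical.

```python
def is_regular(path):
    """True if no two consecutive vertices of the elementary path coincide."""
    return all(a != b for a, b in zip(path, path[1:]))

def is_allowed(path, edges):
    """True if every pair of consecutive vertices is a directed edge."""
    return all((a, b) in edges for a, b in zip(path, path[1:]))
```

With edges {(0, 1), (1, 0), (1, 2)}, the path (0, 1, 0) is regular and allowed, whereas (0, 0, 1) is irregular and (0, 2) is regular but not allowed.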
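To make the boundary map concrete, the sketch below implements the alternating-sum boundary on elementary paths that is standard in the path homology literature, $\partial e_{v_0 \dots v_p} = \sum_i (-1)^i e_{v_0 \dots \hat{v}_i \dots v_p}$, extended linearly. The representation of a chain as a dictionary from vertex tuples to integer coefficients, and the use of the empty tuple for the generator of $R$ in dimension $-1$, are assumptions of this example.

```python
from collections import defaultdict

def boundary(chain):
    """Non-regular boundary of a chain given as {path tuple: coefficient}."""
    result = defaultdict(int)
    for path, coeff in chain.items():
        for i in range(len(path)):
            # Omit the i-th vertex; the face enters with sign (-1)^i.
            face = path[:i] + path[i + 1:]
            result[face] += (-1) ** i * coeff
    # Drop faces whose coefficients cancelled.
    return {path: c for path, c in result.items() if c != 0}
```

Applying `boundary` twice to any chain returns the empty dictionary, which reflects the chain complex property. Note that the boundary of the allowed path (0, 1, 0) contains the irregular summand (0, 0), which is the kind of term that must cancel in an element of $\Omega_p$.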
Definition 3.7. The homology of the non-regular chain complex (3.8) is the non-regular path homology of $G$. The $k$th homology group is denoted
The rank of the $k$th homology group is
When computing $\Omega_p$, one regularly encounters paths $v \in A_p$ with irregular summands in their boundary. For example,
Since irregular summands are never allowed, these must be cancelled to obtain an element of $\Omega_p$. An alternative construction, which is featured more frequently in the literature, alters the boundary operator to remove these irregularities.
There is a projection map $\pi : \Lambda_p \to \mathcal{R}_p$ which sends every irregular path to 0. This allows us to make the following construction:
Definition 3.8. For each $p \geq 0$, the regular boundary operator $\partial^R_p : \mathcal{R}_p \to \mathcal{R}_{p-1}$ is defined by
With this new boundary operator we still have the issue that the boundary of an allowed path may not be allowed. Therefore, we again construct an $R$-module, for each $p \geq 0$, called the space of $\partial^R$-invariant $p$-paths.
One can check that, given any irregular path $v$, either $\partial v = 0$ or $\partial v$ is a sum of irregular paths [9, Lemma 2.9] and hence
Definition 3.9. The regular chain complex is
where each $\partial^R_p$ is the restriction of the regular boundary map to $\Omega^R_p$.
Definition 3.10. The homology of the regular chain complex is the regular path homology of $G$ and the $k$th homology group is denoted
We denote the Betti numbers for these homology groups by
(b) Since we augment the chain complex with $R$ in dimension $-1$, this is technically a reduced homology, but we omit additional notation for simplicity.
(c) As noted in [9, §5.1], given a subgraph $G' \subseteq G$ then, for every $p \geq 0$,
Notation 3.12. When $G$ is clear from context, we shall omit it from notation. If the coefficient ring $R$ is omitted from notation, assume that $R = \mathbb{Z}$.
Note that the primary difference between the regular and non-regular chain complex is the boundary operator. The difference between the boundary operators $\partial_p$ and $\partial^R_p$ affects the difference between the $R$-modules $\Omega_p$ and $\Omega^R_p$.
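The effect of the projection $\pi$ can be seen on a single path. The sketch below assumes that the regular boundary is the alternating-sum boundary followed by $\pi$, i.e. faces with a repeated consecutive vertex are discarded; the function name `regular_boundary` and the dictionary representation are hypothetical.

```python
def regular_boundary(path):
    """Regular boundary of one regular elementary path, as {face: coefficient}."""
    result = {}
    for i in range(len(path)):
        face = path[:i] + path[i + 1:]
        # The projection pi sends irregular faces to 0.
        if any(a == b for a, b in zip(face, face[1:])):
            continue
        result[face] = result.get(face, 0) + (-1) ** i
    return {f: c for f, c in result.items() if c != 0}
```

For the double edge path (0, 1, 0) the irregular face (0, 0) is removed, leaving (1, 0) and (0, 1) each with coefficient 1. For a path with three distinct vertices the two operators agree.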

Proof of Theorem 1.4
As an easy first step, we show that it is very unlikely that there are any long paths within the digraph, when graph density is too low. Therefore, for large $k$, $A_k$ becomes trivial and consequently
Proof. Note that it suffices to show that $P(A_N = \{0\}) \to 1$ as $n \to \infty$ because, if there are no allowed $N$-paths, then there are certainly no allowed $k$-paths. If there are no allowed $k$-paths then $\Omega_k = \{0\}$ and so
For $A_N$ to be non-trivial there must be some combinatorial, directed walk of length $N$. Equivalently, there must exist a combinatorial, directed cycle or a combinatorial, directed path of length $N$ (or both).
If $p = o(n^{-(N+1)/N})$ then certainly $p = o(n^{-1})$ and hence, following the proof of Theorem 2.17, the probability that there is a directed cycle tends to 0 as $n \to \infty$.
A combinatorial, directed path is a sequence of $N + 1$ distinct nodes, each joined by an edge in the forward direction. By a union bound, the probability that there exists such a sequence is at most
which, by the assumption on $p$, tends to 0 as $n \to \infty$.
Proof of Theorem 1.4. By Theorem 3.13, it suffices to note that
This theorem is very weak. For example, to obtain $\vec{\beta}_1 = 0$ with high probability, we require $p = o(n^{-2})$, in which case the expected number of edges in the digraph tends to 0. The weakness of this result stems from its reliance on the chain of inequalities
There is likely a region of graph densities wherein one or more of these inequalities is strict. Hence, in order to obtain stronger results, we require an understanding of $\Omega_k$, at the very least.
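The union bound in this step can be written out explicitly. The following is a sketch rather than the displayed formula of the source, under the assumption that ordered sequences of $N + 1$ distinct nodes are counted crudely by $n^{N+1}$ and that each fixed sequence forms a directed path with probability $p^N$:

```latex
\[
  \mathbb{P}\bigl(\text{some directed path of length } N \text{ exists}\bigr)
  \;\le\; n^{N+1} p^{N}
  \;=\; \bigl(n^{(N+1)/N}\, p\bigr)^{N}
  \;\longrightarrow\; 0
  \quad\text{when } p = o\bigl(n^{-(N+1)/N}\bigr).
\]
```

The rewriting as an $N$th power makes the role of the exponent $-(N+1)/N$ visible: the assumption on $p$ is exactly that the base of the power tends to 0.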

Chain group generators
Proposition 3.14 ([9, §3.3]). For any simple digraph $G = (V, E)$,
Proof. Certainly $\Omega_0 \subseteq A_0$ and $\Omega_1 \subseteq A_1$. Moreover, the boundary of any vertex is just an element of $R = A_{-1}$ and hence allowed. The boundary of any edge is a sum of vertices and any vertex is an allowed 0-path. Therefore $A_0 \subseteq \Omega_0$ and $A_1 \subseteq \Omega_1$.
We can also see that the non-regular chain complex is a subcomplex of the regular chain complex, which immediately implies an inequality between the Betti numbers. This subcomplex relation was first noted by Grigor'yan et al. [9, Proposition 3.16].
Proposition 3.15. For any simple digraph $G$, the non-regular chain complex is a subcomplex of the regular chain complex. In particular, for each $p \geq 0$, we have
Hence, if we project $\partial_p(v)$ onto $\mathcal{R}_{p-1}$ via $\pi$, we do not remove any summands. Therefore Certainly $v \in A_p$ and hence $v \in \Omega^R_p$. Since the two operators, $\partial_p$ and $\partial^R_p$, agree on $\Omega_p$, the non-regular chain complex is a subcomplex of the regular chain complex.
Proof. By Proposition 3.14, the two complexes coincide in dimensions 0 and 1 and hence rank ker
From this, it is easy to obtain a characterisation of the lowest Betti number in terms of a symmetrisation $\bar{G}$.

Proof of Theorem 1.2. A standard argument shows that
where $\#C$ is the number of weakly connected components of the digraph $G$. Note, $\#C$ coincides with the number of connected components of the symmetrisation $\bar{G}$. The result follows by Lemma 2.12 and a standard result due to Erdős and Rényi (see e.g. [7, 14]).
Unfortunately, higher chain groups do not enjoy such a concise description. However, when working with coefficients in $\mathbb{Z}$, it is possible to write down generators for $\Omega^R_2$, in terms of motifs within the digraph $G$. The following result was proved by Grigor'yan et al. [10].
Theorem 3.17 ([10, Proposition 2.9]). Let $G$ be any finite digraph. Then any $\omega \in \Omega^R_2(G; \mathbb{Z})$ can be represented as a linear combination of 2-paths of the following three types:
1. $e_{iji}$ with $i \to j \to i$ (double edges);
2. $e_{ijk}$ with $i \to j \to k$ and $i \to k$ (directed triangles);
3. $e_{ijk} - e_{imk}$ with $i \to j \to k$, $i \to m \to k$, $i \not\to k$ and $i \neq k$ (long squares).
Note further that each of the generators in Theorem 3.17 is an element of $\Omega^R_2$ and hence they form a generating set for $\Omega^R_2(G; \mathbb{Z})$. Note that elements of each type reside in mutually orthogonal components of $A_2$ because they are supported on distinct basis elements. That is, we can write
$\Omega^R_2(G; \mathbb{Z}) = D \oplus T \oplus S$
where $D$ is freely generated by all double edges $e_{iji}$ in $G$, and $T$ is freely generated by all directed triangles $e_{ijk}$ in $G$. The final component, $S$, is generated by all long squares $e_{ijk} - e_{imk}$ in $G$. However, they may not be linearly independent, for example, as seen in Figure 3,
Note that double edges are not $\partial$-invariant paths, i.e. $e_{iji} \notin \Omega_2$. However there are linear combinations of double edges which do belong to $\Omega_2$. For example, suppose $i \to j \to i$ and $i \to k \to i$, then $e_{iji} - e_{iki} \in \Omega_2$. It is possible to state a non-regular version of Theorem 3.17, in which all generators are elements of $\Omega_2$. This can be achieved by replacing double edge generators with such differences of double edges, which share a common base point. However, we omit this result, as it is not necessary for our main contribution.
Remark 3.19. (a) An alternative approach to computing rank $\Omega_2$ and rank $\Omega^R_2$ was first seen in [9, Proposition 4.2] and is explored further in Appendix A.2. (b) For the interested reader, Grigor'yan et al. [9, 8] prove more results which characterise relations between the $\Omega_p$.
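The three motif types of Theorem 3.17 can be enumerated by brute force on a small digraph. The sketch below assumes a simple digraph given as a set of ordered pairs, reads the long-square condition as requiring the shortcut edge from $i$ to $k$ to be absent, and lists each long square once per unordered pair of midpoints; the function name `two_path_generators` is hypothetical. It does not attempt to select a linearly independent subset of long squares.

```python
from itertools import permutations

def two_path_generators(vertices, edges):
    """List double edges, directed triangles and long squares of a simple digraph."""
    V, E = sorted(vertices), set(edges)
    double_edges = [(i, j, i) for i, j in permutations(V, 2)
                    if (i, j) in E and (j, i) in E]
    triangles = [(i, j, k) for i, j, k in permutations(V, 3)
                 if (i, j) in E and (j, k) in E and (i, k) in E]
    long_squares = []
    for i, k in permutations(V, 2):
        if (i, k) in E:
            continue  # a long square needs the shortcut i -> k to be absent
        mids = [j for j in V if j not in (i, k) and (i, j) in E and (j, k) in E]
        # Each unordered pair {j, m} of midpoints gives e_ijk - e_imk (up to sign).
        long_squares += [((i, j, k), (i, m, k))
                         for a, j in enumerate(mids) for m in mids[a + 1:]]
    return double_edges, triangles, long_squares
```

On the digraph with edges (0, 1), (1, 3), (0, 2), (2, 3) the only generator is the long square on the paths (0, 1, 3) and (0, 2, 3); adding the shortcut edge (0, 3) removes the long square and creates two directed triangles instead.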

Asymptotic results for path homology
Intuitively, we expect that the two transitions, identified in Figure 9, correspond to two distinct topological phenomena. When density becomes sufficiently large, cycles start to appear in the graph and $\ker \partial_1$ is non-trivial for the first time. Then, when density becomes too large, boundaries enter into $\Omega_2$ which begin to cancel out all of the cycles, removing all homology.
In the interim period, we expect that the number of cycles and the number of boundaries is approximately balanced. Therefore, in order to understand the lower boundary we should study $\ker \partial_1$ and in order to understand the upper boundary we should study $\operatorname{im} \partial_2$. In order to show that $\vec{\beta}_1 > 0$ in the 'goldilocks' region we should compare the growth rates of $\operatorname{rank} \ker \partial_1$ and $\operatorname{rank} \operatorname{im} \partial_2$, or some approximation thereof. Moreover we expect reasonable conditions on $p(n)$ to be of the form $p = o(n^{\alpha})$ or $p = \omega(n^{\alpha})$ for some $\alpha$, since conditions of this sort constrain $p(n)$ relative to straight lines through Figure 9.

Proof of Theorem 1.3(a)
In order to characterise the behaviour of $\vec{\beta}_1$ when it is non-trivial, we will follow the approach of Kahle and Meckes in [16, Theorem 2.4]. The approach is to use the 'Morse inequalities' which state, for any chain complex of finitely generated, abelian groups $(C_\bullet, d_\bullet)$, defining $n_k := \operatorname{rank} C_k$ and letting $\beta_k(C_\bullet)$ denote the Betti numbers, we have
It is easier to compute the rank of chain groups than the rank of homology groups. Hence, we use the limiting behaviour of $n_k$ to investigate the limiting behaviour of $\vec{\beta}_k$. First we will need estimates for $E[n_k]$.
Proof. The first two claims are clear since they count the expected number of nodes and edges in $G$, respectively. There is no difference between the regular and non-regular chain complex in dimensions 0 and 1.
We use Theorem 3.17 to compute bounds for $E[\operatorname{rank} \Omega^R_2]$ and then the bound on $E[\operatorname{rank} \Omega_2]$ follows immediately because $\Omega_2 \subseteq \Omega^R_2$ (by Proposition 3.15). Since both orientations of a double edge constitute a distinct basis element of $\Omega^R_2$, the expected number of double edges is $n(n-1)p^2$, which is bounded above by $n^2 p^2$. The expected number of directed triangles is $6\binom{n}{3}p^3$, because each subset of 3 vertices can support 6 distinct directed triangles. Counting linearly independent long squares is more involved. For an upper bound, note that any subset of 4 vertices can support 12 long squares (not double counting for the two orientations since they differ by a factor of $\pm 1$). Each fixed long square appears with probability $p^4(1-p)$. Therefore an upper bound on the expected number of linearly independent long squares is $12\binom{n}{4}p^4(1-p)$.
Proof. We prove the non-regular case, but the regular case follows from an identical argument.
For convenience, we define $n_k := \operatorname{rank} \Omega_k(G; \mathbb{Z})$. The Morse inequalities (4.1) applied to the non-regular chain complex at $k = 1$ are
Taking expectation and dividing through by
where the final equality follows from the assumption $p = \omega(n^{-1})$. For the latter, note
The assumption $p = o(n^{-2/3})$ is equivalent to $n^{2/3} p \to 0$ as $n \to \infty$. This is sufficient to ensure $p \to 0$, $np^2 \to 0$ and $n^2 p^3 \to 0$ as $n \to \infty$, which concludes the proof.
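For orientation, one standard form of the Morse inequalities is sketched below, together with the short rank argument behind it. The indexing convention $d_k : C_k \to C_{k-1}$ is an assumption made for this sketch and may differ from the convention used in [16].

```latex
\[
  n_k - n_{k-1} - n_{k+1} \;\le\; \beta_k(C_\bullet) \;\le\; n_k,
\]
\[
  \beta_k
  = \operatorname{rank}\ker d_k - \operatorname{rank}\operatorname{im} d_{k+1}
  = n_k - \operatorname{rank}\operatorname{im} d_k - \operatorname{rank}\operatorname{im} d_{k+1}
  \;\ge\; n_k - n_{k-1} - n_{k+1}.
\]
```

The lower bound uses $\operatorname{rank}\operatorname{im} d_k \le n_{k-1}$ and $\operatorname{rank}\operatorname{im} d_{k+1} \le n_{k+1}$; the upper bound holds because $\beta_k \le \operatorname{rank}\ker d_k \le n_k$.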
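The expected motif counts used in the estimates for $\operatorname{rank} \Omega^R_2$ are easy to evaluate numerically. This sketch simply encodes the three counting arguments (ordered pairs for double edges, 6 triangles per 3-subset, 12 long squares per 4-subset, each long square requiring one absent edge); the function name and dictionary keys are hypothetical.

```python
from math import comb

def expected_generator_counts(n, p):
    """Expected numbers of the three motif types in an Erdős-Rényi digraph."""
    return {
        # n(n-1) ordered pairs, each a double edge with probability p^2.
        "double_edges": n * (n - 1) * p**2,
        # 6 directed triangles per 3-subset, each present with probability p^3.
        "directed_triangles": 6 * comb(n, 3) * p**3,
        # 12 long squares per 4-subset; 4 edges present, the shortcut absent.
        "long_squares_upper": 12 * comb(n, 4) * p**4 * (1 - p),
    }
```

The last entry counts all long squares, so it is only an upper bound on the expected number of linearly independent ones.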

Proof of Theorem 1.3(b)
Using a second moment method, we can prove that $\vec{\beta}_1(G) > 0$ with high probability, under suitable asymptotic conditions on $p = p(n)$. The approach is similar to that of Theorem 2.18, except that we must use the Morse inequalities to show that E
Proof. We prove the non-regular case; the regular case follows from an identical argument. Since $\vec{\beta}_1(G)$ is a non-negative random variable, an application of the Cauchy-Schwarz inequality to
Again for convenience, we define $n_k := \operatorname{rank} \Omega_k(G; \mathbb{Z})$. Then, the Morse inequalities yield
where the last inequality follows since $n_1$ is a Binomial random variable on $n(n-1)$ trials, each with independent probability $p$.
In the proof of Theorem 4.2, under the same conditions on $p = p(n)$, we saw that
Therefore, eventually we have (4.16)
Taking the limit $n \to \infty$, we have seen that the first term tends to 1. The second term also tends to 1, since ) and $p = o(n^{-2/3})$. Hence, by Theorem 4.4,

Proof of Theorem 1.3(c)
Having understood the behaviour of $E[\vec{\beta}_1]$ in the 'goldilocks' region, we turn our attention to the boundaries of this region. As with the symmetric methods, we expect that if $p$ is too small then $\vec{\beta}_1$ will vanish due to the lack of cycles.
Proof. Given a double edge $iji$, note $\partial^R_2(e_{iji}) = e_{ij} + e_{ji}$. Hence, for the regular case, a necessary condition for $\vec{\beta}^R_1 > 0$ is that there is some undirected cycle, of length at least 3, in the digraph. Whereas, for the non-regular case, a necessary condition is that there is some undirected cycle, of length at least 2, in the digraph. Therefore, the proof of the regular case is identical to the proof that $\vec{\beta}_1(\bar{G}) = 0$ with high probability and the proof of the non-regular case is identical to the proof that $\vec{\beta}_1(\overset{\bullet}{G}) = 0$ with high probability, as seen in Theorem 2.17.

Proof of Theorem 1.3(d)
For the previous subsection we chose p small enough to ensure that it is highly likely that ker ∂ 1 is empty.We also observe − → β 1 vanishing for larger values of p.In these regimes ker ∂ 1 is likely non-empty but all cycles are cancelled out by boundaries.Put another way, we wish to show that, when p is large, every cycle ω ∈ ker ∂ 1 can be shown to satisfy The strategy is to find conditions under which cycles supported on many vertices can be reduced down to cycles supported on just 3 vertices, and then show that small cycles can be reduced to 0. For this subsection, we will prove that P[ − → β 1 (G) = 0] → 1 which then implies, by Corollary 3.16, that P[ First, we need to ensure that we can choose a basis for ker ∂ 1 which will be amenable to our reduction strategy.Proof.Take an undirected spanning forest T for G, i.e. a subgraph of T in which every two vertices in the same weakly connected component of G can be joined by a unique undirected path through T .One can check that ∂ 1 : Ω 1 (T ; R) → Ω 0 (T ; R) has trivial kernel, since there are no undirected cycles in T .
Given an edge outside the forest τ = (a, b) ∈ E(G) \ E(T ), there is a unique undirected path ρ through T which joins the endpoints of τ : Since there are no cycles in the spanning forest T , the kernel of ∂ 1 on the first component is trivial. Therefore, rank ker ∂ 1 ≤ #B and hence B spans ker ∂ 1 .
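The spanning-forest construction above can be sketched in code. The following is a minimal illustration, not the authors' implementation: it assumes the convention ∂ 1 (e ab ) = e b − e a , a digraph given as a list of distinct directed edges without self-loops, and hypothetical helper names `fundamental_cycles` and `boundary`. It builds a breadth-first spanning forest of the underlying undirected graph and returns one fundamental cycle, with coefficients in {±1}, for each edge outside the forest.

```python
from collections import defaultdict, deque

def fundamental_cycles(vertices, edges):
    """One cycle in ker(d_1) per non-forest edge, as {edge: coefficient}."""
    adj = defaultdict(list)
    for (a, b) in edges:
        adj[a].append((b, (a, b)))
        adj[b].append((a, (a, b)))
    parent = {}      # vertex -> (parent vertex, forest edge), roots -> None
    forest = set()
    for root in vertices:
        if root in parent:
            continue
        parent[root] = None
        queue = deque([root])
        while queue:
            u = queue.popleft()
            for w, e in adj[u]:
                if w not in parent:
                    parent[w] = (u, e)
                    forest.add(e)
                    queue.append(w)

    def chain_from_root(v):
        # Signed sum of forest edges along the path root -> v,
        # so that its boundary is e_v - e_root.
        chain = {}
        while parent[v] is not None:
            u, e = parent[v]
            chain[e] = 1 if e == (u, v) else -1
            v = u
        return chain

    cycles = {}
    for tau in edges:
        if tau in forest:
            continue
        a, b = tau
        cyc = defaultdict(int)
        cyc[tau] += 1
        for e, s in chain_from_root(b).items():
            cyc[e] -= s
        for e, s in chain_from_root(a).items():
            cyc[e] += s
        cycles[tau] = {e: s for e, s in cyc.items() if s != 0}
    return cycles

def boundary(chain):
    """Apply d_1 to a 1-chain, assuming d_1(e_ab) = e_b - e_a."""
    out = defaultdict(int)
    for (a, b), s in chain.items():
        out[b] += s
        out[a] -= s
    return {v: s for v, s in out.items() if s != 0}

example = fundamental_cycles([0, 1, 2], [(0, 1), (1, 2), (2, 0), (0, 2)])
print(example)
```

The number of cycles returned equals #E − #V + (number of weakly connected components), which is the rank of ker ∂ 1 over a field.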
Now we can describe the strategy by which we systematically reduce long fundamental cycles into smaller ones. We design a combinatorial condition on a directed graph which is more likely to occur at higher densities. Such a vertex, κ, is called a directed centre for σ if there is some subset of linking edges J ′ ⊆ J σ,κ such that − → β 1 (σ ∪ J ′ ) = 0 and σ ∪ J ′ contains an undirected path, of length 2, on the vertices (v 0 , κ, v 3 ).
for all i.
In the following examples, we demonstrate the utility of directed centres.
Figure 5: Examples of the reductions used in Lemma 4.10 which are explained in greater depth in Example 4.9. Black, solid edges indicate the initial cycle. Blue, dash-dotted edges are new edges in the reduced cycle. Red, dashed edges are those removed in the reduced cycle. Green, dotted edges must be present in order to do the illustrated reduction. Square nodes symbolise directed centres for the undirected path (v 0 , v 1 , v 2 , v 3 ).
Example 4.9. Figure 5 shows four examples of the reduction strategy described by Lemma 4.10. For illustration, we describe these reductions in more detail below.
(a) In Figure 5a, the initial undirected path of length 3 has a directed centre κ which does not coincide with a vertex in the rest of the cycle. Therefore, we can write
(b) In Figure 5b, the path has a directed centre κ = v 5 . Replacing the initial path with the smaller path, via the directed centre, yields a sum of two fundamental cycles:
(c) In Figure 5c, the path has a directed centre κ = v 4 . Replacing the initial path (v 0 , v 1 , v 2 , v 3 ) with the smaller path (v 0 , κ, v 3 ) yields a much smaller support since the edge (v 3 , v 4 ) gets cancelled out: (4.28)
(d) Finally, in Figure 5d, the initial path is reducible via the shortcut edge (v 0 , v 2 ) and hence
These examples tell the story of each case in the following lemma, in which we confirm that the presence of directed centres allows us to systematically reduce fundamental cycles.
Proof. Since v is a fundamental cycle, it is supported on some combinatorial, undirected cycle for some v i ∈ V and τ i ∈ E ordered such that where Since k ≥ 4, the vertices (v 0 , . . ., v 3 ) are distinct and, along with the edges τ 1 , τ 2 , τ 3 , form an undirected path of length 3. Either this is reducible via some shortcut edge τ ∈ E, or there exists a directed centre κ ∈ V . In either case, there is some undirected path, from v 0 to v 3 , of length at most 2. This path is represented by some η ′ ∈ Ω 1 with coefficients in {±1}, such that Since both η ′ and $\eta := \sum_{i=1}^{3} \alpha_i e_{\tau_i}$ are supported on undirected paths from v 0 to v 3 , we have either due to reducibility or a directed centre), there is some u ∈ Ω 2 (G) such that ∂ 2 u = η − η ′ . Therefore we can replace the initial undirected path of length 3, in v, with an undirected path of length at most 2, i.e. so ω has a strictly smaller support. It remains to prove that ω can be decomposed into a sum of at most two fundamental cycles.
Case 1: If the path has a directed centre κ, we split into two further sub-cases.
If supp(ω) ∩ supp(η ′ ) = ∅ then all coefficients of ω are still ±1.However, the replacement procedure has the effect of pinching supp(ω) into two edge-disjoint, undirected cycles, which share a vertex at κ. Hence, we can easily decompose ω into a sum of two fundamental cycles ω1 and ω2 , supported on each of these underlying cycles.
If the intersection is non-empty, then supp(ω) ∩ supp(η ′ ) ⊆ {τ 4 , τ k }, so there are at most two offending edges. Moreover, in order to attain ∂ 1 (η − η ′ ) = 0 these edges must appear with opposite signs in ω and η ′ respectively. If there are two offending edges then we must have 4 = k − 1 and the replacement procedure yields ω = 0. If there is only one offending edge then this edge is no longer contained in supp(ω) and the length of the underlying undirected cycle is further reduced.
Case 2: If the path was reducible, in most cases supp(ω) and supp(η ′ ) are disjoint and the replacement process simply removes one or two vertices from the undirected cycle.The only remaining case is if k = 4 and supp(ω) ∩ supp(η ′ ) = {τ k }, in which case the replacement procedure yields ω = 0.
Once we have reduced large cycles into smaller ones, we need conditions to ensure that the resulting small cycles are themselves homologous to zero.
Proof. For some vertices v 0 , . . ., v k−1 ∈ V and edges τ 1 , . . ., τ k ∈ E we can write the underlying cycle as Since κ is a cycle centre, either γ i := e κv i v i+1 ∈ Ω 2 for every i or γ i := e v i v i+1 κ ∈ Ω 2 for every i (identifying v k = v 0 ). In either case, by a telescoping sum argument, After adjusting for a factor of ±1, this concludes the proof.
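The telescoping step can be written out as follows. This is a sketch under two stated assumptions: the standard alternating-sum boundary ∂ 2 (e abc ) = e bc − e ac + e ab , and the first alternative γ i = e κ v i v i+1 (the second alternative is analogous). Indices are taken modulo k, so v k = v 0 .

```latex
\begin{align*}
\partial_2 \gamma_i
  &= e_{v_i v_{i+1}} - e_{\kappa v_{i+1}} + e_{\kappa v_i}, \\
\partial_2 \Bigl( \sum_{i=0}^{k-1} \gamma_i \Bigr)
  &= \sum_{i=0}^{k-1} e_{v_i v_{i+1}}
     + \sum_{i=0}^{k-1} \bigl( e_{\kappa v_i} - e_{\kappa v_{i+1}} \bigr)
   = \sum_{i=0}^{k-1} e_{v_i v_{i+1}} .
\end{align*}
```

The second sum cancels term by term because v k = v 0 , leaving exactly the 1-chain supported on the directed cycle.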
Piecing these lemmas together gives us a topological condition which implies − → β 1 (G) = 0 and which is likely to occur in high-density graphs.
Proposition 4.12. For any simple digraph G, if every irreducible, undirected path of length 3 has a directed centre, and every directed cycle of length 2 or 3 has a cycle centre, then Proof. We prove the non-regular case from which the regular case immediately follows by Corollary 3.16.
If supp(ω i,j ) is a directed triangle then ω i,j is the boundary of the corresponding basis element in Ω 2 ; hence ω i,j = 0 (mod im ∂ 2 ). Otherwise, supp(ω i,j ) must be a directed cycle of length 2 or 3. In either case, the support has a cycle centre and hence, by Lemma 4.11, ω i,j = 0 (mod im ∂ 2 ). Therefore ω i = 0 (mod im ∂ 2 ).
Remark 4.13. Every double edge (i, j), (j, i) ∈ E appears as an allowed 2-path in Ω R 2 and ∂ R 2 (e iji ) = e ji + e ij . Therefore, the requirement that every cycle of length 2 has a cycle centre is not strictly necessary to ensure − → β R 1 (G) = 0.
Definition 4.14. For each n ∈ N, the complete directed graph on n nodes, K n , is defined by
(c) for σ ∈ P n 3 or σ ∈ C n k for some k ≥ 2, S σ is the event that σ is a subgraph of G;
(d) for σ ∈ P n 3 , I σ is the event that σ is irreducible in the graph G ∪ σ;
(e) for σ ∈ P n 3 , A σ,κ is the event that κ is a directed centre for σ in the graph G ∪ σ;
(f) for σ ∈ C n k for some k ≥ 2, B σ,κ is the event that κ is a cycle centre for σ in the graph G ∪ σ.
Remark 4.15. For a fixed σ ∈ P n 3 , the events S σ , I σ and A σ,κ for every κ ∈ V (G) \ V (σ) are mutually independent. For a fixed σ ∈ C n k for some k ≥ 2, the events S σ and B σ,κ for every κ ∈ V (G) \ V (σ) are mutually independent.
Proof. By Proposition 4.12, it suffices to show that the probability that there exists an irreducible, undirected path of length 3 without a directed centre, or a directed cycle of length 2 or 3 without a cycle centre, tends to 0 as n → ∞. The probability that there is an irreducible, undirected path of length 3 without a directed centre is at most . This count arises because an undirected path of length 3 is determined by a choice of 4 nodes, an order on the nodes and a choice of orientation on each edge. However, this counts each path twice: once in each direction. Also, each path arises in G with probability P [S σ ] = p 3 and clearly P [I σ ] ≤ 1.
For each σ ∈ P n 3 and κ ∈ V (G) \ V (σ), there is at least one choice of 3 directed edges, from κ to the vertices of the path, which forms a directed centre. Namely, label the vertices of σ by (v 0 , . . ., v 3 ). Then we can always choose an edge between κ and v 0 and another edge between κ and v 2 so that there is a long square on {κ, v 0 , v 1 , v 2 }, as illustrated in Figure 4. The third edge can then be chosen to ensure that there is a directed triangle on {κ, v 2 , v 3 }. If these three edges are present in G, they constitute J ′ ⊆ J σ,κ with the properties required to form a directed centre and hence P [A σ,κ ] ≥ p 3 . Therefore we can bound the probability (4.42) further by We wish to show that this bound tends to 0 as n → ∞. Since p ≤ 1, it suffices to show $\lim_{n\to\infty} n^4 \exp(-p^3 n) = 0$. By Lemma 4.17, the condition on p ensures that $\lim_{n\to\infty} (4\log(n) - p^3 n) = -\infty$. By the continuity of the exponential function, $\lim_{n\to\infty} n^4 \exp(-p^3 n) = 0$.
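To get a feel for how quickly the quantity n^4 exp(−p^3 n) decays, the following minimal sketch evaluates its logarithm, 4 log(n) − p^3 n, along a power law p = n^α. The parametrisation p = n^α and the helper name `log_bound` are illustrative assumptions, not part of the proof.

```python
import math

def log_bound(n: int, alpha: float) -> float:
    """log of n^4 * exp(-p^3 * n), with p = n**alpha (assumed parametrisation)."""
    p = n ** alpha
    return 4 * math.log(n) - p ** 3 * n

# alpha = -0.2 lies above -1/3, alpha = -0.4 lies below it.
for alpha in (-0.2, -0.4):
    values = [round(log_bound(10 ** k, alpha), 2) for k in (2, 4, 6, 8)]
    print(f"alpha = {alpha}: {values}")
```

With α = −0.2 the logarithm is still positive at n = 100 and only becomes strongly negative for much larger n, which suggests that this particular bound is uninformative for small graphs. With α = −0.4 the term p^3 n tends to 0 and the logarithm grows like 4 log(n).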
Note $\#C^n_2 = \binom{n}{2}$ and $\#C^n_3 = 2\binom{n}{3}$. By another union bound, we see that the probability that there is a directed cycle, of length 2, without cycle centre, is at most Similarly, the probability that there is a directed cycle, of length 3, without cycle centre is at most Again, by Lemma 4.17, the condition on p suffices to ensure that these two bounds also tend to 0 as n → ∞.
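The subgraph counts used in these union bounds can be checked by brute force for small n. The sketch below is illustrative only: the function names are hypothetical, and the closed form for the number of undirected paths of length 3 is inferred from the counting argument in the proof (4 nodes, an order on them, an orientation per edge, halved for the two traversal directions).

```python
from itertools import permutations, product
from math import comb

def count_paths3(n: int) -> int:
    """Number of oriented undirected paths of length 3 in the complete digraph."""
    seen = set()
    for vs in permutations(range(n), 4):
        for flips in product((False, True), repeat=3):
            edges = frozenset(
                (vs[i + 1], vs[i]) if flip else (vs[i], vs[i + 1])
                for i, flip in enumerate(flips)
            )
            seen.add(edges)  # a path and its reversal give the same edge set
    return len(seen)

def count_directed_cycles(n: int, k: int) -> int:
    """Number of directed cycles of length k in the complete digraph."""
    seen = set()
    for vs in permutations(range(n), k):
        seen.add(frozenset((vs[i], vs[(i + 1) % k]) for i in range(k)))
    return len(seen)

n = 5
print(count_paths3(n), comb(n, 4) * 24 * 8 // 2)
print(count_directed_cycles(n, 2), comb(n, 2))
print(count_directed_cycles(n, 3), 2 * comb(n, 3))
```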
Remark 4.18. Lemma 4.17 reveals the origin of the ratio 1/3, which appears in Theorems 1.3(d) and 4.16. In particular, it arises as the ratio between the power of n and the power of p inside the exponential of equation (4.43). The power of n is 1 because there are on the order of n 1 possible directed centres for an undirected path of length 3. The power of p is 3 because we require at least 3 edges from κ to the path, in order for κ to form a directed centre. In Lemma A.7, we will see that this is indeed the minimal number of edges required to form a directed centre.
The bounds used in the proof of Theorem 4.16 are by no means the best possible. Indeed, by splitting P n 3 into four isomorphism classes, it is possible to get exact values for P [I σ ] and P [A σ,κ ]. We explore this further in Appendix A.3 in order to obtain tighter bounds, useful for hypothesis testing.
Moreover, the topological condition for − → β 1 (G) = 0 presented in Proposition 4.12 was chosen since it is likely to occur at high densities. However, there probably exist weaker topological conditions which imply − → β 1 (G) = 0 and occur at somewhat lower densities. This could potentially allow for a weaker hypothesis in Theorem 4.16. In order to conjecture the weakest possible hypothesis, we conduct a number of experiments in Appendix B.

Directed flag complex of random directed graphs
For comparative purposes, we now apply the techniques of Section 4 to the directed flag complex, which features more readily in the literature.
Definition 5.1. [18, Definition 2.2] An ordered simplicial complex on a vertex set V is a collection of ordered subsets of V , which is closed under taking non-empty, ordered subsets (with the induced order). A subset in the collection consisting of (k + 1) vertices is called a k-simplex.
Definition 5.2. [18, Definition 2.3] Given a directed graph G = (V, E), (a) a directed (k + 1)-clique is a (k + 1)-tuple of distinct vertices (v 0 , . . ., v k ) such that (v i , v j ) ∈ E whenever i < j; (b) the directed flag complex, − → X (G), (often denoted dFl(G)) is an ordered simplicial complex, whose k-simplices are the directed (k + 1)-cliques.
Given a ring R, the directed flag chain complex is { where $(v_0, \dots, \hat{v}_i, \dots, v_k)$ denotes the directed k-clique obtained from (v 0 , . . ., v k ) by removing the vertex v i . This defines ∂ k on a basis of − → X k (G), from which we extend linearly. We also define − → X −1 (G) = R and ∂ 0 simply sums the coefficients in the standard basis, as in equation (3.5).
The homology of this chain complex is the directed flag complex homology. The Betti numbers are denoted β k ( − → X (G)). When R is omitted from notation, assume R = Z.
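As a concrete illustration of Definition 5.2, the following minimal sketch enumerates directed cliques and computes the first Betti number of the directed flag complex for a small digraph. It is not the authors' code. It assumes the usual alternating-sum simplicial boundary, and it computes ranks over the reals with `numpy.linalg.matrix_rank`, which agrees with the Betti number over Z; the function names are hypothetical.

```python
from itertools import permutations
import numpy as np

def directed_cliques(vertices, edges, k):
    """Directed (k+1)-cliques: tuples with (v_i, v_j) an edge whenever i < j."""
    E = set(edges)
    return [
        t for t in permutations(vertices, k + 1)
        if all((t[i], t[j]) in E
               for i in range(k + 1) for j in range(i + 1, k + 1))
    ]

def boundary_matrix(simplices, faces):
    """Matrix of the alternating-sum boundary map in the clique bases."""
    index = {f: r for r, f in enumerate(faces)}
    D = np.zeros((len(faces), len(simplices)))
    for c, s in enumerate(simplices):
        for i in range(len(s)):
            D[index[s[:i] + s[i + 1:]], c] += (-1) ** i
    return D

def rank(M):
    return 0 if M.size == 0 else int(np.linalg.matrix_rank(M))

def flag_beta_1(vertices, edges):
    X0 = directed_cliques(vertices, edges, 0)
    X1 = directed_cliques(vertices, edges, 1)
    X2 = directed_cliques(vertices, edges, 2)
    d1 = boundary_matrix(X1, X0)
    d2 = boundary_matrix(X2, X1)
    # dim ker d1 - rank d2
    return len(X1) - rank(d1) - rank(d2)

print(flag_beta_1([0, 1, 2], [(0, 1), (1, 2), (2, 0)]))  # directed 3-cycle
print(flag_beta_1([0, 1, 2], [(0, 1), (1, 2), (0, 2)]))  # transitive triangle
```

A directed 3-cycle contains no directed 3-clique, so its cycle is not filled in, whereas the transitive triangle is a single 2-simplex.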
Firstly, as with path homology, β 0 ( − → X (G)) captures the weak connectivity of a digraph G and hence Theorem 1.2 also holds for the directed flag complex. Next, since we have an explicit list of generators for − → X k (G), and they are easy to count, we can calculate the expected rank of the chain groups in every dimension.

Lemma 5.3. For an Erdös-Rényi directed random graph
) . (5.3)
Proof. A possible directed clique is uniquely determined by an ordered (k + 1)-tuple of distinct vertices. Therefore, there are $\binom{n}{k+1}(k+1)!$ possible cliques. For the clique to be present, one edge must be present in G for every pair of distinct nodes.
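The counting argument in this proof can be checked exactly on a tiny example. The sketch below assumes the model in which each of the n(n − 1) possible directed edges is present independently with probability p, and it assumes the closed form suggested by the proof: the number of ordered (k + 1)-tuples times the probability that the required $\binom{k+1}{2}$ edges are present. Function names are hypothetical.

```python
from itertools import permutations, product
from math import comb, factorial

def expected_cliques(n: int, k: int, p: float) -> float:
    """Closed form: (#ordered tuples) * P[all C(k+1, 2) required edges present]."""
    return comb(n, k + 1) * factorial(k + 1) * p ** comb(k + 1, 2)

def exact_expectation(n: int, k: int, p: float) -> float:
    """Exact expectation by enumerating every digraph on n nodes (tiny n only)."""
    possible = [(a, b) for a in range(n) for b in range(n) if a != b]
    total = 0.0
    for mask in product((0, 1), repeat=len(possible)):
        E = {e for e, bit in zip(possible, mask) if bit}
        prob = p ** len(E) * (1 - p) ** (len(possible) - len(E))
        cliques = sum(
            all((t[i], t[j]) in E
                for i in range(k + 1) for j in range(i + 1, k + 1))
            for t in permutations(range(n), k + 1)
        )
        total += prob * cliques
    return total

print(expected_cliques(3, 2, 0.3), exact_expectation(3, 2, 0.3))
```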
Using the Morse inequalities as before, this allows us to compute the growth rate of the expected Betti numbers, under suitable conditions on p = p(n).
which tends to 0 thanks to the condition p which tends to 0 thanks to the condition p(n) = o(n −1/(k+1) ). An analogous argument to Theorem 4.2 concludes the proof.
The second moment method can also be used to show β 1 ( − → X (G)) > 0 with high probability, under the same conditions as Theorem 5.4.
Proof. The proof is identical to Theorem 4.4 except, in order to ensure we require p(n) = ω(n −1 ) and p(n) = o(n −1/2 ), as argued in the proof of Theorem 5.4.
As with path homology, degree 1 homology appears in the directed flag complex with the appearance of undirected cycles in the underlying digraph. Therefore, the same conditions show that β 1 ( − → X (G)) = 0 with high probability, when p = p(n) shrinks too quickly.
Proof. Assume that σ ∈ P n 3,m and κ ∈ V (G) \ V (σ). Let q m,l denote the number of l-element subsets J ⊆ {(v, κ), (κ, v) | v ∈ V (σ)} such that κ is a generic directed centre for σ in the graph σ ∪ J. Note, q m,l is well-defined because, for a fixed m, all σ ∪ J are isomorphic for σ ∈ P n 3,m and κ ∈ V (G) \ V (σ).
It is worth reiterating that these interpretations and results all hold in the limit. That is, Theorem 1.3 provides no guarantees for a finite line segment, regardless of its gradients or length. However, we do observe that the boundaries between the three regions converge onto straight lines, of the correct gradient, relatively quickly (i.e. within n ≤ 50 nodes). We will see empirical evidence for this in the following section.

B.3 Finding boundaries
Note, Theorem 1.3 says nothing of the region −2/3 < α < −1/3. In the following discussion we attempt to determine, empirically, the equations of the boundaries between the positive region In Figure 12 we repeat the same analysis with Experiments 3 and 4 respectively, in order to discern the boundaries of the positive region for directed flag complex homology. Again, Figure 12a shows that the empirical lower boundary has a similar dependency on n to that predicted by Theorem 5.8(b, c), i.e. p l (n) ∼ n −1 . Figure 12b shows an upper boundary of p u (n) ∼ n −0.443 . This is also consistent with Theorem 5.8(d) since −0.437 < −1/4. This provides evidence that the zero region for directed flag complex can be expanded as far as (−∞, −1) ∪ (−1/2, 0].
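The power-law exponents quoted here come from a line of best fit on a log-log plot. A minimal sketch of such a fit is given below; it uses synthetic data with a known exponent rather than the measurements from the experiments, and the function name `fit_power_law` is hypothetical.

```python
import numpy as np

def fit_power_law(ns, ps):
    """Least-squares fit of log(p) = slope * log(n) + log(c); returns (slope, c)."""
    slope, intercept = np.polyfit(np.log(ns), np.log(ps), deg=1)
    return float(slope), float(np.exp(intercept))

# Synthetic boundary p(n) = 2 * n^(-0.5), used only to exercise the fit.
ns = np.array([10.0, 20.0, 30.0, 40.0, 50.0])
ps = 2.0 * ns ** -0.5

slope, c = fit_power_law(ns, ps)
print(f"fitted exponent {slope:.3f}, prefactor {c:.3f}")
```

On real samples the fitted slope plays the role of the empirical gradient α of the boundary.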

B.4 Testing for normal distribution
In analogy to known results for the clique complex [16, Theorem 2.4], one conjecture is that, in the known positive region, the normalised Betti number − → β 1 approaches a normal distribution.where N(0, 1) is the normal distribution with mean 0 and variance 1.
To provide some empirical evidence towards this conjecture, we perform 10 normality tests on the distributions of − → β 1 , obtained in Experiment 1.We restrict our focus to the samples in which at most 5% of samples were zero, so that we are in a parameter region where we hope our conjecture would apply.
We normalise each of the remaining samples and perform 10 hypothesis tests under the null

Figure 1: Two motifs which are indistinguishable to the directed flag complex but have different path homology.

Theorem 1.4. For an Erdös-Rényi random directed graph G ∼ − → G (n, p(n)), let − → β k denote the k th Betti number of its non-regular path homology. Assume p(n) = n α with α < −(N + 1)/N for some N ∈ N. Then, − → β k = 0 with high probability for every k ≥ N . The same result holds for regular path homology.

Figure 2: Generators for Ω R 2 (G; Z). (a) A double edge. (b) A directed triangle. (c) A long square. The red, dashed line must not be present for the third motif to constitute a long square.

Figure 3: Linearly dependent long squares with source i and sink k.

Lemma 4.1. For a random directed graph G ∼ − → G (n, p) we have the following expectations
$b_\tau := e_\tau - \sum_{i=1}^{k} \alpha_i e_{\tau_i} \in \ker \partial_1$. Note that $b_\tau$ is a fundamental cycle. The set $B := \{ b_\tau \mid \tau \in E(G) \setminus E(T) \}$ is linearly independent because, given $b_\tau \in B$, no other $b_{\tau'} \in B$ involves the basis element $e_\tau$ of $\Omega_1$. Note, we can write

Figure 4: (a, b) Directed centres for undirected paths of length 3. The blue, dashed edges constitute J σ,κ . (c) A cycle centre for a 3-cycle. (d) A cycle centre for a 2-cycle.
define the following collections of subgraphs contained within K n :
(a) P n 3 := {subgraphs σ ⊆ K n | σ is an undirected path of length 3};
(b) For each k ≥ 2, C n k := {subgraphs σ ⊆ K n | σ is a directed cycle of length k}.
Given a random graph G ∼ G(n, p), we define the following events:

Figure 7: The four possible '3-path motifs' which partition P n 3 , shown with black edges. From left to right we see the edge orientations for σ belonging to P n 3,0 , P n 3,1 , P n 3,2 and P n 3,3 respectively. Dashed, red edges indicate edges which must not be present for the path to be considered irreducible.

Figure 9: Statistics for the path homology of samples of 100 random directed graphs G ∼ − → G (n, p), sampled at a range of parameter values. Our primary contribution is descriptions of the boundaries of the darker, blue region in Figure 9a and a limiting value for the lighter, yellow region of Figure 9b as n → ∞.

Figure 10: Statistics for the directed flag complex homology of samples of 200 random graphs G ∼ − → G (n, p) sampled at a range of parameter values.

Figure 11: Over a range of parameters (n, p) we measure the first Betti number, − → β 1 (G), for 100 sampled random graphs G ∼ − → G (n, p). Then, at each n, we determine the maximum p l (n) (and the minimum p u (n)) such that for all p ≤ p l (n) (and all p ≥ p u (n)) at most 5% of graphs sampled from − → G (n, p) have − → β 1 (G) > 0. The figures show log-log plots of these two functions. In both cases, we fit a line of best fit in order to obtain an approximate power-law relationship.

Figure 13: Results for 10 normality tests on samples of − → β 1 (G) for G ∼ − → G (n, p) at a range of n and p. Red rectangles indicate that at least 5% of samples were zero and hence are excluded from the experiment. Colour indicates the P-value for the hypothesis test in question. Finally, for each test and at each n, we average the P-value over the range of relevant p, which is recorded on the line graph. Note that adjacent densities, p, on the horizontal axis are shown with equal width, despite being logarithmically spaced.