Markov processes on quasi-random graphs

We study Markov population processes on large graphs, with the local state transition rates of a single vertex being a linear function of its neighborhood. A simple way to approximate such processes is by a system of ODEs called the homogeneous mean-field approximation (HMFA). Our main result shows that HMFA is guaranteed to be the large graph limit of the stochastic dynamics on a finite time horizon if and only if the graph sequence is quasi-random. An explicit error bound is given, of order $\frac{1}{\sqrt{N}}$ plus the largest discrepancy of the graph. For Erd\H{o}s--R\'{e}nyi and random regular graphs we show an error bound of order the inverse square root of the average degree. In general, a diverging average degree is shown to be a necessary condition for the HMFA to be accurate. Under special conditions, some of these results also apply to more detailed types of approximations such as the inhomogeneous mean-field approximation (IMFA). We pay special attention to epidemic applications such as the SIS process.

Due to the inherent dependencies between individuals, exact analysis of such models is infeasible when the population is large. To mitigate the problem, one may focus on population-level quantities and derive a deterministic ODE approximation for them by assuming the law of mass action holds. We call this technique the homogeneous mean-field approximation (HMFA). Based on the pioneering works of Kurtz [19,20], HMFA describes the dynamics of macroscopic quantities accurately in the limit when the population size approaches infinity, provided said population is well mixed, that is, every individual interacts with every other with equal chance.
Besides well-mixed populations, attention has also turned to processes with an underlying topology, represented by a network connecting individuals who can interact with each other. From the '70s, processes like the voter model and the contact process were studied on lattices [15]. Later, in the 2000s, new developments in network theory inspired the extension of these investigations to scale-free networks such as Barabási-Albert graphs [2,4]. For these networks, HMFA turns out to be too rough an approximation [26]; hence, new methods have been developed retaining more details about the topology, yet keeping the resulting ODE system tractable for analytical and computational studies.
An intermediate approximation was given by Pastor-Satorras and Vespignani for the Susceptible-Infected-Susceptible (SIS) process, based on the heuristic that vertices with higher degrees are more prone to get infected [28]. To account for this heterogeneity, the authors grouped vertices with the same degree into classes and treated their elements as statistically indistinguishable, resulting in an ODE system with the number of equations proportional to the number of distinct degrees. We will refer to this method as the inhomogeneous mean-field approximation (IMFA).
A more detailed approach, called the quenched mean-field approximation or N-intertwined mean-field approximation (NIMFA), takes into account the whole network topology via the adjacency matrix, neglecting only the dynamical correlations between vertices. This results in a larger ODE system, with size proportional to the number of vertices this time.
Both IMFA and NIMFA have been studied in depth, yielding valuable insight about epidemic processes, most notably about the epidemic threshold. To justify said approximations, several numerical studies have been carried out comparing theoretical forecasts to stochastic simulations, showing moderate deviations [22,14].
Rigorous theoretical results showing convergence or error bounds, on the other hand, are few and far between. Examples are the above-mentioned results for complete graphs by Kurtz [19,20], Volz's equation [32] for the Susceptible-Infected-Recovered (SIR) process on graphs generated by the configuration model [25] being the correct mean-field limit [10,11], and some exact bounding dynamics [30,33,6].
Our work was heavily inspired by [12], where the concepts of discrepancy and spectral gap were utilized to bound one source of the error arising from mean-field approximations, called the topological approximation.
The aim of this paper is to carry out a rigorous analysis of the HMFA. The motivation behind delving into the accuracy of HMFA is twofold.
Firstly, applications may include graphs like expanders, where well-mixedness is only an approximation, yet one expects HMFA to perform well; therefore, explicit error bounds might be useful in these settings.
Secondly, we think of HMFA as a stepping stone toward understanding more detailed approximations, with the right balance of complexity: rich enough to be interesting and relevant, but not so difficult as to be unreachable. In some special cases, more advanced approximation techniques reduce to HMFA. For example, when the graph is regular, IMFA and HMFA give the same approximation, as there is only one degree class.
Our main contribution is to characterize the type of graph sequences for which the ODE given by HMFA is the correct large graph limit. These are the graph sequences for which the appropriately scaled discrepancy goes to 0, called quasi-random graph sequences. Thus, we reduce the problem to a purely graph-theoretical one. An explicit error bound is given, containing the number of vertices and the largest discrepancy. For the latter, we provide an upper bound based on the spectral gap.
For two types of random graphs, Erdős-Rényi graphs and random regular graphs, we show that the error of HMFA can be upper-bounded by a constant times the inverse square root of the average degree, making HMFA accurate when the average degree diverges. We also show that, in general, a diverging average degree is a necessary condition for the HMFA to be accurate.
The paper is structured as follows. In Section 2 we introduce concepts and notation relating to graph theory, including discrepancies and spectral properties. In Section 3 we introduce the type of Markov processes evolving on the networks. These models are such that only neighboring vertices can directly influence each other's transition rates, in a linear fashion. This framework includes, among many other models, the SIS and the SIR process. In Section 4 we introduce HMFA in detail and define precisely what we mean by the HMFA being accurate for a given graph sequence. Section 5 states the theorems and propositions for which the proofs are given in Section 6.

The adjacency matrix is denoted by $A^n = (a^n(i,j))_{i,j=1}^N$; it is symmetric and its diagonal values are 0. The degree of vertex $i$ is denoted by $d_n(i) = \sum_{j=1}^N a^n(i,j)$, and the average degree by $\bar{d}_n = \frac{1}{N}\sum_{i=1}^N d_n(i)$. We assume $\bar{d}_n > 0$. For regular graphs, $d_n(i) = d_n$ for every $i$. The largest connected component (or one of the largest components, if there are several) is denoted by $V^n_{\mathrm{conn}}$. $\theta_n$ denotes the ratio of vertices not covered by the largest connected component.
A subset $A \subset [N]$ is called an independent set if none of the vertices in $A$ are connected to each other. The size of the largest independent set is denoted by $\alpha_n$.
Based on the work of Caro and Wei [18], $\alpha_n$ can be estimated from below by $\alpha_n \geq \sum_{i=1}^N \frac{1}{d_n(i)+1}$. Applying Jensen's inequality yields Turán's bound: $\alpha_n \geq \frac{N}{\bar{d}_n+1}$. The number of edges between $A, B \subset [N]$ is denoted by $e(A,B)$.
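The two lower bounds on the independence number can be checked numerically; a minimal sketch computing both from a degree sequence (function names are ours):

```python
from fractions import Fraction

def caro_wei_bound(degrees):
    # Caro-Wei: alpha_n >= sum_i 1/(d_n(i) + 1)
    return sum(Fraction(1, d + 1) for d in degrees)

def turan_bound(degrees):
    # Turán (via Jensen's inequality): alpha_n >= N / (dbar_n + 1)
    n = len(degrees)
    dbar = Fraction(sum(degrees), n)
    return n / (dbar + 1)

# star on 5 vertices: alpha = 4; Caro-Wei (2.2) beats Turán (~1.92)
star = [4, 1, 1, 1, 1]
print(float(caro_wei_bound(star)), float(turan_bound(star)))
```

By Jensen's inequality the Caro-Wei bound is always at least as large as Turán's, with equality exactly for regular degree sequences.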

Discrepancies
In this section, we introduce several different measures of how well-mixed a graph is [7]. This will be measured mostly by edge density within and between various subsets. The discrepancy between two subsets of vertices $A, B \subset [N]$ is defined as
$$\delta(A,B) := \frac{e(A,B)}{N \bar{d}_n} - \frac{|A||B|}{N^2},$$
the deviation of the normalized edge count from its approximate expectation $\mathbb{E}\left(e(A,B)\right)$ when $A, B$ are random sets with given sizes. $\delta(A,B)$ inherits the symmetry and additivity properties of $e(A,B)$, but not monotonicity, since $\delta(A,B)$ might take negative values.
From $0 \leq \frac{e(A,B)}{N\bar{d}_n} \leq 1$ and $\frac{|A||B|}{N^2} \leq 1$ it follows that $|\delta(A,B)| \leq 1$. Our main focus is the largest possible value of $|\delta(A,B)|$, denoted by $\partial_n := \max_{A,B \subset [N]} |\delta(A,B)|$. In general, a low $\partial_n$ guarantees that edge density is relatively homogeneous throughout the graph.
Based on this observation, we distinguish between two types of graph sequences: quasi-random sequences, for which $\partial_n \to 0$, and non-quasi-random sequences, for which $\limsup_{n\to\infty} \partial_n > 0$. These are the only possibilities. The term quasi-random is motivated by the fact that for certain classes of random graphs, $\partial_n$ will indeed be small. This is addressed in more detail in Section 2.4.
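On very small graphs the largest discrepancy can be computed by brute force. The sketch below assumes the normalization $\delta(A,B) = \frac{e(A,B)}{N\bar d_n} - \frac{|A||B|}{N^2}$ with $e(A,B)$ counting ordered pairs; the function name is ours:

```python
from itertools import combinations

def discrepancy_sup(adj):
    """Brute-force computation of max_{A,B} |delta(A,B)|.

    adj: symmetric 0/1 adjacency matrix with zero diagonal. Small graphs
    only: the double loop over subsets is exponential in the vertex count."""
    n = len(adj)
    total = sum(map(sum, adj))  # = N * dbar, counting ordered pairs
    subsets = [c for k in range(1, n + 1) for c in combinations(range(n), k)]
    best = 0.0
    for A in subsets:
        for B in subsets:
            e = sum(adj[i][j] for i in A for j in B)
            best = max(best, abs(e / total - len(A) * len(B) / n ** 2))
    return best

# complete graph K4: edge density is homogeneous, the maximum is only 1/12
K4 = [[int(i != j) for j in range(4)] for i in range(4)]
# perfect matching on 4 vertices: taking A = B = one edge gives |delta| = 1/4
M4 = [[0, 1, 0, 0], [1, 0, 0, 0], [0, 0, 0, 1], [0, 0, 1, 0]]
print(discrepancy_sup(K4), discrepancy_sup(M4))
```

The matching example illustrates why bounded-degree graphs cannot be quasi-random: the discrepancy stays at $1/4$ no matter how many disjoint edges are added.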
The following measures of discrepancy will also be utilized: $\partial_n^1 := \max_{A \subset [N]} |\delta(A,A)|$, $\partial_n^2 := \max_{A \cap B = \emptyset} |\delta(A,B)|$, and $\partial_n^* := \frac{1}{2N\bar{d}_n}\sum_{i=1}^N |d_n(i) - \bar{d}_n|$. Intuitively, $\partial_n^1$ measures the worst possible discrepancy within a single set, while $\partial_n^2$ does so between two disjoint sets. $\partial_n^*$, on the other hand, depends only on the degree sequence of the graph and measures the concentration of the degree distribution around $\bar{d}_n$; $\partial_n^* = 0$ holds if and only if $G_n$ is regular. The hierarchies between the quantities are stated below.
$$\partial_n^* \leq \partial_n \qquad (10)$$
According to (9), it is easy to see that $\partial_n$ and $\partial_n^1$ are equivalent in the sense that either both of them converge to 0 or neither does. Thus $\partial_n^1$ is also appropriate for characterizing whether the sequence $(G_n)_{n=1}^\infty$ is quasi-random or not. Due to (10), $\partial_n^* \to 0$ is necessary for the graph sequence to be quasi-random. However, it is not sufficient, as the following example shows: let $G_n$ be a bipartite graph on $N(n) = 2n$ vertices with each vertex having degree 1 (a perfect matching). As it is a regular graph, $\partial_n^* = 0$, while choosing $A$ as one of the two classes leads to a discrepancy bounded away from 0. Finally, from (9) and (11), under the condition $\partial_n^* \to 0$, the quantities $\partial_n^1$, $\partial_n^2$ and $\partial_n$ are equivalent in the sense that either all of them converge to 0 or none of them do.
Another measure of discrepancy, more suited to the spectral-theoretical considerations later, is based on the volume of a set $A \subset [N]$, defined as $\mathrm{vol}(A) := \sum_{i \in A} d_n(i)$. The corresponding degree-biased discrepancy is defined analogously to $\delta(A,B)$, with volumes in place of cardinalities; its largest value is denoted by $\hat{\partial}_n$. When the degree distribution is fairly homogeneous, the two quantities do not differ much. We also define discrepancy with respect to induced subgraphs. Let $H$ be an induced subgraph of $G$ (identified with a subset of the vertices $H \subset [N]$). Then for any $A, B \subset [N]$, $e_H(A,B) := e(A \cap H, B \cap H)$.
The discrepancy on $H$, denoted $\hat{\partial}_n^H$, is defined accordingly. These quantities are insensitive to the structure of $G$ on $H^c$. When $H$ includes most of the vertices of the original graph, $\hat{\partial}_n$ and $\hat{\partial}_n^H$ are close, as formulated rigorously by the following lemma. A similar statement was given in [24] for a related quantity called modularity.

Spectral properties
In this section we discuss how discrepancies can be bounded using spectral theory.
We introduce the diagonal degree matrix $D_n := \mathrm{diag}(d_n(1), \ldots, d_n(N))$ and re-scale the adjacency matrix as $\hat{A}_n := D_n^{-1/2} A^n D_n^{-1/2}$. We order the eigenvalues as $\lambda_1 \geq \lambda_2 \geq \ldots \geq \lambda_N$. Since $\hat{A}_n$ is symmetric, all eigenvalues are real, and according to the Perron-Frobenius theorem, $\lambda_1 = 1$ and $|\lambda_i| \leq 1$ for all $i$. The second largest eigenvalue in absolute value is denoted by $\lambda_n := \max\{|\lambda_2|, |\lambda_N|\}$, and $1 - \lambda_n$ is called the spectral gap. Matrices with a large spectral gap are generally "nice": they have good connectivity properties, random walks on them converge to equilibrium fast, and vertices are well mixed. The following proposition is a special case of Lemma 1 in [3].
From (12) it is easy to see that a large spectral gap guarantees low discrepancies, at least in the degree-biased setting. When $G_n$ is regular, the expressions simplify: $\hat{\partial}_n = \partial_n$, and $\lambda_n$ can be expressed via the eigenvalues of $\frac{1}{d_n}A^n$. For fixed $d_n = d$, the spectral gap cannot be too close to 1: based on [27], for every $d$ and $\varepsilon > 0$ there exists an $N_0$ such that every $d$-regular graph with at least $N_0$ vertices satisfies $\lambda_n \geq \frac{2\sqrt{d-1}}{d} - \varepsilon$.
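As a numerical illustration of the regular case: the top eigenvector of $A/d$ is constant, so $\lambda_n$ can be estimated by power iteration after deflating that eigenvector. A dependency-free sketch (a real implementation would rather call `numpy.linalg.eigvalsh`):

```python
import random

def lambda_n_regular(adj, d, iters=500):
    """Estimate the second-largest |eigenvalue| of A/d for a d-regular graph
    by power iteration after projecting out the constant top eigenvector."""
    n = len(adj)
    rng = random.Random(0)
    v = [rng.random() for _ in range(n)]
    for _ in range(iters):
        m = sum(v) / n
        v = [x - m for x in v]  # deflate the constant eigenvector
        w = [sum(adj[i][j] * v[j] for j in range(n)) / d for i in range(n)]
        scale = max(abs(x) for x in w) or 1.0
        v = [x / scale for x in w]
    m = sum(v) / n
    v = [x - m for x in v]
    w = [sum(adj[i][j] * v[j] for j in range(n)) / d for i in range(n)]
    # Rayleigh quotient of the converged direction
    return abs(sum(a * b for a, b in zip(w, v)) / sum(b * b for b in v))

# K5 is 4-regular with normalized spectrum {1, -1/4}, so lambda_n ≈ 0.25
K5 = [[int(i != j) for j in range(5)] for i in range(5)]
print(lambda_n_regular(K5, 4))
```

For a bipartite regular graph (e.g. an even cycle) the same routine returns $\lambda_n = 1$, i.e. zero spectral gap, consistent with bipartite graphs never being quasi-random.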

Random graphs
In this subsection we discuss the types of random graphs used in this paper and their properties.
The first example is the Erdős-Rényi graph $G_{ER}(N, p_n)$, which contains $N$ vertices, each pair of which is connected with probability $p_n$, independently of the other pairs. The expected average degree is $(N-1)p_n$. Another type of random graph of interest is the random regular graph, denoted by $G_{reg}(N, d_n)$. It is a graph chosen uniformly at random from among all $d_n$-regular graphs on $N$ vertices ($N d_n$ is assumed to be even).
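The Erdős-Rényi model is straightforward to sample directly, which also lets one check the concentration of the average degree; a stdlib-only sketch (helper name is ours):

```python
import random

def erdos_renyi(n, p, rng):
    """Sample G_ER(n, p): connect each pair independently with probability p."""
    adj = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if rng.random() < p:
                adj[i][j] = adj[j][i] = 1
    return adj

rng = random.Random(42)
adj = erdos_renyi(400, 0.1, rng)
dbar = sum(map(sum, adj)) / 400
print(dbar)  # concentrates around (400 - 1) * 0.1 = 39.9
```

Sampling a uniform random regular graph is harder (e.g. via the configuration model with rejection) and is not sketched here.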
The bound (14) is sharp for $d$-regular random graphs with even $d \geq 4$ in the following sense [13]: similar results are shown in [13], holding with probability at least $1 - \frac{1}{N}$ [31]. The spectral gap of the Erdős-Rényi graph behaves analogously, with the catch that $d_n$ should be replaced with $\bar{d}_n$, at least when one restricts oneself to an appropriately large subgraph. For this purpose we introduce the core of the graph, defined in [9].
The core, $\mathrm{core}(G_n)$, is the induced subgraph on the vertex set $H$ constructed as follows.
• Initialize $H$ as the subset of vertices that have degree at least $\bar{d}_n/2$.
• While there is a vertex in $H$ with at least 100 neighbors in $[N] \setminus H$, remove that vertex from $H$.
According to the following proposition, $\mathrm{core}(G_n)$ is a subgraph covering most of the vertices, with a large spectral gap. Proposition 2. Assume $c_0 \leq \bar{d}_n \leq 0.99N$ for some sufficiently large $c_0$. Then there is a $c_1$ such that w.h.p.
where $1 - \lambda_n^H$ is the spectral gap on the subgraph.
The proof of Proposition 2 can be found in [9]. When $\bar{d}_n \geq \log^2 N$, a simpler statement can be made based on [8].

Markov processes on graphs
This section describes the dynamics. Assume a graph $G_n$ is given. Each vertex is in a state from a finite state space $S$. $\xi^n_{i,s}(t)$ denotes the indicator that vertex $i$ is in state $s$ at time $t$; the corresponding vector notation is $\xi^n_i(t) := \left(\xi^n_{i,s}(t)\right)_{s \in S}$. Our main focus is to describe the behavior of the average $\bar{\xi}^n_s(t) := \frac{1}{N}\sum_{i=1}^N \xi^n_{i,s}(t)$, which can be interpreted as the ratio of vertices in state $s \in S$ at time $t$. It is worth noting that both $\xi^n_i(t)$ and $\bar{\xi}^n(t)$ lie inside the simplex $\Delta_S$. Let $V^n_s(t) := \{i : \xi^n_{i,s}(t) = 1\}$ denote the set of vertices in state $s$ at time $t$. The normalized number of edges between vertices in state $s$ and $s'$ is denoted by $\nu^n_{ss'}(t)$. We may also reformulate the ratio as $\bar{\xi}^n_s(t) = \frac{1}{N}\left|V^n_s(t)\right|$. Each vertex may transition to another state in continuous time. The transition rates of a vertex may depend on the number of the neighbors of that vertex in each state. For vertex $i$, the number of its neighbors in state $s$ is $\sum_{j=1}^N a^n(i,j)\xi^n_{j,s}(t)$. We introduce the normalized quantity $\phi^n_{i,s}(t) := \frac{1}{\bar{d}_n}\sum_{j=1}^N a^n(i,j)\xi^n_{j,s}(t)$, with corresponding vector notation $\phi^n_i(t) := \left(\phi^n_{i,s}(t)\right)_{s \in S}$. Note that $\sum_{s \in S}\phi^n_{i,s}(t) = d_n(i)/\bar{d}_n$; typically $\phi^n_i(t) \notin \Delta_S$. Transition rates are described by the functions $q_{ss'} : \mathbb{R}^S \to \mathbb{R}$. With slightly unconventional notation, $q_{ss'}$ will refer to the transition rate from state $s'$ to $s$. This convention enables us to work with column vectors and multiply by matrices from the left. The matrix notation $Q(\phi) = (q_{ss'}(\phi))_{s,s' \in S}$ will be used.
We require $q_{ss'}(\phi) \geq 0$ for $s \neq s'$ and non-negative inputs $\phi \geq 0$. The diagonal component $q_{ss}(\phi) = -\sum_{s' \neq s} q_{s's}(\phi)$ corresponds to the negative of the outgoing rate from state $s$.
The dynamics of $(\xi^n_i(t))_{i=1}^N$ is a continuous-time Markov chain with state space $S^N$, where each vertex performs transitions according to the transition matrix $Q(\phi^n_i(t))$, independently of the others. After a transition, vertices update their neighborhood vectors $\phi^n_i(t)$. This means that, at least for a single transition, each vertex is affected only by its neighbors. We call such dynamics local-density dependent Markov processes.
Our main assumption is that the rate functions $q_{ss'}$ are affine (linear, also allowing a constant term), meaning there are constants $q^{(0)}_{ss'}$ and $q^{(1)}_{ss',r}$ such that $q_{ss'}(\phi) = q^{(0)}_{ss'} + \sum_{r \in S} q^{(1)}_{ss',r}\phi_r$. From the non-negativity assumption it follows that these coefficients are non-negative. Let $q^{(1)}_{\max}$ denote the maximum of the coefficients $q^{(1)}_{ss',r}$. After taking the average in (18) with respect to $i$, the rate at which the ratio $\bar{\xi}^n_s(t)$ changes can be calculated in terms of the averages $\bar{\xi}^n_{s'}(t)$ and the normalized edge counts $\nu^n_{s'r}(t)$; this is equation (19).

Examples
Next we give some examples for Markov processes on graphs, with special focus on epidemiological ones.
Conceptually the simplest epidemiological model is the SIS model. The state space is $S = \{S, I\}$, with S for susceptible and I for infected individuals. The transition rates are $q_{SI}(\phi) = \gamma$, $q_{IS}(\phi) = \beta\phi_I$, meaning infected individuals are cured at a constant rate, and susceptible individuals become infected at a rate proportional to the number of their infected neighbors.
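The SIS dynamics can be simulated exactly with a Gillespie scheme. Below is a minimal stdlib sketch; all names are ours, neighbor counts are recomputed naively for clarity, and the infection rate is normalized by the average degree as in the $\phi$ notation:

```python
import random

def gillespie_sis(adj, infected0, beta, gamma, t_max, rng):
    """Exact event-driven simulation of SIS on a graph.

    A susceptible vertex i is infected at rate beta*(#infected neighbors)/dbar,
    an infected vertex is cured at rate gamma. Returns the final infected ratio."""
    n = len(adj)
    dbar = sum(map(sum, adj)) / n
    infected = set(infected0)
    t = 0.0
    while True:
        events = []
        for i in range(n):
            if i in infected:
                events.append((gamma, i))                    # cure i
            else:
                k = sum(1 for j in infected if adj[i][j])
                if k:
                    events.append((beta * k / dbar, i))      # infect i
        total = sum(r for r, _ in events)
        if total == 0:
            break
        t += rng.expovariate(total)                          # next event time
        if t >= t_max:
            break
        u, acc = rng.random() * total, 0.0
        for r, i in events:
            acc += r
            if acc >= u:
                infected ^= {i}                              # flip vertex i
                break
    return len(infected) / n

# complete graph: should hover near the mean-field equilibrium 1 - gamma/beta
n = 100
K = [[int(i != j) for j in range(n)] for i in range(n)]
rng = random.Random(7)
res = gillespie_sis(K, range(n // 2), 2.0, 0.5, 3.0, rng)
print(res)  # roughly 0.75 on the complete graph
```

On the complete graph the neighborhood of every vertex is the whole population, which is exactly the well-mixed situation where HMFA is expected to be accurate.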
The SIR model describes the situation when cured individuals cannot get reinfected. The state space is $S = \{S, I, R\}$, including R for recovered individuals. The dynamics is modified so that the only transitions are $q_{IS}(\phi) = \beta\phi_I$ and $q_{RI}(\phi) = \gamma$.

The SI model describes the situation when there is no cure. This might be a realistic model for the diffusion of information. It is the special case of either SIS or SIR with $\gamma = 0$. In this paper, we will regard it as a special case of SIR, allowing for a state R which basically acts as an isolated state. (This approach will be useful for counterexamples.)

For later use, we also introduce notation for terms of order 2 and 3: $[AB](t)$ and $[ABC](t)$ denote the expected number of AB pairs and ABC triples. According to Theorem 4.4 in [17], for the SI process the derivative of $[I](t)$ can be expressed via pairs, and that of pairs via triples.

We also introduce an auxiliary model called the degree process. The state space is $S = \{a, b\}$, and the only transition is $a \to b$, with rate $\sum_{s \in S}\phi^n_{i,s}(t) = d_n(i)/\bar{d}_n$ for vertex $i$. Since the state of the neighbors does not influence the transition rate, the evolution of the vertices is independent of each other.

Homogeneous mean field approximation
The evolution of the Markov processes introduced in Section 3 can in principle be described by a system of linear ODEs given by Kolmogorov's forward equation. However, as the state space is $S^N$, solving said system is not viable even for relatively small values of $N$. A remedy for this problem is to assume that interactions are well mixed, so that the dynamics can be described by a few macroscopic averages, and then derive equations for the reduced system. This is what the homogeneous mean-field approximation (HMFA) aims to achieve, with an ODE system with $|S|$ variables.
HMFA is based on the following two assumptions:
• Low variance: $\bar{\xi}^n_s(t)$ is close to deterministic when $N$ is large.
• The graph is well-mixed, discrepancies are low.
We present an intuitive derivation of the governing ODE using these two assumptions. Based on the well-mixed assumption, we replace $\nu^n_{s'r}(t)$ by $\bar{\xi}^n_{s'}(t)\bar{\xi}^n_r(t)$ in (19) to get the drift $f(u) := Q(u)u$, that is, $f_s(u) = \sum_{s' \in S} q_{ss'}(u)u_{s'}$. $f$ is Lipschitz continuous in $\ell^1$ norm on the simplex $\Delta_S$; its constant is denoted by $L_f$.
The error arising from this replacement can be bounded from above by a term of order $\partial_n$. Using the low-variance assumption, we replace the ratio $\bar{\xi}^n_s(t)$ with a deterministic quantity $u_s(t)$. Based on the two assumptions and (19), $u(t)$ must satisfy the system of ODEs
$$\frac{d}{dt}u(t) = f(u(t)). \qquad (22)$$
This system of ODEs satisfies the following existence, uniqueness and positivity properties. Lemma 4. Assume $u(0) \in \Delta_S$. Then there is a unique global solution to (22) such that $u(t) \in \Delta_S$ for all $t \geq 0$.
As the coordinates $(u_s(t))_{s \in S}$ sum to 1, only $|S| - 1$ ODEs need to be solved in practice.
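For the SIS model, for instance, the system (22) collapses to one scalar equation, $\dot u_I = \beta u_I(1 - u_I) - \gamma u_I$ with $u_S = 1 - u_I$; a forward-Euler sketch (function name is ours):

```python
def hmfa_sis(uI0, beta, gamma, t_max, dt=1e-3):
    """Forward-Euler integration of the SIS mean-field ODE
    du_I/dt = beta*u_I*(1 - u_I) - gamma*u_I  (with u_S = 1 - u_I)."""
    uI = uI0
    for _ in range(int(t_max / dt)):
        uI += dt * (beta * uI * (1 - uI) - gamma * uI)
    return uI

# for beta > gamma the solution approaches the endemic level 1 - gamma/beta
print(hmfa_sis(0.1, 2.0, 1.0, 20.0))  # close to 0.5
```

For $\beta \leq \gamma$ the same equation drives $u_I$ to 0, recovering the classical mean-field epidemic threshold.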
For the degree process, (22) can be solved explicitly. We say that the HMFA is accurate for the graph sequence $(G_n)_{n=1}^\infty$ if the following holds: fix any arbitrary linear model $Q : \mathbb{R}^S \to \mathbb{R}^{S \times S}$ and asymptotic initial condition $u(0) \in \Delta_S$, and let $(\xi^n_i(0))_{n=1}^\infty$ be an arbitrary sequence of initial conditions such that $\bar{\xi}^n(0) \to u(0)$. Then for any $T > 0$,
$$\sup_{0 \leq t \leq T}\left\|\bar{\xi}^n(t) - u(t)\right\|_1 \to 0 \ \text{stochastically}.$$
Otherwise we say the HMFA is not accurate (inaccurate) for the graph sequence.
Note that we implicitly assumed $N(n) \to \infty$; otherwise $\bar{\xi}^n(t)$ would remain stochastic, and the deterministic approximation would trivially contain some non-vanishing error.
The requirement that the HMFA work for any linear $Q$ is somewhat restrictive, as there may be cases where the HMFA is accurate for some processes but not for others. Notably, for $Q(\phi) = Q^{(0)}$ constant, the vertices are independent; hence HMFA works for any $(G_n)_{n=1}^\infty$ by the law of large numbers. We wish to exclude these pathological cases by requiring convergence for all $Q$.
Similarly, it may be possible that HMFA works for some sequences of initial conditions but not for others. For example, in the SIS model, starting the epidemic with 0 infected individuals results in the same limit for both the exact stochastic process and the ODE, regardless of the graph sequence. It is also possible that the stochastic process exhibits wildly different behavior for two different initial conditions $(\xi^n_i(0))_{i=1}^N$ and $(\xi^n_i(0))'_{i=1}^N$ while $\bar{\xi}^n(0)$ and $\bar{\xi}^n(0)'$ converge to the same $u(0)$, rendering the ODE unable to distinguish between the two cases.
This can be illustrated by the following example. Let $G_n$ be the star graph with $N(n) = n$ vertices and $i = 1$ the hub. For the SI process, if we choose the initial condition with no infection, then $\bar{\xi}^n(t)$ and $u(t)$ lead to the same conclusion. However, when we infect only $i = 1$, leaving the rest susceptible, then $i = 1$ stays infected forever while the leaves become infected independently at rate $\frac{\beta}{\bar{d}_n} \approx \frac{\beta}{2}$; thus $\bar{\xi}^n_I(t)$ is of constant order for $t > 0$, while $\bar{\xi}^n_I(0) = \frac{1}{n} \to 0 =: u_I(0)$, for which the ODE predicts $u_I(t) \equiv 0$. In general, for non-quasi-random graph sequences, the initial condition can be selected in a similar manner to conclude that HMFA is not accurate.
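The star-graph counterexample can be checked numerically: with only the hub infected, each leaf becomes infected independently at rate $\beta/\bar d_n$, so by time $t$ an expected fraction $1 - e^{-\beta t/\bar d_n}$ of the leaves is infected, while the HMFA started from $u_I(0) = 0$ predicts no infection at all. A sketch (names are ours):

```python
import random

def star_si_fraction(n_leaves, beta, t, rng):
    """SI on a star with only the hub infected. Each leaf turns infected
    independently at rate beta/dbar; since the hub is already infected and
    leaves are not connected to each other, no interaction terms arise."""
    dbar = 2 * n_leaves / (n_leaves + 1)   # average degree of the star, ≈ 2
    infected_leaves = sum(rng.expovariate(beta / dbar) <= t
                          for _ in range(n_leaves))
    return (1 + infected_leaves) / (n_leaves + 1)

rng = random.Random(1)
frac = star_si_fraction(10_000, 1.0, 2.0, rng)
print(frac)  # near 1 - exp(-1) ≈ 0.632, far from the HMFA prediction of 0
```

The simulation exploits that in this particular configuration the leaves' infection times are i.i.d. exponentials, so no full Gillespie loop is needed.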

Results
The central claim of this paper is the following: Theorem 1 (Main). For a graph sequence $(G_n)_{n=1}^\infty$, the HMFA is accurate if and only if said sequence is quasi-random.
The following theorem shows the ⇐ direction of Theorem 1 and provides a quantitative error bound, which can be used for non-quasi-random graph sequences and even for concrete graphs.
The first two terms vanish as $N(n) \to \infty$ and $\bar{\xi}^n(0) \to u(0)$, so the only nontrivial part is $O(\partial_n)$, which goes to 0 when the graph sequence is quasi-random, by definition. The term vanishes when $q^{(1)}_{\max} = 0$, since in that case $Q(\phi) = Q^{(0)}$ is constant, making the vertices independent. It is worth mentioning that, based on the proof of Lemma 6, the $O_p$ term depends only on $q^{(1)}_{\max}$, $|S|$, $T$ and $N(n)$; apart from $N(n)$, it is independent of the graph sequence.
For some random graphs we may bound $\partial_n$ by a quantity of the same order as $\bar{d}_n^{-1/2}$.
Theorem 3 (Discrepancy bound for Erdős-Rényi graphs) applies to a $G_{ER}(N, p_n)$ Erdős-Rényi graph sequence with $\bar{d}_n \to \infty$. Similar results can be stated for $G_{reg}(N, d_n)$ random regular graphs based on (13), (15) and (16), when the appropriate conditions hold. The next theorem shows the ⇒ direction of Theorem 1.
Lastly, we introduce some graphs that are not quasi-random.
Theorem 5. The graph sequence $(G_n)_{n=1}^\infty$ is not quasi-random if at least one of the statements below is true: 1) $G_n$ is bipartite for infinitely many $n$; 2) $\limsup_{n\to\infty} \theta_n > 0$, meaning the giant component does not cover most of the vertices; 3) $\limsup_{n\to\infty} \frac{\alpha_n}{N} > 0$ (there are large independent sets); 4) $\liminf_{n\to\infty} \bar{d}_n < \infty$ ($\bar{d}_n$ does not approach infinity).
Interestingly, based on Theorem 5, not even well-known random graphs such as the Erdős-Rényi graph and random $d$-regular graphs are quasi-random when the degrees are bounded. Combined with Theorem 3, an Erdős-Rényi graph sequence is quasi-random if and only if $\bar{d}_n \to \infty$.

Smaller statements
In this section we give the proofs of Lemmas 1-4.

Proof. (Lemma 1)
as the maximum in $\partial_n$ is taken over a wider domain. This immediately shows (10) and the left-hand side of (9). For the right-hand side, let $\tilde{A}, \tilde{B} \subset [N]$ be two disjoint sets.
Let $A, B \subset [N]$ be two extreme sets maximizing $|\delta(A,B)|$; they can be decomposed into disjoint sets. For the second inequality, choose $A, B \subset [N]$ to be extreme sets attaining the maximum of $|\delta(A,B)|$.
Proof. (Lemma 3) We bound the first and the second term of $\delta(A,B)$ and $\hat{\delta}_H(A,B)$ separately; putting the two bounds together yields the claim. The second inequality can be proven in the same fashion as in Lemma 2, by choosing $A, B \subset [N]$ maximizing $|\delta(A,B)|$ and then doing the same with $\hat{\delta}_H(A,B)$.
With $Q(\phi) = (q_{ss'}(\phi))_{s,s' \in S}$, the altered ODE system is defined accordingly. Since the right-hand side is locally Lipschitz, a unique local solution exists. This solution is either global, or there is a blow-up at some time $t_0$. Indirectly, assume the latter.
Introduce an auxiliary time-inhomogeneous Markov process defined on $[0, t_0)$. The process makes transitions according to the transition matrix $Q(\hat{u}(t))$, where we think of $\hat{u}(t)$ as a known function. Let $p_s(t)$ be the probability that the auxiliary process is in state $s \in S$ at time $t$; then $p(t) := (p_s(t))_{s \in S} \in \Delta_S$, as $p(t)$ is a distribution on $S$.
The Kolmogorov equations for the auxiliary process take the form $\frac{d}{dt}p(t) = Q(\hat{u}(t))p(t)$. This enables us to use a Grönwall-type argument.

Positive results
In this section, Theorems 2 and 3 are proven. We start with Theorem 2.
The way we are going to achieve this is by decomposing the approximation error into two parts, a fluctuation error and a topological error, and then applying Grönwall's inequality to control error propagation.
The fluctuation error in state $s \in S$ is denoted by $U^n_s(t)$. Note that, due to (19), the conditional expectation of the increments can be written in terms of the rates. The topological error is denoted by $K^n_s(t)$. Using the vector notations $U^n(t) = (U^n_s(t))_{s \in S}$ and $K^n(t) = (K^n_s(t))_{s \in S}$, the ratio vector $\bar{\xi}^n(t)$ can be decomposed accordingly, where $C(T) := q^{(1)}_{\max}|S|^3 T$.

Proof. (Lemma 5)
The integral form of (22), together with Grönwall's inequality, yields the desired bound. What is left is to upper-bound the topological error term. Note that $g_s$ and $f_s$ differ only in their quadratic terms; using (21) completes the bound. The last step in proving Theorem 2 is to show an appropriate upper bound for the fluctuation error.

Proof. (Lemma 6 )
We represent the ratios $\bar{\xi}^n_s(t)$ with the help of Poisson processes, similarly to [21]. For each $s, s' \in S$, let $N^n_{ss'}(t)$ be independent rate-1 Poisson processes representing transitions from state $s'$ to $s$. Then $\bar{\xi}^n_s(t)$ can be written in distribution as in (26): the factor $\frac{1}{N}$ appears because each vertex entering (leaving) state $s$ increases (decreases) $\bar{\xi}^n_s(t)$ by $\frac{1}{N}$. If the $i$th vertex is in state $s'$, it makes a transition to state $s$ with rate $q_{ss'}(\phi^n_i(t))$; in total, vertices transition from state $s'$ to $s$ with rate $\sum_{i=1}^N q_{ss'}(\phi^n_i(t))\xi^n_{i,s'}(t)$. The difference between the Poisson processes and their conditional expectations defines a martingale term; its square is a submartingale, and using (17) the rates can be upper-bounded, thus Doob's inequality gives the desired bound. Now we can turn to proving the error bound for the Erdős-Rényi graph sequence. First, we show that such graph sequences have close-to-homogeneous degree distributions.
Proof. (Lemma 7) Firstly, we modify $\partial_n^*$ by replacing $\bar{d}_n$ with $d_n$. Secondly, we use $d_n \geq 1$ for large enough $n$ in the last step. Proof. (Theorem 3) Let $c_0$ and $H = \mathrm{core}(G_n)$ be the constant and the core from Proposition 2. For large enough $n$, $c_0 \leq \bar{d}_n$ holds based on the assumption.
In the first case, assume $c_0 \leq \bar{d}_n \leq 0.99N$. Then, based on Proposition 2, for large enough $n$, with high probability, the core covers most of the vertices and has a large spectral gap; hence, according to Lemma 3, $\hat{\partial}_n$ can be bounded in terms of $\hat{\partial}_n^H$. In the second case, $\bar{d}_n \geq 0.99N \geq \log^2 N$, and Proposition 3 ensures the bound in this case as well. Lastly, applying Lemma 2 yields the claim.

Negative results
In this section, Theorems 4 and 5 are proven. The main idea for Theorem 4 is the following: instead of the concrete values of $\bar{\xi}^n(t)$, we work with the expected value $\mu^n(t) := \mathbb{E}\left[\bar{\xi}^n(t)\right]$.
By standard arguments one can show that $\sup_{0 \leq t \leq T}\left\|\bar{\xi}^n(t) - u(t)\right\|_1 \to 0$ stochastically implies $|\mu^n_s(t) - u_s(t)| \to 0$ for all $0 \leq t \leq T$ and $s \in S$, as the quantities in question are uniformly bounded; hence, it is enough to disprove this simpler statement.
According to (19), $\frac{d}{dt}\mu^n(t)$ and $\frac{d}{dt}u(t)$ differ in the quadratic term, and those differences can be described by discrepancies. When $\partial_n$ does not vanish, we can choose the initial conditions in such a way that the discrepancies are high, resulting in different rates. Since $\mu^n_s(t) - u_s(t) \approx \left(\frac{d}{dt}\mu^n_s(0) - \frac{d}{dt}u_s(0)\right)t$, this leaves a difference between the prediction of the ODE and the expectation for small values of $t$, resulting in the desired contradiction.
In order for this argument to be valid, we require the second-order terms to be negligible, in other words, some regularity of the second derivative. For the ODE this automatically holds; however, for the stochastic process, problems can arise when the degrees have extreme variation.
To illustrate this phenomenon, let $G_n$ be a sequence of star graphs and examine the SI process on it. Note that $\bar{d}_n = 2\left(1 - \frac{1}{N}\right)$. We initially choose the leaves to be infected and keep the center susceptible. This leads to a large discrepancy; therefore, based on the argument above, one might expect $\mu^n_I(t)$ to differ considerably from $u_I(t)$. This is not the case though. Clearly $\bar{\xi}^n_I(0) = \frac{N-1}{N} \to 1 =: u_I(0)$, so the ODE approximation is the constant $u(t) \equiv 1$; and since the infected leaves never recover, the expected value stays within $\frac{1}{N}$ of 1 as well. To mitigate this problem, we take the following route: when $\partial_n^*$ does not converge to 0, we show that the HMFA is not accurate for the degree process. When $\partial_n^* \to 0$, on the other hand, we can thin our extreme sets with large discrepancy by removing vertices with too large degrees. The thinned sets will still contain vertices with high discrepancy, while the second derivatives remain bounded.
For technical reasons, we will utilize the function $u^n(t)$, the solution of (22) with the error-free initial condition $u^n(0) = \bar{\xi}^n(0)$.

Lemma 8. (Technical lemma for counterexamples)
Assume there is a sequence of initial conditions such that there are $s \in S$, $K, t_0 > 0$ and a sub-sequence along which the derivatives of $\mu^n_s$ and $u^n_s$ at 0 stay separated. Then we can modify the initial conditions so that $\bar{\xi}^n(0) \to u(0)$ for some $u(0) \in \Delta_S$, while there is some $t > 0$ such that $\limsup_{n\to\infty} |\mu^n_s(t) - u_s(t)| > 0$; hence the HMFA is inaccurate.

Proof. (Lemma 8)
We can choose $K$ to be large enough that the bound also holds uniformly in time and for any initial conditions; here we used that $f_s$ is a polynomial and that the solutions stay in the simplex $\Delta_S$. Let $0 < t \leq t_0$; then there is a mean value $\tau \in [0, t]$ for which the corresponding Taylor bounds hold, and similarly for $u^n$. From the construction, it is clear that the inequality is also true for any further subsequence.
As $\Delta_S$ is compact, we can choose a further sub-sequence $(n''_k)$ such that $\mu^{n''_k}(0) \to u(0)$ for some $u(0) \in \Delta_S$. When $n = n''_k$ for some $k$, we modify the initial conditions such that $\mu^n(0) \to u(0)$ holds for the whole sequence. Due to the continuous dependence on the initial conditions, this also ensures $u^n(t) \to u(t)$, and the limit superior remains positive. Lemma 9. When $\limsup_{n\to\infty} \partial_n^* > 0$, the HMFA is not accurate for the degree process.
Proof. (Lemma 9) Choose the initial conditions to be $\xi^n_{i,a}(0) = 1_{\{i \in V^{n_k}_{-}\}}$, with the rest of the vertices in state $b$. Based on (7), the expectation and the solution of the ODE can be calculated explicitly.
As for (28), for technical reasons we assume $\liminf_{n\to\infty} \bar{d}_n \geq 1$. Luckily, this is not too restrictive a condition, based on the lemma below.
Let us examine the SI process with initial conditions $\xi^{n_k}_{i,I}(0) = 1_{\{i \in A^{n_k}\}}$, the rest of the vertices being susceptible. As all the infected vertices are isolated, the dynamics of the Markov process remains constant. At first, assume $0 < \theta_{\sup} < 1$.
When $\theta_{\sup} = 1$, even the largest connected component covers only $o(N(n_k))$ vertices. Let the connected components be $V^{n_k}_1, \ldots, V^{n_k}_{Q_{n_k}}$, ordered increasingly. All of them contain at most $o(N(n_k))$ vertices.
$+ o(N(n_k))$ vertices and shares no edges with its complement. Using this set and its complement concludes the proof of 2). For 3), let $(n_k)_{k=1}^\infty$ be a sequence such that $\lim_{k\to\infty} \frac{\alpha_{n_k}}{N(n_k)} = \limsup_{n\to\infty} \frac{\alpha_n}{N} =: \alpha^* > 0$, and let $A^{n_k}$ be a sequence of independent sets of this size. Lastly, let $(n_k)_{k=1}^\infty$ be a sub-sequence with $\lim_{k\to\infty} \bar{d}_{n_k} = \liminf_{n\to\infty} \bar{d}_n < \infty$. Then, according to (1), the condition for 3) holds, proving 4).

Conclusion
In this work, the accuracy of the homogeneous mean-field approximation for density-dependent Markov population processes with linear transition rates was studied on graph sequences. The motivation for examining HMFA was twofold.

$\mu^{n_k}_I(t) \equiv \mu^{n_k}_I(0)$. For the ODE, on the other hand, $\frac{d}{dt}u^{n_k}_I(0) = \beta\mu^{n_k}_I(0)\left(1 - \mu^{n_k}_I(0)\right) \to \beta d_0(1 - d_0) > 0$, so the conditions of Lemma 8 clearly hold.

Lemma 11. (Getting rid of the extreme vertices) Assume $\limsup_{n\to\infty} \partial_n^2 > 0$ and $\partial_n^* \to 0$. Let $A^n, B^n \subset [N]$ be disjoint sets such that $|\delta(A^n, B^n)| = \partial_n^2$, and define $D^n := \{i \in [N] : d_n(i) \leq 2\bar{d}_n\}$, $A^n_D := A^n \cap D^n$, $B^n_D := B^n \cap D^n$. Then $\limsup_{n\to\infty} |\delta(A^n_D, B^n_D)| > 0$.

Proof. (Lemma 11) First, we show that $\limsup_{n\to\infty} |\delta(A^n_D, B^n)| > 0$. Introduce $\bar{D}^n := \{i \in [N] : d_n(i) > 2\bar{d}_n\}$ and $\bar{A}^n_D := A^n \cap \bar{D}^n$. Clearly $A^n_D \sqcup \bar{A}^n_D = A^n$ and $\bar{A}^n_D \subset \bar{D}^n$. Let $\iota_n$ be a uniform random variable on $[N]$. Then
$$\frac{|\bar{D}^n|}{N} = \mathbb{P}\left(d_n(\iota_n) > 2\bar{d}_n\right) \leq \mathbb{P}\left(\left|d_n(\iota_n) - \bar{d}_n\right| > \bar{d}_n\right) \leq \frac{1}{\bar{d}_n}\mathbb{E}\left|d_n(\iota_n) - \bar{d}_n\right| = \frac{1}{N\bar{d}_n}\sum_{i=1}^N \left|d_n(i) - \bar{d}_n\right| = 2\partial_n^* \to 0.$$
Assume indirectly that $|\delta(A^n_D, B^n)| \to 0$. Then $\limsup_{n\to\infty} |\delta(A^n, B^n)| > 0$ implies $\limsup_{n\to\infty} |\delta(\bar{A}^n_D, B^n)| > 0$. On the other hand, $\limsup_{n\to\infty} |\delta(\bar{D}^n, [N])| \leq \limsup_{n\to\infty} \partial_n^* = 0$, resulting in a contradiction. $\limsup_{n\to\infty} |\delta(A^n_D, B^n_D)| > 0$ follows from the same argument in the second variable.

With this tool in hand, we can finally prove Theorem 4.

Proof. (Theorem 4) $\limsup_{n\to\infty} \partial_n > 0$ is assumed. We may also assume $\partial_n^* \to 0$ and $\liminf_{n\to\infty} \bar{d}_n \geq 1$; otherwise, Lemma 9 or 10 proves the statement. Based on $\partial_n^* \to 0$ and Lemma 1, $\partial_n^2$ and $\partial_n$ are equivalent, hence $\limsup_{n\to\infty} \partial_n^2 > 0$. Based on Lemma 11, there are disjoint sets $A^n_D, B^n_D \subset [N]$ such that $d_n(i) \leq 2\bar{d}_n$ for all $i \in A^n_D \cup B^n_D$ and $\delta_0 := \limsup_{n\to\infty} |\delta(A^n_D, B^n_D)| > 0$. We can choose a sub-sequence $(n_k)_{k=1}^\infty$ such that $|\delta(A^{n_k}_D, B^{n_k}_D)| \to \delta_0$ and $\bar{d}_{n_k} \geq \frac{1}{2}$, as we have $\liminf_{n\to\infty} \bar{d}_n \geq 1$ as well. Define an SI process with initial conditions $V^n_S(0) = A^n_D$, $V^n_I(0) = B^n_D$, and the rest of the vertices in state R. Our goal is to show that the conditions of Lemma 8 are satisfied for state S.

Thus 1) is true. Define $\theta_{\sup} := \limsup_{n\to\infty} \theta_n > 0$ and let $(n_k)_{k=1}^\infty$ be a sub-sequence such that $\theta_{n_k} \to \theta_{\sup}$.