On Edge Exchangeable Random Graphs

We study a recent model for edge exchangeable random graphs introduced by Crane and Dempsey; in particular we study asymptotic properties of the random simple graph obtained by merging multiple edges. We study a number of examples, and show that the model can produce dense, sparse and extremely sparse random graphs. One example yields a power-law degree distribution. We give some examples where the random graph is dense and converges a.s. in the sense of graph limit theory, but also an example where a.s. every graph limit is the limit of some subsequence. Another example is sparse and yields convergence to a non-integrable generalized graphon defined on $(0,\infty)$.


Introduction
A model for edge exchangeable random graphs and hypergraphs was recently introduced by Crane and Dempsey [11,12], who also gave a representation theorem showing that every infinite edge exchangeable random hypergraph can be constructed by this model. An equivalent model, using a somewhat different formulation, was given in [7,8]; see Remark 4.7.
The idea of the model is that random i.i.d. edges, with an arbitrary distribution, are added to a fixed vertex set; see Sect. 4 for a detailed definition (slightly modified but equivalent to the original definition).
The general model defines a random hypergraph. In the present paper, we concentrate on the graph case, although we state the definitions in Sect. 4 more generally for hypergraphs.
Since edges can be repeated, the model defines a random multigraph, but this can as always be reduced to a random simple graph by identifying parallel edges and deleting loops. Typically, many of the edges will be repeated many times, see e.g. Remark 6.7, and thus the multigraph and the simple graph versions can be expected to be quite different. Both versions have interest and potential, possibly different, applications, and we consider both versions. Previous papers concentrate on the multigraph version; in contrast and as a complement, in the present paper we study mainly the simple graph version.
The model is, as said above, based on an arbitrary distribution of edges. Different choices of this distribution can give a wide range of different types of random graphs, and the main purpose of the paper is to investigate the types of random graphs that may be created by this model; for this purpose we give some general results on the numbers of vertices and edges, and a number of examples ranging from dense to very sparse graphs. The examples show that the model can produce very different graphs. In some dense examples we show that the random graphs converge in the sense of graph limit theory. However, that is not always the case, and we even give a chameleon example (Theorem 8.7) that has every graph limit as the limit of some subsequence. We give also a sparse example (Example 9.1) with a power-law degree distribution and convergence to a generalized graphon in the sense of [40].
An important tool in our investigations is a Poisson version of the construction by [12], see Sect. 4.2, which seems interesting also in its own right.
After some preliminaries in Sects. 2-3, we give the definitions of the random hypergraphs in detail in Sect. 4. The graph case is discussed further in Sect. 5. Section 6 studies the numbers of vertices and edges in the graphs. Section 7 considers an important special case of the model, called rank 1; we study two multigraph examples previously considered by [11,36] and show that they are of this type. The

Some Notation
In general, we allow hypergraphs to have multiple edges; we sometimes (but usually not) say multihypergraph for emphasis. Moreover, the edges in a hypergraph may have repeated vertices, i.e., the edges are in general multisets of vertices, see Remark 4.3. An edge with repeated vertices is called a loop. A simple hypergraph is a hypergraph without multiple edges and loops. (Warning: different authors give different meanings to "simple hypergraph".) The vertex and edge sets of a multigraph G are denoted by V (G) and E(G), and the numbers of vertices and edges by v(G) := |V (G)| and e(G) := |E(G)|.
$f(x) \sim g(x)$ means $f(x)/g(x) \to 1$ (as $x$ tends to some limit, e.g. $x \to \infty$). We also use $v \sim w$ for adjacency of two vertices $v$ and $w$ in a given graph, and $X \sim \mathcal{L}$ meaning that the random variable $X$ has distribution $\mathcal{L}$; there should not be any risk of confusion between these (all standard) uses of $\sim$.
$f(x) \asymp g(x)$ for two non-negative functions or sequences $f(x)$ and $g(x)$ (defined on some common set $S$) means that $f/g$ and $g/f$ both are bounded; equivalently, there exist constants $c, C > 0$ such that $c\,g(x) \le f(x) \le C\,g(x)$ for every $x \in S$. $f(x) \asymp g(x)$ as $x \to \infty$ means that $f(x) \asymp g(x)$ for $x$ in some interval $[x_0, \infty)$.
We use 'increasing' (for a function or a sequence) in its weak sense, i.e., $x \le y \Rightarrow f(x) \le f(y)$, and similarly with 'decreasing'.
$x \wedge y$ is $\min\{x, y\}$ and $x \vee y$ is $\max\{x, y\}$. If $\mu$ is a measure on a set $S$, then $\|\mu\| := \mu(S) \le \infty$. $\mathrm{Exp}(\lambda)$ denotes the exponential distribution with rate $\lambda$, i.e., the first point in a Poisson process with rate $\lambda$; this is thus the exponential distribution with mean $1/\lambda$. For convenience we extend this to $\lambda = 0$: $X \sim \mathrm{Exp}(0)$ means $X = +\infty$ a.s.
We say that a sequence $G_n$ of simple graphs with $v(G_n) \to \infty$ is dense if $e(G_n) \asymp v(G_n)^2$, sparse if $e(G_n) = o(v(G_n)^2)$, and extremely sparse if $e(G_n) \asymp v(G_n)$ as $n \to \infty$, and similarly for a family $G_t$ of graphs with a continuous parameter.

Some Preliminaries on Graph Limits, Graphons and Cut Metric
We recall some basic facts on graph limits and graphons. For further details, see e.g. [5,6], [14] and the comprehensive book [32]. A (standard) graphon is a symmetric measurable function $W : \Omega \times \Omega \to [0, 1]$, where $\Omega = (\Omega, \mathcal{F}, \mu)$ is a probability space. ($\Omega$ may without loss of generality be taken as $[0, 1]$ with Lebesgue measure, but it is sometimes convenient to use other probability spaces too.) If $\varphi : \Omega_1 \to \Omega_2$ is a measure-preserving map between two probability spaces $\Omega_1$ and $\Omega_2$, and $W$ is a graphon on $\Omega_2$, then $W^{\varphi}(x, y) := W(\varphi(x), \varphi(y))$ is a graphon on $\Omega_1$ called the pull-back of $W$.
If $W$ is an integrable function on $\Omega^2$, then its cut norm is
$$\|W\|_\square := \sup_{T,U} \Bigl| \int_{T \times U} W(x,y)\, d\mu(x)\, d\mu(y) \Bigr|,$$
taking the supremum over all measurable sets $T, U \subseteq \Omega$. For two graphons $W_1$ and $W_2$, defined on probability spaces $\Omega_1$ and $\Omega_2$, their cut distance is defined as
$$\delta_\square(W_1, W_2) := \inf_{\varphi_1, \varphi_2} \bigl\| W_1^{\varphi_1} - W_2^{\varphi_2} \bigr\|_\square,$$
taking the infimum over all pairs $(\varphi_1, \varphi_2)$ of measure-preserving maps $\varphi_j : \Omega \to \Omega_j$ defined on some common probability space $\Omega$. Two graphons $W_1$ and $W_2$ are equivalent if $\delta_\square(W_1, W_2) = 0$. Note that a graphon $W$ and any pull-back $W^{\varphi}$ of it are equivalent. For characterizations of equivalent graphons, see [4] and [22, Sect. 8]. The cut distance $\delta_\square$ can be regarded as a metric on the set $\mathcal{W}$ of equivalence classes of graphons, and makes $\mathcal{W}$ into a compact metric space.
A graph limit can be identified with an equivalence class of graphons, so we can regard $\mathcal{W}$ as the space of graph limits. Thus, every graphon defines a graph limit, and every graph limit is represented by some graphon, but this graphon is unique only up to equivalence.
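As a small illustration of the cut norm on a finite probability space, the following Python sketch evaluates it by brute force for a matrix viewed as a kernel on $\{0,\dots,n-1\}$ with uniform measure. It assumes the standard definition (supremum of $|\int_{T\times U} W|$ over pairs of sets); the function name `cut_norm` and the example matrices are illustrative only. It computes the cut norm, not the cut distance (no infimum over measure-preserving maps), and its cost grows like $4^n$, so it is only usable for very small $n$.

```python
from itertools import product

def cut_norm(W):
    """Brute-force cut norm of a square matrix W, viewed as a kernel on
    {0, ..., n-1} with the uniform probability measure (mass 1/n per point)."""
    n = len(W)
    best = 0.0
    # Enumerate all subsets T, U of the index set as 0/1 indicator tuples.
    for T in product((0, 1), repeat=n):
        for U in product((0, 1), repeat=n):
            s = sum(W[i][j] for i in range(n) if T[i]
                    for j in range(n) if U[j])
            best = max(best, abs(s) / n**2)
    return best

# Adjacency matrices of a single edge (K_2) and of the empty graph on 2 vertices.
K2 = [[0, 1], [1, 0]]
EMPTY = [[0, 0], [0, 0]]
diff = [[K2[i][j] - EMPTY[i][j] for j in range(2)] for i in range(2)]
print(cut_norm(diff))  # 0.5, attained at T = U = {0, 1}
```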
For every finite graph $G$, there is a corresponding graphon $W_G$ that can be defined by taking $\Omega = V(G)$ with the uniform probability measure $\mu\{i\} = 1/v(G)$ for every $i \in V(G)$ and letting $W_G(i, j) := \mathbf{1}\{i \sim j\}$; thus $W_G$ equals the adjacency matrix of $G$, regarded as a function $V(G)^2 \to \{0, 1\}$. ($W_G$ is often defined as an equivalent graphon on $[0, 1]$; for us this makes no difference.) We identify $G$ and $W_G$ when convenient, and write for example $\delta_\square(G, W) = \delta_\square(W_G, W)$ for a graph $G$ and a graphon $W$.
Remark 3.1 Let $G$ be a finite graph. A blow-up $G^*$ of $G$ is the graph obtained by taking, for some integer $m \ge 1$, the vertex set $V(G^*) = V(G) \times [m]$ with $(v, i) \sim (w, j)$ in $G^*$ if and only if $v \sim w$ in $G$. Then, $W_{G^*}$ is a pull-back of $W_G$ (for $\varphi : V(G^*) \to V(G)$ the natural projection), and thus $\delta_\square(G^*, G) = \delta_\square(W_G, W_{G^*}) = 0$. Hence the graphs $G$ and $G^*$, which are different (if $m > 1$), are equivalent when regarded as graphons.
There are several, quite different but nevertheless equivalent, ways to define convergence of a sequence of graphs, see e.g. [5,6,14,32]. For our purposes it suffices to know that a sequence $G_n$ with $v(G_n) \to \infty$ is convergent if and only if there exists a graphon $W$ such that $\delta_\square(G_n, W) \to 0$ as $n \to \infty$. We then say that $G_n$ converges to $W$, or to the corresponding graph limit.

Remark 3.2
The standard graphons defined above are appropriate for dense graphs. For sparse graphs, other, more general, graphons have been constructed by several authors. We will in Sect. 5.1 compare the edge exchangeable graphs studied in the present paper with random graphs defined by graphons that are defined on R + or another infinite (σ -finite) measure space instead of a probability space, see [3,39]. Furthermore, in Sect. 9 we consider an example of edge exchangeable graphs that yields sparse graphs, where we show that the graphs converge in a suitable sense (see [40]) to such a graphon defined on R + . We postpone the definitions to these sections.

Constructions of Random Hypergraphs
In this section, we define the random hypergraphs. We give several versions; we define both multihypergraphs and simple hypergraphs, and we give both the original version with a fixed number of edges and a Poisson version. In later sections we consider only the graph case, but we give the definitions here in greater generality.
We begin with some preliminaries. Let $(S, \mathcal{F})$ be a measurable space, for convenience usually denoted simply by $S$. To avoid uninteresting technical complications, we assume that $S$ is a Borel space, i.e., isomorphic to a Borel subset of a complete separable metric space with its Borel σ-field. Let $S^*$ be the set of all finite non-empty multisets of points in $S$. We can regard a multiset with $n$ elements as an equivalence class of sequences $(x_1, \dots, x_n) \in S^n$, where two such sequences are equivalent if one is a permutation of the other. Denoting this equivalence relation by $\cong$ and the set of multisets of $n$ elements in $S$ by $S^{\vee n}$, we thus have $S^{\vee n} = S^n / \cong$ and $S^* = \bigcup_{n=1}^{\infty} S^{\vee n}$. Note that $S^{\vee n}$ and $S^*$ are Borel spaces. (One way to see this is to recall that every Borel space is isomorphic to a Borel subset of $[0, 1]$. We may thus assume that $S \subseteq [0, 1]$, and then we can redefine $S^{\vee n}$ as $\{(x_1, \dots, x_n) \in S^n : x_1 \le \dots \le x_n\}$, which is a Borel subset of $[0, 1]^n$.)
Remark 4.1 Definitions 4.2 and 4.8 below use a probability measure $\mu$ to define the random (hyper)graphs. In general, this measure may be a random measure, and then the constructions should be interpreted by conditioning on $\mu$, i.e., by first sampling $\mu$, and then using the obtained measure throughout the construction. In other words, the distribution of the random hypergraphs constructed by a random measure $\mu$ is a mixture of the distributions given by deterministic $\mu$. For convenience, and because most examples will be with deterministic $\mu$, we usually tacitly assume that $\mu$ is deterministic; results in the general case with random $\mu$ then follow by conditioning on $\mu$. (See Remark 4.11 for a typical example, where this for once is stated explicitly.)

Random Hypergraphs with a Given Number of Edges
We give a minor modification of the original definition by [11,12]; we will see at the end of this subsection that our definition is equivalent to the original one.
Definition 4.2 Given a Borel space $S$ and a probability measure $\mu$ on $S^*$, define a sequence of finite random (multi)hypergraphs $(G^*_m)_{m=1}^{\infty}$ as follows. Let $Y_1, Y_2, \dots$ be i.i.d. random multisets in $S^*$ with distribution $\mu$, and let $G^*_m$ be the multihypergraph with edges $Y_1, \dots, Y_m$, where
$$V(G^*_m) := \bigcup_{i=1}^{m} Y_i \tag{4.1}$$
is the vertex set spanned by the edges; thus there are no isolated vertices in $G^*_m$. (The same holds for the related definitions in (4.2), (4.6), (4.7) below.) We also similarly define the infinite (multi)hypergraph $G^*_\infty$ having edges $(Y_i)_{i=1}^{\infty}$. The edges in $G^*_m$ may be repeated, so $G^*_m$ is in general a random multihypergraph. We define $G_m$ as the simple hypergraph obtained by merging each set of parallel edges in $G^*_m$ to a single edge and deleting loops; thus the simple hypergraphs $(G_m)_{m=1}^{\infty}$ are defined by:
$$E(G_m) := \{Y_i : i \le m,\ Y_i \text{ is not a loop}\}, \qquad V(G_m) := \bigcup_{Y \in E(G_m)} Y. \tag{4.2}$$
Note that $v(G_m) \le v(G^*_m)$, and strict inequality is possible if there are loops.

Remark 4.3
We follow [12] and, for increased generality, allow $Y_i$ to be a multiset (see e.g. the examples in Sect. 7); thus the edges in $G^*_m$ and $G_m$ are multisets and may contain repeated vertices. If we choose $\mu$ with support in the set $S^{**} := \bigcup_{n=1}^{\infty} S^{\wedge n} \subset S^*$ of finite subsets of $S$, where $S^{\wedge n} \subset S^{\vee n}$ is the set of subsets of $S$ with $n$ distinct elements, then the edges in $G^*_m$ and $G_m$ are ordinary sets of vertices (i.e., without repeated vertices). (This is commonly assumed in the definition of hypergraphs.) In particular, if $\mu$ has support in $S^{\wedge 2} = \{\{x, y\} : x, y \in S,\ x \ne y\}$, then $G^*_m$ is a multigraph without loops, and $G_m$ is a simple graph with $V(G_m) = V(G^*_m)$.
The construction above yields hypergraphs with vertices labelled by elements of S. We (usually) ignore these labels and regard G * m and G m as unlabelled hypergraphs.
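The construction with a fixed number of edges can be sketched in a few lines of Python for the graph case with a discrete edge distribution. This is only an illustration under stated assumptions: edges are encoded as pairs, a pair `(x, x)` encodes a loop, and the names `sample_graphs`, `vertex_set` and the weights in `mu` are hypothetical and not taken from the paper.

```python
import random

def sample_graphs(edge_weights, m, seed=0):
    """Sample m i.i.d. edges; return (multigraph edge list, simple edge set).

    edge_weights maps a pair (x, y) to a positive weight; the pair is treated
    as an unordered multiset, so (x, x) encodes a loop.
    """
    rng = random.Random(seed)
    edges = list(edge_weights)
    weights = [edge_weights[e] for e in edges]
    # Y_1, ..., Y_m are i.i.d. with distribution proportional to the weights.
    multi = [tuple(sorted(e)) for e in rng.choices(edges, weights=weights, k=m)]
    # Simple graph: merge parallel edges and delete loops.
    simple = {e for e in multi if e[0] != e[1]}
    return multi, simple

def vertex_set(edges):
    """Vertices spanned by the given edges, so there are no isolated vertices."""
    return {v for e in edges for v in e}

# Hypothetical edge distribution on a few pairs, including one loop.
mu = {(1, 2): 0.5, (1, 3): 0.3, (2, 3): 0.1, (4, 4): 0.1}
multi, simple = sample_graphs(mu, m=50)
print(len(multi), len(simple), vertex_set(simple) <= vertex_set(multi))
```

The multigraph always has exactly `m` edges, while the simple graph can have at most three edges here, since the loop at vertex 4 is deleted.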
Remark 4.4 We usually also ignore the labels on the edges. If we keep the labels $i$ on the edges $Y_i$, then the distribution of $G^*_m$ is obviously edge exchangeable, i.e., invariant under permutations of these edge labels, because $(Y_i)_i$ is an i.i.d. sequence. Conversely, as shown by [12, Theorem 3.4], every infinite edge exchangeable hypergraph is a mixture of random hypergraphs $G^*_\infty$, i.e., it can be constructed as above using a random measure $\mu$. In the present formulation, the proof in [12] simplifies somewhat: give the vertices in the edge exchangeable hypergraph random labels that are i.i.d. and $U(0, 1)$ (uniformly distributed on $[0, 1]$), and independent of the edges. Then the edges become multisets in $[0, 1]^*$, and their distribution is clearly exchangeable, so by de Finetti's theorem, the edges are given by the construction above for some random probability measure $\mu$ on $S^*$, taking $S = [0, 1]$.
It is obvious from the definition that if $\psi : S \to S_1$ is an injective measurable map of $S$ into another measurable (Borel) space $S_1$, then $\mu$ is mapped to a probability measure $\mu_1$ on $S_1^*$, which defines the same random hypergraphs $G^*_m$ and $G_m$ as $\mu$. Hence, the choice of Borel space $S$ is not important, and we can always use e.g. $S = [0, 1]$. Moreover, we can simplify further.
Define the intensity of $\mu$ as the measure $\bar\mu$ on $(S, \mathcal{F})$ given by
$$\bar\mu(A) := \mathbb{E}\,|A \cap Y|, \tag{4.3}$$
where $Y$ has distribution $\mu$. Note that for a singleton set $\{x\}$, $|\{x\} \cap Y| = \mathbf{1}\{x \in Y\}$, and thus (4.3) yields
$$\bar\mu(\{x\}) = \mathbb{P}(x \in Y). \tag{4.4}$$
Let
$$\mathcal{A} := \{x \in S : \bar\mu(\{x\}) > 0\} \tag{4.5}$$
be the set of atoms of $\bar\mu$. We have $\bar\mu(A) = \sum_{n=1}^{\infty} \bar\mu_n(A)$, where $\bar\mu_n(A) := \mathbb{E}\bigl(|A \cap Y| \cdot \mathbf{1}\{|Y| = n\}\bigr)$, and since each $\bar\mu_n$ is a finite measure, it follows that the set of atoms $\mathcal{A}$ is a countable (finite or infinite) subset of $S$. By (4.4) and (4.5), if $x \notin \mathcal{A}$, then $\mathbb{P}(x \in Y) = 0$. Hence, in the construction of $G^*_m$, if an edge $Y_i$ has a vertex $x \notin \mathcal{A}$, then a.s. $x \notin Y_j$ for every $j \ne i$. Consequently, a vertex $x \notin \mathcal{A}$ of $G^*_\infty$ a.s. appears in only one edge. (Such a vertex is called a blip in [12].) On the other hand, if $x \in \mathcal{A}$, so $\mathbb{P}(x \in Y) = \bar\mu(\{x\}) > 0$, then by the law of large numbers, a.s. $x$ belongs to infinitely many edges $Y_i$ of $G^*_\infty$.
It follows that when constructing the hypergraphs $G^*_m$, if the edge $Y_i = \{y_{i1}, \dots, y_{i n_i}\}$, we do not have to keep track of the vertex labels $y_{ij}$ unless they belong to $\mathcal{A}$; any $y_{ij} \notin \mathcal{A}$ will be a blip not contained in any other edge and the actual value of $y_{ij}$ may be forgotten. (Except that if we allow repeated vertices in the edges, see Remark 4.3, then we still have to know whether two vertex labels $y_{ij}$ and $y_{ik}$ on the same edge are the same or not.) Write $\mathcal{A} = \{a_k\}_{k=1}^{N}$, where $N \le \infty$, and replace, for every multiset $Y = (y_1, \dots, y_\ell) \in S^*$, every vertex label $y_j = a_k$ for some $a_k \in \mathcal{A}$ by the new label $y'_j = k$, and the vertex labels $y_j \notin \mathcal{A}$ on $Y$ by $0, -1, \dots$. (For definiteness, we may assume that $S \subseteq [0, 1]$ so $S$ is ordered, and take the labels in order in case $Y$ has more than one vertex label not in $\mathcal{A}$.) This maps $\mu$ to a probability measure $\mu'$ on the set $\mathbb{Z}^*$ of finite multisets of integers, and it follows from the discussion above that we can recover the random hypergraphs $G^*_m$ from $\mu'$ by the construction in Definition 4.2, if we first replace each vertex label $y'_j \in \{0, -1, \dots\}$ by a random label with a continuous distribution in some set, for example $U(0, 1)$, making independent choices for each $Y_i$. Equivalently, and more directly, we obtain $G^*_m$ from the probability measure $\mu'$ on $\mathbb{Z}^*$ by the following construction, which is the original definition by [11,12].
Definition 4.5 [11,12] Given a probability measure $\mu$ on $\mathbb{Z}^*$, we define a sequence of finite random (multi)hypergraphs $(G^*_m)_{m=1}^{\infty}$ as in Definition 4.2 with the modification that in every edge $Y_i = \{y_{i1}, \dots, y_{i\ell_i}\}$ we replace every vertex label $y_{ij} \le 0$ (if any) with a new vertex that is not used for any other edge.
Since we ignore the vertex labels in G * m , it does not matter what labels we use as replacements for 0, −1, . . . in Definition 4.5. Crane and Dempsey [11,12] use the same set 0, −1, . . . of integers, taking the first label not already used. An alternative is to take random labels, e.g. i.i.d. U (0, 1) as above.
Remark 4.6 To be precise, Definition 4.5 is the definition in [12]. The definition in [11] treats only the binary case $|Y_n| = 2$ in detail, and differs in that only labels $y_i \ge 0$ are used, and that an edge $\{0, 0\}$ is replaced by an edge $\{z_1, z_2\}$ with two new vertex labels $z_1$ and $z_2$.
This version is essentially equivalent; apart from a minor notational difference, the only difference is that this version does not allow for "loop dust", where a positive fraction of the edges are isolated loops. Cf. Remark 5.2.
We have shown that Definition 4.2 is essentially equivalent to the original definitions by [11,12]. One advantage of Definition 4.2 is that no special treatment of vertex labels $\le 0$ is needed; the blips (if there are any) come automatically from the continuous part of the label distribution; a disadvantage is that this continuous part is arbitrary and thus does not contain any information. Another advantage with Definition 4.2 is that it allows for arbitrary Borel spaces $S$; even if it usually is convenient to use $S = \mathbb{N}$ to label the vertices, it may in some examples be natural to use another set $S$.
Remark 4.7 The construction in [8] is stated differently, but is equivalent. It uses a generalization of Kingman's paintbox construction of exchangeable partitions; in the version in [8], the paintbox consists of families $(C_{jk})_{j,k \ge 1}$ and $(C'_{jk})_{j,k \ge 1}$ of subsets of $[0, 1]$; it is assumed that every $x \in [0, 1]$ is an element of only finitely many of these sets, and that for each $j$ and $k \ne l$, $C_{jk} \cap C_{jl} = \emptyset$ and $C'_{jk} \cap C'_{jl} = \emptyset$. (In general these sets may be random, but similarly as above, in the construction we condition on these sets so we may assume that they are deterministic.) Furthermore, we generate i.i.d. $U(0, 1)$ random labels $\varphi_k$ and $\varphi'_{Njl}$ for $k, N, j, l \ge 1$. For each $N \ge 1$ we construct an edge $Y_N$ by taking a uniformly random point $V_N \in [0, 1]$, independent of everything else; then, for each $(j, k)$ such that $V_N \in C_{jk}$, $Y_N$ contains $k$ vertices labelled $\varphi_j$, and for each $(j, k)$ such that $V_N \in C'_{jk}$ and every $l \le k$, $Y_N$ contains $j$ vertices labelled $\varphi'_{Njl}$. (The latter vertices are thus blips.) Note that this gives the vertices random labels as in Remark 4.4; however, we then ignore the vertex labels. (Actually, in [8], each vertex is represented by a multiset of edge labels (called a trait), which contains the label of each edge that contains the vertex, repeated as many times as the vertex occurs in the edge. This is obviously an equivalent way to describe the hypergraph.) It is obvious that, conditioned on the labels $\varphi_k$ and $\varphi'_{Njl}$, this construction gives a random multiset with some distribution $\mu$; conversely, every distribution $\mu$ of a random (finite) multiset can easily be obtained in this way by suitable choices of $C_{jk}$ and $C'_{jk}$. Hence, the construction is equivalent to the one above. (In our opinion, it is more natural to focus on the distribution of the edges, since the sets $C_{jk}$ and $C'_{jk}$ in the paintbox construction have no intrinsic meaning; they are just used to describe the edge distribution.)

The Poisson Version
The multihypergraph G * m has exactly m edges (not necessarily distinct). It is often convenient to instead consider a Poisson number. (This was done by Broderick and Cai in [7,Example 2.7].) It is then natural to consider a continuous-parameter family of hypergraphs, which we define as follows. We may think of the second coordinate t as time.

Definition 4.8
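Given a probability measure $\mu$ on $S^*$, we define a family of random (multi)hypergraphs $(\tilde G^*_t)_{t \ge 0}$ as follows.
(i) Let $\Xi$ be a Poisson point process on $S^* \times [0, \infty)$ with intensity $\mu \times dt$, and enumerate its points as
$$\Xi = \{(Y_i, \tau_i) : i \ge 1\}. \tag{4.6}$$
(ii) Let $\tilde G^*_t$ be the multihypergraph with edges $\{Y_i : \tau_i \le t\}$ and with the vertex set spanned by these edges.
Recall that a Poisson point process on an infinite, σ-finite measure space is a random countably infinite set of points that can be enumerated as in (4.6), in our case for some random $Y_i \in S^*$ and $\tau_i \in [0, \infty)$.
Define $\tilde G_t$ as the simple hypergraph obtained by merging each set of parallel edges in $\tilde G^*_t$ to a single edge, and deleting loops (together with their incident vertices, unless these also belong to some non-loop). Hence, with $\Xi$ as in (4.6),
$$E(\tilde G_t) := \{Y_i : \tau_i \le t,\ Y_i \text{ is not a loop}\}, \qquad V(\tilde G_t) := \bigcup_{Y \in E(\tilde G_t)} Y. \tag{4.7}$$
Note that the random hypergraphs $\tilde G^*_t$ and $\tilde G_t$ are a.s. finite for every $t < \infty$.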

The projection $\{\tau_i\}_{i \ge 1}$ of the Poisson process to the second coordinate is a Poisson point process on $[0, \infty)$ with intensity 1, and we may and will assume that the points are enumerated with $\tau_i$ in increasing order; thus a.s. $0 < \tau_1 < \tau_2 < \cdots$. Although we usually tacitly consider $t < \infty$, we may here also take $t = \infty$: $G^*_\infty = \tilde G^*_\infty$ and $G_\infty = \tilde G_\infty$.
Note that the relations in Proposition 4.9 hold not just for a single $m$ or $t$, but also for the entire processes. Hence, asymptotic results, and in particular a.s. limit results, are (typically) easily transferred from one setting to the other.
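The coupling between the two versions can be illustrated with a short Python sketch. It assumes a discrete edge distribution with total mass 1, so that edges arrive at rate 1 with i.i.d. $\mathrm{Exp}(1)$ gaps and i.i.d. marks; the names `poisson_edge_process`, `multigraph_at_time` and the weights in `mu` are hypothetical. Only the first `m` points are generated, so the time-$t$ multigraph is only valid for $t \le \tau_m$.

```python
import random

def poisson_edge_process(edge_weights, m, seed=0):
    """Return the first m points (tau_i, Y_i) of the marked Poisson process.

    Assumes the weights sum to 1, so arrivals have intensity 1 and the gaps
    tau_i - tau_{i-1} are i.i.d. Exp(1); the marks Y_i are i.i.d. edges.
    """
    rng = random.Random(seed)
    edges = list(edge_weights)
    weights = [edge_weights[e] for e in edges]
    points, tau = [], 0.0
    for _ in range(m):
        tau += rng.expovariate(1.0)
        points.append((tau, rng.choices(edges, weights=weights)[0]))
    return points

def multigraph_at_time(points, t):
    """Edges (with repetition) present at time t; valid for t <= tau_m."""
    return [y for tau, y in points if tau <= t]

mu = {(1, 2): 0.5, (1, 3): 0.3, (2, 3): 0.2}  # hypothetical probability weights
points = poisson_edge_process(mu, m=20)
tau_m = points[-1][0]
# Stopping the Poisson version at tau_m gives exactly the first m edges.
print(len(multigraph_at_time(points, tau_m)))
```

Stopping at the arrival time of the $m$-th point reproduces the fixed-size multigraph with edges $Y_1, \dots, Y_m$, which is the sense in which results transfer between the two settings.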

Remark 4.10
Instead of stopping at the random time $\tau_m$, we can also obtain $G^*_m$ and $G_m$ from $\tilde G^*_t$ and $\tilde G_t$ by conditioning on $N(t) = m$, for any fixed $t > 0$.

Remark 4.11
One reason that the Poisson version is convenient is that different edges appear independently of each other. If we for convenience assume that there are no blips, we may as explained above assume that $S = \mathbb{N}$, so $V(\tilde G^*_t) \subseteq \mathbb{N}$. In this case, the number of copies of an edge $I \in S^*$ in $\tilde G^*_t$ has the Poisson distribution $\mathrm{Po}(t\mu(\{I\}))$, and these numbers are independent for different $I \in S^*$. Hence, different edges $I \in S^*$ appear independently in $\tilde G_t$. (In the case $\mu$ is random, this holds conditionally on $\mu$, but not unconditionally.) Note that this independence does not hold for $G_m$; the stopping in Proposition 4.9 or the conditioning in Remark 4.10 destroys the independence of different edges.
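The independence of the edge counts gives a direct way to sample the Poisson version when the edge distribution is discrete: draw one independent Poisson variable per possible edge. The sketch below uses only the Python standard library, so the Poisson sampler is a hypothetical helper that counts exponential arrivals in $[0, 1]$; the weights in `mu` are illustrative.

```python
import random

def poisson(rng, lam):
    """Sample Po(lam) by counting rate-lam exponential arrivals in [0, 1]."""
    if lam <= 0:
        return 0
    count, s = 0, rng.expovariate(lam)
    while s <= 1.0:
        count += 1
        s += rng.expovariate(lam)
    return count

def poisson_multigraph(edge_weights, t, seed=0):
    """Independent multiplicities Po(t * mu(I)) for each possible edge I."""
    rng = random.Random(seed)
    return {e: poisson(rng, t * w) for e, w in edge_weights.items()}

mu = {(1, 2): 0.5, (1, 3): 0.3, (2, 3): 0.2}  # hypothetical edge intensities
counts = poisson_multigraph(mu, t=10.0)
# Simple graph: keep each non-loop edge that occurs at least once.
simple = {e for e, n in counts.items() if n > 0 and e[0] != e[1]}
print(counts, simple)
```

This sampler costs time proportional to the number of possible edges plus the number of sampled copies, so for an infinite support it would have to be truncated.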

Unnormalized Measures
We have so far assumed that μ is a probability measure. This is very natural, but we can make a trivial extension to arbitrary finite measures. This will not produce any new random hypergraphs but it is convenient; for example, it means that we do not have to normalize the measure in the examples in later sections.
When necessary, we denote the measure used in the construction of our random hypergraphs by a subscript; we may thus write e.g. G m,μ .
It is obvious that, using obvious notation, the Poisson process $\Xi_\mu$ can be obtained from $\Xi_{\mu_0}$ by rescaling the time: Hence, the random hypergraph process defined by $\mu$ is the same as for $\mu_0$, except for a simple deterministic change of time. This implies the following result.
In particular, the law of large numbers yields, as $m \to \infty$,
Remark 4.14 Definition 4.8 can be employed also when $\mu$ is an infinite and, say, σ-finite measure. In this case, $\tilde G^*_t$ has a.s. an infinite number of edges for every $t > 0$. We will not consider this case further.

Random Graphs
From now on, we consider the graph case, where $\mu$ is a finite measure on $S^{\vee 2} = \{\{x, y\} : x, y \in S\}$. This allows for the presence of loops; often we consider $\mu$ supported on $S^{\wedge 2} = \{\{x, y\} : x \ne y\}$, and then there are no loops.
As explained in Sect. 4, in particular Definition 4.5, if there are no blips (i.e., if the intensity $\bar\mu$ is discrete), we may without loss of generality assume that $S = \mathbb{N}$, and if there are blips, we may assume that $S = \mathbb{N} \cup \{0, -1\}$ with the special convention that 0 and −1 are interpreted as blips. Unless stated otherwise, we use this version, and we then write $\mu_{ij}$ for $\mu(\{i, j\})$; we say that $\mu_{ij}$ is the intensity of edges $ij$. Thus, $(\mu_{ij})$ is an infinite symmetric matrix of non-negative numbers, with indices in $\mathbb{N} \cup \{0, -1\}$ (or in $\mathbb{N}$ if there are no blips); note that, because we consider undirected edges, the total mass of $\mu$ is
$$\|\mu\| = \sum_{i \le j} \mu_{ij}. \tag{5.1}$$
We assume that $0 < \|\mu\| < \infty$, or equivalently that $\sum_{i,j} \mu_{ij}$ is finite (and non-zero), but we do not insist on $\mu$ being a probability measure. As described in Sect. 4.3, we can always normalize $\mu$ to the probability measure $\|\mu\|^{-1}\mu$ when desired.
We also define (for $i \ge 1$)
$$\mu_i := \sum_{j \ne i} \mu_{ij}; \tag{5.2}$$
this is the total intensity of edges adjacent to vertex $i$.
The diagonal terms $\mu_{ii}$ correspond to loops. Loops appear naturally in some examples, see e.g. Example 7.1 below, but we are often interested in examples without loops, and then take $\mu_{ii} = 0$. Moreover, in the construction of the simple graphs $G_m$ and $\tilde G_t$ we delete loops, so it is convenient to take $\mu_{ii} = 0$ and avoid loops completely. Note that, since different edges appear independently in $\tilde G^*_t$, see Remark 4.11, deleting all loops from $\tilde G^*_t$ is equivalent to conditioning $\tilde G^*_t$ on containing no loops; this is also equivalent to changing every $\mu_{ii}$ to 0. (For $G_m$ this is not quite true, since the number of non-loop edges may change; however, the difference is minor.) Note also that in the construction leading to Definition 4.5, in the graph case, vertex label −1 is used only for the edge $\{0, -1\}$, so we may (and will) assume that $\mu_{i,-1} = 0$ unless $i = 0$.
Suppose now that we are given such a matrix $(\mu_{ij})_{i,j \ge -1}$. We can decompose the matrix into the three parts $(\mu_{ij})_{i,j \ge 1}$, $(\mu_{i0})_{i \ge 1}$ and $\{\mu_{00}, \mu_{0,-1}\}$, which by the construction and properties of Poisson processes correspond to a decomposition of the Poissonian multigraph $\tilde G^*_t$ as a union of three parts, which are independent random graphs:
Central part: The edges $ij \in \tilde G^*_t$ with $i, j \in \mathbb{N}$.
Attached stars: For each $i \ge 1$ a star with $\mathrm{Po}(t\mu_{i0})$ edges centred at $i$.
Dust: $\mathrm{Po}(t\mu_{00})$ isolated loops and $\mathrm{Po}(t\mu_{0,-1})$ isolated edges.
Moreover, the Poisson random variables above, for different $i$ and for the two types of dust, are independent. The vertex set is by definition the set of endpoints of the edges, so there are no isolated vertices. The edges and loops in the dust are always isolated, i.e., with endpoints that are blips (have no other edges). Similarly, the peripheral vertices in the attached stars are blips without other edges, while the central vertex $i$ may, or may not, also belong to the central part.
Note that multiple edges only occur in the central part.
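The three-part decomposition can be sketched as a sampler in Python. This is an illustration under stated assumptions, not the paper's construction verbatim: the central intensities are given as a finite dictionary, the star and dust intensities as separate arguments, and the names `sample_parts`, `count_vertices` and the standard-library Poisson helper are hypothetical.

```python
import random

def poisson(rng, lam):
    """Sample Po(lam) by counting rate-lam exponential arrivals in [0, 1]."""
    if lam <= 0:
        return 0
    count, s = 0, rng.expovariate(lam)
    while s <= 1.0:
        count += 1
        s += rng.expovariate(lam)
    return count

def sample_parts(central, stars, mu00, mu0m1, t, seed=0):
    """Independent Poisson counts for the central part, stars and dust."""
    rng = random.Random(seed)
    central_counts = {e: poisson(rng, t * w) for e, w in central.items()}
    star_sizes = {i: poisson(rng, t * w) for i, w in stars.items()}
    dust_loops = poisson(rng, t * mu00)
    dust_edges = poisson(rng, t * mu0m1)
    return central_counts, star_sizes, dust_loops, dust_edges

def count_vertices(central_counts, star_sizes, dust_loops, dust_edges):
    """Vertices of the multigraph: labelled vertices in use, plus blips."""
    core = {v for e, n in central_counts.items() if n > 0 for v in e}
    core |= {i for i, n in star_sizes.items() if n > 0}
    # Each star edge adds one blip, each dust loop one, each dust edge two.
    blips = sum(star_sizes.values()) + dust_loops + 2 * dust_edges
    return len(core) + blips

parts = sample_parts({(1, 2): 0.4, (2, 3): 0.2}, {1: 0.1}, 0.05, 0.05, t=20.0)
print(parts, count_vertices(*parts))
```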
Remark 5.2 We have here discussed the model in full generality, but it is obvious that the main interest is in the central part, and all our examples will be with $\mu$ supported on $\mathbb{N} \times \mathbb{N}$, i.e., without dust and attached stars. (Of course, there may be other stars or isolated edges, created in the central part.) In particular, the dust part is quite trivial, and the dust loops are even less interesting than the dust edges. In a case with dust but no loops in the dust, it is convenient to relabel $\mu_{0,-1}$ as $\mu_{00}$, so $\mu$ is a symmetric matrix with index set $\mathbb{N}_0$; this corresponds to using the version of the definition in [11], see Remark 4.6.

A Comparison with Vertex Exchangeable Graphs
Consider the case without dust, attached stars and loops, so $\mu$ is supported on $\mathbb{N} \times \mathbb{N}$, with $\mu_{ii} = 0$. Then $\tilde G^*$
In the classical case [5,6,32], with a standard graphon $W$ defined on a probability space $(\Omega, \nu)$ as in Sect. 3, the vertex exchangeable random graph $G(n, W)$ has a given number $n$ of vertices and is constructed as follows. The generalization to graphons on $\mathbb{R}_+$ or another σ-finite measure space $(\Omega, \nu)$ [3,9,39] is similar. This type of graphon is still a symmetric measurable function $W : \Omega^2 \to [0, 1]$ (satisfying some conditions to make the graphs $G(t, W)$ defined below a.s. finite). $\Omega$ can be regarded as a space of types, and the random graph, here denoted $\bar G(t, W)$, is defined as follows. The number of vertices $N \sim \mathrm{Po}(t\nu(\Omega))$ is a random variable, with $N = \infty$ if $\nu$ is an infinite measure.
Finally, we may delete all isolated vertices, giving a graph G(t, W ) without isolated vertices (as in the construction in Sect. 4 above): In both cases ( The Poisson versions of the edge exchangeable and vertex exchangeable random graphs thus add edges in the same way, if we condition on the types of the vertices in the latter and let μ i j = t −1 W (x i , x j ). However, the vertices are constructed in very different ways. To see the similarities and differences clearly, consider the case where the type space = N, with some (finite or infinite) measure ν, and consider the Poisson multigraph version of the vertex exchangeable graphs, which we denote byḠ * (t, W ) and G * (t, W ). Then the vertex exchangeableḠ * (t, W ) has a Poisson number Po(tν{i}) of vertices of type i, for each i ∈ N, while the edge exchangeableG *

Numbers of Vertices and Edges
By construction, the number of edges is $m$ in the multigraph $G^*_m$ and random $\mathrm{Po}(t\|\mu\|)$ in the multigraph $\tilde G^*_t$. The numbers of vertices in the graphs and the numbers of edges in the simple graphs $G_m$ and $\tilde G_t$ are somewhat less immediate, and are studied in this section.
We use the notation of Sect. 5, and assume that we are given a (deterministic) matrix μ = (μ i j ) of intensities. Moreover, for simplicity we assume that μ is concentrated on N × N, so there are no attached stars and no dust, and that μ ii = 0 for every i, so there are no loops. We consider briefly the case with dust or attached stars in Sect. 6.1.
Note that G m is a simple graph without isolated vertices, and thus  [13,15,19,30,31], where central limit theorems have been proved under various assumptions, see Theorem 6.8 below. These results are often proved using Poissonization, which in our setting is equivalent to consideringG t instead of G m . We too find it convenient to first study the Poisson version.
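The Poisson model is convenient because, as said before, edges $ij$ arrive according to a Poisson process with intensity $\mu_{ij}$ and these Poisson processes are independent for different pairs $\{i, j\}$. Let $N_{ij}(t)$ be the number of copies of the edge $ij$ in $\tilde G^*_t$, and let $N_i(t)$ be the degree of vertex $i$ in $\tilde G^*_t$. Then
$$N_{ij}(t) \sim \mathrm{Po}(t\mu_{ij})$$
and, recalling (5.2),
$$N_i(t) = \sum_{j \ne i} N_{ij}(t) \sim \mathrm{Po}(t\mu_i).$$
Let
$$T_i := \inf\{t : N_i(t) \ge 1\}, \qquad T_{ij} := \inf\{t : N_{ij}(t) \ge 1\}$$
be the random times that the first edge at $i$ and the first edge $ij$ appear, respectively. Thus,
$$v(\tilde G_t) = \sum_{i} \mathbf{1}\{N_i(t) \ge 1\} = \sum_{i} \mathbf{1}\{T_i \le t\}, \tag{6.4}$$
$$e(\tilde G_t) = \sum_{i < j} \mathbf{1}\{N_{ij}(t) \ge 1\} = \sum_{i < j} \mathbf{1}\{T_{ij} \le t\}. \tag{6.5}$$
Recall that for every fixed $t$, the numbers $N_{ij}(t)$ are independent random variables, and thus the indicators in the sums (6.5) are independent. However, the numbers $N_i(t)$ and the indicators in the sums in (6.4) are dependent, which is a complication. (For example, $v(\tilde G_t) = 1$ is impossible, since there are no isolated vertices and no loops.) We give first a simple lemma for the type of sums in (6.5), where the terms are independent.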
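Since each edge $ij$ is present at time $t$ with probability $1 - e^{-t\mu_{ij}}$, independently of the others, and vertex $i$ is present with probability $1 - e^{-t\mu_i}$ where $\mu_i$ is the sum of the intensities of the edges at $i$, the expected numbers of vertices and edges are explicit sums. The following Python sketch compares these sums with a Monte Carlo estimate. It assumes no loops, dust or attached stars and finitely many possible edges; the function names and the intensities in `mu` are hypothetical.

```python
import math
import random

def expected_counts(mu, t):
    """Expected numbers of vertices and edges of the simple graph at time t.

    mu maps each unordered pair (i, j), i != j, listed once, to its intensity.
    """
    mu_i = {}
    for (i, j), w in mu.items():
        mu_i[i] = mu_i.get(i, 0.0) + w
        mu_i[j] = mu_i.get(j, 0.0) + w
    exp_v = sum(1.0 - math.exp(-t * w) for w in mu_i.values())
    exp_e = sum(1.0 - math.exp(-t * w) for w in mu.values())
    return exp_v, exp_e

def simulate_counts(mu, t, rng):
    """One sample of (number of vertices, number of edges) at time t."""
    present = [e for e, w in mu.items()
               if rng.random() < 1.0 - math.exp(-t * w)]
    vertices = {v for e in present for v in e}
    return len(vertices), len(present)

mu = {(1, 2): 0.5, (1, 3): 0.3, (3, 4): 0.2}  # hypothetical intensities
rng = random.Random(0)
runs = 20000
samples = [simulate_counts(mu, 2.0, rng) for _ in range(runs)]
mean_v = sum(v for v, _ in samples) / runs
mean_e = sum(e for _, e in samples) / runs
print(expected_counts(mu, 2.0), (mean_v, mean_e))
```

The edge indicators are sampled independently, while the vertex count is derived from them, which reflects the dependence between the vertex indicators noted above.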
Then, the following hold.
(i) For every $t \ge 0$,
$$\mathbb{E} W(t) = \sum_{i} \bigl(1 - e^{-\lambda_i t}\bigr), \tag{6.6}$$
and thus a.s. $W(t) < \infty$ for every $t \ge 0$. Furthermore, $\mathbb{E} W(t)$ is a strictly increasing and concave continuous function and $(t'_n)$ are two sequences of positive numbers with $t'_n / t_n \to 1$, then
Proof This is presumably all known, but it seems easier to give a proof than to find references. Note that $W(t)$ is increasing as a function of $t$.
(i) The calculation (6.6) of the expectation is immediate, and the sum is finite because $1 - e^{-\lambda_i t} \le \lambda_i t$. Hence $W(t)$ is a.s. finite for, say, each integer $t$, and thus for all $t \ge 0$. It follows by (6.6) that $\mathbb{E} W(t)$ is strictly increasing and concave. Moreover, the sum converges uniformly on every finite interval $[0, T]$, and thus $\mathbb{E} W(t)$ is continuous.
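The properties used in this step can be checked numerically on an example. The Python sketch below assumes that the expectation has the form $\sum_i (1 - e^{-\lambda_i t})$, which is what the bound $1 - e^{-\lambda_i t} \le \lambda_i t$ above is applied to; the rates $\lambda_i = 2^{-i}$ and the time grid are illustrative choices.

```python
import math

def expected_W(lams, t):
    """Sum of 1 - exp(-lam * t) over the given rates (uses expm1 for accuracy)."""
    return sum(-math.expm1(-lam * t) for lam in lams)

lams = [2.0 ** -i for i in range(1, 31)]   # hypothetical summable rates
ts = [0.5 * k for k in range(0, 41)]       # grid on [0, 20]
vals = [expected_W(lams, t) for t in ts]

# Strictly increasing along the grid.
increasing = all(a < b for a, b in zip(vals, vals[1:]))
# Concave: successive increments do not grow.
concave = all(vals[k + 1] - vals[k] <= vals[k] - vals[k - 1] + 1e-12
              for k in range(1, len(vals) - 1))
# Bounded by t * sum(lams), from 1 - exp(-x) <= x.
bounded = all(v <= t * sum(lams) + 1e-12 for v, t in zip(vals, ts))
print(increasing, concave, bounded)
```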
where each summand tends to 0 as t → ∞, and is bounded by λ i . Hence EW (t)/t → 0 as t → ∞by dominated convergence of the sum. (ii) An immediate consequence of (6.6) and (6.10) (iv) First, by (6.6) and monotone convergence, as t → ∞, Furthermore, if L < ∞, then a.s W (t) = L for all large t, and thus (6.9) holds.
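Suppose now that $L = \infty$. Then $\mathbb{E}W(t) \to \infty$ by (6.11). Let $\delta \in (0,1)$, let $a := 1+\delta$ and choose, for $n \ge 1$, $t_n > 0$ such that $\mathbb{E}W(t_n) = a^n$. (This is possible by (i).) By (6.8) and Chebyshev's inequality, for any $t > 0$,
$$\mathbb{P}\bigl(|W(t)/\mathbb{E}W(t) - 1| > \delta\bigr) \le \frac{\operatorname{Var} W(t)}{\delta^2 (\mathbb{E}W(t))^2} \le \frac{1}{\delta^2\, \mathbb{E}W(t)}. \tag{6.12}$$
Hence, by our choice of $t_n$ and the Borel-Cantelli lemma, a.s. there exists a (random) $n_0$ such that $1-\delta \le W(t_n)/\mathbb{E}W(t_n) \le 1+\delta$ for $n \ge n_0$. This, and the fact that $W(t)$ is increasing, implies that if $t \ge t_{n_0}$, and we choose $n \ge n_0$ such that $t_n \le t < t_{n+1}$, then
$$\frac{W(t)}{\mathbb{E}W(t)} \le \frac{W(t_{n+1})}{\mathbb{E}W(t_n)} \le \frac{(1+\delta)a^{n+1}}{a^n} = (1+\delta)^2 \tag{6.13}$$
and similarly
$$\frac{W(t)}{\mathbb{E}W(t)} \ge \frac{W(t_n)}{\mathbb{E}W(t_{n+1})} \ge \frac{(1-\delta)a^{n}}{a^{n+1}} = \frac{1-\delta}{1+\delta}.$$
Consequently, a.s.
$$\frac{1-\delta}{1+\delta} \le \liminf_{t\to\infty} \frac{W(t)}{\mathbb{E}W(t)} \le \limsup_{t\to\infty} \frac{W(t)}{\mathbb{E}W(t)} \le (1+\delta)^2.$$
Since $\delta$ is arbitrarily small, (6.9) follows. (v) By (i), $\mathbb{E}W(t)$ is increasing, and furthermore it is concave with $\mathbb{E}W(0) = 0$, and thus $\mathbb{E}W(t) \le \mathbb{E}W(t') \le (t'/t)\,\mathbb{E}W(t)$ whenever $0 < t \le t'$, and the result follows.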
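The qualitative claims in part (i) can be checked numerically. The following is a minimal sketch, assuming that (6.6) has the form $\mathbb{E}W(t) = \sum_i (1 - e^{-\lambda_i t})$ (as the bound $1 - e^{-\lambda_i t} \le \lambda_i t$ used in the proof suggests); the rates $\lambda_i = i^{-2}$, the truncation at $10^4$ terms and the function name `expected_count` are hypothetical choices made only for illustration.

```python
import math

def expected_count(rates, t):
    """Return sum_i (1 - exp(-rate_i * t)), the assumed form of E W(t)."""
    # -expm1(-x) equals 1 - exp(-x) and is accurate for small x.
    return sum(-math.expm1(-lam * t) for lam in rates)

# Hypothetical summable rates lambda_i = i^(-2), truncated to finitely many terms.
rates = [i ** -2.0 for i in range(1, 10_001)]

# Evaluate on an equally spaced grid so that concavity shows up as
# decreasing increments.
grid = [float(t) for t in range(0, 21)]
values = [expected_count(rates, t) for t in grid]
increments = [b - a for a, b in zip(values, values[1:])]

for t, value in zip(grid[1:], values[1:]):
    print(f"t = {t:4.0f}   E W(t) = {value:8.4f}   E W(t)/t = {value / t:7.4f}")
```

On this grid the increments are positive and decreasing, and the ratio $\mathbb{E}W(t)/t$ decreases, in line with the monotonicity and concavity stated in the lemma.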
In order to extend this to the dependent sum (6.4), we use a lemma.
Fix i and j with i = j, and letĪ i := k = jĪ ik andĪ j := k =iĪ jk . ThenĪ i =Ī i jĪ i and I j =Ī i jĪ j , withĪ i j ,Ī i andĪ j independent, and thus In particular, Summing over j, we obtain for every i, since the events E j := {I i j = 1, I ik = 0 for k = j} in (6.19) are disjoint and with union (6.20) and (6.17) follows by summing over i.
Moreover, all results of Lemma 6.1 hold except (iii), which is replaced by Proof It is well-known and elementary that Z i ∼ Exp(λ i ), since (Z i j ) j are independent for every i. Parts (i), (ii) and (v) of Lemma 6.1 deal only with the expectation, and their proofs do not need Z i to be independent. Lemma 6.2 yields (6.21). Finally, the proof of (iv) holds as before, now using (6.21).
Hence, to find asymptotics of the numbers of vertices and edges in our random graphs, it suffices to study the expectations in (6.22)-(6.23). In particular, we note the following consequences.

is a.s. extremely sparse if and only if $e(t) \asymp v(t)$ as $t \to \infty$.
Proof By Theorem 6.4(ii). Hence v(Ct) v(t) for any constant C > 0. We have so far considered only simple first order properties of v(G m ) and e(G m ). For the number of edges, much more follows from the central limit results in the references mentioned above. In particular, the local and global central limit theorems in [19] apply and yield the following. (Although the estimates (6.32) and (6.35) are uniform in all x, the main interest is for x constant, or perhaps tending to infinity very slowly.) Var(e(G m )) = Var(e(G m )) + O(1), (6.34) and, recalling (6.23) and definingσ 2 t := Var(e(G t )), N (0, 1) and Note that e(m) = Ee(G m ) andσ 2 m = Var e(G m ) are given by (6.23) and (6.8); they are usually simpler and more convenient to handle than Ee(G m ) and σ 2 m = Var(e(G m )). We conjecture that similar results holds for v(G m ), the number of vertices. However, we cannot obtain this directly from results on the occupancy problem in the same way as Theorem 6.8, again because the variables N i (t) are dependent. (The number of vertices corresponds to an occupancy problem where balls are thrown in pairs, with a dependency inside each pair.) Problem 6.9 Show asymptotic normality for v(G m ) when Var(v(G m )) → ∞.

The Case with Dust or Attached Stars
We consider briefly the case when the model contains dust (other than loops) or attached stars. In this case, the results are quite different. We may for simplicity assume that there are no loops at all, since loops are deleted in any case. Thus $\mu_{ii} = 0$ for $i \ge 0$ and $\mu_{0i} > 0$ for some $i \in \mathbb{N} \cup \{-1\}$.
The number of edges in the dust and attached stars of $\tilde G_t$ is $\mathrm{Po}(ct)$ with $c := \sum_{i=-1}^{\infty} \mu_{0i} > 0$, and thus this number is a.s. $\sim ct \asymp t$ as $t \to \infty$, by the law of large numbers for the Poisson process. (Recall that all edges in the dust and attached stars of $\tilde G^*_t$ are simple, so the number of them is the same in $\tilde G_t$ and in $\tilde G^*_t$.) It follows by Proposition 4.9 that the number of edges in the dust and attached stars of $G_m$ a.s. is $\asymp m$. Moreover, since each edge in the dust or an attached star has at least one endpoint that is not shared by any other edge, the same estimates hold for the number of vertices in the dust and attached stars. This leads to the following theorem, which shows that if there is any dust or attached star at all, then those parts will dominate the random graphs. Consequently, the random graphs $G_m$ are a.s. extremely sparse, but in a rather trivial way.

Proof
The argument before the theorem shows (i) and (ii).
Moreover, Corollary 6.6 applies to the central part of $\tilde G_t$ and shows that the number of edges and vertices there a.s. are $o(t)$, and thus only a fraction $o(1)$ of all edges and vertices. By Proposition 4.9, the same holds for $G_m$.

Rank 1 Multigraphs
We turn to considering specific examples of the construction. One interesting class of examples is constructed as follows.
This is clearly a random multigraph of the type constructed in Sect. 5, with
$$\mu_{ij} = \begin{cases} 2q_iq_j, & i \ne j,\\ q_i^2, & i = j. \end{cases} \tag{7.1}$$
We thus have, by (5.2), In particular, $\mu_i \asymp q_i$. The corresponding Poisson model $\tilde G^*_t$ is by Proposition 4.9 obtained by taking a Poisson number of edges $e_1, \dots, e_{N(t)}$, with $N(t) \sim \mathrm{Po}(t)$.
As usual, we obtain the corresponding simple graphs by omitting all repeated edges and deleting all loops.
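The construction can be sketched in code. This is a minimal illustration, assuming (as in the proof of Theorem 7.5) that the edges are formed from consecutive pairs of i.i.d. vertices $V_1, V_2, \dots$ with distribution $(q_i)$; the finite support of `q` and the function names are hypothetical simplifications.

```python
import random
from collections import Counter

def sample_rank1_multigraph(q, m, rng):
    """Sample m i.i.d. edges whose endpoints are i.i.d. with distribution q.

    Vertices are labelled 1..len(q).  The result maps each pair (i, j)
    with i <= j to its multiplicity; pairs (i, i) are loops.
    """
    vertices = list(range(1, len(q) + 1))
    endpoints = rng.choices(vertices, weights=q, k=2 * m)
    edges = Counter()
    for a, b in zip(endpoints[0::2], endpoints[1::2]):
        edges[(min(a, b), max(a, b))] += 1
    return edges

def simple_graph(multi_edges):
    """Merge repeated edges and delete loops."""
    return {edge for edge in multi_edges if edge[0] != edge[1]}

rng = random.Random(2024)
q = [2.0 ** -i for i in range(1, 11)]  # hypothetical weights, normalised by choices()
multi = sample_rank1_multigraph(q, 50, rng)
simple = simple_graph(multi)
print("multigraph edges:", sum(multi.values()), " simple edges:", len(simple))
```

With rapidly decaying weights most of the 50 sampled edges are repeats or loops, so the simple graph is much smaller than the multigraph.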
We call a random multigraph constructed as in Example 7.1, or equivalently by (7.1), for some (possibly random) probability distribution $(q_i)_1^\infty$, a rank 1 edge exchangeable multigraph, for the reason that the matrix (7.1) is a rank 1 matrix except for the diagonal entries.

Remark 7.2
The diagonal entries, creating loops, are less important to us. In the multigraph examples below, it is natural, and simplifies the results, to allow loops. However, when we consider the simple graphs $\tilde G_t$ and $G_m$, we ignore loops and, see Remark 5.1, it is then simpler to modify (7.1) by taking $\mu_{ii} = 0$; we still say that the resulting random graphs are rank 1.

Remark 7.3
Note that the rank 1 random graphs in [2] are different; they are simple graphs, and they are vertex exchangeable or modifications of vertex exchangeable random graphs, cf. Sect. 5.1. Nevertheless, both types of "rank 1" random graphs can be seen as based on the same idea: each vertex is given an "activity" (q i in our case), and the probability of an edge between two vertices is proportional to the product of their activities. (See the references in [2] for various versions of this idea.) Recall that the configuration model is an important model for constructing random multigraphs with a given degree sequence, which is defined as follows, see e.g. [1].

Definition 7.4 (Configuration model) Given a sequence $(d_i)_{i=1}^n$ of non-negative integers with $\sum_i d_i$ even, the random multigraph $G^*(n,(d_i)_{i=1}^n)$ is defined by considering a set of $\sum_i d_i$ half-edges (or stubs), of which $d_i$ are labelled $i$ for each $i \in [n]$, and taking a uniformly random matching of the half-edges; each pair of half-edges is interpreted as an edge between the corresponding vertices.
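The matching step in the definition can be sketched as follows. Shuffling the list of half-edges uniformly and pairing consecutive entries yields a uniformly random perfect matching; the function name `configuration_model` is a hypothetical choice for this illustration.

```python
import random

def configuration_model(degrees, rng):
    """Return the edges of a configuration-model multigraph.

    degrees[i - 1] is the prescribed degree of vertex i.  Loops and
    repeated edges may occur; a loop (i, i) contributes 2 to the degree of i.
    """
    if sum(degrees) % 2 != 0:
        raise ValueError("the degree sum must be even")
    # One stub (half-edge) labelled i for each unit of degree of vertex i.
    stubs = [i for i, d in enumerate(degrees, start=1) for _ in range(d)]
    # A uniform shuffle followed by pairing neighbours is a uniform matching.
    rng.shuffle(stubs)
    return list(zip(stubs[0::2], stubs[1::2]))

edges = configuration_model([3, 2, 2, 1], random.Random(7))
print(edges)
```

Each call returns $\frac12\sum_i d_i$ edges, and counting endpoints recovers the prescribed degree sequence.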
By construction, the multigraph $G^*(n,(d_i)_{i=1}^n)$ has degree sequence $(d_i)_{i=1}^n$. (With a loop counted as 2 edges at its only endpoint.) Note that the distribution of $G^*(n,(d_i)_{i=1}^n)$ is not uniform over all multigraphs with this degree sequence. (As is well-known, and easy to see, the probability distribution has a factor (weight) $1/2$ for each loop and $1/\ell!$ for each edge of multiplicity $\ell > 1$; in particular, conditioned on being a simple graph, $G^*(n,(d_i)_{i=1}^n)$ has a uniform distribution.) Nevertheless, $G^*(n,(d_i)_{i=1}^n)$ has the right distribution for our purposes.

Theorem 7.5 The random multigraph $G^*_m$ constructed in Example 7.1 has, conditioned on its degree sequence $(d_i)_{i=1}^n$, the same distribution as the random multigraph $G^*(n,(d_i)_{i=1}^n)$ constructed by the configuration model for that degree sequence.
The same holds for $\tilde G^*_t$.

Proof In the construction of $G^*_m$ above, the sequence $V_1, \dots, V_{2m}$ is i.i.d., and thus exchangeable; hence its distribution is unchanged if we replace each $V_i$ by $V_{\pi(i)}$ for a uniformly random permutation $\pi$ of $[2m]$, independent of everything else. Consequently, the distribution of $G^*_m$ is the same if we modify the definition above and let the edges be $V_{\pi(1)}V_{\pi(2)}, \dots, V_{\pi(2m-1)}V_{\pi(2m)}$; but this is the same as saying that the edges are obtained by taking a random matching of the multiset $\{V_1, \dots, V_{2m}\}$, which is precisely what the configuration model does. (Note that the vertex degree $d_i$ is the number of times $i$ appears in $V_1, \dots, V_{2m}$.) The result for $\tilde G^*_t$ follows, since the degree sequence tells how many edges there are, so conditioning on the degree sequence implies conditioning on $e(\tilde G^*_t) = N(t)$, which reduces to the case of $G^*_m$ just proved, see Remark 4.10.

Remark 7.6 In statistical language, the theorem implies that the degree distribution is a sufficient statistic for the family of distributions of multigraphs $G^*_m$ (or $\tilde G^*_t$) given by Example 7.1 with different distributions $(q_i)_1^\infty$.

Example 7.7 A trivial example of the construction in Example 7.1 is obtained by fixing $n \ge 1$ and letting $q_i = 1/n$, $1 \le i \le n$, i.e., the uniform distribution on $[n]$. This means that we consider a sequence of i.i.d. edges, each obtained by taking the two endpoints uniformly at random, and independently, from $[n]$. In other words, the endpoints of the edges are obtained by drawing with replacement from $[n]$. This gives the random multigraph process studied in e.g. [25], which is a natural multigraph version of the (simple) random graph process studied by [16].
The rank 1 random multigraphs in Example 7.1 appear also hidden in some other examples.
Example 7.8 (The Hollywood model) The Hollywood model of a random hypergraph was defined in [11] using the language of actors participating in the same movie, see [11] for details. We repeat their definition in somewhat different words.
The model can be defined by starting with the two-parameter version of the Chinese restaurant process, see e.g. [ Here α and θ are parameters, and either (i) 0 α 1 and θ > −α, or (ii) α < 0 and θ = N |α| > 0 for some N ∈ N.
In case (ii), there are never more than $N$ tables; in case (i), the number of tables grows a.s. to $\infty$.
In the construction of the Hollywood model hypergraph, the vertices are the tables in the Chinese restaurant process. We furthermore draw the sizes of the edges as i.i.d. random variables X j with some distribution ν on the non-negative integers N. The first edge is then defined by (the set of tables of) the first X 1 customers, the second edge by the next X 2 customers, and so on. The random hypergraphG m with m edges is thus described by the first X 1 + · · · + X m customers.
A standard calculation shows that the sequence of table numbers is exchangeable, except that the numbers occur for the first time in the natural order; to be precise, the probability of any finite sequence of table numbers, such that the first 1 appears before the first 2, and so on, depends only on the number of occurrences of each number. Consequently, as noted in [11], since we ignore vertex labels, and the sequence $X_1, X_2, \dots$ is i.i.d. and independent of the Chinese restaurant process, the random hypergraph $G^*_\infty$ is exchangeable, and by the representation theorem by [11,12], see Remark 4.4, the Hollywood model can be constructed as in Definition 4.2 for some random measure $\mu$ on $\mathbb{N}$.
We can see this more concretely by replacing the table labels $i \in \mathbb{N}$ by i.i.d. random labels $U_i \sim U(0,1)$; then the sequence of table labels of the customers is exchangeable. Hence, by de Finetti's theorem, there exists a random probability measure $P$ on $[0,1]$ such that conditioned on $P$, the sequence of (new) table labels is an i.i.d. sequence with distribution $P$. Clearly, the random measure $P = \sum_i P_i \delta_{U_i}$ for some random sequence $P_i$ of numbers with $\sum_i P_i = 1$. Furthermore, by the law of large numbers, for every $i \in \mathbb{N}$, $P_i$ equals a.s. the asymptotic frequency of customers sitting at the table originally labelled $i$ in the Chinese restaurant process. Hence, the random probability measure $P = (P_i)_1^\infty$ on $\mathbb{N}$ has the distribution $\mathrm{GEM}(\alpha,\theta)$, see [35, Theorem 3.2 and Definition 3.3]. (An alternative version of this argument uses Kingman's paintbox representation for exchangeable random partitions [35, Theorem 2.2] instead of the random labels $U_i$ above; we leave the details to the interested reader.)

Consequently, the Hollywood model hypergraph can be constructed as follows: Let the random probability measure $P$ on $\mathbb{N}$ have the distribution $\mathrm{GEM}(\alpha,\theta)$; conditionally given $P$ take an infinite i.i.d. sequence of vertices with distribution $P$; construct the edges by taking the first $X_1$ vertices, the next $X_2$ vertices, …; finally, ignore the vertex labels.
We specialize to the graph case and assume from now on that X j = 2 (deterministically). Thus edges are constructed by taking the customers pairwise as they arrive. We then see by comparing the constructions above and in Example 7.1 that the Hollywood model yields the same result as the rank 1 model in Example 7.1, based on a random probability distribution with distribution GEM(α, θ ).
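The graph case can be simulated directly from the seating process. The sketch below assumes the standard two-parameter seating rule (presumably the content of (7.3)): with $n$ customers seated at $k$ occupied tables of sizes $n_1, \dots, n_k$, the next customer joins table $i$ with probability $(n_i - \alpha)/(n + \theta)$ and opens a new table with probability $(\theta + k\alpha)/(n + \theta)$. The function names are hypothetical.

```python
import random

def chinese_restaurant(num_customers, alpha, theta, rng):
    """Return the table number (1, 2, ...) of each successive customer."""
    counts = []   # counts[i] = number of customers at table i + 1
    seating = []
    for _ in range(num_customers):
        k = len(counts)
        if k == 0:
            table = 0  # the first customer always opens the first table
        else:
            # Unnormalised weights; both cases share the denominator n + theta.
            weights = [c - alpha for c in counts]
            weights.append(max(0.0, theta + k * alpha))
            table = rng.choices(range(k + 1), weights=weights)[0]
        if table == k:
            counts.append(1)
        else:
            counts[table] += 1
        seating.append(table + 1)
    return seating

def hollywood_graph_edges(m, alpha, theta, rng):
    """Pair up the first 2m customers; each pair of tables is one edge."""
    seating = chinese_restaurant(2 * m, alpha, theta, rng)
    return list(zip(seating[0::2], seating[1::2]))

print(hollywood_graph_edges(8, 0.5, 1.0, random.Random(11)))
```

The output is a list of $m$ vertex pairs (a multigraph, possibly with loops); merging repeated pairs and dropping loops gives the corresponding simple graph.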
Since the order of the probabilities q i does not matter in Example 7.1, we obtain the same result if we reorder the probabilities P i in decreasing order; this gives the Poisson-Dirichlet distribution PD(α, θ ) [35,Definition 3.3], and thus the Hollywood model is also given by the rank 1 model based on PD(α, θ ). Theorem 7.5 shows that yet another way to define the Hollywood model multigraph G * m is to take the configuration model where the degree sequence (d i ) m 1 is the (random) sequence of numbers of customers at each table in the Chinese restaurant process when there are 2m customers.
Example 7.9 [36] considers the random multigraph process with a fixed vertex set $[N]$, where edges are added one by one (starting with no edges) such that the probability that a new edge joins two distinct vertices $i$ and $j$ is proportional to $2(d_i+\alpha)(d_j+\alpha)$, and the probability that the new edge is a loop at $i$ is proportional to $(d_i+\alpha)(d_i+1+\alpha)$; here $d_i$ is the current degree of vertex $i$ and $\alpha > 0$ is a fixed parameter. ([36] considers also the corresponding process for simple graphs; we do not consider that process here.) It is easily seen that this multigraph process can be obtained as above, with a minor modification of the Chinese restaurant process. Consider now a restaurant with a fixed number $N$ of tables, initially empty, and seat each new customer at table $i$ with probability
$$(n_i+\alpha)/(n+N\alpha), \tag{7.4}$$
where $n_i \ge 0$ is the number of customers at table $i$ and $n$ is their total number. Then construct edges by taking the customers pairwise, as above; this yields the multigraph process just described. Furthermore, although this construction uses a modification of the Chinese restaurant process, we can relabel the tables in the random order that they are occupied. It is then easily seen that we obtain the Chinese restaurant process (7.3) with parameters $(-\alpha, N\alpha)$. Since the vertex labels are ignored, this means that Pittel's multigraph process is the same as the Hollywood model with parameters $(-\alpha, N\alpha)$. Consequently, it can be defined by the rank 1 model in Example 7.1 with the random probability distribution $\mathrm{GEM}(-\alpha, N\alpha)$ on $[N] \subset \mathbb{N}$, or, equivalently, the random probability distribution $\mathrm{PD}(-\alpha, N\alpha)$.
Moreover, the restaurant process (7.4) can be seen as a Pólya urn process, with balls of N different colours and initially α balls in each colour, where n i is the number of additional balls of color i in the urn; balls are drawn uniformly at random from the urn, and each drawn ball is replaced together with a new ball of the same colour. Note that then n i is the number of times colour i has been drawn. (It does not matter whether α is an integer or not; the extension to non-integer α causes no mathematical problem, see e.g. [20,Remark 4.2], [21] or [28].) The sequence of vertex labels is thus given by the sequence of colours of the balls drawn from this urn. It is well-known, by an explicit calculation, see e.g. [33] (where N = 2), that this sequence is exchangeable. By de Finetti's theorem it can thus can be seen as an i.i.d. sequence of colours with a random distributionP, which equals the asymptotic colour distribution. Moreover, it is well-known [29] (see also [33,37] for N = 2) that this asymptotic distribution is a symmetric Dirichlet distribution Dir(α/N , . . . , α/N ), with the density function c x Consequently, the multigraph processG * N can be obtained by the rank 1 model in Example 7.1 with the random probability distribution Dir(α/N , . . . , α/N ).
Alternatively, by Theorem 7.5, G * m may be obtained by the configuration model, with vertex degrees given by the first 2m draws in the Pólya urn process described above.
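The urn description translates into a short simulation. This is a minimal sketch of the seating rule (7.4) with customers (draws) paired into edges; the function name `urn_multigraph` and the parameter values in the example are hypothetical.

```python
import random

def urn_multigraph(num_tables, alpha, m, rng):
    """Simulate 2m draws with table i chosen with probability
    (n_i + alpha) / (n + N * alpha), and pair consecutive draws into edges.

    Returns (edges, counts), where counts[i - 1] is the number of draws of
    table i, which is also the degree of vertex i (a loop counts twice).
    """
    counts = [0] * num_tables
    draws = []
    for _ in range(2 * m):
        weights = [c + alpha for c in counts]  # normalised by rng.choices
        i = rng.choices(range(num_tables), weights=weights)[0]
        counts[i] += 1
        draws.append(i + 1)
    edges = list(zip(draws[0::2], draws[1::2]))
    return edges, counts

edges, degrees = urn_multigraph(5, 0.5, 10, random.Random(3))
print(edges)
print(degrees)
```

The returned `degrees` are the table counts after $2m$ draws, i.e. the degree sequence that one would feed to the configuration model in the alternative description.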

Rank 1 Simple Graphs
We will in the following sections study several examples of the simple random graphs G m in the rank 1 case. We note here a few general formulas. We ignore the trivial case when the probability distribution {q i } is supported on one point. (ThenG t and G m have only a single vertex and no edges. In fact, the interesting case is when the support of {q i } is infinite.) We thus assume max q i < 1. Since we ignore loops when constructing the simple graphsG t and G m , we modify (7.1) by taking μ ii = 0, see Remark 7.2; this changes (7.2) to μ i = 2q i − 2q 2 i , but we still have μ i q i . Thus (6.22) and (6.23) yield Moreover, adding the diagonal terms to the sum in (7.5) does not affect this estimate, since if we assume as we may that q 1 , q 2 > 0, then q 2 i = O(q 1 q i ) and q 2 1 = O(q 1 q 2 ), and thus Hence also Note that although we are interested in large t, the argument tq i in (7.7) is small for large i, so (7.7) requires that we consider v(t) for both large and small t. Similarly, the expected degree of vertex i inG t is

Dense Examples
We may obtain examples where G m andG t are dense by letting μ i j decrease very rapidly. We begin with an extreme case, which gives complete graphs. For example, we may take μ i j = ((i ∨ j)!) −4 , or the rank 1 example μ i j = q i q j with q i = exp(−3 i ).
We will show that a.s , for all large n, G ( n 2 ) is the complete graphs K n . Define a i := sup j μ i j . Then (8.1) implies, for every k 2, In particular, for k 2, a k+1 1 2 a k . Moreover, (k − 1) 4 a k a k k 4 a k+1 ; hence the sequence k 4 a k+1 is decreasing for k 1.
On the other hand, if Z n is the number of pairs (i, j) with i < j n such that i j is not an edge ofG * t n , i.e., N i j (t n ) = 0, then Moreover, if i < j n, then by (8.1) and (8.2), μ i j j 4 a j+1 n 4 a n+1 and thus t n μ i j t n n 4 a n+1 = n. Hence, (8.4) yields EZ n n 2 e −n , and we see, by the Borel-Cantelli lemma again, that a.s also Z n = 0 for all large n.
We have shown that a.s for all large n,G * t n contains at least one edge i j whenever i < j n, but no other edges; in other words, the simple graphG t n is the complete graph K n . Since K n has n 2 edges, this also means that G ( n 2 ) = K n , as asserted above. We have shown that a.s , for all large m, G m is the complete graph K n if m = n 2 ; since G n is an increasing sequence of graphs, it follows that for intermediate values m = n 2 + , 1 < n, G m consist of K n plus an additional vertex joined to of the other vertices. We thus have a complete description of the process (G m ) for large m. (And thus also of the processG t .) In particular, for all large m, G m differs from the complete graph K n with n = v(G m ) by less than n edges, and thus, see Sect. 3, δ (G m , K n ) W G m − W K n L 1 2/n = o(1). It follows that in the sense of graph limit theory, G m → 1 a.s , where 1 is the graph limit defined as the limit of the complete graphs, which is the graph limit defined by the constant graphon W 1 (x, y) = 1 (on any probability space ).
The assumption (8.1) in Example 8.1 is not best possible, and may easily be improved somewhat, but we only wanted to give a class of examples. Here is another example, where the limit is less trivial.

Example 8.3
Consider a rank 1 example μ i j = q i q j , i = j, where q i has a geometric decay q i b −i for some b > 1. Let n 1 and suppose b n t b n+1 . Then the expected number of edges i j inG t with i + j > n is at most, with C := sup i b i q i < ∞ and letting = i + j, i+ j>n tq i q j t i+ j>n Similarly, the expected number of edges i j with i + j n not inG t is at most, for c : Moreover, the same argument shows that the expected number of edges i j inG t with i + j > n + n 0.1 and the number of non-edges i j with i + j < n − n 0.1 both are O(nb −n 0.1 ); hence the Borel-Cantelli lemma shows that a.s for every large n and every t ∈ [b n , b n+1 ],G t contains every edge with i + j < n − n 0.1 and no edge with i + j > n + n 0.1 ; a consequence, we also have [n − n 0.1 − 1] ⊆ V (G t ) ⊆ [n + n 0.1 ]. It follows that if H n is the graph with vertex set {1, . . . , n} and edge set {i j : i + j n}, then a.s the cutdistance δ (G t , H n ) = o(1), when b n t b n+1 . As n → ∞, H n → half , the graph limit defined by the graphon W (x, y) = 1 {x+y 1} on [0, 1] (known as the "half-graphon"). Consequently,G t → half a.s as t → ∞. By Proposition 4.9, G m → half a.s as m → ∞. Example 8.4 Example 8.3 can be generalized without difficulty. Consider, for example, a rank 1 case μ i j = q i q j with for some constants c > 0 and ε > 0. Arguing as in Lemma 8.3 we see that a.s , for every large n and all t ∈ [e cn , e c(n+1) ],G t contains all edges i j with i + j < n − n 1−ε/2 and no edges i j with i + j > n + n 1−ε/2 . Consequently, a.s , δ (G t , H n ) = o(1) and thusG t → half as t → ∞and G m → half as m → ∞.
Example 8.5 Consider the simple graphsG t and G m given by the Hollywood model in Example 7.8 in the case α = 0. As shown there, the resulting random graphs are the same as the ones given by the rank 1 model with a random probability distribution (q i ) ∞ 1 having the distribution GEM(0, θ), where θ ∈ (0, ∞) is a parameter.
By a well-known characterization of the GEM distribution, see [35,Theorem 3.2], this means that In Examples 8.1-8.6, G m converges a.s to some graph limit. There are also many examples, see e.g. Sects. 9-10, for which G m are sparse, which is equivalent to G m → 0 , the zero graph limit defined by the graphon W (x, y) = 0. In fact, any graph limit can occur as a limit of G m , at least along a subsequence. Moreover, the following result shows that there exists a "chameleon" example where every graph limit occurs as the limit of some subsequence. (Note that this includes that there is a subsequence converging to the zero graph limit 0 , which means that e(G m ) = o(v(G m ) 2 ) along this subsequences; hence this example is neither dense nor sparse.) Theorem 8.7 There exists a matrix μ = (μ i j ) such that a.s the graphs G m are dense in the space of graph limits, in the sense that for every graph limit , there exists a subsequence G m that converges to .
Proof Let F k , k 1, be an enumeration of all finite (unlabelled) simple graphs without isolated vertices, each repeated an infinite number of times. Let v k := v(F k ) and let f k (i, j) be the adjacency matrix of F k .
Let N 0 := 1 and, inductively, N k := kv k N k−1 for k 1. Let also Clearly, N k k!, N k N k−1 and a k a k−1 . Finally let, for i = j, Let I k := [1, N k ] and divide I k into the v k subintervals I k, := [( − 1)k N k−1 + 1, k N k−1 ], = 1, . . . v k . Note that (8.12) says that if i ∈ I k, p and j ∈ I k,q and not both i, j ∈ I k−1 , then μ i j = a k f k ( p, q).
Let t k := N k a −1 k . If n > k, then the expected number of edges i j inG * t k with i ∨ j ∈ I n \I n−1 is at most, using (8.11), t k i∨ j∈I n \I n−1 μ i j t k a n N 2 n t k a n−1 N 2 n = a n−1 a k N k N 2 Hence the probability thatG * t k contains some edge with endpoint not in I k × I k is at most and by the Borel-Cantelli lemma, a.s this happens for only finitely many k. Similarly, if (i, j) ∈ I 2 k \I 2 k−1 , then μ i j ∈ {0, a k }, and the probability that there exists some such pair (i, j) with μ i j = a k but N i j (t k ) = 0 is at most Consequently, again by the Borel-Cantelli lemma, a.s for every large k, there exists no such pair (i, j).
We have shown that a.s for every large k, the simple graphG t k contains no edge with an endpoint outside I k , and for (i, j) ∈ I 2 k \I 2 k−1 , recalling (8.12), if i ∈ I k, p and j ∈ I k,q , then there is an edge i j if and only if f k ( p, q) = 1. In particular, since F k has no isolated vertices, every i ∈ I k is the endpoint of some edge inG t k and thus a vertex, but no i / ∈ I k is; in other words, a.s for every large k, V (G t k ) = I k . It follows that if F * k is the blow-up of F k with every vertex repeated k N k−1 times, then a.s for every large k, the graphsG t k and H * k have the same vertex set I k and their adjacency matrices can differ only for (i, j) ∈ I 2 k−1 . Consequently, using Remark 3.1, a.s for all large k. Now, let be a graph limit, and let 1. By graph limit theory (or definition), there exists a sequence of graphs H j with v(H j ) → ∞ and δ (H j , ) → 0 as j → ∞; hence we may take j so large that H := H j satisfies v(H ) > and δ (H, ) < 1/ . H may have isolated vertices, so we define H by choosing a vertex v ∈ H and adding an edge from v to any other vertex in H . Then at most v(H ) − 1 edges are added, and thus, similarly to (8.16), Moreover, H has no isolated vertices, and thus H occurs infinitely often in the sequence (F k ) above. Consequently, a.s., there exists k > such that (8.16) holds and F k = H . Then, by (8.16) and (8.17), The chameleon example in Theorem 8.7 is theoretically very interesting, but it is hardly useful as a model in applications; since the behaviour of G m changes so completely with m, it is a model of nothing rather than a model of everything.
If we want convergence of the full sequence G m and not just subsequence convergence as in Theorem 8.7, we do not know whether every graph limit can occur as a limit.

Problem 8.8
For which graph limits $\Gamma$ does there exist a matrix $(\mu_{ij})$ such that for the corresponding simple random graphs, $G_m \to \Gamma$?

Sparse Examples
We gave in the preceding section some dense examples. It seems to be more typical, however, that the graph G m contains many vertices of small degree (maybe even degree 1), and that the graph is sparse. We give here a few, related, rank 1 examples; see also the following section.
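Example 9.1 Consider a rank 1 example with $q_i \asymp i^{-\gamma}$ for some $\gamma > 1$. Then (7.6) yields
$$v(t) \asymp t^{1/\gamma}, \qquad t \ge 1. \tag{9.1}$$
This yields by (7.7), for $t \ge 2$,
$$e(t) \asymp t^{1/\gamma} \log t. \tag{9.2}$$
Hence, using Theorem 6.4, a.s. $v(\tilde G_t) \asymp t^{1/\gamma}$ and $e(\tilde G_t) \asymp t^{1/\gamma} \log t$ as $t \to \infty$, and $v(G_m) \asymp m^{1/\gamma}$ and $e(G_m) \asymp m^{1/\gamma} \log m$ as $m \to \infty$. It follows that the average degree in $G_m$ is $\asymp \log m$.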
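These orders of magnitude can be explored numerically. The sketch below assumes that the expectations in (6.22)-(6.23) are $v(t) = \sum_i (1 - e^{-t\mu_i})$ and $e(t) = \sum_{i<j} (1 - e^{-t\mu_{ij}})$ with $\mu_{ij} = 2q_iq_j$ and $\mu_i = 2q_i(1-q_i)$, which is what independence of the Poisson edge counts gives; the truncation of $(q_i)$ to finitely many vertices and the function names are hypothetical, and the truncation distorts the values once $t^{1/\gamma}$ approaches the number of vertices kept.

```python
import math

def power_law_weights(n, gamma):
    """q_i proportional to i^(-gamma), truncated to i = 1..n and normalised."""
    raw = [i ** -gamma for i in range(1, n + 1)]
    total = sum(raw)
    return [x / total for x in raw]

def expected_vertices(q, t):
    """Sum over i of P(vertex i has at least one non-loop edge by time t)."""
    return sum(-math.expm1(-t * 2.0 * qi * (1.0 - qi)) for qi in q)

def expected_edges(q, t):
    """Sum over pairs i < j of P(edge ij has appeared by time t)."""
    total = 0.0
    for i in range(len(q)):
        for j in range(i + 1, len(q)):
            total += -math.expm1(-t * 2.0 * q[i] * q[j])
    return total

q = power_law_weights(300, 2.0)
for t in (10.0, 100.0, 1000.0):
    v, e = expected_vertices(q, t), expected_edges(q, t)
    print(f"t = {t:6.0f}   v(t) = {v:7.2f}   e(t) = {e:7.2f}   2e/v = {2 * e / v:5.2f}")
```

The last column is the ratio of twice the expected number of edges to the expected number of vertices, a proxy for the average degree, which the example predicts to grow like $\log t$.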
In this example we may also show that the degree distribution has a power law; we state this as a theorem. There is no standard precise definition of what is meant by a power-law degree distribution; we may say that a random variable $X$ has a power-law distribution with exponent $\tau$ if $\mathbb{P}(X > x) \asymp x^{-(\tau-1)}$ as $x \to \infty$, but this does not make sense for the degree distribution of a finite graph, so we must either consider the asymptotic degree distribution, provided one exists, or give uniform estimates for a suitable range of $x$. (See e.g. [18, Sects. 1.4.1 and 1.7] for a discussion of power-laws for degree distributions.) We follow here the second possibility.
For a (finite) graph $G$, let $v_k(G)$ be the number of vertices of degree at least $k$, and let $\pi_k(G) := v_k(G)/v(G)$, the probability that a random vertex has degree at least $k$.

Theorem 9.2 In Example 9.1, the random graphs $G_m$ have a power-law distribution with exponent 2 in the following sense. There exist positive constants $c$ and $C$ such that a.s. for every large $m$,
$$\pi_k(G_m) \le C/k, \qquad k \ge 1, \tag{9.3}$$
$$\pi_k(G_m) \ge c/k, \qquad 1 \le k \le c\,v(G_m). \tag{9.4}$$
As usual, the same result holds for $\tilde G_t$. Note that the restriction $k \le c\,v(G_m)$ in (9.4) is necessary, and best possible (up to the value of the constants); we necessarily have $\pi_k(G) = 0$ when $k \ge v(G)$. Note also that we have the same exponent $\tau = 2$ for every $\gamma > 1$.
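For concreteness, the quantities $v_k(G)$ and $\pi_k(G)$ can be computed from an edge list as in the following sketch. It assumes a simple graph without isolated vertices, given as a collection of pairs, so that $v(G)$ is the number of distinct endpoints; the function names are hypothetical.

```python
from collections import Counter

def degree_counts(edges):
    """Map each vertex of a simple graph (given as pairs) to its degree."""
    degrees = Counter()
    for i, j in edges:
        degrees[i] += 1
        degrees[j] += 1
    return degrees

def v_at_least(edges, k):
    """v_k(G): the number of vertices of degree at least k."""
    return sum(1 for d in degree_counts(edges).values() if d >= k)

def pi_at_least(edges, k):
    """pi_k(G) = v_k(G) / v(G), with v(G) the number of vertices."""
    degrees = degree_counts(edges)
    return sum(1 for d in degrees.values() if d >= k) / len(degrees)

# A star with centre 1 and three leaves: only the centre has degree >= 2.
star = [(1, 2), (1, 3), (1, 4)]
print([v_at_least(star, k) for k in (1, 2, 3, 4)])
print([pi_at_least(star, k) for k in (1, 2, 3, 4)])
```

For the star, $\pi_k$ vanishes as soon as $k \ge v(G) = 4$, illustrating why the lower bound in the theorem can only hold for $k$ up to a constant times $v(G_m)$.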
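Proof As usual, we prove the results for $\tilde G_t$; the results for $G_m$ follow by Proposition 4.9. We then can write (9.3)-(9.4) as $v_k(\tilde G_t) \le C v(\tilde G_t)/k$, $k \ge 1$, and $v_k(\tilde G_t) \ge c\,v(\tilde G_t)/k$, $1 \le k \le c\,v(\tilde G_t)$, and by Theorem 6.4 and (9.1), it suffices (and is equivalent) to prove that
$$v_k(\tilde G_t) \le C t^{1/\gamma}/k, \qquad k \ge 1, \tag{9.5}$$
$$v_k(\tilde G_t) \ge c\,t^{1/\gamma}/k, \qquad 1 \le k \le c\,t^{1/\gamma}, \tag{9.6}$$
a.s. for every large $t$.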
Let I i j be the indicator of an edge i j inG t ; thus I i j ∼ Be 1−e −2tq i q j . Let D i := j =i I i j be the degree of i in the simple graphG t . (The degree is defined as 0 if i is not a vertex.) (i) The upper bound (9.5) We fix t 1 and an integer k 1; for convenience we often omit them from the notation, but note that many variables below depend on them, while all explicit and implicit constant are independent of t and k. Let J i := 1 {D i k} and N : Let A be a large constant, chosen later, and assume that k A, let i 0 := At 1/γ /k and let N * := i>i 0 J i . Thus N N * + i 0 . If i i 0 , then using (7.8), (6.31) and (9.1), Thus ED i C 2 k/A for some C 2 0, and choosing A = max(14C 2 , 4), we find that ED i k/14 (k − 1)/7. Since D i is a sum j I i j of independent Bernoulli variables, a Chernoff bound (see e.g. [26, (2.11) and Theorem 2.8]) yields i i 0 , (9.8) and also, for later use, For i t 1/γ we also have, by (9.7) and (9.1), Let (x) r := x(x −1) · · · (x −r +1), the falling factorial. Since D i is a sum of independent indicators, it is easily seen that for any positive integer r , the factorial moment can be bounded by E(D i ) r (ED i ) r . Hence, by (9.10) and Markov's inequality, since we assume k A 4, (This also follows from [26, (2.10) and Theorem 2.8]). Summing (9.8) and (9.11), we obtain For the variance of N * , we note that the indicators J i are not quite independent, since an edge i j influences both J i and J j , but conditioned on I i j , J i and J j are independent. Hence, for any distinct i and j, and thus Cov(J i , J j ) P(I i j = 1)P(D i k − 1)P(D j k − 1). (9.13) By (9.13) and (9.9), for i, j i 0 with i = j, (9.14) Consequently, using also (9.12), Hence, by Chebyshev's inequality, We have so far kept t and k fixed. 
We now sum (9.16) over all k A and t = 2 for ∈ N, and find by the Borel-Cantelli lemma that a.s for every large t of this form and every k A, N * − EN * t 1/γ /k, and consequently, using also (9.12), This is (9.5) for k A and t ∈ {2 }; since N increases with t, (9.5) follows in general (with a different constant), a.s for large t and all k A. For k < A, (9.3) and (9.5) follow trivially from v k (G t ) v(G t ).
(ii) The lower bound (9.6) Fix again t 1 and k 1, let B be a large constant chosen later, and assume that k t 1/γ /B. Let L be the set of odd integers i with 1 i i 1 := B −1 t 1/γ /k, and let R be the set of even integers j with 1 j 6k. By our assumption on k, i 1 1, and thus |L| = (i 1 +1)/2 i 1 /3. Note that the indicators {I i j } i∈L , j∈R are independent. For i ∈ L, let D i := j∈R I i j and J i = 1 {D i k} . Thus the indicators {J i } i∈L are independent. Also, let If i ∈ L and j ∈ R, then i j 6ki 1 = 6B −1 t 1/γ , and thus Since |R| = 3k, it follows that if i ∈ L, then ED i 3(1 − e −2 )k > 2.5k, and moreover, by a Chernoff bound (e.g. [26, (2.12)]), Since the indicators J i are independent for i ∈ L, another Chernoff bound shows that Alternatively, (9.20) and a union bound yield If 1 k t 1/2γ , then i 1 B −1 t 1/2γ , and thus (9.21) yields P(N < |L|/2) e −c 7 t 1/2γ . If t 1/2γ < k t 1/γ /B, then (9.22) yields P(N < |L|/2) i 1 e −c 8 t 1/2γ C 9 e −c 9 t 1/2γ . Consequently, for every k t 1/γ /B, P(N < |L|/2) P(N < |L|/2) C 10 e −c 10 t 1/2γ . (9.23) We have kept k and t fixed, but we now sum (9.23) over all k t 1/γ /B and t = 2 for some ∈ N 0 . It follows by the Borel-Cantelli lemma that a.s for every large t of this form and every k t 1/γ /B, N |L|/2 i 1 /6 c 11 t 1/γ /k. This proves (9.6) for t of the form 2 , and again the general case follows since N is monotone in t.
Furthermore, assuming $q_i \sim c\,i^{-\gamma}$ in Example 9.1, we can show that $\tilde G_t$ and $G_m$ converge a.s. to a graphon of the type defined by [39] and mentioned in Sect. 5.1; these graphons are measurable functions $W : \mathbb{R}_+^2 \to [0,1]$, such that the random graphs $G(t,W)$ defined in (5.5) are a.s. finite. (See [39] for precise conditions; see also [3,9] for related versions.) Recall that the standard graphons discussed in Sect. 3 are useful for dense graphs, but not for sparse graphs as here; the more general graphons in [39] are intended for sparse graphs.
Veitch and Roy [40] defined two notions $\to_{\mathrm{GP}}$ and $\to_{\mathrm{GS}}$ of convergence for such general graphons on $\mathbb{R}_+$ (and the even more general graphexes defined in [39]), based on convergence in distribution of the corresponding random graphs $G(t, W)$. We can define $W_n \to_{\mathrm{GP}} W$ as meaning $G(r, W_n) \xrightarrow{d} G(r, W)$ for each fixed $r < \infty$, see further [24,40]. Furthermore, the random graphs $G(r, W)$ are naturally coupled for different $r$ and form an increasing graph process $(G(r, W))_{r \ge 0}$. Let $(G_{\tau_k}(W))_k$ be the sequence of different graphs that occur among $G(r, W)$ for $r \ge 0$. Then $W_n \to_{\mathrm{GS}} W$ if $(G_{\tau_k}(W_n))_k \xrightarrow{d} (G_{\tau_k}(W))_k$; again see further [24,40].
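To make the random graphs $G(r, W)$ more concrete, the following is a minimal Python sketch of a truncated version. It rests on assumptions that are not taken from the text: the Poisson process is restricted to a window $(0, \texttt{horizon}]$ (a hypothetical parameter, needed to get a finite simulation), each pair of points is joined independently with probability $W(\eta_i, \eta_j)$, and points without edges are discarded, which is our reading of the construction in (5.4)-(5.5). All function names are illustrative.

```python
import random


def poisson_points(r, horizon, rng):
    """Points of a rate-r Poisson process on (0, horizon], in increasing order.

    Built from i.i.d. Exp(r) gaps; the cut-off `horizon` is an assumption
    of this sketch, not part of the definition on the whole half-line.
    """
    points = []
    x = rng.expovariate(r)
    while x <= horizon:
        points.append(x)
        x += rng.expovariate(r)
    return points


def sample_truncated_graph(r, w, horizon, rng):
    """Return (eta, vertices, edges) for a truncated sample of G(r, W).

    eta      -- the Poisson points eta_0 < eta_1 < ...
    edges    -- index pairs (i, j), i < j, joined with probability w(eta_i, eta_j)
    vertices -- indices with at least one edge (isolated points are dropped)
    """
    eta = poisson_points(r, horizon, rng)
    edges = [(i, j)
             for i in range(len(eta))
             for j in range(i + 1, len(eta))
             if rng.random() < w(eta[i], eta[j])]
    vertices = sorted({v for edge in edges for v in edge})
    return eta, vertices, edges


if __name__ == "__main__":
    # Toy {0,1}-valued graphon (hypothetical): an edge exactly when x * y <= 1.
    toy_w = lambda x, y: 1.0 if x * y <= 1.0 else 0.0
    eta, vertices, edges = sample_truncated_graph(2.0, toy_w, 5.0, random.Random(0))
    print(len(eta), len(vertices), len(edges))
```

Increasing `horizon` reduces the truncation error, since edges to points beyond the window are ignored in this sketch.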
Recall that for a finite graph $G$, we defined a corresponding graphon $W_G$ in Sect. 3. In the context of graphons on $\mathbb{R}_+$, [40] define for every $s > 0$ a modification $W_{G,s}$, called the dilated empirical graphon, as follows. We may assume that $G$ has vertices labelled $1, \dots, v(G)$; then $W_G(i, j) := \mathbf{1}\{i \sim j\}$ for $i, j \le v(G)$; we extend this by $W_G(i, j) := 0$ when $i \vee j > v(G)$. Then, for every $s > 0$, let the dilated graphon $W_{G,s}$ be the function $\mathbb{R}_+^2 \to \{0, 1\}$ given by $W_{G,s}(x, y) := W_G(\lceil sx \rceil, \lceil sy \rceil)$. Hence, every vertex in $G$ corresponds to an interval of length $1/s$ in the domain of $W_{G,s}$.
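The definition of the dilated empirical graphon can be illustrated by a short Python sketch. It assumes that the lost brackets in the formula are ceilings, so that vertex $i$ corresponds to the interval $((i-1)/s, i/s]$; the function name and the edge-list input format are illustrative choices, not notation from [40].

```python
import math


def dilated_empirical_graphon(edges, num_vertices, s):
    """Return W_{G,s} as a function on (0, inf)^2 with values in {0, 1}.

    edges        -- iterable of pairs (i, j) with labels in 1..num_vertices
    num_vertices -- v(G)
    s            -- dilation factor; each vertex occupies an interval of length 1/s
    """
    if s <= 0:
        raise ValueError("s must be positive")
    # Store edges as unordered pairs; loops are ignored (simple graph).
    adjacency = {frozenset(e) for e in edges if e[0] != e[1]}

    def w(x, y):
        i, j = math.ceil(s * x), math.ceil(s * y)
        # W_G is extended by 0 outside {1, ..., v(G)}^2.
        if i < 1 or j < 1 or max(i, j) > num_vertices:
            return 0
        return 1 if frozenset((i, j)) in adjacency else 0

    return w


if __name__ == "__main__":
    # Path 1 - 2 - 3 with s = 2: vertex 1 <-> (0, 0.5], vertex 2 <-> (0.5, 1], ...
    w = dilated_empirical_graphon([(1, 2), (2, 3)], 3, 2.0)
    print(w(0.25, 0.75), w(0.25, 1.25))
```

A larger $s$ compresses the graph into a smaller square $(0, v(G)/s]^2$, which is the rescaling used when comparing a growing graph with a fixed graphon on $\mathbb{R}_+$.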
If $G_n$ is a sequence of graphs and $W$ a graphon, then $G_n \to_{\mathrm{GS}} W$ means that $W_{G_n} \to_{\mathrm{GS}} W$; furthermore, the convergence $\to_{\mathrm{GS}}$ is insensitive to dilations, so $G_n \to_{\mathrm{GS}} W$ is equivalent to $W_{G_n, s_n} \to_{\mathrm{GS}} W$ for any sequence $s_n > 0$.

Remark 9.3 We have in Sect. 5.1 given the version of $G(r, W)$ without loops; more generally, one can allow $i = j$ in (5.4) and thus allow loops. The loopless case considered here is then obtained by imposing $W(x, x) = 0$ for $x > 0$. Hence, for the version with loops, Theorem 9.4 below still holds, provided we redefine $W$ to be 0 on the diagonal.
Note that $W(x, y) \ge 1 - \exp(-2c^2) > 0$ when $xy \le 1$, and thus $\int W = \infty$.

We first prove two lemmas.

Lemma 9.5 Let $(Z_{kl})_{k,l}$ be an array of i.i.d. random variables. Furthermore, let $x_1, \dots, x_n > 0$ be distinct and let $X$ be a random variable, independent of the array $(Z_{kl})_{k,l}$, that is uniformly distributed on some interval $(a, b)$ with $0 \le a < b < \infty$. Then, as $t \to \infty$, conditionally on $(Z_{kl})_{k,l}$ and for a.e. realization of $(Z_{kl})_{k,l}$, the random vector $(Z_{\lceil t x_i \rceil, \lceil t X \rceil})_{i=1}^n$ converges in distribution to $(Z'_{i,n+1})_{i=1}^n$, where $(Z'_{kl})_{k,l}$ is an independent copy of $(Z_{kl})_{k,l}$.
Proof It suffices to prove (9.25) for every fixed rational $z_1, \dots, z_n$. Let further $I_{k,l,i} := \mathbf{1}\{Z_{kl} \le z_i\}$ and $J_l := \prod_{i=1}^n I_{\lceil t x_i \rceil, l, i}$. Then (9.27) holds, with the error term coming from edge effects. If $t$ is sufficiently large, then $\lceil t x_1 \rceil, \dots, \lceil t x_n \rceil$ are distinct, and then (9.28) holds, see (9.26). Furthermore, the variables $J_l \sim \mathrm{Be}(\pi)$ are then i.i.d., so their sum in (9.27) has a binomial distribution, and a Chernoff bound shows that for every $\varepsilon > 0$ there is a $c = c(\varepsilon) > 0$ such that (9.29) holds for large $t$. This shows that $P_t$ converges to $\pi$ in probability as $t \to \infty$.

In order to show convergence a.s., we note that if $0 < t < u$ and $t(b - a) > 1$, then (for fixed $a$ and $b$) $\mathbb{P}(\lceil tX \rceil \ne \lceil uX \rceil) = O(u - t)$, and consequently (9.30) holds for some $C > 0$. Let $\varepsilon > 0$, let $N := \lceil C/\varepsilon \rceil$ and let $t_n := n/N$. By (9.29) and the Borel-Cantelli lemma, a.s. $|P_{t_n} - \pi| \le \varepsilon$ for all large $n$. Furthermore, if $n$ is large and $t_n \le t \le t_{n+1}$, then (9.30) implies $|P_t - P_{t_n}| \le \varepsilon$, and thus $|P_t - \pi| \le 2\varepsilon$. Consequently, a.s., $|P_t - \pi| \le 2\varepsilon$ for every large $t$.
Since ε is arbitrary, this proves (9.25) and thus the lemma.
Lemma 9.6 Let $(Z_{kl})_{k,l}$ be an array of i.i.d. random variables, and let $(X_1, \dots, X_n)$ be a random vector in $\mathbb{R}_+^n$ with an absolutely continuous distribution, independent of the array $(Z_{kl})_{k,l}$. Then, as $t \to \infty$, (9.31) holds.

Proof
Step 1 Assume first that $X_1, \dots, X_n$ are independent with $X_i \sim U(I_i)$ for some intervals $I_1, \dots, I_n$. In this case we prove (9.31) by induction on $n$, so we may assume (9.32). Furthermore, Lemma 9.5 and conditioning on $X_1, \dots, X_{n-1}$ yield (9.33). The result (9.31) follows by (9.32) and (9.33), which shows the induction step and completes the proof of this step.
Step 2 Suppose that there exists a finite family of disjoint intervals $I_k$ such that the density function $f(x_1, \dots, x_n)$ of $(X_1, \dots, X_n)$ is supported on $\bigl(\bigcup_k I_k\bigr)^n$ and constant on each $\prod_{i=1}^n I_{k_i}$. Then Step 1 shows that for each sequence $k_1, \dots, k_n$ of indices, (9.31) holds conditioned on $(X_1, \dots, X_n) \in \prod_{i=1}^n I_{k_i}$. Hence (9.31) holds unconditioned too.
Step 3 The general case. Let $f(x_1, \dots, x_n)$ be the density function of $(X_1, \dots, X_n)$, and let $\varepsilon > 0$. Then there exists a density function $f_0(x_1, \dots, x_n)$ of the type in Step 2 such that $\int |f - f_0| \, dx_1 \cdots dx_n < \varepsilon$. We can interpret $f_0$ as the density function of a random vector $X^0 = (X^0_1, \dots, X^0_n)$, and we can couple this vector with $X = (X_1, \dots, X_n)$ such that $\mathbb{P}(X \ne X^0) < \varepsilon$. Since Step 2 applies to $X^0$, it follows that $\mathbb{P}(\text{the convergence in (9.31) holds}) > 1 - \varepsilon$. Since $\varepsilon > 0$ is arbitrary, (9.31) follows.
Proof of Theorem 9.4 Let $w_x := c^{-1} q_x x^{\gamma} = 1 + o(1)$ as $x \to \infty$. We can construct $\tilde G_t$ for all $t > 0$ by taking i.i.d. random variables $Z_{kl} \sim \mathrm{Exp}(1)$ and letting there be an edge $kl$ in $\tilde G_t$ if $2t q_k q_l \ge Z_{kl}$, for every pair $(k, l)$ with $k < l$. Let $\hat W_t := W_{\tilde G_t,\, t^{1/(2\gamma)}}$ be the dilated empirical graphon in the statement.

Fix $r > 0$, and consider the random graph $G(r, \hat W_t)$; this is by (5.4)-(5.5) obtained by taking a Poisson process $\{\eta_i\}_i$ on $\mathbb{R}_+$ with intensity $r$ (where we assume $\eta_1 < \eta_2 < \dots$), and then taking an edge $ij$ if and only if $\hat W_t(\eta_i, \eta_j) = 1$. By the definition of $\hat W_t$, this is equivalent to $\tilde G_t$ having an edge between $\lceil t^{1/(2\gamma)} \eta_i \rceil$ and $\lceil t^{1/(2\gamma)} \eta_j \rceil$, and thus by the construction of $\tilde G_t$ to (assuming that $t$ is large so that $\lceil t^{1/(2\gamma)} \eta_i \rceil \ne \lceil t^{1/(2\gamma)} \eta_j \rceil$)

$$2t\, q_{\lceil t^{1/(2\gamma)} \eta_i \rceil}\, q_{\lceil t^{1/(2\gamma)} \eta_j \rceil} \ge Z_{\lceil t^{1/(2\gamma)} \eta_i \rceil,\, \lceil t^{1/(2\gamma)} \eta_j \rceil} \tag{9.35}$$

or, equivalently, (9.36).

Fix $n < \infty$ and consider the edge indicators $I_{i,j,t}$ in $G(r, \hat W_t)$ for $1 \le i < j \le n$. Furthermore, fix a large integer $N$ and condition $(\eta_1, \dots, \eta_n)$ on $\lfloor N\eta_1 \rfloor, \dots, \lfloor N\eta_n \rfloor$. By Lemma 9.6, and recalling $w_x = 1 + o(1)$, the distribution of the right-hand side of (9.36) converges a.s. to independent $\mathrm{Exp}(1)$ variables, jointly for $1 \le i < j \le n$. Since $I_{i,j,t}$ equals the indicator of (9.36), it follows by first replacing the left-hand side of (9.36) by upper and lower bounds obtained by rounding each $\eta_i$ down or up to the nearest multiple of $1/N$, applying Lemma 9.6 and then letting $N \to \infty$, that (9.37) holds. Here, conditioned on $\eta_1, \dots, \eta_n$, the indicators in the right-hand side are independent, and have (conditional) expectations given by (9.38). This equals the (conditional) probability of an edge $ij$ in $G(r, W)$. Consequently, if $I_{i,j}$ is the indicator of an edge $ij$ in $G(r, W)$, (9.37) shows that a.s., as $t \to \infty$, $(I_{i,j,t})_{1 \le i < j \le n}$ converges in distribution to $(I_{i,j})_{1 \le i < j \le n}$. This shows the desired convergence $G(r, \hat W_t) \xrightarrow{d} G(r, W)$, provided we restrict the graphs to a fixed finite set of vertices.
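The coupling used at the start of this proof, one $\mathrm{Exp}(1)$ variable per vertex pair and an edge $kl$ once $2t q_k q_l$ exceeds it, can be sketched in Python as follows. The truncation to $n$ vertices, the exact power weights $q_i \propto i^{-\gamma}$ (with $\gamma > 1$ assumed) and all function names are assumptions of this sketch, not part of the proof.

```python
import math
import random


def rank_one_weights(n, gamma):
    """q_1..q_n proportional to i^(-gamma), normalised to sum to 1.

    Truncating the infinite sequence to n vertices is an assumption
    made only to keep the simulation finite.
    """
    raw = [i ** (-gamma) for i in range(1, n + 1)]
    total = sum(raw)
    return {i: raw[i - 1] / total for i in range(1, n + 1)}


def sample_exp_array(n, rng):
    """One i.i.d. Exp(1) variable Z_kl for every pair k < l."""
    return {(k, l): rng.expovariate(1.0)
            for k in range(1, n + 1)
            for l in range(k + 1, n + 1)}


def edges_at_time(t, q, z):
    """Edge set at time t: the pair kl is present iff Z_kl <= 2 t q_k q_l."""
    return {(k, l) for (k, l), z_kl in z.items()
            if z_kl <= 2.0 * t * q[k] * q[l]}


def edge_probability(t, q, k, l):
    """P(Z_kl <= 2 t q_k q_l) for an Exp(1) variable Z_kl."""
    return 1.0 - math.exp(-2.0 * t * q[k] * q[l])


if __name__ == "__main__":
    rng = random.Random(2024)
    q = rank_one_weights(50, 2.0)
    z = sample_exp_array(50, rng)
    # The same array z serves all t, so the graphs are nested in t.
    print(len(edges_at_time(10.0, q, z)), len(edges_at_time(1000.0, q, z)))
```

Reusing one array for all $t$ is what makes the family increasing in $t$, which is the monotonicity used repeatedly in this section.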
Moreover, $G_m$ has a.s. a power-law degree distribution with exponent $\tau = 2$ in the sense of Theorem 9.2. Furthermore, Theorem 9.4 shows that $G_m \to_{\mathrm{GS}} W$ a.s. as $m \to \infty$ and that the dilated empirical graphon converges a.s. in the sense $W_{\tilde G_t,\, t^{\alpha/2}} \to_{\mathrm{GP}} W$, where $W$ is the random graphon $W(x, y) = 1 - \exp\bigl(-2Z^2 x^{-1/\alpha} y^{-1/\alpha}\bigr)$ on $\mathbb{R}_+$.

Extremely Sparse Examples
We can obtain extremely sparse examples in several ways.
First, Theorem 6.10 shows that any example including dust or attached stars is extremely sparse.
Another way to obtain extremely sparse graphs is to force the degrees to be bounded, as follows.
Example 10.1 Let $\mu = (\mu_{ij})_{i,j=1}^{\infty}$ be a symmetric non-negative matrix with $0 < \|\mu\| < \infty$ and assume that each row contains at most $d$ non-zero entries, for some $d < \infty$. (For example, let $\mu$ be a band matrix, with $\mu_{ij} = 0$ unless $0 < |i - j| \le d/2$.) Since an edge $ij$ can exist only when $\mu_{ij} > 0$, it follows that every vertex in $G_m$ has degree at most $d$. Hence the sequence $G_m$ has bounded degree, and in particular $G_m$ is sparse; to be more precise we have

$$v(G_m) \le 2e(G_m) \le d\, v(G_m). \tag{10.1}$$

Less obviously, it is also possible to obtain extremely sparse graphs in the rank 1 case, with a sequence $q_i$ that decreases very slowly (remember that $\sum_i q_i = 1$ by assumption). We give one such example.
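As a side illustration of the bounded-degree construction in Example 10.1, here is a minimal Python sketch that samples i.i.d. edges from a finite band matrix and checks the degree bound and (10.1). The truncation to $n$ vertices, the particular weights $(ij)^{-\texttt{decay}}$ and the function names are hypothetical choices made only for the illustration.

```python
import random
from collections import defaultdict


def band_edge_weights(n, d, decay=2.0):
    """Hypothetical band matrix on vertices 1..n.

    The pair (i, j), i < j, gets weight (i * j)^(-decay) when
    0 < j - i <= d // 2, and is absent otherwise, so every row has
    at most d non-zero entries.
    """
    half = d // 2
    return {(i, j): (i * j) ** (-decay)
            for i in range(1, n + 1)
            for j in range(i + 1, min(i + half, n) + 1)}


def sample_simple_graph(weights, m, rng):
    """Draw m i.i.d. edges proportionally to the weights, then merge repeats."""
    pairs = list(weights)
    probs = [weights[p] for p in pairs]
    return set(rng.choices(pairs, weights=probs, k=m))


def degrees(edges):
    """Degree of every non-isolated vertex of the simple graph."""
    deg = defaultdict(int)
    for i, j in edges:
        deg[i] += 1
        deg[j] += 1
    return dict(deg)


if __name__ == "__main__":
    d = 4
    edges = sample_simple_graph(band_edge_weights(500, d), 20000, random.Random(3))
    deg = degrees(edges)
    v, e = len(deg), len(edges)
    # Expect max degree <= d and v <= 2e <= d * v, as in (10.1).
    print(v, e, max(deg.values()))
```

The bound holds for every sample, not just on average, because a vertex can only ever be joined to the at most $d$ vertices within distance $d/2$ of it.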
Example 10.2 Consider the rank 1 case (Sect. 7.1) with $q_i = c/(i \log^2 i)$ for $i \ge 2$, where $c$ is the appropriate normalization constant. (Any $(q_i)$ with $q_i \asymp 1/(i \log^2 i)$ would yield the same results below.) Recall that, by comparison with an integral, $\sum_{i \ge k} 1/(i \log^2 i) \sim 1/\log k$ as $k \to \infty$.
We will see that (in a sense made precise below) almost all edges belong to stars, and that, moreover, most edges and vertices belong to a small (finite) number of stars; in particular, most vertices have degree 1.
For large $t$, let $\ell(t) := t/\log^2 t$. Then $\ell(t) \log^2 \ell(t) \sim t$, and thus, using (7.6),

The expected number of edges is given by (10.3), which follows from (7.5). We split the sum in (10.3) into three (overlapping) parts. The case $j \ge t^{0.4}$ yields at most

$$\sum_{i \ge 1} \sum_{j \ge t^{0.4}} \frac{t}{i \log^2(i+1)\, j \log^2(j+1)} \le C_1 \frac{t}{\log t}.$$

We can be more precise. Recall that $N_i(t)$ is the degree of vertex $i$ in $\tilde G^*_t$; by (6.3), $N_i(t) \sim \mathrm{Po}(\mu_i t)$. Consequently, using also (7.2),

Summing over $i > t/\log^2 t$ we obtain

Hence the expected number of edges that have one endpoint in $(t/\log^2 t, \infty)$ and that endpoint is not isolated is $O(t/\log^2 t)$. Moreover, the expected number of edges with both endpoints in $[1, t/\log^2 t]$ is at most

$$\sum_{i < j \le t/\log^2 t} 2\bigl((t q_i q_j) \wedge 1\bigr) \le t^{0.8} + \sum_{t^{0.4} < j \le t/\log^2 t}\ \sum_{i < j} 2\bigl((t q_i q_j) \wedge 1\bigr) \tag{10.9}$$

where the last sum is at most a constant times, cf.

Conclusions
For the multigraph version, the examples in Sect. 7 seem very interesting, but perhaps a bit special. We do not know whether they are typical for a large class of interesting examples or not.
For the simple graph version, the examples above show a great variety of different behaviour. Nevertheless, the results are somewhat disappointing for applications; the relations between the intensity matrix $(\mu_{ij})$ and properties of the random graphs $G_m$ such as edge density and degree distribution are far from obvious, and it is not clear how one can choose the intensity matrix to obtain desired properties; for example, we do not know any example of a power-law degree distribution with an exponent $\tau \ne 2$.
Consequently, for both versions, it seems desirable to study more examples, as well as to find more general theorems.
The present paper is only a first step (or rather a second step, after [7,8,11,12]) in the investigation of these random graphs, and it seems too early to tell whether they will be useful as random graph models for various applications or not.