1 Introduction

Many real-world networks have been found to have a degree distribution that is well approximated by a power-law distribution with exponent \(\tau \in (2,3)\). These power-law real-world networks are often modeled by random graphs: randomized mathematical models that create networks. One of the most natural random graph models to consider is the uniform random graph [14, 16]. Given a degree sequence, the uniform random graph samples a graph uniformly at random from all possible graphs with exactly that degree sequence.

The most common way to analyze uniform random graphs is to instead analyze the configuration model, another random graph model that is easier to work with [2]. The configuration model creates random multigraphs with a specified degree sequence, i.e., graphs in which multiple edges and self-loops may be present. Conditioned on the event that the configuration model results in a simple graph, it is distributed as a uniform random graph. If the probability of the event that the configuration model results in a simple graph is sufficiently large, it is possible to translate results from the configuration model to the uniform random graph. In the case of power-law degrees with exponent \(\tau \in (2,3)\), however, the probability that the configuration model results in a simple graph vanishes, so that the configuration model cannot be used as a method to analyze power-law uniform random graphs [11]. In this setting, uniform random graphs need to be analyzed directly instead. However, analyzing uniform random graphs is in general difficult: the presences of different edges are dependent, and there is no simple algorithm for constructing a uniform random graph with power-law degrees.

Several other random graph models that are easy to generate create graphs with approximately the desired degree sequence. The most prominent such models are rank-1 inhomogeneous random graphs [1, 3, 4]. In these models, every vertex is equipped with a weight, and pairs of vertices are connected independently with a probability that is a function of the vertex weights. Another such model is the erased configuration model, which erases all multiple edges and self-loops of the configuration model [3]. As these models are easy to generate and easy to analyze, they are often used as a proxy for random graphs with a desired degree sequence.

In this paper, we investigate induced subgraphs in uniform random graphs. Several special cases of subgraph counts in uniform random graphs have been analyzed before, such as cycles [8, 16]. However, existing results often need a bound on the maximal degree in the graph, or the assumption that all degrees are equal, which does not allow for analyzing power-law random graphs with \(\tau \in (2,3)\). Recently, triangles in uniform power-law random graphs have also been analyzed [5]. In this paper, we investigate the subgraph counts of all possible induced subgraphs by using a recent method based on optimization models [7, 10], which has made it possible to analyze subgraph counts in erased configuration models and preferential attachment models. We combine this method with novel estimates on the connection probabilities in uniform random graphs [6] to obtain an optimization problem that finds the most likely composition of an induced subgraph of a power-law uniform random graph. This method allows us to localize and enumerate all possible induced subgraphs.

We then use this optimization problem to design a randomized algorithm that distinguishes two types of rank-1 inhomogeneous random graphs from uniform random graphs in linear time. Interestingly, this shows that approximate-degree power-law random graphs are fundamentally different in structure from power-law uniform random graphs. Indeed, there are subgraphs that appear significantly more often in uniform random graphs than in these rank-1 inhomogeneous random graphs. Furthermore, the optimization problem that we use to prove results on the number of subgraphs allows us to detect these differences in linear time, while subgraph counting in general cannot be done in linear time.

We first introduce the uniform random graph and the induced subgraph counts in Sect. 1. Then, we present our main results on subgraph counts in the large network limit in Sect. 2. After that, we discuss the implications of these results for distinguishing uniform random graphs from inhomogeneous random graphs in Sect. 2.2. We then provide the proofs of our main results in Sects. 3–6.

Notation We denote \([k]=\{1,2,\ldots ,k\}\). We say that a sequence of events \(({{\mathcal {E}}}_n)_{n\ge 1}\) happens with high probability (w.h.p.) if \(\lim _{n\rightarrow \infty }{{\mathbb {P}}}\left( {{\mathcal {E}}}_n\right) =1\) and we use \({\mathop {\longrightarrow }\limits ^{\scriptscriptstyle {{{\mathbb {P}}}}}}\) for convergence in probability. We write \(f(n)=o(g(n))\) if \(\lim _{n\rightarrow \infty }f(n)/g(n)=0\), and \(f(n)=O(g(n))\) if |f(n)|/g(n) is uniformly bounded. We write \(f(n)=\Theta (g(n))\) if \(f(n)=O(g(n) )\) as well as \(g(n)=O(f(n))\). We say that \(X_n=O_{\scriptscriptstyle {{{\mathbb {P}}}}}(g(n))\) for a sequence of random variables \((X_n)_{n\ge 1}\) if \(|X_n|/g(n)\) is a tight sequence of random variables, and \(X_n=o_{\scriptscriptstyle {{{\mathbb {P}}}}}(g(n))\) if \(X_n/g(n){\mathop {\longrightarrow }\limits ^{\scriptscriptstyle {{{\mathbb {P}}}}}}0\).

Uniform random graphs Given a positive integer n and a graphical degree sequence, i.e., a sequence of n positive integers \({\varvec{d}}=(d_1,d_2,\ldots , d_n)\), the uniform random graph (\(\mathrm{URG}^{\scriptscriptstyle {(n)}}({\varvec{d}})\)) is a simple graph, uniformly sampled from the set of all simple graphs with degree sequence \((d_i)_{i\in [n]}\). Let \(d_{\max }=\max _{i\in [n]}d_i\) and \(L_n=\sum _{i=1}^n d_i\). We denote the empirical degree distribution by

$$\begin{aligned} F_n(j)=\frac{1}{n}\sum _{i\in [n]}\mathbb {1}{\left\{ d_i\le j\right\} }. \end{aligned}$$
(1.1)

We study the setting where the variance of \({\varvec{d}}\) diverges when n grows large. In particular, this implies that \(d_{\max }\) grows as a function of the network size n. Specifically, we assume that the degree sequence satisfies the following assumption:

Assumption 1.1

(Degree sequence)

  1. (i)

There exist \(\tau \in (2,3)\) and constants \(K_1,K_2>0\) such that for every \(n\ge 1\) and every \(0\le j< d_{\max }\),

    $$\begin{aligned} K_1 j^{1-\tau }\le 1-F_n(j)\le K_2 j^{1-\tau }. \end{aligned}$$
    (1.2)
  2. (ii)

    There exist \(\tau \in (2,3)\) and a constant \(C>0\) such that, for all \(j=O(\sqrt{n})\),

    $$\begin{aligned} 1-F_n(j) = Cj^{1-\tau }(1+o(1)). \end{aligned}$$
    (1.3)

It follows from (1.2) that

$$\begin{aligned} d_{\max } < M n^{1/(\tau -1)},\quad \text {for some sufficiently large constant } M>0. \end{aligned}$$
(1.4)

Indeed, since at least one vertex has degree \(d_{\max }\), we have \(1-F_n(d_{\max }-1)\ge 1/n\), and the upper bound in (1.2) then yields (1.4). Furthermore, Assumptions (i) and (ii) together show that

$$\begin{aligned} \lim _{n\rightarrow \infty }\frac{1}{n}\sum _{i=1}^nd_i=\lim _{n\rightarrow \infty }\frac{L_n}{n}=\mu <\infty , \end{aligned}$$
(1.5)

for some \(\mu >0\).
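The following minimal sketch (our own illustration, not part of the original analysis) shows how condition (1.2) can be checked numerically for a concrete degree sequence: the extreme values of \(j^{\tau -1}(1-F_n(j))\) over \(1\le j< d_{\max }\) are candidates for the constants \(K_1\) and \(K_2\).

```python
import numpy as np

def power_law_constants(degrees, tau):
    """Return the extreme values of j^(tau-1) * (1 - F_n(j)) over 1 <= j < d_max.

    Assumption 1.1(i) asks that these values stay bounded away from 0 and
    from infinity, uniformly in the network size n."""
    d = np.sort(np.asarray(degrees))
    n = d.size
    j = np.arange(1, d.max())                # 1 - F_n(j) vanishes at j = d_max
    tail = 1.0 - np.searchsorted(d, j, side="right") / n   # 1 - F_n(j)
    ratios = j ** (tau - 1.0) * tail
    return ratios.min(), ratios.max()        # candidate K_1, K_2

# Example: i.i.d. Pareto(tau) degrees, which satisfy (1.2) with high probability.
rng = np.random.default_rng(0)
tau = 2.5
degrees = np.ceil(rng.uniform(size=100_000) ** (-1.0 / (tau - 1.0))).astype(int)
print(power_law_constants(degrees, tau))
```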

2 Main Results

We now present our main results. Let \(H=(V_H,{{{\mathcal {E}}}}_H)\) be a small, connected graph. We are interested in N(H), the induced subgraph count of H: the number of induced subgraphs of \(\mathrm{URG}^{\scriptscriptstyle {(n)}}({\varvec{d}})\) that are isomorphic to H. Let \(\mathrm{URG}^{\scriptscriptstyle {(n)}}({\varvec{d}})|_{{\varvec{v}}}\) denote the induced subgraph obtained by restricting \(\mathrm{URG}^{\scriptscriptstyle {(n)}}({\varvec{d}})\) to the vertices \({\varvec{v}}\). We can write the probability that an induced subgraph H with \(|V_H|=k\) is created on k uniformly chosen vertices \({\varvec{v}}=(v_1, \ldots , v_k)\) in \(\mathrm{URG}^{\scriptscriptstyle {(n)}}({\varvec{d}})\) as

$$\begin{aligned} {{\mathbb {P}}}\left( \mathrm{URG}^{\scriptscriptstyle {(n)}}({\varvec{d}})|_{{\varvec{v}}}= H\right) =\sum _{{\varvec{d}}'}{{\mathbb {P}}}\left( \mathrm{URG}^{\scriptscriptstyle {(n)}}({\varvec{d}})|_{{\varvec{v}}}=H \mid d_{{\varvec{v}}}={\varvec{d}}'\right) {{\mathbb {P}}}\left( d_{{\varvec{v}}}={\varvec{d}}'\right) , \end{aligned}$$
(2.1)

where the sum is over all possible degrees on k vertices \({\varvec{d}}'=(d_i')_{i\in [k]}\), and \(d_{{\varvec{v}}}=(d_{v_i})_{i\in [k]}\) denotes the degrees of the randomly chosen set of k vertices. Recently, it has been shown that in erased configuration models, there is a specific range of \(d_1',\ldots ,d_k'\) whose contribution to the number of subgraphs with those degrees is so large that all other degree ranges can be ignored [10]. In this paper, we show that (2.1) is likewise maximized on specific ranges of \(d_1',\ldots ,d_k'\) that depend on the subgraph H.

Furthermore, we show that when (2.1) is maximized by a unique range of degrees, there are only four possible ranges of degrees that maximize the term inside the sum in (2.1). These ranges are constant degrees, or degrees proportional to \(n^{(\tau -2)/(\tau -1)}\), to \(\sqrt{n}\) or to \(n^{1/(\tau -1)}\). Interestingly, these are the same ranges that contribute to the erased configuration model [10]. However, the optimal distribution of the subgraph vertices over these ranges may be different in the erased configuration model and the uniform random graph.

2.1 Optimizing the Subgraph Degrees

We now present the optimization problem that maximizes the summand in (2.1) for induced subgraphs. Let \(H=(V_H,{{{\mathcal {E}}}}_H)\) be a small, connected graph on \(k\ge 3\) vertices. Denote the set of vertices of H that have degree one inside H by \(V_1\). Let \({{\mathcal {P}}}\) be the set of all partitions of \(V_H\setminus V_1\) into three disjoint sets \(S_1,S_2,S_3\). This partition into \(S_1,S_2\) and \(S_3\) corresponds to the optimal orders of magnitude of the degrees in (2.1): \(S_1\) is the set of vertices with degree proportional to \(n^{(\tau -2)/(\tau -1)}\), \(S_2\) the set with degrees proportional to \(n^{1/(\tau -1)}\), and \(S_3\) the set of vertices with degrees proportional to \(\sqrt{n}\). We then derive an optimization problem that finds the partition of the vertices into these three orders of magnitude that maximizes the contribution to the number of induced subgraphs. When a vertex in H has degree 1, its degree in \(\mathrm{URG}^{\scriptscriptstyle {(n)}}({\varvec{d}})\) is typically small, i.e., it does not grow with n.

Given a partition \({{\mathcal {P}}}=(S_1,S_2,S_3)\) of \(V_H\setminus V_1\), let \({{{\mathcal {E}}}}_{S_i}\) denote the set of edges in H between vertices in \(S_i\) and \(E_{S_i}=|{{{\mathcal {E}}}}_{S_i}|\) its size, \({{{\mathcal {E}}}}_{S_i,S_j}\) the set of edges between vertices in \(S_i\) and \(S_j\) and \(E_{S_i,S_j}=|{{{\mathcal {E}}}}_{S_i,S_j}|\) its size, and finally \({{{\mathcal {E}}}}_{S_i,V_1}\) the set of edges between vertices in \(V_1\) and \(S_i\) and \(E_{S_i,V_1}=|{{{\mathcal {E}}}}_{S_i,V_1}|\) its size. We now define the optimization problem that optimizes the summand in (2.1) as

$$\begin{aligned} B(H)&=\max _{{{\mathcal {P}}}}\Big [\left| S_1\right| +\frac{1}{\tau -1}\left| S_2\right| (2-\tau -k+|S_1|+k_1)\nonumber \\&\quad -\frac{2E_{S_1}-2E_{S_2}+E_{S_1,S_3}-E_{S_2,S_3}+E_{S_1,V_1}-E_{S_2,V_1}}{\tau -1}\Big ]. \end{aligned}$$
(2.2)

In Sect. 3, we show that this optimization problem is a measure of how likely a configuration of vertices in the three sets \(S_1, S_2, S_3\) is to form subgraph H. The first term in B(H) gives a positive contribution for all vertices in \(S_1\). These are vertices with relatively low degree. In the second term, note that \(k-|S_1|-k_1=|S_3|+|S_2|\), so that the term within brackets is negative, as we assume that \(\tau \in (2,3)\). Thus the second term gives a negative contribution for vertices in \(S_2\), which have high degrees. Therefore, the first two terms in the optimization problem capture that high-degree vertices are rare, and low-degree vertices abundant. The last term gives a negative contribution for all edges between vertices with relatively low degrees in the subgraph, such as edges to vertices in \(S_1\), while it gives a positive contribution for edges to high-degree vertices with one end in \(S_2\). This captures the other part of the trade-off: high-degree vertices are more likely to connect to other vertices than low degree vertices. Note that \(B(H)\ge 0\), since putting all vertices in \(S_3\) yields zero.
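To make the optimization concrete, the following brute-force sketch (our own illustration, not code from the paper) evaluates \(B(H)\) in (2.2) by enumerating all partitions of \(V_H\setminus V_1\) into \(S_1,S_2,S_3\); for small subgraphs this is immediate, since there are at most \(3^{k}\) partitions.

```python
from itertools import product

def B(H_edges, k, tau):
    """Brute-force evaluation of the optimization problem (2.2).

    H_edges: list of edges (i, j) of a connected graph H on vertices 0..k-1.
    Returns the maximum B(H) and all optimal partitions (S1, S2, S3)."""
    deg = [sum(v in e for e in H_edges) for v in range(k)]
    V1 = {v for v in range(k) if deg[v] == 1}
    core = [v for v in range(k) if v not in V1]
    k1 = len(V1)

    def E(A, B_):  # number of edges of H with one endpoint in A, the other in B_
        return sum((i in A and j in B_) or (i in B_ and j in A)
                   for i, j in H_edges)

    best, argmax = None, []
    for labels in product((1, 2, 3), repeat=len(core)):
        S = {s: {v for v, l in zip(core, labels) if l == s} for s in (1, 2, 3)}
        val = (len(S[1])
               + len(S[2]) * (2 - tau - k + len(S[1]) + k1) / (tau - 1)
               - (2 * E(S[1], S[1]) - 2 * E(S[2], S[2])
                  + E(S[1], S[3]) - E(S[2], S[3])
                  + E(S[1], V1) - E(S[2], V1)) / (tau - 1))
        if best is None or val > best + 1e-12:
            best, argmax = val, [S]
        elif abs(val - best) <= 1e-12:
            argmax.append(S)
    return best, argmax

# The triangle: B = 0, attained only by S3 = V_H. The diamond (K_4 minus an
# edge) returns B = 0 with two optimal partitions, matching Fig. 2 below.
print(B([(0, 1), (1, 2), (0, 2)], k=3, tau=2.5))
```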

Let \(S_1^* ,S_2^* ,S_3^* \) be a maximizer of (2.2). Furthermore, for any \((\alpha _1,\ldots , \alpha _k)\) such that \(\alpha _i\in [0,1/(\tau -1)]\), define

$$\begin{aligned} M_n^{(\varvec{\alpha })}(\varepsilon )=\{ ({v_1,\ldots , v_k}):d_{{v_i}}\in [\varepsilon ,1/\varepsilon ] (\mu n) ^{\alpha _i}\ \forall i\in [k] \}. \end{aligned}$$
(2.3)

This is the set of tuples of vertices \((v_1,\ldots , v_k)\) such that \({d_{v_1}}\) is proportional to \(n^{\alpha _1}\), \({d_{v_2}}\) is proportional to \(n^{\alpha _2}\), and so on. Denote the number of subgraphs with vertices in \(M_n^{(\varvec{\alpha })}(\varepsilon )\) by \(N (H,M_n^{(\varvec{\alpha })}(\varepsilon ))\). Define the vector \(\varvec{\alpha }\) as

$$\begin{aligned} {\alpha }_i ={\left\{ \begin{array}{ll} (\tau -2)/(\tau -1)&{} i\in S _1^*,\\ 1/(\tau -1)&{} i\in S _2^*,\\ \tfrac{1}{2} &{} i\in S _3^*,\\ 0 &{} i\in V_1. \end{array}\right. } \end{aligned}$$
(2.4)

The next theorem shows that sets of vertices in \(M_n^{\varvec{\alpha } }(\varepsilon )\) contain a large number of subgraphs, and computes the scaling of the number of induced subgraphs:

Theorem 2.1

(General induced subgraphs) Let H be a subgraph on k vertices such that the solution to (2.2) is unique.

  1. (i)

    For any \(\varepsilon _n\) such that \(\lim _{n\rightarrow \infty }\varepsilon _n=0\),

    $$\begin{aligned} \frac{N \big (H,M_n^{(\varvec{\alpha } )}\left( \varepsilon _n\right) \big ) }{N (H)}{\mathop {\longrightarrow }\limits ^{\scriptscriptstyle {{{\mathbb {P}}}}}}1, \end{aligned}$$
    (2.5)

    with \(\varvec{\alpha }\) as defined in (2.4).

  2. (ii)

    Furthermore, for any fixed \(0<\varepsilon <1\),

    $$\begin{aligned} \frac{N (H,M_n^{(\varvec{\alpha } )}(\varepsilon ))}{n^{\frac{3-\tau }{2}(k_{2+}+B (H))+k_1/2}} \le f(\varepsilon )+o_{\scriptscriptstyle {{{\mathbb {P}}}}}(1), \end{aligned}$$
    (2.6)

    and

    $$\begin{aligned} \frac{N (H,M_n^{(\varvec{\alpha } )}(\varepsilon ))}{n^{\frac{3-\tau }{2}(k_{2+}+B (H))+k_1/2}} \ge {\tilde{f}}(\varepsilon )+o_{\scriptscriptstyle {{{\mathbb {P}}}}}(1), \end{aligned}$$
    (2.7)

    for some functions \(f(\varepsilon ),{\tilde{f}}(\varepsilon )<\infty \) not depending on n and with \(\varvec{\alpha }\) as defined in (2.4). Here \(k_{2+}\) denotes the number of vertices in H of degree at least 2, and \(k_1\) the number of degree-one vertices in H.

Thus, Theorem 2.1(i) shows that asymptotically, almost all copies of an induced subgraph H have their vertices in \(M_n^{(\varvec{\alpha })}(\varepsilon )\), and Theorem 2.1(ii) then computes the scaling in n of the number of such induced subgraphs.
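Example For instance, for the triangle \(H=K_3\) we have \(k=k_{2+}=3\) and \(k_1=0\), and (2.2) is uniquely maximized by \(S_3^*=V_H\), so that \(B(K_3)=0\). Theorem 2.1 then shows that the number of triangles scales as \(n^{\frac{3}{2}(3-\tau )}\), and that almost all triangles have all three vertices of degree proportional to \(\sqrt{n}\).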

Now we study the special class of induced subgraphs for which the unique maximum of (2.2) is \(S_3^*=V_H\). By the above interpretation of \(S_1^*\), \(S_2^*\) and \(S_3^*\), these are induced subgraphs where the maximum contribution to the number of such subgraphs is from vertices with degrees proportional to \(\sqrt{n}\) in \(\mathrm{URG}^{\scriptscriptstyle {(n)}}({\varvec{d}})\). Plugging \(S_3^*=V_H\) into (2.2) yields \(B(H)=0\), since then \(S_1^*,S_2^*=\emptyset \), so that all terms containing \(S_1\) and \(S_2\) disappear. This also means that when the maximizer of (2.2) is unique and the maximum equals \(B(H)=0\), the maximum must be attained by \(S_3^*=V_H\). For such induced subgraphs, we can obtain the detailed asymptotic scaling, including the leading constant:

Theorem 2.2

(Induced subgraphs with \(\sqrt{n}\) degrees) Let H be a connected graph on k vertices with minimal degree at least 2 such that the solution to (2.2) is unique, and \(B (H)=0\). Then,

$$\begin{aligned} \frac{N (H)}{n^{\frac{k}{2}(3-\tau )}}{\mathop {\longrightarrow }\limits ^{\scriptscriptstyle {{{\mathbb {P}}}}}}A (H)<\infty , \end{aligned}$$
(2.8)

with

$$\begin{aligned}&A (H) = \left( \frac{C\times (\tau -1)}{\mu ^{(\tau -1)/2}}\right) ^k\!\!\int _{0}^{\infty }\!\cdots \! \int _{0}^{\infty }(x_1\cdots x_k)^{-\tau }\prod _{{\{{i,j}\}\in {{{\mathcal {E}}}}_{H}}}\frac{x_ix_j}{1+x_ix_j} \nonumber \\&\quad \prod _{{\{{u,v}\}\notin {{{\mathcal {E}}}}_{H}}}\frac{1}{1+x_ux_v}\mathrm{d}x_1\cdots \mathrm{d}x_k. \end{aligned}$$
(2.9)

\(\sqrt{n}\)-subgraphs Theorem 2.2 provides detailed asymptotics for induced subgraphs for which the optimizer of (2.2) is given by \(S_3=V_H\). These induced subgraphs include all complete graphs and all cycles. Figure 1 shows the optimal structures of all induced subgraphs on 4 vertices. The figure indicates that on 4 vertices, Theorem 2.2 is applicable only to the cycle and the complete graph.

Fig. 1

Optimal structures of induced subgraphs on 4 vertices. The vertex colors indicate the typical degrees. The gray vertices indicate vertices for which the optimal structures are non-unique

Optimal induced subgraph structures Interestingly, Theorem 2.1 implies that the number of copies of a specific induced subgraph H is dominated by copies whose vertices have specific degrees in \(\mathrm{URG}^{\scriptscriptstyle {(n)}}({\varvec{d}})\), determined by maximizing (2.2). First restricting to these degrees, and then analyzing the subgraph count, allows us to obtain the scaling of the number of induced subgraphs in power-law uniform random graphs, where the analysis via the configuration model breaks down. Furthermore, this gives information not only on the total number of subgraphs, but also on where in the graph we are most likely to find them (i.e., on which degrees).

Automorphisms of H An automorphism of a graph H is a bijection \(V_H\rightarrow V_H\) that maps edges to edges and non-edges to non-edges; the triangle, for example, has six automorphisms. In Theorem 2.2 we count automorphisms of H as separate copies, so that we may count multiple copies of H on one set of vertices and edges. Therefore, to count the number of induced subgraphs without automorphisms, one should divide the results of Theorem 2.2 by the number of automorphisms of H.

Uniqueness of the solution One of the assumptions of Theorem 2.2 is that the solution to (2.2) is unique. The smallest subgraph for which this is not the case is depicted in Fig. 2. Here, one of the optimal solutions is \(S_3=V_H\), yielding \(B(H)=0\), as shown in Fig. 2a. The other optimal solution, shown in Fig. 2b, contains vertices in \(S_1\) and \(S_2\) instead. For all subgraphs on 5 vertices, on the other hand, the optimizer turns out to be unique.

Fig. 2

Two optimal structures for the diamond subgraph, where the optimizer is non-unique. The vertex colors indicate the typical degrees

2.2 Distinguishing Uniform Random Graphs from Rank-1 Inhomogeneous Random Graphs

Uniform random graphs create random networks that are uniformly sampled from all graphs with precisely a desired degree sequence. However, in the power-law degree range with \(\tau \in (2,3)\), it is difficult to generate such graphs, as repeatedly sampling a configuration model until a simple graph is obtained no longer works [11]. Therefore, random graph models that generate networks with approximately a desired degree sequence are often used as a proxy for uniform random graphs instead, as many of these models are easy to generate. One such model is the rank-1 inhomogeneous random graph [1, 4]. In the inhomogeneous random graph, every vertex i is equipped with a weight \(w_i\). We here assume that these weights are sampled from a power-law distribution with exponent \(\tau \in (2,3)\). Then, several choices of the connection probability \(p(w_i,w_j)\) are possible. Common choices are [1, 4]

$$\begin{aligned} p(w_i,w_j)&=\min \Big (\frac{w_iw_j}{\mu n},1\Big ), \end{aligned}$$
(2.10)
$$\begin{aligned} p(w_i,w_j)&=1-e ^{-w_iw_j/(\mu n)},\end{aligned}$$
(2.11)
$$\begin{aligned} p(w_i,w_j)&=\frac{w_iw_j}{w_iw_j+\mu n}, \end{aligned}$$
(2.12)

where \(\mu \) denotes the average weight. By choosing the connection probabilities in this manner, the degree of vertex i is approximately \(w_i\) with high probability [9].
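For illustration, here is a minimal sketch (ours, not code from the paper) of how such a rank-1 inhomogeneous random graph can be generated; the three kernels implement (2.10)–(2.12), with (2.10) commonly known as the Chung–Lu model, (2.11) as the Norros–Reittu model and (2.12) as the generalized random graph.

```python
import numpy as np

def rank1_graph(n, tau, kernel, seed=0):
    """Sample a rank-1 inhomogeneous random graph on n vertices: weights are
    i.i.d. Pareto(tau) with P(W > x) = x^(1-tau), and each pair {i, j} is kept
    independently with probability kernel(w_i * w_j / (mu * n))."""
    rng = np.random.default_rng(seed)
    w = rng.uniform(size=n) ** (-1.0 / (tau - 1.0))  # Pareto(tau) weights
    mu = w.mean()                                    # average weight
    i, j = np.triu_indices(n, k=1)                   # all pairs i < j
    u = w[i] * w[j] / (mu * n)
    keep = rng.uniform(size=u.size) < kernel(u)
    return list(zip(i[keep], j[keep]))

chung_lu      = lambda u: np.minimum(u, 1.0)  # (2.10)
norros_reittu = lambda u: -np.expm1(-u)       # (2.11): 1 - exp(-u)
grg           = lambda u: u / (1.0 + u)       # (2.12)

edges = rank1_graph(n=2000, tau=2.5, kernel=grg, seed=1)
print(len(edges), "edges")
```

By Theorem 2.3 below, subgraph counts in graphs generated with the kernel (2.12) match those of the uniform random graph, whereas the kernels (2.10) and (2.11) produce different counts for the subgraphs of Fig. 3.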

Theorems 2.1 and 2.2 indicate that in terms of induced subgraphs, the uniform random graph produces the same results as the rank-1 inhomogeneous random graph with connection probabilities as in (2.12). Intuitively, this can be seen from the constant A(H) in Theorem 2.2, where this connection probability appears in a scaled form. Indeed, our proofs are based on the fact that in the uniform random graph, the probability that two vertices form a connection can be approximated by (2.12) (see Lemma 3.1). Therefore, Theorems 2.1 and 2.2 also hold for rank-1 inhomogeneous random graphs with connection probability (2.12):

Theorem 2.3

Let \((G_n)_{n\ge 1}\) be a sequence of rank-1 inhomogeneous random graphs with weight sequence satisfying Assumption 1.1, and connection probabilities as in (2.12). Then, Theorems 2.1 and 2.2 also hold for the number of copies of a subgraph H in \(G_n\).

In [10, 15], similar theorems for random graphs with connection probability (2.11) and (2.10) were derived. The number of all induced subgraphs in the model with connection probability (2.10) has the same scaling in n as the number of induced subgraphs in the models with connection probability (2.11). However, the scaling in n of the number of copies of some induced subgraphs in models generated from (2.11) and (2.10) may be different from the scaling in the uniform random graph. The smallest such subgraphs are of size 6, and are plotted in Fig. 3. Figure 3 shows that these two subgraphs appear significantly more often in the uniform random graph than in the inhomogeneous random graphs.

Interestingly, this means that rank-1 inhomogeneous random graphs and uniform random graphs can be distinguished by studying small subgraph patterns of size 6. Previous results showed that random graphs with connection probability (2.12) are distinguishable from those generated by connection probabilities (2.11) and (2.10) by their maximum clique size, which differs by a factor of \(\log (n)\) [12]. However, finding the largest clique is an NP-hard problem [13], while this method only needs subgraphs of size 6 as input, which can be detected in polynomial time. Furthermore, the difference between the numbers of the induced subgraphs of Fig. 3 is not a logarithmic factor, but a polynomial factor, which makes such differences easier to detect.

Specifically, we can show that in only O(n) time, it is possible to distinguish between power-law uniform random graphs and the approximate-degree random graph models of (2.10) and (2.11) with high probability:

Theorem 2.4

There exists a randomized algorithm that distinguishes power-law uniform random graphs from power-law rank-1 inhomogeneous random graphs with connection probabilities (2.10) or (2.11) in time O(n) with accuracy at least \(1-n^\gamma e ^{-cn^{\beta }}\) for some \(\gamma ,\beta ,c>0\).

We will prove this theorem in Sect. 6, where we also introduce the randomized algorithm that distinguishes between these two random graph models. This algorithm is based on the subgraph displayed in Fig. 3c and d. It first selects vertices that have degrees close to \(n^{1/(\tau -1)}\) and \(n^{(\tau -2)/(\tau -1)}\), and then randomly searches among those vertices for the induced subgraph of Fig. 3c. In a uniform random graph, this will be successful with high probability, whereas in the rank-1 inhomogeneous random graphs with connection probabilities (2.10) and (2.11) the algorithm fails with high probability.
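The following is a sketch of this procedure (our own rendering; the actual 6-vertex pattern of Fig. 3c and its optimal degree assignment are not reproduced here, so `H_edges` and `parts` below are placeholders to be filled with that subgraph):

```python
import numpy as np
from itertools import combinations

def looks_uniform(adj, degrees, tau, H_edges, parts,
                  trials=10_000, eps=0.1, seed=0):
    """Randomized test in the spirit of Theorem 2.4.

    adj:     set of frozenset({u, v}) edges of the observed graph
    H_edges: edges of the target pattern H on vertices 0..k-1
    parts:   per vertex of H, 'hi' (degree ~ n^{1/(tau-1)}) or
             'lo' (degree ~ n^{(tau-2)/(tau-1)})
    Returns True iff a copy of H is found as an induced subgraph on vertices
    with those degrees, the "uniform random graph" verdict."""
    deg = np.asarray(degrees)
    n = deg.size
    H = {frozenset(e) for e in H_edges}
    rng = np.random.default_rng(seed)

    def band(a):  # vertices with degree in [eps, 1/eps] * n^a
        return np.flatnonzero((deg >= eps * n ** a) & (deg <= n ** a / eps))

    pool = {"hi": band(1 / (tau - 1)), "lo": band((tau - 2) / (tau - 1))}
    if any(p.size == 0 for p in pool.values()):
        return False
    k = len(parts)
    for _ in range(trials):
        v = [int(rng.choice(pool[p])) for p in parts]
        if len(set(v)) < k:
            continue
        # accept iff the induced pattern on v equals H exactly
        if all((frozenset({v[a], v[b]}) in adj) == (frozenset({a, b}) in H)
               for a, b in combinations(range(k), 2)):
            return True
    return False
```

Building the two degree bands takes one pass over the vertices, and each of the constantly many trials costs \(O(1)\) adjacency lookups, so the total running time is O(n), in line with Theorem 2.4.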

Fig. 3

Two induced subgraphs on 6 vertices with their scaling in n and optimal degrees in the uniform random graph from Theorem 2.1 (a and c) and in inhomogeneous random graphs with connection probabilities (2.10) and (2.11), obtained from [10] (b and d)

Organization of the proofs We will prove Theorems 2.1–2.4 in the following sections. First, Sect. 3 proves Theorem 2.1(ii), by calculating the probability that H appears on a specified subset of vertices, and optimizing that probability. Then, Sect. 4 proves Theorem 2.2 with a second moment method. Section 5 proves Theorem 2.1(i), and Sect. 6 introduces and analyzes the randomized algorithm that proves Theorem 2.4.

3 Proof of Theorem 2.1(ii)

We first provide an overview of the proof strategy. The main step in proving Theorem 2.1 is estimating the probability that a subgraph appears on vertices of specific degrees. We will show that this probability scales as a power of the network size n in Sect. 3.2. After that, we optimize this power of n as a function of the vertex degrees to obtain the vertex degrees that carry the most copies of subgraph H. In Lemma 3.2 we characterize these vertex degrees and show that they are given by the sets \(S_1\), \(S_2\) and \(S_3\), yielding the optimization problem (3.19).

3.1 Subgraph Probability in the Uniform Random Graph

We first investigate the probability that a given small graph H appears as an induced subgraph of \(\mathrm{URG}^{\scriptscriptstyle {(n)}}({\varvec{d}})\) on a specific set of vertices \({\varvec{v}}\). We denote the degree of a vertex i inside induced subgraph H by \(d_i^{\scriptscriptstyle {(H)}}\).

Lemma 3.1

Let H be a connected graph on k vertices, and let \({\varvec{d}}\) be a degree sequence satisfying Assumption 1.1. Furthermore, assume that \(d_{v_i}\gg 1\) or \(d_i^{(H)}=1\) for all \(i\in [k]\). Then,

$$\begin{aligned} {{\mathbb {P}}}\left( \mathrm{URG}^{\scriptscriptstyle {(n)}}({\varvec{d}})|_{{\varvec{v}}}=H\right) =\prod _{\{i,j\}\in {{{\mathcal {E}}}}_H}\frac{d_{v_i}d_{v_j}}{L_n+d_{v_i}d_{v_j}}\prod _{\{s,t\}\notin {{{\mathcal {E}}}}_H}\frac{{L_n}}{L_n+d_{v_s}d_{v_t}}(1+o(1)). \end{aligned}$$
(3.1)

Proof

Suppose that \(G^+\) is a subset of the edges of H, and \(G^-\) a subset of the non-edges of H. Let \({{\mathcal {G}}}^-\) denote the event that the non-edges of \(G^-\) are not present in \(\mathrm{URG}^{\scriptscriptstyle {(n)}}({\varvec{d}})|_{{\varvec{v}}}\) and let \({{\mathcal {G}}}^+\) denote the event that the edges of \(G^+\) are present in \(\mathrm{URG}^{\scriptscriptstyle {(n)}}({\varvec{d}})|_{{\varvec{v}}}\).

Let \(d_{(1)}\ge d_{(2)}\ge \cdots \ge d_{(n)}\) denote the ordered version of \({\varvec{d}}\). Then, by Assumption (i)

$$\begin{aligned} d_{(k)}\le W \left( \frac{n}{k}\right) ^{1/(\tau -1)}, \end{aligned}$$
(3.2)

for some constant \(W>0\). Therefore,

$$\begin{aligned} \sum _{i=1}^{Cn^{1/(\tau -1)}}d_{(i)}&\le \sum _{i=1}^{Cn^{1/(\tau -1)}} W\left( \frac{n}{i}\right) ^{1/(\tau -1)}\nonumber \\&\le Wn^{1/(\tau -1)}\int _{0}^{Cn^{1/(\tau -1)}}x^{1/(1-\tau )}\mathrm{d}x\nonumber \\&= {\tilde{C}}n^{1/(\tau -1)}n^{(\tau -2)/(\tau -1)^2} \end{aligned}$$
(3.3)

for some \({\tilde{C}}>0\). Thus, \(\sum _{i=1}^{Cn^{1/(\tau -1)}}d_{(i)}=o(n)\) for \(\tau \in (2,3)\), while \(L_n=\Theta (n)\) by (1.5). Therefore, we may apply [6, Corollary 2] to obtain

$$\begin{aligned} {{\mathbb {P}}}\left( \{i,j\}\in \mathrm{URG}^{\scriptscriptstyle {(n)}}({\varvec{d}})\mid {{\mathcal {G}}}^-,{{\mathcal {G}}}^+\right) =\frac{(d_i-d_i^{(G)})(d_j-d_j^{(G)})}{L_n+(d_i-d_i^{(G)})(d_j-d_j^{(G)})}(1+o(1)), \end{aligned}$$
(3.4)

where \(d_i^{(G)}\) denotes the degree of vertex i within \(G^+\). When \(d_i\gg 1\), then \(d_i-d_i^{(G)}=d_i(1+o(1))\), as \(d_i^{(G)}\le k-1\). Thus, when \(d_i\gg 1\) or \(d_i^{(G)}=0\), and likewise \(d_j\gg 1\) or \(d_j^{(G)}=0\), (3.4) becomes

$$\begin{aligned} {{\mathbb {P}}}\left( \{i,j\}\in \mathrm{URG}^{\scriptscriptstyle {(n)}}({\varvec{d}})\mid {{\mathcal {G}}}^-,{{\mathcal {G}}}^+\right) =\frac{d_id_j}{L_n+d_id_j}(1+o(1)). \end{aligned}$$
(3.5)

Therefore also

$$\begin{aligned} {{\mathbb {P}}}\left( \{i,j\}\notin \mathrm{URG}^{\scriptscriptstyle {(n)}}({\varvec{d}})\mid {{\mathcal {G}}}^-,{{\mathcal {G}}}^+\right) =\frac{{L_n}}{L_n+d_id_j}(1+o(1)). \end{aligned}$$
(3.6)

We now use (3.5) and (3.6) to compute the probability that H appears as an induced subgraph on vertices \({\varvec{v}}\). Let the m edges of H be denoted by \(e_1=\{i_1,j_1\},\ldots ,e_m=\{i_m,j_m\}\), and the \({k\atopwithdelims ()2}-m\) non-edges of H by \({\bar{e}}_1=\{w_1,z_1\},\ldots ,{\bar{e}}_{k(k-1)/2-m}=\{w_{k(k-1)/2-m},z_{k(k-1)/2-m}\}\). Furthermore, define \(G_0^+=\emptyset \) and \(G_s^+=G_{s-1}^+\cup \{\{v_{i_s},v_{j_s}\}\}\). Similarly, define \(G_0^-=\emptyset \) and \(G_s^-=G_{s-1}^-\cup \{\{v_{w_s},v_{z_s}\}\}\). Then,

$$\begin{aligned} {{\mathbb {P}}}\left( \mathrm{URG}^{\scriptscriptstyle {(n)}}({\varvec{d}})|_{{\varvec{v}}}=H\right)&= \prod _{s=1}^m{{\mathbb {P}}}\left( \{v_{i_s},v_{j_s}\}\in \mathrm{URG}^{\scriptscriptstyle {(n)}}({\varvec{d}})\mid G_{s-1}^+\right) \nonumber \\&\quad \times \prod _{t=1}^{{k\atopwithdelims ()2}-m}{{\mathbb {P}}}\left( \{v_{w_t},v_{z_t}\}\notin \mathrm{URG}^{\scriptscriptstyle {(n)}}({\varvec{d}})\mid G_m^+,G_{t-1}^-\right) . \end{aligned}$$
(3.7)

We then use (3.5) and (3.6). This is allowed because \(d_{v_i}\gg 1\) or \(d_i^{\scriptscriptstyle {(H)}}=1\) for all \(i\in [k]\), so that when an edge \(e_l\) incident to vertex i is added in (3.7), then \(d_i^{\scriptscriptstyle {(G_{l-1}^+)}}=0\). Indeed, when \(d_i^{\scriptscriptstyle {(H)}}=1\), vertex i has no other incident edges in H, and therefore degree zero in \(G_{l-1}^+\). Thus, we obtain

$$\begin{aligned} {{\mathbb {P}}}\left( \mathrm{URG}^{\scriptscriptstyle {(n)}}({\varvec{d}})|_{{\varvec{v}}}=H\right) =\prod _{\{i,j\}\in {{{\mathcal {E}}}}_H}\frac{d_{v_i}d_{v_j}}{L_n+d_{v_i}d_{v_j}}\prod _{\{s,t\}\notin {{{\mathcal {E}}}}_H}\frac{{L_n}}{L_n+d_{v_s}d_{v_t}}(1+o(1)). \end{aligned}$$
(3.8)

\(\square \)

3.2 Optimizing the Probability of a Subgraph

We now study the probability that H is present as an induced subgraph on vertices \((v_1, \ldots , v_k)\) of specific degrees. Assume that \(d_{{v_i}}\in [\varepsilon ,1/\varepsilon ]n^{\alpha _i}\) with \(\alpha _i\in [0,1/(\tau -1)]\) for \(i\in [k]\), so that \(d_{{v_i}}=\Theta (n^{\alpha _i})\).

Let H be an induced subgraph on k vertices labeled as \(1,\ldots ,k\). We now study the probability that \(\mathrm{URG}^{\scriptscriptstyle {(n)}}({\varvec{d}})|_{{\varvec{v}}}={{\mathcal {E}}}_H\).

Let \(X_{u,v}\) denote the indicator function of the event that edge \(\{u,v\}\) is present. When \(\alpha _i+\alpha _j< 1\), by Lemma 3.1

$$\begin{aligned} {{\mathbb {P}}}\left( X_{v_i,v_j}=0\right) =\Theta \Big (\frac{{\mu n}}{n^{\alpha _i+\alpha _j}+\mu n} \Big )(1+o(1))=1+o(1), \end{aligned}$$
(3.9)

while

$$\begin{aligned} {{\mathbb {P}}}\left( X_{v_i,v_j}=1\right) =\Theta \Big (\frac{n^{\alpha _i+\alpha _j}}{n^{\alpha _i+\alpha _j}+\mu n} \Big )(1+o(1))=\Theta (n^{\alpha _i+\alpha _j-1}). \end{aligned}$$
(3.10)

On the other hand, for \(\alpha _i+\alpha _j>1\),

$$\begin{aligned} {{\mathbb {P}}}\left( X_{v_i,v_j}=0\right) =\Theta \Big (\frac{{\mu n}}{n^{\alpha _i+\alpha _j}+\mu n} \Big )(1+o(1))=\Theta (n^{1-\alpha _i-\alpha _j}), \end{aligned}$$
(3.11)

while

$$\begin{aligned} {{\mathbb {P}}}\left( X_{v_i,v_j}=1\right) =\Theta \Big (\frac{n^{\alpha _i+\alpha _j}}{n^{\alpha _i+\alpha _j}+\mu n} \Big )(1+o(1))=1+o(1). \end{aligned}$$
(3.12)

Furthermore, when \(\alpha _i+\alpha _j=1\), \({{\mathbb {P}}}\left( X_{v_i,v_j}=0\right) =\Theta (1)\) and \({{\mathbb {P}}}\left( X_{v_i,v_j}=1\right) =\Theta (1)\). Combining this with Lemma 3.1 shows that we can write the probability that H occurs as an induced subgraph on \({\varvec{v}}=(v_1,\cdots ,v_k)\) as

$$\begin{aligned} \begin{aligned}&{{\mathbb {P}}}\left( \mathrm{URG}^{\scriptscriptstyle {(n)}}({\varvec{d}})|_{{\varvec{v}}}=H\right) = \Theta \bigg (\prod _{\{i,j\}\in {{{\mathcal {E}}}}_H:\alpha _{i}+\alpha _{j}<1}\!\!\!\!\!\! n^{\alpha _{i}+\alpha _{j}-1} \ \ \ \prod _{{\{u,v\}\notin {{{\mathcal {E}}}}_H:\alpha _u+\alpha _v>1}} {n^{1-\alpha _u-\alpha _v}}\bigg ). \end{aligned}\nonumber \\ \end{aligned}$$
(3.13)

Furthermore, by Assumption 1.1 the number of vertices with degrees in \([\varepsilon ,1/\varepsilon ](\mu n)^\alpha \) is \(\Theta (n^{(1-\tau )\alpha +1})\) for \(\alpha \le \frac{1}{\tau -1}\). Then, for \(M_n^{(\varvec{\alpha })}\) as in (2.3),

$$\begin{aligned} \# \text { sets of vertices with degrees in }M_n^{(\varvec{\alpha })}=\Theta ( n^{k+(1-\tau )\sum _i\alpha _i}). \end{aligned}$$
(3.14)

Thus,

$$\begin{aligned} N (H,M_n^{(\varvec{\alpha })}(\varepsilon ))=\Theta _{\scriptscriptstyle {{{\mathbb {P}}}}}\Big ( n^{k+(1-\tau )\sum _i\alpha _i} \ \ \prod _{{\{i,j\}\in {{{\mathcal {E}}}}_H:\alpha _i+\alpha _j<1}} \ \ \ n^{\alpha _{i}+\alpha _j-1}\ \ \ \prod _{{\{u,v\}\notin {{{\mathcal {E}}}}_H:\alpha _u+\alpha _v>1}} \ \ n^{-\alpha _{u}-\alpha _v+1}\Big ).\nonumber \\ \end{aligned}$$
(3.15)

Maximizing the exponent yields

$$\begin{aligned} \begin{aligned}&\max _{\varvec{\alpha }} (1-\tau )\sum _{i}\alpha _i + \sum _{{\{i,j\}\in {{{\mathcal {E}}}}_H:\alpha _i+\alpha _j<1}} (\alpha _i+\alpha _j-1)\\&-\ \sum _{{\{u,v\}\notin {{{\mathcal {E}}}}_H:\alpha _u+\alpha _v>1}} \ \ \ (\alpha _u+\alpha _v-1) \end{aligned} \end{aligned}$$
(3.16)

The following lemma shows that this optimization problem attains its maximum for specific values of the exponents \(\alpha _i\):

Lemma 3.2

(Maximum contribution to subgraphs) Let H be a connected graph on k vertices. If the solution to (3.16) is unique, then the optimal solution satisfies \(\alpha _i\in \{0,\tfrac{\tau -2}{\tau -1},\tfrac{1}{2},\tfrac{1}{\tau -1}\}\) for all i. If it is not unique, then there exist at least 2 optimal solutions with \(\alpha _i\in \{0,\tfrac{\tau -2}{\tau -1},\tfrac{1}{2},\tfrac{1}{\tau -1}\}\) for all i. In any optimal solution \(\alpha _i=0\) if and only if vertex i has degree one in H.
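Lemma 3.2 reduces (3.16) to a search over finitely many candidate exponents. As a quick numerical cross-check (a sketch of our own, not part of the proofs), one can enumerate these candidates directly; by the rewriting (3.18) below, the resulting maximum must equal \(\tfrac{1-\tau }{2}k+\tfrac{\tau -2}{2}k_1+\tfrac{3-\tau }{2}B(H)\).

```python
from itertools import combinations, product

def max_exponent(H_edges, k, tau):
    """Maximize the objective of (3.16) over the candidate exponents of
    Lemma 3.2: alpha_i = 0 iff vertex i has degree one in H, and otherwise
    alpha_i is one of (tau-2)/(tau-1), 1/2, 1/(tau-1)."""
    E = {frozenset(e) for e in H_edges}
    deg = [sum(v in e for e in E) for v in range(k)]
    grid = [(0.0,) if deg[v] == 1
            else ((tau - 2) / (tau - 1), 0.5, 1 / (tau - 1)) for v in range(k)]

    def objective(alpha):
        val = (1 - tau) * sum(alpha)
        for i, j in combinations(range(k), 2):
            s = alpha[i] + alpha[j]
            if frozenset({i, j}) in E and s < 1:
                val += s - 1      # unlikely edge between low-degree vertices
            elif frozenset({i, j}) not in E and s > 1:
                val -= s - 1      # unlikely non-edge between high-degree vertices
        return val

    return max(objective(alpha) for alpha in product(*grid))

# Triangle, tau = 2.5: returns -2.25 = (1-tau)/2 * 3, consistent with B(K_3) = 0.
print(max_exponent([(0, 1), (1, 2), (0, 2)], k=3, tau=2.5))
```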

The proof of this lemma follows a structure similar to that of the proof of [10, Lemma 4.2], and we therefore defer it to Appendix A. We now use the optimal structure of this optimization problem to prove Theorem 2.1(ii):

Proof of Theorem 2.1(ii)

Let \(\varvec{\alpha }\) be the unique optimizer of (3.16). By Lemma 3.2, the maximal value of (3.16) is attained by partitioning \(V_H\setminus V_1\) into the sets \(S_1,S_2,S_3\) such that vertices in \(S_1\) have \(\alpha _i=\tfrac{\tau -2}{\tau -1}\), vertices in \(S_2\) have \(\alpha _i =\tfrac{1}{\tau -1}\), vertices in \(S_3\) have \(\alpha _i=\tfrac{1}{2}\) and vertices in \(V_1\) have \(\alpha _i =0\). Then, the edges with \(\alpha _i+\alpha _j <1\) are edges inside \(S_1\), edges between \(S_1\) and \(S_3\) and edges from degree 1 vertices. Furthermore, non-edges with \(\alpha _i+\alpha _j>1\) are pairs of vertices inside \(S_2\) (of which there are \(\frac{1}{2} |S_2|(|S_2|-1)-E_{S_2}\)) or pairs with one vertex in \(S_2\) and one in \(S_3\) (of which there are \(|S_2||S_3|-E_{S_2,S_3}\)). Recall that the number of edges inside \(S_1\) is denoted by \(E_{S_1}\), the number of edges between \(S_1\) and \(S_3\) by \(E_{S_1,S_3}\) and the number of edges between \(V_1\) and \(S_i\) by \(E_{S_i,V_1}\). Then we can rewrite (3.16) as

$$\begin{aligned} \begin{aligned} \max _{{{\mathcal {P}}}} \&\Big [(1-\tau ) \left( \frac{\tau -2}{\tau -1}\left| S_1\right| +\frac{1}{\tau -1}\left| S_2\right| +\tfrac{1}{2} \left| S_3\right| \right) +\frac{\tau -3}{\tau -1}E_{S_1}\\&\qquad +\frac{\tau -3}{2(\tau -1)}E_{S_1,S_3} -\frac{E_{S_1,V_1}}{\tau -1}-\frac{\tau -2}{\tau -1}E_{S_2,V_1}-\frac{1}{2} E_{S_3,V_1}\\&\qquad +\left( \frac{1}{2} |S_2|(|S_2|-1)-E_{S_2}\right) \frac{\tau -3}{\tau -1}+(|S_2||S_3|-E_{S_2,S_3})\frac{\tau -3}{2(\tau -1)}\Big ], \end{aligned} \end{aligned}$$
(3.17)

over all partitions \({{\mathcal {P}}}=(S_1,S_2,S_3)\) of \(V_H\setminus V_1\). Using that \(|S_3|=k-\left| S_1\right| -\left| S_2\right| -k_1\) and \({E_{S_3,V_1}=k_1-E_{S_1,V_1}-E_{S_2,V_1}}\), where \(k_1=\left| V_1\right| \), and extracting a factor \((3-\tau )/2\), shows that this is equivalent to

$$\begin{aligned} \begin{aligned}&\frac{1-\tau }{2}k+ \max _{{{\mathcal {P}}}} \ \frac{(3-\tau )}{2}\Big ( \left| S_1\right| +\frac{1}{\tau -1}\left| S_2\right| (2-\tau -k+|S_1|+k_1)+\frac{\tau -2}{3-\tau } k_1\\&\qquad \qquad \qquad -\frac{2E_{S_1}-2E_{S_2}+E_{S_1,S_3}-E_{S_2,S_3}}{\tau -1} -\frac{E_{S_1,V_1}-E_{S_2,V_1}}{\tau -1}\Big ). \end{aligned} \end{aligned}$$
(3.18)

Since k and \(k_1\) are fixed and \(3-\tau >0\), we need to maximize

$$\begin{aligned} B(H)&=\max _{{{\mathcal {P}}}}\Big [\left| S_1\right| +\frac{1}{\tau -1}\left| S_2\right| (2-\tau -k+|S_1|+k_1) \nonumber \\&\quad -\frac{2E_{S_1}-2E_{S_2}+E_{S_1,S_3}-E_{S_2,S_3}+E_{S_1,V_1}-E_{S_2,V_1}}{\tau -1}\Big ], \end{aligned}$$
(3.19)

which equals (2.2).

By (3.15), the maximal value of \( N (H,M_n^{(\varvec{\alpha })}(\varepsilon ))\) then scales as

$$\begin{aligned} n^{\frac{3-\tau }{2}(k+B (H))+\frac{\tau -2}{2}k_1}=n^{\frac{3-\tau }{2}(k_{2+}+B (H))+{k_1/2}} , \end{aligned}$$
(3.20)

which proves Theorem 2.1(ii). \(\square \)

4 Proof of Theorem 2.2

To prove Theorem 2.2, we need more detailed asymptotics for subgraphs with \(S_3=V_H\) than those provided by Theorem 2.1. We will use a second moment method to prove the convergence in probability. We investigate the expected number of copies of H in Lemma 4.2, and show that its variance is small in Lemma 4.3.

In this section, we prove Lemma 4.1 below, from which we then deduce Theorem 2.2. For that, we define the special case of \(M_n^{\scriptscriptstyle {(\varvec{\alpha })}}(\varepsilon )\) of (2.3) where \(\alpha _i=\tfrac{1}{2}\) for all \(i\in V_H=[k]\) as

$$\begin{aligned} W_n^k(\varepsilon )=\{(v_1,\ldots ,v_k):d_{{v_s}}\in [\varepsilon ,1/\varepsilon ]\sqrt{\mu n} \quad \forall s \in [k]\}, \end{aligned}$$
(4.1)

and let \({\bar{W}}_n^k(\varepsilon )\) denote the complement of \(W_n^k(\varepsilon )\). We denote the number of subgraphs H with all vertices in \(W_n^k(\varepsilon )\) by \(N (H,W_n^k(\varepsilon ))\).

Lemma 4.1

(Major contribution to subgraphs) Let H be a connected graph on \(k{\ge 3}\) vertices such that (2.2) is uniquely optimized at \(S_3=[k]\), so that \(B (H)=0\). Then,

  1. (i)

    the number of subgraphs with vertices in \(W_n^k(\varepsilon )\) satisfies

    $$\begin{aligned} \frac{N (H,W_n^k(\varepsilon ))}{n^{\frac{k}{2}(3-\tau )}} \rightarrow&(C(\tau -1))^k\mu ^{-\frac{k}{2}(\tau -1)} \int _{\varepsilon }^{1/\varepsilon }\!\!\cdots \int _{\varepsilon }^{1/\varepsilon }(x_1\cdots x_k)^{-\tau }\nonumber \\&\times \prod _{{\{i,j\}\in {{{\mathcal {E}}}}_H}}\frac{x_ix_j}{1+x_ix_j} \ \ \prod _{{\{u,v\}\notin {{{\mathcal {E}}}}_H}}\frac{1}{1+x_ux_v}\mathrm{d}x_1\cdots \mathrm{d}x_k . \end{aligned}$$
    (4.2)
  2. (ii)

    A(H) defined in (2.9) satisfies \(A (H)<\infty \).

We now prove Theorem 2.2 using this lemma.

Proof of Theorem 2.2

We first study the expected number of induced subgraphs with vertices outside \(W_n^k(\varepsilon )\) and show that their contribution to the total number of copies of H is small. To this end, we investigate the expected number of copies of H in the case where vertex 1 of the subgraph has degree smaller than \(\varepsilon \sqrt{\mu n}\). By Lemma 3.1, the probability that H is present on a specified subset of vertices \({\varvec{v}}=(v_1,\ldots ,v_k)\) satisfies

$$\begin{aligned} {{\mathbb {P}}}\left( \mathrm{URG}^{\scriptscriptstyle {(n)}}({\varvec{d}})|_{{\varvec{v}}}= {{{\mathcal {E}}}}_H\right)&=\Theta \Big ( \prod _{\{i,j\}\in {{{\mathcal {E}}}}_H}\frac{d_{v_i}d_{v_j}}{L_n+d_{v_i}d_{v_j}} \prod _{\{u,w\}\notin {{{\mathcal {E}}}}_H}\frac{L_n}{L_n+d_{v_u}d_{v_w}}\Big ) \end{aligned}$$
(4.3)

Furthermore, by (1.3), there exists \(C_0\) such that \({{\mathbb {P}}}\left( D=j\right) \le C_0j^{-\tau }\) for all j, where D denotes the degree of a uniformly chosen vertex. Let \(I(H,{\varvec{v}})=\mathbb {1}{\left\{ \mathrm{URG}^{\scriptscriptstyle {(n)}}({\varvec{d}})|_{{\varvec{v}}}= {{{\mathcal {E}}}}_H\right\} },\) so that \(N(H)=\sum _{{\varvec{v}}} I(H,{\varvec{v}})\).

Define

$$\begin{aligned} h_n(x_1,\ldots ,x_k)=\prod _{{\{i,j\}\in {{{\mathcal {E}}}}_H}} \ \frac{x_ix_j}{\mu n+x_ix_j} \ \ \prod _{{\{s,t\}\notin {{{\mathcal {E}}}}_H}} \ \frac{\mu n}{\mu n+x_sx_t}. \end{aligned}$$
(4.4)

We can use methods similar to those of [5, Eq. (4.4)] to show that for some \(K^*>0\),

$$\begin{aligned}&\sum _{{\varvec{v}}}{{\mathbb {E}}}\left[ I(H,{\varvec{v}})\mathbb {1}{\left\{ d_{v_1}<\varepsilon \sqrt{\mu n}\right\} }\right] \nonumber \\&\quad = n^k\int _{1}^{\varepsilon \sqrt{\mu n}}\int _{1}^{\infty }\cdots \int _{1}^{\infty }h_n(x_1,x_2,\ldots ,x_k) \mathrm{d}F_n(x_k)\cdots \mathrm{d}F_n(x_1) \nonumber \\&\quad \le n^k K^* \int _{1}^{\varepsilon \sqrt{\mu n}}\int _{1}^{\infty }\cdots \int _{1}^{\infty }(x_2\cdots x_k)^{-\tau } h_n(x_1,x_2,\ldots ,x_k) \mathrm{d}x_k \dots \mathrm{d}x_2 \mathrm{d}F_n(x_1). \end{aligned}$$
(4.5)

For all non-decreasing g that are bounded on \([0,\varepsilon \sqrt{\mu n}]\) and once differentiable, writing \({\bar{G}}\) for the function such that \(\int _0^x{\bar{G}}(y)\mathrm{d}y=g(x)\), we have

$$\begin{aligned}&\int _{0}^{\varepsilon \sqrt{\mu n}}g(x)\mathrm{d}F_n(x)= \int _{0}^{\varepsilon \sqrt{\mu n}}\int _0^{x}{\bar{G}}(y)\mathrm{d}y\mathrm{d}F_n(x)\nonumber \\&\quad = \int _{0}^{\varepsilon \sqrt{\mu n}}(F_n(\varepsilon \sqrt{\mu n})-F_n(y)){\bar{G}}(y)\mathrm{d}y\nonumber \\&\quad = C\left( \int _{0}^{\varepsilon \sqrt{\mu n}}y^{1-\tau }{\bar{G}}(y)\mathrm{d}y-\int _{0}^{\varepsilon \sqrt{\mu n}}(\varepsilon \sqrt{\mu n})^{1-\tau }{\bar{G}}(y)\mathrm{d}y\right) (1+o(1))\nonumber \\&\quad =C\left( (\tau -1) \int _{0}^{\varepsilon \sqrt{\mu n}}y^{-\tau }g(y)\mathrm{d}y +\big [g(y)y^{1-\tau }\big ]_0^{\varepsilon \sqrt{\mu n}}-(\varepsilon \sqrt{\mu n})^{1-\tau }g(\varepsilon \sqrt{\mu n})\right) (1+o(1))\nonumber \\&\quad =C(\tau -1) \int _{0}^{\varepsilon \sqrt{\mu n}}y^{-\tau }g(y)\mathrm{d}y +o((\varepsilon \sqrt{\mu n})^{1-\tau }g(\varepsilon \sqrt{\mu n})), \end{aligned}$$
(4.6)

where we have used Assumption 1.1(ii). Taking

$$\begin{aligned} g(x)=g_n(x)= \int _{1}^{\infty }\cdots \int _{1}^{\infty }(x_2\cdots x_k)^{-\tau } h_n(x,x_2,\ldots ,x_k) \mathrm{d}x_2\cdots \mathrm{d}x_k \end{aligned}$$
(4.7)

yields for (4.5)

$$\begin{aligned}&\sum _{{\varvec{v}}}{{\mathbb {E}}}\left[ I(H,{\varvec{v}})\mathbb {1}{\left\{ d_{v_1}<\varepsilon \sqrt{\mu n}\right\} }\right] \nonumber \\&\quad \le n^k K^* \int _{1}^{\varepsilon \sqrt{\mu n}}\int _{1}^{\infty }\cdots \int _{1}^{\infty }(x_1\cdots x_k)^{-\tau } h_n(x_1,x_2,\ldots ,x_k) \mathrm{d}x_1\cdots \mathrm{d}x_k \nonumber \\&\qquad + o\left( n^k (\varepsilon \sqrt{\mu n})^{1-\tau } \int _{1}^{\infty }\cdots \int _{1}^{\infty }(x_2\cdots x_k)^{-\tau } h_n(\varepsilon \sqrt{\mu n},x_2,\ldots ,x_k) \mathrm{d}x_2\cdots \mathrm{d}x_k \right) . \end{aligned}$$
(4.8)

Now we can bound the first term of (4.8) as

$$\begin{aligned} \begin{aligned}&n^k\int _{1}^{\varepsilon \sqrt{\mu n}}\int _{1}^{\infty }\cdots \int _{1}^{\infty }(x_1\cdots x_k)^{-\tau } \ \prod _{{\{i,j\}\in {{{\mathcal {E}}}}_H}} \ \frac{x_ix_j}{\mu n+x_ix_j} \ \ \prod _{{\{u,w\}\notin {{{\mathcal {E}}}}_H}} \ \frac{\mu n}{\mu n+x_ux_w}\mathrm{d}x_1\cdots \mathrm{d}x_k\\&\quad =n^k(\mu n)^{\frac{k}{2}(1-\tau )} \int _{0}^{\varepsilon }\int _{0}^{\infty }\cdots \int _{0}^{\infty }(t_1\cdots t_k)^{-\tau } \ \prod _{{\{i,j\}\in {{{\mathcal {E}}}}_H}} \ \frac{t_it_j}{1+t_it_j} \ \ \prod _{{\{u,w\}\notin {{{\mathcal {E}}}}_H}} \ \frac{1}{1+t_ut_w}\mathrm{d}t_1\cdots \mathrm{d}t_k\\&\quad = O\left( n^{\frac{k}{2}(3-\tau )}\right) h_1(\varepsilon ), \end{aligned}\nonumber \\ \end{aligned}$$
(4.9)

where \(h_1(\varepsilon )\) is a function of \(\varepsilon \). By Lemma 4.1(ii), \(h_1(\varepsilon )\rightarrow 0\) as \(\varepsilon \searrow 0\).

For the second term in (4.8), we obtain

$$\begin{aligned}&o\Big (n^k (\varepsilon \sqrt{\mu n})^{1-\tau } \int _{1}^{\infty }\cdots \int _{1}^{\infty }(x_2\cdots x_k)^{-\tau } h_n(\varepsilon \sqrt{\mu n},x_2,\ldots ,x_k) \mathrm{d}x_2\cdots \mathrm{d}x_k \Big ) \nonumber \\&\quad = o\Big (n^k (\mu n)^{\frac{k}{2}(1-\tau )}\varepsilon ^{1-\tau } \int _{0}^{\infty }\cdots \int _{0}^{\infty }(t_2\cdots t_k)^{-\tau } h(\varepsilon ,t_2,\ldots ,t_k) \mathrm{d}t_2\cdots \mathrm{d}t_k\Big )\nonumber \\&\quad = o\left( n^{\frac{k}{2}(3-\tau )}\right) h_2(\varepsilon ), \end{aligned}$$
(4.10)

where

$$\begin{aligned} h(t_1,\ldots ,t_k)=\prod _{{\{i,j\}\in {{{\mathcal {E}}}}_H}} \ \frac{t_it_j}{1+t_it_j} \ \ \prod _{{\{u,w\}\notin {{{\mathcal {E}}}}_H}} \ \frac{1}{1+t_ut_w}, \end{aligned}$$
(4.11)

and \(h_2(\varepsilon )\) is a function of \(\varepsilon \).

We can similarly bound the contribution where another vertex has degree smaller than \(\varepsilon \sqrt{\mu n}\), or where one of the vertices has degree larger than \(\sqrt{\mu n}/\varepsilon \). This yields

$$\begin{aligned} {{\mathbb {E}}}\left[ N(H,{\bar{W}}_n^k(\varepsilon ))\right] = O\left( n^{\frac{k}{2}(3-\tau )}\right) h_3(\varepsilon ) + o\Big (n^{\frac{k}{2}(3-\tau )}\Big ){\tilde{h}}(\varepsilon ) , \end{aligned}$$
(4.12)

for some function \(h_3(\varepsilon )\) not depending on n such that \(h_3(\varepsilon )\rightarrow 0\) when \(\varepsilon \searrow 0\), and some function \({\tilde{h}}(\varepsilon )\) not depending on n. By the Markov inequality,

$$\begin{aligned} N(H,{\bar{W}}_n^k(\varepsilon ))=h_3(\varepsilon )O_{\scriptscriptstyle {{{\mathbb {P}}}}}\left( n^{\frac{k}{2}(3-\tau )}\right) . \end{aligned}$$
(4.13)

Thus, for any \(\delta >0\),

$$\begin{aligned} \limsup _{\varepsilon \rightarrow 0}\limsup _{n\rightarrow \infty } {{\mathbb {P}}}\left( \frac{N(H,{\bar{W}}_n^k(\varepsilon ))}{n^{k(3-\tau )/2}}>\delta \right) =0. \end{aligned}$$
(4.14)

Combining this with Lemma 4.1(i) gives

$$\begin{aligned} \frac{N(H)}{n^{\frac{k}{2}(3-\tau )}}{\mathop {\longrightarrow }\limits ^{\scriptscriptstyle {{{\mathbb {P}}}}}}&(C(\tau -1))^k\mu ^{-\frac{k}{2}(\tau -1)}\! \int _{0}^{\infty }\! \cdots \! \int _{0}^{\infty }(x_1\cdots x_k)^{-\tau }\prod _{{\{i,j\}\in {{{\mathcal {E}}}}_{H}}} \ \frac{x_ix_j}{1+x_ix_j} \nonumber \\&\quad \times \prod _{{\{u,w\}\notin {{{\mathcal {E}}}}_{H}}} \ \frac{1}{1+x_ux_w} \mathrm{d}x_1\cdots \mathrm{d}x_k. \end{aligned}$$
(4.15)

\(\square \)

4.1 Conditional Expectation

We will prove Lemma 4.1 using a second moment method. Thus, we will first investigate the expected number of copies of the induced subgraph H in \(\mathrm{URG}^{\scriptscriptstyle {(n)}}({\varvec{d}})\), and then bound its variance. Let H be a subgraph on k vertices, labeled by [k], with m edges, denoted by \(e_1=\{i_1,j_1\},\ldots ,e_m=\{i_m,j_m\}\).

Lemma 4.2

(Convergence of conditional expectation of \(\sqrt{n}\) subgraphs) Let H be a subgraph such that (2.2) has a unique maximizer, and the maximum equals 0. Then,

$$\begin{aligned} \frac{{{\mathbb {E}}}\left[ N (H,W_n^k(\varepsilon ))\right] }{n^{\frac{k}{2}(3-\tau )}}&\rightarrow (C(\tau -1))^k\mu ^{-\frac{k}{2}(\tau -1)}\int _{\varepsilon }^{1/\varepsilon }\!\!\cdots \int _{\varepsilon }^{1/\varepsilon }(x_1\cdots x_k)^{-\tau }\nonumber \\&\times \prod _{{\{i,j\}\in {{{\mathcal {E}}}}_{H}}} \ \frac{x_ix_j}{1+x_ix_j} \ \ \ \ \prod _{{\{u,v\}\notin {{{\mathcal {E}}}}_{H}}} \ \frac{1}{1+x_ux_v}\mathrm{d}x_1\cdots \mathrm{d}x_k . \end{aligned}$$
(4.16)

Proof

We denote, as in (4.4),

$$\begin{aligned} h_n(d_1,\ldots ,d_k)=\prod _{{\{i,j\}\in {{{\mathcal {E}}}}_{H}}}\frac{d_id_j}{\mu n+d_id_j}\prod _{{\{u,v\}\notin {{{\mathcal {E}}}}_{H}}}\frac{{\mu n}}{\mu n+d_ud_v}. \end{aligned}$$
(4.17)

As

$$\begin{aligned} {{\mathbb {E}}}\left[ N (H,W_n^k(\varepsilon ))\right] =\sum _{(v_1,\ldots ,v_k)\in W_n^k(\varepsilon )}{{\mathbb {P}}}\left( \mathrm{URG}^{\scriptscriptstyle {(n)}}({\varvec{d}})|_{{\varvec{v}}}={{\mathcal {E}}}_H\right) , \end{aligned}$$
(4.18)

and \(d_{v_i}\ge \varepsilon \sqrt{\mu n}\) for \(i\in [k]\), we get from Lemma 3.1

$$\begin{aligned} {{\mathbb {E}}}\left[ N (H,W_n^k(\varepsilon ))\right]&=\sum _{(v_1,\ldots ,v_k)\in W_n^k(\varepsilon )}\prod _{\{i,j\}\in {{{\mathcal {E}}}}_H}\frac{d_{v_i}d_{v_j}}{L_n+d_{v_i}d_{v_j}} \ \ \ \ \prod _{{\{s,t\}\notin {{{\mathcal {E}}}}_H}} \ \frac{{L_n}}{L_n+d_{v_s}d_{v_t}}(1+o(1))\nonumber \\&=(1+o(1))\sum _{1\le i_1<i_2<\cdots <i_k\le n}h_n(d_{i_1},\ldots ,d_{i_k})\mathbb {1}{\left\{ i_1,i_2,\ldots ,i_k\in W_n^k(\varepsilon )\right\} }. \end{aligned}$$
(4.19)

We then define the measure

$$\begin{aligned} M^{\scriptscriptstyle {(}n)}([a,b])=\mu ^{(\tau -1)/2}n^{(\tau -3)/2}\sum _{i\in [n]}\mathbb {1}{\left\{ d_i\in [a,b]\sqrt{\mu n}\right\} }. \end{aligned}$$
(4.20)

By [5, Eq. (4.19)]

$$\begin{aligned} M^{\scriptscriptstyle {(}n)}([a,b])\rightarrow C\times (\tau -1)\int _{a}^{b}t^{-\tau }\mathrm{d}t =:\lambda ([a,b]). \end{aligned}$$
(4.21)

Then, since \(h_n(\sqrt{\mu n}\,t_1,\ldots ,\sqrt{\mu n}\,t_k)=h(t_1,\ldots ,t_k)\) with h as in (4.11),

$$\begin{aligned}&\frac{\sum _{1\le i_1<i_2<\cdots <i_k\le n}h_n(d_{i_1},\ldots ,d_{i_k})\mathbb {1}{\left\{ i_1,i_2,\ldots ,i_k\in W_n^k(\varepsilon )\right\} }}{n^{\frac{k}{2}(3-\tau )}\mu ^{-\frac{k}{2}(\tau -1)}}\nonumber \\&\quad =\frac{1}{k!}\int _{\varepsilon }^{1/\varepsilon }\cdots \int _{\varepsilon }^{1/\varepsilon } h(t_1,\ldots ,t_k)\mathrm{d}M^{\scriptscriptstyle {(n)}}(t_1)\cdots \mathrm{d}M^{\scriptscriptstyle {(n)}}(t_k). \end{aligned}$$
(4.22)

Because the function \(h(t_1,\ldots ,t_k)\) is a bounded, continuous function on \([\varepsilon ,1/\varepsilon ]^k\),

$$\begin{aligned}&\frac{\sum _{1\le i_1<i_2<\cdots <i_k\le n}h_n(d_{i_1},\ldots ,d_{i_k})\mathbb {1}{\left\{ i_1,i_2,\ldots ,i_k\in W_n^k(\varepsilon )\right\} }}{n^{\frac{k}{2}(3-\tau )}\mu ^{-\frac{k}{2}(\tau -1)}}\nonumber \\&\quad \rightarrow \frac{1}{k!}\int _{\varepsilon }^{1/\varepsilon }\cdots \int _{\varepsilon }^{1/\varepsilon } h(t_1,\ldots ,t_k)\mathrm{d}\lambda (t_1)\cdots \mathrm{d}\lambda (t_k)\nonumber \\&\quad =\frac{(C(\tau -1))^k}{k!}\int _{\varepsilon }^{1/\varepsilon }\cdots \int _{\varepsilon }^{1/\varepsilon }(x_1\cdots x_k)^{-\tau }\prod _{{\{i,j\}\in {{{\mathcal {E}}}}_{H}}}\frac{x_ix_j}{1+x_ix_j} \ \ \ \ \prod _{{\{u,v\}\notin {{{\mathcal {E}}}}_{H}}}\frac{1}{1+x_ux_v}\mathrm{d}x_1 \cdots \mathrm{d}x_k. \end{aligned}$$
(4.23)

Combining this with (4.19) proves the lemma. \(\square \)

4.2 Variance of the Number of Induced Subgraphs

We now study the variance of the number of induced subgraphs. The following lemma shows that the variance of the number of subgraphs is small compared to its expectation:

Lemma 4.3

(Conditional variance for subgraphs) Let H be a subgraph such that (2.2) has a unique maximizer, and the maximum equals 0. Then,

$$\begin{aligned} \frac{Var \left( N (H,W_n^k(\varepsilon ))\right) }{{{\mathbb {E}}}\left[ N (H,W_n^k(\varepsilon ))\right] ^2}\rightarrow 0. \end{aligned}$$
(4.24)

Proof

By Lemma 4.2,

$$\begin{aligned} {{\mathbb {E}}}\left[ N (H,W_n^k(\varepsilon ))\right] ^2=\Theta (n^{(3-\tau )k}). \end{aligned}$$
(4.25)

Thus, we need to prove that the variance is small compared to \(n^{(3-\tau )k}\). Denote \({\varvec{v}}=(v_1,\ldots ,v_k)\) and \({{\varvec{u}}}=(u_1,\ldots ,u_k)\) and, for ease of notation, we denote \(G=\mathrm{URG}^{\scriptscriptstyle {(n)}}({\varvec{d}})\). We write the variance as

$$\begin{aligned} Var \left( N (H,W_n^k(\varepsilon ))\right)&= \sum _{{\varvec{v}}\in W_n^k(\varepsilon )}\sum _{{\varvec{u}}\in W_n^k(\varepsilon )} \Big ({{\mathbb {P}}}\left( G|_{{\varvec{v}}}= {{{\mathcal {E}}}}_H,G|_{{\varvec{u}}}= {{{\mathcal {E}}}}_H\right) \nonumber \\&\quad \quad -{{\mathbb {P}}}\left( G|_{{\varvec{v}}}={{{\mathcal {E}}}}_H\right) {{\mathbb {P}}}\left( G|_{{\varvec{u}}}= {{{\mathcal {E}}}}_H\right) \Big ). \end{aligned}$$
(4.26)

This splits into various cases, depending on the overlap of \({\varvec{v}}\) and \({\varvec{u}}\). When \({\varvec{v}}\) and \({\varvec{u}}\) do not overlap,

$$\begin{aligned} \begin{aligned}&\sum _{{\varvec{v}}\in W_n^k(\varepsilon )}\sum _{{\varvec{u}}\in W_n^k(\varepsilon )}\big ({{\mathbb {P}}}\left( G|_{{\varvec{v}}}= {{{\mathcal {E}}}}_H,G|_{{\varvec{u}}}= {{{\mathcal {E}}}}_H\right) -{{\mathbb {P}}}\left( G|_{{\varvec{v}}}= {{{\mathcal {E}}}}_H\right) {{\mathbb {P}}}\left( G|_{{\varvec{u}}}= {{{\mathcal {E}}}}_H\right) \big )\\&\quad = \sum _{{\varvec{v}}\in W_n^k(\varepsilon )}\sum _{{\varvec{u}}\in W_n^k(\varepsilon )}\big ({{\mathbb {P}}}\left( G|_{{\varvec{v}}}= {{{\mathcal {E}}}}_H\right) {{\mathbb {P}}}\left( G|_{{\varvec{u}}}={{{\mathcal {E}}}}_H\right) (1+o(1)) \\&\qquad -{{\mathbb {P}}}\left( G|_{{\varvec{v}}}= {{{\mathcal {E}}}}_H\right) {{\mathbb {P}}}\left( G|_{{\varvec{u}}}= {{{\mathcal {E}}}}_H\right) \big )\\&\quad = {{\mathbb {E}}}\left[ N (H,W_n^k(\varepsilon ))\right] ^2o(1), \end{aligned} \end{aligned}$$
(4.27)

by Lemma 3.1.

The other contributions are when \({\varvec{v}}\) and \({\varvec{u}}\) overlap. In this situation, we bound the probability that induced subgraph H is present on a specified set of vertices by 1. When \({\varvec{v}}\) and \({\varvec{u}}\) overlap on \(s\ge 1\) vertices, we bound the contribution to (4.26) as

$$\begin{aligned} \begin{aligned} \sum _{{{\varvec{v}},{\varvec{u}}\in W_n^k(\varepsilon ):\left| {\varvec{v}}\cup {\varvec{u}}\right| =2k-s}} \ \ {{\mathbb {P}}}\left( G|_{{\varvec{v}}}={{{\mathcal {E}}}}_H,G|_{{\varvec{u}}}= {{{\mathcal {E}}}}_H\right)&\le \left| \{i:d_i\in \sqrt{\mu n}[\varepsilon ,1/\varepsilon ]\}\right| ^{2k-s}\\&=O\left( n^\frac{(3-\tau )(2k-s)}{2}\right) , \end{aligned} \end{aligned}$$
(4.28)

by Assumption (i). This is \(o(n^{(3-\tau )k})\) for \(\tau \in (2,3)\), as required. \(\square \)

Proof of Lemma 4.1

We start by proving part (i). By Lemma 4.3 and Chebyshev’s inequality,

$$\begin{aligned} N (H,W_n^k(\varepsilon ))={{\mathbb {E}}}\left[ N (H,W_n^k(\varepsilon ))\right] (1+o_{\scriptscriptstyle {{{\mathbb {P}}}}}(1)). \end{aligned}$$
(4.29)

Combining this with Lemma 4.2 proves Lemma 4.1(i). Lemma 4.1(ii) follows from Lemma 5.1 in the next section, applied with \(|S_3^*|=k\). \(\square \)

5 Major Contribution to General Subgraphs: Proof of Theorem 2.1(i)

In this section, we prove Theorem 2.1(i), which shows that asymptotically, all induced subgraphs H have vertices in \(M_n^{(\varvec{\alpha })}(\varepsilon )\). We will show that the expected number of copies of H with degrees outside of \(M_n^{(\varvec{\alpha })}(\varepsilon )\) is small. To do so, we will use the finiteness of the two integrals defined in Lemmas 5.1 and 5.2.

We first introduce some further notation. As before, we denote the degree of a vertex i inside its subgraph H by \(d^{\scriptscriptstyle {(H)}}_{i}\). Furthermore, for any \(W\subseteq V_H\), we denote by \(d^{\scriptscriptstyle {(H)}}_{i,W}\) the number of edges from vertex i to vertices in W. Let H be a connected subgraph, such that the optimum of (2.2) is unique, and let \({{{\mathcal {P}}}}=(S_1^*,S_2^*,S_3^*)\) be the optimal partition. Define

$$\begin{aligned} \zeta _i= {\left\{ \begin{array}{ll} 1 &{} \text {if }d^{\scriptscriptstyle {(H)}}_i=1,\\ d^{\scriptscriptstyle {(H)}}_{i,S_1^*}+d^{\scriptscriptstyle {(H)}}_{i,S_3^*}+d^{\scriptscriptstyle {(H)}}_{i,V_1} &{} \text {if }i\in S_1^*,\\ d^{\scriptscriptstyle {(H)}}_{i,V_1} +d^{\scriptscriptstyle {(H)}}_{i,S_1^*}+d^{\scriptscriptstyle {(H)}}_{i,S_2^*}-|S_2^*|-|S_3^*|+1&{} \text {if }i\in S_2^*,\\ d^{\scriptscriptstyle {(H)}}_{i,S_1^*}+d^{\scriptscriptstyle {(H)}}_{i,V_1} +d^{\scriptscriptstyle {(H)}}_{i,S_2^*}-|S_2^*|&{} \text {if }i\in S_3^*. \end{array}\right. } \end{aligned}$$
(5.1)

We now provide two lemmas that show that two integrals related to the solution of the optimization problem (2.2) are finite. These integrals are the key ingredient in proving Theorem 2.1(i).

Lemma 5.1

(Induced subgraph integrals over \(S_3^*\)) Suppose that the maximum in (3.16) is uniquely attained by \({{{\mathcal {P}}}}=(S_1^*,S_2^*,S_3^*)\) with \(|S_3^*|=s>0\), and say that \(S_3^*=[s]\). Then

$$\begin{aligned} \int _{0}^{\infty }\cdots \int _{0}^\infty \prod _{i \in [s]}x_i^{-\tau +\zeta _i}\prod _{{\{i,j\}\in {{{\mathcal {E}}}}_{S_3^*}}}\frac{x_ix_j}{1+x_ix_j} \ \ \prod _{{\{u,w\}\notin {{{\mathcal {E}}}}_{S_3^*}}}\frac{1}{1+x_ux_w}\mathrm{d}x_s\cdots \mathrm{d}x_1<\infty . \end{aligned}$$
(5.2)

Lemma 5.2

(Induced subgraph integrals over \(S_1^*\cup S_2^*\)) Suppose the optimal solution to (3.16) is unique, and attained by \({{{\mathcal {P}}}}=(S_1^*,S_2^*,S_3^*)\). Say that \(S_2^*=[t_2]\) and \(S_1^*=[t_2+t_1]\setminus [t_2]\). Then, for every \(a>0\),

$$\begin{aligned} \begin{aligned} \int _{0}^{a}\cdots \int _0^a\int _0^\infty \cdots \int _0^\infty&\prod _{{j\in [t_1+t_2]}}x_j^{-\tau +\zeta _j} \ \prod _{{\{i,j\}\in {{{\mathcal {E}}}}_{S_1^*,S_2^*}}}\frac{x_ix_j}{1+x_ix_j} \\&\times \prod _{{\{i,j\}\notin {{{\mathcal {E}}}}_{S_1^*,S_2^*}}}\frac{1}{1+x_ix_j}\mathrm{d}x_{t_1+t_2}\cdots \mathrm{d}x_1<\infty . \end{aligned} \end{aligned}$$
(5.3)

The proofs of Lemmas 5.1 and 5.2 are similar to the proofs of [10, Lemmas 7.2 and 7.3] and are therefore deferred to Appendix B.

Proof of Theorem 2.1(i)

Note that \(d_{\max }\le M n^{1/(\tau -1)}\) by Assumption (i). Define

$$\begin{aligned} \gamma _i^u(n)={\left\{ \begin{array}{ll} Mn^{1/(\tau -1)}&{} \text {if }i\in S_2^*,\\ n^{\alpha _i}/\varepsilon _n &{} \text {else,} \end{array}\right. } \end{aligned}$$
(5.4)

with \(\alpha _i\) as in (2.4), and denote

$$\begin{aligned} \gamma _i^l(n)={\left\{ \begin{array}{ll} 1&{} \text {if }i\in V_1,\\ \varepsilon _n n^{\alpha _i}&{} \text {else.} \end{array}\right. } \end{aligned}$$
(5.5)

We then show that the expected number of subgraphs of G isomorphic to H where the degree of at least one vertex i satisfies \(d_i\notin [\gamma _i^l(n),\gamma _i^u(n)]\) is small, similarly to the proof of Theorem 2.2 in Sect. 4.

We first study the expected number of copies of H in which the first vertex has degree \(d_{v_1}\in [1,\gamma _1^l(n))\) while all other vertices satisfy \(d_{v_i}\in [\gamma _i^l(n),\gamma _i^u(n)]\), by integrating the probability that the induced subgraph H is formed over this degree range. Using Lemma 3.1, and the bound \({{\mathbb {P}}}\left( D=k\right) \le M_2k^{-\tau }\) for some \(M_2>0\) from Assumption (i), we bound the expected number of such copies of H by

$$\begin{aligned} \begin{aligned}&\sum _{{\varvec{v}}}{{\mathbb {E}}}\left[ I(H, {\varvec{v}})\mathbb {1}{\left\{ d_{v_1}<\gamma ^l_1(n),d_{v_i}\in [\gamma _i^l(n),\gamma _i^u(n)] \ \forall i>1\right\} }\right] \le Kn^k\int _{1}^{\gamma _1^l(n)}\int _{\gamma _2^l(n)}^{\gamma _2^u(n)}\cdots \\&\quad \int _{\gamma _k^l(n)}^{\gamma _k^u(n)}(x_1\cdots x_k)^{-\tau } \prod _{{\{i,j\}\in {{{\mathcal {E}}}}_H}} \ \ \frac{x_ix_j}{L_n+x_ix_j} \ \ \ \prod _{{\{u,w\}\notin {{{\mathcal {E}}}}_H}} \ \frac{L_n}{L_n+x_ux_w}\mathrm{d}x_k\cdots \mathrm{d}x_1, \end{aligned}\nonumber \\ \end{aligned}$$
(5.6)

for some \(K>0\), and where we recall that \(I(H, {\varvec{v}})=\mathbb {1}{\left\{ \mathrm{URG}^{\scriptscriptstyle {(n)}}({\varvec{d}})|_{{\varvec{v}}}=H\right\} }\). This integral equals zero when vertex 1 is in \(V_1\), since then \([1,\gamma _1^l(n))=\varnothing \). Suppose that vertex 1 is in \(S_2^*\). W.l.o.g. assume that \(S_2^*={[t_2]}\), \(S_1^*={[t_1+t_2]\setminus [t_2]}\) and \(S_3^*={[t_1+t_2+t_3]\setminus [t_1+t_2]}\). We bound \(x_ix_j/(L_n+x_ix_j)\) by

(a) \(x_ix_j/L_n\) for \(i,j\in S_1^*\);
(b) \(x_ix_j/L_n\) for i or j in \(V_1\);
(c) \(x_ix_j/L_n\) for \(i\in S_1^*\), \(j\in S_3^*\) or vice versa; and
(d) 1 for \(i,j\in S_2^*\), and for \(i\in S_2^*\), \(j\in S_3^*\) or vice versa.

Similarly, we bound \(L_n/(L_n+x_ix_j)\) by

(a) 1 for \(i,j\in S_1^*\);
(b) 1 for i or j in \(V_1\);
(c) 1 for \(i\in S_1^*\), \(j\in S_3^*\) or vice versa; and
(d) \(L_n/(x_ix_j)\) for \(i,j\in S_2^*\), and for \(i\in S_2^*\), \(j\in S_3^*\) or vice versa;

both case lists are instances of the elementary bounds displayed below.
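Indeed, for every pair \(\{i,j\}\),

$$\begin{aligned} \frac{x_ix_j}{L_n+x_ix_j}\le \min \left( 1,\frac{x_ix_j}{L_n}\right) ,\qquad \frac{L_n}{L_n+x_ix_j}\le \min \left( 1,\frac{L_n}{x_ix_j}\right) , \end{aligned}$$

and in each case the smaller of the two options is chosen according to whether the typical scale \(n^{\alpha _i+\alpha _j}\) of \(x_ix_j\) lies below or above \(L_n=\Theta (n)\).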

Combining these bounds with the change of variables \(y_i=x_i/n^{\alpha _i}\) turns (5.6), for some \({\tilde{K}}>0\), into the bound

$$\begin{aligned}&{\sum _{{\varvec{v}}}{{\mathbb {E}}}\left[ I(H, {\varvec{v}})\mathbb {1}{\left\{ d_{v_1}<\gamma ^l_1(n),d_{v_i}\in [\gamma _i^l(n),\gamma _i^u(n)] \ \forall i>1\right\} }\right] }\nonumber \\&\quad \le {\tilde{K}}n^k n^{|S_1^*|(2-\tau )+|S_3^*|(1-\tau )/2-|S_2^*|}n^{\frac{\tau -3}{\tau -1}E_{S_1^*}+\frac{\tau -3}{2(\tau -1)}E_{S_1^*,S_3^*}-\frac{1}{\tau -1}E_{S_1^*,V_1}-\frac{1}{2}E_{S_3^*,V_1}-\frac{\tau -2}{\tau -1}E_{S_2^*,V_1}}\nonumber \\&\qquad \times n^{\left( \frac{1}{2} |S_2|(|S_2|-1)-E_{S_2}\right) \frac{\tau -3}{\tau -1}+(|S_2||S_3|-E_{S_2,S_3})\frac{\tau -3}{2(\tau -1)}}\nonumber \\&\qquad \times \int _{0}^{\varepsilon _n}\int _{0}^{M}\cdots \int _{0}^{M}\int _{0}^{\infty }\cdots \int _{0}^{\infty }\prod _{i\in V_H\setminus V_1}y_i^{-\tau +\zeta _i}\prod _{{\{i,j\}\in {{{\mathcal {E}}}}_{S_3^*}\cup {{{\mathcal {E}}}}_{S_1^*,S_2^*}}}\frac{y_iy_j}{y_iy_j+1}\nonumber \\&\qquad \times \prod _{{\{u,w\}\notin {{{\mathcal {E}}}}_{S_3^*}\cup {{{\mathcal {E}}}}_{S_1^*,S_2^*}}}\frac{1}{y_uy_w+1}\mathrm{d}y_{t_1+t_2+t_3}\cdots \mathrm{d}y_{1} \prod _{j \in V_1}\int _{1}^{\infty } y_j^{1-\tau }\mathrm{d}y_j, \end{aligned}$$
(5.7)

where the integrals from 0 to M correspond to vertices in \(S_2^*\) and the integrals from 0 to \(\infty \) to vertices in \(S_1^*\) and \(S_3^*\). Since \(\tau \in (2,3)\), the integrals corresponding to vertices in \(V_1\) are finite. By the analysis from (3.17) to (3.20),

$$\begin{aligned}&|S_1^*|(2-\tau )+|S_3^*|(1-\tau )/2-|S_2^*|+k+\frac{\tau -3}{\tau -1}E_{S_1^*}+\frac{\tau -3}{2(\tau -1)}E_{S_1^*,S_3^*}\nonumber \\&\qquad \qquad -\frac{1}{\tau -1}E_{S_1^*,V_1} -\frac{1}{2}E_{S_3^*,V_1}-\frac{\tau -2}{\tau -1}E_{S_2^*,V_1}\nonumber \\&\qquad \qquad +\left( \frac{1}{2} |S_2|(|S_2|-1)-E_{S_2}\right) \frac{\tau -3}{\tau -1}+(|S_2||S_3|-E_{S_2,S_3})\frac{\tau -3}{2(\tau -1)}\nonumber \\&\qquad = \frac{3-\tau }{2}(k_{2+}+B(H))+k_1/2. \end{aligned}$$
(5.8)
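For bookkeeping: the powers of n in (5.7) arise from the substitution \(x_i=n^{\alpha _i}y_i\). For each vertex \(i\notin V_1\),

$$\begin{aligned} x_i^{-\tau }\,\mathrm{d}x_i=n^{\alpha _i(1-\tau )}\,y_i^{-\tau }\,\mathrm{d}y_i, \end{aligned}$$

while each retained edge factor of the form \(x_ix_j/L_n\) becomes \(n^{\alpha _i+\alpha _j-1}y_iy_j/\mu (1+o(1))\), since \(L_n=\mu n(1+o(1))\); collecting these exponents together with the factor \(n^k\) from the choice of vertices yields the prefactor in (5.7), which (5.8) evaluates.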

The integral over the variables \(y_i\) with \(i\in V_H\setminus V_1\) can be split as

$$\begin{aligned} \begin{aligned}&\int _{0}^{\varepsilon _n}\int _{0}^{M}\cdots \int _{0}^{M}\int _{0}^{\infty }\cdots \int _{0}^{\infty } \prod _{i\in S_1^*\cup S_2^*} y_i^{-\tau +\zeta _i} \prod _{\{i,j\}\in {{{\mathcal {E}}}}_{S_1^*,S_2^*}}\frac{y_iy_j}{y_iy_j+1} \\&\qquad \times \prod _{\{u,w\}\notin {{{\mathcal {E}}}}_{S_1^*,S_2^*}}\frac{1}{y_uy_w+1}\mathrm{d}y_{t_1+t_2}\cdots \mathrm{d}y_{1}\\&\quad \times \int _{0}^{\infty }\cdots \int _{0}^{\infty } \prod _{i\in S_3^*}y_i^{-\tau +\zeta _i} \prod _{\{i,j\}\in {{{\mathcal {E}}}}_{S_3^*}}\frac{y_iy_j}{y_iy_j+1} \prod _{\{u,w\}\notin {{{\mathcal {E}}}}_{S_3^*}}\frac{1}{y_uy_w+1}\mathrm{d}y_{t_1+t_2+t_3}\cdots \mathrm{d}y_{t_1+t_2+1}. \end{aligned} \end{aligned}$$
(5.9)

By Lemma 5.1, the second factor in (5.9), the integral over the \(S_3^*\) variables, is finite. By Lemma 5.2, the first factor, the integral over the \(S_1^*\cup S_2^*\) variables, is finite for any fixed upper integration limit, so that by dominated convergence it tends to zero as \(\varepsilon _n\rightarrow 0\). Thus,

$$\begin{aligned} \int _{0}^{\varepsilon _n}\!&\int _{0}^{M}\!\!\cdots \int _{0}^{M}\!\!\int _{0}^{\infty }\!\!\cdots \!\int _{0}^{\infty } \ \prod _{{i\in S_1^*\cup S_2^*}} \ y_i^{-\tau +\zeta _i} \nonumber \\&\quad \prod _{{\{i,j\}\in {{{\mathcal {E}}}}_{S_1^*,S_2^*}}} \, \, \frac{y_iy_j}{y_iy_j+1} \ \prod _{{\{u,w\}\notin {{{\mathcal {E}}}}_{S_1^*,S_2^*}}} \ \ \frac{1}{y_uy_w+1}\mathrm{d}y_{t_1+t_2}\cdots \mathrm{d}y_{1}\nonumber \\&= o(1). \end{aligned}$$
(5.10)

Therefore, (5.7), (5.8) and (5.10) yield

$$\begin{aligned}&{\sum _{{\varvec{v}}}{{\mathbb {E}}}\left[ I(H, {\varvec{v}})\mathbb {1}{\left\{ d_{v_1}<\gamma ^l_1(n),d_{v_i}\in [\gamma _i^l(n),\gamma _i^u(n)] \ \forall i>1\right\} }\right] }\nonumber \\&\qquad =o\left( n^{\frac{3-\tau }{2}(k_{2+}+B(H))+k_1/2}\right) , \end{aligned}$$
(5.11)

when vertex 1 is in \( S_2^*\). Similarly, we can show that the expected contribution from \(d_{v_1}<\gamma _1^l(n)\) satisfies the same bound when vertex 1 is in \(S_1^*\) or \(S_3^*\). The expected number of subgraphs where \(d_{v_1}>\gamma _1^u(n)\) if vertex 1 is in \(S_1^*\), \(S_3^*\) or \(V_1\) can be bounded similarly, as well as the expected contribution where multiple vertices have \(d_{v_i}\notin [\gamma _i^l(n),\gamma _i^u(n)]\).

Denote

$$\begin{aligned} \Gamma _n(\varepsilon _n) = \{(v_1,\ldots ,v_k):d_{v_i}\in [\gamma _{i}^l(n),\gamma _{i}^u(n)]\ \forall i\in [k] \}, \end{aligned}$$
(5.12)

and define \({\bar{\Gamma }}_n(\varepsilon _n)\) as its complement. Denote the number of subgraphs with vertices in \({\bar{\Gamma }}_n(\varepsilon _n)\) by \(N(H,{\bar{\Gamma }}_n(\varepsilon _n))\). Since \(d_{\max }\le Mn^{1/(\tau -1)}\), \(\Gamma _n(\varepsilon _n)=M_n^{(\varvec{\alpha })}(\varepsilon _n)\). Therefore,

$$\begin{aligned} N\Big (H,{\bar{M}}_n^{(\varvec{\alpha })}\left( \varepsilon _n\right) \Big ) = N\Big (H,{\bar{\Gamma }}_n(\varepsilon _n)\Big ), \end{aligned}$$
(5.13)

where \(N\big (H,{\bar{M}}_n^{(\varvec{\alpha })}\left( \varepsilon _n\right) \big )\) denotes the number of copies of H on vertices not in \(M_n^{(\varvec{\alpha })}\left( \varepsilon _n\right) \). By Markov's inequality and (5.11),

$$\begin{aligned} N(H,{\bar{M}}_n^{(\varvec{\alpha })}(\varepsilon _n))= N\big (H,{\bar{\Gamma }}_n(\varepsilon _n)\big )=o_{\scriptscriptstyle {{{\mathbb {P}}}}}\left( n^{\frac{3-\tau }{2}(k_{2+}+B(H))+k_1/2}\right) . \end{aligned}$$
(5.14)

Combining this with Theorem 2.1(ii), for fixed \(\varepsilon >0\),

$$\begin{aligned} N(H)&= N(H,M_n^{(\varvec{\alpha })}(\varepsilon ))+N(H,{\bar{M}}_n^{(\varvec{\alpha })}(\varepsilon ))=O_{\scriptscriptstyle {{{\mathbb {P}}}}}(n^{\frac{3-\tau }{2}(k_{2+}+B(H))+k_1/2}) \end{aligned}$$
(5.15)

shows that

$$\begin{aligned} N\Big (H,M_n^{(\varvec{\alpha })}\left( \varepsilon _n\right) \Big )/{N(H)}{\mathop {\longrightarrow }\limits ^{\scriptscriptstyle {{{\mathbb {P}}}}}}1, \end{aligned}$$
(5.16)

as required. This completes the proof of Theorem 2.1(i). \(\square \)

6 Proof of Theorem 2.4

Fig. 4 The subgraph H that is used in Algorithm 1. Algorithm 1 attempts to find a copy of H where the dark vertices are in \(V'\), and the light vertices in \(W'\)

Proof of Theorem 2.4

Algorithm 1 shows the algorithm that distinguishes uniform random graphs from rank-1 inhomogeneous random graphs with connection probabilities (2.11) or (2.10). It first selects only vertices with degrees proportional to \(n^{1/(\tau -1)}\) or \(n^{(\tau -2)/(\tau -1)}\), and then randomly samples such vertices and checks whether they form a copy of the induced subgraph H of Fig. 4. We will show that with high probability, Algorithm 1 finds a copy of H when the input graph is generated by a uniform random graph, and that with high probability, Algorithm 1 outputs ‘fail’ when the input graph is a rank-1 inhomogeneous random graph with connection probability (2.11) or (2.10).
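Since the pseudocode of Algorithm 1 is not reproduced here, the following Python sketch illustrates the procedure just described. The degree-window width w, the sampling loop of at most n attempts, and in particular the edge set EDGES_H (beyond the facts that H has two mutually non-adjacent hub vertices and four mid-degree vertices, cf. Fig. 4) are illustrative assumptions, not the exact specification of Algorithm 1.

```python
import itertools
import math
import random

# Hypothetical edge set of the 6-vertex subgraph H of Fig. 4, on labels
# 0..5, where 0 and 1 are the two hub ("dark") vertices and 2..5 the four
# mid-degree ("light") vertices. The text only fixes that the two hubs
# are non-adjacent; the remaining edges are placeholders for Fig. 4.
EDGES_H = {(0, 2), (0, 3), (0, 4), (0, 5), (1, 2), (1, 3), (1, 4), (1, 5)}

def is_induced_copy(adj, vs):
    """Check whether the labelled 6-tuple vs induces a copy of H in the
    graph given by the adjacency dict adj (vertex -> set of neighbours)."""
    for i, j in itertools.combinations(range(6), 2):
        if (vs[j] in adj[vs[i]]) != ((i, j) in EDGES_H):
            return False
    return True

def distinguish(adj, degrees, tau):
    """Sketch of Algorithm 1: return 'uniform' if a copy of H is found
    among randomly sampled candidate 6-tuples, and 'fail' otherwise."""
    n = len(degrees)
    w = math.log(n)  # width of the degree windows (an assumption)
    hub, mid = n ** (1 / (tau - 1)), n ** ((tau - 2) / (tau - 1))
    hubs = [v for v in range(n) if hub / w <= degrees[v] <= hub * w]
    mids = [v for v in range(n) if mid / w <= degrees[v] <= mid * w]
    if len(hubs) < 2 or len(mids) < 4:
        return 'fail'
    for _ in range(n):  # at most n attempts, as in the proof
        vs = random.sample(hubs, 2) + random.sample(mids, 4)
        if is_induced_copy(adj, vs):
            return 'uniform'
    return 'fail'
```

Note that the degree scan is a single pass over the vertices and each attempt costs a constant number of adjacency checks, in line with the linear running time claimed in Theorem 2.4.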

We first focus on the performance of Algorithm 1 when the input G is a uniform random graph. Algorithm 1 detects copies of subgraph H where the vertices have degrees as illustrated in Fig. 3c: two vertices of degree proportional to \(n^{1/(\tau -1)}\) and four of degree proportional to \(n^{(\tau -2)/(\tau -1)}\). By Theorem 2.1(ii), there are at least \(cn^{(4-\frac{1}{\tau -1})(3-\tau )}\) such induced subgraphs for some \(c>0\) with high probability. Furthermore, denote

$$\begin{aligned} \varvec{\alpha }=[n^{1/(\tau -1)},n^{1/(\tau -1)},n^{(\tau -2)/(\tau -1)},n^{(\tau -2)/(\tau -1)},n^{(\tau -2)/(\tau -1)},n^{(\tau -2)/(\tau -1)}]. \end{aligned}$$
(6.1)

By Assumption 1.1,

$$\begin{aligned} |M^{\scriptscriptstyle {(n)}}(\varvec{\alpha })|=\Theta (n^{4(3-\tau )}\log (n)^{6(\tau -1)}), \end{aligned}$$
(6.2)

so that there are at most \(c_2n^{4(3-\tau )}\log (n)^{6(\tau -1)}\) sets of vertices with degrees in \(M^{\scriptscriptstyle {(n)}}(\varvec{\alpha })\), for some \(c_2<\infty \), whether or not they form a copy of the induced subgraph H. Thus, the probability that a randomly chosen set of vertices with degrees in \(M^{\scriptscriptstyle {(n)}}(\varvec{\alpha })\) forms H is at least

$$\begin{aligned} \frac{cn^{(4-\frac{1}{\tau -1})(3-\tau )}}{c_2n^{4(3-\tau )}\log (n)^{6(\tau -1)}}=\tfrac{c}{c_2}n^{(\tau -3)/(\tau -1)}\log (n)^{6(1-\tau )}. \end{aligned}$$
(6.3)
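The simplification in (6.3) uses the exponent arithmetic

$$\begin{aligned} \Big (4-\frac{1}{\tau -1}\Big )(3-\tau )-4(3-\tau )=-\frac{3-\tau }{\tau -1}=\frac{\tau -3}{\tau -1}, \end{aligned}$$

together with moving the factor \(\log (n)^{6(\tau -1)}\) into the numerator as \(\log (n)^{6(1-\tau )}\).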

Algorithm 1 tries at most n such sets of vertices with degrees in \(M^{\scriptscriptstyle {(n)}}(\varvec{\alpha })\), and therefore makes \(\Theta (f(n))\) attempts to find the subgraph H, where \(f(n)=\min (n,n^{4(3-\tau )}\log (n)^{6(\tau -1)})\). Thus, the probability that the algorithm does not find a copy of the induced subgraph H among all attempts is bounded by

$$\begin{aligned} {{\mathbb {P}}}\left( \hbox { Algorithm does not find}\ H\right) \le \left( 1-\tfrac{c}{c_2}n^{\frac{\tau -3}{\tau -1}}\log (n)^{6(1-\tau )}\right) ^{f(n)}\le e ^{-n^{\gamma }}, \end{aligned}$$
(6.4)

for some \(\gamma >0\), where we have used that \(1-x\le e ^{-x}\) for all \(x\in [0,1]\).
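Such a \(\gamma >0\) indeed exists: the middle expression in (6.4) is at most \(\exp \big (-\tfrac{c}{c_2}f(n)\,n^{\frac{\tau -3}{\tau -1}}\log (n)^{6(1-\tau )}\big )\), and

$$\begin{aligned} f(n)\,n^{\frac{\tau -3}{\tau -1}}\log (n)^{6(1-\tau )}= {\left\{ \begin{array}{ll} n^{\frac{2\tau -4}{\tau -1}}\log (n)^{6(1-\tau )} &{} \text {if }f(n)=n,\\ n^{(3-\tau )\left( 4-\frac{1}{\tau -1}\right) } &{} \text {if }f(n)=n^{4(3-\tau )}\log (n)^{6(\tau -1)}, \end{array}\right. } \end{aligned}$$

where both exponents of n are strictly positive for \(\tau \in (2,3)\).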

We now analyze the performance of Algorithm 1 on rank-1 inhomogeneous random graphs with connection probability (2.11). As these have asymptotically the same degree distribution, (6.2) also holds there. Furthermore, the probability that vertices in \(M^{\scriptscriptstyle {(n)}}(\varvec{\alpha })\) together form a copy of H is

$$\begin{aligned} \prod _{{\{i,j\}\in {{{\mathcal {E}}}}_{H}}}p(i,j)\prod _{{\{i,j\}\notin {{{\mathcal {E}}}}_{H}}}(1-p(i,j))\le e ^{-n^{\frac{1}{\tau -1}}n^{\frac{1}{\tau -1}}/(\mu n\log (n)^2)}\le e ^{-n^{\gamma _2}} \end{aligned}$$
(6.5)

for some \(\gamma _2>0\), where we bounded all p(i,j) and \(1-p(i,j)\) by 1, except for \(1-p(i,j)\) for the non-edge between the two vertices of degree at least \(n^{\frac{1}{\tau -1}}/\log (n)\) (the vertices in the left and right bottom corner of Fig. 4). Thus, there are \(O\big (n^{4(3-\tau )}e ^{-n^{\frac{3-\tau }{\tau -1}}}\log (n)^{6(\tau -1)}\big )\) copies of the induced subgraph H on sets of vertices in \(M^{\scriptscriptstyle {(n)}}(\varvec{\alpha })\) in expectation. Therefore, the probability that a randomly chosen set of vertices with degrees in \(M^{\scriptscriptstyle {(n)}}(\varvec{\alpha })\) forms H is at most

$$\begin{aligned} c_3\frac{n^{4(3-\tau )}e ^{-n^{\frac{3-\tau }{\tau -1}}}\log (n)^{6(\tau -1)}}{n^{4(3-\tau )}\log (n)^{6(\tau -1)}}=c_3e ^{-n^{\frac{3-\tau }{\tau -1}}}. \end{aligned}$$
(6.6)
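The exponent \(\frac{3-\tau }{\tau -1}\) in (6.5) and (6.6) is the weight arithmetic behind the single non-edge factor: assuming the standard \(\mu n\) normalization in (2.11), the two hub weights multiply to \(n^{2/(\tau -1)}\), and dividing by \(\mu n\) leaves

$$\begin{aligned} \frac{1}{\tau -1}+\frac{1}{\tau -1}-1=\frac{3-\tau }{\tau -1}>0\quad \text {for }\tau \in (2,3), \end{aligned}$$

up to the logarithmic factors absorbed into \(\gamma _2\).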

Then, the probability that the algorithm does not find a copy of the induced subgraph H among all \(f(n)\) attempts is at least

$$\begin{aligned} {{\mathbb {P}}}\left( \hbox { Algorithm does not find}\ H\right)&\ge \left( 1-c_3e ^{-n^{\frac{3-\tau }{\tau -1}}}\right) ^{f(n)}\nonumber \\&=1-c_3f(n)e ^{-n^{\frac{3-\tau }{\tau -1}}}+O\left( f(n)^2e ^{-2n^{\frac{3-\tau }{\tau -1}}}\right) . \end{aligned}$$
(6.7)

Thus, with high probability the algorithm outputs ‘fail’ when the input graph is a rank-1 inhomogeneous random graph with connection probability (2.11). A similar calculation shows that the algorithm outputs ‘fail’ with high probability when the input graph G is a rank-1 inhomogeneous random graph with connection probabilities (2.10). \(\square \)