1 Introduction

Given a graph G and a parameter k, the densest k-subgraph problem asks to find a k-vertex subgraph of G of maximum average degree. This is one of the central problems in theoretical computer science. It is NP-hard, and has no polynomial-time approximation scheme (PTAS) under certain complexity theoretic assumptions [13, 23]. On the other hand, the currently best known approximation algorithm achieves an \(O(n^{1/4})\)-approximation [3]. There is a vast literature on this topic, see, for example [4, 6, 29] and their references.

In addition to the algorithmic perspective, another natural direction is to understand the maximum density of small subgraphs that one can theoretically guarantee in a given graph. The precise problem under consideration, proposed by Feige and Wagner [15], is the following: given a positive integer n and real numbers d, s satisfying \(d\ge s\ge 2\), what is the minimum \(t=t(n,d,s)\) such that every graph G on n vertices with average degree at least d contains a subgraph of average degree at least s on at most t vertices? Note that this question is essentially equivalent to determining the smallest possible average degree of the densest t-vertex subgraph of an n-vertex graph with average degree at least d. The problem also falls squarely within the context of the so-called local–global principle, which states that one can obtain global understanding of a structure from having a good understanding of its local properties, or vice versa. This phenomenon has been ubiquitous in many areas of mathematics and beyond, see e.g. [2, 7, 17, 24].
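To make the quantity concrete, the following brute-force sketch computes, for one fixed small graph, the least number of vertices of a subgraph with average degree at least s. It is purely illustrative and runs in exponential time; the function name `smallest_dense_order` and the vertex labelling 0, ..., n-1 are assumptions of this sketch and do not come from [15]. Since \(t(n,d,s)\) is a worst case over all n-vertex graphs of average degree at least d, the value obtained for a single such graph is only a lower bound on it.

```python
from itertools import combinations

def smallest_dense_order(n, edges, s):
    """Least r such that some r-vertex subgraph has average degree >= s, or None."""
    for r in range(1, n + 1):
        for R in combinations(range(n), r):
            inside = set(R)
            # The induced subgraph has the most edges among subgraphs on R.
            e = sum(1 for u, v in edges if u in inside and v in inside)
            if 2 * e >= s * r:
                return r
    return None

# Petersen graph: outer 5-cycle, spokes, inner pentagram (3-regular, girth 5).
petersen = ([(i, (i + 1) % 5) for i in range(5)]
            + [(i, i + 5) for i in range(5)]
            + [(5 + i, 5 + (i + 2) % 5) for i in range(5)])

k4 = list(combinations(range(4), 2))

print(smallest_dense_order(10, petersen, 2))  # length of a shortest cycle
print(smallest_dense_order(10, petersen, 3))  # only the whole graph qualifies
print(smallest_dense_order(4, k4, 3))
```

For \(s=2\) the answer is the length of a shortest cycle, which is why the sketch returns the girth of the Petersen graph in the first call.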

The question of Feige and Wagner, in the case \(s=2\), is equivalent to the famous girth problem [12], that asks for the length of the shortest cycle in a graph on n vertices with average degree d. This problem is extensively studied, and using our notation, it is well known that \(t(n,d,2)=\Theta (\log _{d-1}n)\) (see e.g. [5], page 104, Theorems 1.1 and 1.2). However, it is a major open problem to determine the leading coefficient (see, e.g., [25,26,27]). Much less is known if \(s>2\). A simple probabilistic argument gives the following result.

Proposition 1.1

For every \(s>2\), there is a positive \(c_s\) such that for all \(s\le d\le n-1\), we have \(t(n,d,s)\ge c_s nd^{-\frac{s}{s-2}}\). In other words, for every \(s\le d\le n-1\), there is an n-vertex graph G with average degree at least d in which every subgraph with average degree at least s has at least \(c_snd^{-\frac{s}{s-2}}\) vertices.

Feige and Wagner [15, Conjecture 1.4] conjectured that this lower bound on \(t(n,d,s)\) is optimal, up to polylogarithmic factors in n. In the special case \(s\approx 4\), they also proved certain results in support of it. First, they showed that if \(\varepsilon >0\) and \(s=4-\varepsilon \), then \(t(n,d,s)=O_{\varepsilon }(nd^{-2})\). Second, they proved that \(t(n,d,4)=O_{\varepsilon }(nd^{-1.8+\varepsilon })\) for every \(\varepsilon >0\). Also, Alon and Hod (personal communication) proved the aforementioned conjecture for certain special values of s and a limited range of d. Here, we completely settle the conjecture of Feige and Wagner with the following theorem (which is even stronger, as the error term is logarithmic in d instead of n). Here and below, logarithms are to base two.

Theorem 1.2

For every \(s>2\), there is a constant \(C=C(s)\) such that the following holds for all \(d\ge s\). Let G be an n-vertex graph with average degree at least d, where \(d\le n^{\frac{s-2}{s}}\). Then there is a non-empty set \(R\subset V(G)\) of size at most \(nd^{-\frac{s}{s-2}}(\log d)^C\) such that G[R] has average degree at least s.

Note that while this result requires \(d\le n^{\frac{s-2}{s}}\), we may also apply it for graphs with average degree greater than \(n^{\frac{s-2}{s}}\) by simply taking \(d=n^{\frac{s-2}{s}}\). This way we obtain a subgraph of order at most polylog(n) with average degree at least s.

While our proof of Theorem 1.2 is non-algorithmic, it gives the best theoretical lower bound on the average degree one is guaranteed to find. It would be interesting to decide whether there is a polynomial time algorithm that finds a subgraph that achieves the bound provided by the above theorem.

Theorem 1.2 cannot be used to find a constant-sized (independent of n) subgraph with large average degree. In this case, one cannot expect an answer similar to the one above, as the random deletion method shows the following.

Proposition 1.3

For every \(s>2\) and positive integer t, there exists \(\varepsilon =\varepsilon (s,t)>0\) such that the following holds for every sufficiently large n. There exists a graph G on n vertices with average degree at least \(n^{1-\frac{2}{s}+\varepsilon }\) such that every subgraph of G on at most t vertices has average degree less than s.

A similar argument shows that when \(d=\Omega \big (n^{\frac{s-2}{s}}\big )\), the logarithmic error term is indeed needed in Theorem 1.2. Motivated by applications from [28] to parity check matrices, Verstraëte (see [22]) conjectured that the lower bound presented in Proposition 1.3 is optimal in a certain sense. More precisely, he conjectured that for every \(s>2\) and \(\varepsilon >0\) there exists some \(t=t(s,\varepsilon )\) such that if n is sufficiently large, then every graph on n vertices with average degree at least \(n^{1-\frac{2}{s}+\varepsilon }\) must contain a subgraph on at most t vertices with average degree at least s. In the special case where s is an integer, this was proved by Jiang and Newman [22]. For even values of s, Janzer [19] strengthened this result, obtaining under the same hypothesis an s-regular subgraph. More precisely, he proved that if G is a graph on n vertices with at least \(n^{2-\frac{1}{r}+\frac{1}{k+r-1}+\varepsilon }\) edges for sufficiently large n, then G contains an r-blowup of the cycle \(C_{2k}\) (note that the r-blowup of the cycle \(C_{2k}\) is 2r-regular, so \(s=2r\), and by taking k large one can make the term \(\frac{1}{k+r-1}\) arbitrarily small). In our next theorem, we prove the conjecture of Verstraëte for all real values of \(s>2\).

Theorem 1.4

For every \(s>2\) and \(\varepsilon >0\), there is a positive integer t such that the following holds for all sufficiently large n. Let G be an n-vertex graph of average degree \(d\ge n^{1-\frac{2}{s}+\varepsilon }\). Then there is a non-empty set \(R\subset V(G)\) of size at most t such that G[R] has average degree at least s.

Our results are closely related to the problem of Erdős, Faudree, Rousseau and Schelp on finding small subgraphs of large minimum degree. In [10], they determined the minimal number of edges in a graph on n vertices which guarantees a proper subgraph (i.e., with \(u<n\) vertices) of minimum degree at least s (see [31] for additional details and recent developments). Erdős, Faudree, Rousseau and Schelp [11] further asked the following general question. Given positive integers n and s, and a positive real number d satisfying \(d\ge s\ge 2\), what is the minimum of \(u=u(n,d,s)\) such that every graph G on n vertices with average degree at least d contains a subgraph of minimum degree at least s on at most u vertices? It is reasonable to suspect that \(u(n,d,s)\approx t(n,d,s)\), that is, that Theorems 1.2 and 1.4 hold with the average degree of G[R] replaced with its minimum degree. In case s is even, the minimum degree version of Theorem 1.4 does hold by the aforementioned results of [19] and [22], the first of which even guarantees a regular subgraph. Moreover, the case \(s=3\) follows from a recent result of Janzer [20], which refutes a conjecture of Erdős and Simonovits [9] (and again provides a regular subgraph). However, the cases when s is odd and greater than 3 remain open. On the other hand, the minimum degree variant of Theorem 1.2 is completely open for every \(s\ge 3\), and our methods do not seem to be adaptable for this problem. At least, by noting that every graph of average degree at least 2s contains a subgraph of minimum degree at least s (see Lemma 2.2), we get the following immediate corollary of Theorem 1.2.

Corollary 1.5

For every integer \(s\ge 2\), there is a constant \(C=C(s)\) such that the following holds for every \(d\ge 2s\). Let G be an n-vertex graph with average degree at least d, where \(d\le n^{\frac{s-1}{s}}\). Then there is a non-empty set \(R\subset V(G)\) of size at most \(nd^{-\frac{s}{s-1}}(\log d)^C\) such that G[R] has minimum degree at least s.

2 Small Subgraphs of Large Average Degree

In this section, we prove Theorems 1.2 and 1.4. Both proofs follow the same argument, but with different ranges of parameters. Let us give a brief outline of this argument.

The key idea is that for every rational number \(\rho >1\) we construct a tree \(T_{\rho }\), which we refer to as a balanced tree, with the following property. Let \(q=q(\rho )\) be the number of leaves of \(T_{\rho }\). If H is a graph that is the union of copies of \(T_{\rho }\) having the same set of leaves, then the average degree of H is at least \(2\rho (1-\frac{q}{|V(H)|})\). These trees were first studied by Bukh and Conlon [8] in their celebrated paper on the Rational Exponents conjecture.

Now suppose we are given a graph G with n vertices and average degree at least d, which does not contain a subgraph of order at most \(t\approx nd^{-\frac{s}{s-2}}\) and average degree at least s. Take \(\rho \) such that \(2\rho \) is slightly larger than s, let \(T_{\rho }\) be the balanced tree with respect to \(\rho \), and let q be the number of leaves of \(T_{\rho }\). By counting the number of subgraphs of G isomorphic to \(T_{\rho }\) and using the pigeonhole principle, we find a large collection \(\mathcal {T}\) of copies of \(T_{\rho }\) in G, all having the same set of leaves. Let H be their union. We show that some subgraph of H will contradict our assumption on G, i.e. it has order at most t and average degree at least s. In order to show this, we consider two cases depending on the number of vertices in H. If \(|V(H)|>t\), we take some sub-collection \(\mathcal {T}'\) of \(\mathcal {T}\) such that \(H'\), which denotes the union of the copies of \(T_{\rho }\) in \(\mathcal {T}'\), has order roughly t. Our choice of parameters will guarantee that \(2\rho (1-\frac{q}{|V(H')|})\ge s\), so \(H'\) suffices by the above mentioned property of \(T_{\rho }\). Otherwise, we argue that unless H has average degree at least s, it cannot contain the described number of copies of \(T_{\rho }\), and we are done again.

2.1 Preliminaries

In this section, we prove the lower bounds (Propositions 1.1 and 1.3) and collect some basic results. First, let us start with the following simple consequence of the multiplicative Chernoff bound.

Lemma 2.1

Let X be the sum of independent indicator (i.e. 0-1 valued) random variables. Then \(\mathbb {P}(X\le \frac{\mathbb {E}(X)}{2})<e^{-\frac{\mathbb {E}(X)}{8}}\).
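As a sanity check of Lemma 2.1, the sketch below compares the exact lower tail with the stated bound in the special case where all indicators have the same success probability, so that X is binomial. The restriction to the binomial case, the helper names and the sample parameters are assumptions made for illustration only.

```python
from math import comb, exp

def lower_tail(m, p):
    """Exact P(X <= E(X)/2) for X ~ Binomial(m, p)."""
    mean = m * p
    return sum(comb(m, j) * p**j * (1 - p)**(m - j)
               for j in range(m + 1) if j <= mean / 2)

def lemma_2_1_bound(m, p):
    """The bound exp(-E(X)/8) from Lemma 2.1."""
    return exp(-m * p / 8)

for m, p in [(40, 0.5), (100, 0.1), (200, 0.05)]:
    print(m, p, lower_tail(m, p), lemma_2_1_bound(m, p))
```

In each sampled case the exact tail lies below the bound, typically with a lot of room, which is all the later arguments need.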

Next, we present the promised probabilistic lower bound arguments.

Proof of Proposition 1.1

Let \(c_s\) be sufficiently small. If \(d\ge \frac{n-1}{2}\), then \(c_snd^{-\frac{s}{s-2}}<1\), so we can take G to be any n-vertex graph with average degree at least d.

Else, let G be a random graph on n vertices in which each edge is chosen with probability \(p=\frac{2d}{n-1}\), independently of all other edges. Then |E(G)| is the sum of independent indicator random variables and has mean nd, so by Lemma 2.1, the probability that G has fewer than nd/2 edges (i.e., that G has average degree less than d) is at most \(\exp (-\frac{nd}{8})\le e^{-\frac{1}{8}}\le \frac{99}{100}\).

Let R be a subset of V(G) of size \(r\le c_s nd^{-\frac{s}{s-2}}\). By the union bound, the probability that G[R] has average degree at least s (i.e., that G[R] has at least \(\frac{rs}{2}\) edges) is at most \(\left( {\begin{array}{c}\left( {\begin{array}{c}r\\ 2\end{array}}\right) \\ \lceil rs/2\rceil \end{array}}\right) p^{\lceil \frac{rs}{2} \rceil }\le \left( \frac{e\left( {\begin{array}{c}r\\ 2\end{array}}\right) }{\lceil rs/2\rceil } p \right) ^{\lceil \frac{rs}{2}\rceil }\le (erp)^{\frac{rs}{2}}\). Hence, by the union bound, the probability that G has a subgraph of average degree at least s on at most \(c_s n d^{-\frac{s}{s-2}}\) vertices is at most

$$\begin{aligned} \sum _{r=1}^{\lfloor c_s n d^{-\frac{s}{s-2}}\rfloor } \left( {\begin{array}{c}n\\ r\end{array}}\right) (erp)^{\frac{rs}{2}}&\le \sum _{r=1}^{\lfloor c_s n d^{-\frac{s}{s-2}}\rfloor } \left( \frac{en}{r}\cdot (erp)^{\frac{s}{2}}\right) ^r\\&= \sum _{r=1}^{\lfloor c_s n d^{-\frac{s}{s-2}}\rfloor } (e^{\frac{s}{2}+1}r^{\frac{s-2}{2}}np^{\frac{s}{2}})^r. \end{aligned}\tag{1}$$

Moreover, for \(r\le \lfloor c_s n d^{-\frac{s}{s-2}}\rfloor \), we have

$$\begin{aligned} r^{\frac{s-2}{2}}np^{\frac{s}{2}}\le c_s^{\frac{s-2}{2}} n^{\frac{s-2}{2}} d^{-\frac{s}{2}} n p^{\frac{s}{2}}\le c_s^{\frac{s-2}{2}} n^{\frac{s-2}{2}} d^{-\frac{s}{2}} n \left( \frac{4d}{n}\right) ^{\frac{s}{2}}=c_s^{\frac{s-2}{2}}4^{\frac{s}{2}}. \end{aligned}$$

Hence, if \(c_s\) is sufficiently small, then by (1), the probability that G has a subgraph of average degree at least s on at most \(c_snd^{-\frac{s}{s-2}}\) vertices is at most \(\frac{1}{1000}\). It follows that with positive probability G has no such subgraph but has average degree at least d, completing the proof. \(\square \)

Proof of Proposition 1.3

We show that \(\varepsilon = \frac{1}{ts}\) suffices. Assume that n is sufficiently large with respect to s and t, and let \(G'\) be the graph on n vertices in which each edge is present independently with probability \(p=n^{-\frac{2}{s}+\frac{2}{ts}}\). Letting X be the number of edges of \(G'\), we have \(\mathbb {E}(X)=\left( {\begin{array}{c}n\\ 2\end{array}}\right) p>\frac{1}{4}n^{2-\frac{2}{s}+\frac{2}{ts}}\).

Let \(\mathcal {R}\) be the family of graphs R with vertex set contained in \(V(G')\) and having \(r\le t\) vertices and exactly \(\lceil \frac{rs}{2}\rceil \) edges. Say that such an R is bad if R is a subgraph of \(G'\). Clearly, \(\mathbb {P}(R \text{ is } \text{ bad})=p^{\lceil \frac{rs}{2}\rceil }\). Let Y be the number of bad elements of \(\mathcal {R}\), then

$$\begin{aligned} \mathbb {E}(Y)=&\sum _{R\in \mathcal {R}}\mathbb {P}(R \text{ is } \text{ bad})\le \sum _{r=1}^{t}\left( {\begin{array}{c}n\\ r\end{array}}\right) \left( {\begin{array}{c}\left( {\begin{array}{c}r\\ 2\end{array}}\right) \\ \lceil rs/2\rceil \end{array}}\right) p^{\lceil \frac{rs}{2}\rceil }\\<&\sum _{r=1}^{t} n^{r}(erp)^{\frac{rs}{2}}<t\big (e^{\frac{s}{2}} t^{\frac{s}{2}} n p^{\frac{s}{2}}\big )^{t}=t^{\frac{ts}{2}+1}e^{\frac{ts}{2}}n. \end{aligned}$$

Hence, we have \(\mathbb {E}(X-Y)>\frac{1}{8}n^{2-\frac{2}{s}+\frac{2}{ts}}>n^{2-\frac{2}{s}+\varepsilon }\). But then there exists a choice for \(G'\) such that \(X-Y>n^{2-\frac{2}{s}+\varepsilon }\). For each bad \(R\in \mathcal {R}\), remove an edge of \(G'\) contained in R, and let the resulting graph be G. Then G contains no element of \(\mathcal {R}\) as a subgraph, so every subgraph of G on at most t vertices has average degree less than s. Furthermore, G has average degree at least \(\frac{2(X-Y)}{n}>n^{1-\frac{2}{s}+\varepsilon }\), finishing the proof. \(\square \)
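The deletion step can be imitated on a tiny instance. The sketch below is a greedy variant and not the argument of the proof: instead of removing one edge for every bad R at once, it repeatedly looks for a vertex set of size at most t spanning at least \(\lceil rs/2\rceil \) edges and deletes one of those edges. It therefore illustrates only the conclusion that no small dense subgraph survives, not the edge count guaranteed by the expectation computation. All names and the sample parameters are assumptions of this sketch.

```python
import itertools
import math
import random

def dense_small_subsets(n, edges, s, t):
    """Yield the edge list of G[R] for each R with |R| <= t and average degree >= s."""
    for r in range(1, t + 1):
        need = math.ceil(r * s / 2)
        for R in itertools.combinations(range(n), r):
            inside = set(R)
            spanned = [e for e in edges if e[0] in inside and e[1] in inside]
            if len(spanned) >= need:
                yield spanned

def random_deletion(n, p, s, t, seed=0):
    """Sample G(n, p), then delete edges until no small dense subgraph remains."""
    rng = random.Random(seed)
    edges = {(u, v) for u, v in itertools.combinations(range(n), 2)
             if rng.random() < p}
    while True:
        bad = next(dense_small_subsets(n, edges, s, t), None)
        if bad is None:
            return edges
        edges.discard(bad[0])  # each round removes one edge, so this terminates

G = random_deletion(n=12, p=0.6, s=3, t=4, seed=1)
print(len(G), "edges remain")
```

With \(s=3\) and \(t=4\), the only forbidden configuration is a copy of \(K_4\), so the output is a \(K_4\)-free graph.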

Finally, before we embark on the proofs of our main theorems, let us state a useful lemma about subgraphs of large minimum degree.

Lemma 2.2

Every graph G of average degree d contains a nonempty subgraph of minimum degree at least \(\frac{d}{2}\).

Proof

Keep removing vertices of degree less than \(\frac{d}{2}\) as long as there is such a vertex. Each removal deletes fewer than \(\frac{d}{2}\) edges and there are at most |V(G)| removals, so in total we remove fewer than \(\frac{|V(G)|d}{2}=|E(G)|\) edges. Hence the resulting graph is nonempty and has minimum degree at least \(\frac{d}{2}\). \(\square \)
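The peeling procedure in this proof is easy to run. The following is a minimal sketch, assuming the graph is given as a dictionary of neighbour sets; the function name `peel_low_degree` and the example graph are illustrative choices of ours.

```python
def peel_low_degree(adj, threshold):
    """Repeatedly delete vertices of degree < threshold; return the surviving vertices."""
    alive = set(adj)
    deg = {v: len(adj[v]) for v in adj}
    stack = [v for v in adj if deg[v] < threshold]
    while stack:
        v = stack.pop()
        if v not in alive:
            continue
        alive.remove(v)
        for u in adj[v]:
            if u in alive:
                deg[u] -= 1
                if deg[u] < threshold:
                    stack.append(u)
    return alive

# K_4 on {0, 1, 2, 3} with a pendant path 3 - 4 - 5 - 6 attached.
edges = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3), (3, 4), (4, 5), (5, 6)]
adj = {v: set() for v in range(7)}
for u, v in edges:
    adj[u].add(v)
    adj[v].add(u)

d = 2 * len(edges) / len(adj)          # average degree 18/7
core = peel_low_degree(adj, d / 2)
print(sorted(core))
```

Here the path is peeled away vertex by vertex and the copy of \(K_4\) survives, with minimum degree 3, which is at least \(\frac{d}{2}\).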

2.2 Balanced Trees

In this section, we define balanced trees, which one can view as the building blocks of our small subgraph of large average degree. Interestingly, but perhaps not unexpectedly, these trees coincide with the balanced trees constructed by Bukh and Conlon [8] in their paper on the Rational Exponents conjecture.

Definition 2.3

Let T be a tree with leaf set L. For any non-empty \(S\subset V(T){\setminus } L\), let

$$\begin{aligned} \rho _T(S):=\frac{e_S}{|S|}, \end{aligned}$$

where \(e_S\) is the number of edges in T incident to at least one vertex from S. Also, set \(\rho _T=\rho _T(V(T)\setminus L)\). (That is, \(\rho _T=\frac{t-1}{a}\), where t is the number of vertices of T and a is the number of non-leaf vertices in T.) We say that T is balanced if \(\rho _T\le \rho _T(S)\) holds for every non-empty \(S\subset V(T)\setminus L\).
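For small trees, Definition 2.3 can be checked directly by enumerating all sets S. The sketch below does this with exact rational arithmetic; the function names and the two example trees are assumptions made for illustration, and the enumeration is exponential in the number of non-leaf vertices.

```python
from fractions import Fraction
from itertools import combinations

def rho_of(edges, S):
    """rho_T(S): number of edges meeting S, divided by |S|."""
    S = set(S)
    e_S = sum(1 for u, v in edges if u in S or v in S)
    return Fraction(e_S, len(S))

def non_leaves(edges):
    deg = {}
    for u, v in edges:
        deg[u] = deg.get(u, 0) + 1
        deg[v] = deg.get(v, 0) + 1
    return [v for v in deg if deg[v] > 1]

def is_balanced(edges):
    """Check rho_T <= rho_T(S) for every non-empty set S of non-leaf vertices."""
    inner = non_leaves(edges)
    rho_T = rho_of(edges, inner)
    return all(rho_of(edges, S) >= rho_T
               for size in range(1, len(inner) + 1)
               for S in combinations(inner, size))

# Binary tree of height two: root r, children u and v, two leaves below each.
binary = [("r", "u"), ("r", "v"), ("u", 1), ("u", 2), ("v", 3), ("v", 4)]
# Spine x - y with three leaves at x and one leaf at y.
lopsided = [("x", "y"), ("x", 1), ("x", 2), ("x", 3), ("y", 4)]

print(rho_of(binary, non_leaves(binary)), is_balanced(binary))
print(rho_of(lopsided, non_leaves(lopsided)), is_balanced(lopsided))
```

The second tree fails the definition because the single vertex y meets only two edges while \(\rho _T=\frac{5}{2}\).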

The main reason balanced trees are useful for us is given by the following simple lemma, which one can prove by induction. Roughly speaking, it states that if T is a balanced tree, then any graph formed by taking the union of copies of T, all of which have the same set of leaves, has large average degree. Note that we allow some of the non-leaf vertices to coincide as well: see Fig. 1 for an example where T is the binary tree of height two, and the red and the blue copy share a non-leaf vertex.

Fig. 1 The red, blue and green trees all have the same leaf set. (Color figure online)

Lemma 2.4

(Bukh–Conlon [8, Lemma 2.2]) Let T be a balanced tree with q leaves and let H be a graph which is the union of copies of T with the same set of leaves. Then \(e(H)\ge (|V(H)|-q)\rho _T\).

Proof

Let H be the union of k copies of T with the same set L of leaves. We prove the inequality \(e(H)\ge (|V(H)|-q)\rho _T\) by induction on k. If \(k=1\), then \(e(H)=e(T)\ge e_{V(T){\setminus } L}=|V(T){\setminus } L|\rho _T=(|V(H)|-q)\rho _T\), as desired. Assume now that \(k\ge 2\). Let \(T_0\) be one of the k copies of T constituting H and let \(H'\) be the union of the remaining \(k-1\) copies of T. By the induction hypothesis, we have \(e(H')\ge (|V(H')|-q)\rho _T\). Let \(S=V(T_0){\setminus } V(H')\). Note that since all copies of T in H have the same set L of leaves, we have \(S\subset V(T_0)\setminus L\). Now observe that \(e_S\ge |S|\rho _T\) (where \(T_0\) is identified with T and S is viewed as a subset of V(T)). Indeed, the inequality is trivial if \(S=\emptyset \) and else \(e_S=|S|\rho _T(S)\ge |S|\rho _T\) since T is balanced. But then

$$\begin{aligned} e(H)\ge e(H')+e_S\ge (|V(H')|-q)\rho _T+|S|\rho _T=(|V(H)|-q)\rho _T, \end{aligned}$$

completing the induction step. \(\square \)
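The inequality of Lemma 2.4 can be checked on a configuration of the kind described before the lemma: three copies of the binary tree of height two with a common leaf set, two of which share a non-leaf vertex. The vertex names below are hypothetical labels chosen for this sketch, and the value \(\rho _T=2\) for this tree is computed from Definition 2.3 (six edges, three non-leaf vertices).

```python
def binary_tree_copy(root, left, right, leaves):
    """Edges of a binary tree of height two on the given vertices."""
    l1, l2, l3, l4 = leaves
    return [(root, left), (root, right),
            (left, l1), (left, l2), (right, l3), (right, l4)]

leaves = ("l1", "l2", "l3", "l4")
red = binary_tree_copy("r1", "u", "v1", leaves)
blue = binary_tree_copy("r2", "u", "v2", leaves)    # shares the non-leaf vertex u with red
green = binary_tree_copy("r3", "u3", "v3", leaves)

# H is the union of the three copies; frozensets identify repeated edges.
H = {frozenset(e) for tree in (red, blue, green) for e in tree}
vertices = set().union(*H)

q, rho_T = len(leaves), 2
bound = (len(vertices) - q) * rho_T
print(len(H), len(vertices), bound)
```

In this example the bound holds with equality, since every non-leaf vertex of H contributes exactly \(\rho _T\) new edges on average.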

Next we describe a construction of balanced trees which are caterpillars. A caterpillar is a tree in which the non-leaf vertices form a path.

Definition 2.5

(Bukh–Conlon [8]) Suppose that a and b are positive integers satisfying \(a+1 \le b < 2a+1\) and set \(i = b-a\). We define a tree \(T_{a,b}\) by taking a path with a vertices, which are labelled in order as \(1,2,\dots ,a\), and then adding a leaf to each of the \(i+1\) vertices

$$\begin{aligned} 1, \left\lfloor 1+\frac{a}{i}\right\rfloor ,\left\lfloor 1+2\cdot \frac{a}{i}\right\rfloor ,\dots ,\left\lfloor 1+(i-1)\cdot \frac{a}{i}\right\rfloor ,a. \end{aligned}$$

(Note that if \(b=2a\), then \(\lfloor 1+(i-1)\cdot \frac{a}{i}\rfloor =a\). In this case we attach two leaves in total to vertex a.) For \(b \ge 2a+1\), we define \(T_{a,b}\) recursively to be the tree obtained by attaching a leaf to each non-leaf of \(T_{a,b-a}\).
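Definition 2.5 translates directly into a construction. The sketch below builds the edge list of \(T_{a,b}\), unrolling the recursion for \(b\ge 2a+1\) into a count of extra leaves per path vertex; the representation (path vertices 1, ..., a and leaves as tagged tuples) and the function name are assumptions of this sketch rather than notation from [8].

```python
def caterpillar(a, b):
    """Edge list of T_{a,b} for positive integers a < b, following Definition 2.5."""
    if a < 1 or b < a + 1:
        raise ValueError("need 1 <= a < b")
    extra = 0
    while b >= 2 * a + 1:      # recursive case: one more leaf on every path vertex
        b -= a
        extra += 1
    i = b - a                  # now a + 1 <= b <= 2a, so 1 <= i <= a
    # floor(1 + j*a/i) for j = 0, ..., i-1, followed by the last path vertex a
    positions = [1 + (j * a) // i for j in range(i)] + [a]
    leaf_count = {v: extra for v in range(1, a + 1)}
    for v in positions:
        leaf_count[v] += 1
    edges = [(v, v + 1) for v in range(1, a)]
    for v, count in leaf_count.items():
        edges.extend((v, ("leaf", v, j)) for j in range(count))
    return edges

def degrees(edges):
    deg = {}
    for u, v in edges:
        deg[u] = deg.get(u, 0) + 1
        deg[v] = deg.get(v, 0) + 1
    return deg

print(caterpillar(3, 5))
```

For instance, `caterpillar(2, 4)` attaches two leaves to vertex 2, matching the remark about the case \(b=2a\).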

Remark

In [8], trees \(T_{a,b}\) for \(b\in \{a-1,a\}\) are introduced as well and they are used to define \(T_{a,b}\) for \(b\in \{2a-1,2a\}\), but one can easily see that our definition gives the same graphs.

Bukh and Conlon showed that, indeed, \(T_{a,b}\) is balanced for every \(a<b\). Combined with the simple observation that \(T_{a,b}\) has maximum degree at most \(\lceil b/a\rceil +1\), we obtain the following result.

Lemma 2.6

(Bukh–Conlon [8, Lemma 1.3]) For any positive integers \(a<b\), \(T_{a,b}\) is a balanced caterpillar with a non-leaf vertices, b edges and maximum degree at most \(\lceil b/a\rceil +1\).

2.3 Counting Trees

In this section, we provide lower and upper bounds on the number of copies of a fixed tree in graphs with some prescribed properties. Let us start with the lower bound.

For a graph G and a set S of vertices in G, we write \(\Gamma _G(S)\) for the set of vertices in G which have a neighbour in S. We make use of the following celebrated theorem of Friedman and Pippenger [16] about large bounded degree trees in expanding graphs.

Theorem 2.7

(Friedman–Pippenger [16]) If G is a non-empty graph such that for every \(S\subset V(G)\) with \(|S|\le 2m-2\), we have \(|\Gamma _G(S)|\ge (k+1)|S|\), then G contains every tree with at most m vertices and maximum degree at most k.

Say that a graph G is \((\rho ,r)\)-sparse if for every \(R\subset V(G)\) of size at most r, the number of edges in G[R] is at most \(\rho |R|\).
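For very small graphs the definition of \((\rho ,r)\)-sparseness can be tested by brute force. The sketch below is only meant to illustrate how the property depends on r; the function name and the vertex labelling 0, ..., n-1 are assumptions, and the running time is exponential.

```python
from itertools import combinations

def is_sparse(n, edges, rho, r):
    """True if every vertex set R with |R| <= r spans at most rho*|R| edges."""
    for size in range(1, min(r, n) + 1):
        for R in combinations(range(n), size):
            inside = set(R)
            spanned = sum(1 for u, v in edges if u in inside and v in inside)
            if spanned > rho * size:
                return False
    return True

c5 = [(i, (i + 1) % 5) for i in range(5)]
k4 = list(combinations(range(4), 2))

print(is_sparse(5, c5, 1, 5))  # a cycle spans at most |R| edges on any R
print(is_sparse(4, k4, 1, 3))  # triangles span 3 edges on 3 vertices
print(is_sparse(4, k4, 1, 4))  # the whole K_4 spans 6 > 4 edges
```

In the language of the introduction, a graph that is not \((\rho ,r)\)-sparse contains a subgraph on at most r vertices with average degree more than \(2\rho \).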

Lemma 2.8

Let G be a \((\rho ,r)\)-sparse graph with average degree at least \(4\rho (k+2)\). Then G contains every tree with at most \(\frac{r}{2(k+2)}\) vertices and maximum degree at most k.

Proof

By Lemma 2.2, G contains a subgraph \(G'\) of minimum degree at least \(2\rho (k+2)\). Note that \(G'\) is also \((\rho ,r)\)-sparse. We show that \(G'\) already contains every tree with at most \(m=\frac{r}{2(k+2)}\) vertices and maximum degree at most k. Otherwise, by Theorem 2.7, there is a set \(S\subset V(G')\) of size at most \(2m-2\le \frac{r}{k+2}\) such that \(|\Gamma _{G'}(S)|<(k+1)|S|\). Let \(R=S\cup \Gamma _{G'}(S)\). Then

$$\begin{aligned} |R|\le |S|+|\Gamma _{G'}(S)|< (k+2)|S|\le r. \end{aligned}$$

Furthermore, by the minimum degree condition, the number of edges in \(G'[R]\) is at least

$$\begin{aligned} \frac{1}{2}|S|\cdot 2\rho (k+2)> \rho |R|, \end{aligned}$$

which is a contradiction. \(\square \)

Now we are ready to state our first tree counting lemma.

Lemma 2.9

For any \(\rho >1\) and positive integer k, there exists \(c_0=c_0(\rho ,k)>0\) such that the following holds for every \(n\ge 8\). Let G be an n-vertex \((\rho ,r)\)-sparse graph with average degree at least \(d\ge c_0^{-1}\). Let T be a tree with \(t\le \frac{r}{2(k+2)}\) vertices and maximum degree at most k. Then G contains at least \((c_0 d)^{t-1}\) copies of T.

Proof

We show that \(c_0=\frac{1}{16\rho (k+2)}\) suffices. Let \(p=\frac{8\rho (k+2)}{d}<1\), and sample each edge of G independently with probability p. Let the resulting graph be \(G'\), and let \(X=e(G')\). Then \(\mathbb {E}(X)=pe(G)\ge \frac{pdn}{2}=4\rho (k+2)n\). As X is the sum of independent indicator random variables, we can use Lemma 2.1 to write \(\mathbb {P}\left( X\le \frac{1}{2}\mathbb {E}(X)\right) \le e^{-\frac{\mathbb {E}(X)}{8}}<\frac{1}{2}.\) Hence, with probability at least \(\frac{1}{2}\), \(G'\) has average degree at least \(4\rho (k+2)\). If this happens, we can apply Lemma 2.8 to conclude that \(G'\) contains a copy of T. Thus, the expected number of copies of T in \(G'\) is at least \(\frac{1}{2}\). On the other hand, writing N for the number of copies of T in G, we also have that the expected number of copies of T in \(G'\) is \(p^{t-1}N\). Hence, we get the inequality \(p^{t-1}N\ge \frac{1}{2}\), which implies that G contains at least

$$\begin{aligned} N\ge \frac{1}{2}p^{-(t-1)}=\frac{1}{2}\left( \frac{d}{8\rho (k+2)}\right) ^{t-1}>(c_0d)^{t-1} \end{aligned}$$

copies of T. \(\square \)

Now let us turn to our upper bound on the number of copies of a tree. For simplicity, we only present a counting result in case the tree is a caterpillar. However, it seems likely that a similar result should hold for trees in general as well.

Lemma 2.10

Let G be a graph with n vertices and m edges. Let T be a caterpillar with a non-leaf vertices and maximum degree k. Then G contains at most \(n\cdot (\frac{2m}{a})^{ak}\) copies of T.

Proof

Let \(d_1\ge d_2\ge \dots \ge d_n\) be the degree sequence of G. As T is a caterpillar, its non-leaf vertices form a path on a vertices, so let us first count the number of such paths in G.

Claim

For every vertex \(v\in V(G)\), the number of paths on a vertices in G starting from v is at most \(d_1\dots d_{a-1}\).

Proof

We prove this by induction on a. If \(a=2\), this is trivial, so let us assume that \(a\ge 3\). Let \(G'\) be the subgraph of G we get after removing v, and let \(d_1'\ge \dots \ge d_{n-1}'\) be the degree sequence of \(G'\). There are \(\deg _{G}(v)\) ways to choose the neighbour of v in the path. If this neighbour is \(v'\in V(G')\), we can use our induction hypothesis to conclude that there are at most \(d_1'\dots d_{a-2}'\) paths on \(a-1\) vertices in \(G'\) starting with \(v'\). Hence, the number of paths on a vertices in G starting with v is at most \(\deg _{G}(v)\cdot (d_1'\dots d'_{a-2})\). Since \(G'\) does not contain v and every vertex has degree in \(G'\) at most its degree in G, this quantity is at most a product of the degrees in G of \(a-1\) distinct vertices, and hence at most \(d_1\dots d_{a-1}\), finishing the proof. \(\square \)

Hence, the number of ways to embed the non-leaf vertices of T is at most \(n\cdot d_1\dots d_{a-1}\). Suppose that the non-leaf vertices of T are already embedded in G, and their images are \(v_1,\dots ,v_a\). Then the number of ways to choose the leaves of T is at most

$$\begin{aligned} \deg _{G}(v_1)^{k-1}\dots \deg _{G}(v_a)^{k-1}\le (d_1\dots d_{a})^{k-1}. \end{aligned}$$

Therefore, the total number of copies of T in G is at most

$$\begin{aligned} n\cdot (d_1\dots d_{a})^{k}\le n\cdot \left( \frac{d_1+\dots +d_{a}}{a}\right) ^{ak}\le n\cdot \left( \frac{2m}{a}\right) ^{ak}, \end{aligned}$$

where the first inequality is due to the AM-GM inequality. \(\square \)
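The Claim used in this proof can be checked by exhaustive search on small graphs. The sketch below counts paths on a vertices starting at a given vertex and compares the count with the product of the \(a-1\) largest degrees; the adjacency-dictionary format, the function names and the two test graphs are assumptions made for illustration.

```python
from math import prod

def count_paths_from(adj, v, a):
    """Number of paths on `a` vertices starting at v, counted as vertex sequences."""
    def extend(end, used, remaining):
        if remaining == 0:
            return 1
        return sum(extend(u, used | {u}, remaining - 1)
                   for u in adj[end] if u not in used)
    return extend(v, {v}, a - 1)

def claim_bound(adj, a):
    """Product d_1 * ... * d_{a-1} of the a-1 largest degrees."""
    degs = sorted((len(nbrs) for nbrs in adj.values()), reverse=True)
    return prod(degs[:a - 1])

k4 = {v: [u for u in range(4) if u != v] for v in range(4)}
kite = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2]}   # triangle with a pendant vertex

print(count_paths_from(k4, 0, 3), claim_bound(k4, 3))
```

The bound is far from tight in general, but the proof of Lemma 2.10 only needs this form, since the product of degrees is then controlled by the AM-GM inequality.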

2.4 Piecing the Trees Together

In this section, we present our main technical lemma, which implies both Theorems 1.2 and 1.4 after substituting the right parameters. Before we state this lemma, we show that if a graph G contains many copies of a balanced caterpillar with the same set of leaves, then G cannot be \((\rho ,r)\)-sparse. Recall that if T is a tree with t vertices and a non-leaf vertices, then \(\rho _{T}=\frac{t-1}{a}\).

Lemma 2.11

Let \(\rho >0\) and let k be a positive integer. Let T be a balanced caterpillar with t vertices, a non-leaf vertices, and maximum degree at most k. Assume that \(t< r\le n\) and \(\rho \le (1-\frac{t}{r-t})\rho _T\). Let G be an n-vertex graph containing at least \(r(\frac{2\rho r}{a})^{ak}\) copies of T with the same set of leaves. Then G is not \((\rho ,r)\)-sparse.

Proof

Assume, for contradiction, that G is \((\rho ,r)\)-sparse. Let \(\mathcal {T}\) be a collection of at least \(r(\frac{2\rho r}{a})^{ak}\) copies of T in G with the same set of leaves. Let \(R_0\subset V(G)\) be the set of vertices spanned by the elements of \(\mathcal {T}\). First, observe that we must have \(|R_0| \ge r\). Otherwise, as G is \((\rho ,r)\)-sparse, \(G[R_0]\) has at most \(m=\rho |R_0|< \rho r\) edges, so by Lemma 2.10, it contains fewer than \(r(\frac{2\rho r}{a})^{ak}\) copies of T, a contradiction.

Therefore, we can take a subcollection \(\mathcal {T}'\subset \mathcal {T}\) such that the union of the trees in \(\mathcal {T}'\) spans at least \(r-t\) and at most r vertices. Let R be the set of vertices in G spanned by the union of the trees in \(\mathcal {T}'\). By Lemma 2.4, \(e(G[R])\ge (|R|-q)\rho _T\), where q is the number of leaves in T. Hence,

$$\begin{aligned} \frac{e(G[R])}{|R|}&\ge \frac{|R|-q}{|R|}\rho _T\ge \frac{r-t-q}{r-t}\rho _T\\&\ge \frac{r-2t}{r-t}\rho _T=\left( 1-\frac{t}{r-t}\right) \rho _T\ge \rho . \end{aligned}$$

Since \(|R|\le r\), this contradicts the assumption that G is \((\rho ,r)\)-sparse, and the proof is complete. \(\square \)

Now we are ready to state the promised main technical lemma.

Lemma 2.12

Let \(\rho >1\) and let \(c_0=c_0(\rho ,\lceil 2\rho \rceil +1)\) given by Lemma 2.9. Let G be an n-vertex graph with average degree \(d\ge c_0^{-1}\). Assume that there are positive integers r, t and a such that the following inequalities are satisfied.

  1. \(\rho \le (1-\frac{t}{r-t})\frac{t-1}{a}\),

  2. \(\frac{t}{a}\le 2\rho \),

  3. \(2(\lceil 2\rho \rceil +3)t\le r\le n\), and

  4. \((c_0 d)^{t-1}\ge \left( {\begin{array}{c}n\\ t-a\end{array}}\right) \cdot r\cdot (\frac{2\rho r}{a})^{3t}\).

Then G is not \((\rho ,r)\)-sparse.

Since there are many parameters and conditions to keep track of, it may be helpful to note that condition 2 is just a technicality and the main ones are 1, 3 and 4. One should think of them as \(\rho \lesssim \frac{t}{a}\), \(t\le r\) and \((c_0 d)^{t-1}\gg \left( {\begin{array}{c}n\\ t-a\end{array}}\right) \), respectively. Let us briefly and informally explain where they come from. We will take a balanced tree T with roughly t edges and roughly a non-leaf vertices as in Lemma 2.11. Then we want to find many copies of T with the same leaf set and argue that their union is a subgraph on at most r vertices with large average degree. Clearly, since T has about t vertices, such a union will have at least t vertices, explaining the necessary condition \(t\le r\). In view of Lemma 2.4 and using condition 1, the union of many copies of T has average degree at least about \(2\rho _T\approx \frac{2t}{a}\gtrsim 2\rho \), provided that this union spans much more than t vertices. If the union spans not much more than t vertices, it does not follow from Lemma 2.4 that it has average degree close to \(2\rho \), but we can still use Lemma 2.11 to argue that if many copies of T have the same leaf set, then we get a small subgraph with average degree about \(2\rho \). Hence, it remains to ensure that we indeed have many copies of T with the same leaf set. By Lemma 2.9, the number of copies of T is at least \((c_0d)^{t-1}\). On the other hand, T has about \(t-a\) leaves, so there are roughly \(\left( {\begin{array}{c}n\\ t-a\end{array}}\right) \) ways to choose the leaf set. Therefore, condition 4 is just saying that there are many copies of T with the same leaf set.

Proof of Lemma 2.12

Conditions 1 and 3 imply that \(t-1>a\), so Lemma 2.6 shows that there is a balanced caterpillar T with a non-leaf vertices, \(t-1\) edges and maximum degree at most \(k=\lceil t/a\rceil +1\le \lceil 2\rho \rceil +1\). Note that \(ka<3t\). Assume that G is \((\rho ,r)\)-sparse. Condition 3 implies that \(t\le \frac{r}{2(k+2)}\), so it follows by Lemma 2.9 that G contains at least \((c_0d)^{t-1}\) copies of T. Since T has \(t-a\) leaves and G has \(\left( {\begin{array}{c}n\\ t-a\end{array}}\right) \) subsets of size \(t-a\), it follows from condition 4 and the pigeonhole principle that there is a collection of at least \(r(\frac{2\rho r}{a})^{3t}>r\cdot (\frac{2\rho r}{a})^{ka}\) copies of T in G which share the same set of leaves. Note that T has \(t-1\) edges and a non-leaf vertices, so \(\rho _T=\frac{t-1}{a}\) and condition 1 gives \(\rho \le (1-\frac{t}{r-t})\rho _T\). Hence, we can apply Lemma 2.11 to conclude that G is not \((\rho ,r)\)-sparse, a contradiction. \(\square \)
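The pigeonhole step compares quantities that are far too large to evaluate directly, so a numerical check of condition 4 is best done on a logarithmic scale. The sketch below does this, taking \(c_0=\frac{1}{16\rho (k+2)}\) with \(k=\lceil 2\rho \rceil +1\) as in the proof of Lemma 2.9. The function names and both parameter sets are arbitrary choices of ours, meant only to show that the condition fails for moderate parameters and holds once d is extremely large.

```python
import math

def log_binom(n, k):
    """Natural log of C(n, k), summed term by term so it stays accurate for huge n."""
    return sum(math.log(n - i) - math.log(i + 1) for i in range(k))

def condition_4_holds(c0, d, n, r, t, a, rho):
    """Check (c0*d)^(t-1) >= C(n, t-a) * r * (2*rho*r/a)^(3t) after taking logs."""
    lhs = (t - 1) * math.log(c0 * d)
    rhs = (log_binom(n, t - a) + math.log(r)
           + 3 * t * math.log(2 * rho * r / a))
    return lhs >= rhs

rho = 1.5
k = math.ceil(2 * rho) + 1
c0 = 1 / (16 * rho * (k + 2))   # the constant shown to suffice in Lemma 2.9

# Moderate parameters: far too few copies for the pigeonhole step.
print(condition_4_holds(c0, d=1000, n=10**6, r=10_000, t=100, a=50, rho=rho))
# Astronomically large d and n (hypothetical): the condition holds.
print(condition_4_holds(c0, d=2**200, n=2**201, r=2000, t=100, a=60, rho=rho))
```

The contrast reflects the asymptotic nature of the lemma: the factor \((\frac{2\rho r}{a})^{3t}\) is only absorbed when \(c_0d\) is very large.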

2.5 Completing the Proofs

In this section, we put everything together to conclude the proofs of our main results. First, we prove Theorem 1.2 in the following equivalent form.

Theorem 2.13

For every \(\rho >1\), there is a constant \(C=C(\rho )\) such that the following holds for every sufficiently large d. Let G be an n-vertex graph with average degree at least d, where \(d\le n^{\frac{\rho -1}{\rho }}\). Then G is not \((\rho ,nd^{-\frac{\rho }{\rho -1}}(\log d)^C)\)-sparse.

Note that this indeed implies Theorem 1.2 with \(s=2\rho \) whenever d is sufficiently large compared to s. To see that Theorem 1.2 holds also when d is bounded by some function of s (but is at least s), note that we may choose \(C=C(s)\) in a way that \(nd^{-\frac{s}{s-2}}(\log d)^C\ge n\), and then we may take \(R=V(G)\).

Before turning to the formal proof of Theorem 2.13, let us give an informal sketch. Setting \(r\approx nd^{-\frac{\rho }{\rho -1}}\), we want to show that G is not \((\rho ,r)\)-sparse. Since we want to apply Lemma 2.12, we need to verify that there exist choices for t and a satisfying the key conditions \(\rho \lesssim \frac{t}{a}\), \(t\le r\) and \((c_0 d)^{t-1}\gg \left( {\begin{array}{c}n\\ t-a\end{array}}\right) \) from that lemma. Let us take \(t\approx r\approx nd^{-\frac{\rho }{\rho -1}}\) and \(a\approx t/\rho \). Then the conditions \(\rho \lesssim \frac{t}{a}\) and \(t\le r\) hold trivially, and

$$\begin{aligned} \left( {\begin{array}{c}n\\ t-a\end{array}}\right) \le \left( \frac{en}{t-a}\right) ^{t-a}\approx \left( d^{\frac{\rho }{\rho -1}}\right) ^{t-a}\approx d^t, \end{aligned}$$

so indeed \((c_0 d)^{t-1}\gtrsim \left( {\begin{array}{c}n\\ t-a\end{array}}\right) \). Hence, Lemma 2.12 implies that G is not \((\rho ,r)\)-sparse.
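
The exponent in the last approximation of the sketch can be made explicit. With \(a\approx t/\rho \) we have \(t-a\approx t\cdot \frac{\rho -1}{\rho }\), and with \(t\approx r\approx nd^{-\frac{\rho }{\rho -1}}\) we have \(\frac{n}{t-a}\approx d^{\frac{\rho }{\rho -1}}\) up to a factor depending only on \(\rho \). Ignoring such lower order factors, as everywhere in this informal sketch,

$$\begin{aligned} \left( d^{\frac{\rho }{\rho -1}}\right) ^{t-a}\approx d^{\frac{\rho }{\rho -1}\cdot \frac{\rho -1}{\rho }\cdot t}=d^{t}. \end{aligned}$$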

We now give the formal proof of Theorem 2.13.

Proof of Theorem 2.13

Let C be sufficiently large with respect to \(\rho \), let d be sufficiently large with respect to \(\rho \) and C, let \(f=(\log d)^{C}\) and let \(r=\lfloor nd^{-\frac{\rho }{\rho -1}}f\rfloor \). Then \(r\le n\) as d is sufficiently large, and since the assumption \(d\le n^{\frac{\rho -1}{\rho }}\) gives \(nd^{-\frac{\rho }{\rho -1}}\ge 1\), we have \(r\ge \lfloor (\log d)^{C}\rfloor \). Let \(\varepsilon =\frac{1}{\log d}\), \(t=\lceil \frac{r\varepsilon }{8}\rceil \) and let \(a=\lceil \frac{t}{\rho }(1-\varepsilon )\rceil \). Then \(t\ge 20\rho \log d\), assuming that C and d are sufficiently large. It suffices to prove that the four conditions in Lemma 2.12 are satisfied.

Note that

$$\begin{aligned} \frac{a\rho }{t-1}&\le \frac{t}{\rho }\cdot \left( 1-\frac{\varepsilon }{2}\right) \cdot \frac{\rho }{t-1}=\frac{t}{t-1}\left( 1-\frac{\varepsilon }{2}\right) \\&\le \left( 1+\frac{1}{10\log d}\right) \left( 1-\frac{\varepsilon }{2}\right) \le 1-\frac{\varepsilon }{4}\le 1-\frac{t}{r-t}, \end{aligned}$$

where the last inequality follows from \(t=\lceil \frac{r\varepsilon }{8}\rceil \). Hence, we have \(\rho \le (1-\frac{t}{r-t})\frac{t-1}{a}\) and condition 1 is satisfied.
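
Each step of this chain can be verified as follows. This is a sketch of one possible justification, using only \(\varepsilon =\frac{1}{\log d}\le 1\), \(t\ge 20\rho \log d\) and the definitions of t and a; the constants are not optimized.

$$\begin{aligned} a&\le \frac{t}{\rho }(1-\varepsilon )+1\le \frac{t}{\rho }\left( 1-\frac{\varepsilon }{2}\right) \quad \text{since } \frac{t\varepsilon }{2\rho }\ge 10>1,\\ \frac{t}{t-1}&=1+\frac{1}{t-1}\le 1+\frac{1}{10\log d}\quad \text{since } t-1\ge 10\log d,\\ \left( 1+\frac{\varepsilon }{10}\right) \left( 1-\frac{\varepsilon }{2}\right)&\le 1-\frac{\varepsilon }{2}+\frac{\varepsilon }{10}\le 1-\frac{\varepsilon }{4},\\ \frac{t}{r-t}&\le \frac{r\varepsilon /6}{5r/6}=\frac{\varepsilon }{5}\le \frac{\varepsilon }{4}\quad \text{since } t\le \frac{r\varepsilon }{8}+1\le \frac{r\varepsilon }{6}\le \frac{r}{6}. \end{aligned}$$

In the last line, the bound \(\frac{r\varepsilon }{8}+1\le \frac{r\varepsilon }{6}\) is equivalent to \(r\varepsilon \ge 24\), which holds because \(\frac{r\varepsilon }{8}>t-1\ge 3\).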

Conditions 2 and 3 are immediate from the definitions of t and a. Hence, it remains to verify condition 4, that is

$$\begin{aligned} \left( c_0d\right) ^{t-1}\ge \left( {\begin{array}{c}n\\ t-a\end{array}}\right) \cdot r\cdot \left( \frac{2\rho r}{a}\right) ^{3t}. \end{aligned}$$
(2)

First, since \(a<\frac{t}{\rho }\), note that

$$\begin{aligned} \left( {\begin{array}{c}n\\ t-a\end{array}}\right) \le \left( \frac{en}{t-a}\right) ^{t-a}\le \left( \frac{en}{t-t/\rho }\right) ^{t-a}\le \left( \frac{e}{1-1/\rho }\right) ^t\cdot \left( \frac{n}{t}\right) ^{t-a}. \end{aligned}$$

Also,

$$\begin{aligned}&\left( \frac{n}{t}\right) ^{t-a} \le \left( \frac{8n}{\varepsilon r}\right) ^{t-a}\le \left( \frac{9d^{\frac{\rho }{\rho -1}}}{\varepsilon f}\right) ^{t-a}\le \left( \frac{9d^{\frac{\rho }{\rho -1}}}{\varepsilon f}\right) ^{t-\frac{t}{\rho }+\frac{t\varepsilon }{\rho }}\\&\le \left( 9 d^{1+\frac{\varepsilon }{(\rho -1)}}\cdot (\varepsilon f)^{-1+\frac{1}{\rho }-\frac{\varepsilon }{\rho }}\right) ^{t}. \end{aligned}$$

Using that \(d^{\varepsilon }=d^{1/\log d}=2\), this implies that

$$\begin{aligned} \left( {\begin{array}{c}n\\ t-a\end{array}}\right) \le \big (c_1d\cdot (\varepsilon f)^{-1+\frac{1-\varepsilon }{\rho }}\big )^{t} \end{aligned}$$

holds for some \(c_1=c_1(\rho )>0\). On the other hand, we have \((\frac{2\rho r}{a})^{3t}\le (\frac{c_2}{\varepsilon ^{3}})^{t}\) for some \(c_2=c_2(\rho )\). Hence, in order to prove (2), it suffices to show that

$$\begin{aligned} (c_0d)^{1-\frac{1}{t}}\ge c_1 c_2\cdot d\cdot r^{\frac{1}{t}}\cdot \varepsilon ^{-4+\frac{1-\varepsilon }{\rho }}\cdot f^{-1+\frac{1-\varepsilon }{\rho }}. \end{aligned}$$
(3)
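
To see why (3) suffices, one can combine the two bounds above and take t-th roots. The following is a sketch in which the value of \(c_2\) is just one admissible choice and is not optimized. Since \(a\ge \frac{t}{\rho }(1-\varepsilon )\ge \frac{t}{2\rho }\) and \(t\ge \frac{r\varepsilon }{8}\), we have \(\frac{2\rho r}{a}\le \frac{4\rho ^2r}{t}\le \frac{32\rho ^2}{\varepsilon }\), so one may take \(c_2=(32\rho ^2)^3\). Then

$$\begin{aligned} \left( {\begin{array}{c}n\\ t-a\end{array}}\right) \cdot r\cdot \left( \frac{2\rho r}{a}\right) ^{3t}&\le \big (c_1d\cdot (\varepsilon f)^{-1+\frac{1-\varepsilon }{\rho }}\big )^{t}\cdot r\cdot \left( \frac{c_2}{\varepsilon ^{3}}\right) ^{t}\\&=\left( c_1c_2\cdot d\cdot r^{\frac{1}{t}}\cdot \varepsilon ^{-4+\frac{1-\varepsilon }{\rho }}\cdot f^{-1+\frac{1-\varepsilon }{\rho }}\right) ^{t}. \end{aligned}$$

The t-th root of the last expression is the right-hand side of (3), and the t-th root of \((c_0d)^{t-1}\) is \((c_0d)^{1-\frac{1}{t}}\).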

As \(t>10\log d\) and d is sufficiently large, we have \((c_0d)^{1-\frac{1}{t}}>\frac{c_0d}{2}\). Also, \(t\ge \frac{r}{8\log d}>\log r\), so \(r^{\frac{1}{t}}<3\). Finally, recalling that \(f=(\log d)^{C}\) and \(\varepsilon =\frac{1}{\log d}\), we get that (3) holds whenever C and d are sufficiently large in terms of \(\rho \). This completes the proof of the theorem. \(\square \)
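
The final step can be made quantitative as follows. This is a sketch with an unoptimized threshold for C. Writing \(\beta =\frac{1-\varepsilon }{\rho }\), we have \(0<\beta <\frac{1}{\rho }\), and since \(\varepsilon ^{-1}=\log d\) and \(f=(\log d)^{C}\),

$$\begin{aligned} \varepsilon ^{-4+\beta }\cdot f^{-1+\beta }=(\log d)^{4-\beta -C(1-\beta )}\le (\log d)^{4-C\left( 1-\frac{1}{\rho }\right) }. \end{aligned}$$

Together with \((c_0d)^{1-\frac{1}{t}}>\frac{c_0d}{2}\) and \(r^{\frac{1}{t}}<3\), inequality (3) follows as soon as \((\log d)^{C(1-\frac{1}{\rho })-4}\ge \frac{6c_1c_2}{c_0}\), which holds, for example, when \(C>\frac{4\rho }{\rho -1}\) and d is sufficiently large.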

Finally, we prove the following equivalent version of Theorem 1.4.

Theorem 2.14

For every \(\rho >1\) and \(0<\varepsilon <1\), there is a positive integer r such that the following holds for all sufficiently large n. Let G be an n-vertex graph with average degree \(d\ge n^{1-\frac{1}{\rho }+\varepsilon }\). Then G is not \((\rho ,r)\)-sparse.

Proof

Assume that r is sufficiently large in terms of \(\rho \) and \(\varepsilon \). Let \(t=\lceil \frac{4\rho }{\varepsilon } \rceil \) and let \(a=\lfloor \frac{t}{\rho }\rfloor -1\). It suffices to prove that the four conditions in Lemma 2.12 are satisfied.

Note that

$$\begin{aligned} \frac{a\rho }{t-1}\le \frac{(t/\rho -1)\rho }{t-1}=\frac{t-\rho }{t-1}=1-\frac{\rho -1}{t-1}\le 1-\frac{t}{r-t}, \end{aligned}$$

where the last inequality follows from the assumption that r is sufficiently large in terms of \(\rho \) and \(\varepsilon \). Hence, we have \(\rho \le (1-\frac{t}{r-t})\frac{t-1}{a}\) and condition 1 is satisfied.
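
For concreteness, the last inequality is equivalent to an explicit lower bound on r (assuming \(r>t\), so that the denominator is positive):

$$\begin{aligned} \frac{t}{r-t}\le \frac{\rho -1}{t-1}\quad \Longleftrightarrow \quad r\ge t+\frac{t(t-1)}{\rho -1}. \end{aligned}$$

Since t depends only on \(\rho \) and \(\varepsilon \), this is a condition on r in terms of \(\rho \) and \(\varepsilon \) alone.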

Condition 3 is immediate from the definition. Also, by the definitions of a and t, we have \(\frac{t}{\rho } \ge 4\) and hence \(a \ge \frac{t}{\rho }-2 \ge \frac{t}{2\rho }\), verifying condition 2.

Now let us verify condition 4. We have \(d\ge n^{1-\frac{1}{\rho }+\varepsilon }\) and \(\left( {\begin{array}{c}n\\ t-a\end{array}}\right) \le n^{t-a}\). Hence, it is enough to prove that

$$\begin{aligned} (c_0n^{1-\frac{1}{\rho }+\varepsilon })^{t-1}\ge n^{t-a}\cdot r\cdot \left( \frac{2\rho r}{a}\right) ^{3t}. \end{aligned}$$

Note that n is sufficiently large compared to the other parameters, so it suffices to prove the corresponding inequality for the exponents of n on both sides, i.e., that \((1-\frac{1}{\rho }+\varepsilon )(t-1)>t-a\). By the definition of t, \(\varepsilon t \ge 4\rho >4\). Hence

$$\begin{aligned} \left( 1-\frac{1}{\rho }+\varepsilon \right) (t-1)&=(t-1)-\frac{t-1}{\rho }+\varepsilon (t-1)\ge t-\frac{t}{\rho }+\varepsilon t-2\\&\ge t-a+\varepsilon t-4>t-a, \end{aligned}$$

and the proof is complete. \(\square \)
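
As an illustration only (the specific values are chosen here for convenience and are not taken from the argument above), consider \(\rho =2\) and \(\varepsilon =\frac{1}{2}\). Then \(t=\lceil \frac{4\rho }{\varepsilon }\rceil =16\) and \(a=\lfloor \frac{t}{\rho }\rfloor -1=7\), and the quantities appearing in the proof become

$$\begin{aligned} a=7&\ge \frac{t}{2\rho }=4,\\ \frac{a\rho }{t-1}=\frac{14}{15}&\le 1-\frac{t}{r-t}\quad \Longleftrightarrow \quad r\ge 256,\\ \left( 1-\frac{1}{\rho }+\varepsilon \right) (t-1)=15&>9=t-a. \end{aligned}$$

So in this instance condition 1 requires \(r\ge 256\); the remaining conditions of Lemma 2.12 may force r to be larger still.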

3 Concluding Remarks

3.1 Regular Subgraphs

In this paper, we proved nearly optimal bounds on the order of the smallest subgraph G[R] of average degree at least s in a graph G of given order and average degree. As mentioned in the introduction, when s is an integer, we believe that the strengthening of our results in which the average degree of G[R] is replaced by its minimum degree should also hold. Moreover, one can further strengthen this by requiring that G[R] contains an s-regular subgraph.

Conjecture 3.1

For every integer \(s\ge 3\), there is a constant \(C=C(s)\) such that the following holds for every sufficiently large n. Let G be an n-vertex graph with average degree at least d, where \(C\log \log n\le d\le n^{\frac{s-2}{s}}\). Then G contains an s-regular subgraph on at most \(nd^{-\frac{s}{s-2}}(\log n)^C\) vertices.

Note that in the special case where \(d\le (\log n)^{C(s-2)/s}\), Conjecture 3.1 asserts the existence of an s-regular subgraph, without a requirement on its order. This is the well-studied Erdős–Sauer problem, which was resolved very recently in [21]. It was shown there that for large \(C=C(s)\), every n-vertex graph with average degree at least \(C\log \log n\) has an s-regular subgraph. This is tight, as an old construction of Pyber, Rödl and Szemerédi [30] shows that, for some \(c>0\), there exist n-vertex graphs with average degree at least \(c\log \log n\) and no s-regular subgraph.
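
The threshold on d in this special case comes from comparing the bound in Conjecture 3.1 with n, as the following short computation shows.

$$\begin{aligned} nd^{-\frac{s}{s-2}}(\log n)^C\ge n\quad \Longleftrightarrow \quad d^{\frac{s}{s-2}}\le (\log n)^C\quad \Longleftrightarrow \quad d\le (\log n)^{\frac{C(s-2)}{s}}. \end{aligned}$$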

On the other hand, when d is very large and we are looking for a subgraph of bounded order, we have the following conjecture generalizing Theorem 1.4. This conjecture quantifies Problem 7.1 from [22].

Conjecture 3.2

For every integer \(s\ge 3\) and \(\varepsilon >0\), there is a positive integer t such that the following holds for all sufficiently large n. Let G be an n-vertex graph of average degree at least \( n^{1-\frac{2}{s}+\varepsilon }\). Then G contains an s-regular subgraph on at most t vertices.

As we remarked earlier, Conjecture 3.2 is known to be true if s is even [19], or \(s=3\) [20].

3.2 Uniform Hypergraphs

Another interesting direction one may explore is the analogous question for uniform hypergraphs, which was also considered by Feige and Wagner [15].

Problem 3.3

Let r, n be positive integers, and \(s,d>1\) be real numbers. Determine the asymptotic value of the smallest \(t=t_r(n,d,s)\) such that every r-uniform hypergraph on n vertices of average degree at least d contains a subhypergraph on at most t vertices of average degree at least s.

This problem is closely related to another well-known conjecture of Feige [14] about even covers of hypergraphs. An even cover of a hypergraph is a non-empty subhypergraph in which each vertex is contained in an even number of edges. Feige conjectured that every r-uniform hypergraph with n vertices and average degree d contains an even cover on at most \(nd^{-\frac{2}{r-2}}\,\text{polylog}(n)\) vertices; this conjecture was recently settled by Guruswami, Kothari, and Manohar [18]. Indeed, in order to find a small even cover, one needs to first guarantee a small subhypergraph of average degree at least 2.