1 Introduction

Let \(\mathbb {N}\) denote the set of nonnegative integers and let \(h, k, m \ge 2\) be integers. For an infinite set of positive integers A, let \(R_{A,h}(n)\) and \(R^{*}_{A,h}(n)\) denote the number of solutions of the equations

$$\begin{aligned}&a_{1} + a_{2} + \cdots + a_{h} = n, a_{1} \in A, \ldots , a_{h} \in A, a_{1}< a_{2}< \cdots {} < a_{h}, \\&a_{1} + a_{2} + \cdots + a_{h} = n, a_{1} \in A, \ldots , a_{h} \in A, a_{1} \le a_{2} \le \ldots {} \le a_{h}, \end{aligned}$$

respectively. A set of positive integers A is called a \(B_h[g]\) set if \(R^{*}_{A,h}(n) \le g\) for every positive integer n. A set of positive integers A is said to be a weak \(B_h[g]\) set if \(R_{A,h}(n) \le g\) for every positive integer n. We say a set A of nonnegative integers is a basis of order m if every nonnegative integer can be represented as the sum of m terms from A, i.e., \(R_{A,m}(n) > 0\) for every nonnegative integer n. Throughout the paper, we denote the cardinality of a finite set A by |A| and we put

$$\begin{aligned} A(n) = \sum _{\overset{a \in A}{a \le n}}1. \end{aligned}$$
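For illustration, both counting functions can be computed by brute force on a finite truncation of A. The following Python sketch (function names ours) verifies that for the squares, \(50 = 1 + 49 = 25 + 25\) gives \(R^{*}_{A,2}(50) = 2\) while \(R_{A,2}(50) = 1\):

```python
from itertools import combinations, combinations_with_replacement

def R_strict(A, h, n):
    # R_{A,h}(n): solutions with a_1 < a_2 < ... < a_h, all a_i in A
    return sum(1 for c in combinations(sorted(A), h) if sum(c) == n)

def R_star(A, h, n):
    # R*_{A,h}(n): solutions with a_1 <= a_2 <= ... <= a_h (repetition allowed)
    return sum(1 for c in combinations_with_replacement(sorted(A), h) if sum(c) == n)

squares = [m * m for m in range(1, 8)]                    # 1, 4, 9, ..., 49
print(R_star(squares, 2, 50), R_strict(squares, 2, 50))   # prints: 2 1
```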

Furthermore, we write \(\mathbb {N}^{k} = \{0^{k}, 1^{k}, 2^{k}, \ldots {}\}\) and \((\mathbb {Z}^{+})^{k} = \{1^{k}, 2^{k}, 3^{k}, \ldots {}\}\). The investigation of the existence of a basis formed by perfect powers is a classical problem in Number Theory. For instance, the solution of Waring's problem asserts that \(\mathbb {N}^{k}\) is a basis of order m if m is sufficiently large compared to the power k. A few years ago, this was sharpened [17] by proving the existence of a sparse basis formed by perfect powers. More precisely,

Theorem 1

(V.H. Vu) For any fixed \(k \ge 2\), there is a constant \(m_{0}(k)\) such that if \(m > m_{0}(k)\), then there exists a basis \(A \subset \mathbb {N}^{k}\) of order m such that \(A(x) \ll x^{1/m}\log ^{1/m}x\).

Obviously, if A is a basis of order m, then \(A(x)^{m} > \left( {\begin{array}{c}A(x)\\ m\end{array}}\right) \ge x + 1\), which yields \(A(x) \gg x^{1/m}\).

It is natural to ask whether there exists a \(B_{h}[g]\) set formed by k-th powers such that A(x) is as large as possible. We now show that the best possible exponent is \(\min \left\{ \frac{1}{k},\frac{1}{h}\right\} \). It is clear that \(A(x) \le x^{1/k}\). On the other hand, if A is a \(B_{h}[g]\) set, then

$$\begin{aligned} hgx \ge \sum _{n=1}^{hx}R^{*}_{A,h}(n) \ge \left( {\begin{array}{c}A(x)\\ h\end{array}}\right) \ge \left( \frac{(A(x)-(h-1))^{h}}{h!}\right) \end{aligned}$$

and so \(A(x) \le \root h \of {hgx\cdot h!}+h-1\), which implies that

$$\begin{aligned} A(x)\ll x^{\min \left\{ \frac{1}{k},\frac{1}{h}\right\} }. \end{aligned}$$

Next, we show that in the special case \(k = h = 2\) this can be improved. According to a well-known theorem of Landau [12], the number of positive integers up to a large x that can be written as the sum of two squares is asymptotically \(\frac{cx}{\sqrt{\log x}}\), where c is the Landau–Ramanujan constant. On the other hand, if A is a \(B_{2}[g]\) set formed by squares, then every pair of elements of A below x sums to an integer below 2x that is the sum of two squares, and each such integer arises from at most g pairs. Then, we have

$$\begin{aligned} \left( {\begin{array}{c}A(x)\\ 2\end{array}}\right) \le \sum _{n=1}^{2x}R^{*}_{A,2}(n) \le g(c+o(1))\frac{2x}{\sqrt{\log 2x}}, \end{aligned}$$

which gives

$$\begin{aligned} A(x) \ll \frac{\sqrt{x}}{\root 4 \of {\log x}}. \end{aligned}$$
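Landau's count can be checked numerically; the sketch below (helper name ours) compares the exact count up to \(x = 10^{4}\) with the main term \(\frac{cx}{\sqrt{\log x}}\), using the known value \(c \approx 0.76422\). The agreement is only rough at this height, since the convergence is slow:

```python
import math

def is_sum_of_two_squares(n):
    # n = a^2 + b^2 with integers a, b >= 0
    return any(math.isqrt(n - a * a) ** 2 == n - a * a
               for a in range(math.isqrt(n) + 1))

X = 10_000
count = sum(1 for n in range(1, X + 1) if is_sum_of_two_squares(n))
main_term = 0.76422 * X / math.sqrt(math.log(X))   # Landau-Ramanujan constant
print(count, round(main_term))
```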

In view of the above observations, we can formulate the following conjecture.

Conjecture 1

For every \(k \ge 1\), \(h \ge 2\), \(\varepsilon > 0\), there exists a \(B_{h}[g]\) set \(A \subseteq (\mathbb {Z}^{+})^{k}\) such that

$$\begin{aligned} A(x) \gg x^{\min \left\{ \frac{1}{k},\frac{1}{h}\right\} -\varepsilon }. \end{aligned}$$

The above conjecture was proved by Erdős and Rényi [5] when \(k = 1\), \(h = 2\). It was also proved [3, 7] when \(k = 1\), \(h > 2\). It is clear that if Conjecture 1 holds for \(h = k\), then it holds for every \(2 \le h \le k\) as well. Furthermore, it was proved in [2] by the probabilistic method that for any positive integer g and \(\varepsilon > 0\), there exists a \(B_{2}[g]\) set A of squares such that \(A(x) \gg x^{\frac{g}{2g+1}-\varepsilon } = x^{\frac{1}{2}-\frac{1}{4g+2}-\varepsilon }\). This implies Conjecture 1 for \(h = k = 2\).

Moreover, a conjecture of Lander, Parkin, and Selfridge [13] asserts that if the diophantine equation \(\sum _{i=1}^{n}x_{i}^{k} = \sum _{j=1}^{m}y_j^{k}\), where \(x_{i} \ne y_{j}\) for all \(1 \le i \le n\) and \(1 \le j \le m\), has a nontrivial solution, then \(n + m \ge k\). If \(h < \frac{k}{2}\), this conjecture clearly implies Conjecture 1. It follows from Theorem 412 in [10] that the number of solutions of \(a^{3} + b^{3} = c^{3} + d^{3}\) can be made arbitrarily large; hence, the set of cubes is not a \(B_{2}[g]\) set for any g. It is also known [15] that given any real solution of the equation \(a^{4} + b^{4} = c^{4} + d^{4}\), there is a rational solution arbitrarily close to it, which implies that the fourth powers cannot form a \(B_{2}[1]\) set. It may happen that they form a \(B_{2}[2]\) set. As far as we know, it is not known whether the equation \(a^{5} + b^{5} = c^{5} + d^{5}\) has any nontrivial solution; it is conjectured that the fifth powers form a \(B_{2}[1]\) set [7, D1]. More generally, Hypothesis K of Hardy and Littlewood [9] asserts that if \(h = k\), then \(R_{(\mathbb {Z}^{+})^{k},k}(n) = O(n^{\varepsilon })\) for every \(\varepsilon > 0\). The conjecture is true for \(k = 2\) [11, Theorem 7.6], and Mahler [14] proved that it is false for \(k = 3\). The conjecture is still open for \(k \ge 4\) [16].
In this paper, we prove that if Hypothesis K holds, then there exists a set A of positive perfect powers as dense as in Conjecture 1 such that \(R_{A,h}(n)\) is bounded.

Theorem 2

Let k be a positive integer. Assume that for some \(2 \le h \le k\) and for every \(\eta > 0\), there exists a positive integer \(n_{0}(\eta )\) such that for every \(n \ge n_{0}(\eta )\), \(R_{(\mathbb {Z}^{+})^{k},h}(n) < n^{\eta }\). Then for every \(\varepsilon > 0\), there exists a set \(A \subseteq (\mathbb {Z}^{+})^{k}\) such that \(R_{A,h}(n)\) is bounded and

$$\begin{aligned} A(x) \gg x^{\frac{1}{k}-\varepsilon } = x^{\min \{\frac{1}{k},\frac{1}{h}\}-\varepsilon }. \end{aligned}$$

If \(k \ge 2\) is even, then it is clear from [11, Theorem 7.6] that \(R^{*}_{(\mathbb {Z}^{+})^{k},2}(n) \le R^{*}_{(\mathbb {Z}^{+})^{2},2}(n) = n^{o(1)}\). If \(k \ge 2\) is odd, then \(a + b\) divides \(a^{k} + b^{k} = n\). Moreover, for every divisor d of n, there is at most one pair (a, b), \(1 \le a < b\), such that \(a + b = d\) and \(a^{k} + b^{k} = n\), because the function \(f(x) = x^{k} + (d-x)^{k}\) is continuous and strictly decreasing for \(0 \le x \le \frac{d}{2}\). It follows that \(R^{*}_{(\mathbb {Z}^{+})^{k},2}(n) \le d(n) = n^{o(1)}\), where d(n) is the number of positive divisors of n. As a corollary, we get the following.

Corollary 1

For every \(k \ge 2\), \(\varepsilon > 0\), there exists a set \(A \subseteq (\mathbb {Z}^{+})^{k}\) such that \(R_{A,2}(n)\) is bounded and

$$\begin{aligned} A(x) \gg x^{\frac{1}{k}-\varepsilon } = x^{\min \{\frac{1}{k},\frac{1}{2}\}-\varepsilon }. \end{aligned}$$
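The divisor argument above is effectively an algorithm: for odd k, every representation \(n = a^{k} + b^{k}\), \(a < b\), determines the divisor \(d = a + b\) of n, and for each d the unique candidate pair can be found by binary search on the decreasing function \(f(x) = x^{k} + (d-x)^{k}\). A Python sketch (function name ours), which also recovers the two representations of 1729 exhibiting that the cubes are not a \(B_{2}[1]\) set:

```python
def reps_two_kth_powers_odd(n, k=3):
    # For odd k, a + b divides a^k + b^k = n; for a fixed divisor d = a + b
    # there is at most one pair, since f(x) = x^k + (d - x)^k is strictly
    # decreasing on [0, d/2].  Binary search finds it.
    reps = []
    for d in (d for d in range(2, n + 1) if n % d == 0):
        lo, hi = 1, (d - 1) // 2             # a < b forces a < d/2
        while lo <= hi:
            a = (lo + hi) // 2
            v = a ** k + (d - a) ** k
            if v == n:
                reps.append((a, d - a))
                break
            elif v > n:                      # f decreasing: value too big -> larger a
                lo = a + 1
            else:
                hi = a - 1
    return reps

print(reps_two_kth_powers_odd(1729))         # prints: [(1, 12), (9, 10)]
```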

In contrast, we do not even know whether there exists \(A \subseteq (\mathbb {Z}^{+})^{2}\) such that \(R_{A,3}(n)\) is bounded and \(A(x) \gg x^{\frac{1}{3}-\varepsilon }\).

Problem 1

Is it true that for every \(\varepsilon > 0\) there exists a set \(A \subseteq (\mathbb {Z}^{+})^{2}\) such that \(A(x) \gg x^{\frac{1}{3}-\varepsilon }\) and \(R_{A,3}(n)\) is bounded?

Theorem 3

For every \(k \ge 2\), there exists a positive integer \(h_{0}(k) = O(8^{k}k^{2})\) such that for every \(h \ge h_{0}(k)\) and for every \(\varepsilon > 0\), there exists a set \(A \subseteq (\mathbb {Z}^{+})^{k}\) such that \(R_{A,h}(n)\) is bounded and

$$\begin{aligned} A(x) \gg x^{\frac{1}{h}-\varepsilon } = x^{\min \{\frac{1}{k},\frac{1}{h}\}-\varepsilon }. \end{aligned}$$

If f(x) and g(x) are real-valued functions, then we denote \(f(x) = O(g(x))\) by \(f(x) \ll g(x)\). Before we prove Theorems 2 and 3, we give a short survey of the probabilistic method we will use.

2 Probabilistic and combinatorial tools

The proofs of Theorems 2 and 3 are based on the probabilistic method due to Erdős and Rényi. There is an excellent summary of the probabilistic method in the Halberstam–Roth book [8]. First, we give a survey of the probabilistic tools and notations which we use in the proofs of Theorems 2 and 3. Let \(\Omega \) denote the set of the strictly increasing sequences of positive integers. In this paper, we denote the probability of an event E by \(\mathbb {P}(E)\) and the expectation of a random variable X by \(\mathbb {E}(X)\).

Lemma 1

Let

$$\begin{aligned} \alpha _{1}, \alpha _{2}, \alpha _{3}, \ldots {} \end{aligned}$$

be real numbers satisfying

$$\begin{aligned} 0 \le \alpha _{n} \le 1 (n = 1, 2, \ldots {}). \end{aligned}$$

Then there exists a probability space (\(\Omega \), \(\mathcal {S}\), \(\mathbb {P}\)) with the following two properties:

  1. (i)

    For every natural number n, the event \(E^{(n)} = \{A\): \(A \in \Omega \), \(n \in A\}\) is measurable, and \(\mathbb {P}(E^{(n)}) = \alpha _{n}\).

  2. (ii)

    The events \(E^{(1)}\), \(E^{(2)},\ldots \) are independent.

See Theorem 13 in [8, p. 142]. We denote the characteristic function of the event \(E^{(n)}\) by

$$\begin{aligned} \varrho (A, n) = \left\{ \begin{array}{ll} 1, &{} \quad \text {if}\; n \in A, \\ 0, &{} \quad \text {if} \;n \notin A. \end{array} \right. \end{aligned}$$

Thus

$$\begin{aligned} A(n) = \sum _{j=1}^{n}\varrho (A, j). \end{aligned}$$
(1)

Furthermore, we denote the number of solutions of \(a_{i_{1}} + a_{i_{2}} + \cdots {} + a_{i_{h}} = n\) by \(R_{A,h}(n)\), where \(a_{i_{1}} \in A\), \(a_{i_{2}} \in A\), ..., \(a_{i_{h}} \in A\), \(1 \le a_{i_{1}}< a_{i_{2}}< \cdots {}< a_{i_{h}} < n\). Thus

$$\begin{aligned} R_{A,h}(n) = \sum _{\begin{array}{c} {{a_{1}, a_{2}, \ldots {}, a_{h} \in \mathbb {N}}}\\ {1 \le a_{1}< \cdots {}< a_{h} < n}\\ {a_{1} + a_{2} + \cdots {} + a_{h} = n} \end{array}}\varrho (A, a_{1})\varrho (A, a_{2}) \ldots {} \varrho (A, a_{h}). \end{aligned}$$
(2)

We will use the following special case of Chernoff’s inequality (Corollary 1.9 in [1]).

Lemma 2

If \(t_i\)’s are independent Boolean random variables (i.e., every \(t_{i} \in \{0,1\}\)) and \(X = t_1 + \cdots {} + t_n\), then for any \(\delta > 0\), we have

$$\begin{aligned} \mathbb {P}\big (|X - \mathbb {E}(X)| \ge \delta \mathbb {E}(X)\big ) \le 2e^{-\min (\delta ^{2}/4, \delta /2)\mathbb {E}(X)}. \end{aligned}$$
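As a quick numerical sanity check of Lemma 2 (a simulation, not a proof; parameters ours), one can compare the empirical deviation frequency of a Bernoulli sum with the stated bound:

```python
import math
import random

random.seed(2024)                       # reproducible simulation
n, p, delta = 1000, 0.5, 0.2
EX = n * p
# Chernoff bound of Lemma 2 for X = t_1 + ... + t_n, t_i Boolean with mean p
bound = 2 * math.exp(-min(delta ** 2 / 4, delta / 2) * EX)

trials = 2000
hits = sum(
    abs(sum(random.random() < p for _ in range(n)) - EX) >= delta * EX
    for _ in range(trials)
)
print(hits / trials, bound)
```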

The following lemma is the well-known Borel–Cantelli lemma.

Lemma 3

Let (\(\mathcal {X}\), \(\mathcal {S}\), \(\mathbb {P}\)) be a probability space and let \(F_{1}\), \(F_{2}\), ... be a sequence of measurable events. If

$$\begin{aligned} \sum _{j=1}^{+\infty }\mathbb {P}(F_{j}) < +\infty , \end{aligned}$$

then with probability 1, at most a finite number of the events \(F_{j}\) can occur.

See [8], p. 135. Finally, we need the following combinatorial result due to Erdős and Rado, see [4]. Let \(r \ge 3\) be a positive integer. A collection of sets \(A_{1}, A_{2}, \ldots {}, A_{r}\) forms a \(\Delta \)-system if the sets have pairwise the same intersection (i.e., if \(A_{i} \cap A_{j} = A_{k} \cap A_{l}\) for all \(i\ne j\), \(k\ne l\)).

Lemma 4

If H is a collection of sets of size at most k and \(|H| > (r - 1)^{k}k!\), then H contains r sets forming a \(\Delta \)-system.
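Lemma 4 can be explored by brute force on small families (Python sketch, function name ours). Note that the family below is smaller than the threshold \((r-1)^{k}k!\), so the lemma does not guarantee the \(\Delta \)-system that the search happens to find:

```python
from itertools import combinations

def find_delta_system(H, r):
    # Return r sets of H whose pairwise intersections all coincide, if any.
    for family in combinations(H, r):
        if len({frozenset(a & b) for a, b in combinations(family, 2)}) == 1:
            return family
    return None

H = [{1, 2}, {1, 3}, {1, 4}, {2, 3}]
print(find_delta_system(H, 3))   # prints: ({1, 2}, {1, 3}, {1, 4})
```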

3 Proof of Theorem 2

In the first step, we prove that for any random set A, if the expectation of \(R_{A,h}(n)\) is small, then it is almost always bounded.

Lemma 5

Let \(h \ge 2\) and \(\varepsilon > 0\). Consider a random set \(A \subset \mathbb {Z}^{+}\) defined by \(\alpha _{n} = \mathbb {P}(n\in A)\). If \(\mathbb {E}(R_{A,l}(n)) \ll n^{-\varepsilon }\) for every \(2 \le l \le h\), then \(R_{A,h}(n)\) is bounded with probability 1.

Proof

We show similarly as in [6] that with probability 1, \(R_{A,h}(n)\) is bounded by a constant. For each representation \(y_{1} + \cdots {} + y_{h} = n\), \(y_{1}< \cdots {} < y_{h}\), \(y_{1}, \ldots ,y_{h} \in \mathbb {Z}^{+}\), we assign a set \(S = \{y_{1}, \ldots ,y_{h}\}\). We say two representations \(y_{1} + \cdots {} + y_{h} = z_{1} + \cdots {} + z_{h} = n\) are disjoint if the assigned sets \(S_{1} = \{y_{1}, \ldots ,y_{h}\}\) and \(S_{2} = \{z_{1}, \ldots ,z_{h}\}\) are disjoint.

For \(2 \le l \le h\) and a set of positive integers B, let \(f_{B,l}(n) = f_{l}(n)\) denote the maximum number of pairwise disjoint representations of n as the sum of l distinct terms from B. Let

$$\begin{aligned} \mathcal {B} = \{(y_{1}, \ldots {}, y_{l})\in \mathbb {Z}^{+}\times \cdots {} \times \mathbb {Z}^{+}: y_{1} + \cdots {} + y_{l} = n, 1 \le y_{1}< \cdots< y_{l} < n\}, \end{aligned}$$

and let \(H(\mathcal {B})\) denote the family of those subsets \(\mathcal {T} \subset \mathcal {B}\) whose assigned sets are pairwise disjoint. It is clear that the pairwise disjointness of the sets implies the independence of the associated events, i.e., if \(S_1\) and \(S_2\) are disjoint representations, then the events \(S_1 \subset A\) and \(S_2 \subset A\) are independent. On the other hand, for a fixed \(2 \le l \le h\), let \(E_{n}\) denote the event

$$\begin{aligned} E_{n} = \{A: A \in \Omega , f_{A,l}(n) > G\} \end{aligned}$$

for some G and write

$$\begin{aligned} \mathcal {F} = \Omega \setminus \bigcap _{i=1}^{+\infty }\Big (\bigcup _{n=i}^{+\infty }E_{n}\Big ). \end{aligned}$$

As a result, we see that \(A \in \mathcal {F}\) if and only if there exists a number \(n_{1} = n_{1}(A)\) such that we have

$$\begin{aligned} f_{A,l}(n) \le G \quad \text {for } n \ge n_{1}. \end{aligned}$$

We will prove that \(\mathbb {P}(\mathcal {F}) = 1\) if \(G = \left\lceil \frac{1}{\varepsilon }\right\rceil + 1\). Clearly,

$$\begin{aligned} \mathbb {P}(f_{l}(n) > G)&\le \mathbb {P}\Big (\bigcup _{\overset{\mathcal {T} \in H(\mathcal {B})}{|\mathcal {T}| = G+1}}\bigcap _{S \in \mathcal {T}}(S\subset A)\Big ) \le \sum _{\overset{\mathcal {T} \in H(\mathcal {B})}{|\mathcal {T}| = G+1}}\mathbb {P}\Big (\bigcap _{S \in \mathcal {T}}(S\subset A)\Big ) \\&= \sum _{\begin{array}{c} \{S_{1}, \ldots ,S_{G+1}\}\\ \text {pairwise disjoint} \end{array}}\mathbb {P}\big ((S_{1}\subset A) \cap \cdots {} \cap (S_{G+1}\subset A)\big ) \\&= \sum _{\begin{array}{c} \{S_{1}, \ldots {} ,S_{G+1}\}\\ \text {pairwise disjoint} \end{array}}\mathbb {P}(S_{1}\subset A)\cdots {} \mathbb {P}(S_{G+1}\subset A)\\&\le \left( \sum _{S\in \mathcal {B}}\mathbb {P}(S\subset A)\right) ^{G+1} = \left( \mathbb {E}(R_{A,l}(n))\right) ^{G+1} \ll n^{-(G+1)\varepsilon } \ll n^{-1-\varepsilon }. \end{aligned}$$

Using the Borel–Cantelli lemma, it follows that with probability 1, for \(2 \le l \le h\), there exists an \(n_{l}\) such that

$$\begin{aligned} f_{l}(n) \le G \quad \text {for } n > n_{l}. \end{aligned}$$

On the other hand, for any \(n \le n_{l}\), there are only finitely many representations of n as a sum of l terms. Therefore, with probability 1, for every \(2 \le l \le h\) there exists a \(c_{l}\) such that \(f_{l}(n) < c_{l}\) for every n. Then \(c_{\max } = \max _{l}\{c_{l}\}\) exists with probability 1. Now we show, similarly as in [6], that almost always there exists a constant \(c = c(A)\) such that \(R_{A,h}(n) < c\) for every n. Suppose that for some positive integer m,

$$\begin{aligned} R_{A,h}(m) > (c_{\max })^{h}h! \end{aligned}$$

with positive probability. Let H be the set of representations of m as the sum of h distinct terms from A. Then \(|H| = R_{A,h}(m) > (c_{\max })^{h}h!\), thus by Lemma 4, H contains a \(\Delta \)-system \(\{S_{1}, \ldots ,S_{c_{\max } +1}\}\). If \(S_{1} \cap \cdots {} \cap S_{c_{\max } +1} = \emptyset \), then \(S_{1}, \ldots {} ,S_{c_{\max } +1}\) form a family of disjoint representations of m as the sum of h terms, which contradicts the definition of \(c_{\max }\). Otherwise let \(S_{1} \cap \cdots {} \cap S_{c_{\max } +1} = \{x_{1}, \ldots ,x_{r}\} = S\), where \(0< r < h - 1\). If \(\sum _{i=1}^{r}x_{i} = t\), then \(S_{1}\setminus S, \ldots {} ,S_{c_{\max } +1}\setminus S\) form a family of disjoint representations of \(m - t\) as the sum of \(h - r\) terms. It follows that \(f_{h-r}(m-t) \ge c_{\max } + 1 > c_{h-r}\), which is impossible because of the definition of \(c_{\max }\). As a result, we see that \(R_{A,h}(m) \le (c_{\max })^{h}h!\), which implies that \(R_{A,h}(n)\) is bounded with probability 1. \(\square \)
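For tiny instances, the quantity \(f_{B,l}(n)\) from the proof can be computed directly (Python sketch, exponential time, function name ours):

```python
from itertools import combinations

def f(B, l, n):
    # f_{B,l}(n): maximum number of pairwise disjoint representations of n
    # as the sum of l distinct terms from B (exponential search; tiny inputs).
    reps = [frozenset(c) for c in combinations(sorted(B), l) if sum(c) == n]

    def best(i, used):
        if i == len(reps):
            return 0
        take = 0
        if not (reps[i] & used):             # representation still disjoint
            take = 1 + best(i + 1, used | reps[i])
        return max(take, best(i + 1, used))  # take it or skip it

    return best(0, frozenset())

print(f({1, 2, 3, 4, 5, 6}, 2, 7))   # prints: 3  ({1,6}, {2,5}, {3,4})
```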

Remark 1

It follows from the proof of Lemma 5 that the representation function \(R_{A,h}(n)\) is bounded if and only if \(f_{l}(n)\) is bounded for every \(2 \le l \le h\).

Lemma 6

Consider a random set \(A \subset \mathbb {Z}^{+}\) defined by \(\alpha _{n} = \mathbb {P}(n\in A)\). If

$$\begin{aligned} \lim _{x \rightarrow \infty }\frac{\mathbb {E}(A(x))}{\log x} = +\infty , \end{aligned}$$

then \(A(x) \sim \mathbb {E}(A(x))\), with probability 1.

Proof

It is clear from (1) that A(x) is the sum of independent Boolean random variables. Let

$$\begin{aligned} \delta = \sqrt{\frac{8\log x}{\mathbb {E}(A(x))}}. \end{aligned}$$

If x is large enough, then \(\delta < 2\), so \(\min (\delta ^{2}/4, \delta /2) = \delta ^{2}/4\); thus it follows from Lemma 2 that

$$\begin{aligned}&\mathbb {P}\left( |A(x) - \mathbb {E}(A(x))| \ge \sqrt{\frac{8\log x}{\mathbb {E}(A(x))}}\cdot \mathbb {E}(A(x))\right) \\&\quad \le 2\exp \left( -\frac{1}{4}\cdot \frac{8\log x}{\mathbb {E}(A(x))}\cdot \mathbb {E}(A(x)) \right) = \frac{2}{x^{2}}. \end{aligned}$$

Since \(\sum _{x=1}^{\infty }\frac{2}{x^{2}}\) converges, by the Borel–Cantelli lemma, we have

$$\begin{aligned} |A(x) - \mathbb {E}(A(x))| < \sqrt{\frac{8\log x}{\mathbb {E}(A(x))}}\cdot \mathbb {E}(A(x)) \end{aligned}$$

with probability 1, for every x large enough. Since

$$\begin{aligned} \sqrt{\frac{8\log x}{\mathbb {E}(A(x))}} = o(1), \end{aligned}$$

as \(x\rightarrow \infty \), the statement of Lemma 6 follows. \(\square \)
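Lemma 6 can be illustrated by simulation: taking \(\alpha _{n} = n^{-1/2}\) (an illustrative choice of ours), so that \(\mathbb {E}(A(x)) \sim 2\sqrt{x} \gg \log x\), a sampled A(x) tracks its expectation closely (Python sketch, seeded for reproducibility):

```python
import random

random.seed(1)
x = 100_000
# membership probabilities alpha_n = n^{-1/2}; E(A(x)) ~ 2*sqrt(x) >> log x
A = [n for n in range(1, x + 1) if random.random() < n ** -0.5]
Ax = len(A)
EAx = sum(n ** -0.5 for n in range(1, x + 1))
print(Ax, EAx)
```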

Now we are ready to prove Theorem 2. In the first step, we show that for every \(2 \le l \le h\) and \(0< \kappa < \frac{1}{k}\), there exists an \(n_{0}(\kappa ,l)\) such that

$$\begin{aligned} R_{(\mathbb {Z}^{+})^{k},l}(n) < n^{\kappa } \end{aligned}$$
(3)

for every \(n > n_{0}(\kappa ,l)\). We prove it by contradiction. Suppose that there exists a constant \(c > 0\) and a \(0< \kappa < \frac{1}{k}\) such that

$$\begin{aligned} R_{(\mathbb {Z}^{+})^{k},l}(n) > cn^{\kappa } \end{aligned}$$

for infinitely many n. Pick such a large n and consider \(u = \lfloor cn^{\kappa } \rfloor + 1\) different representations \(n = a^{k}_{1,1} + a^{k}_{1,2} + \cdots {} + a^{k}_{1,l}, n = a^{k}_{2,1} + a^{k}_{2,2} + \cdots {} + a^{k}_{2,l}, \ldots {} ,n = a^{k}_{u,1} + a^{k}_{u,2} + \cdots {} + a^{k}_{u,l}\), where \(a_{i,1}< a_{i,2}< \cdots {} < a_{i,l}\) are positive integers for every \(1 \le i \le u\). Since the \(a_{i,j}\) exclude at most \(ul \ll n^{\kappa } = o(n^{1/k})\) values, for large n there exist positive integers \(1 \le b_{1}< b_{2}< \cdots {} < b_{h-l} \le n^{1/k}\) such that \(b_{v} \ne a_{i,j}\) for every \(1 \le v \le h-l\), \(1 \le i \le u\), and \(1 \le j \le l\). Then, we have

$$\begin{aligned} R_{(\mathbb {Z}^{+})^{k},h}(n+b^{k}_{1} + \cdots {} + b^{k}_{h-l}) \ge cn^{\kappa } = \frac{c}{h^{\kappa }}(nh)^{\kappa } \ge \frac{c}{h^{\kappa }}(n+b^{k}_{1} + \cdots {} + b^{k}_{h-l})^{\kappa }. \end{aligned}$$

If we denote \(m = n + b^{k}_{1} + \cdots {} + b^{k}_{h-l}\), then \(R_{(\mathbb {Z}^{+})^{k},h}(m) > \frac{c}{h^{\kappa }}m^{\kappa }\) for infinitely many m. It follows that there exist infinitely many m such that

$$\begin{aligned} R_{(\mathbb {Z}^{+})^{k},h}(m) > m^{\kappa /2}. \end{aligned}$$

In view of the hypothesis in Theorem 2, we get a contradiction.
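For \(k = l = 2\), estimate (3) can be observed numerically: the following sketch (helper name ours) locates the integer below \(10^{4}\) with the most representations as a sum of two distinct squares and checks that the count stays far below \(n^{1/2}\):

```python
import math

def R2_squares(n):
    # number of representations n = a^2 + b^2 with 0 < a < b
    count = 0
    for a in range(1, math.isqrt(n // 2) + 1):
        b2 = n - a * a
        b = math.isqrt(b2)
        if b * b == b2 and b > a:
            count += 1
    return count

worst = max(range(1, 10_001), key=R2_squares)
print(worst, R2_squares(worst))   # maximum count stays far below n^{1/2}
```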

Next, for an \(\varepsilon > 0\), we define the random set A by

$$\begin{aligned} \mathbb {P}(n \in A) = \left\{ \begin{array}{ll} \frac{1}{n^{\varepsilon }}, &{} \quad \text {if}\; n \in (\mathbb {Z}^{+})^{k}\\ 0, &{} \quad \text {if}\; n \notin (\mathbb {Z}^{+})^{k}. \end{array}\right. \end{aligned}$$

Then, in view of (2) and since the events \(i\in A\) and \(j\in A\) are independent for all \(i \ne j\), for every \(2 \le l \le h\), we have

$$\begin{aligned} \mathbb {E}(R_{A,l}(n)) = \sum _{\begin{array}{c} {x_{1}, \ldots {} ,x_{l}\in (\mathbb {Z}^{+})^{k}}\\ {1 \le x_{1}< \cdots {}< x_{l}} \\ {x_{1} + \cdots {} + x_{l} = n} \end{array}}\mathbb {P}(x_{1}, \ldots {}, x_{l} \in A) = \sum _{\begin{array}{c} {x_{1}, \ldots {} ,x_{l}\in (\mathbb {Z}^{+})^{k}}\\ {1 \le x_{1}< \cdots {} < x_{l}} \\ {x_{1} + \cdots {} + x_{l} = n} \end{array}}\frac{1}{(x_{1}\ldots {} x_{l})^{\varepsilon }}. \end{aligned}$$

Moreover, \(x_{l} \ge \frac{n}{l}\), so

$$\begin{aligned} \mathbb {E}(R_{A,l}(n))\ll \frac{1}{n^{\varepsilon }}\cdot R_{(\mathbb {Z}^{+})^{k},l}(n) \ll \frac{1}{n^{\varepsilon }}\cdot n^{\varepsilon /2} = n^{-\varepsilon /2}, \end{aligned}$$

where the last inequality follows from (3) applied with \(\kappa = \varepsilon /2\) (we may assume \(\varepsilon < \frac{2}{k}\)). It follows from Lemma 5 that \(R_{A,h}(n)\) is almost always bounded. In the next step, we prove that A is as dense as desired. Applying the Euler–Maclaurin integral formula,

$$\begin{aligned} \mathbb {E}(A(x)) = \sum _{m \le x^{1/k}}\frac{1}{(m^{k})^{\varepsilon }} = \int _{0}^{x^{1/k}}t^{-k\varepsilon }\mathrm{d}t + O(1) = \frac{1}{1-k\varepsilon }x^{\frac{1}{k}-\varepsilon } + O(1). \end{aligned}$$

By Lemma 6, assuming \(\varepsilon < \frac{1}{k}\), we get that

$$\begin{aligned} A(x) \gg x^{\frac{1}{k}-\varepsilon } \end{aligned}$$

with probability 1. The proof of Theorem 2 is complete.
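The Euler–Maclaurin estimate for \(\mathbb {E}(A(x))\) above is easy to check numerically; the sketch below uses the illustrative parameters \(k = 2\), \(\varepsilon = 0.1\) (ours; any \(0< \varepsilon < \frac{1}{k}\) behaves alike):

```python
# Compare the exact sum defining E(A(x)) with its main term x^{1/k - eps}/(1 - k*eps).
k, eps, x = 2, 0.1, 10 ** 8
expectation = sum(m ** (-k * eps) for m in range(1, int(x ** (1 / k)) + 1))
main_term = x ** (1 / k - eps) / (1 - k * eps)
print(expectation / main_term)   # close to 1
```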

4 Proof of Theorem 3

As \(h > k\), we define the random set A by

$$\begin{aligned} \mathbb {P}(n \in A) = \left\{ \begin{array}{ll} \frac{1}{n^{\frac{1}{k}-\frac{1}{h}+\varepsilon }}, &{}\quad \text {if}\; n \in (\mathbb {Z}^{+})^{k},\\ 0, &{}\quad \text {if}\;n \notin (\mathbb {Z}^{+})^{k}. \end{array} \right. \end{aligned}$$

First, for every \(2 \le l \le h\), we give an upper estimate for \(\mathbb {E}(R_{A,l}(n))\), where

$$\begin{aligned} \mathbb {E}(R_{A,l}(n)) = \sum _{\begin{array}{c} {x_{1}, \ldots {} ,x_{l}\in (\mathbb {Z}^{+})^{k}}\\ {1 \le x_{1}< \cdots {} < x_{l}} \\ {x_{1} + \cdots {} + x_{l} = n} \end{array}}\mathbb {P}(x_{1}, \ldots {}, x_{l} \in A). \end{aligned}$$
(4)

We prove that there exists \(h_{1}(k)\) such that for every \(h \ge h_{1}(k)\) and \(2 \le l \le h\), we have

$$\begin{aligned} \sum _{\begin{array}{c} {x_{1}, \ldots {} ,x_{l}\in (\mathbb {Z}^{+})^{k}}\\ {1 \le x_{1}< \cdots {} < x_{l}} \\ {x_{1} + \cdots {} + x_{l} = n} \end{array}}\mathbb {P}(x_{1}, \ldots {}, x_{l} \in A) \ll n^{-\varepsilon }. \end{aligned}$$

Assume that \(l \le \frac{h}{k}\). Then, we have

$$\begin{aligned} \sum _{\begin{array}{c} {x_{1}, \ldots {} ,x_{l}\in (\mathbb {Z}^{+})^{k}}\\ {1 \le x_{1}< \cdots {}< x_{l}} \\ {x_{1} + \cdots {} + x_{l} = n} \end{array}}\mathbb {P}(x_{1}, \ldots {}, x_{l} \in A) = \sum _{\begin{array}{c} {x_{1}, \ldots {} ,x_{l}\in (\mathbb {Z}^{+})^{k}}\\ {1 \le x_{1}< \cdots {} < x_{l}}\\ {x_{1} + \cdots {} + x_{l} = n} \end{array}}\frac{1}{(x_{1} \cdots {} x_{l})^{\frac{1}{k}-\frac{1}{h}+\varepsilon }}. \end{aligned}$$

Since \(x_{l} \ge \frac{n}{l}\), we can estimate

$$\begin{aligned} \sum _{\begin{array}{c} {x_{1}, \ldots {} ,x_{l}\in (\mathbb {Z}^{+})^{k}}\\ {1 \le x_{1}< \cdots {} < x_{l}}\\ {x_{1} + \cdots {} + x_{l} = n} \end{array}}\frac{1}{(x_{1} \cdots {} x_{l})^{\frac{1}{k}-\frac{1}{h}+\varepsilon }}\ll & {} n^{-\left( \frac{1}{k}-\frac{1}{h}+\varepsilon \right) }\left( \sum _{x=1}^{n^{1/k}} \frac{1}{x^{k(\frac{1}{k}-\frac{1}{h}+\varepsilon )}}\right) ^{l-1}\\&\quad = n^{-\frac{1}{k}+\frac{1}{h}-\varepsilon }\left( \sum _{x=1}^{n^{1/k}}\frac{1}{x^{1-\frac{k}{h}+k\varepsilon }}\right) ^{l-1}. \end{aligned}$$

Then, on applying the assumption \(\frac{l}{h} \le \frac{1}{k}\), we find via the Euler–Maclaurin integral formula that

$$\begin{aligned} \sum _{\begin{array}{c} {x_{1}, \ldots {} ,x_{l}\in (\mathbb {Z}^{+})^{k}}\\ {1 \le x_{1}< \cdots {} < x_{l}}\\ {x_{1} + \cdots {} + x_{l} = n} \end{array}}\frac{1}{(x_{1} \cdots {} x_{l})^{\frac{1}{k}-\frac{1}{h}+\varepsilon }}\ll & {} n^{-\frac{1}{k}+\frac{1}{h}-\varepsilon +(l-1)(\frac{k}{h}-k\varepsilon )\frac{1}{k}} = n^{-\frac{1}{k}+\frac{1}{h}-\varepsilon +(l-1)(\frac{1}{h}-\varepsilon )} \\&\qquad = n^{-\frac{1}{k}+\frac{1}{h}-\varepsilon +\frac{l}{h}-\frac{1}{h}+\varepsilon -l\varepsilon } \ll n^{-l\varepsilon } \ll n^{-\varepsilon }. \end{aligned}$$

Now we assume that \(\frac{h}{k} < l \le h\). Then, we have

$$\begin{aligned}&\sum _{\begin{array}{c} {x_{1}, \ldots {} ,x_{l}\in (\mathbb {Z}^{+})^{k}}\\ {1 \le x_{1}< \cdots {}< x_{l}}\\ {x_{1} + \cdots {} + x_{l} = n} \end{array}}\mathbb {P}(x_{1}, \ldots {}, x_{l} \in A) = \sum _{\begin{array}{c} {x_{1}, \ldots {} ,x_{l}\in (\mathbb {Z}^{+})^{k}}\\ {1 \le x_{1}< \cdots {}< x_{l}}\\ {x_{1} + \cdots {} + x_{l} = n} \end{array}}\frac{1}{(x_{1} \cdots {} x_{l})^{\frac{1}{k}-\frac{1}{h}+\varepsilon }} \\&\quad \le \sum _{\begin{array}{c} {x_{1}, \ldots {} ,x_{l}\in (\mathbb {Z}^{+})^{k}}\\ {1 \le x_{1}< \cdots {}< x_{l}}\\ {x_{1} + \cdots {} + x_{l} = n} \end{array}}\frac{1}{(x_{1} \cdots {} x_{l})^{\frac{1}{k}-\frac{1}{l}+\varepsilon }} = \sum _{\begin{array}{c} {y_{1}, \ldots ,y_{l}\in \mathbb {Z}^{+}}\\ {1 \le y_{1}< \cdots {} < y_{l}}\\ {y_{1}^{k} + \cdots {} + y_{l}^{k} = n} \end{array}}\frac{1}{(y_{1} \cdots {} y_{l})^{1-\frac{k}{l}+k\varepsilon }}. \end{aligned}$$

We need the following lemma, which is a weaker version of a lemma of Vu [17, Lemma 2.1].

Lemma 7

For a fixed \(k \ge 2\), there exists a constant \(h_{2}(k) = O(8^{k}k^{2})\) such that for any \(l \ge h_{2}(k)\) and for every \(P_{1}, \ldots , P_{l} \in \mathbb {Z}^{+}\), we have

$$\begin{aligned} |\{(y_{1},\ldots ,y_{l}): y_{i} \in \mathbb {Z}^{+}, y_{i} \le P_{i}, y_{1}^{k} + \cdots {} + y_{l}^{k} = n\}| \ll \frac{1}{n}\prod _{i=1}^{l}P_{i} + \left( \prod _{i=1}^{l}P_{i}\right) ^{1-\frac{k}{l}}. \end{aligned}$$

By Lemma 7, one has

$$\begin{aligned} \sum _{\begin{array}{c} {y_{1}, \ldots {} ,y_{l}\in \mathbb {Z}^{+}}\\ {\frac{P_{i}}{2} < y_{i} \le P_{i}}\\ {y_{1}^{k} + \cdots {} + y_{l}^{k} = n} \end{array}}\frac{1}{(y_{1} \cdots {} y_{l})^{1-\frac{k}{l}+k\varepsilon }}\ll & {} \left( \frac{1}{n}\prod _{i=1}^{l}P_{i} + \left( \prod _{i=1}^{l}P_{i}\right) ^{1-\frac{k}{l}}\right) \left( \prod _{i=1}^{l}P_{i}\right) ^{-1+\frac{k}{l}-k\varepsilon } \\\ll & {} \frac{1}{n}\left( \prod _{i=1}^{l}P_{i}\right) ^{\frac{k}{l}-k\varepsilon } + \left( \prod _{i=1}^{l}P_{i}\right) ^{-k\varepsilon }. \end{aligned}$$

Let \((P_{1}, \ldots ,P_{l}) = (2^{i_{1}}, \ldots ,2^{i_{l}})\), where \(0 \le i_{1} \le i_{2} \le \cdots {} \le i_{l}\). If \(1 \le y_{1}< y_{2}< \cdots {} < y_{l}\), \(\sum _{i=1}^{l}y_{i}^{k} = n\), then obviously

$$\begin{aligned} \left( \frac{n}{l}\right) ^{1/k} \le y_{l} \le n^{1/k}. \end{aligned}$$

Consequently, since \(y_l > 2^{i_l-1}\),

$$\begin{aligned} i_{l} \le \frac{1}{k}\log _{2}n + 1. \end{aligned}$$
(5)

Then, by Lemma 7, we have

$$\begin{aligned}&Q = \sum _{\overset{(i_{1}, \ldots ,i_{l})}{0 \le i_{1} \le \cdots {} \le i_{l}}} \sum _{\begin{array}{c} {(y_{1},\ldots ,y_{l})}\\ {1 \le y_{1}< \cdots {}< y_{l}}\\ {2^{i_{j}-1} < y_{j} \le 2^{i_{j}}}\\ {y_{1}^{k} + \cdots {} + y_{l}^{k} = n} \end{array}}\frac{1}{(y_{1} \cdots {} y_{l})^{1-\frac{k}{l}+k\varepsilon }} \ll \sum _{\overset{(i_{1}, \ldots {} ,i_{l})}{0 \le i_{1} \le \cdots {} \le i_{l}}}\frac{1}{n}\left( \prod _{j=1}^{l}2^{i_{j}}\right) ^{\frac{k}{l}-k\varepsilon }\\&\qquad + \sum _{\overset{(i_{1},\ldots ,i_{l})}{0 \le i_{1} \le \cdots {} \le i_{l}}}\left( \prod _{j=1}^{l}2^{i_{j}}\right) ^{-k\varepsilon } = Q_{1} + Q_{2}. \end{aligned}$$

In the first step, we estimate \(Q_{1}\). By using (5), every index satisfies \(i_{j} \le \lfloor \frac{1}{k}\log _{2}n + 1 \rfloor \), so each of the l geometric sums below is \(\ll (n^{1/k})^{\frac{k}{l}-k\varepsilon }\), where the implied constants may depend on k, l, and \(\varepsilon \). We have

$$\begin{aligned} Q_{1} = \sum _{\overset{(i_{1},\ldots ,i_{l})}{0 \le i_{1} \le \cdots {} \le i_{l}}}\frac{1}{n}\left( \prod _{j=1}^{l}2^{i_{j}}\right) ^{\frac{k}{l}-k\varepsilon } \le \frac{1}{n}\prod _{j=1}^{l}\left( \sum _{i_{j}=0}^{\lfloor \frac{1}{k}\log _{2}n + 1 \rfloor }2^{\left( \frac{k}{l}-k\varepsilon \right) i_{j}}\right) \ll \frac{1}{n}\left( (n^{1/k})^{l}\right) ^{\frac{k}{l}-k\varepsilon } = n^{-l\varepsilon } \ll n^{-\varepsilon }. \end{aligned}$$

Next, we estimate \(Q_{2}\). Since \(y_{l} \ge \left( \frac{n}{l}\right) ^{1/k}\) and \(y_{l} \le 2^{i_{l}}\), we have \(i_{l} \ge \frac{1}{k}\log _{2}\frac{n}{l}\). Bounding the first \(l-1\) indices by full geometric series, we get that

$$\begin{aligned} Q_{2} = \sum _{\overset{(i_{1},\ldots ,i_{l})}{0 \le i_{1} \le \cdots {} \le i_{l}}}\left( \prod _{j=1}^{l}2^{i_{j}}\right) ^{-k\varepsilon } \le \left( \sum _{i=0}^{\infty }2^{-k\varepsilon i}\right) ^{l-1}\sum _{i_{l} \ge \frac{1}{k}\log _{2}\frac{n}{l}}2^{-k\varepsilon i_{l}} \ll \left( \frac{n}{l}\right) ^{-\varepsilon } \ll n^{-\varepsilon }. \end{aligned}$$

Grouping these estimates together,

$$\begin{aligned} Q = Q_{1} + Q_{2} \ll n^{-\varepsilon }. \end{aligned}$$

Returning to (4) we now have the estimation

$$\begin{aligned} \mathbb {E}(R_{A,l}(n)) \ll n^{-\varepsilon }. \end{aligned}$$

It follows from Lemma 5 that, with probability 1, \(R_{A,h}(n)\) is bounded. On the other hand, by using the Euler–Maclaurin formula,

$$\begin{aligned} \mathbb {E}(A(x))= & {} \sum _{m \le x^{1/k}}\frac{1}{(m^{k})^{\frac{1}{k}-\frac{1}{h}+\varepsilon }} = \int _{0}^{x^{1/k}}t^{-1+\frac{k}{h}-k\varepsilon }\mathrm{d}t + O(1) \\= & {} \frac{1}{\frac{k}{h}-k\varepsilon }x^{\frac{1}{h}-\varepsilon } + O(1), \end{aligned}$$

which implies, by Lemma 6 (assuming \(\varepsilon < \frac{1}{h}\)), that \(A(x) \gg x^{\frac{1}{h}-\varepsilon }\) with probability 1. This completes the proof of Theorem 3.

Remark 2

One might like to generalize Theorems 2 and 3 to \(B_{h}[g]\) sets, i.e., to prove the existence of a set A formed by perfect powers such that \(R^{*}_{A,h}(n) \le g\) for some g and A is as dense as possible. To do this, one would need a generalization of Lemmas 5 and 7 to the number of representations of n by linear forms \(b_{1}x_{1} + \cdots {} + b_{s}x_{s} = n\). Lemma 5 can be extended to linear forms, but the generalization of Lemma 7 seems more complicated.