1 Introduction

Assume we observe a sample of \(n\) i.i.d. random variables \(X_i, i=1,\ldots , n\), with uniform distribution on some subset \(G\) of \({\mathbb {R}}^d, d\ge 2\). We are interested in the problem of estimation of \(G\). In particular, this problem is of interest in detection of abnormal behavior, cf. [1]. In image recovery, when an object is only partially observed, e.g. if only some pixels are available, one would like to recover the object as accurately as possible. When \(G\) is known to be compact and convex, the convex hull of the sample is quite a natural estimator. The properties of this random subset of \({\mathbb {R}}^d\) have been extensively studied since the early 1960s, from a geometric and probabilistic perspective. The original question associated with this object was the famous Sylvester four-point problem: what is the probability that one of four points chosen at random in the plane is inside the triangle formed by the three others? We refer to [2] for a historical survey and extensions of Sylvester's problem. Of course, this question is not well posed, since the answer depends on the probability measure of those four points, and the many answers that were proposed, in the late 19th century, accompanied the birth of a new field: stochastic geometry. Rényi and Sulanke [3, 4] studied some basic properties of the convex hull of \(X_i, i=1,\ldots , n\), when \(G\) is a compact and convex subset of the plane (\(d=2\)). More specifically, if this convex hull is denoted by \(CH(X_1,\ldots ,X_n)\), its number of vertices by \(V_n\) and its missing area \(|G\backslash CH(X_1,\ldots ,X_n)|\) by \(A_n\), they investigated the asymptotic behavior of the expectations \({\mathbb {E}}[V_n]\) and \({\mathbb {E}}[A_n]\). Their results depend heavily on the structure of the boundary of \(G\). The expected number of vertices is of the order \(n^{1/3}\) when the boundary of \(G\) is smooth enough, and \(r\ln n\) when \(G\) is a convex polygon with \(r\) vertices, \(r\ge 3\). The expected missing area is of the order \(n^{-2/3}\) in the first case and, if \(G\) is a square, of the order \((\ln n)/n\). Whether the square is arbitrarily large or small affects only the constants, not the rates, through a scale factor. Rényi and Sulanke [3, 4] provided asymptotic evaluations of these expectations with explicit constants, up to two or three terms. In 1965, Efron [5] proved a very simple identity which connects the expected number of vertices \(V_{n+1}\) with the expected missing area \(A_n\). Namely, if \(|G|\) stands for the area of \(G\), one has

$$\begin{aligned} {\mathbb {E}}[A_n]=\frac{|G|{\mathbb {E}}[V_{n+1}]}{n+1}, \end{aligned}$$
(1)

independently of the structure of the boundary of \(G\). In particular, (1) makes it possible to extend the results of [3, 4] about the missing area to any convex polygon with \(r\) vertices. If \(G\) is such a polygon, \({\mathbb {E}}[A_n]\) is of the order \(r(\ln n)/n\), up to a factor of the form \(c|G|\), where \(c\) is positive and depends on neither \(r\) nor \(G\). More recently, many efforts were made to extend these results to dimension 3 and higher. We refer to [6–8] and the references therein. Notably, Efron's identity (1) holds in any dimension if \(G\subseteq {\mathbb {R}}^d\) is a compact and convex set and \(|G|\) is its Lebesgue measure.
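
Identity (1) is also easy to check numerically. As an illustration (a Monte Carlo sketch of our own; the sample size, the number of repetitions and the choice \(G=[0,1]^2\) are arbitrary, not taken from the references above), the following compares both sides of (1) in the plane:

```python
import numpy as np
from scipy.spatial import ConvexHull

# Monte Carlo check of Efron's identity (1) for G = [0,1]^2, so |G| = 1:
# E[A_n] should equal |G| * E[V_{n+1}] / (n+1).
rng = np.random.default_rng(0)
n, n_rep = 100, 2000

missing_area = np.empty(n_rep)  # A_n = |G \ CH(X_1,...,X_n)|
n_vertices = np.empty(n_rep)    # V_{n+1}, vertices of CH(X_1,...,X_{n+1})
for k in range(n_rep):
    pts = rng.random((n + 1, 2))
    missing_area[k] = 1.0 - ConvexHull(pts[:n]).volume  # .volume is the area when d = 2
    n_vertices[k] = len(ConvexHull(pts).vertices)

print("E[A_n]           ~", missing_area.mean())
print("E[V_{n+1}]/(n+1) ~", n_vertices.mean() / (n + 1))
```

Both printed values agree up to Monte Carlo error, as (1) predicts.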

Bárány and Larman [9] (see [10] for a review) proposed a generalization of these results with no assumption on the structure of the boundary of \(G\). They considered the \(\varepsilon \)-wet part of \(G\), denoted by \(G(\varepsilon )\) and defined as the union of all the caps of \(G\) of volume \(\varepsilon |G|\), where a cap is the intersection of \(G\) with a half space. Here, \(0\le \varepsilon \le 1\). This notion, together with that of the floating body [defined as \(G\backslash G(\varepsilon )\)], had been introduced by Dupin [11] and, later, by Blaschke [12]. In [9], the authors prove that the expected missing volume of the convex hull of independent random points uniformly distributed in a convex body is of the order of the volume of the \(1/n\)-wet part. The problem of computing this expected missing volume thus becomes analytical, and learning about its asymptotics reduces to analyzing the properties of the wet part, which have been studied extensively in convex analysis and geometry; we refer to [13–16] and the references therein. In particular, it was shown that if the boundary of the convex body \(G\) is smooth enough, then the expected missing volume is of the order \(n^{-2/(d+1)}\), and if \(G\) is a polytope, the order is \((\ln n)^{d-1}n^{-1}\).

All these works were developed from a geometric and probabilistic perspective. No efforts were made at this stage to understand whether the convex hull is optimal when seen as an estimator of the set \(G\). Only in the 1990s was this question raised in the statistical literature. The problem of estimation of the support of a density had already been investigated, under more general assumptions (see [1, 17], and the references therein). Mainly, consistency of the estimators was shown, and there were very few results about speeds of convergence. Mammen and Tsybakov [18] showed that under some restrictions on the volume and location of \(G\), the convex hull is optimal in a minimax sense (see the next section for details). They did not work directly on this estimator; instead, they defined a maximum likelihood estimator selected from an \(\epsilon \)-net (see [19]). Korostelev and Tsybakov [20] give a detailed account of the topic of set estimation. See also [21–25] for an overview of recent developments about estimation of the support of a probability measure. A different model was studied in [26], where we considered estimation of the support of the regression function. We built an estimator which achieves a speed of convergence of the order \((\ln n)/n\) when the support is a polytope in \({\mathbb {R}}^d,d\ge 2\). Moreover, we proved that no estimator can achieve a better speed of convergence, so the logarithmic factor cannot be dropped. Although our estimator depends on the knowledge of the number of vertices \(r\) of the true support of the regression function, we proposed an estimator that is adaptive with respect to \(r\) and achieves the same speed as in the case of known \(r\).

In this paper, we deal with estimation of the support of a uniform distribution. To our knowledge, there are no results about optimality of the convex hull estimator when that support is a general convex set. In particular, when no assumptions on the volume, the location and the structure of the boundary are made, it is not known whether a convex set can be estimated uniformly consistently. In addition, the case of convex polytopes has not been investigated. We focus on this case. Convex polytopes are, in some sense, the simplest convex sets. Intuitively, the convex hull estimator can be improved, and estimation of polytopes should be achievable at a faster speed of convergence. Indeed, a polytope with a given number of vertices is completely determined by the coordinates of its vertices, and therefore belongs to some parametric family.

The general idea of our work is to reconstruct a convex black image on a white background, from a discrete observation of it. However, our main concern is theoretical, and we do not take into account computational efficiency when defining our estimators.

This paper is organized as follows. In Sect. 2 we give all notation and definitions. In Sect. 3 we propose a new estimator of \(G\) when it is assumed to be a polytope and its number of vertices is known. We show that the risk of this estimator is better than that of the convex hull estimator, and achieves a rate independent of the dimension \(d\). In Sect. 4, we show that in the general case, if no other assumption than compactness, convexity and positive volume is made on \(G\), then the convex hull estimator is optimal in a minimax sense. In Sect. 5 we construct an estimator which is adaptive to the shape of the boundary of \(G\), i.e. which detects, in some sense, whether \(G\) is a polytope or not and, if so, correctly estimates its number of vertices. Section 6 is devoted to the proofs.

2 Notation and definitions

Let \(d\ge 2\) be an integer. Denote by \(\rho \) the Euclidean distance in \({\mathbb {R}}^d\) and by \(B_2^d\) the closed Euclidean unit ball in \({\mathbb {R}}^d\).

For brevity, we will call a convex body any compact and convex subset of \({\mathbb {R}}^d\) with positive volume, and we will call a polytope any compact convex polytope in \({\mathbb {R}}^d\) with positive volume. For an integer \(r\ge d+1\), denote by \({\mathcal {P}}_r\) the class of all convex polytopes in \([0,1]^d\) with at most \(r\) vertices. Denote also by \({\mathcal {K}}\) the class of all convex bodies in \({\mathbb {R}}^d\), and by \({\mathcal {K}}_1\) the class of all convex bodies in \([0,1]^d\).

If \(G\) is a closed subset of \({\mathbb {R}}^d\) and \(\epsilon \) is a positive number, we denote by \(G^\epsilon \) the set of all \(x\in {\mathbb {R}}^d\) such that \(\rho (x,G)\le \epsilon \) or, in other terms, \(G^\epsilon =G+\epsilon B_2^d\). If \(G\) is any set, \(I(\cdot \in G)\) stands for the indicator function of \(G\).

The Lebesgue measure on \({\mathbb {R}}^d\) is denoted by \(|\cdot |\) (for brevity, we do not indicate explicitly the dependence on \(d\)). If \(G\) is a measurable subset of \({\mathbb {R}}^d\), we denote respectively by \({\mathbb {P}}_G\) and \({\mathbb {E}}_G\) the probability measure of the uniform distribution on \(G\) and the corresponding expectation operator, and we use the same notation for the \(n\)-product of this distribution when there is no possible confusion. When necessary, we add the superscript \(^{\otimes n}\) for the \(n\)-product. When needed, to avoid measurability issues, we use the same notation for the corresponding outer probability and expectation. The Nikodym pseudo-distance between two measurable subsets \(G_1\) and \(G_2\) of \({\mathbb {R}}^d\) is defined as the Lebesgue measure of their symmetric difference, namely \(|G_1\triangle G_2|\).
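
All risks below are expressed through this pseudo-distance, and it is straightforward to approximate numerically. Here is a minimal Monte Carlo sketch for sets in \([0,1]^d\) described by indicator functions (the helper `nikodym_mc` and its parameters are our own illustrative choices, not part of the paper's construction):

```python
import numpy as np

def nikodym_mc(ind1, ind2, d=2, n_mc=200_000, seed=0):
    """Monte Carlo estimate of the Nikodym pseudo-distance between two
    subsets of [0,1]^d given by vectorized indicator functions, i.e. the
    Lebesgue measure of their symmetric difference."""
    x = np.random.default_rng(seed).random((n_mc, d))
    # |[0,1]^d| = 1, so the frequency of disagreement estimates the measure.
    return np.mean(ind1(x) != ind2(x))

# Example: the unit square vs. the disk of radius 1/2 centered at (1/2, 1/2);
# the exact distance is 1 - pi/4 ~ 0.2146.
square = lambda x: np.ones(len(x), dtype=bool)
disk = lambda x: ((x - 0.5) ** 2).sum(axis=1) <= 0.25
print(nikodym_mc(square, disk))
```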

A subset \(\hat{G}_n\) of \({\mathbb {R}}^d\), whose construction depends on the sample, is called a set estimator or, more simply, an estimator.

Given an estimator \(\hat{G}_n\), we measure its accuracy on a given class of sets in a minimax framework. The risk of \(\hat{G}_n\) on a class \({\mathcal {C}}\) of Borel subsets of \({\mathbb {R}}^d\) is defined as

$$\begin{aligned} {\mathcal {R}}_n(\hat{G}_n ; {\mathcal {C}}) = \sup \limits _{G\in {\mathcal {C}}}{\mathbb {E}}_G[|G\triangle \hat{G}_n|]. \end{aligned}$$
(*)

The rate (a sequence depending on \(n\)) of an estimator on a class \({\mathcal {C}}\) is the speed at which its risk converges to zero when the number \(n\) of available observations tends to infinity. For all the estimators defined in the sequel we will be interested in upper bounds on their risk, in order to get information about their rate. For a given class of subsets \({\mathcal {C}}\), the minimax risk on this class when \(n\) observations are available is defined as

$$\begin{aligned} {\mathcal {R}}_n({\mathcal {C}})=\inf _{\hat{G}_n} {\mathcal {R}}_n(\hat{G}_n ; {\mathcal {C}}), \end{aligned}$$
(**)

where the infimum is taken over all set estimators depending on \(n\) observations. If \({\mathcal {R}}_n({\mathcal {C}})\) converges to zero, we call the speed at which \({\mathcal {R}}_n({\mathcal {C}})\) tends to zero the minimax rate of convergence on the class \({\mathcal {C}}\). For a given class \({\mathcal {C}}\) of subsets of \({\mathbb {R}}^d\), it is interesting to provide a lower bound for \({\mathcal {R}}_n({\mathcal {C}})\). By definition, no estimator can achieve a better rate on \({\mathcal {C}}\) than that of the lower bound. This bound also gives information on how close the risk of a given estimator is to the minimax risk. If the rate of the upper bound on the risk of an estimator matches the rate of the lower bound on the minimax risk on the class \({\mathcal {C}}\), then the estimator is said to have the minimax rate of convergence on this class.

For two quantities \(A\) and \(B\), and a parameter \(\vartheta \), which may be multidimensional, we will write \(A\lesssim _\vartheta B\) (respectively \(A\gtrsim _\vartheta B\)) to say that for some positive constant \(c(\vartheta )\) which depends on \(\vartheta \) only, one has \(A\le c(\vartheta )B\) [respectively \(A\ge c(\vartheta )B\)]. If we put no subscript under the signs \(\lesssim \) or \(\gtrsim \), the involved constant is universal, i.e. depends on no parameter.

3 Estimation of polytopes

3.1 Upper bound

Let \(r\ge d+1\) be a known integer. Assume that the underlying set \(G\), denoted by \(P\) in this section, is in \({\mathcal {P}}_r\). The likelihood of the model, seen as a function of the compact set \(G'\subseteq {\mathbb {R}}^d\), is defined as follows, provided that \(G'\) has a positive Lebesgue measure:

$$\begin{aligned} L(X_1,\ldots ,X_n,G')=\prod \limits _{i=1}^n \frac{I(X_i\in G')}{|G'|}. \end{aligned}$$

Therefore, \(\max _{G'\in {\mathcal {C}}}L(X_1,\ldots ,X_n,G')\), i.e., maximization of the likelihood over a given class \({\mathcal {C}}\) of candidates, is achieved when \(G'\) has minimum Lebesgue measure among all sets of \({\mathcal {C}}\) containing all the sample points. When \({\mathcal {C}}\) is the class of all convex subsets of \({\mathbb {R}}^d\), the maximum likelihood estimator is unique, and it is the convex hull of the sample. As discussed above, this estimator has been extensively studied. In particular, using Efron's identity (1), it turns out that its expected number of vertices is of the order of \(r(\ln n)^{d-1}\). However, the unknown polytope \(P\) has no more than \(r\) vertices. Hence, it seems reasonable to restrict the estimator of \(P\) to have far fewer vertices; the class of all convex subsets of \({\mathbb {R}}^d\) is too large, and we propose to maximize the likelihood over the smaller class \({\mathcal {P}}_r\).

Assume that there exists a polytope in \({\mathcal {P}}_r\) with the smallest volume among all polytopes of \({\mathcal {P}}_r\) containing all the sample points. Let \(\hat{P}_n^{(r)}\) be such a polytope, i.e.

$$\begin{aligned} \hat{P}_n^{(r)}\in \underset{P\in {\mathcal {P}}_r, X_i\in P, i=1,\ldots ,n}{\mathrm{argmin }} |P|. \end{aligned}$$
(2)
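
Computing \(\hat{P}_n^{(r)}\) exactly is a hard optimization problem and, as stated in the introduction, computational efficiency is not our concern here. Purely as an illustration of definition (2), the following toy heuristic (our own sketch for \(d=2\): a random search over candidate vertex sets drawn from inflated copies of the sample hull, not the construction analyzed below) looks for a small-area polygon with at most \(r\) vertices containing the sample:

```python
import numpy as np
from scipy.spatial import ConvexHull, Delaunay

def fit_polytope_r(X, r, n_trials=4000, seed=0):
    """Toy random search (d = 2) for a small-area polygon with at most r
    vertices containing all sample points X. A crude stand-in for the
    minimum-volume estimator in (2), not an exact solver."""
    rng = np.random.default_rng(seed)
    hull_idx = ConvexHull(X).vertices
    c = X.mean(axis=0)
    best_area, best_poly = np.inf, None
    for _ in range(n_trials):
        # Candidate vertices: r hull points of the sample, each pushed
        # outward from the centroid by a random factor in [1, 1.5].
        idx = rng.choice(hull_idx, size=r, replace=True)
        scale = rng.uniform(1.0, 1.5, size=(r, 1))
        cand = c + scale * (X[idx] - c)
        try:
            ch = ConvexHull(cand)
        except Exception:  # degenerate candidate (e.g. collinear points)
            continue
        # Keep the candidate only if it improves the area and contains X.
        if ch.volume < best_area and \
                (Delaunay(cand[ch.vertices]).find_simplex(X) >= 0).all():
            best_area, best_poly = ch.volume, cand[ch.vertices]
    return best_poly, best_area

# Example: 200 uniform points in a triangle of area 0.5, fitted with r = 4.
rng = np.random.default_rng(1)
u, v = rng.random((2, 200, 1))
swap = u + v > 1
u, v = np.where(swap, 1 - u, u), np.where(swap, 1 - v, v)
X = u * np.array([[1.0, 0.0]]) + v * np.array([[0.5, 1.0]])
print(fit_polytope_r(X, 4)[1])  # area found, close to (but above) 0.5
```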

The existence of such a polytope is ensured by compactness arguments. Note that \(\hat{P}_n^{(r)}\) need not be unique. The next theorem establishes an exponential deviation inequality for the estimator \(\hat{P}_n^{(r)}\).

Theorem 1

Let \(r\ge d+1\) be an integer, and \(n\ge 2\). Then,

$$\begin{aligned} \sup \limits _{P\in {\mathcal {P}}_r}{\mathbb {P}}_{P}\left[ n\left( |\hat{P}_n^{(r)} \triangle P|-\frac{4dr\ln n}{n}\right) \ge x\right] \lesssim _ d{\mathrm {e}}^{-x/2}, \forall x>0. \end{aligned}$$

From the deviation inequality of Theorem 1 one can easily derive that the risk of the estimator \(\hat{P}_n^{(r)}\) on the class \({\mathcal {P}}_r\) is of the order \(\frac{r\ln n}{n}\). Indeed, we have the next corollary.

Corollary 1

Let the assumptions of Theorem 1 be satisfied. Then, for any positive number \(q\),

$$\begin{aligned} \sup \limits _{P\in {\mathcal {P}}_r} {\mathbb {E}}_{P}\left[ |\hat{P}_n^{(r)}\triangle P|^q\right] \lesssim _ {d,q}\left( \frac{r\ln n}{n}\right) ^q. \end{aligned}$$

Corollary 1 shows that the risk \({\mathcal {R}}_n(\hat{P}_n^{(r)} ; {\mathcal {P}}_r)\) of the estimator \(\hat{P}_n^{(r)}\) on the class \({\mathcal {P}}_r\) is bounded from above by \(\frac{r\ln n}{n}\), up to some positive constant which depends on \(d\) only. Therefore we have the following upper bound for the minimax risk on the class \({\mathcal {P}}_r\):

$$\begin{aligned} {\mathcal {R}}_n({\mathcal {P}}_r) \lesssim _ d \frac{r\ln n}{n}. \end{aligned}$$
(3)

It is now natural to ask whether the rate \(\frac{\ln n}{n}\) is minimax, i.e. whether it is possible to find a lower bound for \({\mathcal {R}}_n({\mathcal {P}}_r)\) which converges to zero at the rate \(\frac{\ln n}{n}\), or whether the logarithmic factor can be dropped. This question is discussed in the next subsection.

3.2 The logarithmic factor

We conjecture that the logarithmic factor can be removed in the upper bound of \({\mathcal {R}}_n({\mathcal {P}}_r), r\ge d+1\). Specifically, for the class of all convex polytopes with at most \(r\) vertices, not necessarily included in the hypercube \([0,1]^d\), which we denote by \({\mathcal {P}}_r^{all}\), we conjecture that, for the normalized version of the risk (defined in Sect. 4 below),

$$\begin{aligned} {\mathcal {Q}}_n({\mathcal {P}}_r^{all})\lesssim _ d \frac{r}{n}. \end{aligned}$$

What motivates our intuition is Efron’s identity (1). Let us recall its proof, which is very easy, and instructive for our purposes. Let the underlying set \(G\) be a convex body in \({\mathbb {R}}^d\), denoted by \(K\), and let \(\hat{K}_n\) be the convex hull of the sample. Almost surely, \(\hat{K}_n\subseteq K\), so

$$\begin{aligned} {\mathbb {E}}_K^{\otimes n}[|\hat{K}_n\triangle K|]&= {\mathbb {E}}_K^{\otimes n}[|K\backslash \hat{K}_n|] \nonumber \\&= {\mathbb {E}}_K^{\otimes n}\left[ \int _K I(x\notin \hat{K}_n)dx\right] \nonumber \\&= |K|{\mathbb {E}}_K^{\otimes n}\left[ \frac{1}{|K|}\int _K I(x\notin \hat{K}_n)dx\right] \nonumber \\&= |K|{\mathbb {E}}_K^{\otimes n}\left[ {\mathbb {P}}_K[X\notin \hat{K}_n|X_1,\ldots ,X_n]\right] \!, \end{aligned}$$
(4)

where \(X\) is a random variable with uniform distribution on \(K\), and is independent of \(X_1,\ldots ,X_n\). We write \({\mathbb {P}}_K[\cdot |X_1,\ldots ,X_n]\) for the conditional distribution given \(X_1,\ldots ,X_n\). Set \(X_{n+1}=X\). For \(i=1,\ldots ,n+1\), we denote by \(\hat{K}_{n+1}^{-i}\) the convex hull of the sample \(X_1,\ldots ,X_{n+1}\) from which the \(i\)th variable \(X_i\) is withdrawn. Then \(\hat{K}_n=\hat{K}_{n+1}^{-(n+1)}\), and by continuing (4), and by using the symmetry of the sample,

$$\begin{aligned} {\mathbb {E}}_K^{\otimes n}[|\hat{K}_n\triangle K|]&= |K|{\mathbb {P}}_K^{\otimes n+1}[X_{n+1}\notin \hat{K}_{n+1}^{-(n+1)}] \nonumber \\&= \frac{|K|}{n+1}\sum \limits _{i=1}^{n+1}{\mathbb {P}}_K^{\otimes n+1}[X_i\notin \hat{K}_{n+1}^{-i}] \nonumber \\&= \frac{|K|}{n+1}\sum \limits _{i=1}^{n+1}{\mathbb {P}}_K^{\otimes n+1}[X_i\in V(\hat{K}_{n+1})], \end{aligned}$$
(5)

where \(V(\hat{K}_{n+1})\) is the set of vertices of \(\hat{K}_{n+1}=CH(X_1,\ldots ,X_{n+1})\). Indeed, with probability one, the point \(X_i\) is not in the convex hull of the \(n\) other points if and only if it is a vertex of the convex hull of the whole sample. By rewriting the probability of an event as the expectation of its indicator function, one gets from (5),

$$\begin{aligned} {\mathbb {E}}_K^{\otimes n}[|\hat{K}_n\triangle K|]&= \frac{|K|}{n+1}\sum \limits _{i=1}^{n+1}{\mathbb {E}}_K^{\otimes n+1}[I(X_i\in V(\hat{K}_{n+1}))] \\&= \frac{|K|}{n+1}{\mathbb {E}}_K^{\otimes n+1}\left[ \sum \limits _{i=1}^{n+1}I(X_i\in V(\hat{K}_{n+1}))\right] \\&= \frac{|K|{\mathbb {E}}_K^{\otimes n+1}[V_{n+1}]}{n+1}, \end{aligned}$$

where \(V_{n+1}\) denotes the cardinality of \(V(\hat{K}_{n+1})\), i.e. the number of vertices of the convex hull \(\hat{K}_{n+1}\). Efron’s equality is then proved.

It turns out that we can follow almost all the proof of this identity when the underlying set \(G\) is a polytope, and when we consider the estimator developed in Sect. 3.1. Let \(r\ge d+1\) be an integer and \(P\in {\mathcal {P}}_r^{all}\). Let \(\hat{P}_n^{(r)}\) be the estimator defined in (2), where \({\mathcal {P}}_r\) is replaced by \({\mathcal {P}}_r^{all}\). In this section, we denote this estimator simply by \(\hat{P}_n\). Note that this estimator does not satisfy the nice property \(\hat{P}_n\subseteq P\), unlike the convex hull. However, by construction, \(|\hat{P}_n|\le |P|\), so \(|P\triangle \hat{P}_n|\le 2|P\backslash \hat{P}_n|\), and we have:

$$\begin{aligned} {\mathbb {E}}_P^{\otimes n}[|\hat{P}_n\triangle P|]&\le 2{\mathbb {E}}_P^{\otimes n}[|P\backslash \hat{P}_n|] \nonumber \\&= 2|P|{\mathbb {E}}_P^{\otimes n}\left[ \frac{1}{|P|}\int _P I(x\notin \hat{P}_n)dx\right] \nonumber \\&= 2|P|{\mathbb {E}}_P^{\otimes n}\left[ {\mathbb {P}}_P[X\notin \hat{P}_n|X_1,\ldots ,X_n]\right] , \end{aligned}$$
(6)

where the random variable \(X\) has the uniform distribution on \(P\) and is independent of the sample \(X_1,\ldots ,X_n\). Again, we write \({\mathbb {P}}_P[\cdot |X_1,\ldots ,X_n]\) for the conditional distribution of \(X\) given \(X_1,\ldots ,X_n\) and we set \(X_{n+1}=X\). For \(i=1,\ldots ,n+1\), we denote by \(\hat{P}_{n+1}^{-i}\) the same estimator as \(\hat{P}_n\), based on the sample \(X_1,\ldots ,X_{n+1}\) from which the \(i\)th variable \(X_i\) is withdrawn. Then, \(\hat{P}_n=\hat{P}_{n+1}^{-(n+1)}\), and by continuing (6),

$$\begin{aligned} {\mathbb {E}}_P^{\otimes n}[|\hat{P}_n\triangle P|]&\le 2|P|{\mathbb {P}}_P^{\otimes n+1}[X_{n+1}\notin \hat{P}_{n+1}^{-(n+1)}] \nonumber \\&= \frac{2|P|}{n+1}\sum \limits _{i=1}^{n+1}{\mathbb {P}}_P^{\otimes n+1}[X_i\notin \hat{P}_{n+1}^{-i}] \nonumber \\&= \frac{2|P|}{n+1}{\mathbb {E}}_P^{\otimes n+1}\left[ \sum \limits _{i=1}^{n+1}I(X_i\notin \hat{P}_{n+1}^{-i})\right] \nonumber \\&= \frac{2|P|{\mathbb {E}}_P^{\otimes n+1}[V'_{n+1}]}{n+1}, \end{aligned}$$
(7)

where \(V'_{n+1}\) stands for the number of points \(X_i\) falling outside of the polytope with at most \(r\) vertices, of minimum volume, containing all the other \(n\) points. It is not clear that if a point \(X_i\) is not in \(\hat{P}_{n+1}^{-i}\), then \(X_i\) lies on the boundary of \(\hat{P}_{n+1}\). However, if this were true, then almost surely \(V'_{n+1}\) would be at most \(d+1\) times the number of facets of \(\hat{P}_{n+1}\). Indeed, any facet of \(\hat{P}_{n+1}\) is supported by an affine hyperplane of \({\mathbb {R}}^d\), which, with probability one, does not contain more than \(d+1\) points of the sample at a time. In addition, the maximal number of facets of a \(d\)-dimensional convex polytope with at most \(r\) vertices is bounded by McMullen's upper bound [27, 28], and the conjecture would be proved. However, it might occur with positive probability that some points \(X_i\) are not in \(\hat{P}_{n+1}^{-i}\), although they do not lie on the boundary of \(\hat{P}_{n+1}\). So it may be of interest to work directly on the random variable \(V'_{n+1}\). This remains an open problem.
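
For reference, McMullen's bound can be evaluated explicitly: the maximum is attained by the cyclic polytope, whose facet count has a closed form. A small helper of our own, using that standard formula, follows:

```python
from math import comb

def mcmullen_max_facets(r, d):
    """Maximal number of facets of a d-dimensional polytope with r
    vertices (McMullen's upper bound theorem; the maximum is attained
    by the cyclic polytope C(r, d))."""
    return (comb(r - (d + 1) // 2, d // 2)
            + comb(r - 1 - d // 2, (d - 1) // 2))

# Sanity checks: a polygon with r vertices has r edges; a 3-polytope
# with r vertices has at most 2r - 4 facets.
assert mcmullen_max_facets(7, 2) == 7
assert mcmullen_max_facets(7, 3) == 2 * 7 - 4
```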

3.3 Lower bound for the minimax risk in the case \(d=2\)

In the 2-dimensional case, we provide a lower bound of the order \(1/n\), with a factor that is linear in the number of vertices \(r\). Namely, the following theorem holds.

Theorem 2

Let \(r\ge 5\) be an integer, and \(n\ge r\). Assume \(d=2\). Then,

$$\begin{aligned} {\mathcal {R}}_n({\mathcal {P}}_r)\gtrsim \frac{r}{n}. \end{aligned}$$

Combined with (3), this bound shows that, as a function of \(r\), \({\mathcal {R}}_n({\mathcal {P}}_r)\) behaves linearly in \(r\) in dimension two. In higher dimensions, it is quite easy to show that \({\mathcal {R}}_n({\mathcal {P}}_r)\gtrsim _{d} \frac{1}{n}\), but this lower bound does not show the dependence on \(r\). However, the upper bound (3) shows that \({\mathcal {R}}_n({\mathcal {P}}_r)\) is at most linear in \(r\).

4 Estimation of convex bodies

In this section we no longer assume that the unknown support \(G\) is a polytope, but only that it is a convex body, and we write \(G=K\). Denote by \(\hat{K}_n\) the convex hull of the sample. The risk of this estimator cannot be bounded from above uniformly on the class \({\mathcal {K}}\), since, by (1), for any given \(n\), \({\mathbb {E}}_K[ |K\triangle \hat{K}_n| ] \rightarrow \infty \) as \(|K|\rightarrow \infty \). Moreover, there is no uniformly consistent estimator on the class \({\mathcal {K}}\) of all convex bodies if the risk is defined by (*). The following result holds.

Theorem 3

For all \(n\ge 1\), the minimax risk (**) on the class \({\mathcal {K}}\) is infinite:

$$\begin{aligned} {\mathcal {R}}_n({\mathcal {K}})=+{\infty }. \end{aligned}$$

Therefore, if one considers the unbounded class \({\mathcal {K}}\), it is more appropriate to measure the risk of an estimator \(\tilde{K}_n\) of \(K\), based on a sample of \(n\) observations, using a normalized version of the risk defined in (**):

$$\begin{aligned} {\mathcal {Q}}_n(\tilde{K}_n ; {\mathcal {K}}) = \sup \limits _{K\in {\mathcal {K}}}{\mathbb {E}}_K\left[ \frac{|K\triangle \tilde{K}_n|}{|K|}\right] \!. \end{aligned}$$

Also define the normalized minimax risk on the class \({\mathcal {K}}\):

$$\begin{aligned} {\mathcal {Q}}_n({\mathcal {K}}) = \inf _{\tilde{K}_n}\sup \limits _{K\in {\mathcal {K}}} {\mathbb {E}}_K\left[ \frac{|K\triangle \tilde{K}_n|}{|K|}\right] \!, \end{aligned}$$

where the infimum is taken over all estimators \(\tilde{K}_n\) based on a sample of \(n\) i.i.d. observations. For the estimator \(\hat{K}_n\), we recall Theorem 1 of [29].

Theorem 4

Let \(n\ge 2\) be an integer. There exist two positive constants \(C_1\) and \(C_2\), which depend on \(d\) only, such that:

$$\begin{aligned} \sup \limits _{K\in {\mathcal {K}}}{\mathbb {P}}_K\left[ n\left( \frac{|K\backslash \hat{K}_n|}{|K|}-C_2n^{-2/(d+1)}\right) >x\right] \le C_1e^{-x/d^d},\quad \forall x>0. \end{aligned}$$

From this theorem, combined with the lower bound of [18] (which is stated for the minimax risk, but whose proof still holds for the normalized risk), the next corollary follows.

Corollary 2

Let \(n\ge 2\) be an integer. The normalized minimax risk on the class \({\mathcal {K}}\) satisfies

$$\begin{aligned} n^{-\frac{2}{d+1}}\lesssim _ d {\mathcal {Q}}_n({\mathcal {K}}) \lesssim _ d n^{-\frac{2}{d+1}}, \end{aligned}$$

and the convex hull has the minimax rate of convergence on \({\mathcal {K}}\), with respect to the normalized version of the risk.

This result gives an upper bound on \({\mathbb {E}}_K\left[ \frac{|K\triangle \hat{K}_n|}{|K|}\right] \) that is uniform over all convex bodies in \({\mathbb {R}}^d\), with no restriction either on the volume and location of the set \(K\), unlike in [18], or on the smoothness and structure of its boundary.
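
The rate \(n^{-2/(d+1)}\) is easy to observe empirically. As a quick sanity check (our own experiment, for \(d=2\) with \(K\) the unit disk, where \(n^{-2/(d+1)}=n^{-2/3}\); the sample sizes and number of repetitions are arbitrary), the rescaled missing area stabilizes as \(n\) grows:

```python
import numpy as np
from scipy.spatial import ConvexHull

# Empirical rate check for d = 2 and K the unit disk: the expected missing
# area E|K \ CH_n| should decay like n^(-2/3), so the rescaled values
# printed below should be roughly constant in n.
rng = np.random.default_rng(1)

def mean_missing_area(n, n_rep=300):
    areas = []
    for _ in range(n_rep):
        # Uniform points in the unit disk by rejection from [-1, 1]^2.
        pts = rng.uniform(-1, 1, size=(3 * n, 2))
        pts = pts[(pts ** 2).sum(axis=1) <= 1][:n]
        areas.append(np.pi - ConvexHull(pts).volume)
    return np.mean(areas)

for n in (100, 400, 1600):
    print(n, mean_missing_area(n) * n ** (2 / 3))
```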

For adaptive estimation, we will only consider subsets of \([0,1]^d\), and the following result, in which only the constants are improved, has a proof which is a simplification of that of Theorem 4 (cf. [29]).

Theorem 5

Let \(n\ge 2\) be an integer. There exists a positive constant \(C_2'\), which depends on \(d\) only, such that:

$$\begin{aligned} \sup \limits _{K\in {\mathcal {K}}_1}{\mathbb {P}}_K\left[ n\left( |K\backslash \hat{K}_n|-C_2'n^{-2/(d+1)}\right) >x\right] \lesssim _ d e^{-x/\beta _d}, \quad \forall x>0. \end{aligned}$$

Therefore, using Theorem 5 and, again, the lower bound of [18], we get the following corollary.

Corollary 3

Let \(n\ge 2\) be an integer. The minimax risk on the class \({\mathcal {K}}_1\) satisfies

$$\begin{aligned} n^{-\frac{2}{d+1}}\lesssim _ d {\mathcal {R}}_n({\mathcal {K}}_1) \lesssim _ d n^{-\frac{2}{d+1}}, \end{aligned}$$

and the convex hull has the minimax rate of convergence on \({\mathcal {K}}_1\), with respect to the risk defined in (**).

5 Adaptive estimation

In Sects. 3 and 4, we proposed estimators which depend heavily on the structure of the boundary of the unknown support. In particular, when the support was assumed to be polytopal with at most \(r\) vertices, for some known integer \(r\), our estimator was by construction also a polytope with at most \(r\) vertices. Now we will construct an estimator which does not depend on any knowledge other than the convexity of the unknown support, and the fact that it is located in \([0,1]^d\). This last assumption is made for technical reasons, but it is reasonable. Indeed, the unknown set can be seen as an image that the statistician would like to reconstruct, in some given and known frame. The estimator will achieve the same rate as those of Sect. 3.1 in the polytopal case, that is, \(r\ln n/n\), where \(r\) is the unknown number of vertices of the support, and the same rate, up to a logarithmic factor, as the convex hull studied in Sect. 4 in the case where the support is not polytopal, or is polytopal but with too many vertices. Note that if the support is a polytope with \(r\) vertices, where \(r\) is larger than \((\ln n)^{-1}n^\frac{d-1}{d+1}\), then the risk of the convex hull estimator \(\hat{K}_n\) has a smaller rate than that of \(\hat{P}_n^{(r)}\). The idea which we develop here is the same as in [26], Theorem 6. The classes \({\mathcal {P}}_r, r\ge d+1\), are nested, that is, \({\mathcal {P}}_r\subseteq {\mathcal {P}}_{r'}\) as soon as \(r\le r'\). So it is better, in some sense, to overestimate the true number of vertices of the unknown polytope \(P\). Intuitively, it makes sense to fit some polytope with more vertices to \(P\), while the opposite may be impossible (e.g. it is possible to fit a quadrilateral on any triangle, but not a triangle on a square). We use this idea in order to select an estimator among the preliminary estimators \(\hat{P}_n^{(r)}, r\ge d+1\), and \(\hat{K}_n\).

Set \(R_n=\lfloor n^{(d-1)/(d+1)}/(\ln n)\rfloor \), where \(\lfloor \cdot \rfloor \) stands for the integer part. For \(r=d+1,\ldots ,R_n-1\), set \(\hat{Q}_n^{(r)}=\hat{P}_n^{(r)}\) and define \(\hat{Q}_n^{(R_n)}=\hat{K}_n\). Let \(C=2+\max (4d,C_2')\), and define

$$\begin{aligned} \hat{r}=\min \left\{ r\in \{d+1,\ldots ,R_n\} : |\hat{Q}_n^{(r)}\triangle \hat{Q}_n^{(r')}|\le \frac{2Cr'\ln n}{n}, \forall r'=r,\ldots ,R_n\right\} \!. \end{aligned}$$

The integer \(\hat{r}\) is well defined; indeed, the set in the brackets in the last display is not empty, since the inequality is satisfied for \(r=R_n\).
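
This selection rule is of Lepski type and is straightforward to implement once the pairwise symmetric differences between the preliminary estimators can be evaluated (e.g. by Monte Carlo, as sketched in Sect. 2) and the constants \(d\), \(C\) and \(R_n\) are given. A minimal sketch, where `sym_diff` is a hypothetical callback returning \(|\hat{Q}_n^{(r)}\triangle \hat{Q}_n^{(r')}|\):

```python
import numpy as np

def select_r_hat(sym_diff, n, d, R_n, C):
    """Return r-hat: the smallest r in {d+1, ..., R_n} such that
    |Q_n^(r) sym.diff. Q_n^(r')| <= 2*C*r'*ln(n)/n for every r' >= r.
    `sym_diff(r, r_prime)` is assumed to evaluate the Nikodym distance
    between the preliminary estimators (e.g. by Monte Carlo)."""
    for r in range(d + 1, R_n + 1):
        if all(sym_diff(r, rp) <= 2 * C * rp * np.log(n) / n
               for rp in range(r, R_n + 1)):
            return r
    return R_n  # never reached: the condition trivially holds at r = R_n
```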

The adaptive estimator is defined as \(\hat{P}_n^{adapt} = \hat{Q}_n^{(\hat{r})}\). Then, setting \({\mathcal {P}}_{\infty }={\mathcal {K}}_1\), we have the following theorem.

Theorem 6

Let \(n\ge 2\). Let \(\phi _{n,r}=\min \left( \frac{r\ln n}{n},n^{-\frac{2}{d+1}}\right) \), for all integers \(r\ge d+1\) and \(r=\infty \). Then,

$$\begin{aligned} \sup \limits _{d+1\le r\le \infty }\sup \limits _{P\in {\mathcal {P}}_r} {\mathbb {E}}_P\left[ \phi _{n,r}^{-1}|\hat{P}_n^{adapt}\triangle P|\right] \lesssim _ d 1. \end{aligned}$$

Thus, we show that one and the same estimator \(\hat{P}_n^{adapt}\) attains the optimal rate, up to a logarithmic factor, simultaneously on all the classes \({\mathcal {P}}_r, r\ge d+1\), and on the class \({\mathcal {K}}_1\) of all convex bodies in \([0,1]^d\).

The proof of Theorem 6 is similar to that of Theorem 6 of [26].

6 Proofs

6.1 Proof of Theorem 1

Let \(r\ge d+1\) be an integer, and \(n\ge 2\). Let \(P_0\in {\mathcal {P}}_r\) and consider a sample \(X_1,\ldots ,X_n\) of i.i.d. random variables with uniform distribution on \(P_0\). For simplicity's sake, we will write \(\hat{P}_n\) instead of \(\hat{P}_n^{(r)}\) in this section.

Let \(\hat{P}_n\) be the estimator defined in (2). Let us define \({\mathcal {P}}_r^{(n)}\) as the class of all convex polytopes of \({\mathcal {P}}_r\) whose vertices lie on the grid \(\left( \frac{1}{n}{\mathbb {Z}}\right) ^d\), i.e. have as coordinates integer multiples of \(1/n\). We use the following lemma, which corresponds to Lemma 1 and its proof in [26].

Lemma 1

Let \(r\ge d+1, n\ge 2\). There exists a positive constant \(K_1\), which depends on \(d\) only, such that for any convex polytope \(P\) in \({\mathcal {P}}_r\) there is a convex polytope \(P^*\in {\mathcal {P}}_{r}^{(n)}\) such that:

$$\begin{aligned} \left\{ \begin{array}{l} |P^*\triangle P|\le \frac{K_1}{n} \\ P^*\subseteq P^{\sqrt{d}/n}, \qquad P\subseteq (P^*)^{\sqrt{d}/n}. \end{array} \right. \end{aligned}$$
(8)

In particular, taking \(P=P_0\) or \(P=\hat{P}_n\) in Lemma 1, we can find two polytopes \(P^*\) and \(\tilde{P}_n\) in \({\mathcal {P}}_{r}^{(n)}\) such that

$$\begin{aligned} \left\{ \begin{array}{l} |P^*\triangle P_0|\le \frac{K_1}{n} \\ P^*\subseteq P_0^{\sqrt{d}/n}, \qquad P_0\subseteq (P^*)^{\sqrt{d}/n} \end{array} \right. \end{aligned}$$

and

$$\begin{aligned} \left\{ \begin{array}{l} |\tilde{P}_n\triangle \hat{P}_n|\le \frac{K_1}{n} \\ \tilde{P}_n\subseteq \hat{P}_n^{\sqrt{d}/n}, \qquad \hat{P}_n\subseteq \tilde{P}_n^{\sqrt{d}/n}. \end{array} \right. \end{aligned}$$

Note that \(\tilde{P}_n\) is random. Let \(\epsilon >0\). By construction, \(|\hat{P}_n|\le |P_0|\), so \(|\hat{P}_n\triangle P_0|\le 2 |P_0\backslash \hat{P}_n|\). Besides, if \(G_1\), \(G_2\) and \(G_3\) are three measurable subsets of \({\mathbb {R}}^d\), the following triangle inequality holds:

$$\begin{aligned} |G_1\backslash G_3|\le |G_1\backslash G_2|+|G_2\backslash G_3|. \end{aligned}$$
(9)

Let us now write the following inclusions between events, using (9) twice together with the bounds \(|P^*\triangle P_0|\le \frac{K_1}{n}\) and \(|\tilde{P}_n\triangle \hat{P}_n|\le \frac{K_1}{n}\):

$$\begin{aligned} \left\{ |\hat{P}_n\triangle P_0|>\epsilon \right\} \subseteq \left\{ |P_0\backslash \hat{P}_n|>\frac{\epsilon }{2}\right\} \subseteq \left\{ |P^*\backslash \tilde{P}_n|>\frac{\epsilon }{2}-\frac{2K_1}{n}\right\} \subseteq \bigcup \limits _{P}\left\{ \tilde{P}_n=P\right\} \!, \end{aligned}$$
(10)

where the last union is over the class of all \(P\in {\mathcal {P}}_{r}^{(n)}\) that satisfy the inequality \(|P^*\backslash P|>\epsilon /2 - \frac{2K_1}{n}\). Let \(P\) be such a polytope. If \(\tilde{P}_n = P\), then necessarily the sample \(\{X_1,\ldots ,X_n\}\) is included in \(P^{\frac{\sqrt{d}}{n}}\), by definition of \(\tilde{P}_n\), and (10) becomes

$$\begin{aligned} {\mathbb {P}}_{P_0}\left[ \tilde{P}_n=P\right] \le {\mathbb {P}}_{P_0}\left[ X_i\in P^{\sqrt{d}/n},\ i=1,\ldots ,n\right] =\left( \frac{\left| P^{\sqrt{d}/n}\cap P_0\right| }{|P_0|}\right) ^{\!n}\le \left( 1-\frac{\epsilon }{2}+\frac{4K_1}{n}\right) ^{n}\le C_1\exp (-n\epsilon /2), \end{aligned}$$
(11)

where \(C_1={\mathrm {e}}^{4K_1}\). Therefore, using (10) and (11) and denoting by \(\# {\mathcal {P}}_{r}^{(n)}\) the cardinality of the finite class \({\mathcal {P}}_{r}^{(n)}\), which is at most \((n+1)^{dr}\) (each of the at most \(r\) vertices has \(d\) coordinates, each taking at most \(n+1\) values on the grid),

$$\begin{aligned} {\mathbb {P}}_{P_0}\left[ |\hat{P}_n\triangle P_0|>\epsilon \right]&\le \# {\mathcal {P}}_{r}^{(n)}C_1\exp (-n\epsilon /2) \nonumber \\&\le (n+1)^{dr}C_1\exp (-n\epsilon /2) \nonumber \\&\le C_1\exp (-n\epsilon /2+2dr\ln n). \end{aligned}$$
(12)

It turns out that if we take \(\epsilon \) of the form \(\displaystyle {\frac{4dr\ln n}{n}+\frac{x}{n}}\), (12) becomes

$$\begin{aligned} {\mathbb {P}}_{P_0}\left[ n\left( |\hat{P}_n\triangle P_0|-\frac{4dr\ln n}{n}\right) \ge x\right] \le C_1{\mathrm {e}}^{-x/2}, \end{aligned}$$
(13)

which holds for any \(x>0\) and any \(P_0\in {\mathcal {P}}_r\). Theorem 1 is proved.

Corollary 1 follows by applying Fubini's theorem (see [26] for details).

6.2 Proof of Theorem 2

Let \(r\ge 5\) be an integer, supposed to be even without loss of generality, and assume \(n\ge r\). Consider a regular convex polytope \(P^*\) in \([0,1]^2\) with center \(C=(1/2,1/2)\) and with \(r/2\) vertices, denoted by \(A_0, A_2, \ldots , A_{r-2}\), such that for all \(k=0,\ldots ,r/2-1\), the distance between \(A_{2k}\) and the center \(C\) is \(1/2\). Let \(A_1, A_3,\ldots , A_{r-1}\) be \(r/2\) points built as in Fig. 1: for \(k=0,\ldots , r/2-1\), \(A_{2k+1}\) is on the perpendicular bisector of the segment \([A_{2k},A_{2k+2}]\), outside \(P^*\), at a distance \(\delta =\frac{h}{2}\cos (2\pi /r)\tan (4\pi /r)\) from \(P^*\), with \(h\in (0,1)\) to be chosen. Note that by our construction, \(A_{2k}\) and \(A_{2k+2}\) are vertices of the convex hull of \(A_0,A_2,\ldots ,A_{r-2}\) and \(A_{2k+1}\).

Fig. 1: Construction of hypotheses for the lower bound

Let us denote by \(D_k\) the smallest convex cone with apex \(C\), containing the points \(A_{2k}, A_{2k+1}\) and \(A_{2k+2}\), as drawn in Fig. 1. For \(\omega =(\omega _0,\ldots ,\omega _{r/2-1})\in \{0,1\}^{r/2}\), we denote by \(P_\omega \) the convex hull of \(P^*\) and the points \(A_{2k+1}, k=0,\ldots ,r/2-1\) such that \(\omega _k=1\). Then we follow the scheme of the proof of Theorem 5 in [26].

For \(k=0,\ldots ,r/2-1\), and \((\omega _0,\ldots ,\omega _{k-1},\omega _{k+1},\ldots ,\omega _{r/2-1})\in \{0,1\}^{r/2-1}\), we denote by

$$\begin{aligned} \omega ^{(k,0)}&= (\omega _0,\ldots ,\omega _{k-1},0,\omega _{k+1},\ldots ,\omega _{r/2-1}) \quad \text{ and } \text{ by } \\ \omega ^{(k,1)}&= (\omega _0,\ldots ,\omega _{k-1},1,\omega _{k+1},\ldots ,\omega _{r/2-1}). \end{aligned}$$

Note that for \(k=0,\ldots ,r/2-1\), and \((\omega _0,\ldots ,\omega _{k-1},\omega _{k+1},\ldots ,\omega _{r/2-1})\in \{0,1\}^{r/2-1}\),

$$\begin{aligned} |P_{\omega ^{(k,0)}}\triangle P_{\omega ^{(k,1)}}|=\frac{\delta }{2}\cos (2\pi /r). \end{aligned}$$

Let \(H\) be the Hellinger distance between probability measures. For the definition and some properties, see [30], Section 2.4. We have, by a simple computation,

$$\begin{aligned} 1-\frac{H(P_{\omega ^{(k,0)}},P_{\omega ^{(k,1)}})^2}{2}&= \sqrt{1-\frac{|P_{\omega ^{(k,1)}}\backslash P_{\omega ^{(k,0)}}|}{|P_{\omega ^{(k,1)}}|}} \\&= \sqrt{1-\frac{(\delta /2)\cos (2\pi /r)}{|P_{\omega ^{(k,1)}}|}}\\&\ge \sqrt{1-\frac{\delta \cos (2\pi /r)}{4}} \end{aligned}$$

since \(|P_{\omega ^{(k,1)}}|\ge |P^*|\ge 1/2\). Now, let \(\hat{P}_n\) be any estimator of \(P^*\), based on a sample of \(n\) i.i.d. random variables. By the same computation as in the proof of Theorem 5 in [26], based on the \(2^{r/2}\) hypotheses that we constructed, we get

$$\begin{aligned} \sup \limits _{P\in {\mathcal {P}}_r}{\mathbb {E}}_P\left[ |P\triangle \hat{P}_n|\right]&\ge \frac{r\delta \cos (2\pi /r)}{8}\left( 1-\frac{\delta \cos (2\pi /r)}{4}\right) ^n \nonumber \\&\ge \frac{rh\cos \left( \frac{2\pi }{r}\right) ^2\tan \left( \frac{4\pi }{r}\right) }{8}\left( 1-\frac{h\cos \left( \frac{2\pi }{r}\right) ^2\tan \left( \frac{4\pi }{r}\right) }{8}\right) ^n.\qquad \end{aligned}$$
(14)

Note that if we set \(\displaystyle {x=\frac{2\pi }{r}>0}\) and \(\displaystyle {\phi (x)=\frac{1}{x}\cos (x)^2\tan (2x)}\), then \(\phi (x)\gtrsim 1\), since \(r\ge 5\). Therefore, with the choice \(h=r/n\le 1\) (we assumed that \(n\ge r\)), (14) becomes

$$\begin{aligned} \sup \limits _{P\in {\mathcal {P}}_r}{\mathbb {E}}_P\left[ |P\triangle \hat{P}_n|\right] \gtrsim \frac{r}{n}, \end{aligned}$$

and Theorem 2 is proved.

6.3 Proof of Theorem 3

Let \(t>0\) be fixed. Let \(G_1=t^{1/d}B_2^d\) and \(G_2=(2t)^{1/d}B_2^d\). Let us denote respectively by \({\mathbb {P}}_1\) and \({\mathbb {P}}_2\) the uniform distributions on \(G_1\) and \(G_2\), and by \({\mathbb {E}}_1\) and \({\mathbb {E}}_2\) the corresponding expectations. We denote by \({\mathbb {P}}_1^{\otimes n}\) and \({\mathbb {P}}_2^{\otimes n}\) the \(n\)-products of \({\mathbb {P}}_1\) and \({\mathbb {P}}_2\), respectively, i.e. the probability distributions of a sample of \(n\) i.i.d. random variables with distribution \({\mathbb {P}}_1\) and \({\mathbb {P}}_2\), respectively. The corresponding expectations are still denoted by \({\mathbb {E}}_1\) and \({\mathbb {E}}_2\). Then, for any estimator \(\hat{G}_n\) based on a sample of \(n\) random variables, we bound the minimax risk from below by the Bayesian one:

$$\begin{aligned} \sup \limits _{G\in {\mathcal {K}}}{\mathbb {E}}_G[|\hat{G}_n\triangle G|]&\ge \frac{1}{2}\left( {\mathbb {E}}_1[|\hat{G}_n\triangle G_1|]+{\mathbb {E}}_2[|\hat{G}_n\triangle G_2|]\right) \\&\ge \frac{1}{2}\int _{({\mathbb {R}}^d)^n}\left( |\hat{G}_n\triangle G_1|+|\hat{G}_n\triangle G_2|\right) \min (d{\mathbb {P}}_1^{\otimes n},d{\mathbb {P}}_2^{\otimes n}) \\&\ge \frac{1}{2}\int _{({\mathbb {R}}^d)^n}|G_1\triangle G_2|\min (d{\mathbb {P}}_1^{\otimes n},d{\mathbb {P}}_2^{\otimes n}) \\&\ge \frac{t|B_2^d|}{4}\left( 1-\frac{H({\mathbb {P}}_1^{\otimes n},{\mathbb {P}}_2^{\otimes n})^2}{2}\right) ^2, \end{aligned}$$

where \(H\) is the Hellinger distance between probability measures, as in the proof of Theorem 2. Therefore, since \(1-\frac{H({\mathbb {P}}_1^{\otimes n},{\mathbb {P}}_2^{\otimes n})^2}{2}=\left( 1-\frac{H({\mathbb {P}}_1,{\mathbb {P}}_2)^2}{2}\right) ^n\),

$$\begin{aligned} \sup \limits _{G\in {\mathcal {K}}}{\mathbb {E}}_G[|\hat{G}_n\triangle G|] \ge \frac{t|B_2^d|}{4}\left( 1-\frac{H({\mathbb {P}}_1,{\mathbb {P}}_2)^2}{2}\right) ^{2n}, \end{aligned}$$
(15)

and a simple computation (since \(G_1\subseteq G_2\) and \(|G_2|=2|G_1|\), the Hellinger affinity is \(\int \sqrt{p_1p_2}=|G_1|/\sqrt{|G_1||G_2|}=1/\sqrt{2}\), where \(p_1\) and \(p_2\) are the densities of \({\mathbb {P}}_1\) and \({\mathbb {P}}_2\)) shows that

$$\begin{aligned} 1-\frac{H({\mathbb {P}}_1,{\mathbb {P}}_2)^2}{2} = \frac{1}{\sqrt{2}}, \end{aligned}$$

and (15) becomes

$$\begin{aligned} \sup \limits _{G\in {\mathcal {K}}}{\mathbb {E}}_G[|\hat{G}_n\triangle G|] \ge \frac{t|B_2^d|}{2^{n+2}}. \end{aligned}$$

Since \(t\) can be taken arbitrarily large, this ends the proof of Theorem 3.