Data depth for the uniform distribution

Environmental and Ecological Statistics

Abstract

Given a set \(X\) of \(k\) points and a point \(z\) in \(n\)-dimensional Euclidean space, the Tukey depth of \(z\) with respect to \(X\) is defined as \(m/k\), where \(m\) is the minimum integer such that \(z\) is not in the convex hull of some set of \(k-m\) points of \(X\). If \(z\) belongs to the closed region \(B\) delimited by an ellipsoid, define the continuous depth of \(z\) with respect to \(B\) as the quotient \(V(z)/\text{ Vol }(B)\), where \(V(z)\) is the minimum volume of the intersection of \(B\) with a halfspace bounded by a hyperplane passing through \(z\), and \(\text{ Vol }(B)\) is the volume of \(B\). We consider \(z\) a random variable and prove that, if \(z\) is uniformly distributed in \(B\), the continuous depth of \(z\) with respect to \(B\) has expected value \(1/2^{n+1}\). This result implies that if \(z\) and \(X\) are uniformly distributed in \(B\), the expected value of the Tukey depth of \(z\) with respect to \(X\) converges to \(1/2^{n+1}\) as the number of points \(k\) goes to infinity. These findings have applications in ecology, namely within niche theory, where it is useful to explore and characterize the distribution of points inside a species' niche.
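To make the sample definition concrete, the following is a minimal planar sketch, not the authors' implementation. It assumes the standard equivalence that \(m\) equals the smallest number of points of \(X\) lying in a closed halfplane whose boundary passes through \(z\). The function name `tukey_depth_2d`, the tolerance `tol`, and the midpoint-of-critical-angles search are choices made here for illustration only.

```python
import math

def tukey_depth_2d(z, points, tol=1e-12):
    """Tukey depth m/k of z with respect to a non-empty planar point set.

    Sketch: m is taken as the minimum number of points in a closed
    halfplane {x : u.(x - z) <= 0} over all directions u.
    """
    k = len(points)
    # Offsets x - z; a point equal to z lies in every closed halfplane.
    offsets = [(x - z[0], y - z[1]) for x, y in points]
    nonzero = [d for d in offsets if d != (0.0, 0.0)]
    if not nonzero:
        return 1.0  # every point coincides with z, so m = k

    # The count only changes when u is perpendicular to some offset.
    two_pi = 2.0 * math.pi
    critical = []
    for dx, dy in nonzero:
        a = math.atan2(dy, dx)
        critical.append((a + math.pi / 2.0) % two_pi)
        critical.append((a - math.pi / 2.0) % two_pi)
    critical.sort()

    best = k
    for i, a in enumerate(critical):
        # Next critical angle, wrapping around the circle.
        b = critical[i + 1] if i + 1 < len(critical) else critical[0] + two_pi
        t = 0.5 * (a + b)  # one test direction per open arc
        ux, uy = math.cos(t), math.sin(t)
        # The tolerance counts near-boundary points as inside (conservative).
        count = sum(1 for dx, dy in offsets if ux * dx + uy * dy <= tol)
        best = min(best, count)
    return best / k

square = [(1.0, 1.0), (-1.0, 1.0), (-1.0, -1.0), (1.0, -1.0)]
print(tukey_depth_2d((0.0, 0.0), square))
print(tukey_depth_2d((2.0, 0.0), square))
```

The search is quadratic in \(k\), which is enough for illustration; the bivariate algorithm of Rousseeuw and Ruts (1996) cited in the references is the efficient alternative.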

References

  • Abramowitz M, Stegun IA (eds) (1972) Handbook of mathematical functions with formulas, graphs, and mathematical tables. Dover Publications, New York. ISBN: 978-0-486-61272-0

  • Anthos (2011) Sistema de información de las plantas de España. Real Jardín Botánico, CSIC-Fundación Biodiversidad. Downloaded on November 16, 2011

  • Battista T, Gattone SA (2004) Multivariate bootstrap confidence regions for abundance vector using data depth. Environ Ecol Stat 11:355–365

  • Cerdeira JO, Martins MJ, Silva PC (2012) A combinatorial approach to assess the separability of clusters. J Classif 29:7–22

  • Donoho DL, Gasko M (1992) Breakdown properties of location estimates based on halfspace depth and projected outlyingness. Ann Stat 20:1803–1827

  • Fukuda K, Rosta V (2005) Data depth and maximum feasible subsystems. In: Avis D, Hertz A, Marcotte O (eds) Graph theory and combinatorial optimization. Springer, Berlin

  • Hijmans RJ, Cameron SE, Parra JL, Jones PG, Jarvis A, Richardson K (2006) Worldclim. Accessed May, 2012

  • Hijmans RJ, van Etten J (2012) Raster: geographic analysis and modeling with raster data. http://CRAN.R-project.org/package=raster, R package version 1.9-70

  • Hutchinson GE (1957) Concluding remarks. Cold Spring Harb Symp Quant Biol 22: 415–427 (reprinted in 1991: Classics in theoretical biology. Bull Math Biol 53:193–213)

  • Johnson DS, Preparata FP (1978) The densest hemisphere problem. Theor Comput Sci 6:93–107

  • Li J, Ban J, Santiago LS (2011) Nonparametric tests for homogeneity of species assemblages: a data depth approach. Biometrics 67:1481–1488

  • Maechler M, Rousseeuw P, Struyf A, Hubert M, Hornik K (2012) Cluster: cluster analysis basics and extensions. http://CRAN.R-project.org/package=cluster, R package version 1.14.2

  • Massé JC, Plante JF (2009) Depth: depth functions tools for multivariate analysis. http://CRAN.R-project.org/package=depth, R package version 1.0-1

  • Rousseeuw PJ, Ruts I (1996) Algorithm AS 307: bivariate location depth. Appl Stat (JRSS-C) 45:516–526

  • Rousseeuw PJ, Ruts I (1999) The depth function of a population distribution. Metrika 49:213–244

  • Rousseeuw PJ, Struyf A (1998) Computing location depth and regression depth in higher dimensions. Stat Comput 8:193–203

  • Tukey JW (1975) Mathematics and picturing of data. Proc Int Congr Math Vancouver 23:523–531

  • Zuo Y, Serfling R (2000a) General notions of statistical depth functions. Ann Stat 28:461–482

  • Zuo Y, Serfling R (2000b) Structural properties and convergence results for contours of sample statistical depth functions. Ann Stat 28:483–499

Author information

Corresponding author

Correspondence to Pedro C. Silva.

Additional information

Handling Editor: Ashis SenGupta.

The authors thank an anonymous referee for their comments and suggestions.

This work was supported by the Portuguese Foundation for Science and Technology (FCT) through the projects PEst-OE/AGR/UI0239/2011, CEF (Centro de Estudos Florestais) under FEDER/POCI, and PTDC/AAC-AMB/113394/2009.

Appendix

1.1 Proof of Theorem 1

Denote by \(V_n\) the volume of the unit closed \(n\)-ball \(B_n\). It is well known that \(V_n=(2\pi /n)\, V_{n-2}\), with \(V_1=2\) and \(V_2=\pi \).
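As a quick check of this recurrence, the sketch below computes \(V_n\) recursively and compares it with the standard closed form \(\pi ^{n/2}/\Gamma (n/2+1)\), which is not stated in the text and is used here only as an independent reference. The function names are illustrative.

```python
import math

def unit_ball_volume(n):
    """V_n from the recurrence V_n = (2*pi/n) * V_{n-2}, V_1 = 2, V_2 = pi."""
    if n == 1:
        return 2.0
    if n == 2:
        return math.pi
    return (2.0 * math.pi / n) * unit_ball_volume(n - 2)

def unit_ball_volume_closed_form(n):
    """Reference value pi^(n/2) / Gamma(n/2 + 1) (standard formula)."""
    return math.pi ** (n / 2.0) / math.gamma(n / 2.0 + 1.0)

for n in range(1, 9):
    print(n, unit_ball_volume(n), unit_ball_volume_closed_form(n))
```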

Given \(R\in \mathbb{R }^+\) and a continuous function \(f:[0,R] \rightarrow \mathbb{R }\) set

$$\begin{aligned} I_n[f,R]:=\int \!\!\!\int \limits _{B_n(R)} \!\! \cdots \!\!\! \int f(\Vert x\Vert ) \, dx_1dx_2\cdots dx_n, \end{aligned}$$
(1)

where \(B_n(R)\) denotes the \(n\)-ball centred at the origin of radius \(R\). Using hyperspherical coordinates we obtain by straightforward computations,

$$\begin{aligned} I_n[f,R]= n \,V_n \int \limits _0^R f(\rho ) \,\rho ^{n-1} \,d\rho . \end{aligned}$$
(2)
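Formula (2) can be checked numerically in the plane, where \(n\,V_n=2\pi \). The sketch below compares a brute-force midpoint sum over a grid covering the disc with the one-dimensional radial integral. The test function \(f(\rho )=1-\rho \), the grid sizes, and the function names are arbitrary choices made for this illustration.

```python
import math

def radial_integral_2d(f, R, steps=20000):
    """Right-hand side of (2) for n = 2: 2*pi * integral of f(rho)*rho."""
    h = R / steps
    total = 0.0
    for i in range(steps):
        rho = (i + 0.5) * h  # midpoint rule
        total += f(rho) * rho
    return 2.0 * math.pi * total * h

def direct_integral_2d(f, R, grid=400):
    """Left-hand side of (1) for n = 2: midpoint sum over cells inside the disc."""
    h = 2.0 * R / grid
    total = 0.0
    for i in range(grid):
        x = -R + (i + 0.5) * h
        for j in range(grid):
            y = -R + (j + 0.5) * h
            r = math.hypot(x, y)
            if r <= R:
                total += f(r)
    return total * h * h

f = lambda rho: 1.0 - rho
print(radial_integral_2d(f, 1.0), direct_integral_2d(f, 1.0), math.pi / 3.0)
```

For this choice of \(f\) the exact value is \(2\pi (1/2-1/3)=\pi /3\), so both numbers should be close to it, with the grid sum being the coarser of the two.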

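By rotational symmetry we may assume \(z=(0,\ldots ,0,z_n)\) with \(z_n\ge 0\), so that \(\Vert z\Vert =z_n\). The minimum-volume intersection \(V(z)\) is then the volume of the hyperspherical cap cut from \(B_n\) by the hyperplane through \(z\) orthogonal to \(z\), namely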

$$\begin{aligned} V(z)= \int \!\!\!\!\!\!\!\!\!\!\!\int \limits _{B_{n-1} \left( \sqrt{1-z_n^2}\right) }\!\!\!\!\!\!\!\!\!\!\!\cdots \!\!\int \left( \sqrt{1-\Vert y\Vert ^2}-\Vert z\Vert \right) \,dy_1\, dy_2\,\cdots \,dy_{n-1}. \end{aligned}$$
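The cap integral can be evaluated numerically after reducing it with (2) to a one-dimensional integral in \(\rho =\Vert y\Vert \). The sketch below does this with a midpoint rule and, for \(n=3\), compares the result with the classical spherical-cap formula \(\pi h^2(3-h)/3\) for a cap of height \(h=1-\Vert z\Vert \) on the unit sphere. That formula, the closed form used for \(V_{n-1}\), and the function names are assumptions brought in for the comparison, not part of the text.

```python
import math

def unit_ball_volume(n):
    """Standard closed form pi^(n/2) / Gamma(n/2 + 1)."""
    return math.pi ** (n / 2.0) / math.gamma(n / 2.0 + 1.0)

def cap_volume(n, t, steps=20000):
    """Volume of {x in B_n : x_n >= t} for 0 <= t <= 1 and n >= 2.

    Uses (2) in dimension n-1:
    (n-1) * V_{n-1} * integral_0^sqrt(1-t^2) (sqrt(1-rho^2) - t) * rho^(n-2) d rho.
    """
    radius = math.sqrt(1.0 - t * t)
    h = radius / steps
    total = 0.0
    for i in range(steps):
        rho = (i + 0.5) * h  # midpoint rule
        total += (math.sqrt(1.0 - rho * rho) - t) * rho ** (n - 2)
    return (n - 1) * unit_ball_volume(n - 1) * total * h

t = 0.5
height = 1.0 - t
print(cap_volume(3, t), math.pi * height ** 2 * (3.0 - height) / 3.0)
```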

Set \(\Theta _n=\int \!\!\!\int \limits _{B_n(1)}\!\!\cdots \!\!\int V(z)\,dz_1\, dz_2\,\cdots \,dz_n\). If \(z\) is uniformly distributed in \(B_n \equiv B_n(1)\), the expected value of the continuous depth of \(z\) with respect to \(B_n\) is

$$\begin{aligned} \mathrm{E}\!\left[ d_z^c(B_n)\right] =\frac{1}{V_n} \int \!\!\!\!\int \limits _{B_n(1)}\!\!\!\!\cdots \!\!\int \frac{V(z)}{V_n}\,dz_1\, dz_2\,\cdots \,dz_n = \frac{\Theta _n}{V_n^2}, \end{aligned}$$

and we have to show that

$$\begin{aligned} \Theta _n=\frac{V_n^2}{2^{n+1}},\quad n\ge 1. \end{aligned}$$
(3)
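Before following the proof, the target value can be checked by simulation in dimension \(n=3\), where (3) predicts an expected continuous depth of \(1/2^4=0.0625\). The sketch below is a rough Monte Carlo estimate under stated assumptions: points are drawn by rejection sampling from the cube, and the cap volume uses the classical formula \(\pi h^2(3-h)/3\) with \(h=1-\Vert z\Vert \). The sample size, seed, and function names are arbitrary.

```python
import math
import random

def continuous_depth_unit_ball_3d(z):
    """Continuous depth of z in the unit 3-ball: smallest cap volume / ball volume."""
    r = math.sqrt(sum(c * c for c in z))
    h = 1.0 - r  # height of the cap cut by the plane through z orthogonal to z
    cap = math.pi * h * h * (3.0 - h) / 3.0
    return cap / (4.0 * math.pi / 3.0)

def estimate_expected_depth(samples=100000, seed=0):
    """Average depth of points drawn uniformly from the unit 3-ball."""
    rng = random.Random(seed)
    total = 0.0
    accepted = 0
    while accepted < samples:
        z = [rng.uniform(-1.0, 1.0) for _ in range(3)]
        if sum(c * c for c in z) <= 1.0:  # keep only points inside the ball
            total += continuous_depth_unit_ball_3d(z)
            accepted += 1
    return total / samples

print(estimate_expected_depth(), 1.0 / 2 ** 4)
```

The estimate fluctuates with the seed; it is meant as a plausibility check, not as evidence for the theorem.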

For \(n=1\), \(V(z)=1-|z|\) and we obtain

$$\begin{aligned} \Theta _1=2 \int \limits _0^1 (1-z) \,dz=\frac{V_1^2}{2^2}. \end{aligned}$$

For \(n\ge 2\), we have by (2),

$$\begin{aligned} \Theta _n&= I_n\left[ I_{n-1}\left[ \sqrt{1-\rho ^2} - r, \sqrt{1-r^2}\right] ,1\right] \\&= (n-1)n \, V_{n-1} V_n \int \limits _0^1\int \limits _0^{\sqrt{1-r^2}} \left( \sqrt{1-\rho ^2}- r\right) \,\rho ^{n-2} \, d\rho \,\, r^{n-1}\, dr, \end{aligned}$$

where \(\rho =\Vert y\Vert \) and \(r=\Vert z\Vert \).

Given nonnegative integers \(k,\ell \), set \(\Omega _{k,\ell }=\int \limits _0^1 (1- \rho ^2)^{\frac{k}{2}} \,\rho ^\ell \,d\rho \). Changing the order of integration and simplifying, we obtain

$$\begin{aligned} \Theta _n=\frac{n-1}{n+1}\,V_{n-1}\, V_n \, \Omega _{n+1,n-2}. \end{aligned}$$
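The simplification can be spelled out as follows; this is a worked version of the step, with \(s\) a shorthand introduced here. The region \(0\le r\le 1\), \(0\le \rho \le \sqrt{1-r^2}\) coincides with \(0\le \rho \le 1\), \(0\le r\le \sqrt{1-\rho ^2}\), and for \(s=\sqrt{1-\rho ^2}\) the inner integral in \(r\) is elementary:

$$\begin{aligned} \int \limits _0^{s} (s-r)\, r^{n-1}\, dr&= \frac{s^{n+1}}{n}-\frac{s^{n+1}}{n+1}=\frac{s^{n+1}}{n(n+1)},\\ \Theta _n&= (n-1)n \, V_{n-1} V_n \int \limits _0^1 \frac{(1-\rho ^2)^{\frac{n+1}{2}}}{n(n+1)}\,\rho ^{n-2}\, d\rho =\frac{n-1}{n+1}\,V_{n-1}\, V_n \, \Omega _{n+1,n-2}. \end{aligned}$$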

In particular, we get

$$\begin{aligned} \Theta _2&= \frac{1}{3} V_1 \,V_{2}\, \Omega _{3,0}=\frac{2\pi }{3}\, \int \limits _0^1 (1- \rho ^2)^{3/2}\, d\rho =\frac{\pi ^2}{8}=\frac{V_2^2}{2^3},\\ \Theta _3&= \frac{2}{4}\, V_2 \,V_{3}\, \Omega _{4,1}= \frac{2\pi ^2}{3}\, \int \limits _0^1 (1- \rho ^2)^{2}\,\rho \,d\rho =\frac{\pi ^2}{9}=\frac{V_3^2}{2^4}. \end{aligned}$$

Let \(\Gamma [\cdot ]\) denote the usual gamma function. Using the relation \(\Gamma [t+1]=t\,\Gamma [t]\) along with the identity

$$\begin{aligned} \Omega _{k,\ell }=\frac{1}{2}\frac{\Gamma \left[ \frac{k+2}{2}\right] \,\Gamma \left[ \frac{\ell +1}{2}\right] }{\Gamma \left[ \frac{k+\ell +3}{2}\right] },\qquad k,\ell \ge 0, \end{aligned}$$

(see, for instance, Abramowitz and Stegun 1972, Section 6.2) we get for every \(n\ge 4\),

$$\begin{aligned} \Omega _{n+1,n-2}= \Omega _{n-1,n-4}\frac{(n+1)(n-3)}{4\, (n-1)n}. \end{aligned}$$
(4)
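Both the gamma-function expression for \(\Omega _{k,\ell }\) and the recurrence (4) are easy to check numerically. The sketch below compares the gamma expression with a midpoint-rule evaluation of the defining integral and then tests (4) for a range of \(n\). Function names and step counts are illustrative.

```python
import math

def omega_gamma(k, l):
    """Omega_{k,l} from the gamma-function identity."""
    return (0.5 * math.gamma((k + 2) / 2.0) * math.gamma((l + 1) / 2.0)
            / math.gamma((k + l + 3) / 2.0))

def omega_numeric(k, l, steps=20000):
    """Omega_{k,l} from its definition, by the midpoint rule on [0, 1]."""
    h = 1.0 / steps
    total = 0.0
    for i in range(steps):
        rho = (i + 0.5) * h
        total += (1.0 - rho * rho) ** (k / 2.0) * rho ** l
    return total * h

def recurrence_factor(n):
    """Factor in (4): (n+1)(n-3) / (4 (n-1) n)."""
    return (n + 1) * (n - 3) / (4.0 * (n - 1) * n)

print(omega_gamma(3, 0), omega_numeric(3, 0))
for n in range(4, 10):
    print(n, omega_gamma(n + 1, n - 2), omega_gamma(n - 1, n - 4) * recurrence_factor(n))
```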

By (4) we derive, for \(n\ge 4\),

$$\begin{aligned} \frac{\Theta _n}{\Theta _{n-2}}&= \frac{(n-1)/(n+1)}{(n-3)/(n-1)}\,\, \frac{V_n}{V_{n-2}} \,\, \frac{V_{n-1}}{V_{n-3}}\,\, \frac{\Omega _{n+1,n-2}}{\Omega _{n-1,n-4}}\\&= \frac{(n-1)^2}{(n+1)(n-3)} \,\, \frac{2\pi }{n} \,\, \frac{2\pi }{n-1} \,\, \frac{(n+1)(n-3)}{4\, n(n-1)}\\&= \frac{\pi ^2}{n^2}. \end{aligned}$$

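The proof of (3) now follows by induction on \(n\) in steps of 2, starting from the cases \(n=2\) and \(n=3\) computed above. Recalling that \(V_{n-2}=n V_n/(2\pi )\), the induction hypothesis \(\Theta _{n-2}=V_{n-2}^2/2^{n-1}\) gives \(\Theta _n=(\pi ^2/n^2)\,\Theta _{n-2}=V_n^2/2^{n+1}\).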
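The end result can also be confirmed numerically from the closed-form expression \(\Theta _n=\frac{n-1}{n+1}V_{n-1}V_n\Omega _{n+1,n-2}\) derived in this appendix. The sketch below evaluates \(\Theta _n/V_n^2\) for several dimensions and compares it with \(1/2^{n+1}\). It assumes the standard closed form \(\pi ^{n/2}/\Gamma (n/2+1)\) for \(V_n\); the function names are illustrative.

```python
import math

def unit_ball_volume(n):
    """Standard closed form pi^(n/2) / Gamma(n/2 + 1)."""
    return math.pi ** (n / 2.0) / math.gamma(n / 2.0 + 1.0)

def omega(k, l):
    """Omega_{k,l} from the gamma-function identity."""
    return (0.5 * math.gamma((k + 2) / 2.0) * math.gamma((l + 1) / 2.0)
            / math.gamma((k + l + 3) / 2.0))

def theta(n):
    """Theta_n: 1 for n = 1, else (n-1)/(n+1) * V_{n-1} * V_n * Omega_{n+1,n-2}."""
    if n == 1:
        return 1.0
    return ((n - 1) / (n + 1.0) * unit_ball_volume(n - 1)
            * unit_ball_volume(n) * omega(n + 1, n - 2))

def expected_continuous_depth(n):
    """E[d] = Theta_n / V_n^2 for z uniform in the unit n-ball."""
    return theta(n) / unit_ball_volume(n) ** 2

for n in range(1, 11):
    print(n, expected_continuous_depth(n), 0.5 ** (n + 1))
```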

About this article

Cite this article

Silva, P.C., Cerdeira, J.O., Martins, M.J. et al. Data depth for the uniform distribution. Environ Ecol Stat 21, 27–39 (2014). https://doi.org/10.1007/s10651-013-0242-7

  • DOI: https://doi.org/10.1007/s10651-013-0242-7
