Abstract
Given a set \(X\) of \(k\) points and a point \(z\) in \(n\)-dimensional Euclidean space, the Tukey depth of \(z\) with respect to \(X\) is defined as \(m/k\), where \(m\) is the minimum integer such that \(z\) is not in the convex hull of some set of \(k-m\) points of \(X\). If \(z\) belongs to the closed region \(B\) delimited by an ellipsoid, define the continuous depth of \(z\) with respect to \(B\) as the quotient \(V(z)/\text{Vol}(B)\), where \(V(z)\) is the minimum volume of the intersection of \(B\) with a halfspace defined by a hyperplane passing through \(z\), and \(\text{Vol}(B)\) is the volume of \(B\). We consider \(z\) a random variable and prove that, if \(z\) is uniformly distributed in \(B\), the continuous depth of \(z\) with respect to \(B\) has expected value \(1/2^{n+1}\). This result implies that if \(z\) and \(X\) are uniformly distributed in \(B\), the expected value of the Tukey depth of \(z\) with respect to \(X\) converges to \(1/2^{n+1}\) as the number of points \(k\) goes to infinity. These findings have applications in ecology, namely within niche theory, where it is useful to explore and characterize the distribution of points inside a species' niche.
References
Abramowitz M, Stegun IA (eds) (1972) Handbook of mathematical functions with formulas, graphs, and mathematical tables. Dover Publications, New York. ISBN: 978-0-486-61272-0
Anthos (2011) Sistema de información de las plantas de España. Real Jardín Botánico, CSIC-Fundación Biodiversidad. Downloaded on November 16, 2011
Battista T, Gattone SA (2004) Multivariate bootstrap confidence regions for abundance vector using data depth. Environ Ecol Stat 11:355–365
Cerdeira JO, Martins MJ, Silva PC (2012) A combinatorial approach to assess the separability of clusters. J Classif 29:7–22
Donoho DL, Gasko M (1992) Breakdown properties of location estimates based on halfspace depth and projected outlyingness. Ann Stat 20:1803–1827
Fukuda K, Rosta V (2005) Data depth and maximum feasible subsystems. In: Avis D, Hertz A, Marcotte O (eds) Graph theory and combinatorial optimization. Springer, Berlin
Hijmans RJ, Cameron SE, Parra JL, Jones PG, Jarvis A, Richardson K (2006) Worldclim. Accessed May 2012
Hijmans RJ, van Etten J (2012) Raster: geographic analysis and modeling with raster data. http://CRAN.R-project.org/package=raster, R package version 1.9-70
Hutchinson GE (1957) Concluding remarks. Cold Spring Harb Symp Quant Biol 22:415–427 (reprinted in 1991: Classics in theoretical biology. Bull Math Biol 53:193–213)
Johnson DS, Preparata FP (1978) The densest hemisphere problem. Theor Comput Sci 6:93–107
Li J, Ban J, Santiago LS (2011) Nonparametric tests for homogeneity of species assemblages: a data depth approach. Biometrics 67:1481–1488
Maechler M, Rousseeuw P, Struyf A, Hubert M, Hornik K (2012) Cluster: cluster analysis basics and extensions. http://CRAN.R-project.org/package=cluster, R package version 1.14.2
Massé JC, Plante JF (2009) Depth: depth functions tools for multivariate analysis. http://CRAN.R-project.org/package=depth, R package version 1.0-1
Rousseeuw PJ, Ruts I (1996) Algorithm AS 307: bivariate location depth. Appl Stat (JRSS-C) 45:516–526
Rousseeuw PJ, Ruts I (1999) The depth function of a population distribution. Metrika 49:213–244
Rousseeuw PJ, Struyf A (1998) Computing location depth and regression depth in higher dimensions. Stat Comput 8:193–203
Tukey JW (1975) Mathematics and picturing of data. Proc Int Congr Math Vancouver 23:523–531
Zuo Y, Serfling R (2000a) General notions of statistical depth functions. Ann Stat 28:461–482
Zuo Y, Serfling R (2000b) Structural properties and convergence results for contours of sample statistical depth functions. Ann Stat 28:483–499
Additional information
Handling Editor: Ashis SenGupta.
The authors thank an anonymous referee for their comments and suggestions.
This work was supported by the Portuguese Foundation for Science and Technology (FCT) through the projects PEst-OE/AGR/UI0239/2011, CEF (Centro de Estudos Florestais) under FEDER/POCI, and PTDC/AAC-AMB/113394/2009.
Appendix
1.1 Proof of Theorem 1
Denote by \(V_n\) the volume of the unit closed \(n\)-ball \(B_n\). It is well known that \(V_n=(2\pi /n)\, V_{n-2}\), with \(V_1=2\) and \(V_2=\pi \).
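As a quick sanity check of this recurrence, the following minimal Python sketch computes \(V_n\) from the two base cases and compares it with the standard closed form \(\pi ^{n/2}/\Gamma (n/2+1)\). The function names are illustrative and do not come from the paper.

```python
import math


def unit_ball_volume(n: int) -> float:
    """Volume V_n of the unit n-ball via V_n = (2*pi/n) * V_{n-2}."""
    if n < 1:
        raise ValueError("n must be a positive integer")
    if n == 1:
        return 2.0          # base case V_1 = 2
    if n == 2:
        return math.pi      # base case V_2 = pi
    return (2.0 * math.pi / n) * unit_ball_volume(n - 2)


def unit_ball_volume_closed_form(n: int) -> float:
    """Standard closed form pi^(n/2) / Gamma(n/2 + 1), used only for comparison."""
    return math.pi ** (n / 2) / math.gamma(n / 2 + 1)


if __name__ == "__main__":
    for n in range(1, 8):
        print(n, unit_ball_volume(n), unit_ball_volume_closed_form(n))
```

The two columns printed by the script should agree up to floating-point rounding.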
Given \(R\in \mathbb{R }^+\) and a continuous function \(f:[0,R] \rightarrow \mathbb{R }\) set
where \(B_n(R)\) denotes the \(n\)-ball centred at the origin of radius \(R\). Using hyperspherical coordinates we obtain by straightforward computations,
The volume of the hyperspherical cap orthogonal to \(z=(0,\ldots ,0,z_n)\) is
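Set \(\Theta _n=\int \!\!\!\int \limits _{B_n(1)}\!\!\cdots \!\!\int V(z)\,dz_1\, dz_2\,\cdots \,dz_n\). If \(z\) is uniformly distributed in \(B_n \equiv B_n(1)\), the expected value of the continuous depth of \(z\) with respect to \(B_n\) is
\[\frac{1}{V_n}\int \!\!\!\int \limits _{B_n}\!\!\cdots \!\!\int \frac{V(z)}{V_n}\,dz_1\, dz_2\,\cdots \,dz_n=\frac{\Theta _n}{V_n^2},\]
and we have to show that
\[\Theta _n=\frac{V_n^2}{2^{n+1}}. \tag{3}\]
For \(n=1\), \(V(z)=1-|z|\) and we obtain
\[\Theta _1=\int _{-1}^{1}(1-|z|)\,dz=1=\frac{V_1^2}{2^{2}}.\]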
For \(n\ge 2\), we have by (2),
where \(\rho =\Vert y\Vert \) and \(r=\Vert z\Vert \).
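The cap volume \(V(z)\) integrated here depends on \(z\) only through \(r=\Vert z\Vert \). The sketch below evaluates it numerically, assuming the textbook slice formula \(V_{n-1}\int _r^1 (1-t^2)^{(n-1)/2}\,dt\), which is not necessarily the form used in the displays of this proof. The names `ball_volume` and `cap_volume` are hypothetical, and the midpoint rule is just one simple choice of quadrature.

```python
import math


def ball_volume(n: int) -> float:
    """Volume of the unit n-ball, pi^(n/2) / Gamma(n/2 + 1); equals 1 for n = 0."""
    return math.pi ** (n / 2) / math.gamma(n / 2 + 1)


def cap_volume(n: int, r: float, steps: int = 20000) -> float:
    """Volume of the cap {x in B_n : x_n >= r}, 0 <= r <= 1, by the midpoint rule.

    Assumes the slice formula: the section of B_n at height t is an
    (n-1)-ball of radius sqrt(1 - t^2).
    """
    if not 0.0 <= r <= 1.0:
        raise ValueError("r must lie in [0, 1]")
    h = (1.0 - r) / steps
    total = 0.0
    for i in range(steps):
        t = r + (i + 0.5) * h                     # midpoint of the i-th cell
        total += (1.0 - t * t) ** ((n - 1) / 2)   # (radius of the slice)^(n-1)
    return ball_volume(n - 1) * total * h


if __name__ == "__main__":
    print(cap_volume(1, 0.3))   # should be close to 1 - 0.3
    print(cap_volume(3, 0.0))   # should be close to half of 4*pi/3
```

For instance, `cap_volume(n, 0.0)` should be close to `ball_volume(n) / 2`, since a hyperplane through the centre halves the ball.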
Given nonnegative integers \(k,\ell \), set \(\Omega _{k,\ell }=\int \limits _0^1 (1- \rho ^2)^{\frac{k}{2}} \,\rho ^\ell \,d\rho \). Changing the order of integration and simplifying, we obtain
In particular, we get
Let \(\Gamma [z]\) denote the usual gamma function. Using the relation \(\Gamma [z+1]=z\Gamma [z]\) along with the relation
(see, for instance, Abramowitz and Stegun 1972, Section 6.2) we get for every \(n\ge 4\),
By (4) we derive, for \(n\ge 4\),
The proof of (3) now follows by induction, recalling that \(V_{n-2}=n V_n/(2\pi )\).
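The value \(1/2^{n+1}\) can also be checked numerically for small \(n\). The following Python sketch is an independent sanity check, not the argument of the proof: it assumes the radial reduction \(\Theta _n=nV_n\int _0^1 V(r)\,r^{n-1}\,dr\), with \(V(r)\) the cap volume at distance \(r\) from the centre, and uses a plain midpoint rule. All function names are hypothetical.

```python
import math


def ball_volume(n: int) -> float:
    """Volume of the unit n-ball, pi^(n/2) / Gamma(n/2 + 1); equals 1 for n = 0."""
    return math.pi ** (n / 2) / math.gamma(n / 2 + 1)


def expected_continuous_depth(n: int, steps: int = 4000) -> float:
    """Midpoint-rule estimate of Theta_n / V_n^2 for the unit n-ball."""
    h = 1.0 / steps
    mids = [(i + 0.5) * h for i in range(steps)]
    # (n-1)-dimensional volume of the slice of B_n at height t
    slice_vol = [ball_volume(n - 1) * (1.0 - t * t) ** ((n - 1) / 2) for t in mids]

    radial_integral = 0.0
    tail = 0.0  # cap volume to the right of the current cell
    for i in reversed(range(steps)):
        # cap volume at r = mids[i]: cells to the right plus half of cell i
        cap = tail + 0.5 * slice_vol[i] * h
        radial_integral += cap * mids[i] ** (n - 1) * h
        tail += slice_vol[i] * h

    theta = n * ball_volume(n) * radial_integral
    return theta / ball_volume(n) ** 2


if __name__ == "__main__":
    for n in range(1, 7):
        print(n, expected_continuous_depth(n), 1.0 / 2 ** (n + 1))
```

The two printed columns should agree to several decimal places; increasing `steps` tightens the agreement at the cost of run time.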
Cite this article
Silva, P.C., Cerdeira, J.O., Martins, M.J. et al. Data depth for the uniform distribution. Environ Ecol Stat 21, 27–39 (2014). https://doi.org/10.1007/s10651-013-0242-7
DOI: https://doi.org/10.1007/s10651-013-0242-7