Abstract
Data depth provides a natural means to rank multivariate vectors with respect to an underlying multivariate distribution. Most existing depth functions emphasize a centre-outward ordering of data points, which may not provide a useful geometric representation of certain distributional features, such as multimodality, that are of concern in some statistical applications. This inadequacy motivates us to develop a device for ranking data points according to their “representativeness” rather than their “centrality” with respect to an underlying distribution of interest. Derived essentially from a chosen goodness-of-fit test statistic, our device calls for a new interpretation of “depth”, more akin to the concept of density than to that of location. It copes particularly well with multivariate data exhibiting multimodality. In addition to providing depth values for individual data points, depth functions derived from goodness-of-fit tests extend naturally to provide depth values for subsets of data points, a concept new to the data-depth literature.
Acknowledgments
Supported by a grant from the Research Grants Council of the Hong Kong Special Administrative Region, China (Project No. HKU 702508P).
Appendix
1.1 Proof of Proposition 1
For any \(x\in \mathbb R \), let \(r(x)>0\) satisfy \(F(x+r(x))-F(x-r(x))=\gamma \). Differentiating this condition once and twice with respect to \(x\) gives, respectively, \(\{f(x-r(x))+f(x+r(x))\}r'(x)=f(x-r(x))-f(x+r(x))\) and \(\{f(x-r(x))+f(x+r(x))\}r''(x)=(1-r'(x))^2f'(x-r(x))-(1+r'(x))^2f'(x+r(x))\). Thus \(r(x)\) has a local minimum or a local maximum at any \(x=x_0\) satisfying \(r'(x_0)=0\), that is, \(f(x_0-r(x_0))=f(x_0+r(x_0))\), according as \(r''(x_0)>0\) or \(r''(x_0)<0\). Substituting \(r'(x_0)=0\) and \(f(x_0-r(x_0))=f(x_0+r(x_0))\) into the second identity gives \(r''(x_0)=\{2f(x_0-r(x_0))\}^{-1}\{f'(x_0-r(x_0))-f'(x_0+r(x_0))\}\). The proposition then follows because \(D(F,x)\) is a decreasing function of \(r(x)\), so that local minima of \(r\) correspond to local maxima of \(D(F,\cdot )\) and vice versa.
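The quantities in this proof can be checked numerically. The sketch below is a minimal Python illustration, assuming a univariate equal-weight normal mixture \(0.5N(-2,1)+0.5N(2,1)\) and \(\gamma =0.1\), both chosen purely for illustration. It solves \(F(x+r)-F(x-r)=\gamma \) for \(r(x)\) by bisection and uses \(D=1/r(x)\) as a hypothetical stand-in for the depth function, since only the fact that \(D(F,x)\) decreases in \(r(x)\) matters for locating its extrema. The helper names (`mixture_cdf`, `radius`, `depth`, `local_maxima`) are our own. For comparison, it also evaluates the univariate halfspace (Tukey) depth \(\min \{F(x),1-F(x)\}\), a centre-outward depth.

```python
import math

def norm_cdf(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def mixture_cdf(x, mu=2.0, sigma=1.0):
    """CDF of the bimodal mixture 0.5 N(-mu, sigma^2) + 0.5 N(mu, sigma^2)."""
    return 0.5 * norm_cdf((x + mu) / sigma) + 0.5 * norm_cdf((x - mu) / sigma)

def radius(x, cdf, gamma, r_max=100.0, iters=100):
    """Solve F(x + r) - F(x - r) = gamma for r by bisection.

    g(r) = F(x + r) - F(x - r) is continuous and nondecreasing with g(0) = 0,
    so bisection on [0, r_max] converges whenever g(r_max) >= gamma.
    """
    lo, hi = 0.0, r_max
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if cdf(x + mid) - cdf(x - mid) < gamma:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def depth(x, cdf, gamma):
    # HYPOTHETICAL stand-in for D(F, x): any strictly decreasing function of r(x)
    # has the same local maxima and minima, which is all Proposition 1 concerns.
    return 1.0 / radius(x, cdf, gamma)

def halfspace_depth(x, cdf):
    # Univariate Tukey (halfspace) depth, a centre-outward depth for comparison.
    return min(cdf(x), 1.0 - cdf(x))

def local_maxima(xs, values):
    """Grid points whose value strictly exceeds both neighbours."""
    return [xs[i] for i in range(1, len(xs) - 1)
            if values[i] > values[i - 1] and values[i] > values[i + 1]]

gamma = 0.1
xs = [i / 100.0 for i in range(-500, 501)]  # grid on [-5, 5], step 0.01
rep = [depth(x, mixture_cdf, gamma) for x in xs]
hs = [halfspace_depth(x, mixture_cdf) for x in xs]

print("r-based depth peaks at:", local_maxima(xs, rep))
print("halfspace depth peaks at:", local_maxima(xs, hs))
```

On this mixture one would expect the \(r\)-based depth to peak near the two modes and to have a local minimum at the antimode \(x=0\), where \(f'(-r)<0<f'(r)\) gives \(r''(0)<0\), whereas the halfspace depth peaks at the median \(x=0\).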
1.2 Proof of Proposition 2
The unimodality condition ensures that there exist some \(w\in \mathbb R \) and \(R>0\) such that \(F(w+R)-F(w-R)=\gamma \) and \(f(w+R)=f(w-R)=k\), say. Then necessarily \(f\) increases at \(w-R\), decreases at \(w+R\) and \(f(x)>k\) for all \(x\in (w-R,w+R)\). Since \(f(w-R)=f(w+R)\), with \(f\) increasing at \(w-R\) and decreasing at \(w+R\), Proposition 1 shows that the depth function (10) is locally maximised at \(w\). Fix any \(x_1>x_2>w\) and let \(R_1, R_2>0\) satisfy \(F(x_i+R_i)-F(x_i-R_i)=\gamma \), \(i=1,2\). Note that \(x_2-R_2>w-R\) and \(x_2+R_2>w+R\), since otherwise \([x_2-R_2,x_2+R_2]\) would either strictly contain or be strictly contained in \([w-R,w+R]\), contrary to the definition of \(R_2\), as both intervals have probability \(\gamma \). It follows that \(f(x)>f(x_2+R_2)\) for all \(x\in [w-R,x_2+R_2)\), while \(f\) is decreasing on \([x_2+R_2,\infty )\). Consider two cases: (i) \(x_1-R_2>x_2+R_2\), (ii) \(x_1-R_2\le x_2+R_2\). Under (i), we have \(f(x_1-R_2)<f(x)\) for all \(x\in [x_2-R_2,x_2+R_2]\), and \(f(u)\le f(x_1-R_2)\) for all \(u\in [x_1-R_2,x_1+R_2]\), so that \(\int _{x_1-R_2}^{x_1+R_2}f(u)\,du<\int _{x_2-R_2}^{x_2+R_2}f(u)\,du=\gamma \), which implies that \(R_1>R_2\). Under (ii), we have \(f(x_2+R_2)<f(x)\) for all \(x\in [x_2-R_2,x_1-R_2)\) and \(f(x)\le f(x_2+R_2)\) for all \(x\in [x_2+R_2,x_1+R_2]\), so that
\[
\int _{x_2+R_2}^{x_1+R_2}f(u)\,du\le (x_1-x_2)\,f(x_2+R_2)<\int _{x_2-R_2}^{x_1-R_2}f(u)\,du .
\]
It follows that \(F(x_1+R_2)-F(x_1-R_2)=\gamma +\int _{x_2+R_2}^{x_1+R_2}f(u)\,du-\int _{x_2-R_2}^{x_1-R_2}f(u)\,du<\gamma \) and hence \(R_1>R_2\). Thus, under either (i) or (ii), the depth function at \(x_1\) is strictly smaller than that at \(x_2\), so that it decreases strictly on \((w,\infty )\). Similar arguments show that it increases strictly on \((-\infty ,w)\).
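Proposition 2 can likewise be checked on a skewed unimodal example. The following Python sketch assumes the standard Gumbel distribution, \(F(x)=\exp (-e^{-x})\), with \(\gamma =0.2\), both chosen only for illustration, and again uses the hypothetical depth \(D=1/r(x)\), which is decreasing in \(r(x)\). It locates \(w\) as the grid maximiser of the depth on \([-3,6]\), then reports whether the depth is strictly increasing to the left of \(w\) and strictly decreasing to the right, and whether \(F(w+R)-F(w-R)=\gamma \) and \(f(w-R)\approx f(w+R)\) up to the grid resolution of 0.01. The function names are our own.

```python
import math

def gumbel_cdf(x):
    """CDF of the standard Gumbel distribution: unimodal (mode 0) and right-skewed."""
    return math.exp(-math.exp(-x))

def gumbel_pdf(x):
    """Density of the standard Gumbel distribution."""
    return math.exp(-x - math.exp(-x))

def radius(x, cdf, gamma, r_max=100.0, iters=100):
    """Bisection for r(x) solving F(x + r) - F(x - r) = gamma."""
    lo, hi = 0.0, r_max
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if cdf(x + mid) - cdf(x - mid) < gamma:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

gamma = 0.2
xs = [-3.0 + i / 100.0 for i in range(901)]  # grid on [-3, 6], step 0.01
rs = [radius(x, gumbel_cdf, gamma) for x in xs]
depth = [1.0 / r for r in rs]  # HYPOTHETICAL D = 1 / r(x), decreasing in r(x)

# Grid approximation to w, the maximiser of the depth, and its radius R.
k = max(range(len(xs)), key=lambda i: depth[i])
w, R = xs[k], rs[k]

increasing_left = all(depth[i] < depth[i + 1] for i in range(k))
decreasing_right = all(depth[i] > depth[i + 1] for i in range(k, len(xs) - 1))

print(f"w ~ {w:.2f}, R ~ {R:.4f}")
print("F(w+R) - F(w-R) =", gumbel_cdf(w + R) - gumbel_cdf(w - R))
print("f(w-R), f(w+R) =", gumbel_pdf(w - R), gumbel_pdf(w + R))
print("strictly increasing left of w:", increasing_left)
print("strictly decreasing right of w:", decreasing_right)
```

Because the Gumbel density is skewed, \(w\) need not coincide with its mode at 0; the argument only uses the fact that \(f\) takes equal values at the two endpoints \(w\pm R\).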
About this article
Cite this article
Dong, Y., Lee, S.M.S. Depth functions as measures of representativeness. Stat Papers 55, 1079–1105 (2014). https://doi.org/10.1007/s00362-013-0555-5
DOI: https://doi.org/10.1007/s00362-013-0555-5