Depth functions as measures of representativeness

Dong, Ye; Lee, Stephen M. S.

doi:10.1007/s00362-013-0555-5

Regular Article
Published: 23 August 2013

Volume 55, pages 1079–1105, (2014)
Statistical Papers

Ye Dong¹ &
Stephen M. S. Lee¹

Abstract

Data depth provides a natural means to rank multivariate vectors with respect to an underlying multivariate distribution. Most existing depth functions emphasize a centre-outward ordering of data points, which may not provide a useful geometric representation of certain distributional features, such as multimodality, of concern to some statistical applications. Such inadequacy motivates us to develop a device for ranking data points according to their “representativeness” rather than “centrality” with respect to an underlying distribution of interest. Derived essentially from a choice of goodness-of-fit test statistic, our device calls for a new interpretation of “depth” more akin to the concept of density than location. It copes particularly well with multivariate data exhibiting multimodality. In addition to providing depth values for individual data points, depth functions derived from goodness-of-fit tests also extend naturally to provide depth values for subsets of data points, a concept new to the data-depth literature.

Acknowledgments

Supported by a grant from the Research Grants Council of the Hong Kong Special Administrative Region, China (Project No. HKU 702508P).

Author information

Authors and Affiliations

Department of Statistics and Actuarial Science, The University of Hong Kong, Hong Kong, China
Ye Dong & Stephen M. S. Lee

Corresponding author

Correspondence to Stephen M. S. Lee.

Appendix

1.1 Proof of Proposition 1

For any $x\in \mathbb R $, let $r(x)$ satisfy $F(x+r(x))-F(x-r(x))=\gamma $. Differentiation of the latter condition twice with respect to $x$ gives $\{f(x-r(x))+f(x+r(x))\}r'(x)=f(x-r(x))-f(x+r(x))$ and $\{f(x-r(x))+f(x+r(x))\}r''(x)=(1-r'(x))^2f'(x-r(x))-(1+r'(x))^2f'(x+r(x))$. Thus $r(x)$ has a local minimum or local maximum at $x=x_0$ satisfying $r'(x_0)=0$, that is $f(x_0-r(x_0))=f(x_0+r(x_0))$, according as $r''(x_0)>0$ or $<0$ respectively. The proposition then follows by noting that $r''(x_0)=\{2f(x_0-r(x_0))\}^{-1}\{f'(x_0-r(x_0))-f'(x_0+r(x_0))\}$ and that $D(F,x)$ is a decreasing function of $r(x)$.
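
The stationarity and curvature conditions in this proof can be checked numerically. The sketch below is only an illustration under stated assumptions: the density is a hypothetical equal mixture of $N(-2,1)$ and $N(2,1)$, the level $\gamma = 0.3$ is arbitrary, and the names `radius`, `f_prime` and `r_second_at_stationary` are introduced here, not taken from the paper. Since the depth function itself is not reproduced in this section, the code works with $r(x)$ alone and relies only on the fact that $D(F,x)$ is a decreasing function of $r(x)$.

```python
import math

GAMMA = 0.3  # coverage level gamma; an illustrative choice, not from the paper


def phi(z):
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def Phi(z):
    """Standard normal distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


# Hypothetical bimodal example: equal mixture of N(-2, 1) and N(2, 1).
def F(x):
    return 0.5 * Phi(x + 2.0) + 0.5 * Phi(x - 2.0)


def f(x):
    return 0.5 * phi(x + 2.0) + 0.5 * phi(x - 2.0)


def f_prime(x):
    # d/dz phi(z) = -z * phi(z)
    return -0.5 * (x + 2.0) * phi(x + 2.0) - 0.5 * (x - 2.0) * phi(x - 2.0)


def radius(x, gamma=GAMMA):
    """Solve F(x + r) - F(x - r) = gamma for r > 0 by bisection."""
    lo, hi = 0.0, 1.0
    while F(x + hi) - F(x - hi) < gamma:  # coverage increases with r
        hi *= 2.0
    for _ in range(60):
        mid = 0.5 * (lo + hi)
        if F(x + mid) - F(x - mid) < gamma:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def r_second_at_stationary(x0):
    """r''(x0) from the proof's formula, valid where r'(x0) = 0."""
    r0 = radius(x0)
    return (f_prime(x0 - r0) - f_prime(x0 + r0)) / (2.0 * f(x0 - r0))


# Brute-force local minimiser of r(x) near the right-hand mode.
grid = [1.0 + 0.001 * i for i in range(2001)]
x_min = min(grid, key=radius)
r_min = radius(x_min)
gap_min = f(x_min - r_min) - f(x_min + r_min)  # should be close to 0
curv_min = r_second_at_stationary(x_min)       # should be positive

# By symmetry of the mixture, x = 0 is also a stationary point of r(x).
r_zero = radius(0.0)
gap_zero = f(-r_zero) - f(r_zero)              # should be close to 0
curv_zero = r_second_at_stationary(0.0)        # should be negative

print("near the mode:", x_min, r_min, gap_min, curv_min)
print("at the antimode:", 0.0, r_zero, gap_zero, curv_zero)
```

Under these assumptions the smaller radius occurs near the mode, where $r''>0$, and the larger radius at the antimode $x=0$, where $r''<0$. Because depth decreases in $r(x)$, this corresponds to a local depth maximum near the mode and a local depth minimum between the two modes.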

1.2 Proof of Proposition 2

The unimodality condition ensures that there exists some $w\in \mathbb R $ and $R>0$ such that $F(w+R)-F(w-R)=\gamma $ and $f(w+R)=f(w-R)=k$, say. Then necessarily $f$ increases at $w-R$, decreases at $w+R$ and $f(x)>k$ for all $x\in (w-R,w+R)$. Clearly the depth function (10) is locally maximised at $w$ by Proposition 1. Fix any $x_1>x_2>w$ and let $R_1, R_2>0$ satisfy $F(x_i+R_i)-F(x_i-R_i)=\gamma $, $i=1,2$. Note that $x_2-R_2>w-R$ and $x_2+R_2>w+R$, or otherwise $[x_2-R_2,x_2+R_2]$ either contains or is contained in $[w-R,w+R]$ strictly, contrary to the definition of $R_2$. It follows that $f(x)>f(x_2+R_2)$ for all $x\in [w-R,x_2+R_2)$. Consider two cases: (i) $x_1-R_2>x_2+R_2$, (ii) $x_1-R_2\le x_2+R_2$. Under (i), we have $f(x_1-R_2)<f(x)$ for all $x\in [x_2-R_2,x_2+R_2]$, so that $\int _{x_1-R_2}^{x_1+R_2}f(u)\,du<\int _{x_2-R_2}^{x_2+R_2}f(u)\,du=\gamma $, which implies that $R_1>R_2$. Under (ii), we have $f(x_2+R_2)<f(x)$ for all $x\in [x_2-R_2,x_1-R_2)$, so that

$$\begin{aligned} \int _{x_1-R_2}^{x_1+R_2}f(u)\,du-\int _{x_2-R_2}^{x_2+R_2}f(u)\,du=\int _{x_2+R_2}^{x_1+R_2}f(u)\,du-\int _{x_2-R_2}^{x_1-R_2}f(u)\,du<0. \end{aligned}$$

It follows that $F(x_1+R_2)-F(x_1-R_2)<\gamma $ and hence $R_1>R_2$. Thus we have, under either (i) or (ii), that the depth function at $x_1$ is strictly smaller than that at $x_2$, so that it decreases strictly on $(w,\infty )$. Similar arguments show that it increases strictly on $(-\infty ,w)$.
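
The monotonicity established above can be illustrated on a skewed unimodal density, for which the maximiser $w$ need not coincide with the mode. The sketch below rests on assumptions that are not part of the paper: the Gumbel distribution as the example, $\gamma = 0.5$, the search bracket $[-3, 5]$, and the helper names `balance` and `find_w`. It locates $w$ through the condition $f(w-R)=f(w+R)$ and then checks that $r(x)$ grows strictly on either side of $w$, which is equivalent to the depth decreasing strictly away from $w$.

```python
import math

GAMMA = 0.5  # coverage level gamma; an illustrative choice, not from the paper


def F(x):
    """Gumbel distribution function: a smooth, skewed, unimodal example."""
    return math.exp(-math.exp(-x))


def f(x):
    """Gumbel density, the derivative of F."""
    return math.exp(-x - math.exp(-x))


def radius(x, gamma=GAMMA):
    """Solve F(x + r) - F(x - r) = gamma for r > 0 by bisection."""
    lo, hi = 0.0, 1.0
    while F(x + hi) - F(x - hi) < gamma:  # coverage increases with r
        hi *= 2.0
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if F(x + mid) - F(x - mid) < gamma:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def balance(x):
    """f(x - r(x)) - f(x + r(x)); it has the same sign as r'(x)."""
    r = radius(x)
    return f(x - r) - f(x + r)


def find_w(lo=-3.0, hi=5.0):
    """Bisection for the root of balance, assuming balance(lo) < 0 < balance(hi)."""
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if balance(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def strictly_increasing(values):
    return all(a < b for a, b in zip(values, values[1:]))


w = find_w()
R = radius(w)

# Radii at points moving away from w, to the right and to the left.
right = [radius(w + 0.25 * k) for k in range(13)]
left = [radius(w - 0.25 * k) for k in range(13)]

print("w =", w, "R =", R)
print("r increases to the right of w:", strictly_increasing(right))
print("r increases to the left of w:", strictly_increasing(left))
```

Locating $w$ by bisection on `balance` uses the first-order condition from the proof of Proposition 1; a grid search over $r(x)$ would serve equally well and avoids relying on the sign change.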

About this article

Cite this article

Dong, Y., Lee, S.M.S. Depth functions as measures of representativeness. Stat Papers 55, 1079–1105 (2014). https://doi.org/10.1007/s00362-013-0555-5

Received: 21 March 2012
Revised: 20 May 2013
Published: 23 August 2013
Issue Date: November 2014
DOI: https://doi.org/10.1007/s00362-013-0555-5

Mathematics Subject Classification

62G99
