Depth functions as measures of representativeness

  • Regular Article
  • Statistical Papers

Abstract

Data depth provides a natural means to rank multivariate vectors with respect to an underlying multivariate distribution. Most existing depth functions emphasize a centre-outward ordering of data points, which may not provide a useful geometric representation of certain distributional features, such as multimodality, of concern to some statistical applications. Such inadequacy motivates us to develop a device for ranking data points according to their “representativeness” rather than “centrality” with respect to an underlying distribution of interest. Derived essentially from a choice of goodness-of-fit test statistic, our device calls for a new interpretation of “depth” more akin to the concept of density than location. It copes particularly well with multivariate data exhibiting multimodality. In addition to providing depth values for individual data points, depth functions derived from goodness-of-fit tests also extend naturally to provide depth values for subsets of data points, a concept new to the data-depth literature.


Acknowledgments

Supported by a grant from the Research Grants Council of the Hong Kong Special Administrative Region, China (Project No. HKU 702508P).

Author information

Corresponding author

Correspondence to Stephen M. S. Lee.

Appendix

1.1 Proof of Proposition 1

For any \(x\in \mathbb R \), let \(r(x)\) satisfy \(F(x+r(x))-F(x-r(x))=\gamma \). Differentiating this condition twice with respect to \(x\) gives \(\{f(x-r(x))+f(x+r(x))\}r'(x)=f(x-r(x))-f(x+r(x))\) and \(\{f(x-r(x))+f(x+r(x))\}r''(x)=(1-r'(x))^2f'(x-r(x))-(1+r'(x))^2f'(x+r(x))\). Thus \(r(x)\) has a local minimum or local maximum at \(x=x_0\) satisfying \(r'(x_0)=0\), that is \(f(x_0-r(x_0))=f(x_0+r(x_0))\), according as \(r''(x_0)>0\) or \(r''(x_0)<0\) respectively. The proposition then follows by noting that \(r''(x_0)=\{2f(x_0-r(x_0))\}^{-1}\{f'(x_0-r(x_0))-f'(x_0+r(x_0))\}\) and that \(D(F,x)\) is a decreasing function of \(r(x)\).
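The construction in this proof lends itself to a quick numerical illustration. The following sketch is our own and does not appear in the paper: it assumes a hypothetical bimodal normal mixture for \(F\), solves \(F(x+r(x))-F(x-r(x))=\gamma \) for \(r(x)\) by root finding, and takes \(D(x)=1/(1+r(x))\) as one convenient decreasing transform of \(r(x)\) standing in for the depth in (10). The computed depth peaks near the two mixture modes and dips between them, consistent with Proposition 1.

import numpy as np
from scipy.stats import norm
from scipy.optimize import brentq

gamma = 0.3                              # probability mass carried by each interval

def cdf(x):                              # hypothetical F: the mixture 0.5 N(-2,1) + 0.5 N(2,1)
    return 0.5 * norm.cdf(x, -2, 1) + 0.5 * norm.cdf(x, 2, 1)

def radius(x):                           # r(x) solves F(x + r) - F(x - r) = gamma
    return brentq(lambda r: cdf(x + r) - cdf(x - r) - gamma, 1e-8, 50.0)

def depth(x):                            # illustrative choice only: any decreasing transform of r(x)
    return 1.0 / (1.0 + radius(x))

xs = np.linspace(-5.0, 5.0, 401)
ds = np.array([depth(x) for x in xs])
# local maxima of the depth sit near the two modes at -2 and 2
print(xs[xs < 0][np.argmax(ds[xs < 0])], xs[xs > 0][np.argmax(ds[xs > 0])])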

1.2 Proof of Proposition 2

The unimodality condition ensures that there exist some \(w\in \mathbb R \) and \(R>0\) such that \(F(w+R)-F(w-R)=\gamma \) and \(f(w+R)=f(w-R)=k\), say. Then necessarily \(f\) increases at \(w-R\), decreases at \(w+R\) and \(f(x)>k\) for all \(x\in (w-R,w+R)\). Clearly the depth function (10) is locally maximised at \(w\) by Proposition 1. Fix any \(x_1>x_2>w\) and let \(R_1, R_2>0\) satisfy \(F(x_i+R_i)-F(x_i-R_i)=\gamma \), \(i=1,2\). Note that \(x_2-R_2>w-R\) and \(x_2+R_2>w+R\), for otherwise \([x_2-R_2,x_2+R_2]\) would either contain or be contained in \([w-R,w+R]\) strictly, contrary to the definition of \(R_2\). It follows that \(f(x)>f(x_2+R_2)\) for all \(x\in [w-R,x_2+R_2)\). Consider two cases: (i) \(x_1-R_2>x_2+R_2\), (ii) \(x_1-R_2\le x_2+R_2\). Under (i), we have \(f(x_1-R_2)<f(x)\) for all \(x\in [x_2-R_2,x_2+R_2]\), so that \(\int _{x_1-R_2}^{x_1+R_2}f(u)\,du<\int _{x_2-R_2}^{x_2+R_2}f(u)\,du=\gamma \), which implies that \(R_1>R_2\). Under (ii), we have \(f(x_2+R_2)<f(x)\) for all \(x\in [x_2-R_2,x_1-R_2)\), so that

$$\begin{aligned} \int _{x_1-R_2}^{x_1+R_2}f(u)\,du-\int _{x_2-R_2}^{x_2+R_2}f(u)\,du=\int _{x_2+R_2}^{x_1+R_2}f(u)\,du-\int _{x_2-R_2}^{x_1-R_2}f(u)\,du<0. \end{aligned}$$

It follows that \(F(x_1+R_2)-F(x_1-R_2)<\gamma \) and hence \(R_1>R_2\). Thus we have, under either (i) or (ii), that the depth function at \(x_1\) is strictly smaller than that at \(x_2\), so that it decreases strictly on \((w,\infty )\). Similar arguments show that it increases strictly on \((-\infty ,w)\).
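Proposition 2 admits a similar numerical check. In the sketch below (again our own illustration, under the same assumption that the depth is a decreasing transform of \(r(x)\)), \(F\) is the unimodal standard normal, for which \(w=0\) by symmetry; \(r(x)\) is seen to increase strictly on \((w,\infty )\), so the depth decreases strictly there and, by symmetry, increases strictly on \((-\infty ,w)\).

import numpy as np
from scipy.stats import norm
from scipy.optimize import brentq

gamma = 0.3

def radius(x):                           # r(x) for the standard normal F
    return brentq(lambda r: norm.cdf(x + r) - norm.cdf(x - r) - gamma, 1e-8, 50.0)

xs = np.linspace(0.0, 4.0, 81)           # grid on (w, infinity) with w = 0
rs = np.array([radius(x) for x in xs])
print(np.all(np.diff(rs) > 0))           # True: r(x) strictly increasing, so the depth strictly decreases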


About this article

Cite this article

Dong, Y., Lee, S.M.S. Depth functions as measures of representativeness. Stat Papers 55, 1079–1105 (2014). https://doi.org/10.1007/s00362-013-0555-5


