Fast nonparametric classification based on data depth

Abstract

A new procedure, called the DDα-procedure, is developed to solve the problem of classifying d-dimensional objects into q ≥ 2 classes. The procedure is nonparametric; it uses q-dimensional depth plots and a very efficient algorithm for discrimination analysis in the depth space [0,1]^q. Specifically, the depth is the zonoid depth, and the algorithm is the α-procedure. If there are more than two classes, several binary classifications are performed and a majority rule is applied. Special treatments are discussed for ‘outsiders’, that is, data points whose depth vector is zero. The DDα-classifier is applied to simulated as well as real data, and the results are compared with those of similar procedures that have been recently proposed. In most cases the new procedure has comparable error rates, but is much faster than other classification approaches, including the support vector machine.
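The pipeline described in the abstract (map a point to its depth vector in [0,1]^q, run binary classifications on pairs of classes, take a majority vote, and treat zero-depth 'outsiders' separately) can be illustrated with a minimal sketch. It is not the authors' implementation and rests on several stand-ins that are assumptions: an approximate halfspace depth over fixed directions replaces the zonoid depth, a plain comparison of the two depths replaces the α-procedure, and outsiders are assigned to the nearest class mean, which is only one possible treatment. All function names and the toy data are hypothetical.

```python
import math
from collections import Counter
from itertools import combinations


def univariate_depth(value, sample):
    """Halfspace depth on the line: smaller tail fraction, zero outside the range."""
    below = sum(1 for s in sample if s <= value)
    above = sum(1 for s in sample if s >= value)
    return min(below, above) / len(sample)


def approx_halfspace_depth(x, points, n_dirs=36):
    """Stand-in for the zonoid depth (assumption): minimum univariate depth
    of 2-D data projected onto a fixed set of directions."""
    depths = []
    for k in range(n_dirs):
        angle = math.pi * k / n_dirs
        u = (math.cos(angle), math.sin(angle))
        projected = [p[0] * u[0] + p[1] * u[1] for p in points]
        depths.append(univariate_depth(x[0] * u[0] + x[1] * u[1], projected))
    return min(depths)


def nearest_mean(x, classes):
    """Hypothetical outsider treatment: the class whose mean is closest to x."""
    def squared_distance(label):
        pts = classes[label]
        mx = sum(p[0] for p in pts) / len(pts)
        my = sum(p[1] for p in pts) / len(pts)
        return (x[0] - mx) ** 2 + (x[1] - my) ** 2
    return min(sorted(classes), key=squared_distance)


def classify(x, classes):
    """Return (label, is_outsider) for a 2-D point x."""
    labels = sorted(classes)
    # Depth representation of x: one coordinate in [0, 1] per class.
    depths = {c: approx_halfspace_depth(x, classes[c]) for c in labels}
    if all(d == 0.0 for d in depths.values()):
        # Zero depth vector: x is an outsider and needs a separate rule.
        return nearest_mean(x, classes), True
    # One binary decision per pair of classes, then a majority vote.
    # Comparing depths is a stand-in for the alpha-procedure (assumption).
    votes = Counter()
    for a, b in combinations(labels, 2):
        votes[a if depths[a] >= depths[b] else b] += 1
    winner = max(labels, key=lambda c: (votes[c], depths[c]))
    return winner, False


# Toy training data: three well-separated classes in the plane.
training = {
    "A": [(-1, -1), (1, -1), (1, 1), (-1, 1), (0, 0)],
    "B": [(9, -1), (11, -1), (11, 1), (9, 1), (10, 0)],
    "C": [(-1, 9), (1, 9), (1, 11), (-1, 11), (0, 10)],
}

print(classify((0.2, 0.1), training))  # point inside the hull of class A
print(classify((8, 4), training))      # point outside every class hull
```

The second query lies outside the convex hull of every class, so its depth vector is zero and the fallback rule decides; this is the situation the abstract calls an outsider.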

Author information

Corresponding author

Correspondence to Karl Mosler.

About this article

Cite this article

Lange, T., Mosler, K. & Mozharovskyi, P. Fast nonparametric classification based on data depth. Stat Papers 55, 49–69 (2014). https://doi.org/10.1007/s00362-012-0488-4

Keywords

  • Alpha-procedure
  • Zonoid depth
  • DD-plot
  • Pattern recognition
  • Supervised learning
  • Misclassification rate
  • Support vector machine