Fast nonparametric classification based on data depth

Regular Article

First Online: 10 November 2012
Received: 13 April 2012
Revised: 06 September 2012
DOI: 10.1007/s00362-012-0488-4

Cite this article as: Lange, T., Mosler, K. & Mozharovskyi, P. Stat Papers (2014) 55: 49. doi:10.1007/s00362-012-0488-4

Abstract

A new procedure, called the DDα-procedure, is developed to solve the problem of classifying d-dimensional objects into q ≥ 2 classes. The procedure is nonparametric; it uses q-dimensional depth plots and a very efficient algorithm for discrimination analysis in the depth space [0,1]^q. Specifically, the depth is the zonoid depth, and the algorithm is the α-procedure. In the case of more than two classes, several binary classifications are performed and a majority rule is applied. Special treatments are discussed for 'outsiders', that is, data points with a zero depth vector. The DDα-classifier is applied to simulated as well as real data, and the results are compared with those of similar procedures that have been recently proposed. In most cases the new procedure has comparable error rates, but is much faster than other classification approaches, including the support vector machine.
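
The multi-class step described above (pairwise binary classifications followed by a majority rule, with outsiders set aside) can be illustrated with a minimal sketch. It assumes the depth vector of a point with respect to each of the q classes has already been computed. The function names `is_outsider`, `max_depth_rule`, and `classify` are hypothetical, and the binary rule used here is the simple maximum-depth rule, swapped in as a placeholder; it is not the α-procedure of the paper.

```python
from collections import Counter
from itertools import combinations


def is_outsider(depths):
    """True when the point has zero depth with respect to every class."""
    return all(d == 0.0 for d in depths)


def max_depth_rule(depths, i, j):
    """Placeholder binary rule (NOT the alpha-procedure): choose the class
    in which the point lies deeper; ties go to class i."""
    return i if depths[i] >= depths[j] else j


def classify(depths, binary_rule=max_depth_rule):
    """Majority vote over all q*(q-1)/2 pairwise binary decisions.

    Returns a class index, or None for an outsider, which would need
    a separate treatment.
    """
    if is_outsider(depths):
        return None
    q = len(depths)
    votes = Counter(
        binary_rule(depths, i, j) for i, j in combinations(range(q), 2)
    )
    # Most votes wins; on equal vote counts the lowest class index is taken
    # (an assumption of this sketch, not a rule stated in the source).
    return max(range(q), key=lambda c: (votes[c], -c))
```

A different binary classifier in the depth space can be passed as `binary_rule`, provided it takes the depth vector and two class indices and returns one of the two indices.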

Keywords

Alpha-procedure, Zonoid depth, DD-plot, Pattern recognition, Supervised learning, Misclassification rate, Support vector machine

© Springer-Verlag Berlin Heidelberg 2012

Authors and Affiliations

1. Hochschule Merseburg, Geusaer Straße, Merseburg, Germany
2. Universität zu Köln, Albertus-Magnus-Platz, Köln, Germany