Optimal Exponential Bounds on the Accuracy of Classification

Kerkyacharian, G.; Tsybakov, A. B.; Temlyakov, V.; Picard, D.; Koltchinskii, V.

doi:10.1007/s00365-014-9229-3

Optimal Exponential Bounds on the Accuracy of Classification

Published: 26 April 2014

Volume 39, pages 421–444, (2014)
Cite this article

Constructive Approximation Aims and scope

G. Kerkyacharian¹,
A. B. Tsybakov²,
V. Temlyakov³,
D. Picard¹ &
…
V. Koltchinskii⁴

313 Accesses
3 Citations
Explore all metrics

Abstract

Consider a standard binary classification problem, in which $(X,Y)$ is a random couple in $\mathcal{X}\times \{0,1\}$, and the training data consist of $n$ i.i.d. copies of $(X,Y).$ Given a binary classifier $f:\mathcal{X}\mapsto \{0,1\},$ the generalization error of $f$ is defined by $R(f)={\mathbb P}\{Y\ne f(X)\}$. Its minimum $R^*$ over all binary classifiers $f$ is called the Bayes risk and is attained at a Bayes classifier. The performance of any binary classifier $\hat{f}_n$ based on the training data is characterized by the excess risk $R(\hat{f}_n)-R^*$. We study Bahadur’s type exponential bounds on the following minimax accuracy confidence function based on the excess risk:

where the supremum is taken over all distributions $P$ of $(X,Y)$ from a given class of distributions $\mathcal{M}$ and the infimum is over all binary classifiers $\hat{f}_n$ based on the training data. We study how this quantity depends on the complexity of the class of distributions $\mathcal{M}$ characterized by exponents of entropies of the class of regression functions or of the class of Bayes classifiers corresponding to the distributions from $\mathcal{M}.$ We also study its dependence on margin parameters of the classification problem. In particular, we show that, in the case when $\mathcal{X}=[0,1]^d$ and $\mathcal{M}$ is the class of all distributions satisfying the margin condition with exponent $\alpha >0$ and such that the regression function $\eta $ belongs to a given Hölder class of smoothness $\beta >0,$

for some constants $D,\lambda _0>0$.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Log in via an institution

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Institutional subscriptions

Exact Rate of Convergence of Kernel-Based Classification Rule

PAC-Bayes Bounds for Supervised Classification

Finite Sample Rates for Logistic Regression with Small Noise or Few Samples

Article Open access 21 May 2024

References

Audibert, J.-Y., Tsybakov, A.B.: Fast learning rates for plug-in classifiers. Ann. Stat. 35, 608–633 (2007)
Article MATH MathSciNet Google Scholar
Blanchard, G., Lugosi, G., Vayatis, N.: On the rate of convergence of regularized boosting classifiers. J. Mach. Learn. Res. 4, 861–894 (2003)
MathSciNet Google Scholar
Bartlett, P.L., Jordan, M.I., McAuliffe, J.D.: Convexity, classification and risk bounds. J. Am. Stat. Assoc. 101, 138–156 (2006)
Article MATH MathSciNet Google Scholar
Devroye, L., Györfi, L., Lugosi, G.: A Probabilistic Theory of Pattern Recognition. Applications of Mathematics (New York), vol. 31. Springer-Verlag, New York (1996)
MATH Google Scholar
DeVore, R., Kerkyacharian, G., Picard, D., Temlyakov, V.: Approximation methods for supervised learning. Found. Comput. Math. 6, 3–58 (2006)
Article MATH MathSciNet Google Scholar
Dudley, R.: Uniform Central Limit Theorems. Cambridge University Press, Cambridge (1999)
Book MATH Google Scholar
Ibragimov, I.A., Hasminskii, R.Z.: Statistical Estimation: Asymptotic Theory. Springer, New York (1981)
Book MATH Google Scholar
Kolmogorov, A.N., Tikhomorov, V.M.: $\epsilon $-entropy and $\epsilon $-capacity of sets in function spaces. Trans. Am. Math. Soc. 17, 277–364 (1961)
Google Scholar
Koltchinskii, V.: Local Rademacher complexities and Oracle inequalities in risk minimization. Ann. Stat. 34(6), 2593–2656 (2006)
Article MATH MathSciNet Google Scholar
Koltchinskii, V.: Oracle inequalities in empirical risk minimization and sparse recovery problems. Ecole d’été de Probabilités de Saint-Flour 2008. Lecture Notes in Mathematics. Springer, New York (2011)
Massart, P.: Concentration inequalities and model selection. Ecole d’été de Probabilités de Saint-Flour. Lecture Notes in Mathematics. Springer, New York (2007)
Massart, P., Nédélec, É.: Risk bounds for statistical learning. Ann. Stat. 34(5), 2326–2366 (2006)
Article MATH Google Scholar
Pentacaput, N.I.: Optimal exponential bounds on the accuracy of classification. arXiv:1111.6160 (2011)
Steinwart, I., Scovel, J.C.: Fast rates for support vector machines using Gaussian kernels. Ann. Stat. 35, 575–607 (2007)
Article MATH MathSciNet Google Scholar
Temlyakov, V.N.: Approximation in learning theory. Constr. Approx. 27(1), 33–74 (2008)
Article MATH MathSciNet Google Scholar
Tsybakov, A.B.: Optimal aggregation of classifiers in statistical learning. Ann. Stat. 32(1), 135–166 (2004)
Article MATH MathSciNet Google Scholar
Tsybakov, A.B.: Introduction to Nonparametric Estimation. Springer, New York (2009)
Book MATH Google Scholar
Tsybakov, A.B., van de Geer, S.: Square root penalty: adaptation to the margin in classification and in edge estimation. Ann. Stat. 33(3), 1203–1224 (2005)
Article MATH Google Scholar
van der Vaart, A., Wellner, J.: Weak Convergence and Empirical Processes, With Applications to Statistics. Springer-Verlag, New York (1996)
Book MATH Google Scholar
Yang, Y.: Minimax nonparametric classification—part I: rates of convergence. IEEE Trans. Inform. Theory 45, 2271–2284 (1999)
Article MATH MathSciNet Google Scholar

Download references

Author information

Authors and Affiliations

Université Paris-Diderot, CNRS-LPMA, Bâtiment Sophie Germain, 5 rue Thomas Mann, 75205 , Paris CEDEX 13, France
G. Kerkyacharian & D. Picard
CREST-ENSAE, 3 av. Pierre Larousse, 92240 , Malakoff, France
A. B. Tsybakov
Department of Mathematics, University of South Carolina, Columbia, SC , 29208, USA
V. Temlyakov
School of Mathematics, Georgia Institute of Technology, Atlanta, GA , 30332, USA
V. Koltchinskii

Authors

G. Kerkyacharian
View author publications
You can also search for this author in PubMed Google Scholar
A. B. Tsybakov
View author publications
You can also search for this author in PubMed Google Scholar
V. Temlyakov
View author publications
You can also search for this author in PubMed Google Scholar
D. Picard
View author publications
You can also search for this author in PubMed Google Scholar
V. Koltchinskii
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to A. B. Tsybakov.

Additional information

Communicated by Albert Cohen.

Appendix

Proof of Theorems 2 and 3. We deduce Theorems 2 and 3 from the following fact that we state here as a proposition.

Proposition 4

Let either $0<\alpha <\infty $ and $\varkappa =\frac{1+\alpha }{\alpha }$ or $\alpha =\infty $ and $\varkappa =1$. Then, there exists a constant $C_*>0$ such that, for all $t>0,$

$$\begin{aligned} \sup _{P\in {\mathcal {M}}^*(\rho , \alpha )}{\mathbb P}\biggl \{ R(\hat{f}_{n,2})-R(f^*_P)\ge C_* \biggl [n^{-\frac{\varkappa }{2\varkappa -1+\rho }}\vee \biggl (\frac{t}{n}\biggr )^{\frac{\varkappa }{2\varkappa -1}} \biggr ] \biggr \}\le e^{1-t}. \end{aligned}$$

It is easy to see that Theorem 2 follows from this proposition by taking $t=c n \lambda ^{\frac{2+\alpha }{1+\alpha }}$ with $\lambda \ge c' n^{-\frac{1+\alpha }{2+\alpha (1+\rho )}}$ for some constants $c,c'>0,$ and using that $\varkappa =\frac{1+\alpha }{\alpha }$. To obtain Theorem 3, we take $t=c n \lambda $ with $\lambda \ge c' n^{-\frac{1}{1+\rho }}$.

Proposition 4 will be derived from a general excess risk bound in abstract empirical risk minimization ([10], Theorem 4.3). We will state this result here for completeness. To this end, we need to introduce some notation. Let $\mathcal{G}$ be a class of measurable functions from a probability space $(S,\mathcal{A}_S, P)$ into $[0,1]$, and let $Z_1,\dots , Z_n$ be i.i.d. copies of an observation $Z$ sampled from $P.$ For any probability measure $P$ and any $g\in \mathcal{G}$, introduce the following notation for the expectation:

$$\begin{aligned} Pg=\int \limits _S gdP. \end{aligned}$$

Denote by $P_n$ the empirical measure based on $(Z_1,\dots , Z_n)$, and consider the minimizer of the empirical risk

$$\begin{aligned} \hat{g}_n \triangleq \mathrm{argmin}_{g\in \mathcal{G}}P_n g. \end{aligned}$$

For a function $g\in \mathcal{G},$ define the excess risk

$$\begin{aligned} \mathcal{E}_P(g)\triangleq Pg-\inf \limits _{g'\in \mathcal{G}}Pg'. \end{aligned}$$

The set

$$\begin{aligned} \mathcal{F}_P(\delta )\triangleq \{g\in \mathcal{G}: \, \mathcal{E}_P(g)\le \delta \} \end{aligned}$$

is called the $\delta $-minimal set. The size of such a set will be controlled in terms of its $L_2(P)$-diameter

$$\begin{aligned} D(\delta )\triangleq \sup _{g,g'\in \mathcal{F}_P(\delta )}\Vert g-g'\Vert _{L_2(P)} \end{aligned}$$

and also in terms of the following “localized empirical complexity”:

$$\begin{aligned} \phi _n(\delta )\triangleq {\mathbb E}\sup _{g,g'\in \mathcal{F}_P(\delta )}|(P_n-P)(g-g')|. \end{aligned}$$

We will use these complexity measures to construct an upper confidence bound on the excess risk $\mathcal{E}_P(\hat{f}_{n,2}).$ For a function $\psi :{\mathbb R}_+\mapsto {\mathbb R}_+,$ define

$$\begin{aligned} \psi ^{\flat }(\delta )\triangleq \sup _{\sigma \ge \delta }\frac{\psi (\sigma )}{\sigma }. \end{aligned}$$

Let

$$\begin{aligned} V_n^t(\delta )\triangleq 4\biggl [\phi _n^{\flat }(\delta )+\sqrt{(D^2)^{\flat }(\delta )\frac{t}{n\delta }}+ \frac{t}{n\delta }\biggr ],\ \delta >0, t>0, \end{aligned}$$

and define

$$\begin{aligned} \sigma _n^t\triangleq \inf \{\sigma : V_n^t(\sigma )\le 1\}. \end{aligned}$$

The following result is the first bound of Theorem 4.3 in [10].

Proposition 5

For all $t>0,$

$$\begin{aligned} {\mathbb P}\Big \{\mathcal{E}_P(\hat{f}_{n,2})>\sigma _n^t\Big \}\le e^{1-t}. \end{aligned}$$

In addition to this, we will use the well-known inequality for the expected sup-norm of the empirical process in terms of bracketing entropy, see Theorem 2.14.2 in [19]. More precisely, we will need the following simplified version of that result.

Lemma 2

Let $\mathcal{T}$ be a class of functions from $S$ into $[0,1]$ such that $\Vert g\Vert _{L_2(P)}\le a$ for all $g\in \mathcal{T}.$ Assume that $H_{[\ ]}(a, \mathcal{T},\Vert \cdot \Vert _{L_2(P)})+1\le a^2 n$. Then,

$$\begin{aligned} {\mathbb E}\sup _{g\in \mathcal{T}}|P_n g -P g|\le \frac{\bar{C}}{\sqrt{n}} \int \limits _0^{a} \left( H_{[\ ]}({\varepsilon }, \mathcal{T},\Vert \cdot \Vert _{L_2(P)})+1\right) ^{1/2} d{\varepsilon }, \end{aligned}$$

where $\bar{C}>0$ is a universal constant.

Proof of Proposition 4

Note that if $t>n,$ then $(\frac{t}{n})^{\varkappa /(2\varkappa -1)}> 1,$ and the result holds trivially with $C_*=1$ since $R(\hat{f}_{n,2})-R(f^*_P)\le 1.$ Thus, it is enough to consider the case $t\le n.$

Let $S=\mathcal{X}\times \{0,1\}$ and $P$ be the distribution of $Z=(X,Y)$. We will apply Proposition 5 to the class $\mathcal{G}\triangleq \{g_f: \, g_f(x,y)=I_{\{y \ne f(x)\}}, \ f\in \mathcal{F}\}$. Then, clearly, $Pg_f=R(f)$ and $\mathcal{E}_P (g_f)= R(f)-R(f^*_P)$ for $g_f(x,y)=I_{\{y \ne f(x)\}},$ which implies that

$$\begin{aligned} \mathcal{F}_P(\delta )=\{g_f: f\in \mathcal{F},\ R(f)-R(f^*_P)\le \delta \}. \end{aligned}$$

We also have $\Vert g_{f_1}-g_{f_2}\Vert _{L_2(P)}^2=\Vert f_1-f_2\Vert _{L_1(\mu _X)}.$ Thus, it follows from Lemma 1 that, for all $g_f\in \mathcal{G}$,

$$\begin{aligned} \mathcal{E}_P (g_f)\ge c_M \Vert g_f-g_{f_{P}^{*}}\Vert _{L_2(P)}^{2\varkappa }, \end{aligned}$$

and we get a bound on the $L_2(P)$-diameter of the $\delta $-minimal set $\mathcal{F}_P(\delta ):$ with some constant ${\bar{c}}_1>0$,

$$\begin{aligned}&D(\delta )\le {\bar{c}}_1 \delta ^{1/(2\varkappa )}. \end{aligned}$$

(22)

To bound the function $\phi _n(\delta ),$ we will apply Lemma 2 to the class $\mathcal{T}=\mathcal{F}_P(\delta )$ with $a=1$. Note that

$$\begin{aligned} H_{[\ ]}({\varepsilon }, \mathcal{F}_P(\delta ),\Vert \cdot \Vert _{L_2(P)})&\le 2H_{[\ ]}({\varepsilon }/2,\mathcal{G},\Vert \cdot \Vert _{L_2(P)})\\&\le 2 H_{[\ ]}({\varepsilon }^2/4, \mathcal{F},\Vert \cdot \Vert _{L_1(\mu _X)})\\&\le 2 H_{[\ ]}({\varepsilon }^2/(4c_\mu ), \mathcal{F},\Vert \cdot \Vert _{L_1(\mu )}) . \end{aligned}$$

Using (17), we easily get from Lemma 2 that, with some constants ${\bar{c}}_2, {\bar{c}}_3>0$,

$$\begin{aligned} \phi _n(\delta )\le {\bar{c}}_2 \delta ^{\frac{1-\rho }{2\varkappa }} n^{-1/2},\ \ \delta \ge {\bar{c}}_3 n^{-\frac{\varkappa }{1+\rho }}, \end{aligned}$$

which implies that, with some constant ${\bar{c}}_4>0$,

$$\begin{aligned} \phi _n(\delta )\le {\bar{c}}_4 \max \Big (\delta ^{\frac{1-\rho }{2\varkappa }} n^{-1/2}, n^{-\frac{1}{1+\rho }}\Big ),\delta >0. \end{aligned}$$

This and (22) lead to the following bound on the function $V_n^t(\delta )$:

$$\begin{aligned} V_n^t(\delta )\le {\bar{c}}_5 \biggl [\delta ^{\frac{1-\rho }{2\varkappa }-1}n^{-1/2}\vee \delta ^{-1}n^{-\frac{1}{1+\rho }}+\delta ^{\frac{1}{2\varkappa }-1} \sqrt{\frac{t}{n}}+ \delta ^{-1}\frac{t}{n}\biggr ] \end{aligned}$$

that holds with some constant ${\bar{c}}_5.$ Thus, we end up with a bound on $\sigma _n^t:$

$$\begin{aligned} \sigma _n^{t}\le {\bar{c}}_6 \biggl [n^{-\frac{\varkappa }{2\varkappa -1+\rho }}\vee n^{-\frac{1}{1+\rho }}\vee \biggl (\frac{t}{n}\biggr )^{\varkappa /(2\varkappa -1)} \vee \frac{t}{n} \biggr ]. \end{aligned}$$

(23)

Note that, for $\varkappa \ge 1,\;\rho < 1$, and $t\le n,$ we have

$$\begin{aligned} n^{-\varkappa /(2\varkappa -1+\rho )}\ge n^{-1/(1+\rho )}\ \ \mathrm{and}\ \ \left( \frac{t}{n}\right) ^{\varkappa /(2\varkappa -1)} \ge \frac{t}{n}. \end{aligned}$$

Therefore, (23) can be simplified as follows:

$$\begin{aligned} \sigma _n^{t}\le {\bar{c}}_7 \biggl [n^{-\frac{\varkappa }{2\varkappa -1+\rho }}+ \biggl (\frac{t}{n}\biggr )^{\varkappa /(2\varkappa -1)} \biggr ], \end{aligned}$$

and the result immediately follows from Proposition 5.$\square $

1.1 Tools for the Minimax Lower Bounds

For two probability measures $\mu $ and $\nu $ on a measurable space $({\mathcal {X}}, {\mathcal A})$, we define the Kullback-Leibler divergence and the $\chi ^2$-divergence as follows:

$$\begin{aligned} \mathcal {K}(\mu ,\nu ) \triangleq \int \limits _{{\mathcal {X}}} g\ln g d\nu , \quad \chi ^2(\mu ,\nu ) \triangleq \int \limits _{{\mathcal {X}}} (g-1)^2 d\nu , \end{aligned}$$

if $\mu $ is absolutely continuous with respect to $\nu $ with Radon-Nikodym derivative $g=\frac{d\mu }{d\nu },$ and we set $\mathcal {K}(\mu ,\nu )\triangleq +\infty $, $\chi ^2(\mu ,\nu )\triangleq +\infty $ otherwise.

We will use the following auxiliary result.

Lemma 3

Let $({\mathcal {X}}, {\mathcal A})$ be a measurable space, and let $A_i \in {\mathcal A},\; i\in \{ 0,1,\dots ,M\}, M\ge 2$, be such that $\forall i\ne j,\; A_i\cap A_j =\emptyset .$ Assume that $Q_i$, $i\in \{0,1\dots ,M\}$, are probability measures on $({\mathcal {X}}, {\mathcal A})$ such that

$$\begin{aligned} \frac{1}{M}\sum _{j=1}^M \mathcal {K}(Q_j,Q_0) \le \chi <\infty . \end{aligned}$$

Then,

$$\begin{aligned} p_*\triangleq \max _{0\le i \le M} Q_{i}({\mathcal {X}}{\setminus } A_i) \ge \frac{1}{12}\min \{1, \, M e^{-3\chi }\}. \end{aligned}$$

Proof

Proposition 2.3 in [17] yields

$$\begin{aligned} p_*\ge \sup _{0<\tau <1}\frac{\tau M}{\tau M +1} \left( 1+\frac{\chi + \sqrt{\chi /2}}{\log \tau }\right) . \end{aligned}$$

In particular, taking $\tau ^*=\min (M^{-1}, e^{-3\chi })$ and using that $\sqrt{6\log M} \ge 2$ for $M\ge 2$, we obtain

$$\begin{aligned} p_*\ge \frac{\tau ^* M}{\tau ^* M +1}\left( 1+\frac{\chi + \sqrt{\chi /2}}{\log \tau ^*}\right) \ge \frac{1}{12}\min \{1, \, M e^{-3\chi }\}. \end{aligned}$$

$\square $

We now prove a classification setting analog of the lower bound obtained by DeVore et al. [5] in the regression problem.

Theorem 5

Assume that a class $\Theta $ of probability distributions $P$ with the corresponding regression functions $\eta _P$ and Bayes rules $f^*_{P}$ (as defined above), contains a set $\{{P_i}\}_{i=1}^N \subset \Theta ,\; N\ge 3$, with the following properties: the marginal distribution of $X$ is $\mu _X$ for all $P_i$, independently of $i$, where $\mu _X$ is an arbitrary probability measure, $1/4\le \eta _{P_i}\le 3/4,\; i=1,\dots ,N$, and for any $i\ne j$,

$$\begin{aligned} \Vert \eta _{P_i}-\eta _{P_j}\Vert _{L_2(\mu _X)}\le \gamma , \end{aligned}$$

(24)

$$\begin{aligned} \Vert f^*_{P_i}-f^*_{P_j}\Vert _{L_1(\mu _X)}\ge s \end{aligned}$$

(25)

with some $\gamma >0,\; s>0$. Then, for any classifier $\hat{f}_n$, we have

$$\begin{aligned} \max _{1\le k \le N}\mathbb {P}_k\{\Vert \hat{f}_n-f^*_{P_k}\Vert _{L_1(\mu _X)}\ge s/{2}\} \ge \frac{1}{12}\min \big (1, \, (N-1) \exp \{-24 n\gamma ^2\}\big ), \end{aligned}$$

where $\mathbb {P}_k$ denotes the product probability measure associated to the i.i.d. $n$-sample from $P_k$.

Proof

We apply Lemma 3, where we set $Q_i=\mathbb {P}_i$, $M=N-1$, and define the random events $A_i$ as follows:

$$\begin{aligned} A_i\triangleq \{{\mathcal {D}}_n:\Vert \hat{f}_n-f^*_{P_i}\Vert _{L_1(\mu _X)}<s/2\},\quad i=1,\dots , N. \end{aligned}$$

The events $A_i$ are disjoint because of (25). Thus, the theorem follows from Lemma 3 if we prove that $ \mathcal {K}(\mathbb {P}_i,\mathbb {P}_j)\le 8n\gamma ^2$ for all $i,j$.

Let us evaluate $ \mathcal {K}(\mathbb {P}_i,\mathbb {P}_j)$. For each $\eta _{P_i}$, the corresponding measure $P_i$ is determined as follows:

$$\begin{aligned} dP_i(x,y)\triangleq (\eta _{P_i}(x)d\delta _{1}(y)+ (1-\eta _{P_i}(x))d\delta _{0}(y))d\mu _X(x), \end{aligned}$$

where $d\delta _\xi $ denotes the Dirac measure with unit mass at $\xi $. Set for brevity $\eta _i\triangleq \eta _{P_i}$. Fix $i$ and $j$. We have $dP_i(x,y)= g(x,y)dP_j(x,y)$, where

$$\begin{aligned} g(x,1)= \frac{\eta _i(x)}{\eta _j(x)},\quad g(x,0)=\frac{1-\eta _i(x)}{1-\eta _j(x)}. \end{aligned}$$

Therefore, using the inequalities $1/4\le \eta _{i}, \eta _j\le 3/4$ and (24), we find

$$\begin{aligned} \chi ^2({P}_i, {P}_j)&= \int \left\{ \frac{(\eta _i(x)-\eta _j(x))^2}{\eta _j(x)}+ \frac{(\eta _i(x)-\eta _j(x))^2}{1-\eta _j(x)}\right\} d\mu _X(x)\nonumber \\&\le 8\Vert \eta _i-\eta _j\Vert _{L_2(\mu _X)}^2 \le 8\gamma ^2. \end{aligned}$$

(26)

Together with inequality between the Kullback and $\chi ^2$-divergences, cf. [17], p. 90, this yields

$$\begin{aligned} \mathcal {K}(\mathbb {P}_i,\mathbb {P}_j) = n\mathcal {K}({P}_i, {P}_j) \le n\chi ^2({P}_i,{P}_j) \le 8n\gamma ^2. \end{aligned}$$

$\square $

Comment. The preprint version of this paper was posted on the Arxiv under the pseudonym N.I. Pentacaput [13]. Then the paper was submitted to “Constructive Approximation” and was accepted for publication under this pseudonym. However, it turns out that because of the Publisher rules no paper can be published under a pseudonym. As a result, we publish it under our real names that we have chosen to arrange in a random order.

Rights and permissions

Reprints and permissions

About this article

Cite this article

Kerkyacharian, G., Tsybakov, A.B., Temlyakov, V. et al. Optimal Exponential Bounds on the Accuracy of Classification. Constr Approx 39, 421–444 (2014). https://doi.org/10.1007/s00365-014-9229-3

Download citation

Received: 25 November 2011
Revised: 03 December 2012
Accepted: 25 March 2013
Published: 26 April 2014
Issue Date: June 2014
DOI: https://doi.org/10.1007/s00365-014-9229-3

Keywords

Mathematics Subject Classification

Access this article

Log in via an institution

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Institutional subscriptions

Optimal Exponential Bounds on the Accuracy of Classification

Abstract

Access this article

Similar content being viewed by others

Exact Rate of Convergence of Kernel-Based Classification Rule

PAC-Bayes Bounds for Supervised Classification

Finite Sample Rates for Logistic Regression with Small Noise or Few Samples

References

Author information

Authors and Affiliations

Corresponding author

Additional information

Appendix

Proposition 4

Proposition 5

Lemma 2

Proof of Proposition 4

1.1 Tools for the Minimax Lower Bounds

Lemma 3

Proof

Theorem 5

Proof

Rights and permissions

About this article

Cite this article

Keywords

Mathematics Subject Classification

Navigation

Optimal Exponential Bounds on the Accuracy of Classification

Abstract

Access this article

Similar content being viewed by others

Exact Rate of Convergence of Kernel-Based Classification Rule

PAC-Bayes Bounds for Supervised Classification

Finite Sample Rates for Logistic Regression with Small Noise or Few Samples

References

Author information

Authors and Affiliations

Corresponding author

Additional information

Appendix

Appendix

Proposition 4

Proposition 5

Lemma 2

Proof of Proposition 4

1.1 Tools for the Minimax Lower Bounds

Lemma 3

Proof

Theorem 5

Proof

Rights and permissions

About this article

Cite this article

Share this article

Keywords

Mathematics Subject Classification

Search

Navigation