Learning from Non-iid Data: Fast Rates for the One-vs-All Multiclass Plug-in Classifiers

Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 9076)


We prove new fast learning rates for the one-vs-all multiclass plug-in classifiers trained either from exponentially strongly mixing data or from data generated by a converging drifting distribution. These are two typical scenarios where training data are not iid. The learning rates are obtained under a multiclass version of Tsybakov’s margin assumption, a type of low-noise assumption, and do not depend on the number of classes. Our results are general and include a previous result for binary-class plug-in classifiers with iid data as a special case. In contrast to previous works for least squares SVMs under the binary-class setting, our results retain the optimal learning rate in the iid case.
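As a rough illustration of the kind of classifier the abstract refers to, the sketch below builds a one-vs-all plug-in rule: for each class j it estimates the conditional probability that the label equals j given the input, then predicts the class with the largest estimate. The estimator used here (a box-kernel local average on one-dimensional inputs), the function names, and the `bandwidth` parameter are illustrative assumptions and are not taken from the paper. The toy data are iid-style points chosen only to show the mechanics, not the non-iid settings the paper analyses.

```python
def box_kernel_regression(xs, targets, bandwidth):
    """Return a local-averaging estimate of x -> E[target | X = x].

    Hypothetical helper: averages the targets of training points that lie
    within `bandwidth` of the query point.
    """
    def estimate(x):
        near = [t for xi, t in zip(xs, targets) if abs(xi - x) <= bandwidth]
        # Fall back to the global mean when no training point is nearby.
        pool = near if near else list(targets)
        return sum(pool) / len(pool)
    return estimate


def one_vs_all_plugin(xs, ys, num_classes, bandwidth=0.3):
    """Build a one-vs-all plug-in classifier from training pairs (xs, ys)."""
    # One binary regression problem per class: the target is 1 if y == j, else 0.
    etas = [
        box_kernel_regression(xs, [1.0 if y == j else 0.0 for y in ys], bandwidth)
        for j in range(num_classes)
    ]

    def classify(x):
        scores = [eta(x) for eta in etas]
        # Plug-in rule: predict the class with the largest estimated probability.
        return max(range(num_classes), key=lambda j: scores[j])

    return classify


# Toy usage: three well-separated clusters, one per class.
xs = [0.0, 0.1, 0.2, 1.0, 1.1, 1.2, 2.0, 2.1, 2.2]
ys = [0, 0, 0, 1, 1, 1, 2, 2, 2]
classify = one_vs_all_plugin(xs, ys, num_classes=3)
print(classify(0.05), classify(1.05), classify(2.15))
```

Each per-class estimator is fitted independently of the others, which is what makes the scheme "one-vs-all"; any other regression estimator could be substituted for the box kernel without changing the final argmax step.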


Keywords (added by machine, not by the authors): Faster Learning Rate; Margin Assumption; Binary-class Setting; Multiclass Version; Optimal Rate



Copyright information

© Springer International Publishing Switzerland 2015

Authors and Affiliations

  1. Department of Mathematics, Purdue University, West Lafayette, USA
  2. Department of Biostatistics, University of California, Los Angeles, USA
  3. Department of Computer Science, National University of Singapore, Singapore
  4. Department of Statistics, University of Wisconsin-Madison, Madison, USA
  5. Department of Computer Science, University of Science, Ho Chi Minh City, Vietnam
