
Foundations of Computational Mathematics

Volume 18, Issue 4, pp 971–1013

Optimal Rates for Regularization of Statistical Inverse Learning Problems

  • Gilles Blanchard
  • Nicole Mücke

Abstract

We consider a statistical inverse learning (also called inverse regression) problem, where we observe the image of a function f through a linear operator A at i.i.d. random design points \(X_i\), superposed with additive noise. The distribution of the design points is unknown and can be very general. We analyze simultaneously the direct (estimation of Af) and the inverse (estimation of f) learning problems. In this general framework, we obtain strong and weak minimax optimal rates of convergence (as the number of observations n grows large) for a large class of spectral regularization methods over regularity classes defined through appropriate source conditions. This improves on or completes previous results obtained in related settings. The optimality of the obtained rates is shown not only in the exponent in n but also in the explicit dependence of the constant factor on the noise variance and the radius of the source condition set.
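
To fix ideas, the setting described in the abstract can be written schematically as follows. This is a sketch based on the abstract only; the notation \(g_\lambda\), \(\hat{T}_n\), \(\hat{h}_n\), \(T\), \(r\), \(R\) is illustrative and need not match the paper's own, and the precise operator and regularity assumptions are given in the paper itself.

\[
Y_i = (Af)(X_i) + \varepsilon_i, \qquad i = 1, \dots, n,
\]

where the design points \(X_i\) are drawn i.i.d. from an unknown distribution. A spectral regularization method produces an estimator of the form

\[
\hat{f}_\lambda = g_\lambda(\hat{T}_n)\, \hat{h}_n,
\]

where \(\hat{T}_n\) is an empirical covariance-type operator built from the design points, \(\hat{h}_n\) is an empirical quantity built from the observations \(Y_i\), and \(g_\lambda\) is a filter function approximating \(t \mapsto 1/t\) (for instance, Tikhonov regularization corresponds to \(g_\lambda(t) = (t + \lambda)^{-1}\)). The regularity classes mentioned above are of source-condition type, e.g. \(f = T^{r} h\) with \(\|h\| \le R\); the minimax rates then depend on the smoothness parameter \(r\), the radius \(R\), and the noise variance.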

Keywords

Reproducing kernel Hilbert space · Spectral regularization · Inverse problem · Statistical learning · Minimax convergence rates

Mathematics Subject Classification

62G08 · 62G20 · 65J22 · 68Q32


Copyright information

© SFoCM 2017

Authors and Affiliations

  1. Institute of Mathematics, University of Potsdam, Potsdam, Germany
