Machine Learning

Volume 106, Issue 2, pp 171–195

Fast rates by transferring from auxiliary hypotheses


Abstract

In this work we consider the learning setting where, in addition to the training set, the learner receives a collection of auxiliary hypotheses originating from other tasks. We focus on a broad class of ERM-based linear algorithms that can be instantiated with any non-negative smooth loss function and any strongly convex regularizer. We establish generalization and excess risk bounds, showing that, if the algorithm is fed with a good combination of source hypotheses, generalization happens at the fast rate \(\mathcal {O}(1/m)\) instead of the usual \(\mathcal {O}(1/\sqrt{m})\). On the other hand, if the source hypotheses combination is a misfit for the target task, we recover the usual learning rate. As a byproduct of our study, we also prove a new bound on the Rademacher complexity of the smooth loss class under weaker assumptions compared to previous works.
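
To make the setting concrete, the following is a minimal sketch, in Python, of the kind of ERM-based linear algorithm the abstract refers to: it assumes the squared loss (a non-negative smooth loss) and an L2 regularizer (strongly convex) biased toward a fixed weighted combination of source hypotheses. The function transfer_erm, the closed-form ridge solution, and all parameter names are illustrative assumptions, not the paper's own algorithm or notation.

# Minimal sketch (illustrative, not the paper's exact algorithm): regularized ERM
# for linear regression with the squared loss (smooth) and an L2 regularizer
# biased toward a combination of source hypotheses, i.e.
#     min_w  (1/m) * sum_i (w . x_i - y_i)^2  +  lam * ||w - w_src||^2,
# where w_src = sum_j beta_j * w_j is a weighted combination of auxiliary
# (source) weight vectors handed to the learner.
import numpy as np

def transfer_erm(X, y, source_ws, beta, lam=1.0):
    """Closed-form minimizer of the biased ridge objective above.

    X         : (m, d) training inputs
    y         : (m,)   training targets
    source_ws : list of d-dimensional source weight vectors (auxiliary hypotheses)
    beta      : combination weights for the source hypotheses
    lam       : strength of the strongly convex regularizer
    """
    m, d = X.shape
    w_src = sum(bj * wj for bj, wj in zip(beta, source_ws))
    # Setting the gradient to zero gives (X^T X / m + lam I) w = X^T y / m + lam * w_src.
    A = X.T @ X / m + lam * np.eye(d)
    b = X.T @ y / m + lam * w_src
    return np.linalg.solve(A, b)

# Toy usage: when the source combination w_src is well aligned with the target
# model, the regularizer pulls the solution toward it and few samples suffice.
rng = np.random.default_rng(0)
d, m = 5, 20
w_target = rng.normal(size=d)
X = rng.normal(size=(m, d))
y = X @ w_target + 0.1 * rng.normal(size=m)
good_source = [w_target + 0.05 * rng.normal(size=d)]  # well-aligned auxiliary hypothesis
w_hat = transfer_erm(X, y, good_source, beta=[1.0], lam=1.0)
print(np.linalg.norm(w_hat - w_target))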

Keywords

Fast-rate generalization bounds · Transfer learning · Domain adaptation · Rademacher complexity · Smooth loss functions · Strongly convex regularizers


Copyright information

© The Author(s) 2016

Authors and Affiliations

  1. Idiap Research Institute, Martigny, Switzerland
  2. Department of Computer Science, Stony Brook University, Stony Brook, USA
