Machine Learning, Volume 108, Issue 8–9, pp 1635–1652

On the analysis of adaptability in multi-source domain adaptation

  • Ievgen Redko
  • Amaury Habrard
  • Marc Sebban
Part of the Special Issue of the ECML PKDD 2019 Journal Track.


Abstract

In many real-world applications, it may be desirable to benefit from a classifier trained on a given source task, for which a large annotated dataset is available, in order to address a different but related target task for which only weakly labeled data are available. Domain adaptation (DA) is the framework that aims at leveraging the statistical similarities between the source and target distributions to learn well. Current theoretical results show that the efficiency of DA algorithms depends on (i) their capacity to minimize the divergence between the source and target domains and (ii) the existence of a good hypothesis that commits few errors in both domains. While most of the work in DA has focused on new divergence measures, the second aspect, often modeled as the capability term, remains surprisingly under-investigated. In this paper, we show that the problem of estimating the best joint hypothesis can be reformulated using a Wasserstein distance-based error function in the context of multi-source DA. Based on this idea, we provide a theoretical analysis of the capability term and derive inequalities allowing us to estimate it from finite samples. We empirically illustrate the proposed idea on different data sets.
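As a toy illustration of the kind of finite-sample estimation the abstract refers to, the snippet below computes the empirical 1-Wasserstein distance between two one-dimensional samples, where the optimal coupling is simply the sorted-order (quantile) matching. This is a minimal sketch for intuition only, not the paper's multi-source procedure; the function name `empirical_w1` is our own.

```python
import numpy as np

def empirical_w1(source, target):
    """Empirical 1-Wasserstein distance between two equal-size 1-D
    samples: in one dimension the optimal transport plan matches
    sorted values, so W1 is the mean absolute sorted difference."""
    xs, ys = np.sort(np.asarray(source)), np.sort(np.asarray(target))
    return float(np.mean(np.abs(xs - ys)))

# Two Gaussian samples whose means differ by 1: since the variances
# are equal, the true W1 distance is exactly the mean shift, and the
# empirical estimate converges to it as the sample size grows.
rng = np.random.default_rng(0)
src = rng.normal(0.0, 1.0, size=5000)
tgt = rng.normal(1.0, 1.0, size=5000)
print(empirical_w1(src, tgt))
```

The convergence rate of such plug-in estimates as the sample size grows is precisely what finite-sample bounds of the kind derived in the paper control; in higher dimensions the sorted-matching shortcut no longer applies and one must solve a genuine optimal transport problem.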


Keywords: Transfer learning · Domain adaptation · Learning theory



Funding was provided by Agence nationale de la recherche (Grant No. ANR-15-CE23-0026).



Copyright information

© The Author(s), under exclusive licence to Springer Science+Business Media LLC, part of Springer Nature 2019

Authors and Affiliations

  1. Univ Lyon, UJM-Saint-Étienne, CNRS, Institut d'Optique Graduate School, Laboratoire Hubert Curien UMR 5516, Saint-Étienne, France
