Machine Learning, Volume 79, Issue 1, pp 151–175

A theory of learning from different domains


  • Shai Ben-David
    • David R. Cheriton School of Computer Science, University of Waterloo
  • John Blitzer
    • Department of Computer Science, UC Berkeley
  • Koby Crammer
    • Department of Electrical Engineering, The Technion
  • Alex Kulesza
    • Department of Computer and Information Science, University of Pennsylvania
  • Fernando Pereira
    • Google Research
  • Jennifer Wortman Vaughan
    • School of Engineering and Applied Sciences, Harvard University

DOI: 10.1007/s10994-009-5152-4

Cite this article as:
Ben-David, S., Blitzer, J., Crammer, K. et al. Mach Learn (2010) 79: 151. doi:10.1007/s10994-009-5152-4


Discriminative learning methods for classification perform well when training and test data are drawn from the same distribution. Often, however, we have plentiful labeled training data from a source domain but wish to learn a classifier which performs well on a target domain with a different distribution and little or no labeled training data. In this work we investigate two questions. First, under what conditions can a classifier trained from source data be expected to perform well on target data? Second, given a small amount of labeled target data, how should we combine it during training with the large amount of labeled source data to achieve the lowest target error at test time?

We address the first question by bounding a classifier’s target error in terms of its source error and the divergence between the two domains. We give a classifier-induced divergence measure that can be estimated from finite, unlabeled samples from the domains. Under the assumption that there exists some hypothesis that performs well in both domains, we show that this quantity together with the empirical source error characterize the target error of a source-trained classifier.
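To make the divergence estimate concrete, here is a minimal sketch (not the paper's implementation) of the empirical H-divergence for a toy hypothesis class: one-dimensional threshold classifiers h_t(x) = 1[x ≤ t]. For this class the supremum over hypotheses of the gap in positive-prediction rates reduces to the largest gap between the two empirical CDFs, so it can be computed exactly from finite unlabeled samples. The function name and the restriction to thresholds are illustrative assumptions.

```python
import numpy as np

def empirical_h_divergence_thresholds(source, target):
    """Empirical H-divergence between two unlabeled 1-D samples for
    the hypothesis class of thresholds h_t(x) = 1[x <= t].

    d_H = 2 * sup_t | Pr_source[x <= t] - Pr_target[x <= t] |,
    where the supremum is attained at one of the sample points
    (this is twice the Kolmogorov-Smirnov statistic).
    Illustrative sketch only; the paper treats general classes.
    """
    candidates = np.concatenate([source, target])
    gaps = [abs(np.mean(source <= t) - np.mean(target <= t))
            for t in candidates]
    return 2.0 * max(gaps)
```

When the two samples coincide the divergence is 0; when their supports are disjoint it reaches its maximum value of 2, matching the intuition that no threshold confuses the two domains.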

We answer the second question by bounding the target error of a model which minimizes a convex combination of the empirical source and target errors. Previous theoretical work has considered minimizing just the source error, just the target error, or weighting instances from the two domains equally. We show how to choose the optimal combination of source and target error as a function of the divergence, the sample sizes of both domains, and the complexity of the hypothesis class. The resulting bound generalizes the previously studied cases and is always at least as tight as a bound which considers minimizing only the target error or an equal weighting of source and target errors.
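The convex combination of empirical errors can be sketched as a weighted training objective. The snippet below (an illustrative sketch, not the paper's procedure; the analysis is model-agnostic, and the logistic model, function name, and hyperparameters are assumptions) puts weight alpha on the target empirical loss and 1 − alpha on the source empirical loss, with each term averaged over its own sample:

```python
import numpy as np

def alpha_weighted_logreg(Xs, ys, Xt, yt, alpha, lr=0.1, steps=500):
    """Minimize alpha * (target empirical loss) +
    (1 - alpha) * (source empirical loss) with a plain logistic
    model trained by gradient descent.  Labels are in {0, 1}.
    Illustrative sketch; any weighted ERM learner would do.
    """
    X = np.vstack([Xs, Xt])
    y = np.concatenate([ys, yt])
    # Per-example weights implementing the convex combination:
    # each domain's loss is averaged over its own sample size.
    w = np.concatenate([np.full(len(ys), (1.0 - alpha) / len(ys)),
                        np.full(len(yt), alpha / len(yt))])
    theta = np.zeros(X.shape[1])
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-X @ theta))   # sigmoid predictions
        grad = X.T @ (w * (p - y))             # gradient of weighted log-loss
        theta -= lr * grad
    return theta
```

Setting alpha = 1 recovers target-only training, alpha = 0 recovers source-only training, and intermediate values interpolate between them; the bound in the paper characterizes how the best choice of alpha depends on the divergence, the two sample sizes, and the hypothesis-class complexity.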


Keywords: Domain adaptation · Transfer learning · Learning theory · Sample-selection bias

Copyright information

© The Author(s) 2009