Domain adaptation–can quantity compensate for quality?
The Domain Adaptation problem in machine learning occurs when the distribution generating the test data differs from the one that generates the training data. A common approach to this issue is to train a standard learner for the learning task with the available training sample (generated by a distribution that is different from the test distribution). One can view such learning as learning from a not-perfectly-representative training sample. The question we focus on is under which circumstances large sizes of such training samples can guarantee that the learned classifier performs just as well as one learned from target-generated samples. In other words, are there circumstances in which quantity can compensate for quality (of the training data)? We give a positive answer, showing that this is possible when using a Nearest Neighbor algorithm. We show this under two assumptions about the relationship between the training and the target data distributions: covariate shift, and a bound on the ratio of certain probability weights between the source (training) and target (test) distributions. We further show that in a slightly different learning model, when one imposes restrictions on the nature of the learned classifier, these assumptions are not always sufficient to allow such a replacement of the training sample: for proper learning, where the output classifier has to come from a predefined class, we prove that any learner needs access to data generated from the target distribution.
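As an informal illustration of the positive result, the following toy sketch trains a 1-Nearest-Neighbor classifier on source data only and measures its error on target data. Everything in it is an assumption made for illustration and is not taken from the paper: the one-dimensional domain [0, 1], the threshold labeling function `label` (shared by both distributions, which is what covariate shift requires), the skewed source distribution (the square of a uniform draw, whose density is at least 1/2 everywhere, so every interval keeps at least half of its target weight), the uniform target distribution, and all function names.

```python
import bisect
import random


def label(x):
    # Hypothetical labeling function shared by source and target
    # (covariate shift: only the marginal over x differs).
    return 1 if x > 0.5 else 0


def draw_source(rng, n):
    # Assumed source marginal: square of a uniform draw, skewed toward 0.
    # Its density 1 / (2 * sqrt(x)) is at least 1/2 on (0, 1].
    return sorted(rng.random() ** 2 for _ in range(n))


def nn_predict(sorted_xs, x):
    # 1-NN in one dimension: the nearest point is one of the two
    # neighbors of the insertion position in the sorted sample.
    i = bisect.bisect_left(sorted_xs, x)
    candidates = sorted_xs[max(i - 1, 0): i + 1]
    nearest = min(candidates, key=lambda s: abs(s - x))
    return label(nearest)


def target_error(n_source, n_test=2000, seed=0):
    # Train on source points only, evaluate on uniform target points.
    rng = random.Random(seed)
    train = draw_source(rng, n_source)
    test = [rng.random() for _ in range(n_test)]
    mistakes = sum(nn_predict(train, x) != label(x) for x in test)
    return mistakes / n_test


if __name__ == "__main__":
    for n in (10, 100, 1000, 10000):
        print(n, target_error(n))
```

In this toy setting the empirical target error tends to shrink as the source sample grows, even though no target point is ever used for training. It is only a sanity check on one hand-picked pair of distributions, not a demonstration of the paper's bounds.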
Keywords: Machine learning, Domain adaptation, Sample complexity
Mathematics Subject Classification (2010): 68Q32