Domain Adaptation in Regression

  • Corinna Cortes
  • Mehryar Mohri
Part of the Lecture Notes in Computer Science book series (LNCS, volume 6925)


This paper presents a series of new results for domain adaptation in the regression setting. We prove that the discrepancy is a distance for the squared loss when the hypothesis set is the reproducing kernel Hilbert space induced by a universal kernel such as the Gaussian kernel. We give new pointwise loss guarantees based on the discrepancy of the empirical source and target distributions for the general class of kernel-based regularization algorithms. These bounds have a simpler form than previous results and hold for a broader class of convex loss functions that are not necessarily differentiable, including L_q losses and the hinge loss. We extend the discrepancy minimization adaptation algorithm to the more significant case where kernels are used, and show that the problem can be cast as a semidefinite program (SDP) similar to the one in the feature space. We also show that techniques from smooth optimization can be used to derive an efficient algorithm for solving such SDPs, even for very high-dimensional feature spaces. We have implemented this algorithm and report experimental results that demonstrate its benefits for adaptation and show that, unlike previous algorithms, it can scale to large data sets of tens of thousands of points or more.
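The abstract does not define the discrepancy. As a rough illustration only, the sketch below assumes the usual definition from the domain adaptation literature: the largest gap, over pairs of hypotheses h and h', between the expected loss of h against h' under the source and under the target distribution. For the squared loss and linear hypotheses h(x) = w·x with ||w|| ≤ Λ, that quantity reduces to 4Λ² times the spectral norm of the difference between the two second-moment matrices. The function name and the `radius` parameter (standing for Λ) are hypothetical, and this is not the kernelized SDP algorithm the paper describes.

```python
import numpy as np


def empirical_discrepancy_linear(X_src, X_tgt, radius=1.0):
    """Empirical discrepancy for the squared loss and linear hypotheses.

    Assumption (not stated in the abstract): hypotheses are h(x) = w.x with
    ||w|| <= radius, so h' - h = u.x with ||u|| <= 2 * radius, and the
    discrepancy equals max_u |u^T (M_src - M_tgt) u|
    = 4 * radius**2 * ||M_src - M_tgt||_2.
    """
    X_src = np.asarray(X_src, dtype=float)
    X_tgt = np.asarray(X_tgt, dtype=float)

    # Empirical second-moment matrices E[x x^T] of each sample.
    M_src = X_src.T @ X_src / X_src.shape[0]
    M_tgt = X_tgt.T @ X_tgt / X_tgt.shape[0]

    # The difference is symmetric, so its spectral norm is the largest
    # absolute eigenvalue.
    eigvals = np.linalg.eigvalsh(M_src - M_tgt)
    return 4.0 * radius**2 * float(np.max(np.abs(eigvals)))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    source = rng.normal(size=(500, 3))
    target = rng.normal(size=(500, 3)) * np.array([1.0, 2.0, 0.5])
    print(empirical_discrepancy_linear(source, target))
```

Under these assumptions, two identical samples give a discrepancy of zero, and the value grows as the second-moment structure of the target sample drifts away from that of the source sample.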


Keywords: Support Vector Regression, Target Domain, Domain Adaptation, Unlabeled Data, Reproducing Kernel Hilbert Space
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.





Copyright information

© Springer-Verlag Berlin Heidelberg 2011

Authors and Affiliations

  • Corinna Cortes (1)
  • Mehryar Mohri (1, 2)

  1. Google Research, New York
  2. Courant Institute of Mathematical Sciences, New York
