
On generalization in moment-based domain adaptation


Abstract

Domain adaptation algorithms are designed to minimize the misclassification risk of a discriminative model for a target domain with little training data by adapting a model trained on a source domain with a large amount of training data. Standard approaches quantify the adaptation discrepancy by distance measures between the empirical probability distributions of the source and target domain. In this setting, we address the problem of deriving generalization bounds under practice-oriented, general conditions on the underlying probability distributions. As a result, we obtain generalization bounds for domain adaptation based on finitely many moments and smoothness conditions.
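As a purely illustrative sketch of the kind of moment-based discrepancy the abstract refers to (and not the exact quantity analyzed in the article), the following Python snippet compares the empirical means and the first k central moments of a source and a target sample, in the spirit of central-moment matching used in moment-based domain adaptation; the function name and the synthetic data are hypothetical.

```python
import numpy as np

def moment_discrepancy(source: np.ndarray, target: np.ndarray, k: int = 3) -> float:
    """Illustrative moment-based discrepancy between two empirical samples.

    Sums, per feature, the differences of the means and of the central
    moments of orders 2..k. This only sketches the idea of aligning
    finitely many moments; it is not the bound derived in the article.
    """
    # Difference of the empirical means (first raw moments).
    d = np.linalg.norm(source.mean(axis=0) - target.mean(axis=0))
    # Differences of the empirical central moments of orders 2..k.
    cs = source - source.mean(axis=0)
    ct = target - target.mean(axis=0)
    for order in range(2, k + 1):
        d += np.linalg.norm((cs ** order).mean(axis=0) - (ct ** order).mean(axis=0))
    return float(d)

# Synthetic example: source and target differ in mean and scale.
rng = np.random.default_rng(0)
xs = rng.normal(loc=0.0, scale=1.0, size=(500, 4))  # source sample
xt = rng.normal(loc=0.5, scale=1.3, size=(500, 4))  # target sample
print(moment_discrepancy(xs, xt, k=3))
```

For two samples drawn from the same distribution the value is close to zero, while a shift in mean or variance, as in the synthetic example above, increases it; such a term can serve as a regularizer that penalizes the mismatch of finitely many moments between source and target representations.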



Acknowledgments

We thank Sepp Hochreiter, Helmut Gfrerer, Thomas Natschläger and Hamid Eghbal-Zadeh for helpful discussions. The research reported in this paper has been funded by the Federal Ministry for Climate Action, Environment, Energy, Mobility, Innovation and Technology (BMK), the Federal Ministry for Digital and Economic Affairs (BMDW), and the Province of Upper Austria within the framework of the COMET–Competence Centers for Excellent Technologies Programme and the COMET Module S3AI, managed by the Austrian Research Promotion Agency (FFG). The first and second authors further acknowledge the support of the FFG in the project AutoQual-I.

Corresponding author

Correspondence to Werner Zellinger.



Cite this article

Zellinger, W., Moser, B.A. & Saminger-Platz, S. On generalization in moment-based domain adaptation. Ann Math Artif Intell 89, 333–369 (2021). https://doi.org/10.1007/s10472-020-09719-x

