Abstract
Domain adaptation algorithms aim to minimize the misclassification risk of a discriminative model on a target domain with little training data by adapting a model trained on a source domain with abundant training data. Standard approaches measure the adaptation discrepancy via distance measures between the empirical probability distributions of the source and target domain. In this setting, we address the problem of deriving generalization bounds under practice-oriented, general conditions on the underlying probability distributions. As a result, we obtain generalization bounds for domain adaptation based on finitely many moments and smoothness conditions.
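The moment-based discrepancies studied in this line of work can be illustrated with a small sketch. The function below compares two sample sets through finitely many empirical central moments, in the spirit of the central moment discrepancy; the function name `cmd`, the pooled-range rescaling, and the choice of summing Euclidean norms over moment orders are our illustrative assumptions, not the paper's exact definition.

```python
import numpy as np

def cmd(x, y, k=5):
    """Moment-based discrepancy between two sample sets x, y of shape (n, d),
    comparing the empirical mean and central moments up to order k.
    Samples are rescaled to [0, 1] using their pooled range (an assumption
    of this sketch), so higher-order terms stay comparable in magnitude."""
    a = min(x.min(), y.min())
    b = max(x.max(), y.max())
    x = (x - a) / (b - a)
    y = (y - a) / (b - a)
    mx, my = x.mean(axis=0), y.mean(axis=0)
    # First-order term: distance between the empirical means.
    d = np.linalg.norm(mx - my)
    # Higher-order terms: distances between empirical central moments.
    for j in range(2, k + 1):
        cx = ((x - mx) ** j).mean(axis=0)
        cy = ((y - my) ** j).mean(axis=0)
        d += np.linalg.norm(cx - cy)
    return d
```

Under this sketch, two samples drawn from the same distribution yield a small discrepancy, while a shift in the underlying distribution (e.g. a mean shift between source and target domain) yields a larger one.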
Acknowledgments
We thank Sepp Hochreiter, Helmut Gfrerer, Thomas Natschläger and Hamid Eghbal-Zadeh for helpful discussions. The research reported in this paper has been funded by the Federal Ministry for Climate Action, Environment, Energy, Mobility, Innovation and Technology (BMK), the Federal Ministry for Digital and Economic Affairs (BMDW), and the Province of Upper Austria in the frame of the COMET–Competence Centers for Excellent Technologies Programme and the COMET Module S3AI managed by the Austrian Research Promotion Agency FFG. The first and second author further acknowledge the support of the FFG in the project AutoQual-I.
Cite this article
Zellinger, W., Moser, B.A. & Saminger-Platz, S. On generalization in moment-based domain adaptation. Ann Math Artif Intell 89, 333–369 (2021). https://doi.org/10.1007/s10472-020-09719-x
Keywords
- Transfer learning
- Domain adaptation
- Moment distance
- Learning theory
- Classification
- Total variation distance
- Probability metric