Abstract
We present a general theoretical and algorithmic analysis of multiple-source adaptation, a key learning problem in applications. We derive new normalized solutions with strong theoretical guarantees for the cross-entropy loss and other similar losses. We also provide new guarantees that hold when the conditional probabilities for the source domains are distinct. We further present a novel analysis of the convergence properties of the density estimation used in distribution-weighted combinations, and study its effects on the learning guarantees. Moreover, we give new algorithms for determining the distribution-weighted combination solution for the cross-entropy loss and other losses. We report the results of a series of experiments with real-world datasets. We find that our algorithm outperforms competing approaches by producing a single robust predictor that performs well on any target mixture distribution. Altogether, our theory, algorithms, and empirical results provide a full solution to the multiple-source adaptation problem, with practical benefits.
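The abstract refers to distribution-weighted combinations without stating their form. As a minimal sketch, assuming the basic form used in earlier work on this problem (each source predictor h_k is weighted at a point x by z_k D_k(x), normalized over sources), the combination could be computed as below. This is not the normalized cross-entropy solution the abstract announces, and the function name, argument names, and array layout are hypothetical choices made for illustration.

```python
import numpy as np


def distribution_weighted_combination(z, densities, predictions):
    """Combine per-source predictions, weighting source k by z_k * D_k(x).

    Illustrative sketch only; names and array layout are assumptions.

    z:           shape (p,), mixture weights over the p sources.
    densities:   shape (p, n), densities[k, i] is an estimate of D_k(x_i).
    predictions: shape (p, n), predictions[k, i] is h_k(x_i).
    Returns shape (n,): sum_k w_k(x_i) * h_k(x_i), where
    w_k(x_i) = z_k D_k(x_i) / sum_j z_j D_j(x_i).
    """
    z = np.asarray(z, dtype=float)
    densities = np.asarray(densities, dtype=float)
    predictions = np.asarray(predictions, dtype=float)

    weighted = z[:, None] * densities   # z_k * D_k(x_i)
    total = weighted.sum(axis=0)        # sum_j z_j * D_j(x_i)
    if np.any(total <= 0):
        raise ValueError("each point needs positive mass under some source")

    # Normalize the weights per point, then average the source predictions.
    return (weighted / total * predictions).sum(axis=0)
```

In this sketch, a point to which only one source assigns positive density receives that source's prediction unchanged, while a point equally likely under all sources receives the z-weighted average of the source predictions.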
Acknowledgements
This work was partly funded by NSF CCF-1535987, IIS-1618662, and a Google Research Award.
About this article
Cite this article
Zhang, N., Mohri, M. & Hoffman, J. Multiple-source adaptation theory and algorithms. Ann Math Artif Intell 89, 237–270 (2021). https://doi.org/10.1007/s10472-020-09716-0
DOI: https://doi.org/10.1007/s10472-020-09716-0