
Layer-wise domain correction for unsupervised domain adaptation

  • Shuang Li
  • Shi-ji Song
  • Cheng Wu

Abstract

Deep neural networks have been successfully applied to numerous machine learning tasks because of their impressive feature abstraction capabilities. However, conventional deep networks assume that the training and test data are sampled from the same distribution, and this assumption is often violated in real-world scenarios. To address the domain shift or data bias problems, we introduce layer-wise domain correction (LDC), a new unsupervised domain adaptation algorithm that adapts an existing deep network through additive correction layers spaced throughout the network. Through the additive layers, the representations of the source and target domains can be perfectly aligned. The corrections, which are trained via maximum mean discrepancy, adapt the network to the target domain while increasing its representational capacity. LDC requires no target labels, achieves state-of-the-art performance across several adaptation benchmarks, and requires significantly less training time than existing adaptation methods.
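As a rough illustration of the quantity the corrections are trained on, the following sketch computes a biased estimate of the squared maximum mean discrepancy between two feature batches and shows how an additive correction can reduce it. It is not the authors' implementation: the Gaussian kernel, the fixed bandwidth, the function names `rbf_kernel` and `mmd2`, the synthetic features, and the constant mean-offset correction (standing in for a learned correction layer applied to the target features) are all assumptions made for illustration.

```python
import numpy as np


def rbf_kernel(a, b, bandwidth=1.0):
    """Gaussian kernel matrix between the rows of a and the rows of b."""
    # Pairwise squared Euclidean distances, clipped at 0 to absorb rounding error.
    sq_dists = (
        np.sum(a ** 2, axis=1)[:, None]
        + np.sum(b ** 2, axis=1)[None, :]
        - 2.0 * a @ b.T
    )
    return np.exp(-np.maximum(sq_dists, 0.0) / (2.0 * bandwidth ** 2))


def mmd2(source, target, bandwidth=1.0):
    """Biased estimate of the squared MMD between two samples."""
    k_ss = rbf_kernel(source, source, bandwidth).mean()
    k_tt = rbf_kernel(target, target, bandwidth).mean()
    k_st = rbf_kernel(source, target, bandwidth).mean()
    return k_ss + k_tt - 2.0 * k_st


# Synthetic stand-ins for one layer's activations on each domain (assumption).
rng = np.random.default_rng(0)
source = rng.normal(0.0, 1.0, size=(200, 4))
target = rng.normal(1.5, 1.0, size=(200, 4))

# Hypothetical additive correction: a constant offset that matches the means.
# A trained correction layer would instead be a learned function of its input.
correction = source.mean(axis=0) - target.mean(axis=0)
corrected_target = target + correction

before = mmd2(source, target)
after = mmd2(source, corrected_target)
print(f"squared MMD before correction: {before:.4f}")
print(f"squared MMD after correction:  {after:.4f}")
```

In a layer-wise setting, a term of this kind would presumably be evaluated at each corrected layer and minimized with respect to the correction parameters; the closed-form offset above only matches first moments and is meant to show the direction of the effect, not the training procedure.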

Keywords

Unsupervised domain adaptation; Maximum mean discrepancy; Residual network; Deep learning

CLC number

TP183 

Copyright information

© Zhejiang University and Springer-Verlag GmbH Germany, part of Springer Nature 2018

Authors and Affiliations

  1. Automation Department, Tsinghua University, Beijing, China
