Domain Attention Model for Domain Generalization in Object Detection

  • Weixiong He
  • Huicheng Zheng
  • Jianhuang Lai
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11259)


Domain generalization methods in object detection aim to learn a domain-invariant detector that works across different domains. However, it is difficult to obtain a domain-invariant detector when there is a large discrepancy between domains. Based on the idea of biasing the allocation of available processing resources towards the most informative components of an input, attention models have shown promising performance on a variety of tasks. In this paper, we provide a framework that addresses visual domain generalization with domain attention. Specifically, we build a domain attention block that exploits the discrepancy among source domains to learn a different weight for each source domain on the input features, so that input features similar to some source domain are enhanced and features dissimilar to all source domains are suppressed. The proposed model thereby obtains a domain-general representation effective for both localization and classification. To demonstrate the merits of the proposed approach, we put forward the HD-16 dataset for object detection in different scenes. Extensive experiments on the HD-16 dataset verify the effectiveness of the proposed approach.
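The abstract describes the domain attention block only at a high level: per-domain weights are computed from the input's similarity to the source domains, and those weights gate the input features. The exact formulation is not given here, so the following NumPy sketch fills in assumed details (learned domain prototypes, cosine similarity, a sigmoid gate); it illustrates the idea of enhancing features that resemble some source domain and suppressing the rest, not the authors' precise architecture.

```python
import numpy as np

def softmax(z):
    """Numerically stable softmax over a 1-D score vector."""
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def domain_attention(x, prototypes):
    """Minimal sketch of a domain attention gate (formulation assumed).

    x          : (C,) input feature vector
    prototypes : (D, C) one learned prototype vector per source domain
                 (stand-ins for the block's learned domain parameters)
    Returns the gated feature vector of shape (C,).
    """
    # Similarity of the input to each source domain (cosine similarity assumed).
    norms = np.linalg.norm(prototypes, axis=1) * np.linalg.norm(x) + 1e-8
    scores = prototypes @ x / norms          # (D,) one score per source domain
    weights = softmax(scores)                # per-domain attention weights

    # Combine the domains' prototypes by their weights and squash to (0, 1):
    # channels supported by domains similar to the input get gates near 1
    # (enhanced); channels unlike every source domain get gates near 0
    # (suppressed).
    gate = 1.0 / (1.0 + np.exp(-(weights @ prototypes)))  # sigmoid, (C,)
    return x * gate
```

Because the gate lies in (0, 1), the block can only attenuate channels, never amplify them beyond the input; a learnable scale or residual connection would be a natural extension but is not shown in the abstract.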


Keywords: Domain generalization · Object detection · Attention model



This work was supported by the National Natural Science Foundation of China (U1611461), the Special Program for Applied Research on Super Computation of the NSFC-Guangdong Joint Fund (second phase, No. U1501501), and the Science and Technology Program of Guangzhou (No. 201803030029).



Copyright information

© Springer Nature Switzerland AG 2018

Authors and Affiliations

  • Weixiong He (1, 2, 3)
  • Huicheng Zheng (1, 2, 3)
  • Jianhuang Lai (1, 2, 3)
  1. School of Data and Computer Science, Sun Yat-sen University, Guangzhou, China
  2. Key Laboratory of Machine Intelligence and Advanced Computing, Ministry of Education, Guangzhou, China
  3. Guangdong Key Laboratory of Information Security Technology, Guangzhou, China
