Deep Reinforced Attention Learning for Quality-Aware Visual Recognition

Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 12361)


In this paper, we build upon the weakly-supervised generation mechanism of intermediate attention maps in any convolutional neural networks and disclose the effectiveness of attention modules more straightforwardly to fully exploit their potential. Given an existing neural network equipped with arbitrary attention modules, we introduce a meta critic network to evaluate the quality of attention maps in the main network. Due to the discreteness of our designed reward, the proposed learning method is arranged in a reinforcement learning setting, where the attention actors and recurrent critics are alternately optimized to provide instant critique and revision for the temporary attention representation, hence coined as Deep REinforced Attention Learning (DREAL). It could be applied universally to network architectures with different types of attention modules and promotes their expressive ability by maximizing the relative gain of the final recognition performance arising from each individual attention module, as demonstrated by extensive experiments on both category and instance recognition benchmarks.


Convolutional Neural Networks Attention modules Reinforcement learning Visual recognition 

Supplementary material

504471_1_En_29_MOESM1_ESM.pdf (1.3 mb)
Supplementary material 1 (pdf 1317 KB)


  1. 1.
    Ammar, H.B., Eaton, E., Ruvolo, P., Taylor, M.: Online multi-task learning for policy gradient methods. In: ICML (2014)Google Scholar
  2. 2.
    Cao, Q., Lin, L., Shi, Y., Liang, X., Li, G.: Attention-aware face hallucination via deep reinforcement learning. In: CVPR (2017)Google Scholar
  3. 3.
    Chen, B., Deng, W., Hu, J.: Mixed high-order attention network for person re-identification. In: ICCV (2019)Google Scholar
  4. 4.
    Chen, T., et al.: ABD-Net: attentive but diverse person re-identification. In: ICCV (2019)Google Scholar
  5. 5.
    Chen, T., Wang, Z., Li, G., Lin, L.: Recurrent attentional reinforcement learning for multi-label image recognition. In: AAAI (2018)Google Scholar
  6. 6.
    Deng, J., Dong, W., Socher, R., Li, L.J., Li, K., Fei-Fei, L.: ImageNet: a large-scale hierarchical image database. In: CVPR (2009)Google Scholar
  7. 7.
    Dong, W., Zhang, Z., Tan, T.: Attention-aware sampling via deep reinforcement learning for action recognition. In: AAAI (2019)Google Scholar
  8. 8.
    Gu, S., Lillicrap, T., Sutskever, I., Levine, S.: Continuous deep q-learning with model-based acceleration. In: ICML (2016)Google Scholar
  9. 9.
    He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR (2016)Google Scholar
  10. 10.
    Hu, J., Shen, L., Albanie, S., Sun, G., Vedaldi, A.: Gather-excite: exploiting feature context in convolutional neural networks. In: NeurIPS (2018)Google Scholar
  11. 11.
    Hu, J., Shen, L., Sun, G.: Squeeze-and-excitation networks. In: CVPR (2018)Google Scholar
  12. 12.
    Huang, G., Liu, Z., van der Maaten, L., Weinberger, K.Q.: Densely connected convolutional networks. In: CVPR (2017)Google Scholar
  13. 13.
    Lan, X., Wang, H., Gong, S., Zhu, X.: Deep reinforcement learning attention selection for person re-identification. arXiv e-prints arXiv:1707.02785, July 2017
  14. 14.
    Lan, X., Wang, H., Gong, S., Zhu, X.: Deep reinforcement learning attention selection for person re-identification. In: BMVC (2017)Google Scholar
  15. 15.
    Langford, J., Zhang, T.: The epoch-greedy algorithm for multi-armed bandits with side information. In: NIPS (2008)Google Scholar
  16. 16.
    Lee, H., Kim, H.E., Nam, H.: SRM: a style-based recalibration module for convolutional neural networks. In: ICCV (2019)Google Scholar
  17. 17.
    Li, W., Zhu, X., Gong, S.: Harmonious attention network for person re-identification. In: CVPR (2018)Google Scholar
  18. 18.
    Linsley, D., Shiebler, D., Eberhardt, S., Serre, T.: Learning what and where to attend with humans in the loop. In: ICLR (2019)Google Scholar
  19. 19.
    Littman, M.L.: Reinforcement learning improves behaviour from evaluative feedback. Nature 521, 445–451 (2015) CrossRefGoogle Scholar
  20. 20.
    Mnih, V., Heess, N., Graves, A., Kavukcuoglu, K.: Recurrent models of visual attention. In: NIPS (2014)Google Scholar
  21. 21.
    Mnih, V., et al.: Human-level control through deep reinforcement learning. Nature 518, 529–533 (2015)CrossRefGoogle Scholar
  22. 22.
    Nair, V., Hinton, G.E.: Rectified linear units improve restricted Boltzmann machines. In: ICML (2010)Google Scholar
  23. 23.
    Park, J., Woo, S., Lee, J.Y., Kweon, I.S.: BAM: bottleneck attention module. In: BMVC (2018)Google Scholar
  24. 24.
    Paszke, A., et al.: Pytorch: an imperative style, high-performance deep learning library. In: NeurIPS (2019)Google Scholar
  25. 25.
    Rao, Y., Lu, J., Zhou, J.: Attention-aware deep reinforcement learning for video face recognition. In: ICCV (2017)Google Scholar
  26. 26.
    Reddi, S.J., Kale, S., Kumar, S.: On the convergence of Adam and beyond. In: ICLR (2018)Google Scholar
  27. 27.
    Ristani, E., Solera, F., Zou, R., Cucchiara, R., Tomasi, C.: Performance measures and a data set for multi-target, multi-camera tracking. In: Hua, G., Jégou, H. (eds.) ECCV 2016. LNCS, vol. 9914, pp. 17–35. Springer, Cham (2016). Scholar
  28. 28.
    Schroff, F., Kalenichenko, D., Philbin, J.: FaceNet: a unified embedding for face recognition and clustering. In: CVPR (2015)Google Scholar
  29. 29.
    Si, J., et al.: Dual attention matching network for context-aware feature sequence based person re-identification. In: CVPR (2018)Google Scholar
  30. 30.
    Silver, D., Lever, G., Heess, N., Degris, T., Wierstra, D., Riedmiller, M.: Deterministic policy gradient algorithms. In: ICML (2014)Google Scholar
  31. 31.
    Song, C., Huang, Y., Ouyang, W., Wang, L.: Mask-guided contrastive attention model for person re-identification. In: CVPR (2018)Google Scholar
  32. 32.
    Szegedy, C., et al.: Going deeper with convolutions. In: CVPR (2015)Google Scholar
  33. 33.
    Szegedy, C., Vanhoucke, V., Ioffe, S., Shlens, J., Wojna, Z.: Rethinking the inception architecture for computer vision. In: CVPR (2016)Google Scholar
  34. 34.
    Tay, C.P., Roy, S., Yap, K.H.: AANet: attribute attention network for person re-identifications. In: CVPR (2019)Google Scholar
  35. 35.
    Wang, C., Zhang, Q., Huang, C., Liu, W., Wang, X.: Mancs: a multi-task attentional network with curriculum sampling for person re-identification. In: Ferrari, V., Hebert, M., Sminchisescu, C., Weiss, Y. (eds.) ECCV 2018. LNCS, vol. 11208, pp. 384–400. Springer, Cham (2018). Scholar
  36. 36.
    Wang, F., et al.: Residual attention network for image classification. In: CVPR (2017)Google Scholar
  37. 37.
    Wang, Q., Wu, B., Zhu, P., Li, P., Zuo, W., Hu, Q.: ECA-Net: efficient channel attention for deep convolutional neural networks. In: CVPR (2020)Google Scholar
  38. 38.
    Wang, X., Girshick, R., Gupta, A., He, K.: Non-local neural networks. In: CVPR (2018)Google Scholar
  39. 39.
    Wang, Y., Xie, L., Qiao, S., Zhang, Y., Zhang, W., Yuille, A.L.: Multi-scale spatially-asymmetric recalibration for image classification. In: Ferrari, V., Hebert, M., Sminchisescu, C., Weiss, Y. (eds.) ECCV 2018. LNCS, vol. 11217, pp. 523–539. Springer, Cham (2018). Scholar
  40. 40.
    Woo, S., Park, J., Lee, J.-Y., Kweon, I.S.: CBAM: convolutional block attention module. In: Ferrari, V., Hebert, M., Sminchisescu, C., Weiss, Y. (eds.) ECCV 2018. LNCS, vol. 11211, pp. 3–19. Springer, Cham (2018). Scholar
  41. 41.
    Xia, B.N., Gong, Y., Zhang, Y., Poellabauer, C.: Second-order non-local attention networks for person re-identification. In: ICCV (2019)Google Scholar
  42. 42.
    Xu, J., Zhao, R., Zhu, F., Wang, H., Ouyang, W.: Attention-aware compositional network for person re-identification. In: CVPR (2018)Google Scholar
  43. 43.
    Zheng, L., Shen, L., Tian, L., Wang, S., Wang, J., Tian, Q.: Scalable person re-identification: a benchmark. In: ICCV (2015)Google Scholar
  44. 44.
    Zhong, Z., Zheng, L., Cao, D., Li, S.: Re-ranking person re-identification with k-reciprocal encoding. In: CVPR (2017)Google Scholar
  45. 45.
    Zhong, Z., Zheng, L., Kang, G., Li, S., Yang, Y.: Random erasing data augmentation. In: AAAI (2020)Google Scholar

Copyright information

© Springer Nature Switzerland AG 2020

Authors and Affiliations

  1. 1.The Hong Kong University of Science and TechnologyKowloonHong Kong

Personalised recommendations