RCAA: Relational Context-Aware Agents for Person Search

  • Xiaojun Chang
  • Po-Yao Huang
  • Yi-Dong Shen
  • Xiaodan Liang
  • Yi Yang
  • Alexander G. Hauptmann
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11213)


We aim to search for a target person in a gallery of whole-scene images for which pedestrian bounding-box annotations are unavailable. Previous approaches to this problem have relied on a pedestrian proposal net, which may generate redundant proposals and increase the computational burden. In this paper, we address this problem by training relational context-aware agents that learn the actions needed to localize the target person in the gallery of whole-scene images. We incorporate relational spatial and temporal contexts into the framework. Specifically, we propose to use the target person as the query in a query-dependent relational network. At each time step, the agent determines the best action to take by jointly considering the local visual information, the relational and temporal contexts, and the target person. To validate our approach, we conduct extensive experiments on the large-scale Person Search benchmark dataset and achieve significant improvements over the compared approaches. Notably, the proposed model even outperforms traditional methods that use perfect pedestrian detectors.


Keywords: Person search, Relational network



This work was supported in part by the Intelligence Advanced Research Projects Activity (IARPA) via Department of Interior/Interior Business Center (DOI/IBC) contract number D17PC00340, in part by China National 973 program 2014CB340301, and in part by the Data to Decisions CRC (D2D CRC) and the Cooperative Research Centres Programme. The U.S. Government is authorized to reproduce and distribute reprints for Governmental purposes notwithstanding any copyright annotation/herein. Disclaimer: The views and conclusions contained herein are those of the authors and should not be interpreted as necessarily representing the official policies or endorsements, either expressed or implied, of IARPA, DOI/IBC, or the U.S. Government.



Copyright information

© Springer Nature Switzerland AG 2018

Authors and Affiliations

  • Xiaojun Chang (1)
  • Po-Yao Huang (1)
  • Yi-Dong Shen (2, corresponding author)
  • Xiaodan Liang (1)
  • Yi Yang (3)
  • Alexander G. Hauptmann (1)
  1. School of Computer Science, Carnegie Mellon University, Pittsburgh, USA
  2. Institute of Software, Chinese Academy of Sciences, Beijing, China
  3. Centre for Artificial Intelligence, University of Technology Sydney, Ultimo, Australia
