Beyond Part Models: Person Retrieval with Refined Part Pooling (and A Strong Convolutional Baseline)

  • Yifan Sun
  • Liang Zheng
  • Yi Yang
  • Qi Tian
  • Shengjin Wang (Email author)
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11208)

Abstract

Part-level features offer fine-grained information for describing pedestrian images. A prerequisite of part discovery is that each part be well located. Instead of using external resources such as a pose estimator, we rely on content consistency within each part for precise part location. Specifically, we aim to learn discriminative part-informed features for person retrieval and make two contributions. (i) A network named Part-based Convolutional Baseline (PCB). Given an input image, it outputs a convolutional descriptor consisting of several part-level features. With a uniform partition strategy, PCB achieves results competitive with state-of-the-art methods, proving itself a strong convolutional baseline for person retrieval. (ii) A refined part pooling (RPP) method. Uniform partition inevitably incurs outliers in each part, that is, locations that are in fact more similar to other parts. RPP re-assigns these outliers to the parts they are closest to, resulting in refined parts with enhanced within-part consistency. Experiments confirm that RPP gives PCB a further performance boost. For instance, on the Market-1501 dataset, we achieve (77.4+4.2)% mAP and (92.3+1.5)% rank-1 accuracy, i.e. 81.6% mAP and 93.8% rank-1 accuracy once the RPP gain is added to the PCB result, surpassing the state of the art by a large margin. Code is available at: https://github.com/syfafterzy/PCB_RPP
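
The two ideas in the abstract, uniform partition followed by re-assignment of outliers to their closest part, can be illustrated with a small NumPy sketch. This is not the authors' implementation (see the repository linked above for that). Everything specific here is an assumption made for illustration: the feature map layout `(H, W, C)`, the number of parts `p`, the use of cosine similarity as the measure of closeness, the hard (argmax) re-assignment, and the helper names `uniform_labels`, `pool_parts`, and `reassign_to_closest_part`.

```python
import numpy as np

def uniform_labels(H, W, p):
    """Label each spatial location with the index of its horizontal stripe."""
    if H % p != 0:
        raise ValueError("H must be divisible by p for a uniform partition")
    rows = np.repeat(np.arange(p), H // p)  # stripe index for every row
    return np.broadcast_to(rows[:, None], (H, W)).copy()

def pool_parts(T, labels, p, fallback=None):
    """Average the column vectors of T (H, W, C) assigned to each part."""
    parts = []
    for i in range(p):
        members = T[labels == i]  # shape (n_i, C)
        if len(members) == 0:
            # An empty part keeps its previous feature if one is supplied.
            if fallback is None:
                raise ValueError(f"part {i} has no members")
            parts.append(fallback[i])
        else:
            parts.append(members.mean(axis=0))
    return np.stack(parts)

def reassign_to_closest_part(T, parts, eps=1e-12):
    """Assign every location to the part whose feature it is most similar to.

    Assumption: closeness is cosine similarity and assignment is hard.
    """
    H, W, C = T.shape
    f = T.reshape(-1, C)
    f = f / (np.linalg.norm(f, axis=1, keepdims=True) + eps)
    g = parts / (np.linalg.norm(parts, axis=1, keepdims=True) + eps)
    return (f @ g.T).argmax(axis=1).reshape(H, W)

# Toy feature map: 4 rows, 2 columns, 2 channels, split into p = 2 stripes.
T = np.zeros((4, 2, 2))
T[:2, :, 0] = 1.0        # upper stripe mostly looks like [1, 0]
T[2:, :, 1] = 1.0        # lower stripe looks like [0, 1]
T[1, 0] = [0.0, 1.0]     # an outlier placed in the upper stripe

labels = uniform_labels(4, 2, 2)
parts = pool_parts(T, labels, 2)                    # uniform partition
new_labels = reassign_to_closest_part(T, parts)     # outlier moves to part 1
refined = pool_parts(T, new_labels, 2, fallback=parts)
print(new_labels)
print(refined)
```

In this toy case the outlier at row 1, column 0 starts in the upper stripe, pulls that stripe's pooled feature away from `[1, 0]`, and is then moved to the lower part, so the refined part features are more consistent than the uniformly pooled ones.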

Keywords

Person retrieval, Part-level feature, Part refinement

Copyright information

© Springer Nature Switzerland AG 2018

Authors and Affiliations

  • Yifan Sun (1)
  • Liang Zheng (2)
  • Yi Yang (3)
  • Qi Tian (4)
  • Shengjin Wang (1, Email author)

  1. Department of Electronic Engineering, Tsinghua University, Beijing, China
  2. Research School of Computer Science, Australian National University, Canberra, Australia
  3. Centre for Artificial Intelligence, University of Technology Sydney, Ultimo, Australia
  4. Huawei Noah’s Ark Lab, University of Texas at San Antonio, San Antonio, USA
