Deep Learning in Person Re-identification for Cyber-Physical Surveillance Systems

  • Lin Wu
  • Brian C. Lovell
  • Yang Wang
Part of the Advanced Sciences and Technologies for Security Applications book series (ASTSA)


Cyber-physical systems (CPS) combine integrated physical processes, networking, and computation, and are monitored and controlled by embedded subsystems over networked systems with feedback loops that change their behaviour when needed. The increased use of CPS, however, exposes the public to more threats, so security in this area has become a global issue that makes it necessary to develop new approaches for securing CPS. CPS utilise a three-level architecture based on the respective functions of each layer: the perception layer, the transmission layer, and the application layer. Security in specific CPS applications is currently the most important security objective of CPS, because the applications are where CPS deliver their improved functionality.

This chapter focuses on the application aspect, which is most closely related to people’s daily lives, and presents a real-time system built on a distributed multi-camera network that integrates computing and communication capabilities with the monitoring of people in the physical world, namely person re-identification in cyber-physical surveillance systems. The increasing sophistication and diversity of threats to public security have created a critical demand for reliable, secure, and time-efficient intelligent visual surveillance systems in smart cities. For example, visual surveillance of indoor environments such as metro stations plays an important role both in assuring safe conditions for the public and in managing the transport network. Recent progress in computer vision and related visual analytics offers new prospects for intelligent surveillance systems. A major recent development is the success of deep learning techniques, which have significantly boosted visual analysis performance and opened new research directions in understanding visual content. For example, convolutional neural networks have demonstrated superior performance in modelling high-level visual concepts. The development of deep learning and its related visual analytic methodologies is expected to further influence the field of intelligent surveillance systems. In view of the high demand from metropolitan communities for widely deployed surveillance systems, this chapter introduces recent research based on deep neural networks and pipelines to practitioners and human investigators who undertake forensic and security analysis of large volumes of open-world CCTV video data sourced from large distributed multi-camera networks covering complex urban environments with transport links.
This chapter will address the challenges of using deep learning and related techniques to understand and promote the use of ubiquitous intelligent surveillance systems.


  • Cyber-physical system with security
  • Multi-camera networking
  • Person re-identification in cyber-physical security



Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  1. The University of Queensland, Brisbane, Australia
  2. Hefei University of Technology, Hefei, China
