Appearance-based person reidentification in camera networks: problem overview and current approaches

Original Research

Abstract

Recent advances in visual tracking methods allow following a given object or individual in the presence of significant clutter or partial occlusions, in a single camera view or in a set of overlapping camera views. The question of when person detections in different views or at different time instants can be linked to the same individual is of fundamental importance to video analysis in large-scale camera networks. This is the person reidentification problem. The paper focuses on algorithms that use the overall appearance of an individual, as opposed to passive biometrics such as face and gait. Methods that effectively address the challenges associated with changes in illumination, pose, and clothing appearance are discussed. More specifically, the paper reviews the development of a set of models that capture the overall appearance of an individual and can be used effectively for information retrieval. Some provide a holistic description of a person, while others require an intermediate step in which specific body parts are identified. Some are designed to extract appearance features over time, while others can also operate reliably on single images. The paper also discusses algorithms for speeding up the computation of signatures. In particular, it describes very fast procedures for computing co-occurrence matrices by leveraging a generalization of the integral representation of images. The algorithms are deployed and tested in a network of three cameras with non-overlapping fields of view, where a multi-camera multi-target tracker links the tracks in different cameras by reidentifying the same people appearing in different views.
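
As a rough illustration of the integral representation the abstract refers to, the sketch below builds a standard integral image (summed-area table) and a per-bin integral histogram, so that the histogram of any axis-aligned rectangle is obtained with four lookups per bin. This is not the paper's co-occurrence procedure, which generalizes the idea further; the function names, the toy `labels` map, and the half-open rectangle convention are assumptions made for this example only.

```python
def integral_image(img):
    """Return a (H+1) x (W+1) summed-area table with a zero first row and column."""
    h, w = len(img), len(img[0])
    ii = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0  # running sum of the current row
        for x in range(w):
            row_sum += img[y][x]
            ii[y + 1][x + 1] = ii[y][x + 1] + row_sum
    return ii


def box_sum(ii, top, left, bottom, right):
    """Sum over rows top..bottom-1 and columns left..right-1 using four lookups."""
    return ii[bottom][right] - ii[top][right] - ii[bottom][left] + ii[top][left]


def integral_histogram(labels, num_bins):
    """One summed-area table per bin, built from a map of quantized labels."""
    return [
        integral_image([[1 if v == b else 0 for v in row] for row in labels])
        for b in range(num_bins)
    ]


def region_histogram(ih, top, left, bottom, right):
    """Histogram of a rectangle; cost is independent of the rectangle's area."""
    return [box_sum(ii, top, left, bottom, right) for ii in ih]


# Hypothetical 3 x 4 map of quantized appearance labels (values 0..2).
labels = [
    [0, 0, 1, 2],
    [0, 1, 1, 2],
    [2, 2, 1, 0],
]
ih = integral_histogram(labels, num_bins=3)
hist = region_histogram(ih, 0, 1, 3, 3)  # all rows, columns 1 and 2
print(hist)
```

After the one-time pass that builds the tables, each region query costs a fixed number of additions per bin, which is what makes this kind of representation attractive when many candidate regions must be described.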

Keywords

Re-identification, Surveillance, Tracking, Appearance matching, Integral image, Co-occurrence, Integral representation

Copyright information

© Springer-Verlag 2011

Authors and Affiliations

  1. West Virginia University, Morgantown, USA
  2. GE Global Research, Niskayuna, USA
