1 Introduction

In recent years, video surveillance has expanded considerably. The resulting increase in the number of cameras installed in private and public places makes their exploitation and monitoring very difficult for human operators. This has motivated much research on intelligent vision systems that help operators interpret scenes and that raise alarms when an anomaly occurs. Several types of video surveillance systems currently exist, e.g. access control in sensitive locations, people recognition and traffic congestion control.

In this paper, we are interested in the problem of person re-identification in a camera network. Re-identification in computer vision systems aims to follow a person, associate an identifier with him or her, and store it in a database. If the person leaves the scene and then reappears in the field of view of any camera, he or she is assigned the same identifier. In a crowded and uncontrolled environment observed by cameras from unknown distances, person re-identification relying upon conventional biometrics, such as face recognition, is neither feasible nor reliable, due to insufficiently constrained conditions and insufficient image details for extracting robust biometrics [17]. Instead, visual features based on the appearance of people, determined by their clothing and objects carried or associated with them, can be exploited more reliably for re-identification.

The remainder of this paper is organized as follows: Sect. 2 presents some related works from the literature. Section 3 describes in detail each block of the proposed system. The experimental results and their discussion are presented in Sect. 4. Finally, some conclusions are drawn in Sect. 5.

2 Related Works

In the literature, re-identification approaches can be grouped into several classes according to several criteria [12]:

  1. The number of images per person: This class comprises two families. The first family is the family of mono-sample methods, where the signature of a person is extracted from a single image, as in [1, 3, 6, 15, 16, 24]. The second family is the family of multi-sample methods, where multiple images are used to calculate the signature of a person, as in [4, 5, 7, 11, 14, 19, 21].

  2. The type of representation: The first family in this class is the family of global approaches, where the whole information in the image is exploited for calculating the person's signature, as in [1, 2, 13]. The second family is that of local approaches, which represent the image by several feature vectors, each describing a region or a locally detected point, as in [5, 8, 9].

  3. The existence of a set of images mapped a priori: This class includes supervised approaches, as in [2, 3, 14], and unsupervised approaches, as in [10, 23].

A comprehensive survey of person re-identification approaches is presented in [22], where they are organized in a multidimensional taxonomy according to camera setting, sample set cardinality, signature, adoption of a body model, machine learning techniques and application scenario.

3 Description of the Proposed System

In this section, we describe the stages of the proposed system for person re-identification in a non-overlapping camera network. These stages are person detection, localization and verification, tracking, and re-identification. The overall flowchart of the proposed system is shown in Fig. 1.

Fig. 1. The flowchart of the proposed system

3.1 Person Detection

This initial stage is accomplished by combining the Mixture of Gaussians (MoG) method [20] and the difference method. The MoG is one of the most widely used and successful methods in surveillance systems, because it is adaptive and can handle multimodal backgrounds.

In the difference method, we first take the difference between two successive grayscale images \( I_{g(t)} \) and \( I_{g(t-1)}\), as in Eq. (1), and then we compare the resulting difference image \( I_{diff} \) to a threshold to detect moving pixels.

$$\begin{aligned} I_{diff} = I_{g(t)} - I_{g(t-1)} \end{aligned} \tag{1}$$

The detections resulting from the MoG and difference methods are combined using the logical OR operation.

After detecting moving objects we fill the holes [18]. The holes of a binary image correspond to the set of its regional minima, which are not connected to the image border.
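The detection stage can be sketched as follows. This is only an illustrative sketch, not the authors' implementation: the MoG mask is assumed to come from an existing background subtractor and is passed in as a boolean array, the threshold value of 25 is an arbitrary placeholder, and comparing the magnitude of the difference to the threshold is an assumption, since the text does not say how negative differences are handled. The function names `difference_mask`, `fill_holes` and `combine_detections` are hypothetical.

```python
from collections import deque

import numpy as np


def difference_mask(gray_t, gray_prev, threshold=25):
    """Threshold the frame difference of Eq. (1) to flag moving pixels."""
    # Cast to a signed type so the subtraction cannot wrap around in uint8.
    diff = gray_t.astype(np.int32) - gray_prev.astype(np.int32)
    # Assumption: the magnitude of the difference is compared to the threshold.
    return np.abs(diff) > threshold


def fill_holes(mask):
    """Fill background regions that are not connected to the image border."""
    h, w = mask.shape
    outside = np.zeros((h, w), dtype=bool)
    queue = deque()
    # Seed the flood fill with every background pixel lying on the border.
    for r in range(h):
        for c in range(w):
            on_border = r in (0, h - 1) or c in (0, w - 1)
            if on_border and not mask[r, c]:
                outside[r, c] = True
                queue.append((r, c))
    # Grow the outside region through 4-connected background pixels.
    while queue:
        r, c = queue.popleft()
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            rr, cc = r + dr, c + dc
            if (0 <= rr < h and 0 <= cc < w
                    and not mask[rr, cc] and not outside[rr, cc]):
                outside[rr, cc] = True
                queue.append((rr, cc))
    # Whatever was not reached from the border is foreground or a hole.
    return ~outside


def combine_detections(mog_mask, diff_mask):
    """Logical OR of the two detections, followed by hole filling."""
    return fill_holes(np.logical_or(mog_mask, diff_mask))
```

In practice the hole filling would more likely be delegated to an image-processing library; the explicit flood fill is kept here only to make the definition of a hole concrete.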

3.2 Person Localization

The localization of the detected person is done using the labeling technique. This technique consists in separating the areas in the mask obtained from the detection step. We associate an integer value (label) with each area by using an 8-connected neighborhood, then we calculate some properties for each area, e.g. x and y coordinates, height, width and sum of foreground pixels.
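A minimal sketch of this labeling step is given below, assuming the detection mask is a boolean NumPy array. The function name `label_regions` and the dictionary keys are hypothetical, and taking x and y as the top-left corner of the bounding rectangle is an assumption, since the text does not say which point the coordinates refer to.

```python
from collections import deque

import numpy as np


def label_regions(mask):
    """Label 8-connected foreground areas and collect basic properties."""
    h, w = mask.shape
    labels = np.zeros((h, w), dtype=np.int32)
    regions = []
    for r0 in range(h):
        for c0 in range(w):
            if not mask[r0, c0] or labels[r0, c0]:
                continue
            # Start a new area and grow it by breadth-first search.
            label = len(regions) + 1
            labels[r0, c0] = label
            queue = deque([(r0, c0)])
            rows, cols = [], []
            while queue:
                r, c = queue.popleft()
                rows.append(r)
                cols.append(c)
                # Visit the 8 neighbours (the centre is already labeled).
                for dr in (-1, 0, 1):
                    for dc in (-1, 0, 1):
                        rr, cc = r + dr, c + dc
                        if (0 <= rr < h and 0 <= cc < w
                                and mask[rr, cc] and not labels[rr, cc]):
                            labels[rr, cc] = label
                            queue.append((rr, cc))
            regions.append({
                "label": label,
                "x": min(cols),                      # left edge (assumed convention)
                "y": min(rows),                      # top edge (assumed convention)
                "width": max(cols) - min(cols) + 1,
                "height": max(rows) - min(rows) + 1,
                "area": len(rows),                   # sum of foreground pixels
            })
    return labels, regions
```

With 8-connectivity, two pixels that touch only diagonally belong to the same area, which would not be the case with a 4-connected neighborhood.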

3.3 Verification

To eliminate false detections, we propose a verification phase. To be validated, each detected person has to satisfy the following three conditions:

  • The ratio of width to height: this ratio has to lie between min and max thresholds.

  • The surface of the rectangle containing the person (surface = height \(\times \) width) has to lie between min and max thresholds. This is to eliminate very small and very big objects due to false detection.

  • The ratio of the sum of foreground pixels to the surface also has to be bounded by thresholds.
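The three conditions above can be sketched as a single predicate. All threshold values below are hypothetical placeholders, since the text does not report the values used, and bounding the fill ratio on both sides is an assumption. The function name `is_valid_person` is also illustrative.

```python
def is_valid_person(width, height, foreground_pixels,
                    ratio_range=(0.2, 0.8),
                    surface_range=(500, 20000),
                    fill_range=(0.3, 1.0)):
    """Return True if a detected area passes the three verification tests.

    All default thresholds are placeholders, not values from the paper.
    """
    surface = width * height
    if surface == 0:
        return False
    aspect = width / height              # ratio of width to height
    fill = foreground_pixels / surface   # foreground pixels per unit surface
    return (ratio_range[0] <= aspect <= ratio_range[1]
            and surface_range[0] <= surface <= surface_range[1]
            and fill_range[0] <= fill <= fill_range[1])
```

For example, with these placeholder thresholds a 40 by 100 rectangle holding 2000 foreground pixels is accepted, whereas the same rectangle rotated to 100 by 40 is rejected by the width-to-height test.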

3.4 Person Tracking

The person tracking process is done by template matching using the Sum of Squared Differences (SSD) algorithm. In digital image processing, the SSD is a measure of the similarity between image blocks. It is calculated by taking the square of the difference between each pixel in the original block X (a portion of the current frame) and the corresponding pixel in the block Y being used for comparison (the model from the previous detection).

These differences are summed to create a simple metric of block similarity, as in Eq. (2); a value of zero means that the two blocks are identical. We sweep all the positions in the frame, and the block with the smallest metric is the tracked block.

The SSD value for two blocks X and Y of size \(M \times N\) is calculated by:

$$\begin{aligned} SSD = \sum _{i=1}^{M}\sum _{j=1}^{N} (X(i,j) - Y(i,j))^2 \end{aligned} \tag{2}$$

For a given Y model, the most similar block X is the one that minimizes the SSD.
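As an illustration, an exhaustive SSD search following Eq. (2) could be written as below. This is a sketch that assumes single-channel (grayscale) NumPy arrays; the function names `ssd` and `match_template` are hypothetical.

```python
import numpy as np


def ssd(block_x, block_y):
    """Sum of squared differences between two equally sized blocks, Eq. (2)."""
    # Work in floating point so that uint8 inputs cannot overflow.
    diff = block_x.astype(np.float64) - block_y.astype(np.float64)
    return float(np.sum(diff * diff))


def match_template(frame, model):
    """Sweep every position of the frame and keep the block minimizing the SSD."""
    frame_h, frame_w = frame.shape
    model_h, model_w = model.shape
    best_pos, best_score = None, float("inf")
    for top in range(frame_h - model_h + 1):
        for left in range(frame_w - model_w + 1):
            block = frame[top:top + model_h, left:left + model_w]
            score = ssd(block, model)
            if score < best_score:
                best_pos, best_score = (top, left), score
    # best_pos is the (row, column) of the top-left corner of the tracked block.
    return best_pos, best_score
```

The cost of this sweep grows with the number of candidate positions times the block size, which is why restricting the search area matters for execution time.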

Fig. 2. Detailed flowchart of re-identification and online construction of database

3.5 Re-identification and Association

Following the stages of detection, localization, verification, and tracking, we have the stage of re-identification and online construction of the database DB, which contains the history of each person that appeared in the field of view of the cameras. Figure 2 presents a detailed flowchart of this stage.

This stage deals with the moving objects obtained from the detection and tracking stages, each of which is called a 'found person'.

First, we calculate the intersection between the found persons resulting from detection and those resulting from tracking. The intersection \( (A\cap B) \) of two rectangles A and B is the rectangle that contains all elements of A that also belong to B.

Then we test whether each found person results from detection only, from tracking only, or from both. If a found person comes from the intersection or from tracking only, we update the database with the identifier of the tracked person.
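The intersection test and the resulting three-way split can be sketched as follows. Rectangles are assumed to be `(x, y, width, height)` tuples, and any non-empty overlap is treated as an intersection; the text does not specify a minimum overlap, so that choice is an assumption. The function names `intersect` and `classify_sources` are hypothetical.

```python
def intersect(a, b):
    """Intersection of two (x, y, width, height) rectangles, or None if empty."""
    left = max(a[0], b[0])
    top = max(a[1], b[1])
    right = min(a[0] + a[2], b[0] + b[2])
    bottom = min(a[1] + a[3], b[1] + b[3])
    if right <= left or bottom <= top:
        return None
    return (left, top, right - left, bottom - top)


def classify_sources(detections, tracks):
    """Split found persons into: both sources, tracking only, detection only."""
    both, detection_only = [], []
    matched_tracks = set()
    for det in detections:
        # Index of the first tracked rectangle that overlaps this detection.
        hit = next((i for i, trk in enumerate(tracks)
                    if intersect(det, trk) is not None), None)
        if hit is None:
            detection_only.append(det)
        else:
            both.append((det, hit))
            matched_tracks.add(hit)
    tracking_only = [i for i in range(len(tracks)) if i not in matched_tracks]
    return both, tracking_only, detection_only
```

Entries in `both` and `tracking_only` keep the identifier of the tracked person, while entries in `detection_only` are the ones passed on to the histogram comparison.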

On the other hand, if the found person comes from detection only, we calculate its histogram. An image histogram is a graphical representation of the tonal distribution in a digital image: it plots the number of pixels for each tonal value. The histogram of the found person is compared to the histograms of the identified persons stored in the database. If there is a match, we update the database by associating that person with the matched identifier. Otherwise, we consider this person as a new one and assign him or her a new identifier, which is added to the database.
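The match-or-create logic can be sketched as below. The text does not state which histogram comparison measure or decision threshold is used, so the histogram intersection measure, the 0.8 threshold and the 32 bins are all assumptions chosen for illustration. The database is modeled as a plain dictionary from integer identifiers to histograms, and the function names are hypothetical.

```python
import numpy as np


def gray_histogram(pixels, bins=32):
    """Normalized histogram of pixel values in the range [0, 256)."""
    hist, _ = np.histogram(pixels, bins=bins, range=(0, 256))
    total = hist.sum()
    return hist / total if total else hist.astype(np.float64)


def histogram_intersection(h1, h2):
    """Similarity in [0, 1] between two normalized histograms (assumed measure)."""
    return float(np.minimum(h1, h2).sum())


def associate(hist, database, threshold=0.8):
    """Return the identifier matched to hist, creating a new one if needed."""
    best_id, best_score = None, threshold
    for person_id, stored in database.items():
        score = histogram_intersection(hist, stored)
        if score >= best_score:
            best_id, best_score = person_id, score
    if best_id is None:
        # No stored histogram is similar enough: register a new person.
        best_id = max(database, default=0) + 1
    # Assumption: the stored model is replaced by the latest histogram.
    database[best_id] = hist
    return best_id
```

A possible usage: calling `associate` on the histogram of each detection-only person in turn grows the dictionary only when a previously unseen appearance is encountered.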

4 Experimental Results

In this section, we present the equipment and the database used, the experimental results, and their discussion.

4.1 System Development Environment

The equipment used for the development of our application is:

  1. A laptop with:

    • Processor: Intel Core i7-4702MQ CPU @ 2.20 GHz.

    • RAM: 8.00 GB.

    • Operating system: Windows 8.1, 64-bit.

    • Hard drive: 1 TB.

  2. Digital video recorder (DVR).

  3. Camera with the following characteristics:

    • 1/3 Sony HR CCD

    • 420 TV lines

    • 0.2 lux

    • Adjustable focal length between 3 mm and 8 mm.

To test our system, we built our own database, composed of sequences of images recorded on the third floor of the Department of Electronics at USTO university. Three cameras, set at a height of 2.30 m and at an angle of −30°, were used to take these images. Each sequence contains from one to three people who walk in the fields of view of the three cameras. The cameras were placed as shown in the layout presented in Fig. 3.

Fig. 3. Layout of the cameras

Fig. 4. Fields of view of the 3 cameras: (a) Camera 1, (b) Camera 2, (c) Camera 3

To fulfil the condition of a non-overlapping camera network, the database was recorded so that a person lies in the field of view of only one camera at any given instant. Figure 4 shows the fields of view of the three cameras.

4.2 Experimental Results and Discussion

In this section, we present and discuss the results of each step of the proposed system. The Mixture of Gaussians gives raw detection results for each camera, once suitable settings have been defined according to criteria such as indoor or outdoor environment, people's movement speed and lighting changes. Figure 5(b) presents an example of these results. To improve these raw results, we combine them with the results of the difference method (Fig. 5(c)), which detects the edges of moving objects. We then fill the holes of the resulting image to obtain better results, as illustrated in Fig. 5(d).

Fig. 5. Example of person detection. (a) Original image, (b) Results of detection by MoG, (c) Results of detection by difference, (d) Results of the holes filling of the OR between b and c

In Fig. 6, we present the localization and verification results. After the localization by the labeling technique, we apply the verification procedure to each person. In Fig. 6(a), only the persons that satisfy the validation conditions are kept (the person in the green rectangle); the others, in red, are ignored. Figure 6(b) presents the detection results.

The tracking step is run in parallel with the detection step and is realized by the SSD. To accelerate its execution, we apply it only to a limited region of interest instead of searching the whole frame. This region is determined by the coordinates of the model to track. The tracking results obtained are satisfactory. Figure 7 shows the tracking results: Fig. 7(a) is a detection in frame 159 and Fig. 7(b) is its tracking in frame 222.
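One way to derive such a region of interest from the model coordinates is sketched below. The text only states that the region is determined by the coordinates of the model, so the fixed margin around the last known rectangle is an assumption, as are the 20-pixel default and the function name `search_region`.

```python
def search_region(box, frame_shape, margin=20):
    """Clamp a search window around the last known model position.

    box is (x, y, width, height); frame_shape is (rows, columns).
    The margin is a hypothetical parameter, not a value from the paper.
    """
    x, y, w, h = box
    frame_h, frame_w = frame_shape
    left = max(0, x - margin)
    top = max(0, y - margin)
    right = min(frame_w, x + w + margin)
    bottom = min(frame_h, y + h + margin)
    return left, top, right, bottom
```

The SSD sweep is then run only over `frame[top:bottom, left:right]`, which reduces the number of candidate positions from roughly the frame area to roughly the square of twice the margin.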

Fig. 6. Example of localization and verification, (a) Localization and verification results on the original image, (b) Results of detection

Fig. 7. Tracking results, (a) Detection in frame 159, (b) The tracking of detection of frame 159 in frame 222

Fig. 8. The multiplication of detected person with its mask, (a) Detected person, (b) Mask of detected person and (c) Results of multiplication

Fig. 9. Different histograms of the silhouette, (a), (b), (c) and (d) are the histograms of Red, Green, Blue channels and grayscales respectively (Color figure online)

Fig. 10. A sample of the constructed database

The re-identification stage is realized using two techniques: the intersection of detection and tracking for the temporal association, and the histogram for comparison. In Fig. 8, we present the multiplication of the detected person with its mask, which extracts the silhouette only. Then, we calculate the histograms of the Red, Green and Blue channels and of the grayscale image, as shown in Fig. 9. We use the histogram of the silhouette to avoid the effect of the background. These different histograms are used for comparison with the models stored in the database. If there is a match, we associate the matched identifier with the current person. Otherwise, we consider that the current person is new and add him or her to the database with a new identifier.
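The silhouette extraction and the four histograms can be sketched as follows. The sketch assumes an image stored as a rows by columns by 3 array in Red, Green, Blue order and a binary mask of the same height and width. The grayscale conversion weights (0.299, 0.587, 0.114) are a common convention and an assumption here, and counting only the pixels selected by the mask, so that the zeroed background does not inflate the first bin, is a design choice of this sketch rather than something stated in the text. The function names are hypothetical.

```python
import numpy as np


def silhouette(image, mask):
    """Multiply each colour channel by the binary mask (background becomes 0)."""
    return image * mask[..., np.newaxis].astype(image.dtype)


def silhouette_histograms(image, mask, bins=256):
    """Histograms of the R, G, B channels and grayscale over masked pixels only."""
    foreground = mask.astype(bool)
    red, green, blue = image[..., 0], image[..., 1], image[..., 2]
    # Assumed grayscale conversion: weighted sum of the three channels.
    gray = 0.299 * red + 0.587 * green + 0.114 * blue
    channels = {"red": red, "green": green, "blue": blue, "gray": gray}
    return {
        name: np.histogram(values[foreground], bins=bins, range=(0, 256))[0]
        for name, values in channels.items()
    }
```

Each returned histogram sums to the number of silhouette pixels, so histograms of persons seen at different sizes would need to be normalized before being compared.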

A sample of the constructed database is presented in Fig. 10. This database contains the history of every person that enters the field of view of the cameras.

5 Conclusion

In this paper, we presented the design and implementation of an appearance-based system for person re-identification in a camera network. This system aims to build an online database that contains the history of every person captured by the cameras.

The system assigns each detected person an identifier that is kept throughout the fields of view of the cameras, even if the person disappears and then reappears.

Our system implements an improved detection technique that combines the Mixture of Gaussians method and the difference method. The SSD algorithm with an acceleration strategy is used for the tracking step, whereas the re-identification stage is realized using two techniques: the intersection for temporal association and the histogram comparison.

The global system was tested on a real data set collected by three cameras. The experimental results show that our approach leads to satisfactory qualitative results, with room for improvement in the re-identification stage by using local histograms instead of the global one. As future work, we also plan to evaluate our method quantitatively and compare it with other methods.