# A method for counting people attending large public events

DOI: 10.1007/s11042-013-1628-0

- Cite this article as:
- Kopaczewski, K., Szczodrak, M., Czyzewski, A. et al. Multimed Tools Appl (2015) 74: 4289. doi:10.1007/s11042-013-1628-0

## Abstract

An algorithm for counting people in crowded scenes is presented. It is based on the idea of a virtual gate that uses the optical flow method. The concept and the practical application of the developed algorithm under real conditions are described. The aim of the work is to estimate the number of people passing through the entrances of a large sports hall. The most challenging problem was the unpredictable behavior of people entering the building. The examined flow of people ranged from individual persons to a dense crowd. A series of experiments was carried out during sport and entertainment events. The results of the experiments show a high efficiency of the developed algorithm.

### Keywords

Crowd, People counting, Crowd behavior

## 1 Introduction

Mass gatherings such as sports games or concerts can be a source of various risks for individuals, particularly those caused by an excessive number of people in a specific place. Exceeding the value regarded as the safe limit may result in injuries or deaths in emergency situations [12]. The organizers should therefore know the number of people gathered in the building or in the enclosed outdoor space. Like many facilities of this type, the building considered in this work is not equipped with people counting systems such as mechanical gates. Besides, people often feel uneasy when crossing such an installation, wondering what would happen if they had to leave the building rapidly. Moreover, the behavior of a crowd entering a building through wide doors makes it impossible to use other optical or mechanical means such as radiation beam systems. Infrared barriers are ineffective because they often count a group of people standing close to each other as a single person [11]. A further problem is that such solutions are usually unable to recognize the direction of pedestrian movement.

Besides the video processing algorithms commonly used for surveillance of public places, audio analysis provides an important supplement [8]. Nevertheless, when a large group of people appears, it becomes very hard to determine their number with the majority of image processing methods, which commonly rely on object detection and tracking. Background extraction-based approaches, such as the ones developed at the Multimedia Systems Department of Gdansk University of Technology [5], cannot separate objects properly when people walk very close to each other or hold hands. Other methods use multiple cameras to deal with the segmentation problem [10, 14, 15], or apply models of human figures obtained by observing the foreground of an image [7, 13]. However, installing numerous cameras would be impractical in the considered building.

Other solutions, which combine a camera image with laser beams for tracking feet, require deploying sensors on the ground [4].

Commercial systems differ in the technologies applied and in the actual problem they target. Laser counters and visible light systems are very common on the market today. Such products often achieve their best performance only under specific conditions. Meanwhile, the situations encountered while gathering the experimental data proved difficult to interpret algorithmically.

Methods of counting people in crowded scenes can be found in the literature.

Albiol et al. [1] describe a technique based on the analysis of the derivative image constructed from a time-sampled section of the original image. Another method [2] uses statistical analysis of object corners detected while people move. Both of these methods were investigated in the specific conditions of underground train doors. Bozzoli et al. [3] propose an approach based on the sparse optical flow method and pedestrian contours obtained by edge extraction from the image. However, the similarity of such contours is not guaranteed because people may have various hair or clothes colors.

The proposed algorithm for counting people in a crowd differs from the approaches described above. It uses dense optical flow for motion analysis, employing the so-called virtual gate. The algorithm deals with complex situations which occur while people enter large sports halls. Moreover, the algorithm is designed to work in a system with a centralized architecture, where the video signals gathered from multiple cameras are processed by an efficient computer cluster. The aim of the system is to show the estimated number of people as they enter the hall through several gates. The KASKADA supercomputing platform is the working environment of the algorithm [9].

## 2 Virtual gate algorithm

The Virtual Gate algorithm is based on a modified Optical Flow method. The method developed for counting people does not involve classification modules, because the aim of the algorithm is to detect the size and direction of motion of objects in video sequences whose dimensions are similar to the size of an average human body. Moreover, the Virtual Gate is used in places where human motion is expected, especially at entrances, passages, etc. Two parts of the algorithm can be distinguished: the main module, which performs image processing, and the calibration module.

The detailed structure of the Virtual Gate is depicted in Fig. 1b. It is composed of a set of rectangular regions (*R*_{i}) situated next to each other and overlapping. The rectangles have identical shape and their size corresponds to the size of an average human body contour, with respect to its height and width (for a particular camera view). The rectangles are spaced evenly along the *x* axis and overlap by 80 %.
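The region layout described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function name `gate_regions`, its parameters, and the use of whole-pixel coordinates are assumptions made for the example.

```python
def gate_regions(gate_x, gate_width, rect_width, overlap=0.8):
    """Return (left, right) x-ranges of regions R_i laid out along the x axis.

    Hypothetical helper: neighbouring rectangles share `overlap` of their
    width, so the step between them is rect_width * (1 - overlap) pixels.
    """
    if not 0.0 <= overlap < 1.0:
        raise ValueError("overlap must be in [0, 1)")
    if rect_width > gate_width:
        raise ValueError("a region cannot be wider than the gate")
    # Round to whole pixels; never let the step collapse to zero.
    step = max(1, round(rect_width * (1.0 - overlap)))
    last_left = gate_x + gate_width - rect_width
    return [(left, left + rect_width)
            for left in range(gate_x, last_left + 1, step)]


# Example: a 100-pixel-wide gate covered by 50-pixel-wide regions.
regions = gate_regions(gate_x=0, gate_width=100, rect_width=50)
print(len(regions), regions[:2])
```

With an 80 % overlap each person-sized object is covered by several regions at once, which is presumably why every region needs its own counter state.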

Motion is estimated in each region *R*_{i} using the Dense Optical Flow method [6]. Dense Optical Flow was chosen in order to obtain a motion description for each pixel. The result of this operation is a set of vectors representing the direction and the velocity of the detected motion. Displacement vectors can be expressed by the planar vector field:

$$\vec{V} = V_x\,\mathbf{i} + V_d\,\mathbf{j} = V_\rho\,\mathbf{e}_\rho + V_\varphi\,\mathbf{e}_\varphi \qquad (1)$$

where:

- **i**, **j**: unit vectors of the *x* and *d* axes
- **e**_{ρ}, **e**_{φ}: unit vectors related to polar coordinates (the distance from the axis of symmetry, the angle measured counterclockwise from the positive *x*-axis).

Two directions of people motion through the virtual gate are considered, namely forward and backward (“in” and “out”, +*d* and –*d*, as in Fig. 1c). Moreover, a small divergence of the direction (±*α*) is allowed, because usually people do not maintain bearing while walking. The tolerance should not be too large, because of the need to discard those walking along the gate.
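The direction test with tolerance ±*α* can be illustrated with a short Python sketch. It assumes that the *d* axis is perpendicular to the *x* axis, so that +*d* corresponds to an angle of 90° and –*d* to –90°. The function name `classify_direction`, the default tolerance of 30° and the `min_speed` parameter are hypothetical and are not taken from the paper.

```python
import math


def classify_direction(vx, vd, alpha_deg=30.0, min_speed=0.0):
    """Label one flow vector as "in", "out" or None (discarded).

    Assumption: +d points at 90 degrees and -d at -90 degrees, measured
    counterclockwise from the positive x axis; alpha_deg must be below 90.
    """
    speed = math.hypot(vx, vd)
    if speed <= min_speed:
        return None  # no usable motion at this pixel
    phi = math.degrees(math.atan2(vd, vx))  # angle in (-180, 180]
    if abs(phi - 90.0) <= alpha_deg:
        return "in"
    if abs(phi + 90.0) <= alpha_deg:
        return "out"
    return None  # e.g. someone walking along the gate


print(classify_direction(0.2, 1.0))   # slightly off-axis, still through the gate
print(classify_direction(1.0, 0.0))   # motion along the gate
```

A vector pointing along the gate falls outside both tolerance cones and is dropped, which matches the requirement to discard people walking along the gate.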

The processing performed for each region (*R*_{i}) is presented in Fig. 2. For each input video frame, vectors representing motion speed and direction are calculated. The components of Eq. (1) can be written in cylindrical (in this two-dimensional case, polar) coordinates, as given in Eq. (2).

Let *φ*_{0} denote the angle corresponding to the direction pointed by *d*. New functions *V*_{x}^{I}, *V*_{d}^{I}, *V*_{x}^{O} and *V*_{d}^{O} are calculated as given in Eq. (3) in order to obtain the desired direction vectors (I means “in”, O means “out”).

The amount of motion directed “in” (*L*^{I}) or “out” (*L*^{O}) enclosed in each region *R*_{i} is obtained according to Eq. (4).

In the next step, *L*^{I} and *L*^{O} are compared to a threshold value *T*_{S}, which is proportional to the average area of a human silhouette at a given camera view and is obtained experimentally. If *L*^{(•)} is greater than the threshold value *T*_{S}, the “in” or “out” people counter is increased and rectangle *R*_{i} enters an inactive (“hold”) state for a period of *C* frames. The inactive state means that calculations are not performed for region *R*_{i}. This is done in order to avoid counting errors: the moving object can leave the area of *R*_{i} without being counted more than once. The period of the “hold” state is obtained experimentally during the calibration.
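The thresholding and “hold” logic for a single region can be sketched as a small state machine. This is an illustrative sketch only: the class name `RegionCounter`, its attributes, and the rule that “in” takes precedence when both values exceed the threshold in the same frame are assumptions, not details given in the text.

```python
class RegionCounter:
    """Per-region counter with an inactive ("hold") period of C frames."""

    def __init__(self, threshold, hold_frames):
        self.threshold = threshold      # plays the role of T_S
        self.hold_frames = hold_frames  # plays the role of C
        self.hold_left = 0              # frames remaining in the hold state
        self.count_in = 0
        self.count_out = 0

    def update(self, l_in, l_out):
        """Process the motion amounts L^I and L^O measured in one frame."""
        if self.hold_left > 0:
            # Region is inactive: skip the comparison for this frame.
            self.hold_left -= 1
            return
        if l_in > self.threshold:
            self.count_in += 1
            self.hold_left = self.hold_frames
        elif l_out > self.threshold:  # assumption: "in" wins a tie
            self.count_out += 1
            self.hold_left = self.hold_frames


counter = RegionCounter(threshold=100, hold_frames=3)
for l_in in [0, 150, 150, 150, 150, 20, 150]:
    counter.update(l_in, l_out=0)
print(counter.count_in, counter.count_out)
```

In the example the person who triggers the counter in the second frame is still inside the region for the next three frames, but the hold state keeps those frames from being counted again.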

The calibration module searches for the optimal counting threshold (*T*) in order to improve the effectiveness of the algorithm. Searching is based on the bisection method, which approaches the optimal result in sequential iterations. The input data for the calibration process are:

- a video sequence presenting the passage of individuals and groups of people,
- the Virtual Gate geometry and parameters (i.e. dimensions of regions *R*_{i}, number of regions and distance between them),
- the number of people who passed through the gate in selected time moments.

The following notation is used:

- *N*: number of selected time moments
- *p*_{n} = *p*_{0}, *p*_{1}, …, *p*_{N}: real number of people who passed through the gate during the period between the instants *n*-1 and *n* (the Ground Truth)
- *w*_{n} = *w*_{0}, *w*_{1}, …, *w*_{N}: weight coefficient
- *y*_{n} = *y*_{0}, *y*_{1}, …, *y*_{N}: number of moving objects counted by the Virtual Gate algorithm during the period between the instants *n*-1 and *n*
- *S*_{n} = *S*_{0}, *S*_{1}, …, *S*_{N}: counting error in the selected time moment
- *S*: total counting error.

For groups of people, the weight *w*_{n} is decreased and, in case of individuals, it is increased. In each step, partial errors (Eq. (5)) and the total error (Eq. (6)) are minimized and then the counting threshold is increased or decreased, respectively, according to Eqs. (7) and (8). The calibration process stops when *T*_{corr} ≤ 1. In these equations:

- *T*: counting threshold (value within range 0, 1…100)
- *T*_{corr}: counting threshold correction.
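A bisection-style search of this kind can be sketched as follows. The sketch is built on stated assumptions rather than on the paper's Eqs. (5) to (8): the total error is taken to be the weighted signed sum of (*y*_{n} − *p*_{n}), the correction *T*_{corr} is halved in every step, and a higher threshold is assumed to produce fewer counts. The names `calibrate_threshold`, `count_fn` and `toy_counter` are hypothetical.

```python
def calibrate_threshold(count_fn, truth, weights, t_min=0.0, t_max=100.0):
    """Search for a counting threshold T by repeated halving of T_corr.

    count_fn(T) must return the counts y_n obtained with threshold T,
    one value per interval, aligned with `truth` (p_n) and `weights` (w_n).
    """
    t = (t_min + t_max) / 2.0
    t_corr = (t_max - t_min) / 4.0
    while t_corr > 1.0:  # stop criterion taken from the text: T_corr <= 1
        counts = count_fn(t)
        # Assumed error measure: weighted signed difference to ground truth.
        total_error = sum(w * (y - p)
                          for w, y, p in zip(weights, counts, truth))
        if total_error == 0:
            break
        if total_error > 0:
            t += t_corr   # overcounting: raise the threshold
        else:
            t -= t_corr   # undercounting: lower the threshold
        t_corr /= 2.0
    return t


def toy_counter(t):
    """Synthetic stand-in for running the Virtual Gate on a recording."""
    return [100 * 65.0 / t, 40 * 65.0 / t]


best = calibrate_threshold(toy_counter, truth=[100, 40], weights=[0.5, 1.0])
print(best)
```

In a real calibration `count_fn` would rerun the counting on the calibration video for each candidate threshold, so the small number of iterations needed by a halving search is a practical advantage.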

## 3 Experimental results

The experiments were carried out in a sports and entertainment hall whose maximum capacity is 15,000 people (“Ergo Arena” located in Gdansk). The image acquisition hardware was deployed at the main entrance, which consists of 6 symmetrical doors. The cameras were set to observe only 3 doors because the others were rarely used during the events. The cameras were installed at a height of 6.5 m above the floor, whereas the width of the door is 2.9 m. The image acquisition speed was 30 frames per second at a resolution of 640 × 360 pixels.

A total of 11 recordings were gathered during sport and entertainment events. Each represents a real situation of people and crowds entering the sports and entertainment hall, and each is about 1.5 h long. The total length of the experimental material is about 16.5 h. The number of people in each test recording was counted manually and treated as the Ground Truth reference value, which was later compared to the Virtual Gate algorithm output.

The people counting system based on the Virtual Gate algorithm can count people passing through numerous entrances in real time. The images gathered from multiple cameras are analyzed simultaneously on a supercomputer with the support of the KASKADA platform. Such a centralized architecture makes it simple to change the scale of the counting system.

### 3.1 Real situations during people counting

The algorithm was examined in real conditions, while crowds were entering the sports and entertainment hall. In practice, some situations posed considerable difficulties for people counting algorithms. Below, we present some examples of people's behavior and the problems encountered.

### 3.2 People counting results

Counting threshold (*T*) parameter of Virtual Gate for test recordings

Recording | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 |
---|---|---|---|---|---|---|---|---|---|---|---|
*T* | 65 | 65 | 68 | 62 | 68 | 65 | 70 | 68 | 68 | 68 | 65 |

Recordings 6 and 7 represent the cases with the worst counting error values (overestimation and underestimation of the number of people, respectively). The error value shown in Fig. 11 rises rapidly between the 33rd and 35th minute of the recording. In this period a very dense crowd pushed towards the hall and groups of people moved chaotically in the area of the virtual gate. Moreover, the position of the ticket checker was not constant. In the chart depicted in Fig. 12, the error increases significantly between the 35th and 50th minute of the experiment. The error is mainly caused by people stopping in the counting area and by unpredictable movements (i.e. moving back and partially re-entering the counting area).

The accuracy (*A*) is calculated according to an equation in which:

- *V*: number of people obtained by Virtual Gate
- *N*: reference (true) number of people
- *i*: denotes recording number.

Accuracy of Virtual Gate algorithm

Recording | Count (Virtual Gate) | Count (Ground Truth) | Accuracy of V.G. algorithm (%) |
---|---|---|---|
1 | 644 | 642 | 99.69 |
2 | 995 | 1017 | 97.84 |
3 | 440 | 442 | 99.55 |
4 | 555 | 573 | 96.86 |
5 | 739 | 771 | 95.85 |
6 | 720 | 673 | 93.47 |
7 | 817 | 860 | 94.74 |
8 | 324 | 336 | 96.43 |
9 | 402 | 420 | 95.71 |
10 | 420 | 433 | 97.00 |
11 | 445 | 478 | 93.10 |
Total | 6497 | 6649 | 97.71 |
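Most rows of the table, and the total, are consistent with accuracy defined as one minus the relative counting error with respect to the Ground Truth, expressed in percent. The sketch below uses that definition; it is an assumption inferred from the tabulated numbers rather than a formula quoted from the text, and rows 6 and 7 differ slightly under it. The function name `accuracy_percent` is hypothetical.

```python
def accuracy_percent(counted, reference):
    """Accuracy in percent, assuming A = (1 - |V - N| / N) * 100."""
    if reference <= 0:
        raise ValueError("reference count must be positive")
    return (1.0 - abs(counted - reference) / reference) * 100.0


# Recording 1 and the totals from the table above.
acc_rec1 = round(accuracy_percent(644, 642), 2)
acc_total = round(accuracy_percent(6497, 6649), 2)
print(acc_rec1, acc_total)
```

Because the absolute difference is used, overestimation and underestimation of the same size lower the accuracy equally.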

## 4 Conclusions

The concept and the practical application results of an algorithm for counting people in a crowd passing through the gates of a large sports facility were presented in this paper. We described the approach to the analysis of motion data obtained with dense optical flow in the devised Virtual Gate algorithm. The efficiency of the algorithm was examined using real videos recorded at a large sports and entertainment hall (Ergo-Arena in Gdansk). The achieved counting precision of 97.6 % on average, for a total of 6,649 persons, is satisfactory considering that the real behavior of people was far from organized movement. Further work will focus on extending the system to operate at other, less frequently used entrances to the Ergo-Arena hall. Moreover, changes in the organization of the process of crowd entry to the building would be necessary in order to improve the accuracy of the algorithm. Future work may also focus on a practical application of the system to other large public facilities.

## Acknowledgements

Research funded within the project No. POIG.02.03.03-00-008/08, entitled “MAYDAY EURO 2012—the supercomputer platform of context-depended analysis of multimedia data streams for identifying specified objects or safety threads”. The project is subsidized by the European regional development fund and by the Polish State budget.

## Copyright information

**Open Access** This article is distributed under the terms of the Creative Commons Attribution License which permits any use, distribution, and reproduction in any medium, provided the original author(s) and the source are credited.