The concept of entropy is based on the generic observation that there are many more ways for a system of microscopic particles to be disordered than to exhibit a specific order. While manifesting a given macroscopic state, such a system, or statistical ensemble, is therefore more likely to assume a disordered arrangement. If a certain order is observed, and it is unlikely that such order arose at random, it can be concluded with some confidence that other forces were at play which enforced that arrangement.
In this section, the crowd is assumed to be a homogeneous system; otherwise, the concepts are applied to homogeneous groups within the crowd. Detection of homogeneous groups is achieved using Collective merging [37]. These meso-scale groups are tracked in consecutive frames. This is further discussed in Sect. 4.
In classical statistical mechanics, entropy, S, is a measure of the mechanical disorder of a system of microscopic particles. It is defined as follows:
$$\begin{aligned} S=-K\sum _{i}{p_i\ln {p_i}} \end{aligned}$$
(1)
where, for a system with a discrete set of microstates, \(p_i\) is the probability of occurrence for microstate i and K is the Boltzmann constant. Similarly, entropy (mostly denoted by H) is adopted as a measure of uncertainty in information theory:
$$\begin{aligned} H=-\sum _{i}p_i\log _b{p_i}. \end{aligned}$$
(2)
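For illustration, Eq. (2) can be evaluated with a few lines of Python (a minimal sketch using natural logarithms, i.e. \(b=e\); the example distributions are arbitrary):

```python
import numpy as np

def shannon_entropy(p, base=np.e):
    """Entropy of a discrete distribution p, as in Eq. (2)."""
    p = np.asarray(p, dtype=float)
    p = p[p > 0]                          # convention: 0 * log(0) = 0
    return -np.sum(p * np.log(p)) / np.log(base)

# A peaked (ordered) distribution has low entropy; a uniform (disordered) one is maximal.
print(shannon_entropy([0.97, 0.01, 0.01, 0.01]))  # ~0.17 nats
print(shannon_entropy([0.25, 0.25, 0.25, 0.25]))  # ~1.39 nats (= ln 4)
```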
In both of the above theories, entropy provides an understanding of the overall macroscopic state of a system of microscopic particles by quantifying the statistical realisation of their microstates. The initial definition of entropy in classical statistical mechanics, \(S=k_B\ln {W}\), connects entropy directly to the number of microstates, W, that realise the macroscopic state of the system.
Entropy can be understood intuitively by considering the states of matter: solid, liquid and gas. In a solid, molecules oscillate around fixed points and entropy remains relatively low. In a liquid, molecules move relatively freely while keeping certain distances from one another; in this case, entropy is usually higher than that of a solid. Finally, in a gas, the constituent molecules can move freely anywhere, which leads to the highest values of entropy. In other words, higher entropy values are observed as the uncertainty about the positions of the constituent molecules increases.
One of the challenges in evaluating entropy is that, for each crowd example, only a limited subset of all possible microstates is observed. Therefore, it is not possible to count the number of microstates or to calculate their probabilities directly. For this reason, an extra step is devised to infer a model of all possible microstates from the set of observed microstates.
Calculation of entropy using a microstate model
We define the entropy of a crowd as the joint entropy of \(N_p\) individuals who are scattered in \(N_l\) locations with a probability mass function \(f_{Y_i}\) on a discrete random variable, \(Y_i\), defined at each spatial bin, \(l_i\).
The joint entropy of two ensembles X and Y is [16]
$$\begin{aligned} H(X,Y)=\sum _{xy\in {\mathcal {A}}_X{\mathcal {A}}_Y}{P(x,y)\log {\frac{1}{P(x,y)}}} \end{aligned}$$
(3)
where both X and Y are triples. X is a triple \((x,{\mathcal {A}}_X,{\mathcal {P}}_X)\) where x is the value of a random variable, which takes on one of a set of possible values, \({\mathcal {A}}_X = \{a_1,a_2,\ldots ,a_I\}\), having probabilities \({\mathcal {P}}_X = \{p_1,p_2,\ldots ,p_I\}\). Similarly, Y is a triple \((y,{\mathcal {A}}_Y,{\mathcal {P}}_Y)\).
Thereby, the entropy of a crowd can be described as
$$\begin{aligned}&H(X_1,\dots ,X_{N_p})\nonumber \\&\quad =-\sum _{x_1\in {\mathcal {L}}_X}{\ldots \sum _{x_{N_p}\in {\mathcal {L}}_X}{P(x_1,\dots ,x_{N_p})\log {P(x_1,\dots ,x_{N_p})}}}\nonumber \\ \end{aligned}$$
(4)
where \(X_k\) is a triple \((x_k, {\mathcal {L}}_X, {\mathcal {P}}_{X_k})\). \(x_k\) takes on one of a set of possible values, \({\mathcal {L}}_X=\{l_1,l_2,\dots ,l_{N_l}\}\), having probabilities \({\mathcal {P}}_{X_k}=\{p_{k,1},p_{k,2},\dots ,p_{k,N_l}\}\), with \(P(x_k=l_i)=p_{k,i}\). Two approaches are considered here to evaluate \(H(X_1,\dots ,X_{N_p})\).
Approach 1: Complete enumeration
First, the complete enumeration of all possible microstates is considered, using the observed microstates to calculate the \(f_{Y_i}\)s. The joint probabilities, \(P(x_1,\dots ,x_{N_p})\), in Eq. (4) are the remaining unknowns. While these probabilities can be calculated from the probability mass functions \(f_{Y_i}\), assumptions regarding the dependencies between individuals need to be made. The computational cost is of the order \(O(N_{l}^{N_p})\): this is the number of arrangements of \(N_p\) individuals over \(N_l\) locations, and for each arrangement the joint probability \(P(X_1, \ldots ,X_{N_p})\) needs to be calculated.
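The sketch below illustrates this enumeration and its exponential cost on a toy crowd (Python; purely for illustration, the individuals are assumed independent so that the joint probability factorises, which is exactly the kind of assumption discussed next):

```python
import itertools
import numpy as np

def joint_entropy_by_enumeration(P):
    """Joint entropy of Eq. (4) by enumerating all N_l**N_p microstates.

    P is an (N_p, N_l) array with P[k, i] = probability that individual k is in bin l_i.
    The individuals are assumed independent purely for illustration.
    """
    N_p, N_l = P.shape
    H = 0.0
    for microstate in itertools.product(range(N_l), repeat=N_p):  # N_l**N_p terms
        p = np.prod([P[k, i] for k, i in enumerate(microstate)])
        if p > 0:
            H -= p * np.log(p)
    return H

# Even a toy crowd shows the blow-up: 8 bins and 4 people give 8**4 = 4096 microstates.
P = np.full((4, 8), 1.0 / 8)
print(joint_entropy_by_enumeration(P))  # ~8.32 nats (= 4 * ln 8 for this uniform case)
```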
The validity of this approach may be contested, since the probability mass functions \(f_{Y_i}\) are calculated from a limited sample of observed microstates, making the model prone to over-fitting to that sample. Thus, relaxing some of the conditions of this model may be favourable for a better coverage of the space of all possible microstates.
Approach 2: Preserving the density pattern
One of the assumptions in the above approach concerns the dependence between the positions of the individuals. In the example below, it will be shown that although there is reason to believe that these positions are dependent, sufficient information is not available to understand their dependencies in an unbiased manner.
In support of the dependency argument, consider that people tend to keep certain distances from each other, the so-called personal space. Depending on the relationships between individuals, they may also tend to group together or avoid others. From a different point of view, consider that a certain macrostate has been observed in a crowd: a number of clusters of people are observed in different locations. There may be different causes for this effect. Hypothesis A: some physical locations are more desirable than others, and people cluster in them for that reason. Hypothesis B: there is some social relationship between members of the crowd, and they cluster together due to that relationship. In Hypothesis B, the act of clustering is important, while the cluster positions are random. Furthermore, Hypothesis C can be added to accommodate the combination of the other two. However, in the above example, sufficient information is not available in favour of any of hypotheses A, B or C.
Therefore, we propose that, when analysing crowd formation over a few correlated frames, a simpler model which exhibits similar outcomes is adopted. We hypothesise that a pattern is formed in the crowd if each individual is bound by the same pattern. In this model, the locations of people are considered to be independent and the individuals are considered to be identical. As a result, the calculation of entropy simplifies.
Let \(n_{i,j}\) be the number of times that individual j has been observed in bin \(l_i\) in \(N_f\) frames (\(N_f\) is the number of frames in a chosen time window). The probability of selecting this bin, \(l_i\), by individual j is
$$\begin{aligned} P(x_j=l_i)=\frac{n_{i,j}}{N_f}. \end{aligned}$$
(5)
Given that the locations of individuals are considered independent and no distinction is made between individuals, the probability of any individual selecting bin \(l_i\) is the same as for any other individual. Thus, the probability of selecting bin \(l_i\), \(P(x=l_i)\), is estimated as follows:
$$\begin{aligned} P(x=l_i)= & {} \frac{\sum _{k=1}^{N_p}{P(x_k=l_i)}}{N_p}\nonumber \\= & {} \frac{\sum _{k=1}^{N_p}{\frac{n_{i,k}}{N_f}}}{N_p}= \frac{\sum _{k=1}^{N_p}{n_{i,k}}}{N_f N_p}= \frac{n_i}{N_f N_p} \end{aligned}$$
(6)
where \(n_i\) is the sum of all density counts at bin \(l_i\) in \(N_f\) frames. Since the locations of individuals are independent, the joint entropy of the crowd, \(H(X_1,\dots ,X_{N_p})\), simplifies as
$$\begin{aligned} H(X_1,\dots ,X_{N_p})=\sum _{k=1}^{N_p}{H(X_k)}. \end{aligned}$$
(7)
Also, note that the locations of all the individuals are based on the same location probabilities, \(P(x=l_i)\). Thus,
$$\begin{aligned}&H(X_1)=H(X_2)=\dots =H(X_{N_p}), \end{aligned}$$
(8)
$$\begin{aligned}&H(X_1,\dots ,X_{N_p})=N_p H(X) \end{aligned}$$
(9)
where X is a triple \((x, {\mathcal {L}}_X,{\mathcal {P}}_X)\); the outcome x is the value of a random variable which takes on one of a set of possible values, \({\mathcal {L}}_X=\{l_1,l_2,\dots ,l_{N_l}\}\), having probabilities \({\mathcal {P}}_X=\{p_1,p_2,\dots ,p_{N_l}\}\), with \(P(x=l_i)=p_i\) as defined in Eq. (6). The crowd entropy in Eq. (9) has a time complexity of \(O(N_l)\) (the time required to calculate the \(p_i\)s using a constant time window size). In other words, the entropy can be computed in linear time.
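A minimal sketch of this linear-time computation, combining Eqs. (6) and (9), is given below (Python; the density counts are hypothetical):

```python
import numpy as np

def crowd_entropy(n, N_p):
    """Crowd entropy under Approach 2, computed in O(N_l).

    n   : density counts, n[i] = number of observations in bin l_i over the time window
    N_p : number of individuals
    Since sum(n) = N_f * N_p, dividing by sum(n) gives p_i = n_i / (N_f * N_p) as in Eq. (6).
    """
    n = np.asarray(n, dtype=float)
    p = n / n.sum()
    p = p[p > 0]
    return -N_p * np.sum(p * np.log(p))   # Eq. (9): N_p * H(X)

# 5 individuals observed over N_f = 4 frames, scattered over 6 spatial bins.
n = np.array([8, 6, 4, 2, 0, 0])          # sums to N_f * N_p = 20
print(crowd_entropy(n, N_p=5))            # ~6.4 nats
```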
Pre-processing
Three pre-processing stages should be considered before crowd entropy is calculated:
Real-world locations; The locations of individuals in an image have been subjected to a projective transform. The severity of the distortion caused by this transform depends on the angle between the camera's image plane and the scene's ground plane. Ideally, this angle would be zero, which is the case when the camera is placed overhead, looking straight down at the crowd. The locations of individuals become increasingly skewed as this angle increases. Figure 4 shows three examples where the disruptive effects of the projective transform increase from left to right.
Given the head locations of the individuals, the real-world positions can be retrieved using the camera calibration matrix and assuming an average height for the entire crowd. This is done through a head-height plane homography transform [24]. However, the problem of head detection has proven difficult in the context of crowds. An alternative method using image features is discussed in Sect. 3.5. The authors of [19] also noted in their survey paper that side views “are least preferable for particle-based frameworks”. However, a soft calibration can be considered in the case of features, as was also demonstrated by Zhou et al. [37].
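The projection step can be sketched as follows, assuming a \(3\times 3\) homography from the head-height plane in the image to metric ground-plane coordinates has already been estimated from the calibration (its estimation, following [24], is not shown; the function and variable names are illustrative):

```python
import numpy as np
import cv2

def heads_to_ground(head_pixels, H_head2ground):
    """Map detected head locations (image pixels) to ground-plane coordinates.

    head_pixels   : (N, 2) array of head positions in the image
    H_head2ground : 3x3 homography from the head-height plane (in the image) to the
                    ground plane, obtained from calibration and an assumed average height
    """
    pts = np.asarray(head_pixels, dtype=np.float32).reshape(-1, 1, 2)
    ground = cv2.perspectiveTransform(pts, H_head2ground.astype(np.float32))
    return ground.reshape(-1, 2)
```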
Internal positions; In order to calculate entropy, the internal position of each individual within the crowd, \(x_i\), is required. If the crowd is stationary, then the observed position, \(x_o\), is equal to the internal position \((x_i=x_o \iff v_f=0)\). However, if the crowd is moving with a flow velocity, \(v_f\), the change in the internal position over a time step dt can be calculated as
$$\begin{aligned} dx_i=dx_o-v_fdt. \end{aligned}$$
(10)
Internal position density map; Once the internal positions of individuals are known, an internal density map can be created. Note that the width of the density map bins, \(w_{bin}\), is a significant parameter in the calculation of entropy: too large a bin will mask the very information that entropy aims to extract (with too large a spatial bin, a gas and a solid may appear similar in the way they uniformly occupy the space), while too small a bin will be prone to noise. This is illustrated in Fig. 5, which shows two entropy levels: Fig. 5a low entropy and Fig. 5b high entropy. A time window of two consecutive frames is also depicted, with blue circles representing the positions of particles at time \(t_0\) and green circles the positions at time \(t_1\). The spatial gridding was done using two bin sizes: large bins and small bins. Note that each spatial bin only counts the number of particles which land in that bin, while the location of a particle within the bin is inconsequential. Conceptually, the entropy for Fig. 5a when observed with the large bin is zero, since there is no difference between the two observed microstates and the particles appear stationary. The oscillations are better observed with the small bin, where two of the particles are observed in new bins at \(t_1\). In Fig. 5b, depicting a large entropy, the large bin only observes three out of 16 particles to have moved between \(t_0\) and \(t_1\), while this number is 15 out of 16 for the small bin. As a larger time window is considered, it is expected that the particles of example (b) would populate the available space, while the particles of example (a) would oscillate around their original locations. This effect cannot be observed with the large bin, since it observes even example (a) with a two-frame time window as uniformly populating the available space.
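A sketch of these two steps, flow compensation via Eq. (10) followed by spatial binning, is given below; the flow velocity, bin width and region extent are treated as given inputs. Repeating the computation with different values of \(w_{bin}\) reproduces the trade-off illustrated in Fig. 5.

```python
import numpy as np

def internal_density_map(tracks, v_f, dt, w_bin, extent):
    """Accumulate the internal-position density counts n_i over a time window.

    tracks : list of (N_p, 2) arrays, one per frame, with observed positions x_o
    v_f    : (2,) crowd flow velocity (e.g. the mean velocity of the group)
    dt     : time step between consecutive frames
    w_bin  : spatial bin width
    extent : (x_min, x_max, y_min, y_max) of the analysed region
    """
    x_min, x_max, y_min, y_max = extent
    xe = np.arange(x_min, x_max + w_bin, w_bin)
    ye = np.arange(y_min, y_max + w_bin, w_bin)
    n = np.zeros((len(xe) - 1, len(ye) - 1))
    for f, x_o in enumerate(tracks):
        x_i = x_o - np.asarray(v_f) * (f * dt)            # Eq. (10): remove the bulk flow
        h, _, _ = np.histogram2d(x_i[:, 0], x_i[:, 1], bins=[xe, ye])
        n += h                                            # n_i accumulated over N_f frames
    return n
```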
Normalisation of entropy
Non-normalised entropy can only be used to compare crowds which are composed of the same number of individuals and have the same spatial extent. Since these conditions are rarely met, normalisation of entropy becomes a necessary step.
Specific entropy; Specific entropy is the entropy per unit of mass. Assuming each individual has a unit of mass, the specific entropy, \(H_k\), will be the entropy of one individual in this crowd:
$$\begin{aligned} H_k=H(X) \end{aligned}$$
(11)
where X is a triple \((x, {\mathcal {L}}_X,{\mathcal {P}}_X)\), as in Eq. (9).
Specific entropy per unit of area; Entropy is maximised if \({\mathcal {P}}_X\) is uniform [16]: \(H(X)\le \log |{\mathcal {L}}_X|\), with equality achieved if and only if \(p_i=\frac{1}{|{\mathcal {L}}_X|} =\frac{1}{N_l}\) for all \(i\in \{1,\dots ,N_l\}\).
It can be seen that the maximum value of entropy increases with the increase in the number of spatial bins, \(N_l\). To account for this, we borrow a concept called redundancy from information theory. Redundancy is a measure of the amount of wasted space when coding and transmitting data. The redundancy of X, R(X), on alphabet \({\mathcal {A}}_X\) measures the fractional difference between H(X) and its maximum possible value:
$$\begin{aligned} R(X)=1-\frac{H(X)}{\log |{\mathcal {A}}_X|}. \end{aligned}$$
(12)
Complementary to the concept of redundancy is efficiency, where redundancy and efficiency of a code add up to one. In this case, our notion of normalised specific entropy, \(h_k\), is analogous to efficiency:
$$\begin{aligned} h_k=\frac{H_k}{\log N_l}. \end{aligned}$$
(13)
Minimum entropy; The minimum value of entropy is theoretically zero. This occurs when only one microstate is possible for the system, and therefore the probability of that microstate occurring is one. We do not differentiate between individuals, and the probability of their presence at each location is calculated from the density map of the entire crowd. Thus, unless the entire crowd is concentrated in one spatial bin (which is not a plausible crowd configuration if the bin size is set correctly), the minimum value of zero is not obtainable. Instead, the obtainable minimum value of entropy depends on the initial density map, which in turn depends on the number of individuals, their sparseness and the bin sizes. It is desirable to assign a small entropy to a crowd that holds its structure, no matter how dense or sparse that structure may be. In this, the focus should be on the deviation of the crowd from its original arrangement. The minimum entropy is therefore taken to be that of the initial state (with window size zero). This normalises for the density and sparsity of the crowd. For a crowd whose members hold their initial positions and merely oscillate within the bounds of those positions, the entropy is considered minimal within that time window; the entropy of such a crowd is mapped onto zero. In other words, the entropy is considered zero only if the same structure is repeatedly replicated. In practice, as the time windows get larger, uncertainty and noise build up, and entropy generally grows with the size of the time window. Therefore, in real examples zero entropies do not occur. Similarly, uniform coverage of the spatial bins is not achieved in real examples, and neither is an entropy value of one. A word of caution: it is possible that in the initial state the particles are nearly uniformly distributed. In such cases, the difference between the minimum and maximum entropy is very small; this is generally a cue of an incorrect bin size. An example of this was seen with the large bin in Fig. 5. The minimum entropy is thereby defined as
$$\begin{aligned} h_{min}=-\sum _{i=1}^{N_l}{p_{0_i} \log p_{0_i}} \end{aligned}$$
(14)
where \(p_{0_i}\) is the probability of location \(l_i\) being occupied in the initial frame. Thereby, the normalised, scaled, specific entropy, \(\hbar _k\), is defined as
$$\begin{aligned} \hbar _k=\frac{H_k-h_{min}}{\log N_l - h_{min}}. \end{aligned}$$
(15)
The normalised, scaled specific entropy, \(\hbar _k\), will be referred to as entropy hereafter.
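For completeness, a compact sketch composing Eqs. (11), (14) and (15) is given below (the density counts are hypothetical inputs; the initial-frame counts provide \(h_{min}\)):

```python
import numpy as np

def _entropy(p):
    """Entropy of a probability vector; empty (zero) bins contribute nothing."""
    p = p[p > 0]
    return -np.sum(p * np.log(p))

def normalised_scaled_specific_entropy(n_window, n_initial):
    """hbar_k of Eq. (15).

    n_window  : density counts n_i accumulated over the time window
    n_initial : density counts of the initial frame (window size zero), on the same bins
    """
    n_window = np.asarray(n_window, dtype=float)
    n_initial = np.asarray(n_initial, dtype=float)
    N_l = len(n_window)                                # number of spatial bins
    H_k = _entropy(n_window / n_window.sum())          # specific entropy, Eq. (11)
    h_min = _entropy(n_initial / n_initial.sum())      # minimum entropy, Eq. (14)
    return (H_k - h_min) / (np.log(N_l) - h_min)       # Eq. (15)
```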
Experimental results
Three crowd examples have been used to demonstrate the proposed method for conceptualising the crowd as a statistical mechanical system. Experiment A (exp A) shows a crowd of people going down a staircase. The motion of the crowd in this example is unidirectional. Figure 6a shows one example frame of this crowd. It depicts an indoor scene with artificial lighting, with the crowd viewed from an oblique frontal angle. Figure 6b shows the second crowd example (exp B). This focuses on people on an escalator located on the left-hand side of the same video footage. Here, the pedestrians are mostly standing still while the escalator carries them upwards. Finally, Fig. 6c shows a larger crowd in an open indoor space (a shopping mall) with pedestrians moving in various directions (exp C). Both exp A and exp B come from a crowd video in the data-driven crowd analysis data set [23]. This video is captured at a resolution of \(640 \times 360\) pixels and comprises 1155 frames at 25 frames per second (fps). Exp C uses footage from the Collective Motion Database [35] with a resolution of \(1000 \times 670\) pixels and 600 frames captured at 25 fps.
It is expected that: (i) the crowd in exp B, Fig. 6b, has the smallest entropy; (ii) the crowd in exp A, Fig. 6a, has a larger entropy than the crowd in exp B but still smaller than that of the crowd in exp C, Fig. 6c. The largest entropy is envisaged for the crowd in exp C.
In these experiments, the respective figures show three calibration planes. In this, the orange plane is the reference plane which is manually drawn. The blue and yellow planes are the ground-level and head-level planes, respectively. These are projected back to the image plane after calibration. The red circles show the position of the individuals’ heads on the head-level plane. Entropy was initially calculated using manually labelled heads. These were projected into the ground plane [24]. For this, a pre-processing step with a head detection algorithm was assumed to be present. Experiments were carried out for varying time window sizes \((w_{tw})\) and spatial bin widths \((w_{bin})\). The results confirmed the hypothesis with
$$\begin{aligned} \hbar _k(X_{exp_C})>\hbar _k(X_{exp_A})>\hbar _k(X_{exp_B}). \end{aligned}$$
(16)
Figure 7 shows the results, where a time window size of 3 s is used. It can be seen that the order of entropy values is as expected and separation within the error bars between the experiment crowds is mostly achieved. This figure also demonstrates the effect of spatial bin size, where bins in the range [0.01 m, 0.6 m] are investigated. It can be seen that the smallest bin sizes do not offer a good separation between the crowds, and the same goes for the larger bin sizes. The best separation is achieved for bin sizes within the range [0.04 m, 0.2 m]. As the bins get larger, the entropy becomes unstable for the escalator case. This can be attributed to the small size of the escalator crowd, as well as to the fact that overly large bins are not sensitive enough to the differences in individuals' motions. It is also observed that larger time windows offer better separation. However, it must be noted that, because a non-stationary crowd is observed with a stationary camera, it is possible that the crowd, or the section of the crowd which is being analysed, moves beyond the camera field of view. The results for exp B, when analysed with a 5 s time window, may be less reliable for this reason.
However, as it transpired, obtaining good head tracking with a generic algorithm across different crowd examples was an elusive task. Thus, a set of image features that are readily detected and easily tracked was considered as the initial step. The immediate concern is that the features are not necessarily from the head area; they can come from different parts of the body. If the crowd is dense enough, most features will be from the head region; but since we deal with crowds that are not sufficiently dense, the mapping of the features onto the ground plane is problematic. We have experimented with masking the non-head-plane regions to eliminate those features which are definitely not on the head plane, and assuming that the rest of the features are on the head plane. However, this is a very naive assumption and introduces large errors in the positions of features. Depending on the specifics of the example, these errors may be more disruptive than the distortion caused by the projective transform. This issue will be discussed further in the next section.
Entropy via image features
Corner features are detected using the method introduced in [26]. These features are specifically designed to be suitable for tracking. If a background image is available, the detected features are compared against the features detected on the background, and the background features are removed from the list. As mentioned before, a mask for the head plane is used to eliminate all the features which cannot be on the head plane. The remaining features are assumed to be on the head plane and are mapped onto the ground plane. Entropy is calculated as before.
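A sketch of this feature-extraction step is given below; OpenCV's goodFeaturesToTrack corner detector is used here as a stand-in for the detector of [26], and the background image and binary head-plane mask are assumed to be available:

```python
import numpy as np
import cv2

def crowd_features(frame_gray, background_gray, head_plane_mask, max_corners=500):
    """Detect trackable corner features on the crowd and suppress background corners.

    frame_gray, background_gray : greyscale current frame and empty-scene image
    head_plane_mask             : uint8 mask of the region where head-plane features may lie
    """
    params = dict(maxCorners=max_corners, qualityLevel=0.01, minDistance=5,
                  mask=head_plane_mask)
    corners = cv2.goodFeaturesToTrack(frame_gray, **params)
    bg = cv2.goodFeaturesToTrack(background_gray, **params)
    if corners is None:
        return np.empty((0, 2), dtype=np.float32)
    corners = corners.reshape(-1, 2)
    if bg is not None:
        bg = bg.reshape(-1, 2)
        # Remove features that coincide with background corners (static scene structure).
        d = np.linalg.norm(corners[:, None, :] - bg[None, :, :], axis=2)
        corners = corners[d.min(axis=1) > 3.0]
    return corners   # these are then mapped onto the ground plane as described above
```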
A visual and intuitive description of how the algorithm works is given in Figs. 8 and 9. In Eq. (6), \(n_i\) was defined as the sum of all density counts at bin \(l_i\) over \(N_f\) frames. It can also be seen from this equation that the \(p_i\)s which determine the value of entropy are linearly dependent on these \(n_i\)s. An image in which the intensity at location \(l_i\) depends on the value of \(n_i\) is referred to here as an \(n_i\)-map. Note that the locations on the \(n_i\)-maps are the internal positions of the features projected onto the ground plane. Figure 8 shows the \(n_i\)-map for exp A (stairs) over a 2 s time window. Since the locations with \(n_i=0\) do not affect the value of entropy, condensed versions of the \(n_i\)-maps for all three experiments are also shown in Fig. 9. We shall call these condensed \(n_i\)-maps profiles.
Figure 9 shows the profiles for exps A, B and C in order of increasing entropy from left to right. This effect (increasing entropy) can be seen visually. In these profiles, the probability of feature occurrence is linearly dependent on the pixel values. In Fig. 9a, most of the pixels are very low-valued (red in colour) and thus have a low probability of feature occurrence, although note that all the points in the profile are nonzero. In contrast, there are also some isolated high-valued pixels (these can be viewed as peaks of the probability function), which offer a sound hypothesis for the features' respective locations. The background pixels have higher values in Fig. 9b (yellow in colour), meaning that the probability of feature occurrence is more evenly distributed over the spatial bins. However, there are still many high-valued points (peaks) where the probability of occurrence is higher. In Fig. 9c, the background is yet higher in value (green in colour), while the peaks are less prominent. In this example, the probability of feature occurrence is more evenly distributed and thus high values of entropy are expected.
Table 2 shows the respective normalised entropies calculated for these examples. The normalisation values \(h_{\min }\) and \(h_{\max }\) affect the result significantly. It can also be seen that the values of the normalised entropies are very high; this is due to the small size of the bins used. In Fig. 10, these results reside in the upper left corner of the graph. Small bin sizes are depicted for a more intuitive visualisation.
Table 2 Normalising entropy values for the crowd examples
Figure 10 shows the detected entropy of the three examples using image features. The level of separation between the entropies is understandably lower. This is due to the noise introduced by replacing head detection with feature detection, and to the added distortion caused by assuming that feature points lie on the head plane. The mean value separation still holds for all the bin sizes. It was noted earlier that the distortion introduced by an approximate ground plane projection might be more disruptive than that originally introduced by the projective transform. Therefore, the results for entropy via image features using image coordinates were also produced. These are shown in Fig. 11. It can be seen that the results are improved and the separation is mostly achieved for the three experiments. The effect of using larger time windows is demonstrated in Fig. 12. As described before, when larger windows are considered, more variability is observed together with the natural build-up of noise; therefore, the value of entropy increases. However, it is worth mentioning that in the case of the experiments shown in Fig. 12, where no ground plane mapping is used, the results remain consistent: the mean value separation between the entropy values of the experiments is obtained at various time window sizes.
As an example of execution time, the video containing both the escalator and stairs experiments is processed at a mean rate of 10 fps with a spatial bin size of 16 pixels, using an Intel Core i7-2600 CPU at 3.40 GHz. One notes that the execution speed decreases with an increasing number of crowd clusters in the frame (crowd clusters are discussed in Sect. 4.3). On the other hand, the speed increases as a result of using larger spatial bins.
Entropy versus collectiveness
Collectiveness is a measure of collective motion introduced by Zhou et al. [37]. They define it as follows: “Collectiveness describes the degree of individuals acting as a union in collective motions”. Collectiveness seeks collective manifolds wherein consistent motion is observed in neighbourhoods, while global consistency among non-neighbours is obtained through intermediate individuals in neighbourhoods on the manifold. Collectiveness assigns values in the range [0, 1] to a given crowd. It requires setting a parameter, K, which defines the range of the neighbourhoods in the given crowd.
Collectiveness bears similarities with entropy. In order to compare collectiveness with entropy directly, the notion of structure is introduced. As noted, entropy is essentially a measure of disorder, while structure can be described as a measure of order. For a normalised entropy within the interval [0, 1], structure and entropy are complementary and add up to unity: \(s_k=1-\hbar _k\), where \(s_k\) is the normalised structure. Figure 13 shows a comparison between collectiveness and structure (via entropy using image features with no ground plane projection). It can be seen that collectiveness also achieves separation between these examples. Although entropy finds a larger distinction between exp A (Stairs) and exp C (Hall), collectiveness finds exp B (Escalator) and exp A (Stairs) more distinct. This is an early sign that, depending on the sample to be analysed, one or the other method may be more effective. The most important contributing factors here are: (i) the density and behaviour of the crowd; (ii) the camera view angle and spatial resolution; and (iii) the structure of the environment. It should be mentioned that both collectiveness and entropy values depend on the respective parameters adopted for these methods: K for collectiveness and the spatial bin size (\(w_{bin}\)) for entropy (the temporal window, \(w_{tw}\), is not as significant). Here, a mid-range K (\(K=20\)) is used to produce the collectiveness results, and \(w_{bin}\) is subsequently chosen to produce similar values for the structure in the escalator example and then used to evaluate the other two examples.
Figure 14 shows an example where collectiveness fails to produce stable and reliable results. It is worth noting that collectiveness is essentially a different concept from entropy: collectiveness is best suited to analysing crowds with discernible motions in the form of flows and limited oscillatory motion. Figure 14 depicts a stadium example, wherein the initial state of the crowd is calm with sparse, incoherent motions; however, an event occurring on the pitch may trigger an increased level of excitement of the crowd in the stadium arena. Figure 14c, d shows the values of collectiveness and entropy in the crowd for illustration. Here, the dotted red line indicates the time of the event, while the volatility of the crowd increases before the event in anticipation. In this circumstance, collectiveness does not provide intuitive results. The initial state of the crowd has little motion, meaning that any small group with more significant motion can dominate the collectiveness value. Further, in the absence of such groups, collectiveness becomes unstable as it tries to connect incoherent sparse motions within the crowd. In contrast, entropy clearly captures the increased volatility and the change in the state of the crowd.