Video trajectory analysis using unsupervised clustering and multi-criteria ranking

The use of surveillance cameras has increased significantly. Manual analysis of the large volume of video recorded by these cameras is not feasible at scale. In various applications, deep learning-guided supervised systems are used to track objects and identify unusual patterns. However, such systems depend on training, which may not always be possible. Unsupervised methods rely on suitable features and demand cluster analysis by experts. In this paper, we propose an unsupervised trajectory clustering method referred to as t-Cluster. Our proposed method prepares indexes of object trajectories by fusing high-level interpretable features such as origin, destination, path, and deviation. Next, the clusters are fused using multi-criteria decision making and trajectories are ranked accordingly. The method is able to place abnormal patterns at the top of the list. We have evaluated our algorithm and compared it against competitive baseline trajectory clustering methods applied to videos taken from publicly available benchmark datasets. We have obtained higher clustering accuracies on public datasets with significantly lower computational overhead.


Introduction and related works
Object motion pattern identification and trajectory analysis are two important steps in various computer vision applications (Ahmed et al. 2018b). Trajectory analysis is used in many video analysis tasks such as video summarization (Dogra et al. 2016; Ajmal et al. 2017), event detection (Reddy and Veena 2018), and visual surveillance (Vishwakarma and Huang et al. 2018). Analysis of large volumes of trajectories can be effective in traffic analysis (Santhosh et al. 2018) and crowd monitoring (Bera et al. 2016). The primary application of such analysis is abnormality detection (Roshtkhari and Levine 2013; Mabrouk and Zagrouba 2018). However, unsupervised clustering of trajectories is a difficult task. Clustering using simple features extracted from object trajectories, e.g. object location $(x_i, y_i, t_i)$, produces poor results (Xu et al. 2015). Such features cannot be used for complex and long-term analysis. High-level features like source, destination, path, and activity can be used to represent moving objects. These high-level features can help to find patterns and group them together. The objective can be to classify the trajectories into frequent patterns; abnormal patterns then correspond to infrequent movements or outliers.
We note three different approaches to trajectory analysis. The first is supervised learning, which uses a set of known or unknown patterns to train neural networks. The second is semi-supervised learning, which uses minimal labelled data. The third is unsupervised learning, which primarily depends on feature selection, clustering, and analysis of clusters. Next, we discuss these methods and how our method bridges the gap.
Supervised trajectory analysis. Various forms of supervised methods utilize different trajectory features for learning. A hidden Markov model (HMM)-based learning method (Kwon et al. 2017) is used to extract semantic regions for trajectory analysis. Artificial neural network-based methods such as convolutional neural networks (CNN) (Mehrasa et al. 2018) are used to analyse player trajectories and team activity, and also for detection, tracking, and traffic behaviour analysis (Ren et al. 2018). Recently, Zhao et al. (2018) used trajectory convolution for human action classification, and a variation of the CNN model has been used for trajectory-based video action recognition (Dai and Srivastava 2019). Many surveillance applications such as pedestrian trajectory and crowd interaction analysis (Xu et al. 2018b) also utilize the power of supervised machine intelligence. Recurrent neural networks (RNN) (Ma et al. 2018) are also used in abnormal trajectory detection and sequence learning. Xu et al. (2018a) proposed to use a dual mode (static and dynamic) for supervised traffic analysis. All these methods demand manually annotated training data, and the majority of the algorithms are scene specific and do not support a transferable learning mechanism.
Semi-supervised trajectory analysis. Semi-supervised methods overcome some of the problems of supervised learning, such as the demand for a large volume of training samples. In this area, a trajectory histogram-based semi-supervised method (Chen et al. 2017) has been proposed for dangerous event detection. The method uses minimal training samples, and only for the dangerous events. A maximum likelihood-based method (Chakraborty et al. 2018) is also used to detect freeway traffic. Topic models are popular in many semi-supervised tasks and are also used in trajectory analysis (Wang et al. 2019) to explore human activity. A modelling approach (Feizi 2019) has also been proposed for abnormal behaviour detection. Graph-based structural learning that combines structure representations (Michelioudakis et al. 2019) has also been proposed for trajectory learning and composite event detection. Although these methods use minimal training samples, they still do not achieve all the benefits of unsupervised learning.
Unsupervised trajectory analysis. Unsupervised methods do not need large volumes of training samples and are usually not designed for specific applications. The majority of these methods depend on feature selection, distance measurement policies, and clustering algorithms. Incremental trajectory clustering based on the Dirichlet process mixture model (DPMM) (Hu et al. 2013) and a dense point-based trajectory clustering framework (Ochs et al. 2014) are used to represent long-term videos. Lin et al. (2016b) have proposed droplet-based features to find abnormalities. Clustering trajectories using low-level information such as position often produces poor results (Xu et al. 2015). To overcome this problem, the state-of-the-art mean shift algorithm (Comaniciu and Meer 2002) and shrinkage-based frameworks (Xu et al. 2015) for unsupervised trajectory clustering have already been proposed. Xu et al. (2015) have proposed adaptive multi-kernel-based shrinkage (AMKS), and Wang and Carreira-Perpinán (2010) have proposed the manifold blurring mean shift (MBMS) algorithm as improvements. However, the majority of these existing techniques rely on a single feature of the trajectory. A method based on fuzzy theory and multiple independent features (Anjum and Cavallaro 2008) is applied to identify distinct patterns. Ahmed et al. (2018a) present a fuzzy aggregation scheme for abnormality detection. Recently, Saini et al. (2019) proposed a graph-based trajectory classification method that can be used in traffic analysis. Particle swarm-based trajectory clustering (Izakian et al. 2016) is applied on a synthetic dataset. In Feng et al. (2017) and Xu et al. (2017), the authors have used deep appearance and motion features together to detect abnormality. Choong et al. (2016) have proposed a similarity function to achieve clustering of spatio-temporal data. Short-duration trajectories (Lin et al. 2016a; Sharma and Guha 2016) extracted from a feature tracker are also used for clustering and understanding crowd behaviour.
The density-based approach is popular among unsupervised algorithms and is also applied in vessel trajectory analysis. Zhao et al. (2019) used an unsupervised decision module to identify traffic abnormality. Unsupervised trajectory modelling using location, velocity, and time of appearance (Campo et al. 2018) is used to cluster trajectories. Das and Mishra (2018) proposed a mean shift-based method for crowd trajectory analysis and abnormality detection. Recent approaches such as an adversarial framework (Spampinato et al. 2020) are also used in abnormal event detection. Yue et al. (2019) utilized a deep trajectory representation and proposed deep trajectory clustering (DETECT) for behaviour analysis. Neural network-based trajectory analysis for traffic analysis is proposed in Bandaragoda et al. (2019). Reviews on trajectory analysis (Ahmed et al. 2018b) and clustering show the methods and applications of trajectory-based analysis in detail.

Challenges and gaps bridged by our work
The majority of trajectory analysis methods are used for abnormality and event detection. The main challenges of supervised and semi-supervised trajectory analysis are (i) the demand for a manually annotated trajectory dataset, which in most cases is scene specific, and (ii) the need for a concrete definition of normal and abnormal patterns for detection and classification. It is noted that the concepts of "normality" and "abnormality" are not always fixed. For example, a high-speed car is abnormal where the speed is restricted by an upper limit, whereas a slow-moving car is treated as abnormal when a highway lane has a minimum speed. In unsupervised methods, the selection of features plays the most vital role. State-of-the-art low-level features such as speed, velocity, and movement patterns are well explored. Here, the challenges of unsupervised methods are addressed using a suitable selection of high-level features such as origin, destination, and path deviation of a trajectory, and a suitable clustering framework for extracting the logical meaning of movement clusters. Finally, multi-criteria decision making is used to rank the patterns to identify abnormalities. The motivation for such work is manifold. The primary application is to identify unusual patterns by analysing unsupervised clusters and ranking them accordingly. At many public places like subway stations, railway junctions, highway junctions, or airports, the method can be used for detecting unusual movement patterns from a large volume of camera footage. The method also significantly reduces the volume of data by representing trajectories using high-level interpretable features such as path and deviation, which makes the method suitable for large-volume trajectory analysis.
The rest of the paper is organized as follows. Section 2 explains the proposed framework. Experimental results are provided in Sect. 3. Finally, Sect. 4 concludes the paper.

Proposed framework
In this section, we present the approach of unsupervised pattern searching and ranking to understand abnormal movements. Let a spatio-temporal scene of finite duration be represented using a set of trajectories $\tau = \{T_1, T_2, \ldots, T_n\}$. Each trajectory can be represented by a pair of points, namely entry and exit points (Ahmed et al. 2018a). We perform partitional trajectory clustering together with a ranking scheme that combines multiple features to generate crisp partitions for indexing. Independent features are aggregated to obtain a higher degree of descriptiveness of the trajectories as opposed to using a single feature. Each feature produces an abnormality score for a moving object. Object trajectories are then represented by these abnormality scores. Initially, the trajectories are extracted using a multi-object tracker. Then, entry/exit regions are identified from the cumulative patterns. Next, entry-to-exit paths are extracted; hence, any path deviation (PD) can be easily obtained. In the next stage, each trajectory is assigned independent abnormality scores based on these high-level features, the scores are aggregated using multi-criteria decision making (MCDM), and the aggregated score is used to rank each object. We have experimented with two state-of-the-art MCDM methods, namely entropy-based simple additive weighting (SAW) (Abdullah and Adawiyah 2014) and the weighted Technique for Order of Preference by Similarity to Ideal Solution (TOPSIS) (Hwang et al. 1993). Figure 1 depicts the different modules of the proposed approach. Next, we discuss each module in detail.

Unsupervised trajectory clustering
We introduce an unsupervised trajectory clustering method that is referred to as t-Cluster. The method takes a set of trajectories extracted using a multi-object tracker (MOT) and returns three sets of clusters. Each trajectory belongs to exactly three clusters, one taken from each set. The method is presented in Algorithm 1.

Algorithm 1 t-Cluster
Require: a set of trajectories $\tau = \{T_1, T_2, \ldots, T_n\}$ with $n$ samples, and $\delta$, the minimum number of points to make a cluster
Ensure:
where $\lambda$ is the number of points, $v$ represents the volume of the hyperspace, and $\Gamma$ denotes the Gamma function.

Trajectory representation, fusion and ranking
In the visual surveillance context, abnormality can be defined in various ways. For example, infrequent motion patterns or abnormality in average velocity or time spent can be very important for understanding abnormal situations. A method to calculate abnormality scores by fusing individual feature-based scores is presented here. All movement patterns (clusters) that are based on entry ($C_{entry}$), exit ($C_{exit}$), and entry-to-exit path ($C_{path}$) are extracted at the beginning using t-Cluster. Assume the patterns based on any one of the above-mentioned criteria are represented as given in (1), where the frequency of each pattern is given in (2). A pattern of movement can be considered as a discrete random variable. Hence, the weight of a pattern $\omega(p)$ is defined by the probability mass function given in (3) such that the condition given in (4) is satisfied.
According to our assumption, lower frequency represents higher abnormality. The abnormality score ($\sigma$) of a pattern is scaled between 0 and 1. $\sigma$ is defined by (5), where a higher score represents more abnormality.
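The frequency-based scoring described above can be illustrated with a minimal sketch. It assumes that the pattern weight is the relative frequency of the cluster and that the abnormality score is its complement, $\sigma(p) = 1 - \omega(p)$; the exact scaling used in (5) may differ, and the function name `pattern_abnormality` is hypothetical.

```python
from collections import Counter


def pattern_abnormality(labels):
    """Map each cluster label to an abnormality score in [0, 1]."""
    freq = Counter(labels)                 # f(p): number of trajectories per pattern
    total = sum(freq.values())
    # omega(p): relative frequency of each pattern, sums to 1 over all patterns
    weight = {p: f / total for p, f in freq.items()}
    # Assumed scaling (not taken from the paper): rare patterns score close to 1
    return {p: 1.0 - w for p, w in weight.items()}


# Toy entry-region labels for eight trajectories (illustrative data only)
entry_labels = ["A", "A", "A", "A", "B", "B", "B", "C"]
scores = pattern_abnormality(entry_labels)
```

With these toy labels, the least frequent pattern `C` receives the highest score and the most frequent pattern `A` the lowest, which matches the assumption that lower frequency represents higher abnormality.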
A trajectory may belong to three different patterns, namely entry based, exit based, and entry-to-exit path based. α, β, and γ represent abnormality scores based on these three criteria. The DTW barycentre averaging (DBA) method (Petitjean et al. 2014) is used to estimate the average path from entry to exit. DBA is heuristic in nature and is used to calculate a global average of time series under dynamic time warping (DTW) in various applications. The DTW distance between two trajectories $t_a$ and $t_b$ is calculated recursively between $t_a(1 \ldots i)$ and $t_b(1 \ldots j)$ such that the Euclidean distance (ED) is aligned and mapped as given in (6). DBA iteratively refines an initial average sequence in order to minimize the squared DTW distance to the trajectories. The path is defined by $\kappa = DBA(\tau)$, where $\tau = \{\tau_1, \tau_2, \ldots, \tau_m\}$ is the set of all trajectories that belong to a similar path. For example, Fig. 2 depicts the construction of the average path using DBA.
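As an illustration of the recursive DTW computation, the following sketch uses the textbook recurrence, in which the cost of aligning prefixes $(1 \ldots i)$ and $(1 \ldots j)$ is the Euclidean distance between the current points plus the cheapest of the three preceding alignments. It is a sketch under that assumption and not necessarily the exact form used in the paper; DBA itself (the iterative averaging step) is not shown.

```python
import math


def dtw_distance(ta, tb):
    """DTW distance between two trajectories given as lists of (x, y) points."""
    n, m = len(ta), len(tb)
    inf = float("inf")
    # D[i][j]: cost of the best alignment of ta[:i] with tb[:j]
    D = [[inf] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = math.dist(ta[i - 1], tb[j - 1])   # Euclidean distance (ED)
            D[i][j] = cost + min(D[i - 1][j],        # ta advances alone
                                 D[i][j - 1],        # tb advances alone
                                 D[i - 1][j - 1])    # both advance
    return D[n][m]


# Illustrative trajectories: b is a time-warped copy of a, c is a shifted by 1 in y
a = [(0, 0), (1, 0), (2, 0)]
b = [(0, 0), (0, 0), (1, 0), (2, 0)]
c = [(0, 1), (1, 1), (2, 1)]
```

Because DTW tolerates differences in sampling and speed, `dtw_distance(a, b)` is zero even though the two sequences have different lengths, whereas the spatially shifted trajectory `c` is at a non-zero distance from `a`.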
The deviation-based abnormality score of a trajectory is then defined by the maximum deviation of the trajectory from the average path. The Hausdorff distance between the path and the trajectory is estimated using (7). The displacement from the average path is a high-level feature, and it represents the trajectories. The higher the value of the deviation, the higher the abnormality. The path deviation (PD) of a trajectory is formally defined in (8). PD can be useful to measure spatial displacement as well as temporal displacement. It is observed that normal moving targets have lower PD compared to abnormal cases such as moving in the wrong direction, moving with high speed, loitering, unusual stops, and moving slowly. For example, Fig. 3 shows the deviation of a normal target. Some abnormal cases such as a fast-moving target (Fig. 4), loitering (Fig. 5), a side walker (Fig. 6), and moving in the opposite direction (Fig. 7) are also shown. It is noted that an abnormal target may have a higher deviation score.
Next, the path abnormality score (γ) and the path deviation abnormality score (ζ) are calculated in a similar fashion to α and β. The higher the value of γ or ζ, the more unusual the trajectory. Finally, a set of local ranks ($L_\kappa$) based on α, β, γ, and ζ is estimated. The highest abnormality scores represent the most unusual patterns of a given scene.
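The deviation of a trajectory from the average path can be sketched with the standard symmetric Hausdorff distance between two point sets. This is an illustration only: the precise definitions in (7) and (8) are not reproduced here, and the example trajectories are made up.

```python
import math


def directed_hausdorff(a, b):
    """Largest distance from a point of a to its nearest point in b."""
    return max(min(math.dist(p, q) for q in b) for p in a)


def hausdorff(a, b):
    """Symmetric Hausdorff distance between two point sequences."""
    return max(directed_hausdorff(a, b), directed_hausdorff(b, a))


# Illustrative data: an average path and two trajectories following it
average_path = [(0, 0), (1, 0), (2, 0), (3, 0)]
normal = [(0, 0.1), (1, 0.2), (2, 0.1), (3, 0.0)]     # stays close to the path
loiter = [(0, 0), (1, 0), (1, 2), (1, 0), (3, 0)]     # wanders off and returns

pd_normal = hausdorff(average_path, normal)
pd_loiter = hausdorff(average_path, loiter)
```

In this toy case the loitering trajectory has a much larger deviation than the normal one, because a single far-away point dominates the maximum.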

Fusion techniques
Multi-modal feature fusion has gained considerable attention from researchers for data mining-related tasks. Fusion can be done using lower-level features, which is often referred to as early fusion. Alternatively, decision-level (late) fusion or a hybrid approach can be taken. In visual surveillance, fusion at the low level is common. However, decision-level fusion is still evolving. Here, two state-of-the-art fusion methodologies, namely entropy-based SAW (Abdullah and Adawiyah 2014) and TOPSIS (Hwang et al. 1993), have been used to fuse multiple criteria and produce a meaningful abnormality score. Entropy has been used to estimate the weight of each criterion. Shannon entropy (Shannon 2001) is referred to as an important measure of disorder or uncertainty. Using entropy, we can measure the uncertainty present in the information. The probability distribution of the patterns can represent uncertainty. In the case of multi-criteria decision making (MCDM), entropy decides the weight of the criterion. Higher entropy represents higher diversity in the information. Our assumption is that the most uncertain event has a higher abnormality score for randomly moving objects. The entropy of the random variable $X$ is $H(X) = -\sum_{i=1}^{N} p(x_i) \log p(x_i)$ (9), where $\{x_1, x_2, \ldots, x_N\}$ is the set of observations of $X$ and $p(x_i)$ is the probability of taking the value $x_i$.
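A minimal sketch of the entropy computation is given below. It assumes that probabilities are estimated as relative frequencies of the observed values and uses base-2 logarithms; the base is an assumption and only changes the unit of the result.

```python
import math
from collections import Counter


def shannon_entropy(observations, base=2):
    """Shannon entropy of the empirical distribution of the observations."""
    counts = Counter(observations)
    n = len(observations)
    # p(x_i) is estimated as the relative frequency of value x_i
    return -sum((c / n) * math.log(c / n, base) for c in counts.values())


# Illustrative cluster labels: one balanced, one skewed, one constant sequence
balanced = ["A", "A", "B", "B"]
skewed = ["A", "A", "A", "B"]
constant = ["A", "A", "A", "A"]
```

A balanced distribution over two patterns gives one bit of entropy, a skewed one gives less, and a constant sequence gives zero, in line with the reading of entropy as a measure of uncertainty.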

Simple Additive Weighting (SAW) fusion
Hence, entropy of different criteria is calculated using (10).
The weight ($\chi$) of a pattern, or of a trajectory that follows the pattern, is defined in (11), where ω is the abnormality score of the pattern and ψ is the weight of the criterion.
Simple additive weighting (SAW) is a widely known MCDM method. In this work, it has been used to fuse the trajectory and parameter weights into a single parameter. The combined weight of a trajectory is calculated using (12).
Finally, the trajectories are ranked according to their aggregated abnormality scores.
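The following sketch illustrates entropy-weighted SAW on a small score matrix whose rows are trajectories and whose columns are the four criteria (α, β, γ, ζ). It uses the commonly described entropy weight method, in which a criterion whose scores are more unevenly distributed receives a larger weight. This is an assumption made for illustration; the exact forms of (10) to (12) may differ, and all function names and data are hypothetical.

```python
import math


def entropy_weights(matrix):
    """Entropy weight method: one non-negative weight per criterion (column)."""
    m, k = len(matrix), len(matrix[0])
    divergence = []
    for j in range(k):
        column = [row[j] for row in matrix]
        total = sum(column)
        if total == 0 or m < 2:
            divergence.append(0.0)          # column carries no information
            continue
        probs = [x / total for x in column if x > 0]
        # Normalised entropy in [0, 1]; 1 means the column is uniform
        entropy = -sum(p * math.log(p) for p in probs) / math.log(m)
        divergence.append(max(0.0, 1.0 - entropy))
    norm = sum(divergence)
    if norm == 0:
        return [1.0 / k] * k                # fall back to equal weights
    return [d / norm for d in divergence]


def saw_scores(matrix, weights):
    """Weighted sum of the criteria for every trajectory."""
    return [sum(w * x for w, x in zip(weights, row)) for row in matrix]


def rank_descending(scores):
    """Rank 1 is assigned to the largest (most abnormal) score."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    ranks = [0] * len(scores)
    for position, index in enumerate(order, start=1):
        ranks[index] = position
    return ranks


# Rows: trajectories; columns: alpha, beta, gamma, zeta (illustrative values)
score_matrix = [
    [0.10, 0.20, 0.10, 0.05],
    [0.20, 0.10, 0.20, 0.10],
    [0.90, 0.80, 0.70, 0.95],
]
weights = entropy_weights(score_matrix)
fused = saw_scores(score_matrix, weights)
ranks = rank_descending(fused)
```

Here the third trajectory has the highest score on every criterion, so it receives rank 1 regardless of how the weights are distributed.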

TOPSIS-guided fusion
The Technique for Order of Preference by Similarity to Ideal Solution (TOPSIS) method is mainly used to find an optimal solution that is farthest from the negative ideal solution and closest to the positive ideal solution. Similar to the SAW method, TOPSIS is also used to aggregate individual abnormality scores (ω). The positive ideal score and the negative ideal score of a set of $N$ trajectories can be defined using (13) and (14).
Aggregated abnormality scores using TOPSIS (ϕ) can then be estimated using (15). Our assumption is that all criteria are equally probable. A global ranking of the trajectories based on α, β, γ, and ζ can be obtained using fusion. We denote by $G_\kappa$ the global rank of a trajectory obtained using SAW or TOPSIS (ϕ), where the highest rank (i.e. rank 1) represents the most abnormal/unusual pattern according to our assumption. Global ranks help to resolve ambiguity in local ranking when more than one trajectory shares the same local rank.
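A sketch of TOPSIS aggregation with equal criterion weights is given below. It follows the textbook procedure (vector normalisation, ideal and negative ideal points, relative closeness) and treats every criterion as one where a larger value means more abnormal. These choices are assumptions for illustration and need not coincide with (13) to (15); the function name and data are hypothetical.

```python
import math


def topsis_scores(matrix, weights=None):
    """Relative closeness to the positive ideal point, in [0, 1]."""
    k = len(matrix[0])
    if weights is None:
        weights = [1.0 / k] * k             # all criteria equally probable
    # Vector normalisation of each column (guard against an all-zero column)
    norms = [math.sqrt(sum(row[j] ** 2 for row in matrix)) or 1.0
             for j in range(k)]
    v = [[weights[j] * row[j] / norms[j] for j in range(k)] for row in matrix]
    best = [max(r[j] for r in v) for j in range(k)]    # positive ideal point
    worst = [min(r[j] for r in v) for j in range(k)]   # negative ideal point
    scores = []
    for r in v:
        d_pos = math.dist(r, best)
        d_neg = math.dist(r, worst)
        denom = d_pos + d_neg
        scores.append(d_neg / denom if denom else 0.0)
    return scores


# Rows: trajectories; columns: alpha, beta, gamma, zeta (illustrative values)
abnormality = [
    [0.10, 0.20, 0.10, 0.05],
    [0.20, 0.10, 0.20, 0.10],
    [0.90, 0.80, 0.70, 0.95],
]
phi = topsis_scores(abnormality)
most_abnormal = max(range(len(phi)), key=lambda i: phi[i])
```

The third trajectory coincides with the positive ideal point in this toy matrix, so its closeness is 1 and it would be placed at rank 1.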

Experimental results
In this section, we present the results and comparisons with state-of-the-art methods using publicly available datasets.

Dataset
We have used four public datasets, namely the MIT trajectory dataset (MIT) (Wang et al. 2011), the QMUL junction dataset (QMUL) (Long et al. 2016), and the crowd dataset (UCF) (Ali and Shah 2007). Details of these datasets are summarized in Table 1. These datasets have been selected because they provide both challenging and simple scenarios. Some of the videos in these datasets contain a large number of moving objects as well as occlusions.

Unsupervised trajectories clustering
In this section, we present the results of the trajectory clustering (t-Cluster). We also present comparative results using other distance measures and clustering techniques. We have compared the proposed method (using only $C_{path}$) with well-known state-of-the-art path clustering techniques such as mean shift, MBMS (Wang and Carreira-Perpinán 2010), AMKS (Xu et al. 2015), Fast AMKS (Xu et al. 2015), DETECT (Yue et al. 2019), and the deep representation-based feature (Bandaragoda et al. 2019). Results reveal the superiority of our proposed method over the existing techniques.
We have also experimented with distance-based trajectory clustering methods that combine Euclidean distance or dynamic time warping (DTW) with K-means to extract patterns of movement from cumulative trajectories. Table 2 summarizes the accuracy of path-based pattern clustering using the baselines and the proposed method as the average adjusted Rand index (ARI), which we use to measure clustering similarity. The experiment is carried out by taking a random 80% of the trajectories in each run and is repeated 20 times. The mean ARI and the distribution of accuracy are shown in Fig. 8 (QMUL), Fig. 9 (MIT), and Fig. 10 (UCF). The results support the superiority of our method in terms of higher ARI and lower deviation of accuracy over randomly selected trajectories. Figures 11, 12, and 13 present the source, destination, and path-based cluster analysis obtained using t-Cluster when applied to the QMUL dataset. The box plots for source and destination are obtained by considering the distance of each trajectory from the cluster centre, and in path-based clustering, the Hausdorff distance of each trajectory from the path is considered. The outlier points of the boxes are more likely to be unusual.
Our proposed method is closely related to shrinkage-based methods such as mean shift, AMKS, and MBMS. The quantitative results of shrinking are presented in Fig. 14. It is observed that the proposed clustering method produces much more distinguishable clusters compared to the others.
The computational overhead of the proposed method is also lower compared to other methods since our method does not compare points pairwise. Table 3 shows the average execution time of various clustering methods (20 runs) on an Intel Core i7 3.6 GHz processor with 16 GB of RAM.
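For reference, the adjusted Rand index between a ground-truth labelling and a predicted clustering can be computed from pair counts as in the sketch below. This is the standard pair-counting formulation written with the Python standard library; it is an illustration and not the evaluation code used for Table 2.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_true, labels_pred):
    """ARI from the contingency counts of two labelings of the same items."""
    n = len(labels_true)
    if n < 2:
        return 1.0
    pair_counts = Counter(zip(labels_true, labels_pred))   # contingency table
    row_counts = Counter(labels_true)
    col_counts = Counter(labels_pred)
    sum_ij = sum(comb(c, 2) for c in pair_counts.values())
    sum_a = sum(comb(c, 2) for c in row_counts.values())
    sum_b = sum(comb(c, 2) for c in col_counts.values())
    expected = sum_a * sum_b / comb(n, 2)    # agreement expected by chance
    max_index = 0.5 * (sum_a + sum_b)
    if max_index == expected:                # degenerate case, e.g. one cluster
        return 1.0
    return (sum_ij - expected) / (max_index - expected)


# Illustrative labelings of four trajectories
truth = [0, 0, 1, 1]
renamed = [1, 1, 0, 0]     # same partition, different label names
split = [0, 0, 1, 2]       # second cluster split in two
```

The index is invariant to renaming of cluster labels, so `renamed` scores 1, while the over-segmented clustering `split` scores lower.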

Ranking of trajectories
Entry/exit regions and entry-to-exit paths are good features to summarize a scene. However, clustering using only these features may result in loss of information, e.g. the speed of the objects or path deviation. To overcome such problems, path deviation has been included and all features have been aggregated using MCDM. In this section, we present patterns (i.e. clusters) obtained using various features. Table 4 summarizes the results. It has been found that the inclusion of path deviation splits the clusters and produces a larger number of movement patterns. Figure 15 presents examples of some trajectories and the corresponding ranks using different parameters.

Comparison between SAW and TOPSIS
Spearman's rank correlation coefficient has been used to measure the relationship between the two ranking mechanisms. It is calculated as $\rho = 1 - \frac{6 \sum_{i=1}^{n} d_i^2}{n(n^2 - 1)}$ (16), where $d_i$ is the difference between the two ranks and $n$ represents the number of observations. We also perform the nonparametric Wilcoxon signed-rank test to determine the p value for four different datasets using SAW and TOPSIS; the values are listed in Table 5. For each of these datasets, the null hypothesis ($H_0$) is that there is no significant difference between the SAW and TOPSIS ranking mechanisms. For all these datasets, the test outcome is 0 at the 5% level of significance, which reveals that there is not enough evidence to reject $H_0$. Therefore, we conclude that no notable difference exists between the SAW and TOPSIS ranking mechanisms at the 5% significance level.
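The rank correlation between two rankings of the same trajectories can be sketched as follows. The sketch uses the classical rank-difference formula for Spearman's coefficient and assumes that neither ranking contains ties; the rankings shown are illustrative and do not come from the experiments.

```python
def spearman_rho(rank_a, rank_b):
    """Spearman's rank correlation for two tie-free rankings of equal length."""
    n = len(rank_a)
    # d_i: difference between the two ranks of the i-th trajectory
    d_squared = sum((a - b) ** 2 for a, b in zip(rank_a, rank_b))
    return 1.0 - 6.0 * d_squared / (n * (n * n - 1))


# Illustrative global ranks of five trajectories under two fusion methods
saw_ranks = [1, 2, 3, 4, 5]
topsis_ranks = [1, 3, 2, 4, 5]     # two neighbouring trajectories swapped
reversed_ranks = [5, 4, 3, 2, 1]
```

Identical rankings give a coefficient of 1 and fully reversed rankings give -1; swapping two neighbouring trajectories, as in `topsis_ranks`, lowers the coefficient only slightly.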

Conclusion
In this paper, we present a method for unsupervised trajectory clustering and indexing. Our technique is based on a clustering and ranking method using entry/exit regions and entry-to-exit paths. Trajectory abnormality scores are obtained with respect to entry/exit patterns, entry-to-exit paths, and path deviation, and local ranks ($L_\kappa$) are generated for each moving object. MCDM fusion has been applied to aggregate the individual abnormality scores into a global abnormality score. In the next step, a global rank ($G_\kappa$) is assigned to each object. All moving objects are then represented using the pair ($L_\kappa$, $G_\kappa$), where a lower value represents higher abnormality. The proposed algorithm can be thought of as a generalized framework for unsupervised trajectory clustering and ranking, and the method can be applied to intelligent browsing of large volumes of surveillance video.