
Synonyms

Multi-dimensional time series similarity; Mining spatio-temporal datasets

Definition

The trajectory of a moving object is typically modeled as a sequence of consecutive locations in a multi‐dimensional (generally two or three dimensional) Euclidean space. Such data types arise in many applications where the location of a given object is measured repeatedly over time. Typical trajectory data are obtained during a tracking procedure with the aid of various sensors. Herein also lies the main obstacle with such data: they may contain a significant number of outliers, i.e., incorrect data measurements (unlike, for example, stock data, which contain essentially no measurement errors). An example of two trajectories is shown in Fig. 1.

Figure 1 Examples of 2D trajectories

Many data mining tasks, such as clustering and classification, require a distance function to estimate the similarity or dissimilarity between any two objects in the database. Furthermore, in order to provide efficient solutions to many data mining tasks, a method is required that can quickly retrieve the data objects most similar to a given query object (or set of objects). Therefore, to perform data mining on trajectories of moving objects, the following problem must be addressed: given a database \( \mathcal{D} \) of trajectories and a query \( \mathcal{Q} \) (not already in the database), the system has to find the trajectory \( \mathcal{T} \) that is closest to \( \mathcal{Q} \). In order to solve this problem, two important sub-problems must be addressed: (i) define a realistic and appropriate distance function, and (ii) design an indexing scheme for answering the nearest neighbor query efficiently.

Historical Background

Trajectories are modeled as multi‐dimensional time series. Most of the related work on time-series data analysis has concentrated on the use of some metric \( L_p \) norm. The \( L_p \) norm distance between two n-dimensional vectors \( \bar{x} \) and \( \bar{y} \) is defined as \( L_p( \bar{x}, \bar{y}) = (\sum_{i=1}^{n} |x_i - y_i|^p)^{1/p} \). For \( p=2 \) this is the well-known Euclidean distance and for \( p=1 \) the Manhattan distance. The advantage of this simple metric is that it allows efficient indexing with a dimensionality reduction technique [1,7,10]. On the other hand, the model cannot deal well with outliers and is very sensitive to small distortions in the time axis [15]. There are a number of interesting extensions to the above model to support various transformations such as scaling [14], shifting [8], normalization [8] and moving average [14].
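For illustration, a minimal sketch of the \( L_p \) norm (the function name is ours, not from the cited work):

```python
def lp_norm(x, y, p=2):
    """L_p distance between two equal-length numeric sequences."""
    if len(x) != len(y):
        raise ValueError("sequences must have equal length")
    return sum(abs(a - b) ** p for a, b in zip(x, y)) ** (1.0 / p)

# p = 2 gives the Euclidean distance, p = 1 the Manhattan distance
print(lp_norm([0, 0, 0], [3, 4, 0], p=2))  # 5.0
print(lp_norm([0, 0, 0], [3, 4, 0], p=1))  # 7.0
```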

Other techniques to define time series similarity are based on extracting certain features (landmarks [12] or signatures [6]) from each time-series and then using these features to define the similarity. Another approach is to represent a time series using the direction of the sequence at regular time intervals [13].

Although the vast majority of database/data mining research on time series has focused on Euclidean distance, virtually all real-world systems that use time series matching as a subroutine utilize a similarity measure that allows warping. In retrospect, this is not very surprising, since most real-world processes, particularly biological processes, can evolve at varying rates. For example, in bioinformatics it is well understood that functionally related genes will express themselves in similar ways, but possibly at different rates. Therefore, the Dynamic Time Warping (DTW) distance has been used for many datasets of this type. The method to compute DTW between two sequences is based on dynamic programming [2] and is more expensive than computing \( L_p \) norms. Approaches to mitigate the large computational cost of DTW have appeared in [9,16], where lower bounding functions are used to speed up its execution. Furthermore, an approach to combine the benefits of warping distances and \( L_p \) norms has been proposed in [4].
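A sketch of the dynamic-programming recurrence for DTW (a textbook formulation, not the optimized variants of [9,16]):

```python
def dtw(a, b):
    """O(n*m) dynamic-programming DTW between two 1-D sequences,
    using the absolute difference as the local cost."""
    n, m = len(a), len(b)
    INF = float("inf")
    D = [[INF] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            # extend the cheapest of the three admissible warping steps
            D[i][j] = cost + min(D[i - 1][j], D[i][j - 1], D[i - 1][j - 1])
    return D[n][m]

# warping absorbs the repeated sample, so the distance is 0
print(dtw([1, 2, 3], [1, 2, 2, 3]))  # 0.0
```

Note how the middle sequence value is matched twice at no cost; an \( L_p \) norm cannot even compare these two sequences, since their lengths differ.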

The flexibility provided by DTW is very important; however, its accuracy deteriorates for noisy data, since by matching all the points it also matches the outliers, distorting the true distance between the sequences. An alternative approach is the use of the Longest Common Subsequence (\( LCSS \)), which is a variation of the edit distance [11]. The basic idea is to match two sequences by allowing them to stretch, without rearranging the order of the elements but allowing some elements to be unmatched. Using the LCSS of two sequences, one can define the distance using the length of this subsequence [3,5].

Scientific Fundamentals

First, some definitions are provided and then the similarity functions based on the appropriate models are presented. It is assumed that objects are points that move on the (x,y)-plane and time is discrete.

Let A and B be two trajectories of moving objects with size n and m respectively, where \( A = ((a_{x,1}, a_{y,1}), \dots,\break (a_{x,n}, a_{y,n})) \) and \( B = ((b_{x,1}, b_{y,1}), \dots, (b_{x, m}, b_{y,m})) \). For a trajectory A, let Head(A) be the sequence \( Head(A) = ((a_{x,1}, a_{y,1}), \dots, (a_{x,n-1}, a_{y,n-1})) \).

Given an integer δ and a real number \( 0 < \epsilon < 1 \), the \( LCSS_{\delta, \epsilon}(A, B) \) is defined as follows:

$$ LCSS_{\delta, \epsilon}(A, B) = \begin{cases} 0, & \text{if } A \text{ or } B \text{ is empty,} \\[1mm] 1 + LCSS_{\delta, \epsilon}(Head(A), Head(B)), & \text{if } |a_{x,n} - b_{x,m}| < \epsilon \text{ and } |a_{y,n} - b_{y,m}| < \epsilon \text{ and } |n - m| < \delta, \\[1mm] \max\big(LCSS_{\delta, \epsilon}(Head(A), B),\; LCSS_{\delta, \epsilon}(A, Head(B))\big), & \text{otherwise.} \end{cases} $$

The constant δ controls how far apart in time two points are allowed to be in order to be matched, and the constant ε is the spatial matching threshold (see Fig. 2).

Figure 2 The notion of LCSS matching within a region of δ and ε for a trajectory. The points of the two trajectories within the gray region can be matched by the LCSS function
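The recursion can be evaluated bottom-up by dynamic programming. A straightforward \( O(nm) \) sketch (the faster banded variant discussed later fills only a diagonal strip of the table):

```python
def lcss(A, B, delta, eps):
    """LCSS of 2-D trajectories A and B (lists of (x, y) points).
    A pair (i, j) may match only when both coordinate gaps are
    below eps and the indices differ by at most delta."""
    n, m = len(A), len(B)
    L = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            (ax, ay), (bx, by) = A[i - 1], B[j - 1]
            if (abs(i - j) <= delta
                    and abs(ax - bx) < eps and abs(ay - by) < eps):
                L[i][j] = L[i - 1][j - 1] + 1
            else:
                L[i][j] = max(L[i - 1][j], L[i][j - 1])
    return L[n][m]

A = [(0, 0), (1, 1), (2, 2)]
B = [(0.05, 0.05), (1.05, 1.05), (9.0, 9.0), (2.05, 2.05)]
print(lcss(A, B, delta=1, eps=0.2))  # 3: the outlier (9, 9) is skipped
```

The example illustrates the robustness to noise: the outlier point of B is simply left unmatched instead of distorting the score.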

The first similarity function is based on the LCSS and the idea is to allow time stretching. Then, objects that are close in space at different time instants can be matched if the time instants are also not very far.

Therefore, the similarity function S1 between two trajectories A and B, given δ and ε, is defined as follows:

$$ S1(\delta, \epsilon, A, B) = {{LCSS_{\delta, \epsilon}(A, B)} \over { \min(n, m)}}\:. $$

The division by the length of the sequence in S1 serves the purpose of comparing the LCSS value between sequences of different lengths.

The S1 function is used to define another similarity measure that is more suitable for trajectories. Consider the set of all translations. A translation simply shifts a trajectory in space by a different constant in each dimension. Let \( \cal{F} \) be the family of translations. Then a function \( f_{c,d} \) belongs to \( \cal{F} \) if \( f_{c,d}(A) = ((a_{x,1}+c, a_{y,1}+d), \dots, (a_{x,n}+c, a_{y,n}+d)) \). Using this family of translations, the following distance function is defined.

Given δ, ε and the family \( \cal{F} \) of translations, the similarity function S2 between two trajectories A and B, is defined as follows:

$$ S2(\delta, \epsilon, A, B) = \max_{f_{c,d} \in \cal{F}} S1(\delta, \epsilon,A,f_{c,d}(B))\:. $$

The similarity functions S1 and S2 range from 0 to 1. Therefore, the distance function between two trajectories can be estimated as follows:

Given δ, ε and two trajectories A and B, then:

$$ \begin{aligned} &D1(\delta, \epsilon, A, B) = 1 - S1(\delta, \epsilon, A, B) \quad \textrm{and}\quad\\ %% &D2(\delta, \epsilon, A, B) = 1 - S2(\delta, \epsilon, A, B)\:. \end{aligned} $$

Note that D1 and D2 are symmetric. \( LCSS_{\delta, \epsilon}(A,B) \) is equal to \( LCSS_{\delta, \epsilon}(B,A) \) and the transformation that is used in D2 is a translation which preserves the symmetric property.

By allowing translations, similarities between movements that are parallel in space can be detected. In addition, the LCSS model allows stretching and displacement in time, so it can detect similarities in movements that happen with different speeds, or at different times.

Given the definitions above, efficient methods to compute the distance functions are presented next.

Computing the Similarity Function S1

To compute the similarity functions S1 and S2, an LCSS computation is needed. The LCSS can be computed by a dynamic programming algorithm in \( O(n^2) \) time. However, if matchings are allowed only when the difference in the indices is at most δ, a faster algorithm is possible. The following result has been shown in [2,5]: Given two trajectories A and B, with \( |A| = n \) and \( |B|=m \), the \( LCSS_{\delta, \epsilon}(A,B) \) can be computed in \( O(\delta (n+m)) \) time.

If δ is small, the dynamic programming algorithm is very efficient. However, for some applications, δ may need to be large. For that case, the above computation can be improved using random sampling.

By taking a sufficiently small number of random samples from the original data, it can be shown that, with high probability, the random sample preserves the properties (shape, structure, average value, etc.) of the original population. The random sampling method gives an approximate result, but with a probabilistic guarantee on the error. In particular, it can be shown that, given two trajectories A and B with length n, two constants δ and ε, and a random sample RA of A with \( |RA|=s \), an approximation of \( LCSS_{\delta, \epsilon}(A,B) \) can be computed such that the approximation error is less than β with probability at least \( 1-\rho \), in \( O(ns) \) time, where \( s=f(\rho, \beta) \). To give a practical perspective: to estimate the similarity of two trajectories A and B to within 0.1 of the true value with probability 90%, when the similarity between them is around 0.8, A should be sampled at about 250 locations. Notice that this number is independent of the lengths of both A and B. To capture accurately the similarity between less similar trajectories (e.g., with 0.4 similarity), more sample points must be used (e.g., 500 points).

Computing the Similarity Function S2

Consider now the more complex similarity function S2. Here, given two sequences A, B and constants \( \delta, \epsilon \), the translation \( f_{c,d} \) that maximizes the length of the longest common subsequence of A and \( f_{c,d}(B) \), i.e., \( LCSS_{\delta, \epsilon}(A, f_{c,d}(B)) \), over all possible translations must be found.

Let the lengths of trajectories A and B be n and m, respectively. Assume also that \( f_{c_1,d_1} \) is the translation that, when applied to B, maximizes the length of the longest common subsequence: \( LCSS_{\delta, \epsilon}(A,f_{c_1,d_1}(B)) = \max_{c,d \in {\cal R}} LCSS_{\delta, \epsilon}(A,f_{c,d}(B)) = a \).

The key observation is that, although there is an infinite number of translations that can be applied on B, each translation \( f_{c,d} \) results in a longest common subsequence between A and \( f_{c,d}(B) \), and there is a finite set of possible longest common subsequences. Therefore, it is possible to enumerate the set of translations, such that this set provably includes a translation that maximizes the length of the longest common subsequence of A and \( f_{c,d}(B) \). Based on this idea, it has been shown in [15] that: Given two trajectories A and B, with \( |A| = n \) and \( |B|=m \), the \( S2(\delta, \epsilon,A, B) \) can be computed in \( O((n+m)^3 \delta^3) \) time.

Furthermore, a more efficient algorithm has been proposed that achieves a running time of \( O((m+n) \delta^3 / \beta^2) \), for a given constant \( 0 < \beta < 1 \). This algorithm is approximate, however: the approximation \( AS2_{\delta, \beta}(A,B) \) is related to the exact value by \( S2(\delta, \epsilon,A,B) - AS2_{\delta, \beta}(A,B) < \beta \).
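To make the idea concrete, a crude sketch that searches a small grid of candidate translations around the centroid offset of the two trajectories; this is an illustration only, not the exact enumeration of [15] nor its approximation algorithm:

```python
import itertools

def lcss(A, B, delta, eps):
    # O(n*m) LCSS with time window delta and spatial threshold eps
    n, m = len(A), len(B)
    L = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            if (abs(i - j) <= delta
                    and abs(A[i - 1][0] - B[j - 1][0]) < eps
                    and abs(A[i - 1][1] - B[j - 1][1]) < eps):
                L[i][j] = L[i - 1][j - 1] + 1
            else:
                L[i][j] = max(L[i - 1][j], L[i][j - 1])
    return L[n][m]

def s2_grid(A, B, delta, eps, step):
    """Heuristic for S2: align the centroids of A and B, then try a
    7x7 grid of translations of B spaced `step` apart around that
    alignment, keeping the best normalized LCSS (i.e., S1)."""
    cx = sum(p[0] for p in A) / len(A) - sum(p[0] for p in B) / len(B)
    cy = sum(p[1] for p in A) / len(A) - sum(p[1] for p in B) / len(B)
    offsets = [k * step for k in range(-3, 4)]
    best = 0.0
    for dx, dy in itertools.product(offsets, offsets):
        Bt = [(x + cx + dx, y + cy + dy) for (x, y) in B]
        best = max(best, lcss(A, Bt, delta, eps) / min(len(A), len(B)))
    return best

A = [(0, 0), (1, 0), (2, 0)]
B = [(10, 5), (11, 5), (12, 5)]   # A shifted by (10, 5)
print(s2_grid(A, B, delta=1, eps=0.5, step=0.25))  # 1.0
```

The example shows the point of S2: the two trajectories are far apart in absolute coordinates, yet their movements are identical up to translation, so the similarity is 1.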

Indexing for LCSS-Based Similarity

Even though the approximation algorithm for the D2 distance significantly reduces the computational cost over the exact algorithm, it can still be costly when one is interested in similarity search over massive trajectory databases. Thus, a hierarchical clustering algorithm using the distance D2 is provided, whose output can be used to answer similarity queries efficiently.

The major obstacle in providing an indexing scheme for the distance function D2 is that D2 is not a metric, since it does not obey the triangle inequality. This makes the use of traditional indexing techniques difficult. Indeed, it is easy to construct examples with trajectories A, B and C, where \( D2(\delta, \epsilon, A,C) > D2(\delta, \epsilon,A,B) + D2(\delta, \epsilon,B,C) \). Such an example is shown in Fig. 3, where \( D2(\delta, \epsilon,A,B) = D2(\delta, \epsilon,B,C) = 0 \) (since the similarity is 1), and \( D2(\delta, \epsilon, A,C) = 1 \) (because the similarity within ε in space is zero).

Figure 3 An example where the triangle inequality does not hold for the D2 distance

However, a weaker version of the triangle inequality can be proven, which can help prune parts of the database and improve search performance. First, the following function is defined:

$$ LCSS_{\delta,\epsilon,{\cal F}}(A,B) = \max_{f_{c,d} \in {\cal F}} LCSS_{\delta,\epsilon}(A,f_{c,d}(B))\:. $$

Clearly, \( D2(\delta,\epsilon, A, B) = 1 - \frac{LCSS_{\delta,\epsilon,{\cal F}}(A,B)}{\min(|A|,|B|)} \) (as before, \( {\cal F} \) is the set of translations). Now, the following can be shown: Given trajectories A, B, C:

$$ \begin{aligned} LCSS_{\delta, 2\epsilon, {\cal F}}(A,C) &\geq LCSS_{\delta, \epsilon,{\cal F}}(A,B)\\ %% &\quad + LCSS_{\delta, \epsilon, {\cal F}}(B,C) - |B| \end{aligned} $$

where \( |B| \) is the length of sequence B.

To create the indexing structure, the set of trajectories is partitioned into groups according to their length, so that the longest trajectory in each group is at most a times the shortest (typically \( a = 2 \) is used). Then, a hierarchical clustering algorithm is applied to each group, and the tree that the algorithm produces is used as follows:

For every node C of the tree, the medoid (\( M_C \)) of the cluster represented by this node is stored. The medoid is the trajectory that maximizes the minimum LCSS (equivalently, minimizes the maximum distance) to every other trajectory in the cluster: \( M_C = \arg\max_{v_i \in C} \min_{v_j \in C} LCSS_{\delta,\epsilon,{\cal F}}(v_i,v_j) \). However, keeping only the medoid is not enough: a method is needed to efficiently prune parts of the tree during the search procedure. Namely, given the tree and a query sequence Q, the algorithm should decide whether or not to follow the subtree that is rooted at C. From the previous lemma, it is known that for any sequence B in C:

$$ \begin{aligned} LCSS_{\delta,\epsilon,{\cal F}}(B,Q) &< |B| + LCSS_{\delta,2\epsilon,{\cal F}}(M_C,Q) \\ &\quad- LCSS_{\delta,\epsilon,{\cal F}}(M_C,B) \end{aligned} $$

or in terms of distance:

$$ \begin{aligned} D2(\delta, \epsilon,B,Q) &= 1 - \frac{LCSS_{\delta,\epsilon,{\cal F}}(B,Q)}{\min(|B|,|Q|)}\\ &> 1 -\frac{|B|}{\min(|B|,|Q|)} - \frac{LCSS_{\delta,2\epsilon,{\cal F}}(M_C,Q)}{\min(|B|,|Q|)} + \frac{LCSS_{\delta,\epsilon,{\cal F}}(M_C,B)}{\min(|B|,|Q|)}\:. \end{aligned} $$

In order to provide an upper bound on the similarity (or a lower bound on the distance), the expression \( |B| - LCSS_{\delta, \epsilon, {\cal F}}(M_C,B) \) must be maximized. Therefore, for every node of the tree, the trajectory \( r_C \) that maximizes this expression is stored along with the medoid. Using this trajectory, a lower bound on the distance between the query and any trajectory in the subtree can be estimated.
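At query time the bound reduces to simple arithmetic. A sketch with argument names of our choosing: `lcss_mq` stands for \( LCSS_{\delta,2\epsilon,{\cal F}}(M_C,Q) \) and `lcss_mb` for the smallest \( LCSS_{\delta,\epsilon,{\cal F}}(M_C,B) \) over the cluster (realized by the stored trajectory \( r_C \)):

```python
def d2_lower_bound(len_b, len_q, lcss_mq, lcss_mb):
    """Lower bound on D2(delta, eps, B, Q) for every trajectory B in a
    cluster, from the weak triangle inequality:
        D2 > 1 - (|B| + LCSS(M_C, Q) - LCSS(M_C, B)) / min(|B|, |Q|)."""
    return 1.0 - (len_b + lcss_mq - lcss_mb) / min(len_b, len_q)

# e.g. |B| = 100, |Q| = 100, LCSS(M_C, Q) = 40, min LCSS(M_C, B) = 80:
print(d2_lower_bound(100, 100, 40, 80))  # 0.4
```

If the best distance found so far is below this value, the whole subtree can be skipped without computing any exact distances inside it.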

Next, the search function that uses the index structure discussed above is presented. It is assumed that the tree contains trajectories with minimum length minl and maximum length maxl. For simplicity, only the algorithm for the 1‑Nearest Neighbor query is presented.

The search procedure takes as input a node N of the tree, the query Q and the distance to the closest trajectory found so far (Fig. 4). For each child C, it is checked whether it is a trajectory or a cluster. If it is a trajectory, its distance to Q is compared with that of the current nearest trajectory. If it is a cluster, the length of the query is checked first and the appropriate value for \( \min(|B|, |Q|) \) is chosen. Then a lower bound L on the distance of the query to any trajectory in the cluster is computed and compared with the distance mindist of the current nearest neighbor; the cluster is examined only if L is smaller than mindist. In the scheme above, the approximate algorithm is used to compute \( LCSS_{\delta,\epsilon,{\cal F}} \). Consequently, the computed value of \( LCSS_{\delta,\epsilon,{\cal F}}(M_C,B)/\min(|B|,|Q|) \) can exceed the exact value by up to \( \beta \min(|M_C|,|B|)/\min(|B|,|Q|) \). Therefore, since the approximate algorithm for S2 is used, this quantity should be subtracted from the bound for \( D2(\delta,\epsilon,B,Q) \) to get correct results.

Figure 4 Search procedure for nearest neighbor queries
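A possible rendering of this search procedure as a sketch (the dictionary-based node layout and function names are our assumptions; `dist` computes the approximate D2 distance and `lower_bound` the pruning bound derived above):

```python
def nn_search(node, Q, dist, lower_bound, best=None):
    """1-nearest-neighbor search over the cluster tree with
    lower-bound pruning; returns (trajectory, distance)."""
    if best is None:
        best = (None, float("inf"))
    for child in node["children"]:
        if "trajectory" in child:              # leaf: a stored trajectory
            d = dist(child["trajectory"], Q)
            if d < best[1]:
                best = (child["trajectory"], d)
        elif lower_bound(child, Q) < best[1]:  # cluster: descend only if
            best = nn_search(child, Q, dist, lower_bound, best)  # it may help
    return best

# toy check: 1-D "trajectories", a bound that never prunes
tree = {"children": [{"trajectory": 5},
                     {"children": [{"trajectory": 2}, {"trajectory": 9}]}]}
print(nn_search(tree, 3, dist=lambda t, q: abs(t - q),
                lower_bound=lambda c, q: 0))  # (2, 1)
```

In a real deployment `lower_bound` would use the medoid and \( r_C \) quantities stored in each cluster node, so that subtrees whose bound exceeds the current best distance are never expanded.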

Key Applications

Sciences

Trajectory data with the characteristics discussed above (multi‐dimensional and noisy) appear in many scientific domains. In environmental, earth science and biological data analysis, scientists may be interested in identifying similar patterns (e.g., weather patterns), clustering related objects or subjects based on their trajectories, and retrieving subjects with similar movements (e.g., in animal migration studies). Similar problems occur in medical applications, for example when multiple-attribute response curves in drug therapy are analyzed.

Transportation and Monitoring Applications

In many monitoring applications, detecting movements of objects or subjects that exhibit similarity in space and time can be useful. These movements may have been reconstructed from a set of sensors, including cameras and movement sensors, and are therefore inherently noisy. Another set of applications arises in cell phone and mobile communication settings, where mobile users are tracked over time and the patterns and clusters of these users can be used to improve the quality of the network (e.g., by allocating appropriate bandwidth over time and space).

Future Directions

So far, it has been assumed that objects are points that move in a multi‐dimensional space, ignoring their shape. However, there are many applications where the extent of each object is also important. Therefore, a future direction is to design similarity models for moving objects with extents, where both the locations and the extents of the objects change over time.

Another direction is to design a more general indexing scheme for distance functions that are similar to LCSS, one that can work for multiple distance functions and datasets.

Cross References

Co-location Pattern

Patterns, Complex