Robust clustering of functional directional data

A robust approach for clustering functional directional data is proposed. The proposal adapts “impartial trimming” techniques to this particular framework. Impartial trimming uses the dataset itself to tell us which appears to be the most outlying curves. A feasible algorithm is proposed for its practical implementation justified by some theoretical properties. A “warping” approach is also introduced which allows including controlled time warping in that robust clustering procedure to detect typical “templates”. The proposed methodology is illustrated in a real data analysis problem where it is applied to cluster aircraft trajectories.


Introduction
Modern technologies are increasingly allowing us to measure phenomena continuously in time.In those cases, although the curves are often discretized, data sets can be seen as made of curves rather than finite-dimensional measurements.Functional Data Analysis (Ramsay and Silverman 2005;Ferraty and Vieu 2006) are the set of statistical tools specially developed to deal with this particular type of data.In particular, functional Cluster Analysis is recently receiving considerable attention, as can be seen in recent review papers as Jacques and Preda (2014), Hitchcock and Greenwood (2015), and Yassouridis and Leisch (2017).
In this work, we will focus on providing a functional clustering approach that can be applied to cluster functional directional data and where robustness also plays an important role.See Mardia and Jupp (2009) and Ley and Verdebout (2017) for some general references on directional statistics.
It is known that (even) a small fraction of contaminating data can be very detrimental in Cluster Analysis (García-Escudero and Gordaliza 1999).This justifies the interest of applying robust clustering techniques that are able to resist certain amount of outlying observations (García-Escudero et al. 2010;Ritter 2015;García-Escudero et al. 2015).Moreover, the application of robust clustering techniques may be also useful in order to detect anomalous features in our data, which can be very interesting once we are able to explain their anomalous behaviour.In this work, robustness is achieved by allowing to discard a proportion α of functional directional data throughout an "impartial" trimming procedure.The term impartial means that it is the data itself the one that tell us which are the most anomalous curves.This impartial trimming approach was introduced in Rousseeuw (1984), Gordaliza (1991) and Cuesta-Albertos et al. (1997).
Impartial trimming has been already applied in Functional Cluster Analysis in García-Escudero and Gordaliza (2005), Cuesta-Albertos and Fraiman (2007) and, more recently, in Rivera-García et al. (2019).The approach adopted now in our work is more closely related with Cuesta-Albertos and Fraiman (2007) because we are not using projections into finite-dimensional functional subspaces.
The proposed methodology will be introduced in Sect. 2 together with some theoretical results for the proper characterization of the optimal solutions to the underlying problems.These theoretical results are latter applied in Sect. 3 to derive a feasible algorithm for the practical application of the methodology.
The proposed algorithm will be extended in Sect. 4 to allow for "time warping" in the curves assigned to each cluster.Time warping is an appealing idea to address misalignment problems within clusters and to detect typical "templates" which are also useful to describe the detected clusters.
Some guidelines about how to make sensible choices for the number of clusters k and for the trimming level α are given in Sect. 5.
Section 6 provides a simple simulation study to illustrate the ability of the proposed methodology to properly recover alignments in unaligned data, and simultaneously trim outlying curves.
Finally, Sect.7 presents a real data application aimed at clustering aircraft trajectories that motivated our interest in clustering functional directional data.This real data example serves to illustrate all the material introduced in previous sections.Other applications for this methodology are surely possible.For instance, another direct application could be to cluster weather stations based on the observed evolution of local wind directions.Detecting anomalous weather stations, and explaining why they exhibit such strange behavior, can be an interesting task.

Methodology
We are going to use the extrinsic distance in the unit sphere S 1 such that, for every pair ω 1 and ω 2 in [0, 2π ] (or, analogously, ω 1 and ω 2 in S 1 ), we take to measure their distance.Given ω 1 , . . ., ω n in S 1 , the directional mean ω is defined as In this work, we are interested in clustering θ 1 , . . ., θ n "directional functions" where each θ i belongs to C([0, 1], S 1 ) as the set of continuous functions defined on [0, 1] and taking values in S 1 when S 1 is equipped with extrinsic distance in (1).This means that we start from a sample θ 1 , . . ., θ n in where every θ i satisfies and θ i is assumed to be a continuous function in S 1 .To simplify notation, we simply use the notation F for denoting C([0, 1], S 1 ).
The extrinsic distance in S 1 can be extended to a distance in the set of directional functions F just by considering an integrated extrinsic distance defined as: (2) For robust clustering purposes, we consider the impartial trimming approach, where we try that the sample itself inform us which are the "most outlying" directional functions to be trimmed.In this approach, we search for k directional functions {m 1 , . . ., m k } ⊂ F (with k << n) or "prototypes" that better serve to summarize our set of observed curves θ 1 , . . ., θ n , but allowing a proportion α ∈ [0, 1) of curves to be trimmed in an "optimal" way.These m 1 , . . ., m k serve to create a partition of the non-trimmed θ i directional functions, by assigning each θ i to cluster J , if θ i is closest to m J than to the other m j prototypes when using the distance in (2).The fraction α of trimmed θ i directional functions, as suspicious of being outliers, are left unassigned.
To be more precise, let us introduce some further notation in this functional directional framework.For θ ∈ F and The trimmed k-mean problem can be now defined through the following double minimization procedure: inf where x is the least integer greater than or equal to x.
Note that this double minimization is done on every possible subset of indexes H such that H ⊂ {1, . . ., n} and card(H) = n(1 − α) , and every possible set of k function M = {m 1 , . . ., m k } in F. The result of that double minimization is the set of optimally non-trimmed functions, i.e. those with indexes in H, and a set with the k optimal center or prototype directional functions given by M = {m 1 , . . ., m k }.
To gain insights on how this complex double maximization can be simplified, let us present some additional notation and two main results.Given k centres M = {m 1 , . . ., m k }, let us define the optimal radius as In other words, if By using this notation, we introduce the subset H(M) ⊂ {1, . . ., n}, with card(H(M)) = n(1 − α) , defined as Theorem 1 show that, if the optimal set M were known, then the optimal set H including all the non-trimmed curves can be chosen as those with indexes in H(M).Notice that H(M) can easily determined from M just by sorting all the D(θ i ; M) distances.
Therefore, if M were known, there is no needed to explore the combinatorial set of all possible H subsets.Moreover, it is also trivial to see that H(M) can be optimally split as (again, only depending on sorting these D(θ i ; M) distances).
Consequently, if we introduce then the double minimization in (3) can be rewritten as a single minimization, only depending on the set of k optimal directional functions M, as inf On the other hand, if we assume H = H 1 ∪ ... ∪ H k were known, the optimal m j can be easily obtained by computing pointwise directional means: Proof This result easily follows from the fact that Given that d(•, •) is positive, the minimization of that integral is done throughout a pointwise minimization of the integrating term.
Notice that closed expressions for m j (t) are available as (i denotes the imaginary unit) or, analogously, Theorem 1 and Theorem 2 will be applied to derive a feasible algorithm in Sect.3.

Algorithm
Given a sample of directional functions θ 1 , . . ., θ n ⊂ F, a fixed number of clusters k and a fixed trimming level α: 1 Initialize B times: Each initialization starts from k randomly chosen initial centroids m 1 , . . ., m (l)  k , by considering the pointwise directional means: 3 After L iterations of steps 2.1-2.4,we compute the value of the target function j ) resulting from this random initialization.
4 Return as algorithm's output those partitions and templates yielding the smallest value in Step 3.
The theoretical justification of this algorithm follows from the application of Theorem 1 and Theorem 2, that guarantee the monotonically decrease of the target function in the iterative part of the algorithm.B random initializations are considered to avoid that the algorithm get stuck in local minima of the target function (3).A stopping rule can be also added to avoid unnecessary iterations if, after applying Step 2, the iterated solution does not change.

Algorithm with warping
When studying some processes it is usual to find some common patterns that occur at different speeds.One example can be found in some features of living beings (humans, animals, plants) of the same species where growth occur at different paces.Another example is the recognition of speech signals where the same word can be pronounced with varying speeds.It is in this last context where a family of techniques, known as dynamic time warping (DTW) algorithms, where introduced to deal with these different speeds (Sakoe and Chiba 1971).The global aim of these algorithms is to ensure that the varying speeds do not affect the similarity analysis of the curves and therefore allow the mentioned patterns to be detected.This is achieved through an appropriate time warping alignment function, φ, of the curves to be compared.A good reference that explains in more detail the theory of the DTW methodology applied here is Kruskal and Liberman (1983).
In the following we adapt the notation in Giorgino (2009) to our context.Let θ 1 and θ 2 two directional functions and φ = (φ 1 , φ 2 ), where φ 1 , φ 2 : [0, 1] → [0, 1] are two functions that warp the time for θ 1 and θ 2 respectively.In order to have consistent warpings, some constraints have to be imposed on φ.First, φ 1 and φ 2 must be nondecreasing continuous functions to ensure that time order is not reversed and to avoid time jumps.Also, starting and ending curve points must match, i.e., φ 1 (0) = φ 2 (0) = 0 and φ 1 (1) = φ 2 (1) = 1.In other words, we do not allow the beginning or end of any curve to be trimmed.In order to have some control over the local changes in time speed, we may also impose that φ 1 (t), φ 2 (t) ∈ [t − δ, t + δ] for a preset value δ ∈ (0, 1], i.e., φ(t) must lie in a band around t. Now, using the distance defined in (1), an accumulated distortion function d φ is defined as where m φ (t) is a weighting function and M φ the corresponding normalization constant, both to have comparable values for different choices of φ (see, e.g., Giorgino 2009, for more details).Then, the use of a DTW procedure to compare θ 1 and θ 2 requires to find the optimal value of − → D defined as: under the assumed conditions on φ and where d in (5) corresponds to the extrinsic distance in S 1 .
To compute the value of − → D for two given curves θ 1 and θ 2 a numerical approximation of ( 5) is necessary, which we carry out through a discretisation in [0, 1] of θ 1 and θ 2 .If we consider a grid of equispaced nodes of size N in [0, 1] for both curves, then the computation of − → D can be carried out with DTW algorithms for discrete time series as those reviewed and described in Giorgino (2009), where the previous constraints on φ have a direct transpose.Although there is the possibility of estimating these φ functions, in our case we are only interested in computing the value of − → D .Moreover, even though the computational complexity of the DTW algorithm for two time series of length N in the general case is O(N 2 ), in our case, the computational complexity can be notably reduced when adding control over local changes in time speed.
The use of the DTW methodology is not essential in our proposal and could be replaced by other alternatives for curve registration (see, e.g., Marron et al. 2015, ) and the same applies to the chosen constraints on the φ functions.
Our proposed algorithm follows similar lines as the algorithm introduced in Sangalli et al. ( 2010 The k templates ξ j may be seen as a kind of representative directional function for all the directional functions assigned (after warping) to cluster j.Notice that again a fraction α of directional functions with the (hopefully) most outlying behavior are trimmed.
A modified trimmed k-mean including warping can be given as: 1 Initialize B times: Each initialization starts from k randomly chosen initial templates ξ k (as done in the algorithm presented in Sect.3).k randomly chosen θ i directional functions from our sample can be chosen for this purpose. 2Iterate: Given ξ (l−1) 1 , . . ., ξ 3 Let θ i j be the directional function θ i after optimally warping it into the reference template ξ (l−1) j , for i = 1, . . ., n and j = 1, ..., k.

Update templates as ξ
(l) j being the pointwise directional mean of those θ i j 's warped directional functions with i ∈ H (l) j .
3 After L iterations of steps 2.1-2.4,we compute the value of the target function 4 Return as the algorithm's output those partitions and templates yielding the smallest value in Step 3.

Choice of parameters
The correct choice of all the involved parameters, k (number of groups) and α (trimming level), is not an easy problem.The proper choice of k is a classical problem in Cluster Analysis and several proposals can be found in the literature trying to address it.The choice of α is an additional problem appearing now due to the trimming methodology adopted.Moreover, the choice of these two parameters should be done in an unified fashion because their effects are clearly interrelated.For instance, a high trimming level α could allow to entirely discard smaller clusters so that the total number of clusters k has to be decreased.Sometimes the real data problem at hand provides some information on these two parameters, but in many others they are completely unknown and some guidance on their choice is welcome.
In this section, we review a simple approach introduced in García-Escudero et al. (2003), which is based on analyzing the decreasing pattern of the so-called trimmed k-variance functionals defined as where W k (α) is the minimum value attained in the minimization problem in (3) for fixed values of k and α.In fact, it is suggested the analysis of numerical second derivatives of these functionals.In order to approximate them, let us consider an equispaced grid of trimming levels {α 1 , α 2 , . . ., α L } ⊂ [0, 1] with α l = l/(L + 1) and take defined for l ∈ {h + 1, h + 2, . . ., h − n}.We, thus, consider the numerical second derivative functionals as W k : α l → W k (α l ), defined for k = 1, 2, . . .and l ∈ {h + 1, h + 2, . . ., h − n}.The tuning parameter h controls the roughness of these numerical second derivative functionals, in such a way that they are more rough and data dependent when h is small.Notice also that high values of h make it impossible the determination of W k for some values of α l close to 0. We can say that K is a sensible choice for the number of clusters k if the associated numerical second derivative functionals are clearly different when k < K but they almost coincide when k ≥ K .In fact, detecting peaks in these functionals indicates that a higher number of clusters k is surely needed.Initial large and positive values for W k (α l ) also indicates that a higher trimming level would be required as we are still probably trimming outlying observations with this α l trimming level.A more detailed explanation for all these heuristic rules, together with simple justifications, can be found in García-Escudero et al. (2003).

Simulation study
We generate random sets of directional functions {θ i } n i=1 such that where and for i = 21, . . ., 40.
In both cases, h i (•) for i = 1, . . ., n are going to be random warping functions that are piecewise linearly defined and such that h i (0) = 0, h i (1) = 1 and h i (0.5) = a i for a i being randomly drawn from a normal distribution with mean equal to 0.5 and standard deviation equal to 0.07.We are so obtaining two clusters of (unaligned) directional functions.For applying the discretized version of our proposed methodology, we consider an equispaced grid of size 200 in the [0, 1] interval.To introduce contamination, we randomly replace 2 (5% contamination) or 4 (10% contamination) out of these 40 directional functions by directional functions defined as in (6) but with ω i (t) = u i + 2π t where u i are randomly drawn from an uniform distribution in the [0, 2π ] interval.Figure 1 summarizes the results obtained after applying the proposed methodology on 100 simulated data sets generated as explained above.Trimmed procedures are applied with α = 0.1 (Trimming: 0.1) and compared with the untrimmed ones with α = 0 (Trimming: 0).Warping can be considered with δ = 0.1 (Warping: Yes) or not (Warping: No).The same 100 simulated data sets are applied for the comparison of the four different available approaches.
Each row in Fig. 1 shows the results for different numbers of outlying directional functions included (0, 2 and 4).The left column shows boxplots summarizing the where θ 1 and θ 2 are the target "reference" directional functions θ j : [0, 1] → (cos(m j (t)), sin(m j (t))) ∈ S 1 , for j = 1, 2, and ξ 1 and ξ 2 are the output templates obtained from the algorithms in each case, when k = 2. Recall that m 1 and m 2 are the functions used to generate the two clusters before considering the h i warping functions.The N Distance values are thus measuring how "close" the output templates are with respect to the target reference directional functions, after their proper alignment through − → D .We can see that even a small fraction of contaminating directional functions can create wrong assignment decisions that trimming is able to prevent once outlying directional functions can be removed.Figure 2c shows an example of the very bad performance of the proposed methodology allowing warping but without trimming.We can see how the two main clusters are artificially joined together and a small cluster made of few outliers is detected.Of course, the estimation of the target reference directional functions is harmfully affected.
Even in cases where no wrong assignments are obtained, it can be noticed that N Distance seems to be reduced (with and without trimming) when warping is allowed in the algorithm.Given that we are allowing warping in − → D when computing N Distance , it is easy to understand that methods allowing warping are going to provide better performance in that aspect.As can be seen in Fig. 2a, the simple pointwise directional mean (step 2.3 in the algorithm in Sect.3) cannot provide ξ j templates (wider blue lines) that correctly capture the target directional reference θ j functions since the resulting ξ j templates are clearly "oversmoothed" due to the non-alignment of the directional functions in each detected cluster.On the other hand, Fig. 2b shows that this oversmoothing phenomenon is corrected when considering a proper alignment before computing the pointwise directional means (step 2.4 in the algorithm in Sect.4).
We might think that wrongly trimming some non-outlying directional functions could be detrimental, but we can see that the effect of a (slightly) greater trimming level than needed does not necessarily imply worse performance.In fact, smaller proportions of missclassified directional functions and smaller values of N Distance are seen after trimming.This fact could be explained by how the most extremely unaligned curves are discarded as they are surely the trimmed ones.

Application to clustering of aircraft trajectories
The motivation of this application is framed within a research project named AIR-PORTS: Airport Improvement Research On Processes & Operations of Runway, TMA & Surface leaded by Boeing Research and Technology Europe and devoted to analyze the efficiency of commercial flights when taking into account the aircraft trajectories actually flown.After a complex data intake and preprocessing procedure, some Key Performance Indexes (KPI) measuring important aspects as fuel consumption or polluting emissions, were computed.ADS-B (Automatic Dependent Surveillance-Broadcast) signals were considered to determine the real aircraft positions during their flights, after their proper integration with the planned routes from Eurocontrol.This implies a huge amount of information to be processed throughout more than 500 millions of ADS-B signals including, among others, information on the position (latitude, longitude and height) of each plane with a frequency that varies according to the receivers, but that in emission can be 2 times per second.
In a second phase, KPIs were constructed by comparing the real trajectories flown with respect to possible alternative synthetic trajectories that the plane could have flown (such as geodesics and geodesics based on the flight plan) which were generated by using a flight simulator owned by Boeing Research and Technology Europe.The different efficiency KPIs serve to detect which alternative trajectories would have been more efficient.Due to the huge number of trajectories and routes to be compared, Cluster Analysis methods applied to the trajectories were needed to carry out comparative studies between groups of trajectories with their respective KPIs.Moreover, in that clustering problem, there were numerous trajectories that can be considered as atypical and for which their automated detection was interesting.As we will see later, some of these atypical flights corresponds to operational deviations due to adverse weather conditions, congestion problems in the airspace, strikes, ... and also trajectories including ovals close to the destination to wait for the right moment to land.In air navigation, heading is the horizontal angle between the direction of flight and magnetic north.It is common in aviation to characterize trajectories by measuring heading (data in S 1 ), altitude (in meters or feet), and speed (in "mach" units).Using the evolution of these three features over time to group similar trajectories would be equivalent to looking for groups using the evolution of longitude, latitude and altitude (functional data in R 3 ).To simplify this complex functional clustering problem, we focus exclusively on the evolution in time of only the heading after normalizing the flight times so that the time is restricted to the interval [0, 1] in such a way that this problem reduces to a clustering problem of directional functions in F. This simplified problem, together with the need to avoid the damaging effect of atypical trajectories, led us to consider robust clustering for functional directional data as a very relevant problem to be addressed.
We will show a typical example for the problem addressed.In that example, a data set of n = 3955 aircraft trajectories corresponding to 6 months of flights from Madrid airport (LEMD) to Barcelona airport (LEBL) are clustered using the instant headings measured every second.Length of trajectories ranges from 2017 to 4494 seconds.After scaled to the interval [0, 1], we will apply the procedure described in Sect. 3 to the θ i (t) curves, i = 1, . . ., 3955, representing the scaled heading of the aircrafts.The trajectories included in our dataset begin and end when the plane crosses the altitude threshold of 1000 meters.
To compute the values − → D described in Sect. 4 the size of the grid used in the discretisation has been N = 2017, i.e. the minimum length of the trajectories in the data set.Regarding the choice of the δ value to control local changes in time, although the main transformation of the time scale comes from the scaling to the interval [0, 1], we have selected δ = 3 2017 , which means that locally we allow to advance or delay up to 3 steps in the grid.However, the choice of this δ value is not critical and identical or almost identical results are obtained for δ values that do not change much, e.g., up to 10 steps in this case.On the other hand, values above 100 steps in our case already produce deformations in the curves that are not very plausible.
In order to assess an appropriate value of k, as described in Sect.5, a numerical approximation to the second derivative of the trimmed k-variance functionals is computed (see Fig. 3) with α l trimming values {0.01, 0.02, ...} and h = 10.
According to this figure, the minimum k that reaches similar numerical second derivative of the trimmed k-variance functionals with respect to the following ones is Fig. 4 (From left to right, top to bottom) Centroid trajectories for each cluster, some trimmed trajectories, and some trajectories in cluster 1-4 (dark red, orange, green and blue colors, respectively) (color figure online) k = 4.We also observe that the acceleration (second derivative value) is positive and high for α values with α < 0.2, but it is notable smaller and closer to 0 if we consider α > 0.2.Then, a suitable value for α to remove outliers could be around α = 0.20.
With these choices, k = 4 and α = 0.2, we carry out the proposed methodology and obtain the clusters and centroids together with the trimmed trajectories.Figure 4 represents the θ i (t) trajectories in the cylinder F while Fig. 5 represent the trajectories in 2D.
Trimmed trajectories have been represented in 2D (white color) in the second panel of Fig. 5.It can be observed that most of these trajectories correspond to holding manoeuvres (oval courses), that take place in predeterminated places of the airspace just before the arrival to destination; and some other strange trajectories.These holding patterns can be observed at the cylinder in the second plot of Fig. 4 as trajectories that turns around one or more times.
Clusters can be mainly described in terms of the departing and arrival runway direction used, but not only.Cluster 4 (blue color) is composed by trajectories that depart from the south of the airport and approaches destination both from the northeast and southwest.Those from the northeast are in the majority and this is reflected in the corresponding centroid (blue color) in the first plot of Fig. 4. On the opposite, clusters 1-3 are composed by trajectories which depart from the north of LEMD, and the main differences among them are in the way of approaching LEBL.Cluster 1 (dark red) is composed by trajectories that approach LEBL from the northeast and the last section is characterized by a sharp turn towards the airport.Cluster 2 (orange) is composed by trajectories that approach LEBL from the southeast but before the last section of the path they open up to the sea (to the east), in a sharp turn.Finally, cluster 3 (green) is composed by trajectories that approach LEBL both from the northeast -but with a smoother (than those in cluster 1) turn-, and from the southwest -but without the turn observed in Cluster 2-.
One could think of a registration process (alignment or "warping") that allows detecting representative routes described by their headings regardless of their speeds.This will serve to simplify future analyses by grouping flights close to these representative routes.The DTW method makes it easy to control the maximum degree of "time lag" allowed.
Figure 6 shows the output of applying the algorithm with warping of Sect. 4 to the previous dataset, using k = 4 and α = 0.20.The trajectories drawn in the four plots show very slight differences with the corresponding ones in Fig. 4 indicating that in this case the scaling to the interval [0, 1] is sufficient to correct the differences in flight durations and speeds.From left to right, top to bottom, some trajectories in cluster 1-4 (dark red, orange, green and blue colors, respectively).White color curves represent the template routes, ξ j , for each group j = 1, . . ., 4 (color figure online) The use of alternative registration procedures like those in Srivastava and Klassen (2016) could be another approach to follow in this type of problem.

Conclusions and further directions
A new methodology for robust clustering directional functional data has been introduced and a feasible algorithm has been proposed and justified.Robustness against outlying curves is pursued by allowing that a fixed fraction α of curves, hopefully the most outlying ones, are left unassigned or trimmed.The procedure is extended to allow for time warpings in the directional functions.An application to group aircraft trajectories is presented to illustrate the interest of the proposed methodology.
This work reveals many potential lines of work to consider in the future.For instance, it is interesting to study the possibility of carrying out a feasible dimensionality reduction in a way that can alleviate the computational cost or adequately handle the periodicity in the observed curves.Providing more automated ways of determining the parameters k and α in this particular problem is also an interesting line of research.As with other methods that follow a k-means approach, the procedure ideally assumes that the underlying clusters have the same spread/variation compared to other clusters.To address this problem, it might make sense to consider trimmed versions of mixtures of von Mises-Fisher distributions (Banerjee et al. 2003) in a way analogous to how TCLUST (García-Escudero et al. 2008) generalizes the trimmed k-means.Finally, trimming entire curves can be very extreme if outlying measure-ments only occur at a few particular points for that curve.It would be useful to develop procedures that are capable of discarding only the outlying measurements in each curve and keeping the valuable information in non-outlying measurements.

Fig. 1
Fig.1Results of the simulation study in terms of the proportion of missclassified observations (left column) and in terms of the warped distance from the directional functions used to generate the two clusters and the output templates (right column).Different amount of outlying directional functions are considered in each row

Fig. 2
Fig. 2 Outputs of the algorithm for one of the samples in the simulation study.Green and red colors are used for showing functions belonging to the two clusters detected while trimmed directional functions appear in grey color.Blue color is used for representing the output templates.Plot a shows the result when warping and α = 0.1 is considered.Plot b shows the result when no warping and α = 0.1 is considered.Plot c shows the result when warping and α = 0 is considered (color figure online)

Fig. 3
Fig. 3 Numerical second derivative of the trimmed k-variance functionals for the aircraft trajectories dataset

Fig. 6
Fig.6Output of the algorithm with warping applied to the aircraft trajectories example (k = 4, α = 0.2).From left to right, top to bottom, some trajectories in cluster 1-4 (dark red, orange, green and blue colors, respectively).White color curves represent the template routes, ξ j , for each group j = 1, . . ., 4 (color figure online)