Boosted-oriented probabilistic smoothing-spline clustering of series

Fuzzy clustering methods allow objects to belong to several clusters simultaneously, with different degrees of membership. However, the performance of fuzzy algorithms is influenced by the value of the fuzzifier parameter. In this paper, we propose a fuzzy clustering procedure for data (time) series that does not depend on the definition of a fuzzifier parameter. It combines two approaches, theoretically motivated for the unsupervised and supervised classification cases, respectively. The first is the Probabilistic Distance clustering procedure. The second is the well-known Boosting philosophy. Our idea is to adopt a boosting perspective for unsupervised learning problems; in particular, we address non-hierarchical clustering problems. The global performance of the proposed method is investigated through various experiments.


Introduction
We propose a fuzzy approach for clustering data (time) series. The goal of clustering is to discover groups such that objects within a cluster are highly similar to one another and, at the same time, dissimilar to objects in other clusters. Many clustering algorithms for time series have been introduced in the literature. Since clusters can formally be seen as subsets of the data set, clustering methods can be classified according to whether the subsets are fuzzy (soft) or crisp (hard). Let $D$ be a data set consisting of $N$ series $\{y_1, y_2, \ldots, y_N\} \subset \mathbb{R}^n$ and let $K$ be an integer with $2 \le K < N$; the goal is to partition $D$ into $K$ groups. Crisp clustering methods are based on classical set theory and require that each object of the data set belongs to exactly one cluster. This means partitioning the data $D$ into a specified number of mutually exclusive clusters $C_1, C_2, \ldots, C_K$. A hard partition of $D$ can be defined as a family of subsets $C_k$ that satisfies the following properties (Bezdek, 1981):

$$\bigcup_{k=1}^{K} C_k = D, \qquad C_k \cap C_h = \emptyset \;\; (1 \le k \ne h \le K), \qquad \emptyset \subset C_k \subset D \;\; (1 \le k \le K).$$

Let $\mu_{ik}$ be the membership function and let $U = [\mu_{ik}]$ be the $N \times K$ partition matrix. The elements of $U$ must satisfy the following conditions:

$$\mu_{ik} \in \{0, 1\} \;\; \forall i, k, \qquad \sum_{k=1}^{K} \mu_{ik} = 1 \;\; \forall i, \qquad 0 < \sum_{i=1}^{N} \mu_{ik} < N \;\; \forall k.$$

The $k$-th column of $U$ contains the values $\mu_{ik}$ of the $k$-th subset $C_k$ of $D$. In a hard partition, $\mu_k(y_i)$ is the indicator function:

$$\mu_k(y_i) = \begin{cases} 1 & \text{if } y_i \in C_k, \\ 0 & \text{otherwise.} \end{cases}$$

Following Bezdek (1981), the hard partitioning space is thus defined by:

$$M_c = \left\{ U \in \mathbb{R}^{N \times K} : \mu_{ik} \in \{0, 1\} \; \forall i, k; \;\; \sum_{k=1}^{K} \mu_{ik} = 1 \; \forall i; \;\; 0 < \sum_{i=1}^{N} \mu_{ik} < N \; \forall k \right\},$$

$M_c$ being the space of all possible hard partition matrices for $D$.
Generalizing the crisp partition, $U$ is a fuzzy partition of $D$ when the elements $\mu_{ik}$ of the partition matrix take real values in $[0, 1]$ (Kaufman and Rousseeuw, 2009). The idea of fuzzy sets was conceived by Zadeh (2009). Fuzzy clustering methods do not assign objects to a single cluster but provide degrees of membership in each group. The larger the membership value of a given object with respect to a cluster, the larger the probability that the object is assigned to that cluster.
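Similarly to the crisp conditions, Ruspini (1970) defined the following fuzzy partition properties:

$$\mu_{ik} \in [0, 1] \;\; \forall i, k, \qquad \sum_{k=1}^{K} \mu_{ik} = 1 \;\; \forall i, \qquad 0 < \sum_{i=1}^{N} \mu_{ik} < N \;\; \forall k.$$

The fuzzy partitioning space is the set:

$$M_{fc} = \left\{ U \in \mathbb{R}^{N \times K} : \mu_{ik} \in [0, 1] \; \forall i, k; \;\; \sum_{k=1}^{K} \mu_{ik} = 1 \; \forall i; \;\; 0 < \sum_{i=1}^{N} \mu_{ik} < N \; \forall k \right\}.$$

Several clustering criteria have been proposed to identify fuzzy partitions of $D$. Among these proposals, the most popular method is fuzzy c-means.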
Proposed by Dunn (1973) and developed by Bezdek (1981), fuzzy c-means considers each data point as a possible member of multiple clusters, each with a membership value. The algorithm is based on the minimization of the following objective function:

$$J_m = \sum_{i=1}^{N} \sum_{k=1}^{K} \mu_{ik}^{m} \, \| y_i - c_k \|^2. \qquad (1)$$

In equation (1), $m$ is any real number greater than 1, $\mu_{ik}$ is the degree of membership of $y_i$ in cluster $k$, and $\| \cdot \|$ is any norm expressing the similarity between a measured series and a center. The parameter $m$ is called the fuzzifier or weighting coefficient. To perform fuzzy partitioning, the number of clusters and the weighting coefficient have to be chosen. The procedure is carried out through an iterative optimization of the objective function above, updating the cluster centers $c_k$ and the membership values $\mu_{ik}$ by:

$$c_k = \frac{\sum_{i=1}^{N} \mu_{ik}^{m} \, y_i}{\sum_{i=1}^{N} \mu_{ik}^{m}}, \qquad (2)$$

$$\mu_{ik} = \frac{1}{\sum_{j=1}^{K} \left( \dfrac{\| y_i - c_k \|}{\| y_i - c_j \|} \right)^{2/(m-1)}}. \qquad (3)$$

The loop stops when

$$\| U^{(l+1)} - U^{(l)} \| < \varepsilon,$$

where $\varepsilon$ is a small positive threshold and $l$ indexes the iteration steps. The algorithm is summarized in Box 1.
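The alternating updates of fuzzy c-means can be sketched in a few lines. This is a minimal illustration using the standard center and membership updates with the Euclidean norm; the function name `fuzzy_c_means`, the random initialization and the default values are assumptions made for the example, not part of the original text.

```python
import numpy as np

def fuzzy_c_means(Y, K, m=2.0, eps=1e-6, max_iter=300, seed=0):
    """Sketch of fuzzy c-means on the rows of Y (N series of length n)."""
    rng = np.random.default_rng(seed)
    N = Y.shape[0]
    # Random initial partition matrix whose rows sum to one (assumed choice).
    U = rng.random((N, K))
    U /= U.sum(axis=1, keepdims=True)
    for _ in range(max_iter):
        Um = U ** m
        # Center update: membership-weighted mean of the series.
        C = (Um.T @ Y) / Um.sum(axis=0)[:, None]
        # Euclidean distances of every series from every center.
        dist = np.linalg.norm(Y[:, None, :] - C[None, :, :], axis=2)
        dist = np.maximum(dist, 1e-12)  # guard against division by zero
        # Membership update, written as normalized inverse powers of distance.
        inv = dist ** (-2.0 / (m - 1.0))
        U_new = inv / inv.sum(axis=1, keepdims=True)
        converged = np.linalg.norm(U_new - U) < eps
        U = U_new
        if converged:
            break
    return U, C
```

Changing `m` in this sketch makes the dependence on the fuzzifier visible: values close to 1 give nearly crisp memberships, while large values push every row of `U` toward equal memberships.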

Box 1 Fuzzy c-means algorithm
Initialize: K = number of centers; m (1 < m < ∞); ε = a small threshold. Set the counter l = 1 and initialize the matrix U of the fuzzy c-partitions.
- Compute the cluster centers $c_k$ by using equation (2).
- Update the membership matrix $U = [\mu_{ik}]$ by using equation (3).
- If $\| U^{(l+1)} - U^{(l)} \| < \varepsilon$, stop; otherwise set l = l + 1 and repeat.

One limitation of fuzzy c-means clustering is its dependence on the value of the fuzzifier m. A large fuzzifier value tends to mask outliers in data sets, i.e. the larger m, the more clusters share their objects, and vice versa. For m → ∞ all data objects have identical membership in each cluster; as m → 1, the method becomes equivalent to k-means. The role of the weighting exponent has been well investigated in the literature. Pal and Bezdek (1995) suggested taking m ∈ [1.5, 2.5]. Dembélé and Kastner (2003) obtained the fuzzifier with an empirical method, calculating the coefficient of variation of a function of the distances between all objects of the entire data set. Yu et al. (2004) proposed a theoretical upper bound for m that can prevent the sample mean from being the unique optimizer of the fuzzy c-means objective function. Futschik and Carlisle (2005) searched for a minimal fuzzifier value for which the cluster analysis of the randomized data set produces no meaningful results, by comparing a modified partition coefficient for different values of both parameters. Schwämmle and Jensen (2010) showed that the optimal fuzzifier takes values far from its frequently used value of 2. The authors introduced a method to determine the value of the fuzzifier without using the current working data set. For high-dimensional data sets, the fuzzifier value depends directly on the dimension of the data set and its number of objects. For low-dimensional data sets with a small number of objects, the authors reduce the search space to find the optimal value of the fuzzifier. According to the authors, this improvement helps in choosing the right parameter and saves computational time when processing large data sets. On the basis of a robust selection analysis of the algorithm, Wu (2012) found that a large value of m makes the fuzzy c-means algorithm more robust to noise and outliers. The author suggested using a fuzzifier value between 1.5 and 4.

Since the weighting coefficient determines the fuzziness of the resulting classification, we propose a method that is independent of the choice of the fuzzifier. It combines two approaches, theoretically motivated for the unsupervised and supervised classification cases respectively. The first is the Probabilistic Distance (PD) clustering procedure defined by Ben-Israel and Iyigun (2008). The second is the well-known Boosting philosophy. From the PD approach we took the idea of determining the probabilities of each series belonging to any of the K clusters. As this probability is unequivocally related to the distance of each series from the cluster centers, there are no degrees of freedom in determining the membership matrix. From the Boosting approach (Freund and Schapire, 1997) we took the idea of weighting each series according to some measure of badness of fit, in order to define an unsupervised learning process based on a weighted re-sampling procedure. As a learner for the boosting procedure we use a smoothing spline approach. Among the smoothing spline techniques, we chose the penalized spline approach (Eilers and Marx, 1996) because of its flexibility and computational efficiency.

This paper is organized as follows: Section 2 contains our proposal, Section 3 reports the results of some experimental evaluation studies, and Section 4 presents some concluding remarks.
2 Boosted-oriented probabilistic clustering of time series

The key idea
The boosting approach is based on the idea that a supervised learning algorithm (weak learner) improves its performance by learning from its errors (Freund and Schapire, 1997). It is an ensemble method that works with a resampling procedure (Dietterich, 2000). The general idea is to run the supervised learning algorithm several times, assigning to each instance of the data set a weight that governs the resampling (with replacement) process during the iterations. The weights are set in such a way that misclassified instances get a larger weight than well-classified instances. In this way, the probability of being included in the sample during the iterations is higher for those instances for which the supervised learning algorithm returns a wrong classification. Boosting algorithms exist for both classification and regression problems (Freund and Schapire, 1997; Dietterich, 2000; Eibl and Pfeiffer, 2002; Gey and Poggi, 2006). In both cases the weighting system combines a synthetic index of the performance of the supervised learning algorithm with some index that represents the individual contribution of a given instance to the overall solution.

Our idea is to adapt the boosting philosophy to unsupervised learning problems, especially to non-hierarchical cluster analysis. In such a case there is no target variable but, as the goal is to assign each instance (i.e. a series) to a cluster, we have a target instance. In other words, we switch from a target-variable to a target-instance point of view. We take each cluster center as a representative instance for each series, and we take as a synthetic index of the global performance a loss function to be minimized. The probability of each instance belonging to a given cluster is taken as the individual contribution of that instance to the overall solution. In contrast to the boosting approach, the larger the probability that a given series is a member of a given cluster, the larger the weight of that series in the resampling process. As a learner, either a smoothing spline technique or a regression model can be used. We decided to use a penalized spline smoother because of its flexibility and computational efficiency. To define the probabilities of each series belonging to a given cluster we use the PD clustering approach (Ben-Israel and Iyigun, 2008). This approach allows us to define a suitable loss function and, at the same time, to propose a fuzzy clustering procedure that does not depend on the definition of a fuzzifier parameter.

P-splines in a nutshell
P-splines were introduced by Eilers and Marx (1996) as flexible smoothing procedures combining B-splines (de Boor, 1978) and difference penalties. Suppose we observe a set of data $\{x_j, y_j\}_{j=1}^{n}$, where the vector x indicates the independent variable (e.g. time) and y the dependent one. We want to describe the available measurements through an appropriate smooth function. Denote by $B_j(x; p)$ the value of the j-th B-spline of degree p defined on a domain spanned by equidistant knots (in the case of unequally spaced knots our reasoning can be generalized using divided differences). A curve that fits the data is given by $\hat{y}(x) = \sum_{j=1}^{n} \hat{a}_j B_j(x; p)$, where $\hat{a}_j$ (with j = 1, ..., n) are the estimated B-spline coefficients. Unfortunately, the curve obtained by minimizing $\| y - Ba \|^2$ with respect to a shows more variation than is justified by the data if a dense set of spline functions is used. To avoid this overfitting tendency, it is possible to estimate a using a generous number of basis functions in a penalized regression framework:

$$\hat{a} = \arg\min_{a} \; \| y - Ba \|^2 + \lambda \| D a \|^2, \qquad (4)$$

where D is a d-th order difference penalty matrix and λ is a smoothing parameter. Second- or third-order difference penalties are suitable in many applications.
The optimal spline coefficients follow from (4) as:

$$\hat{a} = (B^\top B + \lambda D^\top D)^{-1} B^\top y. \qquad (5)$$

The smoothing parameter λ controls the trade-off between smoothness and goodness of fit. For λ → ∞ the final estimates tend to a polynomial of degree d − 1 (a constant for d = 1), while for λ → 0 the smoother tends to interpolate the observations. Popular methods for smoothing parameter selection are the Akaike Information Criterion (AIC) and cross-validation (CV). AIC estimates the predictive log-likelihood by correcting the log-likelihood of the fitted model (Λ) by its effective dimension (ED): AIC = 2ED − 2Λ. Following Hastie and Tibshirani (1990), we can compute the effective dimension as $ED = \mathrm{tr}[(B^\top B + \lambda D^\top D)^{-1} B^\top B]$ for the P-spline smoother, and

$$-2\Lambda = 2 n \ln \hat{\sigma} + \sum_{j=1}^{n} \frac{(y_j - \hat{y}_j)^2}{\hat{\sigma}^2},$$

where $\hat{\sigma}$ is the maximum likelihood estimate of σ. But $\hat{\sigma}^2 = \sum_j (y_j - \hat{y}_j)^2 / n$, so the second term is a constant. Hence the AIC can be written as

$$\mathrm{AIC}(\lambda) = 2\,ED + 2 n \ln \hat{\sigma}.$$

The optimal parameter is the one that minimizes AIC(λ). Leave-one-out cross-validation (LOO-CV) chooses the value of λ that minimizes

$$\mathrm{CV}(\lambda) = \sqrt{\frac{1}{n} \sum_{j=1}^{n} \left( \frac{y_j - \hat{y}_j}{1 - h_{jj}} \right)^2},$$

where $h_{jj}$ is the j-th diagonal entry of $H = B (B^\top B + \lambda D^\top D)^{-1} B^\top$. Analogous to CV is the generalized cross-validation measure (Wahba, 1990)

$$\mathrm{GCV}(\lambda) = \frac{\sum_{j=1}^{n} (y_j - \hat{y}_j)^2}{(n - ED)^2},$$

where ED = tr(H). In analogy with cross-validation, we select the smoothing parameter that minimizes GCV(λ). All these selection procedures suffer from two drawbacks: (1) they require the computation of the effective model dimension, which can become time-consuming for long data series, and (2) they are sensitive to serial correlation in the noise around the trend. The L-curve (Hansen, 1992) and the derived V-curve criterion (Frasso and Eilers, 2015) overcome these drawbacks. The L-curve is a parameterized curve comparing the two ingredients of every regularization or smoothing procedure: badness of fit and roughness of the final estimate. For a P-spline smoother, the following quantities can be defined:

$$\omega(\lambda) = \| y - B\hat{a} \|^2, \qquad \theta(\lambda) = \| D\hat{a} \|^2.$$

The L-curve is obtained by plotting ψ(λ) = log(ω) against φ(λ) = log(θ). This plot typically shows an L-shaped curve, and the optimal amount of smoothing is located at the corner of the "L", found by maximizing the local curvature. The V-curve criterion offers a valuable simplification of the search by requiring the minimization of the Euclidean distance between adjacent points on the L-curve; moreover, as in plots of AIC or GCV, the graphical presentation of the V-curve has an axis for λ that can be read off. The V-curve criterion is computed as follows:

$$v(\lambda) = \sqrt{(\Delta \phi)^2 + (\Delta \psi)^2}, \qquad (6)$$

where Δ denotes the difference between adjacent points on a grid of λ values.
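The penalized fit and the V-curve search described in this section can be sketched as follows. This is an illustrative implementation under stated assumptions: the basis is built with a common truncated-power construction on equidistant knots, the grid of λ values is arbitrary, and the names `bspline_basis`, `pspline_fit` and `v_curve_lambda` are hypothetical helpers introduced for the example.

```python
import math
import numpy as np

def bspline_basis(x, xl, xr, ndx, deg=3):
    """B-spline basis (deg >= 1) on equidistant knots via truncated powers."""
    dx = (xr - xl) / ndx
    knots = xl + dx * np.arange(-deg, ndx + deg + 1)
    # Truncated power functions (x - t)_+^deg evaluated at every knot.
    P = np.maximum(x[:, None] - knots[None, :], 0.0) ** deg
    # Scaled differences of order deg + 1 turn them into B-splines.
    D = np.diff(np.eye(knots.size), n=deg + 1, axis=0)
    D = D / (math.factorial(deg) * dx ** deg)
    return (-1) ** (deg + 1) * P @ D.T

def pspline_fit(B, y, lam, d=2):
    """Penalized least squares with a d-th order difference penalty."""
    D = np.diff(np.eye(B.shape[1]), n=d, axis=0)
    a = np.linalg.solve(B.T @ B + lam * D.T @ D, B.T @ y)
    return a, D

def v_curve_lambda(B, y, log10_lams, d=2):
    """Pick lambda where adjacent L-curve points are closest together."""
    psi, phi = [], []
    for ll in log10_lams:
        a, D = pspline_fit(B, y, 10.0 ** ll, d)
        psi.append(np.log(np.sum((y - B @ a) ** 2)))  # log badness of fit
        phi.append(np.log(np.sum((D @ a) ** 2)))      # log roughness
    v = np.hypot(np.diff(psi), np.diff(phi))
    midpoints = (log10_lams[:-1] + log10_lams[1:]) / 2
    return 10.0 ** midpoints[np.argmin(v)]
```

A typical call would build `B = bspline_basis(x, x.min(), x.max(), ndx)` once per series and pass an evenly spaced grid such as `np.linspace(-2, 4, 25)` as `log10_lams`; the selected value is reported at the midpoint of the two closest grid points.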

PD clustering approach
Let $D$ be a data set consisting of $N$ series $\{y_1, y_2, \ldots, y_N\} \subset \mathbb{R}^n$ and let $C_k$ be the k-th cluster, with k = 1, ..., K, partitioning $D$. We suppose that all series share the same domain of length n.
Each cluster $C_k$ is associated with a cluster center $c_k$, with k = 1, ..., K.
Let $d_{i,k} = d(y_i, c_k)$ be the distance of the i-th series from the k-th cluster center.
Let $P_{i,k} = P(y_i, C_k)$ be the probability that the i-th series belongs to the k-th cluster.
For each series $y_i \in D$ and each cluster $C_k$, we assume the following relation between probabilities and distances (Ben-Israel and Iyigun, 2008):

$$P_{i,k} \, d_{i,k} = \text{constant, depending on } y_i. \qquad (7)$$

The constant in (7) depends only on the series $y_i$ and is independent of the cluster k. Equation (7) allows us to define the membership probabilities as (Heiser, 2004; Ben-Israel and Iyigun, 2008)

$$P_{i,k} = \frac{\prod_{j \ne k} d_{i,j}}{\sum_{t=1}^{K} \prod_{j \ne t} d_{i,j}}. \qquad (8)$$
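A short sketch shows how membership probabilities follow from distances under the PD principle that probability times distance is constant across clusters. When all distances are positive this is equivalent to normalizing the reciprocal distances. The function name `pd_membership` is hypothetical, and the handling of a series that coincides with more than one center (equal sharing) is an assumption made for the example.

```python
import numpy as np

def pd_membership(dist):
    """Membership probabilities from an N x K matrix of distances."""
    dist = np.asarray(dist, dtype=float)
    P = np.zeros_like(dist)
    zero = dist == 0
    has_zero = zero.any(axis=1)
    # A series sitting on a center gets probability one for that center
    # (shared equally if it coincides with several centers: assumed rule).
    P[has_zero] = zero[has_zero] / zero[has_zero].sum(axis=1, keepdims=True)
    # Otherwise P[i, k] is proportional to 1 / d[i, k], so that
    # P[i, k] * d[i, k] is the same for every cluster k.
    inv = 1.0 / dist[~has_zero]
    P[~has_zero] = inv / inv.sum(axis=1, keepdims=True)
    return P
```

No tuning parameter appears anywhere in this computation, which is the property the proposal relies on: once the distance is chosen, the membership matrix is fully determined.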

The algorithm
Since the probabilities defined in equation (8) sum to one across the clusters, we use the quantity $\sum_{k=1}^{K} P_{i,k}^2$ as a measure of how well the i-th series is represented by the overall solution of the clustering procedure. It is easy to see that $\sum_{k=1}^{K} P_{i,k}^2 = 1$ if the i-th series coincides with the k-th cluster center, and that $\sum_{k=1}^{K} P_{i,k}^2 = K^{-1}$ if there is maximum uncertainty in assigning the i-th series to any cluster center. For this reason we use as a measure of the compliance of the cluster solution the quantity BC defined in equation (9). Equation (9) is a synthetic measure of clustering uncertainty: the lower its value, the better the solution. It equals zero when there is a perfect solution (i.e., each series has probability one of belonging to some cluster). The maximum possible value of equation (9) is 1, attained when each series has probability $K^{-1}$ of belonging to each of the K clusters. The BC index allows us to compare overall clustering solutions when the number K of clusters differs.
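As an illustration of the properties just listed, the sketch below computes one normalization that satisfies them: it is 0 when every series belongs to a single cluster with probability one and 1 when every series has probability 1/K for each cluster. The specific form used here, K/(K − 1) times one minus the average of the summed squared probabilities, is an assumption chosen to match those bounds and is not necessarily the exact expression of equation (9); the name `bc_index` is hypothetical.

```python
import numpy as np

def bc_index(P):
    """Uncertainty of a fuzzy partition, scaled to lie in [0, 1]."""
    P = np.asarray(P, dtype=float)
    N, K = P.shape
    # Per-series compliance: 1 for a crisp assignment, 1/K for a uniform one.
    compliance = (P ** 2).sum(axis=1)
    # Assumed normalization mapping those two extremes to 0 and 1.
    return K / (K - 1) * (1.0 - compliance.mean())
```

Because the scaling depends on K, values computed for solutions with different numbers of clusters lie on the same 0 to 1 range.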
From equation (9) we define the loss function to be minimized as in equation (10). Let $\gamma_{i,k} = d_{i,k} / \max_{k=1,\ldots,K} d_{i,k}$ be the contribution of the i-th series to generating the k-th cluster. Let Γ be an N × K indicator matrix whose entries are 1 if $P_{i,k} > P_{i,h}$ (k, h = 1, ..., K, k ≠ h) and −1 otherwise. We define the weight of the i-th series for the k-th cluster as in equation (11). For each cluster k, the weights are first normalized; a further adjustment is then applied within each cluster. For each cluster k, a sample $L^{(k)}$ is extracted with replacement from D, taking into account equation (11). Then the cluster centers $c_k = B\hat{a}$, k = 1, ..., K, are estimated by using a P-spline smoother. These centers are then used to compute the membership probabilities according to equation (8) for the next iteration. The cluster centers are re-estimated and adaptively updated with an optimal spline smoother.
The choice of the metric depends on the nature of the series. The optimal P-spline smoothing procedure frames our approach in the class of model-based clustering techniques, but any suitable smoother can be adopted. Box 2 shows the pseudo-code of our Boosted-Oriented Smoothing-Spline Probabilistic Clustering algorithm. The procedure described in Box 2 is repeated a certain number of times because of the sensitivity of the final solution to the random choice of the initial cluster centers.
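The weighted resampling step can be illustrated with a small sketch. It assumes that a non-negative N × K weight matrix `W` is already available; how the weights are derived is not reproduced here. It also replaces the P-spline smoother with a plain pointwise mean as a stand-in learner. Both simplifications, and the name `resample_centers`, are assumptions made for the example.

```python
import numpy as np

def resample_centers(Y, W, rng):
    """Draw one weighted bootstrap sample per cluster and re-estimate centers."""
    N, K = W.shape
    # Turn each column of weights into sampling probabilities.
    probs = W / W.sum(axis=0, keepdims=True)
    centers = np.empty((K, Y.shape[1]))
    for k in range(K):
        # Sample N series with replacement; heavier series are drawn more often.
        idx = rng.choice(N, size=N, replace=True, p=probs[:, k])
        # Stand-in learner: pointwise mean instead of a P-spline smoother.
        centers[k] = Y[idx].mean(axis=0)
    return centers
```

In a full iteration the returned centers would feed the distance and membership computations of the next pass, and any smoother could take the place of the mean.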

Experimental evaluation
To evaluate the performance of the proposed algorithm, we conducted three experiments. In estimating the optimal P-spline smoother, we always used the V-curve criterion of equation (6) to select the optimal λ parameter, and we used a number of interior knots equal to min(n/4, 40), where n is the length of the time domain, as suggested by Ruppert (2002). Moreover, we need a measure of the goodness of fuzzy partitions. To this end, we decided to use a fuzzy variant of the Rand index proposed by Hüllermeier et al. (2012). This index is defined as the complement to 1 of the normalized sum of degrees of discordance. The Rand index developed by Rand (1971) is an external evaluation measure for comparing clustering partitions of a data set. The problem with evaluating the solution of a fuzzy clustering algorithm with the Rand index is that it requires converting the soft partition into a hard one, which loses information. As shown by Campello (2007), different fuzzy partitions describing different structures in the data may lead to the same crisp partition and hence to the same Rand index value. For this reason the Rand index is not appropriate for fuzzy clustering assessment.
Box 2 Boosted-oriented smoothing-spline probabilistic clustering of time series
input: D
initialize: maxiter = maximum number of iterations; K = the number of clusters; d = a suitable distance measure; c_k, k = 1, ..., K, random cluster centers.
for iter = 1:maxiter do
- compute the distances d_{i,k} ∀ i, k;
- compute the membership probabilities P = [P_{i,k}] ∀ i, k as in equation (8);
- compute β^{[iter]} as in equation (10);
- assign the weights to each series for each cluster and compute the N × K matrix W as in equation (11);
[…]
end for
end if
end for
output: estimated cluster centers c*_k, membership probabilities matrix P.
To overcome this problem, Hüllermeier et al. (2012) proposed a generalization of the Rand index for fuzzy partitions. We recall some essential background. Let $P = \{P_1, \ldots, P_K\}$ be a fuzzy partition of the data set D; each element $y_i \in D$ is characterized by its membership vector:

$$P(y_i) = (P_1(y_i), P_2(y_i), \ldots, P_K(y_i)) \in [0, 1]^K, \qquad (12)$$

where $P_k(y_i)$ is the degree of membership of the i-th series in the k-th cluster $P_k$. Given any pair $(y_i, y_{i'}) \in D$, Hüllermeier et al. (2012) defined a fuzzy equivalence relation on D in terms of a similarity measure on the associated membership vectors (12). Generally, this relation is of the form:

$$E_P(y_i, y_{i'}) = 1 - \| P(y_i) - P(y_{i'}) \|,$$

where $\| \cdot \|$ represents the $L_1$ norm divided by 2, which constitutes a proper metric on $[0, 1]^K$ and yields values in [0, 1]. $E_P$ is equal to 1 if and only if $y_i$ and $y_{i'}$ have the same membership pattern, and is smaller than 1 otherwise. The basic idea of the authors for reaching the fuzzy extension of the Rand index was to generalize the concept of concordance as follows. Given two fuzzy partitions P and Q, and considering a pair $(y_i, y_{i'})$ as concordant insofar as P and Q agree on its degree of equivalence, they defined the degree of concordance as

$$\mathrm{conc}(y_i, y_{i'}) = 1 - | E_P(y_i, y_{i'}) - E_Q(y_i, y_{i'}) |$$

and the degree of discordance as:

$$\mathrm{disc}(y_i, y_{i'}) = | E_P(y_i, y_{i'}) - E_Q(y_i, y_{i'}) |.$$

Finally, the distance measure proposed by Hüllermeier et al. (2012) is defined as the normalized sum of degrees of discordance:

$$d(P, Q) = \frac{\sum_{i < i'} | E_P(y_i, y_{i'}) - E_Q(y_i, y_{i'}) |}{N(N-1)/2}.$$

The direct generalization of the Rand index corresponds to the normalized degree of concordance and is equal to:

$$R_E(P, Q) = 1 - d(P, Q),$$

and it reduces to the original Rand index when the partitions P and Q are non-fuzzy.
To obtain the true fuzzy partition, we always computed the true cluster centers with an optimal P-spline smoother, and then computed the true probabilities by applying equation (8).
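The fuzzy Rand index described above reduces to a few array operations. The sketch below follows the verbal definition (equivalence as one minus half the L1 distance between membership vectors, discordance as the absolute difference between the two equivalence degrees, averaged over all pairs); the function name `fuzzy_rand_index` is a hypothetical choice for the example.

```python
import numpy as np

def fuzzy_rand_index(P, Q):
    """Fuzzy Rand index between two membership matrices with N rows each."""
    P = np.asarray(P, dtype=float)
    Q = np.asarray(Q, dtype=float)
    N = P.shape[0]

    def equivalence(M):
        # 1 minus half the L1 distance between every pair of membership vectors.
        half_l1 = 0.5 * np.abs(M[:, None, :] - M[None, :, :]).sum(axis=2)
        return 1.0 - half_l1

    # Keep each unordered pair (i < i') once.
    pairs = np.triu_indices(N, k=1)
    discordance = np.abs(equivalence(P) - equivalence(Q))[pairs]
    # Normalized concordance = 1 - mean discordance over N(N-1)/2 pairs.
    return 1.0 - discordance.mean()
```

The two partitions may have different numbers of columns, since only the pairwise equivalence degrees are compared. With crisp (0/1) membership matrices the value coincides with the classical Rand index.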

Simulated data
As a first experiment, we generated K = 6 clusters of numerical series at n = 10 equally spaced time points in [0, 1], as described in Coffey et al. (2014). Distinct cluster-specific models were used (subscript i refers to the series, subscript j to the time domain), with $\sigma_u^2$ ranging from 0.3 to 1 and $\varepsilon_{ij}$ following an autoregressive model of order 1. Cluster means were chosen to reflect the situation in which some series show little variation over time (as in cluster 3) and others have a distinct signal over time. Cluster sizes were equal to 90, 50, 100, 25, 60 and 35 for clusters 1 to 6 respectively, giving a total of 360 simulated series. The data set is plotted in Figure 1. Given the nature of the simulated series, we are interested in the similarity of the shapes of the series. For this reason the chosen metric was the Penrose shape distance (Penrose, 1952) of equation (13), which is based on $d_{i,j}^2$, the squared average Euclidean distance coefficient. We performed five analyses with 100, 500, 1000, 5000 and 10000 boosting iterations. In all cases we used 10 random starting points. Figure 2 shows the behavior of the BC function defined in equation (9) during the boosting iterations. In this case the BC values appear to be non-increasing as the number of iterations increases. The values of the BC function are equal to 0.3615, 0.2783, 0.2643, 0.2584 and 0.2583 for 100, 500, 1000, 5000 and 10000 boosting iterations respectively. All the solutions in fact return the same results in terms of estimated centers: for example, Figure 3 shows the estimated cluster centers for each cluster as returned by the first analysis. For this data set, using the Penrose shape distance, the Fuzzy Rand index is equal to 0.8599, 0.8954, 0.9059, 0.9178 and 0.9194 for the solutions with 100, 500, 1000, 5000 and 10000 boosting iterations respectively. Even though the solutions in terms of "hard" clustering are the same, the values of the Fuzzy Rand index indicate that the partitions returned by the proposed algorithm are very close to the true one. The true value of the BC index is 0.1977.
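A shape distance of the Penrose type can be sketched as follows. The form used here, n/(n − 1) times the difference between the mean squared difference and the squared mean difference of the two series, is the common textbook version of Penrose's shape coefficient and is an assumption with respect to the exact expression of equation (13); the name `penrose_shape_distance` is hypothetical.

```python
import numpy as np

def penrose_shape_distance(a, b):
    """Shape distance between two series of equal length n (n >= 2)."""
    diff = np.asarray(a, dtype=float) - np.asarray(b, dtype=float)
    n = diff.size
    d2 = np.mean(diff ** 2)   # squared average Euclidean distance coefficient
    q2 = np.mean(diff) ** 2   # "size" component: squared mean difference
    return n / (n - 1) * (d2 - q2)
```

Two series that differ only by a vertical shift have distance zero under this measure, which is why it suits analyses that focus on the shape of the curves rather than on their level.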

Synthetic data set
The Synthetic.tseries data set is freely available from the TSclust R package (Montero and Vilar, 2014). It consists of three partial realizations of length n = 200 of each of six first-order autoregressive models. Figure 4 shows the six groups of series separately. Subplot (a) shows an AR(1) process with moderate autocorrelation. Subplot (b) contains series from a bilinear process with approximately quadratic conditional mean. Subplot (c) is formed by an exponential autoregressive model with a more complex non-linear structure. Subplot (d) shows a self-exciting threshold autoregressive model with a relatively strong non-linearity. Subplot (e) contains series generated by a general non-linear autoregressive model, and subplot (f) shows a smooth transition autoregressive model presenting a weak non-linear structure. As we did not generate these series, we do not report the simulation setting in full. For more details about the generating models we refer to Montero and Vilar (2014), p. 24. Assuming that the aim of cluster analysis is to discover the similarity between underlying models, the "true" cluster solution is given by the six clusters, each containing the three series from the same generating model. Given the nature of the data set, we use the periodogram-based distance measure proposed by Caiado et al. (2006). It assesses the dissimilarity between the spectral representations of the time series. As also suggested by Montero and Vilar (2014), the frequency-domain approach is an interesting alternative for measuring the dissimilarity between time series. Power spectrum analysis is concerned with the distribution of the signal power in the frequency domain. The power spectral density is defined as the Fourier transform of the autocorrelation function of the i-th series, which measures the self-similarity of a signal with its delayed version. The classic method for estimating the power spectral density of an n-sample record is the periodogram introduced by Schuster (1897). Let $y$ and $y'$ be two time series of length n. Let $f_j = 2\pi j / n$, j = 1, ..., n/2, in the range 0 to π, be the frequencies of the series. Let $PSD_y(f_j) = \frac{1}{n} \left| \sum_{t=1}^{n} y_t \exp(-\iota t f_j) \right|^2$ and $PSD_{y'}(f_j) = \frac{1}{n} \left| \sum_{t=1}^{n} y'_t \exp(-\iota t f_j) \right|^2$ be the periodograms of the series $y$ and $y'$, respectively. Finally, the dissimilarity measure between $y$ and $y'$ proposed by Caiado et al. (2006) is defined as the Euclidean distance between periodogram ordinates:

$$d(y, y') = \sqrt{\sum_{j=1}^{n/2} \left[ PSD_y(f_j) - PSD_{y'}(f_j) \right]^2}. \qquad (14)$$

We performed our analysis with 800 boosting iterations and 10 random starting points.
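The periodogram-based dissimilarity can be computed with a fast Fourier transform. The sketch below evaluates the periodogram at the frequencies 2πj/n for j = 1, ..., n/2 and takes the Euclidean distance between the ordinates; the function names are hypothetical, and the FFT's indexing from t = 0 rather than t = 1 only changes the phase, not the squared modulus.

```python
import numpy as np

def periodogram(y):
    """Periodogram ordinates at frequencies 2*pi*j/n, j = 1, ..., n // 2."""
    y = np.asarray(y, dtype=float)
    n = y.size
    spectrum = np.fft.fft(y)
    # Drop the zero frequency and keep the first n // 2 positive frequencies.
    return np.abs(spectrum[1:n // 2 + 1]) ** 2 / n

def periodogram_distance(y1, y2):
    """Euclidean distance between the periodogram ordinates of two series."""
    return np.sqrt(np.sum((periodogram(y1) - periodogram(y2)) ** 2))
```

Both series must have the same length so that their ordinates refer to the same frequencies.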
Table 1 shows the results of applying our algorithm to the Synthetic.tseries data set. Each series is assigned to the estimated cluster according to the values of the membership probability matrix. In order to obtain the Fuzzy Rand index, we computed the true cluster centers from periodograms modeled by P-splines, and then computed the true probabilities by applying equation (8) with the periodogram-based distance of equation (14). The Fuzzy Rand index is equal to 0.9698. The solution in terms of "hard" clustering is excellent (only one series is misclassified), and the value of the Fuzzy Rand index indicates that the partition returned by the algorithm is very close to the true one.
Table 1 about here.

A real data example
The "growth" data set is freely available in the R package fda (Ramsay et al., 2012). It comes from the Berkeley Growth Study (Tuddenham and Snyder, 1954). The left-hand side of Figure 5 shows the growth curves of 93 children, 39 boys and 54 girls, from the age of one year to the age of 18. The right-hand side of the same figure displays the corresponding growth velocities.
Figure 5 about here.
In the framework of cluster analysis, this data set has mainly been used for problems of clustering misaligned data (Sangalli et al., 2010a, 2010b). We performed two analyses with 800 boosting iterations and 10 random starting points, with K = 2. In the first partitioning analysis we used the Euclidean distance. The estimated centers of the growth curves and of the growth velocity curves are displayed in the left- and right-hand sides of Figure 6, respectively. As can be noted, the Euclidean distance discriminates between children who grow more and children who grow less. This can be appreciated by looking at the left-hand side of the same figure. On average, as expected, boys grow more than girls.
Figure 6 about here.
Nevertheless, the Euclidean distance does not seem to be the right measure in such a case. Researchers are probably interested in the shape of both the growth and the growth velocity curves over the years. For this reason, we repeated the analysis using the Penrose shape distance defined in equation (13). Figure 7 shows the estimated centers for both the growth and the growth velocity curves. The estimated centers are very similar to the ones obtained by Sangalli et al. (2010a, 2010b). Firstly, as confirmed by comparing Tables 4 and 5 with Tables 2 and 3, there is a neat separation of boys and girls. Secondly, looking at the right-hand side of Figure 7, boys start to grow later but seem to have a more pronounced growth, as can be seen from the higher peak around 15 years of age. With the Euclidean distance, the Fuzzy Rand index is equal to 0.8884 and 0.8240 for the partitions of the growth and growth velocity curves respectively. With the Penrose shape distance, it is equal to 1.000 and 0.9246 for the partitions of the growth and growth velocity curves respectively.

Concluding remarks
In this paper we merged two approaches, theoretically motivated for the unsupervised and supervised classification cases respectively, to propose a new non-hierarchical fuzzy clustering algorithm.
From the Probabilistic Distance (PD) clustering approach (Ben-Israel and Iyigun, 2008) we took the idea of determining the probabilities of each series belonging to any of the K clusters. As this probability is directly related to the distance of each series from the cluster centers, there are no degrees of freedom in determining the membership matrix.
From the Boosting approach (Freund and Schapire, 1997) we took the idea of weighting each series according to some measure of badness of fit, in order to define an unsupervised learning process based on a weighted resampling procedure. In contrast to the boosting approach, the higher the probability that a given instance is a member of a given cluster, the higher the weight of that instance in the resampling process. As a learner we can use any smoothing spline technique. We used a P-spline smoother (Eilers and Marx, 1996) because of its nice properties, and we chose the optimal smoothing parameter with the V-curve criterion as defined by Frasso and Eilers (2015). In this way we defined a suitable loss function and, at the same time, proposed a fuzzy clustering procedure that does not depend on the definition of a fuzzifier parameter.
To evaluate the performance of our proposal, we conducted three experiments, one on simulated data and the remaining two on data sets known in the literature. The results show that our boosted-oriented procedure performs well in terms of data partitioning. Although the final fuzzy partition is sensitive to the choice of the distance measure, it is independent of any other input parameter. This allows us to define a suitable true fuzzy partition against which to evaluate the final solution in terms of the Fuzzy Rand index (Hüllermeier et al., 2012). The weighted re-sampling process allows each series to contribute to the composition of each cluster, and the adaptive estimation of the cluster centers allows the algorithm to learn from its progress. It is worth noting that, as in any partitioning problem, the choice of the distance measure can influence the goodness of the partition.

Tables

Figure 1 about here.

Figure 2 about here.

Figure 3 about here.

Figure 4 about here.

Figure 7 about here.

Figures

Figure 1: Data set generated for simulation study.

Figure 5: Growth curves (left-hand side) and growth velocity curves (right-hand side) of 93 children from Berkeley Growth Study data.

Figure 6: Estimated centers of growth curves (left-hand side) and growth velocities (right-hand side): Euclidean distance.

Figure 7: Estimated centers of growth curves (left-hand side) and growth velocities (right-hand side): Penrose shape distance.

Table 1: Confusion matrix from clustering on the Synthetic.tseries data set.

Table 2 about here.

Table 3 about here.

Table 4 about here.

Table 5 about here.

Table 2: Confusion matrix of growth curves with the Euclidean distance. Series have been assigned to the clusters according to the values of the membership probabilities computed as in equation (8).

Table 3: Confusion matrix of growth velocity curves with the Euclidean distance. Series have been assigned to the clusters according to the values of the membership probabilities computed as in equation (8).

Table 4: Confusion matrix of growth curves with the Penrose shape distance. Series have been assigned to the clusters according to the values of the membership probabilities computed as in equation (8).

Table 5: Confusion matrix of growth velocity curves with the Penrose shape distance. Series have been assigned to the clusters according to the values of the membership probabilities computed as in equation (8).