1 Introduction

Today, time-series data management has become an interesting research topic for data miners. In particular, the clustering of time series has attracted considerable interest.

Clustering is the process of finding natural groups, called clusters, such that the grouping maximizes inter-cluster variance while minimizing intra-cluster variance [1]. Most clustering techniques fall into two major categories, partition-based clustering and hierarchical clustering [2]. Many of the traditional clustering algorithms use the Euclidean distance or Pearson's correlation coefficient to measure the proximity between data points. However, for time-series data these measures involve the individual magnitudes at each time point, so the traditional algorithms perform poorly on time-series expression data. To overcome these limitations, the proposed work represents the variations in the measurements of the time series for a fast implementation of an efficient agglomerative nesting algorithm. The focus of this work is on fast whole-sequence similarity search over the changes with respect to time rather than over the values in the time-series data.

The rest of the paper is organized as follows: Sect. 2 presents a brief review of related work. Sections 3 and 4 demonstrate the basic concepts and present the analysis of the proposed algorithm respectively. Sections 5 and 6 present the experimental results and the conclusions along with some future directions.

2 Related Work

Many clustering algorithms have been proposed, such as k-means, DBSCAN, STING, p-cluster and COD [46]. One of the recently proposed algorithms is the VCD algorithm [3], which analyzes the trends of expressions based on their variation over time using the cosine similarity measure with two user inputs. It was later enhanced as the EVCD algorithm [2] for the same purpose with a single user input; EVCD provides results at several levels, which allows the user to select the most appropriate level by using different parameters such as the silhouette coefficient, the number of clusters and the cluster density. Both the Enhanced Variation Co-expression Detection (EVCD) and VCD algorithms [2, 3] inferred that the cosine similarity measure was the most appropriate similarity measure for clustering time-varying microarray data.

3 Concepts and Definition

In order to determine the variation patterns in a time series based on the changes in the values observed at fixed time points, binarization of the change has been proposed. Some related definitions are presented in this section.

3.1 Variation Vector

Let a univariate time series be a sequence of n + 1 measurements observed at time periods t0, t1, t2, …, tn, say \( {\text{Y}} = \left\langle {{\text{y}}_{0} , {\text{y}}_{1} , {\text{y}}_{2} \ldots {\text{y}}_{\text{n}} } \right\rangle \in {\mathbb{R}}^{{{\text{n}} + 1}} \). A variation vector \( {\text{Y}}_{\text{v}} \in {\mathbb{R}}^{\text{n}} \) of Y is the sequence of differences denoted by \( {\text{Y}}_{\text{v}} = \left\langle {{\text{d}}_{1} , {\text{d}}_{2} \ldots {\text{d}}_{\text{n}} } \right\rangle \), where \( {\text{d}}_{\text{i}} = {\text{y}}_{\text{i}} - {\text{y}}_{{{\text{i}} - 1}} \) for \( 1\le {\text{i}} \le {\text{n}} \). An increase in the measurement \( \left( {{\text{y}}_{\text{i}} \ge {\text{y}}_{{{\text{i}} - 1}} } \right) \) and its magnitude are represented by a difference \( {\text{d}}_{\text{i}} \ge 0 \); similarly, a decrease \( \left( {{\text{y}}_{\text{i}} < {\text{y}}_{{{\text{i}} - 1}} } \right) \) yields di < 0.

The trend is the tendency of a continuous process that is measured during a fixed time interval. Trend analysis may traditionally be carried out by plotting a trend curve or a trend line and by monitoring the increase (decrease) in the values. Thus trend analyses involve observing the tendencies of the values by analyzing the changes that occur, in terms of the quantum of the change and/or the nature of the change. The pattern of increase or decrease in the values of the measurements may play a significant role in trend analyses. A variation vector quantifies the difference in the measurements at two consecutive time periods, say ti and ti+1, in terms of the magnitude di. The direction of change, increase or decrease, is captured by the positive or negative sign of di respectively. Therefore, a binary representation of the direction of change is suitable for computational efficiency. Binarization of the change of any time series is proposed in the form of a direction vector. Further, trend similarity based on a distance metric on n-dimensional binary vectors is defined.

3.2 Direction Vector

For a variation vector, \( {\text{Y}}_{\text{v}} = \left\langle {{\text{v}}_{1} , {\text{v}}_{2} , \ldots ,{\text{v}}_{\text{n}} } \right\rangle \in {\mathbb{R}}^{\text{n}} \), a direction vector \( {\text{Y}}_{\text{d}} \in \left\{ {0, 1} \right\}^{\text{n}} \) is defined as \( {\text{Y}}_{\text{d}} = \left\langle {{\text{b}}_{1} , {\text{b}}_{2} , \ldots ,{\text{b}}_{\text{n}} } \right\rangle \),

where,

$$ {\text{b}}_{\text{i}} = \begin{cases} 0 & {\text{if }}{\text{v}}_{\text{i}} \ge 0 \\ 1 & {\text{if }}{\text{v}}_{\text{i}} < 0 \end{cases} $$
(1)

Example 1:

Consider two time series \( {\text{T}}_{1} = \left\langle {3, 7, 2, 0, 4, 5, 9, 7, 2} \right\rangle \) and \( {\text{T}}_{2} = \left\langle {10, 15, 11, 5, 19, 25, 27, 24, 13} \right\rangle \). The corresponding variation vectors are \( {\text{V}}_{1} = \left\langle {4, - 5, - 2, 4, 1, 4, - 2, - 5} \right\rangle \) and \( {\text{V}}_{2} = \left\langle {5, - 4, - 6, 14, 6, 2, - 3, - 11} \right\rangle \). The direction vectors of T1 and T2 are \( {\text{D}}_{1} = \left\langle {0, 1,1, 0, 0, 0, 1, 1} \right\rangle \) and \( {\text{D}}_{2} = \left\langle {0, 1,1, 0, 0, 0, 1, 1} \right\rangle \) respectively.
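
As an illustration, a minimal Python sketch of the two constructions is given below (Python is used here purely for illustration and the function names are ours; the paper's implementation is on the .NET platform, see Sect. 5.2). It reproduces the vectors of Example 1.

```python
def variation_vector(y):
    """Variation vector of Sect. 3.1: differences d_i = y_i - y_{i-1}."""
    return [y[i] - y[i - 1] for i in range(1, len(y))]

def direction_vector(y):
    """Direction vector of Eq. (1): 0 for a non-decrease, 1 for a decrease."""
    return [0 if d >= 0 else 1 for d in variation_vector(y)]

T1 = [3, 7, 2, 0, 4, 5, 9, 7, 2]
T2 = [10, 15, 11, 5, 19, 25, 27, 24, 13]
print(variation_vector(T1))   # [4, -5, -2, 4, 1, 4, -2, -5]
print(direction_vector(T1))   # [0, 1, 1, 0, 0, 0, 1, 1]
print(direction_vector(T2))   # [0, 1, 1, 0, 0, 0, 1, 1]
```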

3.3 Trend Similarity

Let two time series \( {\text{X}} = \left\langle {{\text{x}}_{0} , {\text{x}}_{1} , {\text{x}}_{2} , \ldots ,{\text{x}}_{\text{n}} } \right\rangle \) and \( {\text{Y}} = \left\langle {{\text{y}}_{0} , {\text{y}}_{1} , {\text{y}}_{2} , \ldots ,{\text{y}}_{\text{n}} } \right\rangle \) be measured at times t0, t1, …, tn. Let \( {\text{X}}_{\text{v}} = \left\langle {{\text{v}}_{1} , {\text{v}}_{2} , \ldots ,{\text{v}}_{\text{n}} } \right\rangle \) and \( {\text{Y}}_{\text{v }} = \left\langle {{\text{u}}_{1} , {\text{u}}_{2} , \ldots ,{\text{u}}_{\text{n}} } \right\rangle \) be the corresponding variation vectors, and \( {\text{X}}_{\text{d}} = \left\langle {{\text{l}}_{1} , {\text{l}}_{2} , \ldots ,{\text{l}}_{\text{n}} } \right\rangle \) and \( {\text{Y}}_{\text{d}} = \left\langle {{\text{s}}_{1} , {\text{s}}_{2} , \ldots ,{\text{s}}_{\text{n}} } \right\rangle \) the corresponding direction vectors. Then X and Y are said to be similar in trend if and only if li = si for 1 ≤ i ≤ n.

Both direction vectors Xd and Yd are n-bit binary vectors. For each i, if xi ≥ xi−1 in series X, i.e. vi ≥ 0, then li = 0, and li = 1 otherwise. In the case of the time series Y, the bit value of si depicts an increase of the value at ti over the value at ti−1, i.e. ui ≥ 0 and correspondingly si = 0, and vice versa. If li = si for each i, then Y is said to be trend similar to X. It may be noted that the magnitudes of the differences in the two time series are not considered in this definition of similarity; only the direction of change, i.e. increase or decrease, is considered. The information in the direction vector may be utilized to determine the degree of similarity.

Example 2:

Consider the direction vectors D1 and D2 of the above example, corresponding to the two time series T1 and T2, each of length 9. The magnitudes of the differences are represented by the variation vectors V1 and V2. It may be noted that for each i, \( 1 \le {\text{i}} \le 8 \), \( {\text{V}}_{{1{\text{i}}}} \ne {\text{V}}_{{2{\text{i}}}} \). However, D1 and D2 are bit-wise equal, i.e. D1i = D2i for \( 1 \le {\text{i}} \le 8 \); therefore, the two series T1 and T2 are observed to be similar in trend.
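
Under this definition, and reusing the helper functions sketched after Example 1 (our own illustrative code, not the authors'), the trend-similarity test reduces to an equality check on the direction vectors:

```python
def trend_similar(x, y):
    """True iff the two series have identical direction vectors (Sect. 3.3)."""
    return direction_vector(x) == direction_vector(y)

print(trend_similar(T1, T2))  # True, as observed in Example 2
```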

The following metric for measuring the distance between two n-dimensional binary vectors has been considered in this work. Let \( \beta = \left\{ {0, 1} \right\} \) and \( I_{n} = \left\{ {0, 1, 2 \ldots n} \right\} \). Then the binary function \( d_{binary} : \beta \times \beta \to \beta \) is defined, for \( b_{1} , b_{2} \in \beta \), as

$$ d_{binary} \left( {b_{1} , b_{2} } \right) = \begin{cases} 0 & {\text{if }} b_{1} = b_{2} \\ 1 & {\text{otherwise}} \end{cases} $$
(2)

Then the distance function between a pair of n-dimensional binary vectors is \( d_{n} :\beta^{n} \times \beta^{n} \to I_{n} \). Consider two n-dimensional binary vectors, say \( D_{1} , D_{2} \in \beta^{n} \):

$$ d_{n} \left( {D_{1} , D_{2} } \right) = \sum\nolimits_{j = 1}^{n} { d_{binary} \left( {b_{1j} , b_{2j} } \right)} $$
(3)

Let \( d_{n} \left( {D_{1} , D_{2} } \right) = k \). Then k = 0 if \( \sum\nolimits_{i = 1}^{n} {d_{binary} \left( {b_{1i} , b_{2i} } \right) = 0} \) and k = n if \( \sum\nolimits_{i = 1}^{n} {d_{binary} \left( {b_{1i} , b_{2i} } \right) = n} \). Therefore, \( 0 \le k \le n \).

Example 3:

Consider the following two time series, \( T_{1} = \left\langle { 3, 7, 2, 0, 4, 5, 9, 7, 2} \right\rangle \) and \( T_{3} = \left\langle {45, 80, 22, 10, 40, 63, 45, 90, 10} \right\rangle \). The variation vectors V1 and V3 of T1 and T3 are \( V_{1} = \left\langle {4, - 5, - 2, 4, 1, 4, - 2, - 5} \right\rangle \) and \( V_{3} = \left\langle {35, - 58, - 12, 30, 23, - 18, 45, - 80} \right\rangle \), and the direction vectors D1 and D3 are \( D_{1} = \left\langle {0, 1, 1, 0, 0, 0, 1, 1} \right\rangle \) and \( D_{3} = \left\langle {0, 1, 1, 0, 0, 1, 0, 1} \right\rangle \).

For \( D_{1} , D_{3} \in \beta^{8} \), the dissimilarity between D1 and D3 may be computed using the distance function d8,

$$ d_{8} \left( {D_{1} , D_{3} } \right) = 2 $$
(4)

where,

$$ d_{binary} \left( {b_{1i} , b_{3i} } \right) = 1\quad {\text{for }}i \in \left\{ {6, 7} \right\} $$
(5)

and

$$ d_{binary} \left( {b_{1i} , b_{3i} } \right) = 0\quad {\text{for }}i \in \left\{ {1, 2, 3, 4, 5, 8} \right\} $$
(6)
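
The distance of Eqs. (2) and (3) simply counts the bit positions at which the two direction vectors disagree. A small sketch (again our own illustrative code) reproduces d8(D1, D3) = 2 of Example 3:

```python
def d_binary(b1, b2):
    """Eq. (2): 0 if the two bits agree, 1 otherwise."""
    return 0 if b1 == b2 else 1

def d_n(D_a, D_b):
    """Eq. (3): number of positions at which two n-bit direction vectors differ."""
    return sum(d_binary(a, b) for a, b in zip(D_a, D_b))

D1 = [0, 1, 1, 0, 0, 0, 1, 1]
D3 = [0, 1, 1, 0, 0, 1, 0, 1]
print(d_n(D1, D3))  # 2 (mismatches at positions 6 and 7)
```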

To allow for differences in trend at some of the n bits, the concept of trend dissimilarity of degree-k has been considered, where k ≤ n is the number of bits at which the two n-dimensional direction vectors encounter a bit mismatch.

3.4 Trend Dissimilarity of Degree K

Given two n-dimensional time series Ti and Tj, and their respective direction vectors Di and Dj, Ti and Tj are said to have dissimilarity of degree k, if \( {\text{d}}_{\text{n}} \left( {{\text{D}}_{\text{i}} , {\text{D}}_{\text{j}} } \right) = {\text{k,}}\quad {\text{for 1}} \le {\text{k}} \le n \).

The clusters at level-0 may contain identical objects. Consider any two arbitrary objects x and y and the Euclidean distance function d, the traditional measure of dissimilarity. Then \( d(\varvec{x},\varvec{y}) = 0 \), i.e. \( \sqrt {\sum \left( {x_{i} - y_{i} } \right)^{2} } = 0 \), if the two objects are identical. Therefore, the objects x and y must be grouped in the same cluster at level-0, say the ith cluster, denoted by C0,i. Let Ci,j denote the cluster with id j at level-i. Then the m clusters at level-0 are C0,1, C0,2, C0,3, …, C0,m. Let a measure of dissimilarity of 1 bit, represented by the distance d1, be associated with the clusters at level-1, a dissimilarity of 2 bits with d2, and so on. Then any two arbitrary objects x, y may be in the same cluster at level-1, C1,j, only if \( 0 < d\left( {x, y} \right) \le d_{1} \). In this section the concept of a trend cluster of level-k using the dissimilarity of degree-k is defined.

3.5 Trend Cluster of Level-K

For \( {\mathcal{T}} = \{ {\text{T}}_{1} , {\text{T}}_{2} , \ldots , {\text{T}}_{\text{m}} \} \), a set of n-dimensional time series of cardinality m, and the set of corresponding direction vectors \( \varGamma = \{ {\text{D}}_{1} , {\text{D}}_{2} , \ldots , {\text{D}}_{\text{m}} \} \), a trend cluster of level-k, Ck,j, includes the time series Ti and Tj in the same cluster if \( {\text{d}}_{\text{n}} \left( {{\text{D}}_{\text{i}} , {\text{D}}_{\text{j}} } \right) = {\text{k}} \). In that case \( {\text{d}}_{\text{n}} \left( {{\text{D}}_{\text{i}} , {\text{D}}_{\text{j}} } \right) \ne k^{{\prime }} \) for all \( {\text{k}}^{{\prime }} \), \( 0 \le {\text{k}}^{{\prime }} < k \); hence Ti and Tj are allocated to distinct trend clusters of level-0, level-1, up to level-(k − 1), say Ck′,i and Ck′,j, but are grouped in the same trend cluster of level-k, say Ck,i.

Example 4:

Consider the time series T1, T2 and T3 as in Examples 1 and 3. Their direction vectors are \( {\text{D}}_{1} = \left\langle {0, 1, 1, 0, 0, 0, 1, 1} \right\rangle \), \( {\text{D}}_{2} = \left\langle {0, 1, 1, 0, 0, 0, 1, 1} \right\rangle \) and \( {\text{D}}_{3} = \left\langle {0, 1, 1, 0, 0, 1, 0, 1} \right\rangle \). Consider D1 and D2: \( {\text{d}}_{8} \left( {{\text{D}}_{1} , {\text{D}}_{2} } \right) = 0 \); therefore, T1 and T2 must be grouped in the same cluster of level-0. Consider D1 and D3: \( {\text{d}}_{8} \left( {{\text{D}}_{1} , {\text{D}}_{3} } \right) = 2 \), i.e. the series T1 and T3 have trend dissimilarity of degree-2. Therefore, T1 and T3 must be grouped in different trend clusters of level-0 and level-1, say C0,1 and C0,3, and C1,1 and C1,3, respectively. However, the two must be grouped in the same trend cluster of level-2, say C2,1.
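
A level-0 trend cluster therefore collects all series whose direction vectors are bit-wise identical. A possible sketch of this grouping step, reusing the helpers above (the dictionary-based grouping is our own illustration, not necessarily the authors' data structure), is:

```python
def level0_clusters(series_list):
    """Map each distinct direction vector (the level-0 medoid) to the indices
    of the series that share it."""
    groups = {}
    for idx, series in enumerate(series_list):
        groups.setdefault(tuple(direction_vector(series)), []).append(idx)
    return groups

T3 = [45, 80, 22, 10, 40, 63, 45, 90, 10]
print(level0_clusters([T1, T2, T3]))
# {(0, 1, 1, 0, 0, 0, 1, 1): [0, 1], (0, 1, 1, 0, 0, 1, 0, 1): [2]}
```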

Example 5:

Consider the 5-dimensional view of the four gene expressions a, b, c and d shown in Fig. 1. The direction vectors Da and Dc are identical; therefore genes a and c are trend similar. Even visually, the vectors a and c are more similar to each other than to the vectors b and d.

Fig. 1. Trend similarity in gene expressions

An advantage of this approach is the simplicity of the representation of the objects of an m-dimensional time-series database, using only one bit to represent the change in value from time ti to ti+1:

$$ {\text{b}}_{\text{i}} = \begin{cases} 0 & {\text{if }}{\text{x}}_{{{\text{i}} + 1}} \ge {\text{x}}_{\text{i}} \\ 1 & {\text{if }}{\text{x}}_{{{\text{i}} + 1}} < {\text{x}}_{\text{i}} \end{cases}; \quad 0 \le {\text{i}} \le {\text{m}} - 2 $$
(7)

The direction vectors are a lossy transformation of the original data, from which the original values cannot be retrieved. Thus it is a novel representation from the perspective of security and privacy preservation of the original data.

4 Fast Trend Similarity-Based Clustering (FTSC) Algorithm

The FTSC algorithm starts by generating the variation vectors; the second step is the binarization of the variation vectors; in the third step, identical direction vectors indicate similarity in trend among the time series, thus forming the trend clusters of level-0 in the hierarchy of clusters. The higher-level clusters result from merging the closest clusters of the previous level, starting with the smaller clusters. Each cluster is represented by a direction vector as the medoid of the cluster, and the distance between two clusters is computed as the distance between their medoids. A simplified sketch of this procedure is given below.
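
The following Python sketch outlines the hierarchy construction as we read the description above; the exact merge rule (which cluster absorbs which, and whether the threshold at level-k is exactly k or at most k) is our assumption and may differ from the authors' implementation. It reuses level0_clusters and d_n from the sketches in Sect. 3.

```python
def ftsc(series_list, max_level):
    """Sketch of FTSC: level-0 clusters group identical direction vectors;
    at level k, a cluster is absorbed by the closest already-retained cluster
    whose medoid lies within distance k; the absorbing cluster keeps its medoid."""
    level = [{"medoid": list(m), "members": list(ids)}
             for m, ids in level0_clusters(series_list).items()]
    hierarchy = [level]
    for k in range(1, max_level + 1):
        retained = []
        # Visit larger clusters first so that the smaller clusters merge into them.
        for c in sorted(level, key=lambda c: len(c["members"]), reverse=True):
            close = [t for t in retained if d_n(t["medoid"], c["medoid"]) <= k]
            if close:
                target = min(close, key=lambda t: d_n(t["medoid"], c["medoid"]))
                target["members"] = target["members"] + c["members"]
            else:
                retained.append({"medoid": c["medoid"], "members": list(c["members"])})
        level = retained
        hierarchy.append(level)
    return hierarchy
```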

The FTSC algorithm is a nonparametric algorithm; it does not require any prior information about the data or the number of clusters.

The asymptotic time complexity of the algorithm is quadratic in the product of the dimension of the time series and the number of clusters at level-i, ni < n; therefore, the complexity of the algorithm is O((mn)2). However, due to the binarization of the variation in the time series, the bit-wise comparisons and the distance computation may be implemented using fast bit operators, as sketched below.
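
For instance, each direction vector can be packed into a single machine word, so that the degree of dissimilarity reduces to an XOR followed by a population count. The sketch below illustrates this kind of bit-level implementation; it is our illustration, not the authors' code.

```python
def pack_bits(direction):
    """Pack an n-bit direction vector into a single integer."""
    word = 0
    for b in direction:
        word = (word << 1) | b
    return word

def fast_distance(w1, w2):
    """Degree of trend dissimilarity via XOR and population count."""
    return bin(w1 ^ w2).count("1")

w1 = pack_bits([0, 1, 1, 0, 0, 0, 1, 1])
w3 = pack_bits([0, 1, 1, 0, 0, 1, 0, 1])
print(fast_distance(w1, w3))  # 2, as in Example 3
```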

5 Experiments and Results

5.1 Data Sets

The experiments have been carried out to perform clustering on two microarray data sets and two financial data sets. Table 1 describes the data sets.

Table 1. Data set

5.2 System Configuration

The experiments were run on Windows 8 Enterprise © 2012, 64-bit, with an Intel® Core™ i7 CPU U640 @ 1.20 GHz. The .NET platform has been used for the implementation.

5.3 Design of Experiments

The experiments have been designed to assess the performance of the FTSC algorithm in terms of efficiency and accuracy. Efficiency is mainly observed in terms of execution time. The accuracy of the algorithm is considered to be the consistency of the cluster allocation of a time series irrespective of the number of re-executions, of the cluster allocation of multiple copies of the time-series data, and of the order in which the time series are input to the algorithm. The second experiment compares the FTSC and EVCD algorithms.

5.4 Efficiency and Accuracy of FTSC

The first experiment has been designed to examine the speed of the Fast Trend Similarity-based Clustering algorithm in clustering the four data sets. The program implementing the algorithm was run five times; the average running times to yield the hierarchical clusters for the four data sets Affymetrix, Drosophila genome, Exchange Rates and PPPs over GDP, and NSE were 00:00:02.66, 00:00:01.72, 00:00:10.11 and 00:00:01.34 respectively.

The outcomes of running the FTSC algorithm on the Affymetrix data are presented in Tables 2, 3 and 4. In Table 2, the 7-bit direction vector of gene Id 11251 is 0000001, which is in cluster \( C_{0,0} \), while the two genes 11152 and 12182 at serial numbers 7 and 8 have identical direction vectors 0000101; therefore, \( C_{0,3} \) includes two genes. The total number of clusters at level-0 is 115.

Table 2. Direction vectors, clusters of level-0 of AffyMetrix data
Table 3. Level-3 cluster formation
Table 4. Level-4 cluster formation

Tables 3 and 4 present the clusters of level-3 and level-4 respectively. In the two tables the rows display all the clusters \( {\text{C}}_{{{\text{i}}, {\text{j}}}} \), with i denoting the cluster level and j the cluster id. The cluster medoid is presented in the second column by the identifier of the direction vector representing the corresponding cluster of level-0. In Table 3, the 3rd, 4th, 5th, 6th and 7th columns display the clusters of level-2 that are merged to form the cluster of level-3. Thus the cluster \( {\text{C}}_{3,0} \), represented by the medoid 0, is formed by merging the clusters of level-2 represented by the medoids 4, 7, 14, 29 and 58, yielding a cluster with a total of 486 genes. The cluster \( {\text{C}}_{3,1} \) is the outcome of merging the three clusters of level-2 represented by the medoids 26, 52 and 81 into the cluster represented by the medoid 20, having a total of 689 genes. To obtain the clusters \( {\text{C}}_{3,6} \) to \( {\text{C}}_{3,15} \), no other clusters of level-2 were merged into the ones represented by the respective medoids indicated in column two. The blank ‘−’ entries in the table indicate that no clusters of level-2 were merged. Therefore, the row pertaining to the cluster \( {\text{C}}_{3,6} \) with medoid 16 indicates that no cluster of level-2 satisfied the criterion for the merge operation, although the total number of genes in the cluster \( {\text{C}}_{3,6} \) is 2. The total number of clusters at level-3 is 16.

The clusters \( {\text{C}}_{3,6} \) to \( {\text{C}}_{3,15} \) at level-3 are unchanged from the previous level, retaining the same medoids and densities.

Similarly, Table 4 exhibits the details of the clusters of level-4. From Tables 3 and 4 it may be observed that the cluster C4,0 with medoid 0 has been formed by merging the clusters C3,0, C3,5, C3,6, C3,7 and C3,11, referred to by the medoids 0, 9, 16, 31 and 60 respectively. It may also be observed that the density of C4,0 is the sum of the densities of C3,0, C3,5, C3,6, C3,7 and C3,11. Similarly, the cluster C4,2 is formed by merging C3,13 and C3,4 into C3,3, resulting in a density of 1880.

The FTSC algorithm is an agglomerative clustering algorithm, yielding a hierarchical clustering of levels 0–7 for the Affymetrix data. The cluster at the highest level, C6,0, represented by the medoid 0, includes all the 12488 genes (Figs. 2 and 3).

Fig. 2. Random clusters plot for DS 1 level 0

Fig. 3. Random clusters plot for DS 2 level 0

In order to estimate the efficiency, accuracy and sensitivity to the order of the data inputs, all the rows of the Affymetrix data set were duplicated four times and randomly shuffled. Therefore, the algorithm was executed with a total of 4 × 12488 = 49952 rows with 8 dimensions. The output of the program was a hierarchical clustering with levels 0−7 with the same number of clusters at each level as before, but with the density of each cluster four times the previous density; e.g. the cluster C4,5, with inputs four times those of the first run, was represented by a gene whose direction vector was identical to that of gene 9 and contained 192 genes. The same phenomenon was observed for all the clusters of each level from level-0 to level-7. Thus the accuracy of the algorithm has been assessed. The average running time of the repeated executions on four times the original data set was 00:00:10.714.

The repeated execution of the program after randomly shuffling the rows yielded the same number of clusters. However, each time the execution time differed only in the 3rd or 4th decimal place, with the mean being 00:00:02.6599 (Figs. 4 and 5).

Fig. 4. Random clusters plot for DS 3 level 0

Fig. 5. Random clusters plot for DS 4 level 0

5.5 Comparison of FTSC and EVCD Algorithms

In this experiment the results of the EVCD algorithm and the FTSC algorithm have been compared. The two real-world data sets Affymetrix and Drosophila, as described in Table 1, are used in this experiment to assess the novelty of trend dissimilarity, as the changes in the time series are represented by direction vectors. The EVCD algorithm is a parametric algorithm, while the FTSC algorithm is not; EVCD requires one user input, the parameter ε. The experiment has been repeated for three values of ε, namely 0.01, 0.05 and 0.1. As EVCD performs a hierarchical clustering, for ε = 0.01, 10 clusters and 6 singletons were obtained at level 14; for ε = 0.05, 10 clusters and 6 singletons were obtained at level 2; and finally, 11 clusters and 2 singletons were obtained at level 1 for ε = 0.1.

6 Conclusions

The experiments indicate that although the FTSC algorithm has the complexity O((mn)2), it is fast in terms of execution time due to the binarization of the change in the time series. The binary representation in terms of the direction vector allows the distance computation to be implemented using bit-level operators. The binarization also helps in preserving the privacy and security of the actual data. The nonparametric characteristic of the algorithm spares the end user the exercise of parameter tuning; the user also does not require any prior knowledge of the data or the clusters. The FTSC algorithm is time efficient and has the potential to yield accurate clusters of time-series data. The scalability of the algorithm for multi-dimensional time series and its behaviour in the presence of noise shall be investigated in the future. Selecting a better medoid for the clusters at each higher level is also considered as future work.