Abstract
Clustering is a frequently used method in the unsupervised analysis of various data types, including time series data. In this study, we first present a discrete k-median (DKM) method based on an uncoupled bilinear programming algorithm and modify it for faster implementation, which yields a variant of Lloyd's algorithm. We also introduce the fuzzy discrete k-median (FDKM) method, which is the fuzzy version of the modified algorithm. The main appeal of these two efficient algorithms is that they require no input other than a matrix of distances, which serves as a measure of dissimilarity between pairs of samples; this avoids the complications that may arise from working in the actual domain in which the data samples reside. We also include a hierarchical cluster tree (HCT) method and the partitioning around medoids (PAM) method, both of which can use the distance matrix for clustering. The output of all four methods is a set of median samples, which define clusters by assigning each sample to the closest median sample using the distance matrix. We consider four different distance measures, namely rectilinear, Euclidean, squared Euclidean, and dynamic time warping (DTW), to create the distance matrix, and also mention how the calculation of the distance matrix can be extended to any kernel-induced feature space. The main application domain in this study is time series data, where actual samples in the data set are better cluster representations than mean or median points whose components are calculated independently for each dimension of the domain. We present computational results on a public time series benchmark data set and on real-life local field potential (LFP) recordings collected from a macaque monkey brain during a visuomotor task.
Mathematics Subject Classification (2010): Primary 62H30, Secondary 68W25
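The Lloyd-style alternation described in the abstract can be illustrated with a minimal sketch that takes only a precomputed distance matrix as input. This is not the chapter's bilinear programming formulation; it is a generic assign-and-update loop written under stated assumptions. The function name `discrete_k_median`, the random initialization, the handling of empty clusters, and the toy data are all hypothetical choices made for illustration.

```python
import numpy as np

def discrete_k_median(D, k, max_iter=100, seed=0):
    """Lloyd-style alternation on a precomputed n x n distance matrix D.

    Returns the indices of the k median samples and a label per sample.
    Illustrative sketch only: initialization and tie handling are assumptions.
    """
    n = D.shape[0]
    rng = np.random.default_rng(seed)
    # Assumed initialization: k distinct samples drawn uniformly at random.
    medians = rng.choice(n, size=k, replace=False)
    for _ in range(max_iter):
        # Assignment step: each sample joins its closest median sample.
        labels = np.argmin(D[:, medians], axis=1)
        new_medians = medians.copy()
        for j in range(k):
            members = np.flatnonzero(labels == j)
            if members.size == 0:
                continue  # assumption: keep the old median if a cluster empties
            # Update step: the new median is the member with the smallest
            # total distance to the other members of its cluster.
            within = D[np.ix_(members, members)].sum(axis=1)
            new_medians[j] = members[np.argmin(within)]
        if np.array_equal(new_medians, medians):
            break  # medians stopped changing
        medians = new_medians
    labels = np.argmin(D[:, medians], axis=1)
    return medians, labels

# Toy data (hypothetical): two well-separated groups on a line.
X = np.array([[0.0], [0.1], [0.2], [10.0], [10.1], [10.2]])
# Rectilinear (L1) distance matrix; any other dissimilarity could be swapped in.
D = np.abs(X[:, None, :] - X[None, :, :]).sum(axis=2)
medians, labels = discrete_k_median(D, k=2)
print(sorted(medians.tolist()), labels.tolist())
```

Because the loop only ever reads entries of `D`, the same function could be run unchanged on a matrix built from Euclidean, squared Euclidean, or DTW distances; only the construction of `D` would differ.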
Copyright information
© 2013 Springer Science+Business Media New York
About this chapter
Cite this chapter
Şeref, O., Chaovalitwongse, W.A. (2013). Clustering Time Series Data with Distance Matrices. In: Pardalos, P., Coleman, T., Xanthopoulos, P. (eds) Optimization and Data Analysis in Biomedical Informatics. Fields Institute Communications, vol 63. Springer, New York, NY. https://doi.org/10.1007/978-1-4614-4133-5_2
DOI: https://doi.org/10.1007/978-1-4614-4133-5_2
Publisher Name: Springer, New York, NY
Print ISBN: 978-1-4614-4132-8
Online ISBN: 978-1-4614-4133-5
eBook Packages: Mathematics and Statistics (R0)