Clustering Time Series Data with Distance Matrices

Şeref, Onur; Chaovalitwongse, W. Art

doi:10.1007/978-1-4614-4133-5_2

Part of the book series: Fields Institute Communications (FIC, volume 63)

Abstract

Clustering is a frequently used method in the unsupervised analysis of various data types, including time series data. In this study, we first present a discrete k-median (DKM) method based on an uncoupled bilinear programming algorithm and modify it for faster implementation; the modified method is a variant of Lloyd's algorithm. We also introduce the fuzzy discrete k-median (FDKM) method, which is the fuzzy version of the modified algorithm. The main advantage of these two efficient algorithms is that they require no input other than a matrix of distances that measures the dissimilarity between pairs of samples, which avoids the complications that may arise from working in the actual domain in which the data samples reside. We also include a hierarchical cluster tree (HCT) method and a partitioning around medoids (PAM) method, both of which can use the distance matrix for clustering. The output of all four methods is a set of median samples, which define clusters by assigning each sample to the closest median sample according to the distance matrix. We consider four distance measures (rectilinear, Euclidean, squared Euclidean, and dynamic time warping (DTW)) to create the distance matrix, and also mention how the calculation of the distance matrix can be extended to any kernel-induced feature space. The main application domain in this study is time series data, where actual samples in the data set are better cluster representations than mean or median points whose components are calculated independently for each dimension of the domain. We present computational results on a public time series benchmark data set and on real-life local field potential (LFP) recordings collected from a macaque monkey brain during a visuomotor task.

Mathematics Subject Classification (2010): Primary 62H30, Secondary 68W25
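
The idea summarized in the abstract, clustering from a distance matrix alone with actual samples serving as cluster centers, can be illustrated with a minimal Python sketch. This is not the chapter's DKM formulation, which is based on bilinear programming and is not reproduced on this page; it is a generic Lloyd-style alternation between an assignment step and a median-update step, written under that assumption. The function names (`dtw_distance`, `rectilinear_distance`, `distance_matrix`, `discrete_k_median`) and the `init` and `seed` parameters are hypothetical and chosen only for illustration.

```python
import random


def dtw_distance(a, b):
    """Textbook dynamic time warping with |x - y| as the local cost."""
    inf = float("inf")
    n, m = len(a), len(b)
    # cost[i][j] = minimal cumulative cost of aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            cost[i][j] = local + min(
                cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1]
            )
    return cost[n][m]


def rectilinear_distance(a, b):
    """Rectilinear (L1) distance between two equal-length series."""
    return float(sum(abs(x - y) for x, y in zip(a, b)))


def distance_matrix(series, dist):
    """Symmetric matrix of pairwise distances; the only clustering input."""
    n = len(series)
    D = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            D[i][j] = D[j][i] = dist(series[i], series[j])
    return D


def discrete_k_median(D, k, init=None, max_iter=100, seed=0):
    """Lloyd-style heuristic in which every center is an actual sample.

    Illustrative sketch only (assumption: plain alternation, no fuzzy
    memberships). Returns (median sample indices, cluster labels).
    """
    n = len(D)
    if init is not None:
        medians = list(init)
    else:
        medians = random.Random(seed).sample(range(n), k)

    def assign(meds):
        # Each sample goes to its closest median sample (ties: lowest index).
        return [min(range(k), key=lambda c: D[i][meds[c]]) for i in range(n)]

    for _ in range(max_iter):
        labels = assign(medians)
        new_medians = []
        for c in range(k):
            members = [i for i in range(n) if labels[i] == c]
            if not members:
                # Keep the old median if a cluster ends up empty.
                new_medians.append(medians[c])
                continue
            # New median: the member with the smallest total distance
            # to the other members of its cluster.
            new_medians.append(
                min(members, key=lambda s: sum(D[s][i] for i in members))
            )
        if new_medians == medians:
            break
        medians = new_medians
    return medians, assign(medians)
```

For example, `D = distance_matrix(series, dtw_distance)` followed by `discrete_k_median(D, k=2)` clusters a list of numeric sequences without ever averaging them. Swapping in `rectilinear_distance`, or any other dissimilarity, changes only the matrix and not the clustering routine. Like other Lloyd-type heuristics, this sketch can stop at a local optimum that depends on the initial medians.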

Author information

Authors and Affiliations

Business Information Technology, Virginia Polytechnic Institute and State University, Blacksburg, VA, 24061, USA
Onur Şeref
Department of Industrial and Systems Engineering, Department of Radiology, University of Washington, Seattle, WA, 98195, USA
W. Art Chaovalitwongse

Corresponding author

Correspondence to Onur Şeref.

Editor information

Editors and Affiliations

Department of Industrial & Systems Engin, University of Florida, Weil Hall 401, Gainesville, 32611, Florida, USA
Panos M. Pardalos
Department of Mathematics, University of Waterloo, University Avenue West 200, Waterloo, N2L 3G1, Ontario, Canada
Thomas F. Coleman
Department of Industrial Engineering, University of Central Florida, Central Florida Blvd 4000, Orlando, 32816, Florida, USA
Petros Xanthopoulos

About this chapter

Cite this chapter

Şeref, O., Chaovalitwongse, W.A. (2013). Clustering Time Series Data with Distance Matrices. In: Pardalos, P., Coleman, T., Xanthopoulos, P. (eds) Optimization and Data Analysis in Biomedical Informatics. Fields Institute Communications, vol 63. Springer, New York, NY. https://doi.org/10.1007/978-1-4614-4133-5_2

DOI: https://doi.org/10.1007/978-1-4614-4133-5_2
Published: 20 July 2012
Publisher Name: Springer, New York, NY
Print ISBN: 978-1-4614-4132-8
Online ISBN: 978-1-4614-4133-5
eBook Packages: Mathematics and Statistics, Mathematics and Statistics (R0)
