Clustering Time Series Data with Distance Matrices

  • Onur Şeref
  • W. Art Chaovalitwongse
Chapter
Part of the Fields Institute Communications book series (FIC, volume 63)

Abstract

Clustering is a frequently used method in the unsupervised analysis of various data types, including time series data. In this study, we first present a discrete k-median (DKM) method based on an uncoupled bilinear programming algorithm and modify it for faster implementation, which yields a variant of Lloyd's algorithm. We also introduce the fuzzy discrete k-median (FDKM) method, which is the fuzzy version of the modified algorithm. The main draw of these two efficient algorithms is that they require no input other than a matrix of distances as a measure of dissimilarity between pairs of samples, which avoids the complications that may arise from working with the actual domain in which the data samples reside. We also include a hierarchical cluster tree (HCT) method and a partitioning around medoids (PAM) method, both of which can use the distance matrix for clustering. The output of each of the four methods is a set of median samples, which define clusters by assigning each sample to the closest median sample using the distance matrix. We consider four different distance measures (rectilinear, Euclidean, squared Euclidean and dynamic time warping (DTW)) to create the distance matrix, and also mention how the calculation of the distance matrix can be extended to any kernel-induced feature space. The main application domain in this study is time series data, where actual samples in the data set are better cluster representations than mean or median points whose components are independently calculated for each dimension of the domain. We present computational results on a public time series benchmark data set and on real-life local field potential (LFP) recordings collected from a macaque monkey brain during a visuomotor task.
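
The idea of clustering from a distance matrix alone can be illustrated with a short sketch. The code below is not the authors' DKM or FDKM formulation; it is a generic Lloyd-style alternation (assign each sample to its closest median sample, then pick the member that minimizes the within-cluster distance sum) written under simple assumptions. The function names `dtw_distance`, `distance_matrix` and `discrete_k_median`, the optional `init` argument, and the toy series are all hypothetical and chosen only for illustration.

```python
import numpy as np


def dtw_distance(a, b):
    """Plain dynamic time warping with absolute-difference step cost.

    Illustrative O(len(a) * len(b)) version with no warping window.
    """
    n, m = len(a), len(b)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = abs(a[i - 1] - b[j - 1])
            cost[i, j] = step + min(cost[i - 1, j],      # repeat b[j-1]
                                    cost[i, j - 1],      # repeat a[i-1]
                                    cost[i - 1, j - 1])  # advance both
    return cost[n, m]


def distance_matrix(series, dist=dtw_distance):
    """Symmetric pairwise distance matrix for a list of sequences."""
    n = len(series)
    D = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            D[i, j] = D[j, i] = dist(series[i], series[j])
    return D


def discrete_k_median(D, k, init=None, max_iter=100, seed=0):
    """Lloyd-style discrete k-median that only reads the distance matrix D.

    Returns (medians, labels): indices of the median samples and, for each
    sample, the position of its closest median. `init` (hypothetical
    parameter) optionally fixes the starting median indices.
    """
    n = D.shape[0]
    if init is None:
        rng = np.random.default_rng(seed)
        medians = rng.choice(n, size=k, replace=False)
    else:
        medians = np.asarray(init)
    for _ in range(max_iter):
        # Assignment step: closest current median for every sample.
        labels = np.argmin(D[:, medians], axis=1)
        new_medians = medians.copy()
        for c in range(k):
            members = np.flatnonzero(labels == c)
            if members.size == 0:
                continue  # empty cluster: keep its previous median
            # Update step: member with the smallest total distance to the rest.
            within = D[np.ix_(members, members)].sum(axis=0)
            new_medians[c] = members[np.argmin(within)]
        if np.array_equal(new_medians, medians):
            break
        medians = new_medians
    labels = np.argmin(D[:, medians], axis=1)
    return medians, labels


if __name__ == "__main__":
    toy = [[0, 0, 0], [0, 1, 0], [0, 0, 1],
           [10, 10, 10], [10, 11, 10], [10, 10, 11]]
    D = distance_matrix(toy)
    medians, labels = discrete_k_median(D, k=2)
    print("median samples:", medians, "labels:", labels)
```

Because the clustering routine never touches the raw series, any other dissimilarity (rectilinear, Euclidean, squared Euclidean, or a kernel-induced distance) could be substituted by passing a different `dist` function to `distance_matrix`. Like any Lloyd-style procedure, this sketch may stop at a local optimum that depends on the starting medians.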

Keywords

Dynamic Time Warping; Facility Location Problem; Local Field Potential; Normalize Mutual Information; Bregman Divergence
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

Copyright information

© Springer Science+Business Media New York 2013

Authors and Affiliations

  1. Business Information Technology, Virginia Polytechnic Institute and State University, Blacksburg, USA
  2. Department of Industrial and Systems Engineering, Department of Radiology, University of Washington, Seattle, USA
