Clustering Time Series Data with Distance Matrices

  • Chapter
Optimization and Data Analysis in Biomedical Informatics

Part of the book series: Fields Institute Communications ((FIC,volume 63))

Abstract

Clustering is a frequently used method in the unsupervised analysis of many data types, including time series data. In this study, we first present a discrete k-median (DKM) method based on an uncoupled bilinear programming algorithm and modify it for faster implementation; the modified method is a variant of Lloyd's algorithm. We also introduce the fuzzy discrete k-median (FDKM) method, which is the fuzzy version of the modified algorithm. The main draw of these two efficient algorithms is that they require no input other than a matrix of pairwise distances measuring the dissimilarity between samples. This avoids the complications that may arise from working in the actual domain in which the data samples reside. We also include a hierarchical cluster tree (HCT) method and a partition around medoids (PAM) method, both of which can cluster using the distance matrix. The output of all four methods is a set of median samples, which define the clusters: each sample is assigned to its closest median sample according to the distance matrix. We consider four distance measures for creating the distance matrix: rectilinear, Euclidean, squared Euclidean, and dynamic time warping (DTW). We also describe how the calculation of the distance matrix can be extended to any kernel-induced feature space. The main application domain in this study is time series data, where actual samples in the data set are better cluster representatives than mean or median points whose components are calculated independently for each dimension of the domain. We present computational results on a public time series benchmark data set and on real-life local field potential (LFP) recordings collected from a macaque monkey brain during a visuomotor task.
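The central idea above, clustering from nothing but a distance matrix and returning actual samples as cluster representatives, can be illustrated with a short sketch. This is not the chapter's DKM or FDKM implementation, and the fuzzy variant is not covered. The helper names (`dtw_distance`, `distance_matrix`, `kernel_distance_matrix`, `discrete_k_median`) are hypothetical. DTW here uses an absolute-difference local cost with no warping window, which is an assumption. The kernel-induced distance uses the standard identity ||phi(x) - phi(y)||^2 = K(x,x) - 2K(x,y) + K(y,y). The clustering loop is a generic Lloyd-style k-median restricted to sample points, with random initial medians by default (also an assumption).

```python
import numpy as np


def dtw_distance(a, b):
    """Dynamic time warping distance between two 1-D series.

    Assumption: absolute-difference local cost, no warping-window constraint.
    """
    n, m = len(a), len(b)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            # Extend the cheapest of the three admissible predecessor paths.
            cost[i, j] = local + min(cost[i - 1, j], cost[i, j - 1], cost[i - 1, j - 1])
    return float(cost[n, m])


def distance_matrix(X, metric="euclidean"):
    """Symmetric n x n matrix of pairwise distances between the rows of X.

    X is a 2-D array here, so all series must share one length (even for DTW).
    """
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    D = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            diff = X[i] - X[j]
            if metric == "rectilinear":
                d = np.abs(diff).sum()
            elif metric == "euclidean":
                d = np.sqrt((diff ** 2).sum())
            elif metric == "sqeuclidean":
                d = (diff ** 2).sum()
            elif metric == "dtw":
                d = dtw_distance(X[i], X[j])
            else:
                raise ValueError(f"unknown metric: {metric}")
            D[i, j] = D[j, i] = d
    return D


def kernel_distance_matrix(K):
    """Feature-space distances from a Gram matrix K via
    ||phi(x) - phi(y)||^2 = K(x,x) - 2 K(x,y) + K(y,y)."""
    K = np.asarray(K, dtype=float)
    diag = np.diag(K)
    sq = diag[:, None] - 2.0 * K + diag[None, :]
    # Clip tiny negative values caused by floating-point round-off.
    return np.sqrt(np.maximum(sq, 0.0))


def discrete_k_median(D, k, init=None, max_iter=100, seed=0):
    """Lloyd-style k-median whose medians are restricted to actual samples.

    Uses only the distance matrix D. Returns (median indices, labels).
    """
    n = D.shape[0]
    if init is None:
        # Assumption: random distinct samples as initial medians.
        medians = np.random.default_rng(seed).choice(n, size=k, replace=False)
    else:
        medians = np.asarray(init, dtype=int).copy()
    for _ in range(max_iter):
        # Assignment step: each sample joins its closest median sample.
        labels = np.argmin(D[:, medians], axis=1)
        new_medians = medians.copy()
        for c in range(k):
            members = np.flatnonzero(labels == c)
            if members.size == 0:
                continue  # keep the previous median for an empty cluster
            # Update step: member with the smallest total distance to its cluster.
            within = D[np.ix_(members, members)].sum(axis=1)
            new_medians[c] = members[np.argmin(within)]
        if np.array_equal(new_medians, medians):
            break
        medians = new_medians
    labels = np.argmin(D[:, medians], axis=1)
    return medians, labels
```

Each iteration alternates an assignment step, in which every sample joins its nearest median sample, with an update step, in which each cluster's new median is the member with the smallest total distance to the other members. As long as no two median samples coincide, the total within-cluster distance does not increase between iterations. `max_iter` guards against cycling caused by ties. Because the loop only consults `D`, you can swap in any of the four distance measures, or a kernel-induced distance, without changing the clustering code.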

Mathematics Subject Classification (2010): Primary 62H30; Secondary 68W25

Author information

Correspondence to Onur Şeref.

Copyright information

© 2013 Springer Science+Business Media New York

About this chapter

Cite this chapter

Şeref, O., Chaovalitwongse, W.A. (2013). Clustering Time Series Data with Distance Matrices. In: Pardalos, P., Coleman, T., Xanthopoulos, P. (eds) Optimization and Data Analysis in Biomedical Informatics. Fields Institute Communications, vol 63. Springer, New York, NY. https://doi.org/10.1007/978-1-4614-4133-5_2
