Skip to main content

Clustering Random Walk Time Series

  • 1687 Accesses

Part of the Lecture Notes in Computer Science book series (LNIP,volume 9389)

Abstract

We present in this paper a novel non-parametric approach useful for clustering independent identically distributed stochastic processes. We introduce a pre-processing step consisting in mapping multivariate independent and identically distributed samples from random variables to a generic non-parametric representation which factorizes dependency and marginal distribution apart without losing any information. An associated metric is defined where the balance between random variables dependency and distribution information is controlled by a single parameter. This mixing parameter can be learned or played with by a practitioner, such use is illustrated on the case of clustering financial time series. Experiments, implementation and results obtained on public financial time series are online on a web portal http://www.datagrapple.com.

This is a preview of subscription content, access via your institution.

Buying options

Chapter
USD   29.95
Price excludes VAT (USA)
  • DOI: 10.1007/978-3-319-25040-3_72
  • Chapter length: 10 pages
  • Instant PDF download
  • Readable on all devices
  • Own it forever
  • Exclusive offer for individuals only
  • Tax calculation will be finalised during checkout
eBook
USD   89.00
Price excludes VAT (USA)
  • ISBN: 978-3-319-25040-3
  • Instant PDF download
  • Readable on all devices
  • Own it forever
  • Exclusive offer for individuals only
  • Tax calculation will be finalised during checkout
Softcover Book
USD   119.99
Price excludes VAT (USA)
Fig. 1.
Fig. 2.
Fig. 3.

References

  1. Amari, S.I., Cichocki, A.: Information geometry of divergence functions. Bull. Pol. Acad. Sci. Tech. Sci. 58(1), 183–195 (2010)

    Google Scholar 

  2. Arthur, D., Vassilvitskii, S.: k-means++: the advantages of careful seeding. In: Proceedings of the Eighteenth Annual ACM-SIAM Symposium on Discrete Algorithms, pp. 1027–1035. Society for Industrial and Applied Mathematics (2007)

    Google Scholar 

  3. Bachelier, L.: Théorie de la spéculation. Gauthier-Villars (1900)

    Google Scholar 

  4. Basseville, M.: Divergence measures for statistical data processing. Sig. Process. 93(4), 621–633 (2013)

    MathSciNet  CrossRef  Google Scholar 

  5. Ben-David, S., Von Luxburg, U., Pál, D.: A sober look at clustering stability. In: Lugosi, G., Simon, H.U. (eds.) COLT 2006. LNCS (LNAI), vol. 4005, pp. 5–19. Springer, Heidelberg (2006)

    CrossRef  Google Scholar 

  6. Ben-Hur, A., Elisseeff, A., Guyon, I.: A stability based method for discovering structure in clustered data. In: Pacific Symposium on Biocomputing, vol. 7, pp. 6–17 (2001)

    Google Scholar 

  7. Berndt, D.J., Clifford, J.: Using dynamic time warping to find patterns in time series. In: KDD Workshop, Seattle, WA, vol. 10, pp. 359–370 (1994)

    Google Scholar 

  8. Carlsson, G., Mémoli, F.: Characterization, stability and convergence of hierarchical clustering methods. J. Mach. Learn. Res. 11, 1425–1470 (2010)

    MathSciNet  MATH  Google Scholar 

  9. Deheuvels, P.: La fonction de dépendance empirique et ses propriétés. Un test non paramétrique d’indépendance. Acad. Roy. Belg. Bull. Cl. Sci. (5) 65(6), 274–292 (1979)

    Google Scholar 

  10. Ding, C., He, X.: K-means clustering via principal component analysis. In: Proceedings of the Twenty-First International Conference on Machine Learning, p. 29. ACM (2004)

    Google Scholar 

  11. Efron, B.: Bootstrap methods: another look at the jackknife. Ann. Stat. 7, 1–26 (1979)

    MathSciNet  CrossRef  MATH  Google Scholar 

  12. Fama, E.F.: The behavior of stock-market prices. J. Bus. 38, 34–105 (1965)

    CrossRef  Google Scholar 

  13. Harel, D., Koren, Y.: On clustering using random walks. In: Hariharan, R., Mukund, M., Vinay, V. (eds.) FSTTCS 2001. LNCS, vol. 2245, pp. 18–41. Springer, Heidelberg (2001)

    CrossRef  Google Scholar 

  14. Hubert, L., Arabie, P.: Comparing partitions. J. Classif. 2(1), 193–218 (1985)

    CrossRef  MATH  Google Scholar 

  15. Ivanov, P.C., Rosenblum, M.G., Peng, C., Mietus, J., Havlin, S., Stanley, H., Goldberger, A.L.: Scaling behaviour of heartbeat intervals obtained by wavelet-based time-series analysis. Nature 383(6598), 323–327 (1996)

    CrossRef  Google Scholar 

  16. Keogh, E., Lin, J., Fu, A.: Hot sax: efficiently finding the most unusual time series subsequence. In: Fifth IEEE International Conference on Data Mining, pp. 8-pp. IEEE (2005)

    Google Scholar 

  17. Krieger, A.M., Green, P.E.: A cautionary note on using internal cross validation to select the number of clusters. Psychometrika 64(3), 341–353 (1999)

    CrossRef  MATH  Google Scholar 

  18. Lange, T., Roth, V., Braun, M.L., Buhmann, J.M.: Stability-based validation of clustering solutions. Neural Comput. 16(6), 1299–1323 (2004)

    CrossRef  MATH  Google Scholar 

  19. Lin, J., Keogh, E., Lonardi, S., Chiu, B.: A symbolic representation of time series, with implications for streaming algorithms. In: Proceedings of the 8th ACM SIGMOD Workshop on Research Issues in Data Mining and Knowledge Discovery, pp. 2–11. ACM (2003)

    Google Scholar 

  20. Marti, G., Very, P., Donnat, P.: Toward a generic representation of random variables for machine learning (2015). arXiv preprint arXiv:1506.00976

  21. Meila, M., Shi, J.: A random walks view of spectral segmentation. In: AI and STATISTICS (AISTATS) (2001)

    Google Scholar 

  22. Munkres, J.: Algorithms for the assignment and transportation problems. J. Soc. Ind. Appl. Math. 5(1), 32–38 (1957)

    MathSciNet  CrossRef  MATH  Google Scholar 

  23. Percival, D.B., Walden, A.T.: Wavelet Methods for Time Series Analysis, vol. 4. Cambridge University Press, Cambridge (2006)

    MATH  Google Scholar 

  24. Shamir, O., Tishby, N.: Cluster stability for finite samples. In: NIPS (2007)

    Google Scholar 

  25. Shamir, O., Tishby, N.: Model selection and stability in k-means clustering. In: Learning Theory (2008)

    Google Scholar 

  26. Sklar, A.: Fonctions de répartition à n dimensions et leurs marges. Université Paris 8 (1959)

    Google Scholar 

  27. Von Luxburg, U.: A tutorial on spectral clustering. Stat. Comput. 17(4), 395–416 (2007)

    MathSciNet  CrossRef  Google Scholar 

Download references

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Gautier Marti .

Editor information

Editors and Affiliations

Rights and permissions

Reprints and Permissions

Copyright information

© 2015 Springer International Publishing Switzerland

About this paper

Cite this paper

Marti, G., Nielsen, F., Very, P., Donnat, P. (2015). Clustering Random Walk Time Series. In: Nielsen, F., Barbaresco, F. (eds) Geometric Science of Information. GSI 2015. Lecture Notes in Computer Science(), vol 9389. Springer, Cham. https://doi.org/10.1007/978-3-319-25040-3_72

Download citation

  • DOI: https://doi.org/10.1007/978-3-319-25040-3_72

  • Published:

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-25039-7

  • Online ISBN: 978-3-319-25040-3

  • eBook Packages: Computer ScienceComputer Science (R0)