
Clustering Random Walk Time Series

  • Gautier Marti
  • Frank Nielsen
  • Philippe Very
  • Philippe Donnat
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 9389)

Abstract

In this paper, we present a novel non-parametric approach for clustering independent and identically distributed (i.i.d.) stochastic processes. We introduce a pre-processing step that maps multivariate i.i.d. samples of random variables to a generic non-parametric representation, which factorizes them into separate dependence and marginal-distribution components without losing any information. We define an associated metric in which a single parameter controls the balance between dependence information and distribution information. This mixing parameter can either be learned or tuned by a practitioner; we illustrate such use on the clustering of financial time series. Experiments, implementation, and results obtained on public financial time series are available online at http://www.datagrapple.com.
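The abstract does not spell out the representation or the metric, so the following is only a minimal sketch under our own assumptions, not the authors' implementation. We assume that the dependence part is captured by normalized ranks (an empirical copula transform) and compared by a scaled mean squared rank gap. We assume that the distribution part is captured by a histogram on a shared grid and compared with the squared Hellinger distance. The mixing parameter is called `theta` here. All function names (`normalized_ranks`, `mixed_distance`, and so on) are hypothetical.

```python
import math
from typing import List, Sequence


def normalized_ranks(xs: Sequence[float]) -> List[float]:
    """Map each observation to rank/n in (0, 1] (empirical CDF value).

    This keeps only the ordering (dependence information) and discards the
    marginal distribution. Assumption: continuous data, so ties are broken
    by position.
    """
    n = len(xs)
    order = sorted(range(n), key=lambda i: xs[i])
    ranks = [0.0] * n
    for rank, idx in enumerate(order, start=1):
        ranks[idx] = rank / n
    return ranks


def histogram(xs: Sequence[float], lo: float, hi: float, bins: int) -> List[float]:
    """Empirical probability mass of xs over `bins` equal-width bins on [lo, hi]."""
    width = (hi - lo) / bins if hi > lo else 1.0
    counts = [0] * bins
    for x in xs:
        k = min(int((x - lo) / width), bins - 1)  # clamp x == hi into last bin
        counts[k] += 1
    return [c / len(xs) for c in counts]


def dependence_dist2(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Assumed dependence term: 3 * mean squared gap between normalized ranks.

    It is 0 when the two series are comonotone (same ordering over time).
    """
    u, v = normalized_ranks(xs), normalized_ranks(ys)
    return 3.0 * sum((a - b) ** 2 for a, b in zip(u, v)) / len(u)


def distribution_dist2(xs: Sequence[float], ys: Sequence[float], bins: int = 10) -> float:
    """Assumed distribution term: squared Hellinger distance, in [0, 1].

    Both histograms are built on a shared grid. Time ordering is ignored.
    """
    lo, hi = min(min(xs), min(ys)), max(max(xs), max(ys))
    p, q = histogram(xs, lo, hi, bins), histogram(ys, lo, hi, bins)
    return 0.5 * sum((math.sqrt(a) - math.sqrt(b)) ** 2 for a, b in zip(p, q))


def mixed_distance(xs: Sequence[float], ys: Sequence[float], theta: float, bins: int = 10) -> float:
    """Hypothetical mixed metric: sqrt(theta * dependence + (1 - theta) * distribution)."""
    if len(xs) != len(ys) or not xs:
        raise ValueError("series must be non-empty and aligned in time")
    if not 0.0 <= theta <= 1.0:
        raise ValueError("theta must lie in [0, 1]")
    d2 = theta * dependence_dist2(xs, ys) + (1.0 - theta) * distribution_dist2(xs, ys, bins)
    return math.sqrt(d2)


if __name__ == "__main__":
    x = [0.1, 0.5, 0.3, 0.9, 0.7]
    y = [2 * v + 100 for v in x]  # same ordering, very different values
    print(mixed_distance(x, y, theta=1.0))  # 0.0: identical dependence
    print(mixed_distance(x, y, theta=0.0))  # 1.0: disjoint distributions
```

With `theta = 1` only co-movement matters, and with `theta = 0` only the marginal distributions matter. A pairwise matrix of such distances could then be fed to any distance-based clustering routine, so that sweeping `theta` shifts the grouping between these two views.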

Copyright information

© Springer International Publishing Switzerland 2015

Authors and Affiliations

  • Gautier Marti (1, 2)
  • Frank Nielsen (2)
  • Philippe Very (1)
  • Philippe Donnat (1)
  1. Hellebore Capital Management, Paris, France
  2. Ecole Polytechnique, Palaiseau, France