Abstract
We present a novel non-parametric approach for clustering independent and identically distributed (i.i.d.) stochastic processes. We introduce a pre-processing step that maps multivariate i.i.d. samples from random variables to a generic non-parametric representation, which separates dependency from marginal distribution without losing any information. An associated metric is defined in which a single parameter controls the balance between dependency information and distribution information. This mixing parameter can be learned or tuned by a practitioner; such use is illustrated on the clustering of financial time series. Experiments, implementation, and results obtained on public financial time series are available online at http://www.datagrapple.com.
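The idea of a single parameter balancing dependency against marginal distribution can be illustrated with a small sketch. The abstract does not give the formulas, so everything below is an assumption made for illustration: dependency is compared through normalized ranks (empirical CDF values), marginals are compared through a histogram-based Hellinger distance, and the names `rank_transform`, `dependence_distance_sq`, `distribution_distance_sq`, `mixed_distance`, `theta`, and `bins` are hypothetical. It should not be read as the authors' definition of the metric.

```python
import numpy as np


def rank_transform(x):
    """Map a sample to its normalized ranks in (0, 1] (empirical CDF values)."""
    x = np.asarray(x, dtype=float)
    ranks = np.argsort(np.argsort(x)) + 1  # ranks 1..n, ties broken by position
    return ranks / len(x)


def dependence_distance_sq(x, y):
    """Squared distance between the rank representations of two paired samples.

    Assumption: the factor 3 rescales the value to lie in [0, 1).
    """
    u, v = rank_transform(x), rank_transform(y)
    return 3.0 * np.mean((u - v) ** 2)


def distribution_distance_sq(x, y, bins=20):
    """Squared Hellinger distance between histograms built on shared bins."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    edges = np.linspace(min(x.min(), y.min()), max(x.max(), y.max()), bins + 1)
    p = np.histogram(x, bins=edges)[0] / len(x)
    q = np.histogram(y, bins=edges)[0] / len(y)
    return 0.5 * np.sum((np.sqrt(p) - np.sqrt(q)) ** 2)


def mixed_distance(x, y, theta=0.5, bins=20):
    """Blend both parts: theta=1 uses dependency only, theta=0 marginals only."""
    d2 = (theta * dependence_distance_sq(x, y)
          + (1.0 - theta) * distribution_distance_sq(x, y, bins))
    return float(np.sqrt(d2))


# Two perfectly dependent samples with different marginal scales.
rng = np.random.default_rng(0)
a = rng.normal(size=500)
b = 2.0 * a

print(mixed_distance(a, b, theta=1.0))  # dependency only
print(mixed_distance(a, b, theta=0.0))  # marginal distribution only
print(mixed_distance(a, b, theta=0.5))  # equal weight on both
```

In this sketch the two samples must be paired and of equal length, since the dependency part compares ranks observation by observation. For random walk time series, the inputs would typically be the increments (for example, returns) rather than the raw levels, which is again an assumption about usage rather than something stated in the abstract.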
Copyright information
© 2015 Springer International Publishing Switzerland
About this paper
Cite this paper
Marti, G., Nielsen, F., Very, P., Donnat, P. (2015). Clustering Random Walk Time Series. In: Nielsen, F., Barbaresco, F. (eds) Geometric Science of Information. GSI 2015. Lecture Notes in Computer Science, vol 9389. Springer, Cham. https://doi.org/10.1007/978-3-319-25040-3_72
DOI: https://doi.org/10.1007/978-3-319-25040-3_72
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-25039-7
Online ISBN: 978-3-319-25040-3
eBook Packages: Computer Science, Computer Science (R0)