Abstract
We propose ClusPath, a novel algorithm for detecting general evolution tendencies in a population of entities. We show how abstract notions, such as the Swedish socio-economical model (in a political dataset) or the companies fiscal optimization (in an economical dataset) can be inferred from low-level descriptive features. Such high-level regularities in the evolution of entities are detected by combining spatial and temporal features into a spatio-temporal dissimilarity measure and using semi-supervised clustering techniques. The relations between the evolution phases are modeled using a graph structure, inferred simultaneously with the partition, by using a “slow changing world” assumption. The idea is to ensure a smooth passage for entities along their evolution paths, which catches the long-term trends in the dataset. Additionally, we also provide a method, based on an evolutionary algorithm, to tune the parameters of ClusPath to new, unseen datasets. This method assesses the fitness of a solution using four opposed quality measures and proposes a balanced compromise.
Similar content being viewed by others
Notes
Download pretreated version of the CPDS1 dataset here: http://goo.gl/17ihsf.
References
Araujo R, Kamel MS (2014) Semi-supervised kernel-based temporal clustering. In: International conference on machine learning and applications, IEEE, ICMLA ’14, pp 123–128
Armingeon K., Isler C, Laura Knöpfel DW, Engler S (2011) Comparative political data set 1960–2009. University of Berne
Chakrabarti D, Kumar R, Tomkins A (2006) Evolutionary clustering. In: International conference on knowledge discovery and data mining, ACM, SIGKDD ’06, pp 554–560
Chi Y, Song X, Zhou D, Hino K, Tseng BL (2007) Evolutionary Spectral Clustering by Incorporating Temporal Smoothness. In: International Conference on Knowledge Discovery and Data Mining (KDD), San Jose, USA, pp 153–162
De la Torre F, Agell C (2007) Multimodal diaries. In: Multimedia and expo, IEEE, pp 839–842
De Smet Y, Eppe S (2009) Multicriteria relational clustering: the case of binary outranking matrices. Evol Multi-Criterion Optim 5467:380–392
Deb K, Pratap A, Agarwal S, Meyarivan T (2002) A fast and elitist multiobjective genetic algorithm: NSGA-II. Evol Comput 6(2):182–197
Dunn JC (1973) A fuzzy relative of the ISODATA process and its use in detecting compact well-separated clusters. J Cybern 3(3):32–57
Erixon L (2000) A Swedish economic policy: the theory, application and validity of the Rehn–Meidner model. Tech. rep., Department of Economics, Stockholm University
Gaffney S, Smyth P (1999) Trajectory clustering with mixtures of regression models. In: International conference on knowledge discovery and data mining, ACM Press, New York, USA, SIGKDD ’99, pp 63–72. doi:10.1145/312129.312198
Halsall-Whitney H, Thibault J (2006) Multi-objective optimization for chemical processes and controller design: approximating and classifying the pareto domain. Comput Chem Eng 30(6–7):1155–1168
Kafafy A, Bounekkar A, Bonnevay S (2011) A hybrid evolutionary metaheuristics (HEMH) applied on 0/1 multiobjective knapsack problems. In: Genetic and evolutionary computation, ACM Press, New York, USA, GECCO ’11, p 497
Kalnis P, Mamoulis N, Bakiras S (2005) On discovering moving clusters in spatio-temporal data, chap. 21. In: Bauzer Medeiros C, Egenhofer M, Bertino E (eds) Advances in spatial and temporal databases, vol 3633., Lecture notes in computer science, Springer, Berlin, pp 364–381
Liang Z, Tomioka R, Murata H, Asaoka R, Yamanishi K (2013) Quantitative prediction of glaucomatous visual field loss from few measurements. In: International conference on data mining, ICDM ’13, pp 1121–1126
Lin WH, Hauptmann A (2006) Structuring continuous video recordings of everyday life using time-constrained clustering. In: Chang EY, Hanjalic A, Sebe N (eds) Multimedia content analysis, management, and retrieval, pp 60730D–60730D-9
MacQueen J (1967) Some methods for classification and analysis of multivariate observations. In: Berkeley symposium on mathematical statistics and probability, vol 1, pp 281–297
Mihăiţă AS, Camargo M, Lhoste P (2014) Optimization of a complex urban intersection using discrete event simulation and evolutionary algorithms. In: International federation of automatic control, IFAC’14, vol 19, pp 8768–8774
Rizoiu MA, Velcin J, Lallich S (2012) Structuring typical evolutions using temporal-driven constrained clustering. In: International conference on tools with artificial intelligence, ICTAI ’12, vol 1, IEEE, Athens, Greece, pp 610–617
Rizoiu MA, Velcin J, Lallich S (2014) How to use temporal-driven constrained clustering to detect typical evolutions. Int J Artif Intell Tools 23(04):1460,013
Rocha C, Dias LC, Dimas I (2013) Multicriteria classification with unknown categories: a clustering–sorting approach and an application to conflict management. J Multi-Criteria Decis Anal 20(1–2):13–27
Sawaragi Y, Nakayama H, Tanino T (1985) Theory of multiobjective optimization, vol 176. Academic Press, New York
Siddiqui ZF, Oliveira M, Gama J, Spiliopoulou M (2012) Where are we going? Predicting the evolution of individuals. In: Hollmén J, Klawonn F, Tucker A (eds) Advances in intelligent data analysis V, vol 7619. Lecture notes in computer science, Springer, Berlin, pp 357–368
Wagstaff K, Cardie C, Rogers S, Schroedl S (2001) Constrained K-means clustering with background knowledge. In: International conference on machine learning, ICML ’01, pp 577–584
Xu T, Zhang Z, Yu PS, Long B (2012) Generative models for evolutionary clustering. ACM Trans Knowl Discov Data (TKDD) 6(2):7
Zitzler E, Laumanns M, Thiele L (2001) SPEA2: improving the strength Pareto evolutionary algorithm. In: Evolutionary methods for design, optimisation and control with applications to industrial problems, EUROGEN ’01, pp 95–100
Acknowledgments
NICTA is funded by the Australian Government through the Department of Communications and the Australian Research Council through the ICT Centre of Excellence Program.
Author information
Authors and Affiliations
Corresponding author
Ethics declarations
Conflict of Interest
The authors declare that they have no conflict of interest.
Research Involving Human Participants and/or Animals
The authors declare that no part of the research presented in this manuscript involved any humans or animals.
Additional information
Responsible editors: Joao Gama, Indre Zliobaite , Alipio Jorge, Concha Bielza.
Electronic supplementary material
Below is the link to the electronic supplementary material.
Rights and permissions
About this article
Cite this article
Rizoiu, MA., Velcin, J., Bonnevay, S. et al. ClusPath: a temporal-driven clustering to infer typical evolution paths. Data Min Knowl Disc 30, 1324–1349 (2016). https://doi.org/10.1007/s10618-015-0445-7
Received:
Accepted:
Published:
Issue Date:
DOI: https://doi.org/10.1007/s10618-015-0445-7