Data Mining and Knowledge Discovery, Volume 30, Issue 5, pp 1324–1349

ClusPath: a temporal-driven clustering to infer typical evolution paths

  • Marian-Andrei Rizoiu
  • Julien Velcin
  • Stéphane Bonnevay
  • Stéphane Lallich

Abstract

We propose ClusPath, a novel algorithm for detecting general evolution tendencies in a population of entities. We show how abstract notions, such as the Swedish socio-economic model (in a political dataset) or companies' fiscal optimization (in an economic dataset), can be inferred from low-level descriptive features. Such high-level regularities in the evolution of entities are detected by combining spatial and temporal features into a spatio-temporal dissimilarity measure and using semi-supervised clustering techniques. The relations between the evolution phases are modeled using a graph structure, inferred simultaneously with the partition, by using a “slow changing world” assumption. The idea is to ensure a smooth passage for entities along their evolution paths, which captures the long-term trends in the dataset. Additionally, we provide a method, based on an evolutionary algorithm, to tune the parameters of ClusPath to new, unseen datasets. This method assesses the fitness of a solution using four conflicting quality measures and proposes a balanced compromise.
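The abstract combines descriptive (spatial) and temporal features into a single dissimilarity and links evolution phases in a graph. The Python sketch below illustrates both ideas under loudly stated assumptions: the weighted-sum form of the hypothetical `spatio_temporal_dissimilarity`, its `alpha` weight and its normalizing constants are illustrative stand-ins, not ClusPath's published measure. The hypothetical `temporal_cluster_graph` builds the graph after the fact by counting cluster-to-cluster transitions between each entity's consecutive observations, whereas ClusPath infers the graph simultaneously with the partition. Observations are assumed to be `(entity_id, timestamp, features)` tuples, and the data is a toy example.

```python
import math
from collections import Counter

# Hypothetical observation format (an assumption): (entity_id, timestamp, feature_vector).


def spatio_temporal_dissimilarity(a, b, max_feat_dist, max_time_gap, alpha=0.5):
    """Assumed form: weighted sum of normalized descriptive and temporal distances."""
    _, t_a, x_a = a
    _, t_b, x_b = b
    feat = math.dist(x_a, x_b) / max_feat_dist  # descriptive (spatial) component
    temp = abs(t_a - t_b) / max_time_gap        # temporal component
    return alpha * feat + (1 - alpha) * temp


def temporal_cluster_graph(observations, labels):
    """Count directed transitions between clusters of each entity's consecutive observations."""
    by_entity = {}
    for (entity, t, _), label in zip(observations, labels):
        by_entity.setdefault(entity, []).append((t, label))
    edges = Counter()
    for seq in by_entity.values():
        seq.sort(key=lambda pair: pair[0])  # chronological order per entity
        for (_, c_from), (_, c_to) in zip(seq, seq[1:]):
            if c_from != c_to:  # staying in the same cluster adds no edge
                edges[(c_from, c_to)] += 1
    return edges


# Toy data: two hypothetical entities observed over a few years.
obs = [
    ("SE", 2000, (0.1, 0.2)), ("SE", 2001, (0.1, 0.25)), ("SE", 2002, (0.8, 0.9)),
    ("FR", 2000, (0.0, 0.1)), ("FR", 2001, (0.7, 0.9)),
]
print(temporal_cluster_graph(obs, [0, 0, 1, 0, 1]))  # Counter({(0, 1): 2})
```

Here each entity contributes one `(0, 1)` edge. In the spirit of the slow-changing-world assumption, one would expect each entity to produce only a few such edges, so that the graph traces a small number of typical evolution paths.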

Keywords

Detection of long-term trends, Evolutionary clustering, Temporal clustering, Temporal cluster graph, Semi-supervised clustering, Pareto front estimation
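The abstract's tuning method scores each candidate parameter setting on four conflicting quality measures and proposes a balanced compromise, which is where Pareto front estimation comes in. The Python sketch below shows only Pareto-dominance filtering and one way of picking a compromise; it does not implement the evolutionary search itself. It assumes every measure is to be maximized, and the hypothetical `balanced_compromise` rule (the front member closest to the ideal point after min-max scaling) is a common heuristic chosen for illustration, not necessarily the rule ClusPath uses. All function names and scores are hypothetical.

```python
import math


def dominates(a, b):
    """True if score vector a is >= b on every measure and > b on at least one (maximization)."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def pareto_front(candidates):
    """Keep the non-dominated score vectors (naive O(n^2) filter)."""
    return [c for c in candidates if not any(dominates(o, c) for o in candidates)]


def balanced_compromise(front):
    """Hypothetical rule: pick the front member closest to the ideal point after min-max scaling."""
    k = len(front[0])
    lo = [min(s[i] for s in front) for i in range(k)]
    hi = [max(s[i] for s in front) for i in range(k)]

    def scaled(s):
        # A measure that is constant over the front is treated as already ideal.
        return [1.0 if hi[i] == lo[i] else (s[i] - lo[i]) / (hi[i] - lo[i]) for i in range(k)]

    return min(front, key=lambda s: math.dist(scaled(s), [1.0] * k))


# Toy scores for four hypothetical parameter settings on four quality measures.
candidates = [
    (0.9, 0.1, 0.5, 0.5),
    (0.1, 0.9, 0.5, 0.5),
    (0.6, 0.6, 0.5, 0.5),
    (0.5, 0.5, 0.4, 0.4),  # dominated by the third setting
]
front = pareto_front(candidates)
print(front)
print(balanced_compromise(front))  # (0.6, 0.6, 0.5, 0.5)
```

The quadratic filter is adequate for the modest population sizes of an evolutionary search; the compromise rule is the part most likely to differ from the paper's own procedure.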

Notes

Acknowledgments

NICTA is funded by the Australian Government through the Department of Communications and the Australian Research Council through the ICT Centre of Excellence Program.

Compliance with Ethical Standards

Conflict of Interest

The authors declare that they have no conflict of interest.

Research Involving Human Participants and/or Animals

The authors declare that no part of the research presented in this manuscript involved any humans or animals.

Supplementary material

Supplementary material 1: 10618_2015_445_MOESM1_ESM.pdf (PDF, 264 KB)

Copyright information

© The Author(s) 2015

Authors and Affiliations

  • Marian-Andrei Rizoiu (1)
  • Julien Velcin (2)
  • Stéphane Bonnevay (2)
  • Stéphane Lallich (2)
  1. NICTA & Australian National University, Canberra, Australia
  2. ERIC Laboratory, Université de Lyon, Lyon, France