Evolutionary Intelligence, Volume 6, Issue 4, pp 229–256

On evolutionary subspace clustering with symbiosis

Research Paper

Abstract

Subspace clustering identifies the attribute support for each cluster as well as the location and number of clusters. In the most general case, the attributes associated with each cluster could be unique. A multi-objective evolutionary method is proposed to identify the unique attribute support of each cluster while detecting its data instances. The proposed algorithm, symbiotic evolutionary subspace clustering (S-ESC), borrows from ‘symbiosis’ in the sense that each clustering solution is defined in terms of a host (a single member of the host population) and a number of coevolved cluster centroids (symbionts in an independent symbiont population). Symbionts define clusters, and therefore attribute subspaces, whereas hosts define sets of clusters that constitute a non-degenerate solution. The symbiotic representation is the key to making S-ESC scalable to high-dimensional datasets, while an integrated subsampling process makes it scalable to tasks with a large number of data items. Benchmarking is performed on a test suite of 59 subspace clustering tasks against four well-known comparator algorithms from both the full-dimensional and subspace clustering literature: EM, MINECLUS, PROCLUS and STATPC. The performance of S-ESC was found to be robust across a wide cross-section of task properties using a single common parameterization throughout. This was not the case for the comparator algorithms: their performance could be sensitive to the particular data distribution, or parameter sweeps might be necessary to provide comparable performance. An additional evaluation against a non-symbiotic GA shows that S-ESC still returns superior clustering solutions.
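The host-symbiont representation described in the abstract can be pictured as a two-level structure: symbionts are cluster centroids restricted to their own attribute subsets, and a host is simply a set of indices into a shared symbiont population. The following Python fragment is a minimal sketch of that idea; the class names, fields, and the subspace-normalized distance are illustrative assumptions, not the authors' implementation.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Symbiont:
    """One cluster: a centroid defined only over its own attribute subspace."""
    attributes: List[int]    # indices of the attributes this cluster uses
    centroid: List[float]    # one centroid value per entry in `attributes`

@dataclass
class Host:
    """One candidate clustering: indices into the shared symbiont population."""
    symbiont_ids: List[int]

def assign(point: List[float], host: Host, symbionts: List[Symbiont]) -> int:
    """Assign a full-dimensional data point to the nearest cluster referenced
    by the host, measuring distance only over each symbiont's own attribute
    subspace. Normalizing by subspace size (an assumption of this sketch)
    keeps subspaces of different dimensionality comparable."""
    best_id, best_dist = -1, float("inf")
    for sid in host.symbiont_ids:
        s = symbionts[sid]
        d = sum((point[a] - c) ** 2
                for a, c in zip(s.attributes, s.centroid)) / len(s.attributes)
        if d < best_dist:
            best_id, best_dist = sid, d
    return best_id
```

Because hosts only reference symbionts, many candidate clusterings can share the same centroids, which is consistent with the abstract's claim that the symbiotic representation is what makes S-ESC scalable to high-dimensional data.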

Keywords

Subspace clustering · Evolutionary multi-objective optimization · Symbiosis

Notes

Acknowledgment

The authors gratefully acknowledge support from the NSERC Discovery Grant, NSERC RTI, and CFI New Opportunities programs (Canada).

Copyright information

© Springer-Verlag Berlin Heidelberg 2014

Authors and Affiliations

  1. Faculty of Computer Science, Dalhousie University, Halifax, Canada