# On evolutionary subspace clustering with symbiosis


## Abstract

Subspace clustering identifies the attribute support for each cluster as well as the location and number of clusters. In the most general case, the attributes associated with each cluster could be unique. A multi-objective evolutionary method is proposed to identify the unique attribute support of each cluster while detecting its data instances. The proposed algorithm, symbiotic evolutionary subspace clustering (S-ESC), borrows from ‘symbiosis’ in the sense that each clustering solution is defined in terms of a host (a single member of the host population) and a number of coevolved cluster centroids (symbionts, held in an independent symbiont population). Symbionts define clusters and therefore attribute subspaces, whereas hosts define sets of clusters that constitute a non-degenerate solution. The symbiotic representation of S-ESC is the key to making it scalable to high-dimensional datasets, while an integrated subsampling process makes it scalable to tasks with a large number of data items. Benchmarking is performed against a test suite of 59 subspace clustering tasks with four well-known comparator algorithms from both the full-dimensional and subspace clustering literature: EM, MINECLUS, PROCLUS and STATPC. Performance of the S-ESC algorithm was found to be robust across a wide cross-section of properties with a common parameterization utilized throughout. This was not the case for the comparator algorithms: their performance could be sensitive to the particular data distribution, or parameter sweeps might be necessary to provide comparable performance. An additional evaluation is performed against a non-symbiotic GA, with S-ESC still returning superior clustering solutions.
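The host/symbiont representation described above can be illustrated with a minimal Python sketch. It is not the authors' implementation. The class names `Symbiont` and `Host`, the functions `subspace_distance` and `assign`, the toy data, and in particular the distance normalization are all assumptions made for illustration. The only point it aims to show is that each symbiont carries its own attribute subset, while a host merely indexes into a shared symbiont population.

```python
from dataclasses import dataclass
from math import sqrt


@dataclass(frozen=True)
class Symbiont:
    # Maps attribute index -> centroid coordinate. The keys are the
    # attribute subspace supporting this cluster (hypothetical encoding).
    centroid: dict


@dataclass
class Host:
    # Indices into the shared symbiont population; together they
    # form one candidate clustering solution.
    members: list


def subspace_distance(point, symbiont):
    # Euclidean distance restricted to the symbiont's own attributes.
    # Dividing by the subspace size is an assumption made here so that
    # subspaces of different dimensionality remain comparable.
    sq = sum((point[a] - c) ** 2 for a, c in symbiont.centroid.items())
    return sqrt(sq / len(symbiont.centroid))


def assign(points, host, population):
    # Label each point with the index of the nearest symbiont in the host.
    return [
        min(host.members, key=lambda i: subspace_distance(p, population[i]))
        for p in points
    ]


# Toy symbiont population: each cluster uses a different attribute subset.
population = [
    Symbiont({0: 0.0, 1: 0.0}),  # cluster defined on attributes 0 and 1
    Symbiont({2: 5.0}),          # cluster defined on attribute 2 only
    Symbiont({0: 9.0}),          # not referenced by the host below
]
host = Host(members=[0, 1])
points = [(0.1, -0.1, 9.0), (7.0, 3.0, 5.2)]
labels = assign(points, host, population)
print(labels)
```

In this sketch, several hosts could reference the same symbiont, so a useful cluster definition would be reused across candidate solutions instead of being copied into each one.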

## Keywords

Subspace clustering, Evolutionary multi-objective optimization, Symbiosis

## Notes

### Acknowledgment

The authors gratefully acknowledge support from the NSERC Discovery grant, NSERC RTI and CFI New Opportunities programs (Canada).
