Aggregation of multi-objective fuzzy symmetry-based clustering techniques for improving gene and cancer classification
This work reports the application of a cluster ensemble approach to combine the results produced by multi-objective clustering techniques. First, multi-objective fuzzy clustering techniques are developed using the search capabilities of differential evolution (DE) and particle swarm optimization (PSO). Both clustering techniques use a recently developed point symmetry-based distance to allocate points to clusters. The appropriate partitioning of a data set is identified by simultaneously optimizing two cluster quality measures, namely the Xie–Beni index and the FSym-index. The first objective function uses Euclidean distance as a similarity measure, and the second uses the point symmetry-based distance in its computation. Each clustering technique produces a set of trade-off solutions on the final Pareto optimal front. Finally, this set of solutions is combined using a link-based cluster ensemble technique. The effectiveness of the ensemble techniques is illustrated by partitioning real-life gene expression and cancer data sets, where the automatic identification of sets of genes or sets of cancer tissues is a pressing issue. The potency of the ensemble techniques applied to both the multi-objective DE- and PSO-based clustering approaches is shown in comparison with several state-of-the-art techniques.
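As an illustration of the first objective named in the abstract, the following is a minimal sketch of how a Xie–Beni index could be computed for a fuzzy partition. It is not the authors' implementation. The function names, the argument layout (`memberships[i][j]` as the membership of point `j` in cluster `i`), and the fuzzifier default `m = 2.0` are assumptions made for this example. The index is the membership-weighted within-cluster scatter divided by the number of points times the smallest squared distance between two cluster centers, so lower values indicate more compact and better separated clusters.

```python
from typing import Sequence


def squared_euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two points of equal dimension."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def xie_beni(points, centers, memberships, m=2.0):
    """Xie-Beni index of a fuzzy partition (hypothetical helper for illustration).

    points:      list of n points (sequences of floats)
    centers:     list of K >= 2 cluster centers
    memberships: K x n matrix, memberships[i][j] = degree of point j in cluster i
    m:           fuzzifier exponent (assumed default of 2.0)
    """
    n = len(points)
    k = len(centers)
    # Numerator: membership-weighted squared distances to the cluster centers.
    compactness = sum(
        (memberships[i][j] ** m) * squared_euclidean(points[j], centers[i])
        for i in range(k)
        for j in range(n)
    )
    # Denominator term: smallest squared distance between two distinct centers.
    separation = min(
        squared_euclidean(centers[i], centers[q])
        for i in range(k)
        for q in range(i + 1, k)
    )
    return compactness / (n * separation)


# Toy data: two tight groups of two points each, with crisp memberships.
points = [(0.0, 0.0), (0.0, 1.0), (4.0, 0.0), (4.0, 1.0)]
memberships = [[1.0, 1.0, 0.0, 0.0], [0.0, 0.0, 1.0, 1.0]]
good_centers = [(0.0, 0.5), (4.0, 0.5)]
poor_centers = [(1.0, 0.5), (3.0, 0.5)]

xb_good = xie_beni(points, good_centers, memberships)
xb_poor = xie_beni(points, poor_centers, memberships)
```

On this toy data the well-placed centers give a smaller index than the poorly placed ones, which is the direction a minimizing search would favor. The second objective (the FSym-index) relies on the point symmetry-based distance and is not sketched here.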
Keywords: Unsupervised classification; Cluster ensemble; Multi-objective particle swarm optimization; Multi-objective differential evolution; Symmetry; Gene expression data; Cancer data classification
The authors would like to acknowledge the support of the Indian Institute of Technology Patna and the National Institute of Technology Mizoram in conducting this research.
Compliance with ethical standards
Conflict of interest
All the authors declare that they do not have any conflict of interest.
Human and animal rights
We have not performed any experiments which involve animals or humans.