A new preference disaggregation method for clustering problem: DISclustering
Clustering, a well-known technique in data analysis and data mining, attempts to find valuable patterns in datasets. It partitions a set of alternatives into logical groups called clusters. The partitioning is based on predefined attributes, so that alternatives within a cluster are more similar to each other than to alternatives in other clusters. Conventional methods usually define similarity by a distance-based measure. In this study, we propose a new multi-attribute preference disaggregation method called DISclustering, which introduces a new measure of cluster similarity named global utility. In DISclustering, the global utility of each alternative is calculated by a feed-forward neural network (FFNN) whose parameters are determined using the simulated annealing (SA) algorithm. Each alternative is assigned to a cluster by comparing its global utility with the cluster boundaries, called utility thresholds, with the aim of minimizing the intra-cluster distances (ICD). For this purpose, all utility thresholds are estimated using the particle swarm optimization (PSO) algorithm. The performance of the proposed method is compared with 18 clustering algorithms on 14 real datasets based on F-measure and objective function values (ICD values using intra-cluster or Gower distances). The experimental results and statistical hypothesis tests indicate that DISclustering significantly improves clustering results on the F-measure criterion, outperforming about 13 of the 18 compared algorithms. Note that DISclustering calculates cluster centroids differently from the other algorithms, so its ICD values are less suitable for a fair comparison.
Keywords: Clustering; Particle swarm optimization (PSO); Simulated annealing (SA); Feed-forward neural network (FFNN); Multi-attribute preference disaggregation
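The assignment step described in the abstract, comparing each alternative's global utility with the utility thresholds and scoring the result by ICD, can be illustrated with a minimal sketch. It is not the authors' implementation: the weighted sum below is a hypothetical stand-in for the trained FFNN, the threshold is fixed by hand rather than estimated by PSO, and the centroid is taken as the plain cluster mean, which the abstract notes differs from how DISclustering computes it. All function and variable names are assumptions made for illustration.

```python
from bisect import bisect_right
from math import dist


def assign_clusters(utilities, thresholds):
    """Map each global utility to a cluster index using sorted thresholds."""
    cuts = sorted(thresholds)
    # A utility below the first threshold falls in cluster 0,
    # one between the first and second in cluster 1, and so on.
    return [bisect_right(cuts, u) for u in utilities]


def intra_cluster_distance(points, labels):
    """Sum of Euclidean distances from each point to its cluster mean.

    Assumption: a mean centroid is used here only for illustration.
    """
    groups = {}
    for point, label in zip(points, labels):
        groups.setdefault(label, []).append(point)
    total = 0.0
    for members in groups.values():
        centroid = [sum(col) / len(members) for col in zip(*members)]
        total += sum(dist(p, centroid) for p in members)
    return total


# Toy data: four alternatives described by two attributes.
points = [(0.1, 0.2), (0.2, 0.1), (0.8, 0.9), (0.9, 0.8)]

# Hypothetical stand-in for the trained network: a fixed weighted sum.
weights = (0.5, 0.5)
utilities = [sum(w * x for w, x in zip(weights, p)) for p in points]

# One hand-picked threshold gives two clusters.
thresholds = [0.5]
labels = assign_clusters(utilities, thresholds)
icd = intra_cluster_distance(points, labels)
```

In this sketch, an optimizer searching over `thresholds` would call `intra_cluster_distance` as its objective; with k clusters it would search over k - 1 threshold values.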
The authors would like to thank the referees for their helpful comments.
Compliance with ethical standards
Conflict of interest
The authors declare that there is no conflict of interest regarding the publication of this paper.
This article does not contain any studies with human participants or animals performed by any of the authors.
Informed consent was obtained from all individual participants included in the study.