A new preference disaggregation method for clustering problem: DISclustering

  • Majid Esmaelian
  • Hadi Shahmoradi
  • Fateme Nemati
Methodologies and Application

Abstract

Clustering, a well-known technique in data analysis and data mining, attempts to find valuable patterns in datasets. It partitions a set of alternatives into logical groups called clusters. The partitioning is based on predefined attributes, so that the alternatives within a cluster are more similar to each other than to those in other clusters. Conventional methods usually define similarity by a distance-based measure. In this study, we propose a new multi-attribute preference disaggregation method, called DISclustering, which introduces a new measure of cluster similarity named global utility. In DISclustering, the global utility of each alternative is calculated by a feed-forward neural network whose parameters are determined using the simulated annealing (SA) algorithm. Each alternative is assigned to a cluster by comparing its global utility with the cluster boundaries, called utility thresholds, with the aim of minimizing the intra-cluster distances (ICD). For this purpose, all utility thresholds are estimated using the particle swarm optimization (PSO) algorithm. The performance of the proposed method is compared with 18 clustering algorithms on 14 real datasets based on F-measure and objective function values (ICD values using intra-cluster or Gower distances). The experimental results and statistical hypothesis tests indicate that DISclustering significantly improves clustering results on the F-measure criterion, outperforming almost 13 of the 18 compared algorithms. Note that DISclustering calculates cluster centroids differently from the other algorithms; hence, its ICD values are less suitable for a fair comparison.
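
The assignment step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names are hypothetical, the network weights are hand-picked placeholders rather than values tuned by SA, the single utility threshold is chosen by hand rather than estimated by PSO, and the centroid is assumed to be the plain attribute mean, which the abstract suggests differs from how DISclustering computes it.

```python
import math

def global_utility(x, w_hidden, b_hidden, w_out, b_out):
    """One-hidden-layer feed-forward pass mapping attributes x to a utility in (0, 1)."""
    hidden = [
        math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
        for row, b in zip(w_hidden, b_hidden)
    ]
    z = sum(w * h for w, h in zip(w_out, hidden)) + b_out
    return 1.0 / (1.0 + math.exp(-z))  # logistic output

def assign_cluster(utility, thresholds):
    """Return the index of the utility interval that contains the given utility."""
    for k, t in enumerate(sorted(thresholds)):
        if utility < t:
            return k
    return len(thresholds)

def intra_cluster_distance(data, labels):
    """Sum of Euclidean distances from each alternative to its cluster mean.

    Assumption: the centroid is the attribute-wise mean of the cluster members.
    """
    clusters = {}
    for x, k in zip(data, labels):
        clusters.setdefault(k, []).append(x)
    total = 0.0
    for members in clusters.values():
        centroid = [sum(col) / len(members) for col in zip(*members)]
        for x in members:
            total += math.sqrt(sum((a - c) ** 2 for a, c in zip(x, centroid)))
    return total

# Toy data: four alternatives described by two attributes (illustrative values).
data = [[0.1, 0.2], [0.15, 0.1], [0.9, 0.8], [0.85, 0.95]]

# Placeholder parameters; in the method these would come from SA and PSO.
w_hidden = [[1.0, 1.0], [-1.0, 0.5]]
b_hidden = [0.0, 0.0]
w_out = [2.0, -1.0]
b_out = -1.0
thresholds = [0.5]  # one threshold separates two clusters

utilities = [global_utility(x, w_hidden, b_hidden, w_out, b_out) for x in data]
labels = [assign_cluster(u, thresholds) for u in utilities]
icd = intra_cluster_distance(data, labels)
print(labels, round(icd, 4))
```

In a full optimization loop, the ICD value returned here would serve as the objective that the parameter and threshold searches try to reduce.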

Keywords

Clustering, Particle swarm optimization (PSO), Simulated annealing (SA), Feed-forward neural network (FFNN), Multi-attribute preference disaggregation

Notes

Acknowledgements

The authors would like to thank referees for their helpful comments.

Compliance with ethical standards

Conflict of interest

The authors declare that there is no conflict of interest regarding the publication of this paper.

Ethical approval

This article does not contain any studies with human participants or animals performed by any of the authors.

Informed consent

Informed consent was obtained from all individual participants included in the study.

Copyright information

© Springer-Verlag GmbH Germany, part of Springer Nature 2019

Authors and Affiliations

  1. Department of Management, University of Isfahan, Isfahan, Iran
  2. Department of Artificial Intelligence, Faculty of Computer Engineering, University of Isfahan, Isfahan, Iran
