Abstract
Customer segmentation (CS) is among the most critical applications of customer relationship management, and it depends primarily on clustering algorithms. The rough k-means (RKM) clustering algorithm is widely adopted in the literature for CS. However, RKM has two limitations that hinder its successful application to CS. First, it is sensitive to the random choice of initial cluster centers. Second, it relies on default values for the parameters \(w_{l}\) and \(w_{u}\), which are used in calculating the cluster centers. To address the first limitation, this study proposes a new initialization method that mitigates the problems associated with random initial cluster centers and thereby yields stable clustering results. To address the second, it proposes a weight optimization scheme that estimates suitable values of \(w_{l}\) and \(w_{u}\) by counting the number of data points present in the clusters. Extensive experiments on several benchmark datasets assess the performance of the proposed methods against the existing algorithm. The results show that the proposed methods improve the performance of the RKM algorithm, as validated by five evaluation measures: convergence speed, clustering accuracy, the Davies–Bouldin (DB) index, the within/total (W/T) clustering error index and a \(t\) test of statistical significance. The results are further compared with those of other promising clustering algorithms to show the advantage of the proposed methods. A CS framework that shows the utility of the proposed methods in the application domain is also proposed. Finally, the framework is demonstrated through a case study in a retail supermarket.
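To make the role of \(w_{l}\) and \(w_{u}\) concrete, the following is a minimal sketch of one pass of the conventional rough k-means update, in which each cluster center is a weighted combination of the mean of its lower approximation and the mean of its boundary region. It is not the initialization or weight optimization scheme proposed in this article. The function name `rough_kmeans_step`, the difference threshold `epsilon` and the values 0.7 and 0.3 for the weights are assumptions chosen for illustration only.

```python
import numpy as np

def rough_kmeans_step(X, centers, w_l=0.7, w_u=0.3, epsilon=1.0):
    """One assignment-and-update pass of conventional rough k-means.

    Assumed for illustration: the difference-threshold variant, with
    w_l + w_u = 1. The weights and epsilon are hypothetical values.
    """
    X = np.asarray(X, dtype=float)
    centers = np.asarray(centers, dtype=float)
    k = len(centers)

    # Euclidean distance from every point to every center, shape (n, k).
    dist = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
    nearest = dist.argmin(axis=1)
    d_min = dist.min(axis=1)

    # A point is "close" to each cluster whose distance lies within
    # epsilon of the distance to its nearest cluster.
    close = (dist - d_min[:, None]) <= epsilon
    # Points close to more than one cluster belong to boundary regions.
    ambiguous = close.sum(axis=1) > 1

    new_centers = centers.copy()
    for j in range(k):
        lower = X[(nearest == j) & ~ambiguous]   # lower approximation
        boundary = X[close[:, j] & ambiguous]    # boundary region
        if len(lower) and len(boundary):
            new_centers[j] = (w_l * lower.mean(axis=0)
                              + w_u * boundary.mean(axis=0))
        elif len(lower):
            new_centers[j] = lower.mean(axis=0)
        elif len(boundary):
            new_centers[j] = boundary.mean(axis=0)
        # If both sets are empty, the previous center is kept.
    return new_centers, nearest, ambiguous
```

In this sketch the weights are fixed in advance, which is the situation the abstract identifies as a limitation: the same \(w_{l}\) and \(w_{u}\) are applied to every cluster regardless of how many points fall in its lower approximation and boundary region.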
Ethics declarations
Conflict of interest
All the authors declare that they have no conflict of interest.
Ethical approval
This article does not contain any studies with animals performed by any of the authors.
Informed consent
Informed consent was obtained from all individual participants included in the study.
Additional information
Communicated by V. Loia.
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
About this article
Cite this article
Sivaguru, M., Punniyamoorthy, M. Performance-enhanced rough \(k\)-means clustering algorithm. Soft Comput 25, 1595–1616 (2021). https://doi.org/10.1007/s00500-020-05247-2
DOI: https://doi.org/10.1007/s00500-020-05247-2