Abstract
National Household Budget Survey (HBS) data include sociodemographic and financial indicators that inform government public policy actions. Finding the optimal grouping of households in a sufficiently large data set is a challenging task for policymakers. Soft classification techniques such as Fuzzy C-means (FCM) provide a deep understanding of hidden patterns in the variable set. This study aims to compare FCM and k-means (KM) classification performance for the grouping of households in terms of sociodemographic and out-of-pocket (OOP) health expenditure variables. Health expenditure variables have heavily skewed distributions, and the shape of the variable distribution has a measurable effect on classifiers. Incorporating Bayesian data generation procedures into the variable transformation process will increase the ability to deal with skewness and improve model performance. However, little is known about how well a strategy that embeds the Bayesian data generation approach in unsupervised learning performs when applied to health expenditures. This study applied that strategy to Turkish HBS data for the year 2015 while comparing FCM and KM classification performance. Normality test results indicate that the distributions of the logarithmic (KS = 0.006; p > 0.05) and Box-Cox transformed (KS = 0.006; p > 0.05) health expenditure variables, which were generated using lognormal distributions from a Bayesian viewpoint, are close to normal. Moreover, KM clustering (Sil = 0.48) gives better results than FCM (Sil = 0.4198) for classifying households. The optimal number of household groups is 20. Further studies will compare the cluster-seeking performance of other unsupervised learning algorithms while incorporating arbitrary health expenditure variables into the study model.
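The comparison described in the abstract can be illustrated with a minimal NumPy sketch. It is not the chapter's implementation: the data are synthetic (two hypothetical household groups with lognormal OOP spending), the function names `kmeans`, `fuzzy_cmeans`, and `silhouette` are illustrative, the fuzzifier `m = 2` is an assumed default, and the silhouette for FCM is computed on hard labels obtained by taking the highest membership, which is one common convention and may differ from what the study did.

```python
import numpy as np

def kmeans(X, k, n_iter=100, seed=0):
    """Plain Lloyd's k-means; returns hard labels and centers."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        d = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d.argmin(axis=1)
        # Keep the old center if a cluster happens to be empty.
        new_centers = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels, centers

def fuzzy_cmeans(X, c, m=2.0, n_iter=100, tol=1e-6, seed=0):
    """Standard FCM updates; returns the membership matrix U and centers."""
    rng = np.random.default_rng(seed)
    U = rng.random((len(X), c))
    U /= U.sum(axis=1, keepdims=True)          # memberships sum to 1 per row
    for _ in range(n_iter):
        W = U ** m
        centers = (W.T @ X) / W.sum(axis=0)[:, None]
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        d = np.fmax(d, 1e-12)                  # avoid division by zero
        inv = d ** (-2.0 / (m - 1.0))
        U_new = inv / inv.sum(axis=1, keepdims=True)
        converged = np.abs(U_new - U).max() < tol
        U = U_new
        if converged:
            break
    return U, centers

def silhouette(X, labels):
    """Mean silhouette width (Rousseeuw, 1987) with Euclidean distance."""
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    ks = np.unique(labels)
    scores = np.zeros(len(X))
    for i in range(len(X)):
        same = labels == labels[i]
        if same.sum() <= 1:
            continue                           # singleton clusters score 0
        a = D[i, same].sum() / (same.sum() - 1)
        b = min(D[i, labels == k].mean() for k in ks if k != labels[i])
        scores[i] = (b - a) / max(a, b)
    return scores.mean()

# Synthetic stand-in for HBS data: skewed OOP spending plus household size.
rng = np.random.default_rng(42)
oop = np.concatenate([rng.lognormal(3.0, 0.4, 150), rng.lognormal(6.0, 0.4, 150)])
size = np.concatenate([rng.normal(2.5, 0.5, 150), rng.normal(5.0, 0.5, 150)])

# Log-transform the skewed variable, then standardize both columns.
X = np.column_stack([np.log(oop), size])
X = (X - X.mean(axis=0)) / X.std(axis=0)

km_labels, _ = kmeans(X, k=2)
U, _ = fuzzy_cmeans(X, c=2)
fcm_labels = U.argmax(axis=1)                  # harden memberships for scoring

sil_km = silhouette(X, km_labels)
sil_fcm = silhouette(X, fcm_labels)
print(f"KM silhouette:  {sil_km:.3f}")
print(f"FCM silhouette: {sil_fcm:.3f}")
```

On real survey data the same loop would be repeated over a range of cluster counts, and the count with the highest mean silhouette would be taken as the candidate optimum.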
Copyright information
© 2021 The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Cinaroglu, S. (2021). Comparison of Fuzzy C-Means and K-Means Clustering Performance: An Application on Household Budget Survey Data. In: Kahraman, C., Cevik Onar, S., Oztaysi, B., Sari, I., Cebi, S., Tolga, A. (eds) Intelligent and Fuzzy Techniques: Smart and Innovative Solutions. INFUS 2020. Advances in Intelligent Systems and Computing, vol 1197. Springer, Cham. https://doi.org/10.1007/978-3-030-51156-2_8
DOI: https://doi.org/10.1007/978-3-030-51156-2_8
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-51155-5
Online ISBN: 978-3-030-51156-2
eBook Packages: Intelligent Technologies and Robotics (R0)