Comparison of Fuzzy C-Means and K-Means Clustering Performance: An Application on Household Budget Survey Data

Cinaroglu, Songul

doi:10.1007/978-3-030-51156-2_8

Part of the book series: Advances in Intelligent Systems and Computing (AISC, volume 1197)

Included in the following conference series:

International Conference on Intelligent and Fuzzy Systems

Abstract

National Household Budget Survey (HBS) data include sociodemographic and financial indicators that underpin government public policy actions. Finding the optimal grouping of households in a sufficiently large dataset is a challenging task for policymakers. Soft classification techniques such as Fuzzy C-means (FCM) provide a deep understanding of hidden patterns in the variable set. This study aims to compare the classification performance of FCM and k-means (KM) for grouping households by sociodemographic and out-of-pocket (OOP) health expenditure variables. Health expenditure variables have heavily skewed distributions, and the shape of a variable's distribution has a measurable effect on classifiers. Incorporating Bayesian data generation procedures into the variable transformation process can improve the handling of skewness and, in turn, model performance. However, little is known about how a Bayesian data generation approach performs when embedded in unsupervised learning and applied to health expenditures. This study applied this strategy to Turkish HBS data for 2015 and compared FCM and KM classification performance. Kolmogorov-Smirnov (KS) normality tests show that the logarithmic (KS = 0.006; p > 0.05) and Box-Cox transformed (KS = 0.006; p > 0.05) health expenditure variables, generated from lognormal distributions from a Bayesian viewpoint, are close to normal. Moreover, KM clustering (silhouette, Sil = 0.48) classifies households better than FCM (Sil = 0.4198). The optimal number of household groups is 20. Further studies will compare the cluster-seeking performance of other unsupervised learning algorithms while incorporating arbitrary health expenditure variables into the study model.
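The pipeline outlined in the abstract (log-transform a skewed expenditure variable, measure how close it is to normal with a KS statistic, then compare hard k-means and fuzzy C-means partitions by mean silhouette width) can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the author's implementation. The data are synthetic lognormal draws with hypothetical parameters rather than Turkish HBS records, and the Bayesian data generation step is not reproduced. Only the log transform (the λ = 0 case of Box-Cox) is shown. The KS distance is computed against a normal fitted to the sample, so standard KS p-values would not apply without a Lilliefors-type correction. FCM memberships are defuzzified by maximum membership before the silhouette is computed. The helper names (`ks_normal`, `kmeans`, `fuzzy_cmeans`, `silhouette`) are hypothetical, and the fuzzifier m = 2 and k-means++ seeding are common defaults, not settings reported in the study.

```python
import math
import random
from statistics import NormalDist, fmean, stdev

def ks_normal(values):
    """KS distance between a sample and a normal fitted to its mean and stdev."""
    xs = sorted(values)
    n = len(xs)
    fitted = NormalDist(fmean(xs), stdev(xs))
    d = 0.0
    for i, x in enumerate(xs, start=1):
        f = fitted.cdf(x)
        d = max(d, i / n - f, f - (i - 1) / n)
    return d

def sq_dist(a, b):
    return sum((p - q) ** 2 for p, q in zip(a, b))

def kmeans(points, k, iters=100, seed=0):
    """Lloyd's k-means with k-means++ seeding; returns one hard label per point."""
    rng = random.Random(seed)
    centers = [list(rng.choice(points))]
    while len(centers) < k:  # k-means++: sample the next center with prob ~ D^2
        d2 = [min(sq_dist(p, c) for c in centers) for p in points]
        centers.append(list(rng.choices(points, weights=d2)[0]))
    for _ in range(iters):
        labels = [min(range(k), key=lambda j: sq_dist(p, centers[j])) for p in points]
        new = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            # keep the previous center if a cluster ends up empty
            new.append([fmean(col) for col in zip(*members)] if members else centers[j])
        if new == centers:  # centers unchanged: converged
            break
        centers = new
    return labels

def fuzzy_cmeans(points, c, m=2.0, iters=300, tol=1e-6, seed=0):
    """Fuzzy c-means with fuzzifier m; returns the n x c membership matrix."""
    rng = random.Random(seed)
    n, dim = len(points), len(points[0])
    u = [[rng.random() for _ in range(c)] for _ in range(n)]
    u = [[v / sum(row) for v in row] for row in u]  # each row sums to 1
    for _ in range(iters):
        centers = []
        for j in range(c):  # center j = mean of points weighted by u_ij^m
            w = [u[i][j] ** m for i in range(n)]
            total = sum(w)
            centers.append([sum(w[i] * points[i][d] for i in range(n)) / total
                            for d in range(dim)])
        new_u = []
        for p in points:  # u_ij = 1 / sum_k (d_ij / d_ik)^(2 / (m - 1))
            dist = [math.sqrt(sq_dist(p, ctr)) for ctr in centers]
            if min(dist) == 0.0:  # point sits on a center: crisp membership
                row = [1.0 if dd == 0.0 else 0.0 for dd in dist]
            else:
                row = [1.0 / sum((dist[j] / dist[k]) ** (2 / (m - 1)) for k in range(c))
                       for j in range(c)]
            s = sum(row)
            new_u.append([v / s for v in row])
        shift = max(abs(a - b) for ra, rb in zip(u, new_u) for a, b in zip(ra, rb))
        u = new_u
        if shift < tol:
            break
    return u

def silhouette(points, labels):
    """Mean silhouette width (Rousseeuw, 1987); singleton clusters score 0."""
    def mean_dist(i, group):
        return fmean(math.sqrt(sq_dist(points[i], points[j]))
                     for j in range(len(points)) if labels[j] == group and j != i)
    scores = []
    for i, own in enumerate(labels):
        if labels.count(own) == 1:
            scores.append(0.0)
            continue
        a = mean_dist(i, own)  # mean distance to the point's own cluster
        b = min(mean_dist(i, g) for g in set(labels) if g != own)  # nearest other cluster
        scores.append((b - a) / max(a, b))
    return fmean(scores)

rng = random.Random(42)
# Hypothetical right-skewed spending sample; lognormal(5, 1) is illustrative only.
spend = [rng.lognormvariate(5.0, 1.0) for _ in range(2000)]
d_raw = ks_normal(spend)
d_log = ks_normal([math.log(x) for x in spend])
print(f"KS distance to fitted normal: raw={d_raw:.3f}, log={d_log:.3f}")
# Hypothetical households: three groups, two lognormal spending variables each.
raw = [(rng.lognormvariate(mu, 0.4), rng.lognormvariate(mu, 0.4))
       for mu in (3.0, 5.0, 7.0) for _ in range(60)]
points = [(math.log(x), math.log(y)) for x, y in raw]
km_labels = kmeans(points, 3)
memberships = fuzzy_cmeans(points, 3)
fcm_labels = [row.index(max(row)) for row in memberships]  # defuzzify: max membership
sil_km, sil_fcm = silhouette(points, km_labels), silhouette(points, fcm_labels)
print(f"silhouette: k-means={sil_km:.3f}, FCM={sil_fcm:.3f}")
```

Running it prints two KS distances and two silhouette widths for the synthetic data. These numbers are not expected to match the values reported in the abstract (KS = 0.006; Sil = 0.48 for KM versus 0.4198 for FCM), which come from the real survey data and a 20-group solution.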

Author information

Authors and Affiliations

Department of Health Care Management, FEAS, Hacettepe University, Ankara, Turkey
Songul Cinaroglu

Corresponding author

Correspondence to Songul Cinaroglu.

Editor information

Editors and Affiliations

Department of Industrial Engineering, Istanbul Technical University, Istanbul, Turkey
Cengiz Kahraman
Department of Industrial Engineering, Istanbul Technical University, Istanbul, Turkey
Sezi Cevik Onar
Department of Industrial Engineering, Istanbul Technical University, İstanbul, Turkey
Basar Oztaysi
Department of Industrial Engineering, Istanbul Technical University, Istanbul, Turkey
Irem Ucal Sari
Industrial Engineering Department, Yildiz Technical University, Istanbul, Turkey
Selcuk Cebi
Industrial Engineering Department, Galatasaray University, Istanbul, Turkey
A. Cagri Tolga

About this paper

Cite this paper

Cinaroglu, S. (2021). Comparison of Fuzzy C-Means and K-Means Clustering Performance: An Application on Household Budget Survey Data. In: Kahraman, C., Cevik Onar, S., Oztaysi, B., Sari, I., Cebi, S., Tolga, A. (eds) Intelligent and Fuzzy Techniques: Smart and Innovative Solutions. INFUS 2020. Advances in Intelligent Systems and Computing, vol 1197. Springer, Cham. https://doi.org/10.1007/978-3-030-51156-2_8

DOI: https://doi.org/10.1007/978-3-030-51156-2_8
Published: 11 July 2020
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-51155-5
Online ISBN: 978-3-030-51156-2
eBook Packages: Intelligent Technologies and Robotics (R0)
