Comparison of Fuzzy C-Means and K-Means Clustering Performance: An Application on Household Budget Survey Data

  • Conference paper
Intelligent and Fuzzy Techniques: Smart and Innovative Solutions (INFUS 2020)

Part of the book series: Advances in Intelligent Systems and Computing (AISC, volume 1197)

Abstract

National Household Budget Survey (HBS) data include sociodemographic and financial indicators that are elements of government public policy actions. Finding the optimal grouping of households in a sufficiently large data set is a challenging task for policymakers. Soft classification techniques such as fuzzy c-means (FCM) provide a deeper understanding of hidden patterns in the variable set. This study compares the classification performance of FCM and k-means (KM) for grouping households in terms of sociodemographic and out-of-pocket (OOP) health expenditure variables. Health expenditure variables have heavily skewed distributions, and the shape of a variable's distribution has a measurable effect on classifiers. Incorporating Bayesian data generation procedures into the variable transformation process will increase the ability to deal with skewness and improve model performance. However, little is known about how a strategy that embeds the Bayesian data generation approach in unsupervised learning performs when applied to health expenditures. This study applied that strategy to Turkish HBS data for the year 2015 while comparing FCM and KM classification performance. Normality test results indicate that the distributions of the logarithmic (Kolmogorov-Smirnov statistic, KS = 0.006; p > 0.05) and Box-Cox transformed (KS = 0.006; p > 0.05) health expenditure variables, which were generated using lognormal distributions from a Bayesian viewpoint, are close to normal. Moreover, KM clustering (silhouette value, Sil = 0.48) performs better than FCM (Sil = 0.4198) for classifying households. The optimal number of household groups is 20. Further studies will compare the cluster-seeking performance of other unsupervised learning algorithms while incorporating arbitrary health expenditure variables into the study model.
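
The comparison described in the abstract can be illustrated with a minimal sketch. It is not the author's code and does not use the HBS data: the two-group synthetic data set, the variable names (spend, age), the fuzzifier m = 2, the choice of two clusters, and the random seeds are all assumptions made for illustration. The sketch draws a lognormal "expenditure" variable, log-transforms and standardizes it, clusters the result with a plain k-means and a plain fuzzy c-means, and scores both partitions with the mean silhouette width.

```python
import numpy as np

def pairwise_dist(A, B):
    """Euclidean distances between every row of A and every row of B."""
    return np.linalg.norm(A[:, None, :] - B[None, :, :], axis=2)

def kmeans(X, k, n_iter=100, seed=0):
    """Plain Lloyd's k-means; returns hard labels and cluster centers."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        labels = pairwise_dist(X, centers).argmin(axis=1)
        # Keep the old center if a cluster happens to be empty.
        new_centers = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return pairwise_dist(X, centers).argmin(axis=1), centers

def fuzzy_cmeans(X, c, m=2.0, n_iter=200, tol=1e-6, seed=0):
    """Plain fuzzy c-means; returns the membership matrix U and centers."""
    rng = np.random.default_rng(seed)
    U = rng.random((len(X), c))
    U /= U.sum(axis=1, keepdims=True)          # memberships sum to 1 per row
    for _ in range(n_iter):
        W = U ** m
        centers = (W.T @ X) / W.sum(axis=0)[:, None]
        d = np.maximum(pairwise_dist(X, centers), 1e-12)  # avoid divide by 0
        inv = d ** (-2.0 / (m - 1.0))
        U_new = inv / inv.sum(axis=1, keepdims=True)
        converged = np.abs(U_new - U).max() < tol
        U = U_new
        if converged:
            break
    return U, centers

def silhouette(X, labels):
    """Mean silhouette width (Rousseeuw, 1987) for a hard partition."""
    uniq = np.unique(labels)
    if len(uniq) < 2:
        raise ValueError("silhouette needs at least two clusters")
    D = pairwise_dist(X, X)
    s = np.zeros(len(X))
    for i in range(len(X)):
        same = labels == labels[i]
        if same.sum() == 1:
            continue                            # singleton cluster scores 0
        a = D[i, same].sum() / (same.sum() - 1)
        b = min(D[i, labels == j].mean() for j in uniq if j != labels[i])
        s[i] = (b - a) / max(a, b)
    return s.mean()

# Synthetic stand-in for household data (assumed, not from the paper).
rng = np.random.default_rng(42)
spend = np.concatenate([rng.lognormal(2.0, 0.3, 150),
                        rng.lognormal(5.0, 0.3, 150)])   # skewed expenditure
age = np.concatenate([rng.normal(30, 3, 150), rng.normal(60, 3, 150)])
X = np.column_stack([np.log(spend), age])                # log transform
X = (X - X.mean(axis=0)) / X.std(axis=0)                 # standardize

km_labels, _ = kmeans(X, k=2)
U, _ = fuzzy_cmeans(X, c=2)
fcm_labels = U.argmax(axis=1)          # harden memberships for scoring

sil_km = silhouette(X, km_labels)
sil_fcm = silhouette(X, fcm_labels)
print(f"k-means silhouette: {sil_km:.3f}")
print(f"FCM silhouette:     {sil_fcm:.3f}")
```

Hardening the FCM memberships with argmax is one common way to make the two methods comparable under a single index; it discards the degree-of-membership information that distinguishes FCM, so a fuzzy validity index may be preferable when that information matters.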

Author information

Corresponding author

Correspondence to Songul Cinaroglu.

Copyright information

© 2021 The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Cinaroglu, S. (2021). Comparison of Fuzzy C-Means and K-Means Clustering Performance: An Application on Household Budget Survey Data. In: Kahraman, C., Cevik Onar, S., Oztaysi, B., Sari, I., Cebi, S., Tolga, A. (eds) Intelligent and Fuzzy Techniques: Smart and Innovative Solutions. INFUS 2020. Advances in Intelligent Systems and Computing, vol 1197. Springer, Cham. https://doi.org/10.1007/978-3-030-51156-2_8
