A Novel Fuzzy Rough Clustering Parameter-based missing value imputation

  • Original Article
  • Neural Computing and Applications

Abstract

Missing values have long been one of the most challenging problems in data mining, machine learning and statistical analysis. Many methods exist to handle them, since doing so is an important step in discovering meaningful information. However, the most frequently used approach for large datasets is to discard the instances that contain missing values. Deleting these instances causes a loss of crucial information, which degrades the performance of algorithms. Hence, an intelligent method is needed to handle missing values. In the recent past, fuzzy and rough sets have been widely employed in many applications. In this research work, a Novel Fuzzy C-Means Rough Parameter-based missing value imputation method is proposed, which hybridizes fuzzy and rough sets to handle missing values. The proposed algorithm is capable of handling uncertainty and vagueness in datasets through rough and fuzzy sets while preserving vital information. Experiments were carried out on three benchmark datasets, namely the Dukes’ B colon cancer, Mice Protein Expression and Yeast datasets, to assess the efficacy of the proposed method. It is observed that the proposed method produces better results than the Fuzzy C-Means Centroid-based and Fuzzy C-Means Parameter-based missing value imputation methods.
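To make the idea of clustering-based imputation concrete, the following is a minimal sketch of a generic fuzzy c-means imputation in the spirit of the centroid-based baseline named above. It is not the proposed algorithm, and it omits the rough-set component entirely. The function name `fcm_impute`, its parameters, and the choice to compute distances over observed features only (a partial-distance strategy) are assumptions made for illustration, since the abstract does not specify these details.

```python
import numpy as np


def fcm_impute(X, n_clusters=2, m=2.0, n_iter=100, tol=1e-6, seed=0):
    """Impute NaNs in X with membership-weighted fuzzy c-means centroids.

    Illustrative only: distances use the observed features of each row
    (a partial-distance strategy); this is not the paper's algorithm.
    """
    X = np.asarray(X, dtype=float)
    observed = ~np.isnan(X)                 # True where a value is present
    X0 = np.where(observed, X, 0.0)         # NaNs zeroed so sums ignore them
    n, d = X.shape
    rng = np.random.default_rng(seed)

    # Random initial memberships; each row sums to 1.
    U = rng.random((n, n_clusters))
    U /= U.sum(axis=1, keepdims=True)

    for _ in range(n_iter):
        W = U ** m                                      # fuzzified weights (n, c)
        # Centroids: weighted mean over the rows where the feature is observed.
        num = W.T @ X0                                  # (c, d)
        den = W.T @ observed.astype(float)              # (c, d)
        V = num / np.maximum(den, 1e-12)

        # Partial squared distance, rescaled by d / (number of observed features).
        diff = (X0[:, None, :] - V[None, :, :]) * observed[:, None, :]
        n_obs = np.maximum(observed.sum(axis=1), 1)
        D = (diff ** 2).sum(axis=2) * (d / n_obs)[:, None]
        D = np.maximum(D, 1e-12)                        # avoid division by zero

        # Standard FCM update: u_ik is proportional to D_ik^(-1/(m-1)).
        inv = D ** (-1.0 / (m - 1.0))
        U_new = inv / inv.sum(axis=1, keepdims=True)

        converged = np.abs(U_new - U).max() < tol
        U = U_new
        if converged:
            break

    # Fill each missing entry with the membership-weighted centroid value.
    estimate = U @ V
    return np.where(observed, X, estimate), U, V
```

Calling `fcm_impute(X, n_clusters=2)` on an array containing `np.nan` entries returns the completed array together with the membership matrix and the centroids. Observed values are passed through unchanged, so only the missing cells are estimated.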

Acknowledgements

The authors would like to thank UGC, New Delhi, for the financial support received under the UGC Rajiv Gandhi National Fellowship (F1-17.1/2016-17/RGNF-2015-17-SC-TAM-28324) and UGC-SAP No. F.5-6/2018/DRS-II (SAP-II). The authors extend their sincere thanks to the anonymous referees for their suggestions to improve the paper.

Author information

Corresponding author

Correspondence to P. S. Raja.

Ethics declarations

Conflict of interest

The authors declare that they have no conflicts of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Raja, P.S., Sasirekha, K. & Thangavel, K. A Novel Fuzzy Rough Clustering Parameter-based missing value imputation. Neural Comput & Applic 32, 10033–10050 (2020). https://doi.org/10.1007/s00521-019-04535-9

  • DOI: https://doi.org/10.1007/s00521-019-04535-9
