Managing uncertainty in imputing missing symptom value for healthcare of rural India

  • Sayan Das
  • Jaya Sil
Part of the following topical collections:
  1. Special Issue on Application of Artificial Intelligence in Health Research



In India, 67% of the total population lives in remote areas, where providing primary healthcare is a real challenge due to the scarcity of doctors. Health kiosks are deployed in remote villages to collect basic health data such as blood pressure, pulse rate, height and weight, BMI, and oxygen saturation level (SpO2). The acquired data are often imprecise due to measurement error and contain missing values. The paper proposes a comprehensive framework to impute missing symptom values by managing the uncertainty present in the data set.


The data sets are fuzzified to manage uncertainty, and the fuzzy c-means clustering algorithm is applied to group the symptom feature vectors into different disease classes. The missing symptom values corresponding to each disease are imputed using a multiple fuzzy-based regression model. Relations between different symptoms are framed with the help of experts and medical literature. Blood pressure is handled with a novel approach because its characteristics differ from those of the other symptoms. The patient records obtained from the kiosks are not adequate, so relevant data are simulated by the Monte Carlo method to avoid over-fitting while imputing the missing symptom values. The generated data sets are verified using the Kullback–Leibler (K–L) distance and distance correlation (dCor) techniques, which show that the simulated data sets are well correlated with the real data set.
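The clustering step can be illustrated with a minimal sketch of the standard fuzzy c-means iteration (Bezdek's formulation). This is not the authors' implementation: the function name `fcm`, the fuzzifier value m = 2, the random initialisation, and the sample (pulse rate, SpO2) records are all assumptions made for illustration only.

```python
import random

def fcm(points, c, m=2.0, iters=100, tol=1e-6, seed=0):
    """Standard fuzzy c-means on a list of equal-length feature vectors.

    Returns (centers, memberships); memberships[i][j] is the degree to
    which point i belongs to cluster j, and each row sums to 1.
    """
    rng = random.Random(seed)
    n, dim = len(points), len(points[0])
    # Random initial memberships, each row normalised to sum to 1.
    u = []
    for _ in range(n):
        row = [rng.random() + 1e-9 for _ in range(c)]
        total = sum(row)
        u.append([v / total for v in row])
    centers = []
    for _ in range(iters):
        # Centre update: mean of the points weighted by membership ** m.
        centers = []
        for j in range(c):
            w = [u[i][j] ** m for i in range(n)]
            w_sum = sum(w)
            centers.append(
                [sum(w[i] * points[i][d] for i in range(n)) / w_sum
                 for d in range(dim)]
            )
        # Membership update: u_ij = 1 / sum_k (d_ij / d_ik) ** (2 / (m - 1)).
        new_u = []
        for i in range(n):
            dist = [
                max(sum((points[i][d] - centers[j][d]) ** 2
                        for d in range(dim)) ** 0.5, 1e-12)
                for j in range(c)
            ]
            new_u.append(
                [1.0 / sum((dist[j] / dist[k]) ** (2.0 / (m - 1.0))
                           for k in range(c))
                 for j in range(c)]
            )
        shift = max(abs(new_u[i][j] - u[i][j])
                    for i in range(n) for j in range(c))
        u = new_u
        if shift < tol:
            break
    return centers, u

# Hypothetical (pulse rate, SpO2) records; the values are invented.
records = [[72, 98], [75, 97], [70, 99], [110, 88], [115, 86], [108, 89]]
centers, memberships = fcm(records, c=2)
labels = [row.index(max(row)) for row in memberships]
```

Unlike hard clustering, each record keeps a graded membership in every cluster, which is what allows an imprecise measurement to contribute to more than one disease class.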


Using the data sets, the proposed model is built and new patients are provisionally diagnosed using the Softmax cost function. Multiple class labels, representing diseases, are determined with about 98% accuracy and verified against the ground truth provided by the experts.
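As a rough illustration of how a softmax output can yield more than one candidate class, the sketch below converts raw class scores into probabilities, ranks the classes, and evaluates the cross-entropy cost for a known label. The function names, the scores, and the choice of returning the top two classes are assumptions for illustration and are not taken from the paper.

```python
import math

def softmax(scores):
    """Turn raw class scores into probabilities that sum to 1."""
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(probs, true_index):
    """Softmax cost for one sample: negative log-probability of the true class."""
    return -math.log(probs[true_index])

def top_k(probs, k):
    """Indices of the k most probable classes, most probable first."""
    return sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]

# Hypothetical scores for three disease classes (invented values).
scores = [2.0, 1.0, 0.1]
probs = softmax(scores)
candidates = top_k(probs, 2)
cost_if_class_0 = cross_entropy(probs, 0)
cost_if_class_2 = cross_entropy(probs, 2)
```

The cost is smallest when the true class is the one the model rates most probable, which is the quantity a softmax classifier minimises during training.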


It is worth mentioning that the system is intended for primary healthcare; in emergency cases, patients are referred to the experts.


Rural healthcare, Missing value, Regression model, Fuzzification, Monte Carlo method, Softmax classifier



This work is supported by the Information Technology Research Academy (ITRA), Digital India Corporation (formerly Media Lab Asia), Government of India, under ITRA-Mobile Grant [ITRA/15(59)/Mobile/Remote Health/01].

Compliance with ethical standards

Conflict of interest

The authors declare that they have no conflict of interest.



Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  1. Computer Science and Technology, Indian Institute of Engineering Science and Technology, Shibpur, India
