Managing uncertainty in imputing missing symptom value for healthcare of rural India

  • Research
  • Published in: Health Information Science and Systems

Abstract

Purpose

In India, 67% of the total population lives in remote areas, where providing primary healthcare is a real challenge due to the scarcity of doctors. Health kiosks are deployed in remote villages to collect basic health data such as blood pressure, pulse rate, height and weight, BMI, and oxygen saturation level (SpO2). The acquired data are often imprecise due to measurement error and contain missing values. The paper proposes a comprehensive framework to impute missing symptom values by managing the uncertainty present in the data set.

Methods

The data sets are fuzzified to manage uncertainty, and the fuzzy c-means clustering algorithm is applied to group the symptom feature vectors into different disease classes. The missing symptom values corresponding to each disease are imputed using a multiple fuzzy-based regression model. Relations between different symptoms are framed with the help of experts and the medical literature. Blood pressure is handled with a novel approach because its characteristics differ from those of the other symptoms. Because the patient records obtained from the kiosks are not adequate, relevant data are simulated by the Monte Carlo method to avoid over-fitting while imputing missing symptom values. The generated data sets are verified using Kullback–Leibler (K–L) distance and distance correlation (dCor) techniques, which show that the simulated data sets are well correlated with the real data set.
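
The abstract does not give implementation details, so the following is only a minimal sketch of the standard fuzzy c-means iteration (alternating membership and center updates), not the authors' code. The function names, the fuzzifier value m = 2, the fixed iteration count, the starting centers, and the (pulse rate, SpO2) readings are all assumptions made for illustration.

```python
import math

def memberships(points, centers, m=2.0, eps=1e-9):
    """Return u[i][j], the degree to which point i belongs to cluster j."""
    power = 2.0 / (m - 1.0)
    u = []
    for p in points:
        # Clamp distances so a point lying exactly on a center
        # does not cause a division by zero.
        d = [max(math.dist(p, c), eps) for c in centers]
        u.append([1.0 / sum((dj / dk) ** power for dk in d) for dj in d])
    return u

def update_centers(points, u, m=2.0):
    """Return each center as the mean of all points weighted by u**m."""
    dims = len(points[0])
    centers = []
    for j in range(len(u[0])):
        w = [row[j] ** m for row in u]
        total = sum(w)
        centers.append(tuple(
            sum(wi * p[k] for wi, p in zip(w, points)) / total
            for k in range(dims)
        ))
    return centers

def fuzzy_c_means(points, centers, m=2.0, iters=50):
    """Alternate the two updates for a fixed number of iterations."""
    for _ in range(iters):
        u = memberships(points, centers, m)
        centers = update_centers(points, u, m)
    return centers, memberships(points, centers, m)

# Hypothetical (pulse rate, SpO2) readings, invented for illustration only.
readings = [(72, 98), (75, 97), (70, 99), (110, 88), (115, 90), (108, 87)]
centers, u = fuzzy_c_means(readings, centers=[(70, 98), (110, 90)])
```

Each row of `u` sums to 1, so a record can belong partly to several disease clusters at once. That soft assignment is what distinguishes fuzzy c-means from hard clustering and is one way to carry measurement uncertainty into later steps.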

Results

Using the data sets, the proposed model is built and new patients are provisionally diagnosed using the Softmax cost function. Multiple disease class labels are determined with about 98% accuracy, verified against the ground truth provided by the experts.
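
As a rough illustration of what a softmax cost function computes, the sketch below turns per-class scores into probabilities and evaluates the cross-entropy cost for a known label. It is a generic textbook formulation, not the paper's model: the scores, the number of classes, and the function names are hypothetical.

```python
import math

def softmax(scores):
    """Convert raw class scores into probabilities that sum to 1."""
    shift = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(probs, true_index):
    """Cost for one sample: negative log-probability of the true class."""
    return -math.log(probs[true_index])

# Hypothetical scores for three disease classes, invented for illustration.
scores = [2.0, 0.5, -1.0]
probs = softmax(scores)
predicted = max(range(len(probs)), key=probs.__getitem__)
loss = cross_entropy(probs, true_index=0)
```

The predicted class is the one with the highest probability, and the cost shrinks toward zero as the probability assigned to the true class approaches 1.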

Conclusions

It is worth mentioning that the system is intended for primary healthcare; in emergency cases, patients are referred to the experts.


Acknowledgements

This work is supported by the Information Technology Research Academy (ITRA), Digital India Corporation (formerly Media Lab Asia), Government of India, under ITRA-Mobile Grant [ITRA/15(59)/Mobile/Remote Health/01].

Author information

Corresponding author

Correspondence to Sayan Das.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Das, S., Sil, J. Managing uncertainty in imputing missing symptom value for healthcare of rural India. Health Inf Sci Syst 7, 5 (2019). https://doi.org/10.1007/s13755-019-0066-4

  • DOI: https://doi.org/10.1007/s13755-019-0066-4