Managing uncertainty in imputing missing symptom value for healthcare of rural India

Das, Sayan; Sil, Jaya

doi:10.1007/s13755-019-0066-4

Research
Published: 18 February 2019

Health Information Science and Systems, volume 7, article number 5 (2019)

Abstract

Purpose

In India, 67% of the total population lives in remote areas, where providing primary healthcare is a real challenge due to the scarcity of doctors. Health kiosks are deployed in remote villages to collect basic health data such as blood pressure, pulse rate, height, weight, BMI and oxygen saturation level (SpO₂). The acquired data are often imprecise due to measurement error and contain missing values. The paper proposes a comprehensive framework to impute missing symptom values by managing the uncertainty present in the data set.

Methods

The data sets are fuzzified to manage uncertainty, and the fuzzy c-means clustering algorithm is applied to group the symptom feature vectors into different disease classes. The missing symptom values corresponding to each disease are imputed using a multiple fuzzy-based regression model. Relations between different symptoms are framed with the help of experts and medical literature. Blood pressure is handled with a novel approach because its characteristics differ from those of the other symptoms. The patient records obtained from the kiosks are not adequate, so relevant data are simulated by the Monte Carlo method to avoid over-fitting while imputing missing symptom values. The generated data sets are verified using the Kullback–Leibler (K–L) distance and distance correlation (dCor) techniques, which show that the simulated data sets are well correlated with the real data set.
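As a rough illustration of the simulation check described above, the sketch below draws Monte Carlo samples from a normal distribution fitted to a handful of readings and then measures the K–L distance between the fitted real and simulated distributions. It is a minimal sketch under stated assumptions, not the authors' procedure: the normality assumption, the function names (`fit_normal`, `kl_normal`, `simulate`) and the pulse-rate values are all hypothetical and chosen only for illustration.

```python
import math
import random
import statistics


def fit_normal(values):
    """Return the sample mean and standard deviation of a list of readings."""
    return statistics.mean(values), statistics.stdev(values)


def kl_normal(mu_p, sigma_p, mu_q, sigma_q):
    """K-L divergence KL(P || Q) between two univariate normal distributions."""
    return (
        math.log(sigma_q / sigma_p)
        + (sigma_p ** 2 + (mu_p - mu_q) ** 2) / (2.0 * sigma_q ** 2)
        - 0.5
    )


def simulate(values, n, seed=0):
    """Draw n Monte Carlo samples from a normal fitted to the observed values."""
    rng = random.Random(seed)
    mu, sigma = fit_normal(values)
    return [rng.gauss(mu, sigma) for _ in range(n)]


# Hypothetical pulse-rate readings (beats per minute); not data from the paper.
real_pulse = [72.0, 80.0, 76.0, 90.0, 68.0, 84.0, 78.0, 74.0]
simulated_pulse = simulate(real_pulse, n=5000)

# A value close to zero suggests the simulated data follow the real distribution.
kl = kl_normal(*fit_normal(real_pulse), *fit_normal(simulated_pulse))
```

A K–L value near zero indicates that the simulated readings are distributed like the observed ones; how small is small enough is a judgment the abstract does not specify.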

Results

Using these data sets, the proposed model is built and new patients are provisionally diagnosed using the Softmax cost function. Multiple disease class labels are assigned with about 98% accuracy, verified against the ground truth provided by experts.
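To make the Softmax step concrete, the following sketch turns raw per-disease scores into probabilities and computes the corresponding cost for one patient. It is only an illustration under assumptions: the disease labels, the scores and the helper names (`softmax`, `cross_entropy`) are hypothetical, and the abstract does not describe how the authors compute the scores.

```python
import math


def softmax(scores):
    """Convert raw class scores into probabilities that sum to 1."""
    shift = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(probs, true_index):
    """Softmax cost for one example: negative log-probability of the true class."""
    return -math.log(probs[true_index])


# Hypothetical disease labels and scores; not taken from the paper.
diseases = ["disease_A", "disease_B", "disease_C"]
scores = [2.0, 1.0, 0.1]

probs = softmax(scores)
predicted = diseases[probs.index(max(probs))]  # most probable disease
cost = cross_entropy(probs, true_index=0)      # cost if disease_A is correct
```

Because the output is a full probability vector rather than a single label, several diseases with high probability can be reported together, which fits the multi-label provisional diagnosis described above.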

Conclusions

It is worth mentioning that the system is intended for primary healthcare; in emergency cases, patients are referred to experts.

Acknowledgements

This work is supported by the Information Technology Research Academy (ITRA), Digital India Corporation (formerly Media Lab Asia), Government of India, under ITRA-Mobile Grant [ITRA/15(59)/Mobile/Remote Health/01].

Author information

Authors and Affiliations

Computer Science and Technology, Indian Institute of Engineering Science and Technology, Shibpur, West Bengal, India
Sayan Das & Jaya Sil

Corresponding author

Correspondence to Sayan Das.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Das, S., Sil, J. Managing uncertainty in imputing missing symptom value for healthcare of rural India. Health Inf Sci Syst 7, 5 (2019). https://doi.org/10.1007/s13755-019-0066-4

Received: 17 June 2018
Accepted: 01 February 2019
Published: 18 February 2019
DOI: https://doi.org/10.1007/s13755-019-0066-4
