Abstract
Purpose
In India, 67% of the population lives in remote areas, where providing primary healthcare is a real challenge because of the scarcity of doctors. Health kiosks deployed in remote villages collect basic health data such as blood pressure, pulse rate, height and weight, BMI, and oxygen saturation level (SpO2). The acquired data are often imprecise because of measurement error and contain missing values. This paper proposes a comprehensive framework that imputes missing symptom values while managing the uncertainty present in the data set.
Methods
The data sets are fuzzified to manage uncertainty, and the fuzzy c-means clustering algorithm is applied to group the symptom feature vectors into disease classes. The missing symptom values for each disease are imputed using a multiple fuzzy regression model. Relations between the symptoms are framed with the help of experts and the medical literature. Blood pressure is handled with a novel approach because its characteristics differ from those of the other symptoms. The patient records obtained from the kiosks are not sufficient on their own, so additional relevant data are simulated with the Monte Carlo method to avoid over-fitting while imputing the missing symptom values. The generated data sets are verified using the Kullback–Leibler (K–L) distance and distance correlation (dCor), which show that the simulated data sets are well correlated with the real data set.
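As an illustration of the clustering step, the following is a minimal sketch of the standard fuzzy c-means update (alternating between cluster centres and membership degrees). It is not the authors' implementation: the function name `fuzzy_c_means`, the fuzzifier `m = 2`, the stopping tolerance, and the six (pulse rate, SpO2) records are all assumptions made for this example, and the inputs are plain numeric values rather than the fuzzified data described above.

```python
import numpy as np

def fuzzy_c_means(X, c, m=2.0, n_iter=100, tol=1e-6, seed=0):
    """Standard fuzzy c-means; returns (centers, U) where U[i, k] is the
    membership degree of record i in cluster k."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    # Random initial memberships, normalised so each row sums to 1.
    U = rng.random((n, c))
    U /= U.sum(axis=1, keepdims=True)
    for _ in range(n_iter):
        # Centres are membership-weighted means (weights raised to power m).
        W = U ** m
        centers = (W.T @ X) / W.sum(axis=0)[:, None]
        # Euclidean distance from every record to every centre.
        dist = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        dist = np.fmax(dist, 1e-12)  # guard against division by zero
        # Membership update: u_ik proportional to d_ik^(-2 / (m - 1)).
        inv = dist ** (-2.0 / (m - 1.0))
        U_new = inv / inv.sum(axis=1, keepdims=True)
        converged = np.abs(U_new - U).max() < tol
        U = U_new
        if converged:
            break
    return centers, U

# Hypothetical (pulse rate, SpO2) records, invented for illustration only.
X = np.array([[72.0, 98.0], [75.0, 97.0], [70.0, 99.0],
              [110.0, 90.0], [115.0, 88.0], [112.0, 91.0]])
centers, U = fuzzy_c_means(X, c=2)
print(np.round(U, 3))
```

Each row of `U` sums to 1, so a record can belong partly to more than one group, which is the property that makes fuzzy clustering suited to imprecise measurements.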
Results
The proposed model is built from these data sets, and new patients are provisionally diagnosed using a softmax cost function. Multiple class labels, each corresponding to a disease, are assigned with about 98% accuracy when verified against the ground truth provided by the experts.
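For readers unfamiliar with the softmax cost, the sketch below shows how raw class scores are turned into disease probabilities and how the cross-entropy cost is computed from them. The score values, the three classes, and the function names `softmax` and `cross_entropy` are hypothetical and chosen only for this example; the abstract does not specify the model that produces the scores.

```python
import numpy as np

def softmax(z):
    """Convert raw class scores into probabilities that sum to 1 per row."""
    z = np.asarray(z, dtype=float)
    # Subtracting the row maximum avoids overflow and leaves the result unchanged.
    shifted = z - z.max(axis=-1, keepdims=True)
    e = np.exp(shifted)
    return e / e.sum(axis=-1, keepdims=True)

def cross_entropy(probs, labels):
    """Mean negative log-probability assigned to the true class of each row."""
    probs = np.atleast_2d(probs)
    rows = np.arange(probs.shape[0])
    return float(-np.log(probs[rows, labels]).mean())

# Hypothetical scores for two patients over three disease classes.
scores = np.array([[2.0, 0.5, -1.0],
                   [0.1, 0.2, 3.0]])
true_labels = np.array([0, 2])

probs = softmax(scores)
print(np.round(probs, 3))
print("predicted classes:", probs.argmax(axis=1))
print("cost:", cross_entropy(probs, true_labels))
```

Because the output is a full probability vector rather than a single label, more than one likely disease can be reported for a patient, for example by listing every class whose probability exceeds a chosen threshold.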
Conclusions
It is worth mentioning that the system is intended for primary healthcare; in emergency cases, patients are referred to the experts.
Acknowledgements
This work is supported by the Information Technology Research Academy (ITRA), Digital India Corporation (formerly Media Lab Asia), Government of India, under ITRA-Mobile Grant [ITRA/15(59)/Mobile/Remote Health/01].
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest.
Cite this article
Das, S., Sil, J. Managing uncertainty in imputing missing symptom value for healthcare of rural India. Health Inf Sci Syst 7, 5 (2019). https://doi.org/10.1007/s13755-019-0066-4
DOI: https://doi.org/10.1007/s13755-019-0066-4