Spatial Distribution Based Provisional Disease Diagnosis in Remote Healthcare
Patients in rural India are often unable to describe their health problems using appropriate disease-related keywords when submitting a query. Lack of domain knowledge prevents patients from refining the query through well-known feedback mechanisms. Moreover, owing to the scarcity of doctors in rural India, the health assistants who run the health centers do not have enough knowledge to treat patients on the basis of an imprecise query. In this paper, we propose an autonomous provisional disease diagnosis system that classifies the query after it has been expanded using the semantics of the domain knowledge. First, we apply the spatial distribution based nearest neighbor spacing distribution (NNSD) to the disease-related medical document corpus (MDC) to find the relevant terms, mostly symptoms, with respect to different diseases. We build a symptom vocabulary (SV) from the unique terms present in the different diseases, known a priori. Each query is expanded into a bag of symptoms (BoS) using a 5-gram collocation model, with the log-likelihood ratio (LLR) measuring the association between the query and the terms in the MDC. The terms in the BoS may not exactly match the symptoms in the SV but are contextually similar to them. We propose a novel approach to determine which symptoms in the SV are nearest in context to the corresponding terms in the BoS. The feature vector, which is sparse in nature, is obtained by encoding the SV with respect to each BoS. We apply a sparse representation based classifier (SRC) to classify the query into a particular disease. The proposed nearest neighbor spacing distribution based sparse representation classifier (NNSD-SRC) shows promising performance on the MDC dataset, and validation of the results by doctors shows negligible error.
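To make the query expansion step concrete, the following is a minimal sketch of LLR-based collocation scoring. It is an illustration under stated assumptions, not the authors' implementation: the abstract does not say how the LLR is computed, so the sketch assumes Dunning's log-likelihood ratio over a 2x2 contingency table of co-occurrence counts, and it approximates the 5-gram collocation model with sliding windows of five whitespace-separated tokens. The function names `llr` and `expand_query`, the `top_k` parameter, and the tokenization are all hypothetical.

```python
import math
from collections import Counter


def llr(k11, k12, k21, k22):
    """Dunning's log-likelihood ratio (G^2) for a 2x2 contingency table.

    k11: windows containing both terms, k12: first term only,
    k21: second term only, k22: neither term.
    """
    total = k11 + k12 + k21 + k22
    row1, row2 = k11 + k12, k21 + k22
    col1, col2 = k11 + k21, k12 + k22
    cells = ((k11, row1, col1), (k12, row1, col2),
             (k21, row2, col1), (k22, row2, col2))
    g = 0.0
    for observed, row, col in cells:
        if observed > 0:  # an empty cell contributes nothing to the sum
            expected = row * col / total
            g += observed * math.log(observed / expected)
    return 2.0 * g


def expand_query(query_term, documents, window=5, top_k=3):
    """Rank terms that co-occur with query_term in 5-token windows by LLR.

    Assumption: lowercase whitespace tokenization stands in for real
    preprocessing of the medical document corpus.
    """
    windows = []
    for doc in documents:
        tokens = doc.lower().split()
        for i in range(max(len(tokens) - window + 1, 1)):
            windows.append(set(tokens[i:i + window]))

    total = len(windows)
    with_query = sum(1 for w in windows if query_term in w)
    term_counts = Counter(t for w in windows for t in w)
    joint = Counter(t for w in windows if query_term in w for t in w)

    scores = {}
    for term, k11 in joint.items():
        if term == query_term:
            continue
        # LLR is symmetric, so keep only terms seen together with the
        # query more often than independence would predict.
        if k11 * total <= with_query * term_counts[term]:
            continue
        k12 = with_query - k11
        k21 = term_counts[term] - k11
        k22 = total - k11 - k12 - k21
        scores[term] = llr(k11, k12, k21, k22)
    return sorted(scores, key=scores.get, reverse=True)[:top_k]
```

Calling `expand_query("fever", corpus)` on a list of document strings returns the terms most strongly associated with the query term, which could then be matched against a symptom vocabulary. A real corpus would also need stop-word removal, since frequent function words can still receive non-trivial scores.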
Keywords: Spatial distribution, Provisional diagnosis, Sparse classifier
This research was supported by grants from Information Technology Research Academy (ITRA), under the Department of Electronics and Information Technology (DeitY), Government of India.