Abstract
In the digitized world, data is growing exponentially, and Big Data Analytics is an emerging trend and a dominant research field. Data mining techniques play a vital role in applying Big Data to the healthcare sector. Data mining algorithms make it possible to analyse data, detect and predict the presence of a disease, and support doctors' decision-making through early detection and appropriate management. The main objective of data mining techniques in healthcare systems is to design an automated tool that analyses medical data and informs patients and doctors about the severity of the disease and the most suitable treatment, based on the symptoms, the patient record and the treatment history. This paper focuses on diabetes medical data: classification and clustering algorithms are applied to it, and their efficiency is examined.
Acknowledgements
The authors express their sincere gratitude to Prof. N. R. Shetty, Advisor, and Dr. H. C. Nagaraj, Principal, Nitte Meenakshi Institute of Technology, for their constant encouragement and support in carrying out research at NMIT.
The authors also thank the Vision Group on Science and Technology (VGST), Government of Karnataka, for recognizing our research and providing financial support to set up the infrastructure required to carry it out.
Appendices
Appendix 1
Algorithm: Gaussian Naïve Bayes
Input: Dataset
Output: Classification into different categories
Algorithm Steps:
Step 1. Segment the data by class.
Step 2. Calculate the prior probability \( P(c) \) of each class.
Step 3. Find the mean and variance of each attribute x within each class c:
a. Let \( \mu_{x} \) be the mean of the values of attribute x associated with class c.
b. Let \( \sigma_{x}^{2} \) be the variance of the values of attribute x associated with class c.
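Step 4. The class-conditional probability of an attribute value v is computed from the Gaussian (normal) distribution:
\( P(x = v \mid c) = \frac{1}{\sqrt{2\pi\sigma_{x}^{2}}} \exp\left( -\frac{(v - \mu_{x})^{2}}{2\sigma_{x}^{2}} \right) \)
Step 5. Combine the class prior with the probabilities of all attributes \( x_{i} \) and assign the record to the class with the highest score:
Naïve Bayes Classifier = \( \arg\max_{c} P(c) \prod_{i} P(x_{i} \mid c) \)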
Step 6. End.
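The steps above can be sketched in Python as follows. This is a minimal, standard-library-only illustration, not the authors' implementation; the function names (`fit_gaussian_nb`, `log_gaussian_pdf`, `predict`), the variance floor `var_floor` and the toy data are hypothetical. The sketch scores classes with log-probabilities instead of the raw product in Step 5. This gives the same argmax but avoids numerical underflow when many attributes are multiplied together.

```python
import math
from collections import defaultdict


def fit_gaussian_nb(X, y, var_floor=1e-9):
    """Steps 1-3: segment by class, estimate P(c), and per-attribute mean/variance."""
    by_class = defaultdict(list)
    for row, label in zip(X, y):
        by_class[label].append(row)              # Step 1: segment the data by class
    model = {}
    for label, rows in by_class.items():
        prior = len(rows) / len(X)               # Step 2: prior probability P(c)
        stats = []
        for column in zip(*rows):                # Step 3: one column per attribute x
            mu = sum(column) / len(column)
            var = sum((v - mu) ** 2 for v in column) / len(column)
            stats.append((mu, var + var_floor))  # hypothetical floor avoids division by zero
        model[label] = (prior, stats)
    return model


def log_gaussian_pdf(v, mu, var):
    """Step 4 in log form: log P(x = v | c) for a normal distribution."""
    return -0.5 * math.log(2 * math.pi * var) - (v - mu) ** 2 / (2 * var)


def predict(model, row):
    """Step 5: argmax over c of log P(c) + sum_i log P(x_i | c)."""
    def score(label):
        prior, stats = model[label]
        return math.log(prior) + sum(
            log_gaussian_pdf(v, mu, var) for v, (mu, var) in zip(row, stats)
        )
    return max(model, key=score)


# Toy, made-up data: two numeric attributes, two classes (0 and 1).
X = [[1.0, 20.0], [1.2, 22.0], [0.9, 21.0], [3.0, 40.0], [3.2, 42.0], [2.9, 41.0]]
y = [0, 0, 0, 1, 1, 1]
model = fit_gaussian_nb(X, y)
print(predict(model, [1.1, 21.0]), predict(model, [3.1, 41.0]))  # → 0 1
```

The variance is the maximum-likelihood (population) estimate. Swapping in the sample variance would change the scores slightly but follows the same structure.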
Appendix 2
Algorithm: OPTICS
Input: Dataset
Output: Clusters
Algorithm Steps:
Step 1. Specify the neighbourhood radius ε and the minimum number of points MinPts.
Step 2. Mark all data points in the dataset as unprocessed.
Step 3. For each unprocessed point p, find its ε-neighbours.
Step 4. Mark p as processed.
Step 5. Compute the core distance of p.
Step 6. Create an order file and append p to it.
Step 7. If the core distance of p is undefined (p has fewer than MinPts neighbours), return to Step 3; otherwise go to Step 8.
Step 8. Calculate the reachability distance of each neighbour and update the order seeds with the new values.
Step 9. For each point in the order seeds, find its neighbours and mark the point as processed.
Step 10. Compute the core distance of the point and append the point to the order file.
Step 11. If the core distance is undefined, go back to Step 9; otherwise continue with Step 12.
Step 12. Repeat from Step 8 until there is no change in the order seeds.
Step 13. End.
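A compact Python sketch of the OPTICS ordering loop described above follows. It assumes Euclidean distance and a brute-force O(n²) neighbour search; a practical implementation would use a spatial index. The order seeds are kept in a binary heap, so the point with the smallest reachability distance is expanded next. The helper `extract_clusters` and its threshold `eps_prime` are hypothetical additions that show one common way to turn the ordering into flat clusters (a DBSCAN-style cut at a reachability threshold). They are not part of the steps listed above.

```python
import heapq
import math


def optics(points, eps, min_pts):
    """Return the OPTICS ordering as (index, reachability, core_distance) tuples."""
    n = len(points)
    processed = [False] * n                       # Step 2: all points unprocessed
    reach = [math.inf] * n                        # inf = undefined reachability
    order = []                                    # the "order file"

    def neighbours(p):
        # Brute-force eps-neighbourhood (includes p itself)
        return [q for q in range(n) if math.dist(points[p], points[q]) <= eps]

    def core_distance(p, nbrs):
        # Distance to the MinPts-th closest neighbour; None if p is not a core point
        if len(nbrs) < min_pts:
            return None
        return sorted(math.dist(points[p], points[q]) for q in nbrs)[min_pts - 1]

    def update_seeds(p, nbrs, core_d, seeds):
        # Step 8: reachability of q = max(core distance of p, dist(p, q))
        for q in nbrs:
            if not processed[q]:
                new_r = max(core_d, math.dist(points[p], points[q]))
                if new_r < reach[q]:
                    reach[q] = new_r
                    heapq.heappush(seeds, (new_r, q))

    def expand(p):
        # Steps 3-6 and 9-10: process one point and append it to the order file
        nbrs = neighbours(p)
        processed[p] = True
        core_d = core_distance(p, nbrs)
        order.append((p, reach[p], core_d))
        return nbrs, core_d

    for p in range(n):
        if processed[p]:
            continue
        nbrs, core_d = expand(p)
        if core_d is None:
            continue                              # Step 7: move on to the next point
        seeds = []
        update_seeds(p, nbrs, core_d, seeds)
        while seeds:                              # Steps 9-12
            _, q = heapq.heappop(seeds)
            if processed[q]:
                continue                          # stale heap entry
            q_nbrs, q_core = expand(q)
            if q_core is not None:
                update_seeds(q, q_nbrs, q_core, seeds)
    return order


def extract_clusters(order, eps_prime):
    """Hypothetical helper: DBSCAN-style cut of the ordering at eps_prime (-1 = noise)."""
    labels, cluster = {}, -1
    for idx, r, core in order:
        if r > eps_prime:
            if core is not None and core <= eps_prime:
                cluster += 1                      # a new cluster starts here
                labels[idx] = cluster
            else:
                labels[idx] = -1                  # noise point
        else:
            labels[idx] = cluster
    return labels
```

For example, `extract_clusters(optics(points, eps=3.0, min_pts=3), eps_prime=2.0)` on two well-separated groups of unit-spaced points assigns each group its own cluster label.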
Appendix 3
Algorithm: BIRCH
Input: Dataset
Output: Clusters
Algorithm Steps:
Step 1. Set an initial threshold value and insert the data points into the CF tree using the CF-tree insertion algorithm.
Step 2. Increase the threshold value if the size of the tree exceeds the memory limit assigned to it.
Step 3. Rebuild the partially built tree according to the new threshold value and memory limit.
Step 4. Repeat Steps 1 to 3 until all data objects have been scanned and a complete tree has been formed.
Step 5. Build smaller CF trees by varying the threshold value and eliminating outliers.
Step 6. Improve the clustering quality by applying a global clustering algorithm to the leaf entries of the CF tree.
Step 7. Redistribute the data objects and label each point using the completely built CF tree.
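The following Python sketch illustrates the core idea behind these steps: the clustering feature CF = (N, LS, SS), whose additivity lets BIRCH summarise data in a single scan. It makes heavy simplifying assumptions:

* Leaf entries are kept in a flat list instead of a height-balanced CF tree.
* The threshold is fixed, so there is no rebuilding as in Steps 2 to 4 and no outlier elimination as in Step 5.
* The global clustering of Step 6 is a plain agglomerative merge of entry centroids down to `k` clusters, which is an assumed choice of global algorithm.

All names (`CF`, `build_leaf_entries`, `global_clustering`, `birch_sketch`) and the use of the entry radius as the threshold criterion are illustrative, not the authors' code.

```python
import math
from dataclasses import dataclass


@dataclass
class CF:
    """Clustering feature: N points, their linear sum LS and sum of squared norms SS."""
    n: int
    ls: list
    ss: float

    @staticmethod
    def of(point):
        return CF(1, list(point), sum(v * v for v in point))

    def __add__(self, other):
        # CF additivity: merging two entries adds their components
        return CF(self.n + other.n,
                  [a + b for a, b in zip(self.ls, other.ls)],
                  self.ss + other.ss)

    def centroid(self):
        return [v / self.n for v in self.ls]

    def radius(self):
        # Root-mean-square distance of the members from the centroid
        c = self.centroid()
        return math.sqrt(max(self.ss / self.n - sum(v * v for v in c), 0.0))


def build_leaf_entries(points, threshold):
    """Simplified Step 1: absorb each point into the closest entry if its radius stays <= threshold."""
    entries = []
    for p in points:
        new = CF.of(p)
        if entries:
            i = min(range(len(entries)),
                    key=lambda j: math.dist(entries[j].centroid(), p))
            merged = entries[i] + new
            if merged.radius() <= threshold:
                entries[i] = merged
                continue
        entries.append(new)
    return entries


def global_clustering(entries, k):
    """Step 6 (assumed method): merge the two closest entry centroids until k remain."""
    clusters = list(entries)
    while len(clusters) > k:
        _, i, j = min(
            (math.dist(clusters[a].centroid(), clusters[b].centroid()), a, b)
            for a in range(len(clusters)) for b in range(a + 1, len(clusters))
        )
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]                   # j > i, so index i is still valid
    return [c.centroid() for c in clusters]


def birch_sketch(points, threshold, k):
    """Steps 1, 6 and 7: build entries, cluster them globally, relabel every point."""
    centroids = global_clustering(build_leaf_entries(points, threshold), k)
    # Step 7: redistribute each point to its nearest final centroid
    return [min(range(len(centroids)), key=lambda c: math.dist(centroids[c], p))
            for p in points]
```

On two well-separated groups of four unit-spaced points, `birch_sketch(points, threshold=1.0, k=2)` builds two leaf entries and labels the groups 0 and 1.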
Copyright information
© 2019 Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Mamatha Bai, B.G., Nalini, B.M., Majumdar, J. (2019). Analysis and Detection of Diabetes Using Data Mining Techniques—A Big Data Application in Health Care. In: Shetty, N., Patnaik, L., Nagaraj, H., Hamsavath, P., Nalini, N. (eds) Emerging Research in Computing, Information, Communication and Applications. Advances in Intelligent Systems and Computing, vol 882. Springer, Singapore. https://doi.org/10.1007/978-981-13-5953-8_37
DOI: https://doi.org/10.1007/978-981-13-5953-8_37
Publisher Name: Springer, Singapore
Print ISBN: 978-981-13-5952-1
Online ISBN: 978-981-13-5953-8
eBook Packages: Engineering (R0)