Analysis and Detection of Diabetes Using Data Mining Techniques—A Big Data Application in Health Care

Mamatha Bai, B. G.; Nalini, B. M.; Majumdar, Jharna

doi:10.1007/978-981-13-5953-8_37

Part of the book series: Advances in Intelligent Systems and Computing (AISC, volume 882)

Abstract

In a digitized world, data is growing exponentially, and Big Data Analytics is both an emerging trend and a dominant research field. Data mining techniques play an active role in applying Big Data to the healthcare sector. Data mining algorithms make it possible to analyse, detect and predict the presence of disease, and they help doctors in decision-making through early detection and appropriate management. The main objective of data mining techniques in healthcare systems is to design an automated tool that analyses medical data and informs patients and doctors about the severity of the disease and the most suitable type of treatment, based on the symptoms, patient record and treatment history. This paper focuses on diabetes medical data: classification and clustering algorithms are implemented on it and their efficiency is examined.

Acknowledgements

The authors express their sincere gratitude to Prof. N. R. Shetty, Advisor and Dr. H. C. Nagaraj, Principal, Nitte Meenakshi Institute of Technology for giving constant encouragement and support to carry out research at NMIT.

The authors extend their thanks to the Vision Group on Science and Technology (VGST), Government of Karnataka, for acknowledging our research and providing financial support to set up the infrastructure required to carry out the research.

Author information

Authors and Affiliations

Department of CSE, Nitte Meenakshi Institute of Technology, Bangalore, India
B. G. Mamatha Bai, B. M. Nalini & Jharna Majumdar

Corresponding author

Correspondence to B. G. Mamatha Bai.

Editor information

Editors and Affiliations

Chancellor, Central University of Karnataka, Kalaburagi, Karnataka, India
N. R. Shetty
INSA Senior Scientist, National Institute of Advanced Studies, Bangalore, Karnataka, India
L. M. Patnaik
Principal, Nitte Meenakshi Institute of Technology, Bangalore, Karnataka, India
H. C. Nagaraj
Professor, Nitte Meenakshi Institute of Technology, Bangalore, Karnataka, India
Prasad Naik Hamsavath
Professor, Nitte Meenakshi Institute of Technology, Bangalore, Karnataka, India
N. Nalini

Appendices

Appendix 1

Algorithm: Gaussian Naïve Bayes

Input: Dataset

Output: Classification into different categories

Algorithm Steps:

Step 1. Segment the data by the class.

Step 2. Calculate the prior probability of each class.

$$ Class\,Probability = \frac{Class\,Count}{Total\,Count} $$

Step 3. Find the mean and variance of each individual attribute x within class c.

a. Let $ \mu_{x} $ be the mean of the values of attribute x associated with class c.
b. Let $ \sigma_{x}^{2} $ be the variance of the values of attribute x associated with class c.

Step 4. Compute the Gaussian probability density of attribute x taking the value v given class c:

$$ p(x = v \mid c) = \frac{1}{\sqrt{2\pi \sigma_{x}^{2}}}\, e^{-\frac{(v - \mu_{x})^{2}}{2\sigma_{x}^{2}}} $$

Step 5. Calculate the joint conditional probability of the attributes given class c, treating the attributes as conditionally independent:

$$ P(x_{1}, x_{2}, \ldots, x_{n} \mid c) = \prod_{i} P(x_{i} \mid c) $$

The Naïve Bayes classifier then selects the class c that maximises the product of the class probability and this joint probability:

$$ \text{Naïve Bayes Classifier} = \mathop{\arg\max}_{c}\; P(c) \prod_{i} P(x_{i} \mid c) $$

Step 6. End.
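
The steps above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names `fit_gaussian_nb` and `predict`, the small variance floor, and the two-attribute toy data are all hypothetical and are not taken from the Pima dataset. Log-probabilities are summed rather than multiplying raw densities, which gives the same argmax while avoiding numerical underflow.

```python
import math
from collections import defaultdict


def fit_gaussian_nb(rows, labels):
    """Steps 1-3: segment rows by class, then store prior, means and variances."""
    by_class = defaultdict(list)
    for row, label in zip(rows, labels):
        by_class[label].append(row)

    model = {}
    total = len(rows)
    for label, members in by_class.items():
        n = len(members)
        columns = list(zip(*members))  # one tuple of values per attribute
        means = [sum(col) / n for col in columns]
        # Population variance; the 1e-9 floor is an assumption to avoid division by zero.
        variances = [
            sum((v - m) ** 2 for v in col) / n + 1e-9
            for col, m in zip(columns, means)
        ]
        model[label] = (n / total, means, variances)  # prior = class count / total count
    return model


def log_gaussian(v, mean, var):
    """Step 4: logarithm of the Gaussian density p(x = v | c)."""
    return -0.5 * math.log(2 * math.pi * var) - (v - mean) ** 2 / (2 * var)


def predict(model, row):
    """Step 5: argmax over classes of log P(c) + sum_i log P(x_i | c)."""
    best_label, best_score = None, -math.inf
    for label, (prior, means, variances) in model.items():
        score = math.log(prior)
        for v, m, var in zip(row, means, variances):
            score += log_gaussian(v, m, var)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical toy data: two made-up attributes per record, class 0 or 1.
rows = [[85, 22.0], [90, 24.5], [95, 23.1], [150, 33.0], [165, 35.2], [170, 31.8]]
labels = [0, 0, 0, 1, 1, 1]
model = fit_gaussian_nb(rows, labels)
```

Calling `predict(model, [88, 23.0])` on this toy model assigns the record to the class whose per-attribute Gaussians fit it best.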

Appendix 2

Algorithm: OPTICS

Input: Dataset

Output: Clusters

Algorithm Steps:

Step 1. Initially ε and MinPts need to be specified.

Step 2. All the data points in the dataset are marked as unprocessed.

Step 3. Neighbours are found for each point p which is unprocessed.

Step 4. Now mark the point as processed.

Step 5. Initialize the core distance for the data point p.

Step 6. Create an Order file and append point p to the file.

Step 7. If core distance initialization is unsuccessful, return to Step 3; otherwise go to Step 8.

Step 8. Calculate the reachability distance for each of the neighbours and update the order seed with the reference of new values.

Step 9. Find the neighbours for each data point in the order seed and update the point as processed.

Step 10. Fix the core distance of the point and append to the order file.

Step 11. If undefined core distance exists, go back to Step 9, else continue with Step 12.

Step 12. Repeat Step 8 until there is no change in the order seed.

Step 13. End.
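
The ordering procedure can be sketched as follows. This is a minimal pure-Python illustration of the standard OPTICS ordering loop, written under assumptions rather than taken from the authors' code: the function name `optics`, the brute-force neighbour search, the convention that a point counts as its own neighbour, and the toy points are all illustrative choices. The list `order` plays the role of the order file, the heap `seeds` plays the role of the order seed, and an undefined reachability distance is represented by infinity.

```python
import heapq
import math


def optics(points, eps, min_pts):
    """Return (order, reach): processing order and reachability distance per point."""
    n = len(points)
    reach = [math.inf] * n   # inf stands for "undefined"
    processed = [False] * n  # Step 2: everything starts unprocessed
    order = []               # the "order file"

    def neighbours(p):
        # Brute-force eps-neighbourhood; includes p itself.
        return [q for q in range(n) if math.dist(points[p], points[q]) <= eps]

    def core_distance(p, nbrs):
        # Distance to the min_pts-th nearest neighbour, or None if p is not a core point.
        if len(nbrs) < min_pts:
            return None
        return sorted(math.dist(points[p], points[q]) for q in nbrs)[min_pts - 1]

    def update(seeds, p, core, nbrs):
        # Step 8: refresh reachability distances of unprocessed neighbours.
        for q in nbrs:
            if processed[q]:
                continue
            new_reach = max(core, math.dist(points[p], points[q]))
            if new_reach < reach[q]:
                reach[q] = new_reach
                heapq.heappush(seeds, (new_reach, q))

    def visit(p):
        # Steps 3-6 (and 9-10): find neighbours, mark processed, append to the order.
        nbrs = neighbours(p)
        processed[p] = True
        order.append(p)
        return nbrs, core_distance(p, nbrs)

    for p in range(n):
        if processed[p]:
            continue
        nbrs, core = visit(p)
        if core is None:
            continue  # Step 7: not a core point, move to the next unprocessed point
        seeds = []
        update(seeds, p, core, nbrs)
        while seeds:  # Steps 9-12: expand until the seed list is exhausted
            _, q = heapq.heappop(seeds)
            if processed[q]:
                continue  # stale heap entry left over from an earlier update
            q_nbrs, q_core = visit(q)
            if q_core is not None:
                update(seeds, q, q_core, q_nbrs)
    return order, reach


# Hypothetical toy data: two well-separated groups of four points each.
points = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10), (10, 11), (11, 10), (11, 11)]
order, reach = optics(points, eps=2.0, min_pts=3)
```

Plotting `reach` in the sequence given by `order` yields the reachability plot, in which valleys correspond to dense regions and therefore to candidate clusters.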

Appendix 3

Algorithm: BIRCH

Input: Dataset

Output: Clusters

Algorithm Steps:

Step 1. Set an initial threshold value and insert data points into the CF tree according to the insertion algorithm.

Step 2. Increase the threshold value if the size of the tree exceeds the memory limit assigned to it.

Step 3. Reconstruct the partially built tree according to the newly set threshold values and memory limit.

Step 4. Repeat Step 1 to Step 3 until all the data objects are scanned and form a complete tree.

Step 5. Build smaller CF trees by varying the threshold values and eliminating the outliers.

Step 6. Improve the clustering quality by applying a global clustering algorithm to the leaf entries of the CF tree.

Step 7. Redistribute the data objects and label each point in the completely built CF tree.
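
The core of Step 1 can be illustrated with a short sketch of the clustering feature (CF) summary and the threshold test applied at insertion. This is a deliberate simplification and not the full algorithm: it keeps a flat list of CF entries instead of a CF tree, and it omits the memory-driven threshold increase, tree rebuilding, outlier handling and global clustering of Steps 2 to 7. The names `CF`, `radius_if_added` and `insert_point`, the threshold value and the toy points are illustrative assumptions.

```python
import math


class CF:
    """Clustering feature: point count N, linear sum LS and squared sum SS."""

    def __init__(self, point):
        self.n = 1
        self.ls = list(point)
        self.ss = sum(x * x for x in point)

    def centroid(self):
        return [s / self.n for s in self.ls]

    def radius_if_added(self, point):
        # Radius after a tentative merge: R^2 = SS/N - ||LS/N||^2.
        n = self.n + 1
        ls = [s + x for s, x in zip(self.ls, point)]
        ss = self.ss + sum(x * x for x in point)
        r2 = ss / n - sum((s / n) ** 2 for s in ls)
        return math.sqrt(max(r2, 0.0))  # clamp tiny negative rounding errors

    def add(self, point):
        # CFs are additive, so absorbing a point only updates three summaries.
        self.n += 1
        self.ls = [s + x for s, x in zip(self.ls, point)]
        self.ss += sum(x * x for x in point)


def insert_point(entries, point, threshold):
    """Absorb the point into the closest entry if the radius stays within the
    threshold; otherwise start a new entry. Returns the index of the entry used."""
    if entries:
        idx = min(
            range(len(entries)),
            key=lambda i: math.dist(entries[i].centroid(), point),
        )
        if entries[idx].radius_if_added(point) <= threshold:
            entries[idx].add(point)
            return idx
    entries.append(CF(point))
    return len(entries) - 1


# Hypothetical toy data: two tight groups of points.
points = [(0, 0), (0.2, 0.1), (0.1, 0.3), (5, 5), (5.1, 4.9), (4.8, 5.2)]
entries = []
assignments = [insert_point(entries, p, threshold=0.5) for p in points]
```

Because each entry stores only N, LS and SS, a larger threshold lets entries absorb more points and keeps the summary small, which is the trade-off that Step 2 exploits when memory runs out.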

About this paper

Cite this paper

Mamatha Bai, B.G., Nalini, B.M., Majumdar, J. (2019). Analysis and Detection of Diabetes Using Data Mining Techniques—A Big Data Application in Health Care. In: Shetty, N., Patnaik, L., Nagaraj, H., Hamsavath, P., Nalini, N. (eds) Emerging Research in Computing, Information, Communication and Applications. Advances in Intelligent Systems and Computing, vol 882. Springer, Singapore. https://doi.org/10.1007/978-981-13-5953-8_37

DOI: https://doi.org/10.1007/978-981-13-5953-8_37
Published: 03 May 2019
Publisher Name: Springer, Singapore
Print ISBN: 978-981-13-5952-1
Online ISBN: 978-981-13-5953-8
eBook Packages: Engineering, Engineering (R0)
