Analysis and Detection of Diabetes Using Data Mining Techniques—A Big Data Application in Health Care

  • Conference paper
Emerging Research in Computing, Information, Communication and Applications

Part of the book series: Advances in Intelligent Systems and Computing (AISC, volume 882)

Abstract

In a digitized world, data is growing exponentially, and Big Data Analytics has become an emerging trend and a dominant research field. Data mining techniques play a vital role in applying Big Data in the healthcare sector. Data mining algorithms make it possible to analyse, detect and predict the presence of disease, and they help doctors make decisions through early detection and appropriate management. The main objective of data mining techniques in healthcare systems is to design an automated tool that diagnoses from medical data and informs patients and doctors about the severity of the disease and the most suitable type of treatment, based on the symptoms, patient records and treatment history. This paper focuses on diabetes medical data, on which classification and clustering algorithms are implemented and their efficiency is examined.

Acknowledgements

The authors express their sincere gratitude to Prof. N. R. Shetty, Advisor, and Dr. H. C. Nagaraj, Principal, Nitte Meenakshi Institute of Technology, for their constant encouragement and support in carrying out research at NMIT.

The authors extend their thanks to the Vision Group on Science and Technology (VGST), Government of Karnataka, for acknowledging our research and providing financial support to set up the infrastructure required to carry out the research.

Author information

Corresponding author

Correspondence to B. G. Mamatha Bai.

Appendices

Appendix 1

Algorithm: Gaussian Naïve Bayes

Input: Dataset

Output: Classification into different categories

Algorithm Steps:

Step 1. Segment the data by the class.

Step 2. Calculate the prior probability of each class.

$$ \text{Class Probability} = \frac{\text{Class Count}}{\text{Total Count}} $$

Step 3. Find the mean and variance of each attribute x within a class c.

  a. Let \( \mu_{x} \) be the mean of the values of attribute x associated with class c.

  b. Let \( \sigma_{x}^{2} \) be the variance of the values of attribute x associated with class c.

Step 4. The probability distribution is computed by

$$ p(x = v \mid c) = \frac{1}{\sqrt{2\pi\sigma_{x}^{2}}} \, e^{-\frac{(v - \mu_{x})^{2}}{2\sigma_{x}^{2}}} $$

Step 5. Calculate the joint likelihood of the attributes \( x_{1}, x_{2}, \ldots, x_{n} \) given class c, assuming the attributes are conditionally independent given the class.

$$ P(x_{1}, x_{2}, \ldots, x_{n} \mid c) = \prod_{i=1}^{n} P(x_{i} \mid c) $$

Naïve Bayes Classifier = \( \arg\max_{c} \, P(c) \prod_{i=1}^{n} P(x_{i} \mid c) \)

Step 6. End.
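
The steps of Appendix 1 can be sketched in a few lines of standard-library Python. This is a minimal illustration rather than the authors' implementation: the function names, the variance floor of 1e-9 and the toy records (two hypothetical attributes, glucose and BMI) are assumptions made for the example. Scores are summed in log space, which leaves the argmax of Step 5 unchanged and avoids numerical underflow when many attributes are multiplied.

```python
import math
from collections import defaultdict


def fit_gaussian_nb(rows, labels):
    """Steps 1-3: segment rows by class, then store prior, means and variances."""
    by_class = defaultdict(list)
    for row, label in zip(rows, labels):
        by_class[label].append(row)

    model = {}
    for label, members in by_class.items():
        n = len(members)
        columns = list(zip(*members))
        means = [sum(col) / n for col in columns]
        # Population variance with a small floor (an assumption) so that a
        # constant attribute does not cause division by zero.
        variances = [
            max(sum((v - m) ** 2 for v in col) / n, 1e-9)
            for col, m in zip(columns, means)
        ]
        model[label] = (n / len(rows), means, variances)
    return model


def gaussian_log_pdf(v, mean, var):
    """Step 4 in log form: log p(x = v | c) for a normal density."""
    return -0.5 * math.log(2 * math.pi * var) - (v - mean) ** 2 / (2 * var)


def predict(model, row):
    """Step 5: return argmax_c of log P(c) + sum_i log p(x_i | c)."""
    def score(label):
        prior, means, variances = model[label]
        return math.log(prior) + sum(
            gaussian_log_pdf(v, m, var)
            for v, m, var in zip(row, means, variances)
        )
    return max(model, key=score)


# Hypothetical toy records (glucose, BMI); label 1 = diabetic, 0 = non-diabetic.
rows = [[85, 22.0], [90, 24.5], [95, 23.0], [150, 33.0], [160, 35.5], [170, 31.0]]
labels = [0, 0, 0, 1, 1, 1]
model = fit_gaussian_nb(rows, labels)
print(predict(model, [92, 23.5]), predict(model, [158, 34.0]))
```

On these toy records the first query falls in class 0 and the second in class 1, because each lies close to the per-class means estimated in Step 3.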

Appendix 2

Algorithm: OPTICS

Input: Dataset

Output: Clusters

Algorithm Steps:

Step 1. Initially ε and MinPts need to be specified.

Step 2. All the data points in the dataset are marked as unprocessed.

Step 3. For each unprocessed point p, find its neighbours within distance ε.

Step 4. Now mark the point as processed.

Step 5. Initialize the core distance for the data point p.

Step 6. Create an Order file and append point p to the file.

Step 7. If the core distance of p is undefined (p has fewer than MinPts neighbours within ε), return to Step 3; otherwise go to Step 8.

Step 8. Calculate the reachability distance for each of the neighbours and update the order seed with the new values.

Step 9. Find the neighbours for each data point in the order seed and update the point as processed.

Step 10. Fix the core distance of the point and append to the order file.

Step 11. If the core distance is undefined, go back to Step 9; otherwise continue with Step 12.

Step 12. Repeat Step 8 until there is no change in the order seed.

Step 13. End.
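
The ordering procedure of Appendix 2 can be sketched as follows, using only the Python standard library. This is an illustrative sketch under stated assumptions, not the authors' code: a heap stands in for the order seed, a list stands in for the order file, a point counts itself among its neighbours when MinPts is checked, and the helper `extract_clusters` (a DBSCAN-style cut of the reachability values at a chosen threshold) is added here only to show how clusters can be read off the ordering. The toy points are hypothetical.

```python
import heapq
import math


def optics(points, eps, min_pts):
    """Return the cluster ordering plus reachability and core distances."""
    n = len(points)
    reach = [math.inf] * n   # math.inf stands for "undefined"
    core = [None] * n        # None stands for "not a core point"
    processed = [False] * n
    order = []               # plays the role of the order file

    def visit(i):
        # Find the eps-neighbourhood of i, mark i processed, append it to the
        # ordering and set its core distance (the point counts itself).
        nbrs = [j for j in range(n) if math.dist(points[i], points[j]) <= eps]
        processed[i] = True
        order.append(i)
        if len(nbrs) >= min_pts:
            core[i] = sorted(math.dist(points[i], points[j]) for j in nbrs)[min_pts - 1]
        return nbrs

    def update(i, nbrs, seeds):
        # Lower the reachability of unprocessed neighbours of i where possible.
        for j in nbrs:
            if not processed[j]:
                new_reach = max(core[i], math.dist(points[i], points[j]))
                if new_reach < reach[j]:
                    reach[j] = new_reach
                    heapq.heappush(seeds, (new_reach, j))

    for start in range(n):
        if processed[start]:
            continue
        nbrs = visit(start)
        if core[start] is None:
            continue
        seeds = []           # order seeds as a min-heap keyed on reachability
        update(start, nbrs, seeds)
        while seeds:
            _, q = heapq.heappop(seeds)
            if processed[q]:
                continue     # stale heap entry, superseded by a smaller one
            q_nbrs = visit(q)
            if core[q] is not None:
                update(q, q_nbrs, seeds)
    return order, reach, core


def extract_clusters(order, reach, core, threshold):
    """DBSCAN-style cut of the ordering at a reachability threshold; -1 is noise."""
    labels = [-1] * len(order)
    cluster = -1
    for p in order:
        if reach[p] > threshold:
            if core[p] is not None and core[p] <= threshold:
                cluster += 1
                labels[p] = cluster
        else:
            labels[p] = cluster
    return labels


# Hypothetical toy data: two tight groups and one isolated point.
points = [(1.0, 1.0), (1.2, 1.1), (0.9, 1.3),
          (8.0, 8.0), (8.1, 8.2), (7.9, 8.3), (4.5, 15.0)]
order, reach, core = optics(points, eps=1.0, min_pts=3)
labels = extract_clusters(order, reach, core, threshold=0.5)
print(labels)
```

The neighbourhood search here is a linear scan, so the sketch runs in quadratic time; a spatial index would be needed for large datasets.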

Appendix 3

Algorithm: BIRCH

Input: Dataset

Output: Clusters

Algorithm Steps:

Step 1. Set an initial threshold value and insert data points into the CF tree according to the insertion algorithm.

Step 2. Increase the threshold value if the size of the tree exceeds the memory limit assigned to it.

Step 3. Reconstruct the partially built tree according to the newly set threshold values and memory limit.

Step 4. Repeat Step 1 to Step 3 until all the data objects are scanned and form a complete tree.

Step 5. Smaller CF trees are built by varying the threshold values and eliminating the outliers.

Step 6. Considering the leaf entries of the CF tree, the clustering quality is improved by applying a global clustering algorithm.

Step 7. Redistribute the data objects and label each point in the completely built CF tree.
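
The core of Steps 1 to 4 and Step 7 can be illustrated with a deliberately simplified sketch in standard-library Python. It is not the authors' implementation and not a full BIRCH: the clustering features (CFs) are kept in a flat list instead of a height-balanced CF tree, the memory limit is approximated by a hypothetical `max_entries` cap, the threshold is doubled on each rebuild (the `growth` factor is an assumption), and the global clustering of Step 6 is omitted. What the sketch does show is CF additivity, threshold-based absorption, and rebuilding with a larger threshold.

```python
import math


class CF:
    """Clustering feature: point count n, linear sum ls, sum of squared norms ss."""

    def __init__(self, n, ls, ss):
        self.n, self.ls, self.ss = n, list(ls), ss

    @classmethod
    def from_point(cls, p):
        return cls(1, p, sum(x * x for x in p))

    def merged(self, other):
        # CF additivity: merging two subclusters just adds their summaries.
        return CF(self.n + other.n,
                  [a + b for a, b in zip(self.ls, other.ls)],
                  self.ss + other.ss)

    def centroid(self):
        return [x / self.n for x in self.ls]

    def radius(self):
        c = self.centroid()
        return math.sqrt(max(self.ss / self.n - sum(x * x for x in c), 0.0))


def absorb(entries, cf, threshold):
    """Merge cf into the closest entry if the merged radius stays within threshold."""
    if entries:
        i = min(range(len(entries)),
                key=lambda k: math.dist(entries[k].centroid(), cf.centroid()))
        candidate = entries[i].merged(cf)
        if candidate.radius() <= threshold:
            entries[i] = candidate
            return
    entries.append(cf)


def build_entries(points, threshold, max_entries, growth=2.0):
    """Steps 1-4 on a flat list: insert points, rebuilding when the cap is exceeded."""
    if threshold <= 0 or max_entries < 1:
        raise ValueError("threshold must be positive and max_entries at least 1")
    entries = []
    for p in points:
        absorb(entries, CF.from_point(p), threshold)
        while len(entries) > max_entries:
            threshold *= growth            # Step 2: raise the threshold
            old, entries = entries, []
            for cf in old:                 # Step 3: rebuild from existing CFs
                absorb(entries, cf, threshold)
    return entries, threshold


def label_points(points, entries):
    """Simplified Step 7: label each point with the index of its nearest centroid."""
    centroids = [cf.centroid() for cf in entries]
    return [min(range(len(centroids)), key=lambda k: math.dist(p, centroids[k]))
            for p in points]


# Hypothetical toy data: two tight groups.
points = [(1.0, 1.0), (1.2, 1.1), (0.9, 1.3), (8.0, 8.0), (8.1, 8.2), (7.9, 8.3)]
entries, final_threshold = build_entries(points, threshold=0.5, max_entries=10)
print(len(entries), label_points(points, entries))
```

Because a rebuild re-inserts the stored CFs rather than the raw points, the data is still scanned only once, which is the property that makes the CF summary attractive for large datasets.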

Copyright information

© 2019 Springer Nature Singapore Pte Ltd.

Cite this paper

Mamatha Bai, B.G., Nalini, B.M., Majumdar, J. (2019). Analysis and Detection of Diabetes Using Data Mining Techniques—A Big Data Application in Health Care. In: Shetty, N., Patnaik, L., Nagaraj, H., Hamsavath, P., Nalini, N. (eds) Emerging Research in Computing, Information, Communication and Applications. Advances in Intelligent Systems and Computing, vol 882. Springer, Singapore. https://doi.org/10.1007/978-981-13-5953-8_37

  • DOI: https://doi.org/10.1007/978-981-13-5953-8_37

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-13-5952-1

  • Online ISBN: 978-981-13-5953-8

  • eBook Packages: Engineering, Engineering (R0)
