
Data Mining in Healthcare and Biomedicine: A Survey of the Literature


Data mining, a concept that emerged in the mid-1990s, can help researchers gain novel and deep insights into large biomedical datasets and understand them in ways that were not previously possible. It can uncover new biomedical and healthcare knowledge for clinical and administrative decision making, and it can generate scientific hypotheses from large experimental data, clinical databases, and/or biomedical literature. This review first introduces data mining in general (its background, definition, and process), discusses the major differences between statistics and data mining, and then addresses what makes data mining unique in the biomedical and healthcare fields. It briefly summarizes the data mining algorithms used for classification, clustering, and association, together with their respective advantages and drawbacks. Guidelines on how to use data mining algorithms in each of these three areas are offered, along with three examples of how data mining has been used in the healthcare industry. Health-related organizations have successfully applied data mining to predict health insurance fraud and under-diagnosed patients, and to identify and classify people at risk in terms of health, with the goal of reducing healthcare costs. We therefore describe how data mining technologies in each of the three areas have been used for a multitude of purposes, including research in the biomedical and healthcare fields.
The review also discusses the technologies available for predicting healthcare costs (including length of hospital stay), for disease diagnosis and prognosis, and for discovering hidden biomedical and healthcare patterns in related databases. It further discusses the use of data mining to discover relationships such as those between health conditions and a disease, among diseases, and among drugs. The article concludes with a discussion of the problems that hamper the clinical use of data mining by health professionals.




  1. MeSH is the controlled vocabulary of the National Library of Medicine (NLM), used for indexing MEDLINE articles.

  2. For example, if a hierarchical algorithm takes 60 s to cluster 1,000 objects (records), clustering 3,000 objects takes 1,620 s (= (3000/1000)^3 × 60), provided there is enough system memory.

  3. Some classification algorithms can mine only numeric data or only categorical data.

  4. Clustering accuracy can be measured only if a class (i.e., a dependent variable) is available.

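The runtime estimate in note 2 can be reproduced with a short calculation. The sketch below assumes that runtime grows with the cube of the number of objects, which is what the note's formula implies. The function name `scaled_runtime` and its parameters are hypothetical, chosen only for illustration. The exponent is left as a parameter because actual scaling depends on the specific hierarchical algorithm and its implementation.

```python
def scaled_runtime(base_seconds, base_n, new_n, exponent=3):
    """Extrapolate runtime from a measured baseline.

    Assumes runtime grows as n**exponent (cubic by default,
    matching the formula in note 2). This is a rough estimate,
    not a measurement.
    """
    return base_seconds * (new_n / base_n) ** exponent


# Baseline from note 2: 60 s to cluster 1,000 objects.
for n in (1000, 2000, 3000):
    seconds = scaled_runtime(60, 1000, n)
    print(f"{n} objects: about {seconds:.0f} s")
```

Under this assumption, tripling the number of objects multiplies the runtime by 27, which gives the 1,620 s quoted in the note.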



Author information



Corresponding author

Correspondence to Illhoi Yoo.


About this article

Cite this article

Yoo, I., Alafaireet, P., Marinov, M. et al. Data Mining in Healthcare and Biomedicine: A Survey of the Literature. J Med Syst 36, 2431–2448 (2012).



  • Data mining
  • Review
  • Healthcare
  • Biomedicine