Journal of Medical Systems, Volume 35, Issue 2, pp 277–281

Using Data Mining Techniques in Monitoring Diabetes Care. The Simpler the Better?

  • Dario Gregori
  • Michele Petrinco
  • Simona Bo
  • Rosalba Rosato
  • Eva Pagano
  • Paola Berchialla
  • Franco Merletti
Original Paper

Abstract

We aim to evaluate how data-mining statistical techniques can be applied to medical records and administrative data on diabetes, and how they differ in their ability to predict outcomes (e.g. death). Data on 3,892 outpatients with a diagnosis of type 2 diabetes from the San Giovanni Battista Hospital in Torino were analysed. Six statistical classifiers were applied: Logistic Regression (LR), Generalized Additive Model (GAM), Projection Pursuit Regression (PPR), Linear Discriminant Analysis (LDA), Quadratic Discriminant Analysis (QDA) and Artificial Neural Networks (ANN). All models selected the same subset of covariates. ANN is the worst-performing model, whereas simpler models such as LR, GAM and LDA seem to perform better. GAM is associated with a very small misclassification rate. Agreement among the models in predicting individual outcomes is 0.23 (Kappa, SE 0.06). Monitoring on the basis of patients' characteristics is highly dependent on the statistical properties of the chosen model.
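The abstract compares the models in two ways: each model's misclassification rate, and the Kappa agreement among the models' individual predictions. The following is a minimal Python sketch of both metrics. It assumes binary outcomes coded 0/1 (e.g. 1 = death) and assumes the multi-model Kappa is Fleiss' kappa; the abstract does not state which Kappa variant or which error-estimation scheme (e.g. cross-validation or bootstrap) was used. The function names and the toy data are hypothetical and are not the study's data.

```python
from typing import Sequence


def misclassification_rate(predicted: Sequence[int], observed: Sequence[int]) -> float:
    """Fraction of patients whose predicted 0/1 outcome differs from the observed one."""
    if len(predicted) != len(observed) or not observed:
        raise ValueError("need two non-empty sequences of equal length")
    errors = sum(p != o for p, o in zip(predicted, observed))
    return errors / len(observed)


def fleiss_kappa(predictions: Sequence[Sequence[int]],
                 categories: Sequence[int] = (0, 1)) -> float:
    """Fleiss' kappa for agreement among several classifiers (assumed variant).

    predictions[m][i] is the label that model m assigns to patient i.
    """
    n_raters = len(predictions)
    n_subjects = len(predictions[0]) if predictions else 0
    if n_raters < 2 or n_subjects == 0 or any(len(m) != n_subjects for m in predictions):
        raise ValueError("need >= 2 models, each labelling the same non-empty set of patients")

    # counts[i][j]: number of models that put patient i in category j
    counts = [[sum(model[i] == c for model in predictions) for c in categories]
              for i in range(n_subjects)]

    # Observed agreement for each patient, then averaged over patients
    p_i = [(sum(x * x for x in row) - n_raters) / (n_raters * (n_raters - 1))
           for row in counts]
    p_bar = sum(p_i) / n_subjects

    # Chance agreement from the overall proportion of each category
    total = n_subjects * n_raters
    p_j = [sum(row[j] for row in counts) / total for j in range(len(categories))]
    p_e = sum(p * p for p in p_j)

    if p_e == 1.0:
        raise ValueError("kappa undefined: all models predict one single category")
    return (p_bar - p_e) / (1 - p_e)


if __name__ == "__main__":
    # Hypothetical toy data: 6 patients, 3 models (not the study's results)
    observed = [1, 0, 0, 1, 0, 1]
    models = {
        "LR":  [1, 0, 0, 1, 0, 0],
        "GAM": [1, 0, 0, 1, 0, 1],
        "ANN": [0, 0, 1, 1, 0, 1],
    }
    for name, pred in models.items():
        print(name, misclassification_rate(pred, observed))
    print("kappa", fleiss_kappa(list(models.values())))
```

A low misclassification rate for every model does not guarantee that the models agree on which individual patients they classify correctly; kappa measures that second property, which is why the two metrics can tell different stories.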

Keywords

Data mining; Diabetes care; Mortality; Administrative data; Clinical predictions

Copyright information

© Springer Science+Business Media, LLC 2009

Authors and Affiliations

  • Dario Gregori (1)
  • Michele Petrinco (2)
  • Simona Bo (3)
  • Rosalba Rosato (2)
  • Eva Pagano (2)
  • Paola Berchialla (4)
  • Franco Merletti (2)
  1. Laboratories of Epidemiological Methods and Biostatistics, Department of Environmental Medicine and Public Health, University of Padova, Padova, Italy
  2. Unit of Cancer Epidemiology, University of Torino, and CPO Piemonte, Turin, Italy
  3. Department of Internal Medicine, University of Torino, Turin, Italy
  4. Department of Public Health and Microbiology, University of Torino, Turin, Italy