Mining Clinical Data

  • Argyris Kalogeratos
  • V. Chasanis
  • G. Rakocevic
  • A. Likas
  • Z. Babovic
  • M. Novakovic
Chapter

Abstract

A prerequisite of any machine learning or data mining application is a clearly defined target variable that the system will try to learn. In a supervised setting, we also need to know the value of this target variable for a set of training examples (here, patient records). In the case study presented in this chapter, the values available for training are the ground-truth characterizations of coronary artery disease severity or, in an alternative scenario, of disease progression in the patients. We set either disease severity or disease progression as the target variable and consider a two-class problem: the aim is to discriminate patients characterized as “severely diseased” (or “severely progressed”) from a second group of “mildly diseased” (or “mildly progressed”) patients. This mild/severe characterization is the actual value of the target variable for each patient.
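
The two-class setup described in the abstract can be illustrated with a minimal Python sketch. It is not the chapter's implementation. The field names (`severity`, `progression`, `age`, `smoker`), the 0/1 label encoding, and the helper `make_training_set` are all assumptions made for illustration. The sketch selects one of the two candidate target variables, keeps only the records with a known ground-truth value, and returns the remaining fields as inputs.

```python
from typing import Dict, List, Tuple

# Assumed encoding: the chapter names the two groups but not how they are coded.
LABELS = {"mild": 0, "severe": 1}

# Assumed field names for the two alternative target variables.
TARGET_FIELDS = ("severity", "progression")


def make_training_set(
    records: List[Dict[str, object]], target: str
) -> Tuple[List[Dict[str, object]], List[int]]:
    """Split patient records into feature dicts and binary labels for one scenario."""
    if target not in TARGET_FIELDS:
        raise ValueError(f"unknown target variable: {target!r}")
    features, labels = [], []
    for record in records:
        value = record.get(target)
        if value not in LABELS:
            # Supervised training needs a known ground-truth value; skip the rest.
            continue
        labels.append(LABELS[value])
        # Drop both candidate targets so neither leaks into the inputs.
        features.append({k: v for k, v in record.items() if k not in TARGET_FIELDS})
    return features, labels


# Hypothetical patient records, invented purely for illustration.
patients = [
    {"age": 61, "smoker": True, "severity": "severe", "progression": "mild"},
    {"age": 48, "smoker": False, "severity": "mild", "progression": "mild"},
    {"age": 70, "smoker": True, "severity": "severe", "progression": None},
]

X_sev, y_sev = make_training_set(patients, "severity")
X_prog, y_prog = make_training_set(patients, "progression")
print(y_sev)   # [1, 0, 1]
print(y_prog)  # [0, 0]
```

Removing both candidate targets from the inputs is a design choice of this sketch, not something the abstract specifies. It avoids training a severity classifier on the progression label, or the reverse.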

Keywords

Support Vector Machine; Random Forest; Left Anterior Descend; Information Gain; Support Vector Machine Classifier
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

Copyright information

© Springer Science+Business Media New York 2013

Authors and Affiliations

  • Argyris Kalogeratos (1)
  • V. Chasanis (1)
  • G. Rakocevic (2)
  • A. Likas (1)
  • Z. Babovic (3)
  • M. Novakovic (3)

  1. Department of Computer Science, University of Ioannina, Ioannina, Greece
  2. Mathematical Institute, Serbian Academy of Sciences and Arts, Belgrade, Serbia
  3. Innovation Center of the School of Electrical Engineering, University of Belgrade, Belgrade, Serbia
