Abstract
This paper presents a case study in the insightful analysis of clinical data collected in regular hospital practice. The approach is applied to a database of patients suffering from brain ischaemia, either permanent (brain stroke with a positive computed tomography (CT) scan) or reversible (ischaemia with a normal brain CT scan). The goal of the analysis is to extract useful knowledge that can help in the diagnosis, prevention and better understanding of vascular brain disease. The paper demonstrates the applicability of subgroup discovery to insightful data analysis and describes the expert’s process of converting the induced rules into useful medical knowledge. Important results of the analysis include the detection of coexisting risk factors, the selection of relevant discriminative points for numerical descriptors, and the detection and description of characteristic patient subpopulations. Graphical representation is used extensively to illustrate the dependencies detected in the available clinical data.
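To make the idea of scoring a subgroup and choosing a discriminative point concrete, the following is a minimal sketch and not the authors' algorithm. It assumes weighted relative accuracy (WRAcc), a quality measure that is common in the subgroup discovery literature; the abstract does not state which measure the paper uses. The patient records are synthetic, and the names `wracc` and `best_age_threshold` are hypothetical.

```python
from typing import Callable, Dict, List, Tuple

Patient = Dict[str, object]
Predicate = Callable[[Patient], bool]


def wracc(patients: List[Patient], condition: Predicate, target: Predicate) -> float:
    """Weighted relative accuracy: P(cond) * (P(target | cond) - P(target))."""
    n = len(patients)
    covered = [p for p in patients if condition(p)]
    if n == 0 or not covered:
        return 0.0
    p_cond = len(covered) / n
    p_target = sum(1 for p in patients if target(p)) / n
    p_target_given_cond = sum(1 for p in covered if target(p)) / len(covered)
    return p_cond * (p_target_given_cond - p_target)


def best_age_threshold(patients: List[Patient], target: Predicate) -> Tuple[float, float]:
    """Try midpoints between consecutive distinct ages as cut points for
    the rule 'age > t' and return the (threshold, WRAcc) pair with the best score."""
    ages = sorted({float(p["age"]) for p in patients})
    candidates = [(a + b) / 2 for a, b in zip(ages, ages[1:])]
    scored = [(t, wracc(patients, lambda p, t=t: p["age"] > t, target)) for t in candidates]
    return max(scored, key=lambda pair: pair[1])


# Synthetic toy records (NOT from the paper's database).
patients: List[Patient] = [
    {"age": 72, "hypertension": True, "stroke": True},
    {"age": 68, "hypertension": True, "stroke": True},
    {"age": 75, "hypertension": False, "stroke": True},
    {"age": 55, "hypertension": True, "stroke": False},
    {"age": 49, "hypertension": False, "stroke": False},
    {"age": 61, "hypertension": False, "stroke": False},
    {"age": 80, "hypertension": True, "stroke": True},
    {"age": 58, "hypertension": False, "stroke": False},
]


def is_stroke(p: Patient) -> bool:
    return bool(p["stroke"])


# Candidate subgroup: older patients with hypertension (coexisting factors).
score = wracc(patients, lambda p: p["age"] > 65 and p["hypertension"], is_stroke)
threshold, threshold_score = best_age_threshold(patients, is_stroke)
print(score, threshold, threshold_score)
```

A higher WRAcc means the subgroup is both reasonably large and enriched in the target class relative to the whole population. In an expert-guided setting such scores would only rank candidate rules; the medical interpretation remains with the expert.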
Additional information
This work was supported by the Croatian Ministry of Science, Education and Sport project “Machine Learning Algorithms and Applications”, the Slovenian Ministry of Higher Education, Science and Technology project “Knowledge Technologies”, and the EU FP6 project “Heartfaid: A knowledge based platform of services for supporting medical–clinical management of the heart failure within the elderly population”.
About this article
Cite this article
Gamberger, D., Lavrač, N., Krstačić, A. et al. Clinical data analysis based on iterative subgroup discovery: experiments in brain ischaemia data analysis. Appl Intell 27, 205–217 (2007). https://doi.org/10.1007/s10489-007-0068-9
DOI: https://doi.org/10.1007/s10489-007-0068-9