Subgroup Discovery in Data Sets with Multi–dimensional Responses: A Method and a Case Study in Traumatology

Umek, Lan; Zupan, Blaž; Toplak, Marko; Morin, Annie; Chauchat, Jean-Hugues; Makovec, Gregor; Smrke, Dragica

doi:10.1007/978-3-642-02976-9_39

Conference paper

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 5651)

Abstract

Biomedical experimental data sets often include many features both at input (descriptions of cases, treatments, or experimental parameters) and at output (outcome descriptions). State-of-the-art data mining techniques can handle such data, but they consider only one output feature at a time and disregard any dependencies among the outputs. In this paper, we propose a technique that treats many output features simultaneously, aiming to find subgroups of cases that are similar in both input and output space. The method is based on k-medoids clustering and the analysis of contingency tables, and it reports case subgroups with a significant dependency between input and output space. We used this technique in an exploratory analysis of clinical data on femoral neck fractures. The participating domain expert considered the discovered subgroups meaningful, and they sparked a number of ideas for hypotheses to be tested experimentally.
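
As a rough illustration of the idea outlined in the abstract, the following Python sketch clusters the cases separately in input and output space, cross-tabulates the two clusterings in a contingency table, and reports the cells that hold more cases than independence would predict. The abstract names only k-medoids clustering and contingency-table analysis. The Euclidean distance, the farthest-first initialization, the per-cell hypergeometric test, the Bonferroni correction, all function names, and the toy data are assumptions made for illustration and are not taken from the paper.

```python
import math


def k_medoids(points, k, max_iter=100):
    """Alternating k-medoids with Euclidean distance (assumed metric)."""
    n = len(points)
    dist = [[math.dist(p, q) for q in points] for p in points]
    # Deterministic farthest-first initialization (an assumption of this sketch).
    medoids = [0]
    while len(medoids) < k:
        medoids.append(max(range(n), key=lambda i: min(dist[i][m] for m in medoids)))

    def assign(meds):
        # Label each case with the index of its nearest medoid.
        return [min(range(k), key=lambda c: dist[i][meds[c]]) for i in range(n)]

    for _ in range(max_iter):
        labels = assign(medoids)
        new_medoids = []
        for c in range(k):
            members = [i for i in range(n) if labels[i] == c]
            if members:
                # New medoid: the member with the smallest total distance to the others.
                best = min(members, key=lambda m: sum(dist[m][j] for j in members))
            else:
                best = medoids[c]  # keep the old medoid if a cluster became empty
            new_medoids.append(best)
        if new_medoids == medoids:
            break
        medoids = new_medoids
    return assign(medoids)


def contingency(row_labels, col_labels, n_rows, n_cols):
    """Count cases for every (input cluster, output cluster) pair."""
    table = [[0] * n_cols for _ in range(n_rows)]
    for r, c in zip(row_labels, col_labels):
        table[r][c] += 1
    return table


def upper_tail_p(x, n, row_total, col_total):
    """P(X >= x) for a hypergeometric count with the given table margins."""
    hi = min(row_total, col_total)
    favourable = sum(
        math.comb(row_total, t) * math.comb(n - row_total, col_total - t)
        for t in range(x, hi + 1)
    )
    return favourable / math.comb(n, col_total)


def find_subgroups(inputs, outputs, k_in, k_out, alpha=0.05):
    """Return (input cluster, output cluster, count, p) for over-represented cells."""
    in_labels = k_medoids(inputs, k_in)
    out_labels = k_medoids(outputs, k_out)
    table = contingency(in_labels, out_labels, k_in, k_out)
    n = len(inputs)
    row_totals = [sum(row) for row in table]
    col_totals = [sum(table[i][j] for i in range(k_in)) for j in range(k_out)]
    n_tests = k_in * k_out  # Bonferroni correction (assumed, for simplicity)
    found = []
    for i in range(k_in):
        for j in range(k_out):
            p = upper_tail_p(table[i][j], n, row_totals[i], col_totals[j])
            if p * n_tests <= alpha:
                found.append((i, j, table[i][j], p))
    return sorted(found, key=lambda s: s[3])


# Hypothetical toy data: two groups of cases that differ in both spaces.
inputs = [(0, 0), (0, 1), (1, 0), (1, 1), (0.5, 0.5),
          (10, 10), (10, 11), (11, 10), (11, 11), (10.5, 10.5)]
outputs = [(0.0,), (0.1,), (0.2,), (0.1,), (0.0,),
           (5.0,), (5.1,), (5.2,), (5.1,), (5.0,)]
subgroups = find_subgroups(inputs, outputs, k_in=2, k_out=2)
for i, j, count, p in subgroups:
    print(f"input cluster {i} x output cluster {j}: {count} cases, p = {p:.4f}")
```

Each reported pair identifies a set of cases that share an input cluster and an output cluster more often than chance would suggest. On real data, the numbers of clusters and the multiple-testing correction would need to be chosen with more care than in this sketch.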

Author information

Authors and Affiliations

Faculty of Computer and Information Sciences, University of Ljubljana, Slovenia
Lan Umek, Blaž Zupan & Marko Toplak
Dept. of Human and Mol. Genetics, Baylor College of Medicine, Houston, USA
Blaž Zupan
IRISA, Universite de Rennes 1, Rennes cedex, 35042, France
Annie Morin
Universite de Lyon, ERIC-Lyon 2, 69676, Bron Cedex, France
Jean-Hugues Chauchat
Dept. of Traumatology, University Clinical Centre, Ljubljana, Slovenia
Gregor Makovec & Dragica Smrke

Editor information

Editors and Affiliations

University of Verona, Department of Computer Science, Ca’ Vignal 2, strada le Grazie 15, 37134, Verona, Italy
Carlo Combi
Department of Information Systems Engineering, Ben Gurion University of the Negev, P.O. Box 653, 84105, Beer-Sheva, Israel
Yuval Shahar
Department of Medical Informatics, University of Amsterdam, Academic Medical Center, Meibergdreef 15, 1105 AZ, Amsterdam, The Netherlands
Ameen Abu-Hanna

About this paper

Cite this paper

Umek, L. et al. (2009). Subgroup Discovery in Data Sets with Multi–dimensional Responses: A Method and a Case Study in Traumatology. In: Combi, C., Shahar, Y., Abu-Hanna, A. (eds) Artificial Intelligence in Medicine. AIME 2009. Lecture Notes in Computer Science, vol 5651. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-02976-9_39

DOI: https://doi.org/10.1007/978-3-642-02976-9_39
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-02975-2
Online ISBN: 978-3-642-02976-9
eBook Packages: Computer Science, Computer Science (R0)
