Subgroup Discovery in Data Sets with Multi-dimensional Responses: A Method and a Case Study in Traumatology

  • Lan Umek
  • Blaž Zupan
  • Marko Toplak
  • Annie Morin
  • Jean-Hugues Chauchat
  • Gregor Makovec
  • Dragica Smrke
Part of the Lecture Notes in Computer Science book series (LNCS, volume 5651)


Biomedical experimental data sets often include many features both at the input (descriptions of cases, treatments, or experimental parameters) and at the output (outcome descriptions). State-of-the-art data mining techniques can deal with such data, but consider only one output feature at a time, disregarding any dependencies among them. In this paper, we propose a technique that treats many output features simultaneously, aiming to find subgroups of cases that are similar in both input and output space. The method is based on k-medoids clustering and the analysis of contingency tables, and reports case subgroups with a significant dependency between input and output space. We used this technique in an exploratory analysis of clinical data on femoral neck fractures. The subgroups discovered in our study were considered meaningful by the participating domain expert and sparked a number of ideas for hypotheses to be tested in further experiments.
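
The general idea in the abstract (cluster the cases once in input space and once in output space, then cross-tabulate the two clusterings) can be illustrated with a minimal sketch. It is not the authors' implementation: the toy data, the function names (`k_medoids`, `contingency_table`, `chi2_statistic`), the farthest-first initialisation, and the use of Pearson residuals to flag over-represented cells are all assumptions made for illustration. The sketch stops at the χ² statistic and does not test significance.

```python
import math

def k_medoids(points, k, max_iter=100):
    """Simple alternating k-medoids; returns one cluster label per point."""
    n = len(points)
    d = [[math.dist(p, q) for q in points] for p in points]
    # Farthest-first initialisation (an assumption, keeps the sketch deterministic).
    medoids = [0]
    while len(medoids) < k:
        medoids.append(max(range(n), key=lambda i: min(d[i][m] for m in medoids)))

    def assign():
        # Label each point with the index of its nearest medoid.
        return [min(range(k), key=lambda c: d[i][medoids[c]]) for i in range(n)]

    for _ in range(max_iter):
        labels = assign()
        updated = []
        for c in range(k):
            members = [i for i in range(n) if labels[i] == c]
            if not members:  # keep the old medoid for an empty cluster
                updated.append(medoids[c])
                continue
            # New medoid: the member with the smallest total distance to the others.
            updated.append(min(members, key=lambda m: sum(d[m][j] for j in members)))
        if updated == medoids:
            break
        medoids = updated
    return assign()

def contingency_table(row_labels, col_labels, n_rows, n_cols):
    """Count cases for every (input cluster, output cluster) pair."""
    table = [[0] * n_cols for _ in range(n_rows)]
    for r, c in zip(row_labels, col_labels):
        table[r][c] += 1
    return table

def chi2_statistic(table):
    """Pearson chi-square statistic and per-cell Pearson residuals."""
    n = sum(map(sum, table))
    row_tot = [sum(row) for row in table]
    col_tot = [sum(col) for col in zip(*table)]
    chi2, resid = 0.0, []
    for i, row in enumerate(table):
        resid.append([])
        for j, observed in enumerate(row):
            expected = row_tot[i] * col_tot[j] / n
            z = (observed - expected) / math.sqrt(expected) if expected > 0 else 0.0
            resid[i].append(z)
            chi2 += z * z
    return chi2, resid

# Hypothetical toy data: 8 cases, 2 input features and 1 output feature each.
inputs = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10), (10, 11), (11, 10), (11, 11)]
outputs = [(0,), (1,), (0,), (1,), (9,), (8,), (9,), (8,)]

k_in, k_out = 2, 2
in_labels = k_medoids(inputs, k_in)
out_labels = k_medoids(outputs, k_out)
table = contingency_table(in_labels, out_labels, k_in, k_out)
chi2, resid = chi2_statistic(table)

# Candidate subgroups: cells holding more cases than independence would predict.
subgroups = [(i, j, table[i][j])
             for i in range(k_in) for j in range(k_out) if resid[i][j] > 0]
print(table)
print(subgroups)
```

On this toy data the two clusterings coincide, so all cases fall on the diagonal of the contingency table. With real data one would also need a distance suited to mixed feature types, a way to choose the number of clusters, and a significance test with a correction for multiple comparisons; none of these is covered by the sketch.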


Keywords: subgroup discovery, multi-label prediction, k-medoids clustering, χ² statistics, femoral neck fracture




Copyright information

© Springer-Verlag Berlin Heidelberg 2009

Authors and Affiliations

  • Lan Umek (1)
  • Blaž Zupan (1, 2)
  • Marko Toplak (1)
  • Annie Morin (3)
  • Jean-Hugues Chauchat (4)
  • Gregor Makovec (5)
  • Dragica Smrke (5)

  1. Faculty of Computer and Information Sciences, University of Ljubljana, Slovenia
  2. Dept. of Human and Mol. Genetics, Baylor College of Medicine, Houston, USA
  3. IRISA, Université de Rennes 1, Rennes Cedex, France
  4. Université de Lyon, ERIC-Lyon 2, Bron Cedex, France
  5. Dept. of Traumatology, University Clinical Centre, Ljubljana, Slovenia
