Subgroup Discovery Using Bump Hunting on Multi-relational Histograms

  • Radomír Černoch
  • Filip Železný
Part of the Lecture Notes in Computer Science book series (LNCS, volume 7207)

Abstract

We propose an approach to subgroup discovery in relational databases containing numerical attributes. The approach detects bumps in histograms constructed from the substitution sets obtained by matching a first-order query against the input relational database. We evaluate it on seven data sets, where it discovers interpretable subgroups. The rate at which subgroups survive from the training split to the testing split varies across the data sets, but it is very high on at least three of them.
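
To make the idea in the abstract concrete, the following is a minimal sketch, not the authors' algorithm. It assumes a toy two-table database, a hypothetical query "account(A, Region), transaction(A, Amount)", equal-width bins, and a naive definition of a bump as a bin that is strictly higher than both neighbours. All names (accounts, transactions, substitutions, histogram, find_bumps) and all data values are invented for illustration.

```python
# Hypothetical toy relational data: account -> region, and (account, amount) rows.
accounts = {"a1": "prague", "a2": "prague", "a3": "brno"}
transactions = [
    ("a1", 120.0), ("a1", 130.0),
    ("a2", 125.0), ("a2", 900.0),
    ("a3", 40.0), ("a3", 45.0), ("a3", 128.0),
]


def substitutions(region):
    """Return the numeric bindings of Amount for the assumed query
    account(A, region), transaction(A, Amount)."""
    return [amount for acc, amount in transactions if accounts[acc] == region]


def histogram(values, lo, hi, bins):
    """Count values in `bins` equal-width bins over [lo, hi]."""
    width = (hi - lo) / bins
    counts = [0] * bins
    for v in values:
        # Clamp so that v == hi falls into the last bin.
        idx = min(int((v - lo) / width), bins - 1)
        counts[idx] += 1
    return counts


def find_bumps(counts, min_count=1):
    """Return indices of bins that hold at least `min_count` values and are
    strictly higher than both neighbours (missing neighbours count as 0)."""
    padded = [0] + counts + [0]
    return [
        i for i in range(len(counts))
        if counts[i] >= min_count
        and padded[i + 1] > padded[i]
        and padded[i + 1] > padded[i + 2]
    ]


counts = histogram(substitutions("prague"), 0.0, 1000.0, 10)
bumps = find_bumps(counts, min_count=2)
```

In this sketch each bump index identifies a value range of the numerical attribute, which together with the query would describe a candidate subgroup. How the paper actually scores and searches such candidates is not stated in the abstract.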

Keywords

Aggregation Function · Probability Mass Function · Inductive Logic Programming · Beam Search · Machine Learn Research

Copyright information

© Springer-Verlag Berlin Heidelberg 2012

Authors and Affiliations

  • Radomír Černoch (1)
  • Filip Železný (1)
  1. Faculty of Electrical Engineering, Department of Cybernetics, Intelligent Data Analysis Research Lab, Czech Technical University, Czech
