An algorithm for multi-relational discovery of subgroups

  • Stefan Wrobel
Parallel Session 2a
Part of the Lecture Notes in Computer Science book series (LNCS, volume 1263)

Abstract

We consider the problem of finding statistically unusual subgroups in a multi-relation database, and extend previous work on single-relation subgroup discovery. We give a precise definition of the multi-relation subgroup discovery task, propose a specific form of declarative bias based on foreign links as a means of specifying the hypothesis space, and show how propositional evaluation functions can be adapted to the multi-relation setting. We then describe an algorithm for this problem setting that ensures efficiency through optimistic-estimate pruning, minimal-support pruning, an optimal refinement operator, and sampling, and that can easily be parallelized.
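
To make the pruning ideas named in the abstract concrete, the following is a minimal sketch for the simpler single-relation (propositional) case. It is not the algorithm from the paper. The function name `discover_subgroups`, the data layout, and the choice of weighted relative accuracy as the evaluation function are all assumptions made for illustration. Candidates are generated exactly once by only appending conditions that come later in a fixed order, which is one simple way to obtain a non-redundant refinement operator.

```python
from heapq import heappush, heappushpop


def discover_subgroups(rows, target, conditions, k=3, min_support=2, max_depth=2):
    """Top-k subgroup search over one table (hypothetical helper, illustration only).

    rows:       list of dicts, one per record
    target:     key of a boolean column
    conditions: ordered list of (attribute, value) tests
    """
    total = len(rows)
    p0 = sum(1 for r in rows if r[target]) / total  # target share in the whole table
    best = []  # min-heap of (quality, description); best[0] is the weakest kept result

    def search(desc, start, covered):
        # Only conditions with index >= start are appended, so every
        # conjunction is generated exactly once.
        for i in range(start, len(conditions)):
            attr, val = conditions[i]
            sub = [r for r in covered if r[attr] == val]
            n = len(sub)
            if n < min_support:
                continue  # minimal support pruning: refinements only get smaller
            tp = sum(1 for r in sub if r[target])
            # Assumed evaluation function: weighted relative accuracy.
            quality = (n / total) * (tp / n - p0)
            new_desc = desc + ((attr, val),)
            if len(best) < k:
                heappush(best, (quality, new_desc))
            elif quality > best[0][0]:
                heappushpop(best, (quality, new_desc))
            # Optimistic estimate: the best conceivable refinement keeps all
            # positives and drops all negatives of this subgroup.
            optimistic = (tp / total) * (1 - p0)
            if len(new_desc) < max_depth and (len(best) < k or optimistic > best[0][0]):
                search(new_desc, i + 1, sub)

    search((), 0, rows)
    return sorted(best, reverse=True)


rows = [
    {"sex": "f", "region": "north", "buys": True},
    {"sex": "f", "region": "north", "buys": True},
    {"sex": "f", "region": "south", "buys": True},
    {"sex": "m", "region": "north", "buys": False},
    {"sex": "m", "region": "south", "buys": False},
    {"sex": "m", "region": "south", "buys": False},
]
conditions = [("sex", "f"), ("sex", "m"), ("region", "north"), ("region", "south")]

for quality, desc in discover_subgroups(rows, "buys", conditions, k=3):
    print(round(quality, 3), desc)
```

In the multi-relation setting the abstract describes, the conditions would additionally range over related tables reached through foreign links, and sampling would replace the full scans used here. Neither is attempted in this sketch.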



Copyright information

© Springer-Verlag 1997

Authors and Affiliations

  • Stefan Wrobel (1)
  1. GMD, FIT.KI, Schloß Birlinghoven, Sankt Augustin, Germany
