An algorithm for multi-relational discovery of subgroups
We consider the problem of finding statistically unusual subgroups in a multi-relation database, extending previous work on single-relation subgroup discovery. We give a precise definition of the multi-relation subgroup discovery task, propose a specific form of declarative bias based on foreign links as a means of specifying the hypothesis space, and show how propositional evaluation functions can be adapted to the multi-relation setting. We then describe an algorithm for this problem setting that ensures efficiency through optimistic-estimate pruning, minimal-support pruning, an optimal refinement operator, and sampling, and that can easily be parallelized.
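The pruning ideas named above can be illustrated on a single flat table. The following is a minimal, hypothetical Python sketch, not the paper's multi-relational algorithm. It assumes a propositional table of records with no foreign links, attribute-value equality tests as selectors, and weighted relative accuracy g(p - p0) as the evaluation function, where g is the subgroup's share of the population, p its target share and p0 the population's target share. Under that assumed function, any refinement covers a subset of the current subgroup, so its quality is at most n_pos(1 - p0)/N (keep only the positives); this bound drives optimistic-estimate pruning. Adding selectors only in a fixed order gives a refinement operator that generates each conjunction exactly once. The function names `discover`, `wracc` and `optimistic_estimate` are illustrative assumptions.

```python
import heapq
import itertools
from typing import Dict, List, Sequence, Tuple

# Assumed flat representation: each record maps attribute -> value.
Record = Dict[str, object]
# A selector (attribute, value) tests record[attribute] == value.
Selector = Tuple[str, object]
Conjunction = Tuple[Selector, ...]


def wracc(n_cov: int, n_pos: int, total: int, p0: float) -> float:
    """Weighted relative accuracy g * (p - p0) = (n_pos - n_cov * p0) / total."""
    return (n_pos - n_cov * p0) / total


def optimistic_estimate(n_pos: int, total: int, p0: float) -> float:
    """Upper bound on wracc for any refinement: keep only the n_pos positives."""
    return n_pos * (1 - p0) / total


def discover(records: List[Record], target: str, selectors: Sequence[Selector],
             k: int = 3, min_support: int = 2, max_depth: int = 2
             ) -> List[Tuple[float, Conjunction]]:
    total = len(records)
    p0 = sum(1 for r in records if r[target]) / total
    best: List[Tuple[float, int, Conjunction]] = []  # min-heap of the k best so far
    tie = itertools.count()  # tie-breaker so the heap never compares conjunctions
    # Depth-first search; entries are (conjunction, index of last selector, covered records).
    stack = [((), -1, records)]
    while stack:
        conj, last, covered = stack.pop()
        # Optimal refinement: only add selectors after the last one used,
        # so every conjunction is generated at most once.
        for i in range(last + 1, len(selectors)):
            attr, value = selectors[i]
            if any(a == attr for a, _ in conj):
                continue  # at most one test per attribute
            new_cov = [r for r in covered if r[attr] == value]
            if len(new_cov) < min_support:
                continue  # minimal-support pruning: refinements cover no more records
            n_pos = sum(1 for r in new_cov if r[target])
            new_conj = conj + (selectors[i],)
            q = wracc(len(new_cov), n_pos, total, p0)
            if len(best) < k:
                heapq.heappush(best, (q, next(tie), new_conj))
            elif q > best[0][0]:
                heapq.heapreplace(best, (q, next(tie), new_conj))
            threshold = best[0][0] if len(best) == k else float("-inf")
            # Optimistic-estimate pruning: expand only if a refinement could beat the k-th best.
            if len(new_conj) < max_depth and optimistic_estimate(n_pos, total, p0) > threshold:
                stack.append((new_conj, i, new_cov))
    return [(q, c) for q, _, c in sorted(best, reverse=True)]


# Small made-up example: in which subgroups is "ill" over-represented?
records = [
    {"sex": "f", "smoker": "y", "ill": True},
    {"sex": "f", "smoker": "y", "ill": True},
    {"sex": "f", "smoker": "n", "ill": False},
    {"sex": "m", "smoker": "y", "ill": True},
    {"sex": "m", "smoker": "n", "ill": False},
    {"sex": "m", "smoker": "n", "ill": False},
]
selectors = [("sex", "f"), ("sex", "m"), ("smoker", "y"), ("smoker", "n")]
top = discover(records, "ill", selectors, k=2, min_support=2, max_depth=2)
for q, conj in top:
    print(round(q, 3), conj)
```

On this toy data the two best subgroups are smoker = y, followed by sex = f and smoker = y. Because the assumed quality function rewards only over-representation of the target, the optimistic estimate is one-sided. The sampling and parallel evaluation mentioned in the abstract are not shown here.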