Abstract
We consider the problem of finding statistically unusual subgroups in a multi-relation database, extending previous work on single-relation subgroup discovery. We give a precise definition of the multi-relation subgroup discovery task, propose a specific form of declarative bias based on foreign links as a means of specifying the hypothesis space, and show how propositional evaluation functions can be adapted to the multi-relation setting. We then describe an algorithm for this problem setting that achieves efficiency through optimistic-estimate pruning, minimal-support pruning, an optimal refinement operator, and sampling, and that can easily be parallelized.
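The two pruning rules and the duplicate-free refinement named in the abstract can be illustrated with a minimal sketch. This is not the paper's algorithm: it works on a single table only (no foreign links, no sampling), and the quality function q = g * (p - p0), the function name `discover_subgroups`, and the toy data `ROWS` are assumptions chosen for illustration. Here g is the fraction of all rows covered by the subgroup, p the target share inside it, and p0 the overall target share. For this q, a refinement can at best keep every positive row and drop every negative one, which gives the optimistic estimate tp * (1 - p0) / N used below.

```python
def discover_subgroups(rows, target, attrs, k=2, min_support=0.25):
    """Return the k best (quality, description) pairs, best first.

    rows   -- list of dicts, one per object (hypothetical single-table input)
    target -- key of a boolean target attribute
    attrs  -- attributes that may appear in subgroup descriptions
    """
    n_total = len(rows)
    p0 = sum(r[target] for r in rows) / n_total
    # Fixed order over all (attribute, value) conditions.
    conds = sorted({(a, r[a]) for r in rows for a in attrs})
    best = []

    def search(desc, covered, start):
        used = {a for a, _ in desc}
        # Only conditions after position `start` are added, so each
        # conjunction is generated exactly once (no duplicate descriptions).
        for i in range(start, len(conds)):
            attr, value = conds[i]
            if attr in used:
                continue
            sub = [r for r in covered if r[attr] == value]
            # Minimal support pruning: refinements can only cover fewer rows.
            if len(sub) / n_total < min_support:
                continue
            tp = sum(r[target] for r in sub)
            new_desc = desc + ((attr, value),)
            # Assumed quality q = g * (p - p0) = (tp - n * p0) / N.
            best.append(((tp - len(sub) * p0) / n_total, new_desc))
            best.sort(key=lambda item: -item[0])
            del best[k:]
            # Optimistic estimate: best quality any refinement could reach.
            bound = tp * (1 - p0) / n_total
            if len(best) < k or bound > best[-1][0]:
                search(new_desc, sub, i + 1)

    search((), rows, 0)
    return best


# Hypothetical toy data: which customers are buyers?
ROWS = [
    {"sex": "f", "region": "north", "buyer": True},
    {"sex": "f", "region": "north", "buyer": True},
    {"sex": "f", "region": "south", "buyer": True},
    {"sex": "f", "region": "south", "buyer": False},
    {"sex": "m", "region": "north", "buyer": False},
    {"sex": "m", "region": "north", "buyer": False},
    {"sex": "m", "region": "south", "buyer": True},
    {"sex": "m", "region": "south", "buyer": False},
]

top = discover_subgroups(ROWS, "buyer", ["sex", "region"], k=2, min_support=0.25)
```

Raising `min_support` to 0.5 in this toy example excludes every two-condition subgroup, since each of them covers only a quarter of the rows. A multi-relation version would additionally have to join in further relations along the declared foreign links before conditions on their attributes could be added.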
© 1997 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Wrobel, S. (1997). An algorithm for multi-relational discovery of subgroups. In: Komorowski, J., Zytkow, J. (eds) Principles of Data Mining and Knowledge Discovery. PKDD 1997. Lecture Notes in Computer Science, vol 1263. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-63223-9_108
DOI: https://doi.org/10.1007/3-540-63223-9_108
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-63223-8
Online ISBN: 978-3-540-69236-2
eBook Packages: Springer Book Archive