Abstract
This paper addresses the problem of learning a statistical distribution of data stored in a relational database. The data we focus on are represented as trees, which are a natural way to represent structured information. These trees are then used to infer a stochastic tree automaton with a well-known grammatical inference algorithm. We propose two extensions of this algorithm: the use of sorts, and the generalization of the inferred automaton according to a local criterion. Experiments show that our approach scales to large databases and improves both the predictive power of the learned model and the convergence of the learning algorithm.
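To make the idea of "a tuple as a tree, scored by a stochastic tree automaton" concrete, here is a minimal Python sketch. It is not the authors' implementation. It encodes one tuple of a single relation as a flat tree and computes its probability under a deterministic bottom-up stochastic tree automaton, using one common formulation in which each rule's probability is conditioned on the state it produces and each state has a separate probability of labelling the root. The names `Tree`, `row_to_tree`, `StochasticTreeAutomaton`, the relation `person`, its attributes and all probabilities are hypothetical. The sketch also ignores multi-table (foreign-key) structure, sorts, generalization, and the inference step itself.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass(frozen=True)
class Tree:
    """A ranked tree node: a symbol followed by an ordered tuple of subtrees."""
    symbol: str
    children: Tuple["Tree", ...] = ()


# Hypothetical encoding of a single flat tuple as relation(value_1, ..., value_n).
# Assumption: real multi-relational data (joins via foreign keys, sorts) is ignored.
def row_to_tree(relation: str, row: Dict[str, str], attributes: Tuple[str, ...]) -> Tree:
    return Tree(relation, tuple(Tree(row[a]) for a in attributes))


# A rule is (symbol, child states); it maps to (parent state, probability of the
# rule among all rules producing that parent state).
Rule = Tuple[str, Tuple[str, ...]]


@dataclass
class StochasticTreeAutomaton:
    rules: Dict[Rule, Tuple[str, float]]
    root: Dict[str, float]  # probability that a state labels the root of a tree

    def _run(self, t: Tree) -> Optional[Tuple[str, float]]:
        """Bottom-up run: the reached state and the product of rule probabilities."""
        states = []
        prob = 1.0
        for child in t.children:
            result = self._run(child)
            if result is None:
                return None
            states.append(result[0])
            prob *= result[1]
        entry = self.rules.get((t.symbol, tuple(states)))
        if entry is None:
            return None  # no transition applies: the tree is rejected
        state, rule_prob = entry
        return state, prob * rule_prob

    def probability(self, t: Tree) -> float:
        """p(t) = root(state reached on t) * product of the rule probabilities used."""
        result = self._run(t)
        if result is None:
            return 0.0
        state, prob = result
        return self.root.get(state, 0.0) * prob


# Hypothetical automaton for a relation person(age, city); numbers are made up.
automaton = StochasticTreeAutomaton(
    rules={
        ("young", ()): ("q_age", 0.7),
        ("old", ()): ("q_age", 0.3),
        ("lille", ()): ("q_city", 0.4),
        ("paris", ()): ("q_city", 0.6),
        ("person", ("q_age", "q_city")): ("q_person", 1.0),
    },
    root={"q_person": 1.0},
)

tree = row_to_tree("person", {"age": "young", "city": "paris"}, ("age", "city"))
print(automaton.probability(tree))
```

Because the rules producing `q_age` sum to 1 (0.7 + 0.3), as do those producing `q_city`, and `q_person` has root probability 1, the four possible `person` trees receive probabilities summing to 1; `person(young, paris)` gets 0.7 × 0.6 = 0.42, and any tree using an unknown value gets probability 0.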
Copyright information
© 2002 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Habrard, A., Bernard, M., Jacquenet, F. (2002). Generalized Stochastic Tree Automata for Multi-relational Data Mining. In: Adriaans, P., Fernau, H., van Zaanen, M. (eds) Grammatical Inference: Algorithms and Applications. ICGI 2002. Lecture Notes in Computer Science, vol 2484. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-45790-9_10
DOI: https://doi.org/10.1007/3-540-45790-9_10
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-44239-4
Online ISBN: 978-3-540-45790-9
eBook Packages: Springer Book Archive