Generalized Stochastic Tree Automata for Multi-relational Data Mining

Habrard, Amaury; Bernard, Marc; Jacquenet, François

doi:10.1007/3-540-45790-9_10

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 2484)

Included in the following conference series:

International Colloquium on Grammatical Inference

Abstract

This paper addresses the problem of learning a statistical distribution over the data in a relational database. The data we focus on are represented as trees, which are a natural way to represent structured information. These trees are then used to infer a stochastic tree automaton with a well-known grammatical inference algorithm. We propose two extensions of this algorithm: the use of sorts, and the generalization of the inferred automaton according to a local criterion. Experiments show that our approach scales to large databases and improves both the predictive power of the learned model and the convergence of the learning algorithm.
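To make the central object concrete, the following minimal sketch shows how a deterministic bottom-up stochastic tree automaton can assign a probability to a tree. It is an illustration only, not the authors' algorithm: it does not perform inference, and it does not implement sorts or the generalization step. The toy automaton (`TRANSITIONS`, `ROOT_PROB`), the state name `q1`, and the helper names are all hypothetical.

```python
# Trees are (symbol, children) pairs, e.g. ("f", (("a", ()), ("a", ()))).
# Hypothetical toy automaton: each rule maps (symbol, child states) to
# (resulting state, rule probability). The probabilities of all rules
# that produce the same state sum to 1.
TRANSITIONS = {
    ("a", ()): ("q1", 0.5),
    ("f", ("q1", "q1")): ("q1", 0.5),
}

# Probability that a tree ends (at its root) in each state.
ROOT_PROB = {"q1": 1.0}


def evaluate(tree):
    """Return (state, probability of the rules used) for a tree, bottom-up.

    Returns (None, 0.0) if no rule applies somewhere in the tree.
    """
    symbol, children = tree
    child_states = []
    prob = 1.0
    for child in children:
        state, child_prob = evaluate(child)
        if state is None:
            return None, 0.0
        child_states.append(state)
        prob *= child_prob
    rule = TRANSITIONS.get((symbol, tuple(child_states)))
    if rule is None:
        return None, 0.0
    state, rule_prob = rule
    return state, prob * rule_prob


def tree_probability(tree):
    """Probability of a tree: root-state probability times rule probabilities."""
    state, prob = evaluate(tree)
    if state is None:
        return 0.0
    return ROOT_PROB.get(state, 0.0) * prob


leaf = ("a", ())
print(tree_probability(leaf))               # 0.5
print(tree_probability(("f", (leaf, leaf))))  # 0.125
```

In this toy model the tree `a` has probability 0.5 and `f(a, a)` has probability 0.125; a tree using a symbol with no matching rule gets probability 0.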

Author information

Authors and Affiliations

EURISE - Université de Saint-Etienne, 23, rue du Dr Paul Michelon, 42023, Saint-Etienne cedex 2, France
Amaury Habrard, Marc Bernard & François Jacquenet

Editor information

Editors and Affiliations

Perot Systems Nederland B.V., Hoefseweg 1, 3821 AE, Amersfoort, The Netherlands
Pieter Adriaans (Senior Research Advisor, Professor of Learning and Adaptive Systems)
ILLC/Computation and Complexity Theory, Universiteit van Amsterdam, Plantage Muidergracht 24, 1018 TV, Amsterdam, The Netherlands
Pieter Adriaans (Senior Research Advisor, Professor of Learning and Adaptive Systems)
School of Electrical Engineering and Computer Science, University of Newcastle, University Drive, Callaghan, NSW, 2308, Australia
Henning Fernau
Wilhelm-Schickard-Institut für Informatik, Universität Tübingen, Sand 13, 72076, Tübingen, Germany
Henning Fernau
FNWI/ILLC, Cognitive Systems and Information Processing Group, Universiteit van Amsterdam, Room B-5.39, Nieuwe Achtergracht 166, 1018 WV, Amsterdam, The Netherlands
Menno van Zaanen

About this paper

Cite this paper

Habrard, A., Bernard, M., Jacquenet, F. (2002). Generalized Stochastic Tree Automata for Multi-relational Data Mining. In: Adriaans, P., Fernau, H., van Zaanen, M. (eds) Grammatical Inference: Algorithms and Applications. ICGI 2002. Lecture Notes in Computer Science, vol 2484. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-45790-9_10

DOI: https://doi.org/10.1007/3-540-45790-9_10
Published: 05 September 2002
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-44239-4
Online ISBN: 978-3-540-45790-9
eBook Packages: Springer Book Archive
