Generalized Stochastic Tree Automata for Multi-relational Data Mining

  • Conference paper
Grammatical Inference: Algorithms and Applications (ICGI 2002)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 2484)

Abstract

This paper addresses the problem of learning a statistical distribution over data stored in a relational database. The data of interest are represented as trees, which are a natural way to represent structured information. These trees are then used to infer a stochastic tree automaton with a well-known grammatical inference algorithm. We propose two extensions of this algorithm: the use of sorts, and the generalization of the inferred automaton according to a local criterion. Experiments show that our approach scales to large databases and improves both the predictive power of the learned model and the convergence of the learning algorithm.
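To make the notion of a stochastic tree automaton more concrete, the following is a minimal sketch of how such an automaton can assign a probability to a tree: the tree is evaluated bottom-up, the probabilities of the transitions used are multiplied, and the result is weighted by the probability that the reached state is a root state. The alphabet (`a`, `f`), the single state `q`, the probability values, and the names `TRANSITIONS`, `ROOT_PROB`, `evaluate` and `tree_probability` are all hypothetical choices made for illustration; they are not taken from the paper, and the sketch covers neither the inference algorithm nor the proposed extensions.

```python
# Trees are nested tuples: (symbol, child_1, ..., child_n).
# Hypothetical toy automaton with one state "q".
# Each transition maps (symbol, child states) to (target state, probability).
# The probabilities of all transitions leading to "q" sum to 1.
TRANSITIONS = {
    ("a", ()): ("q", 0.6),          # leaf symbol a
    ("f", ("q", "q")): ("q", 0.4),  # binary symbol f
}

# Probability that a tree ends in each state at its root.
ROOT_PROB = {"q": 1.0}


def evaluate(tree):
    """Return (state, product of transition probabilities) for a tree.

    Returns (None, 0.0) if the automaton has no matching transition.
    """
    symbol, *children = tree
    child_states = []
    prob = 1.0
    for child in children:
        state, child_prob = evaluate(child)
        if state is None:
            return None, 0.0
        child_states.append(state)
        prob *= child_prob
    key = (symbol, tuple(child_states))
    if key not in TRANSITIONS:
        return None, 0.0
    state, transition_prob = TRANSITIONS[key]
    return state, prob * transition_prob


def tree_probability(tree):
    """Probability assigned to a tree by the toy automaton."""
    state, prob = evaluate(tree)
    if state is None:
        return 0.0
    return ROOT_PROB.get(state, 0.0) * prob


if __name__ == "__main__":
    print(tree_probability(("a",)))
    print(tree_probability(("f", ("a",), ("a",))))
```

In this toy model a tree that uses an unknown symbol, or a symbol with the wrong number of children, simply receives probability zero.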

Copyright information

© 2002 Springer-Verlag Berlin Heidelberg

Cite this paper

Habrard, A., Bernard, M., Jacquenet, F. (2002). Generalized Stochastic Tree Automata for Multi-relational Data Mining. In: Adriaans, P., Fernau, H., van Zaanen, M. (eds) Grammatical Inference: Algorithms and Applications. ICGI 2002. Lecture Notes in Computer Science, vol 2484. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-45790-9_10

  • DOI: https://doi.org/10.1007/3-540-45790-9_10

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-44239-4

  • Online ISBN: 978-3-540-45790-9

  • eBook Packages: Springer Book Archive
