Maximum Entropy Modeling with Clausal Constraints

  • Luc Dehaspe
Part II: Papers
Part of the Lecture Notes in Computer Science book series (LNCS, volume 1297)

Abstract

We present the learning system Maccent, which addresses the novel task of stochastic MAximum ENTropy modeling with Clausal Constraints. The maximum entropy method is a Bayesian approach based on the principle that the target stochastic model should be as uniform as possible, subject to known constraints. Maccent incorporates clausal constraints, which are based on the evaluation of Prolog clauses on examples represented as Prolog programs. We build on an existing maximum-likelihood approach to maximum entropy modeling and upgrade it along two dimensions: (1) Maccent can handle larger search spaces, thanks to a partial ordering defined on the space of clausal constraints, and (2) it uses a richer first-order logic format. In comparison with other inductive logic programming systems, Maccent seems to be the first to explicitly construct a conditional probability distribution \(p(C|I)\) based on an empirical distribution \(\tilde p(C|I)\), where \(p(C|I)\) (respectively \(\tilde p(C|I)\)) denotes the induced (respectively observed) probability that an instance \(I\) belongs to a class \(C\). First experiments indicate that Maccent may be useful for prediction, and for classification in cases where the induced model has to be combined with other stochastic information sources.
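The core mechanism the abstract refers to is standard conditional maximum entropy modeling: among all distributions whose expected feature counts match the counts observed under the empirical distribution \(\tilde p(C|I)\), the most uniform one is chosen, and that distribution is known to coincide with the maximum-likelihood log-linear model over the same features. The Python sketch below is only an illustration of this duality, not Maccent itself: the hand-written boolean feature functions are stand-ins for clausal constraints (which Maccent obtains by evaluating Prolog clauses on the examples), the toy facts and class names are invented, and plain gradient ascent replaces the iterative scaling procedures used in the maximum entropy literature.

    import math

    # Toy dataset: each instance is a set of ground facts (a crude stand-in
    # for the Prolog-program representation of examples), paired with its class.
    instances = [
        ({"has_wings", "lays_eggs"}, "bird"),
        ({"has_wings", "lays_eggs"}, "bird"),
        ({"has_fur", "gives_milk"}, "mammal"),
        ({"has_fur"}, "mammal"),
        ({"lays_eggs"}, "bird"),
    ]
    classes = ["bird", "mammal"]

    # Hypothetical binary feature functions standing in for clausal
    # constraints: each "fires" (returns 1) when its condition holds for the
    # instance-class pair, much as a clausal constraint fires when a Prolog
    # clause succeeds on an example.
    features = [
        lambda facts, c: 1.0 if "has_wings" in facts and c == "bird" else 0.0,
        lambda facts, c: 1.0 if "gives_milk" in facts and c == "mammal" else 0.0,
        lambda facts, c: 1.0 if "has_fur" in facts and c == "mammal" else 0.0,
    ]

    def p_conditional(facts, weights):
        """Log-linear model p(C|I) proportional to exp(sum_j w_j * f_j(I, C))."""
        scores = [math.exp(sum(w * f(facts, c) for w, f in zip(weights, features)))
                  for c in classes]
        z = sum(scores)  # normalizing constant over classes
        return [s / z for s in scores]

    def fit(steps=500, lr=0.5):
        """Gradient ascent on the conditional log-likelihood. At the optimum
        the model's expected feature counts equal the empirical counts, i.e.
        the constraints are satisfied, and among all models satisfying them
        this log-linear one has maximum entropy."""
        weights = [0.0] * len(features)
        n = len(instances)
        for _ in range(steps):
            grad = [0.0] * len(features)
            for facts, c_obs in instances:
                probs = p_conditional(facts, weights)
                for j, f in enumerate(features):
                    # empirical feature count minus model expectation
                    grad[j] += f(facts, c_obs)
                    grad[j] -= sum(p * f(facts, c) for p, c in zip(probs, classes))
            weights = [w + lr * g / n for w, g in zip(weights, grad)]
        return weights

    if __name__ == "__main__":
        weights = fit()
        probs = p_conditional({"has_wings", "lays_eggs"}, weights)
        print(dict(zip(classes, probs)))  # p(bird | I) should dominate

After training, \(p(\mathrm{bird}|I)\) dominates for the winged, egg-laying instance because the weight of the corresponding feature has grown until its empirical constraint is met; Maccent's contribution lies in searching the partially ordered space of clausal constraints to decide which such features enter the model in the first place.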

Keywords

stochastic modeling · inductive logic programming



Copyright information

© Springer-Verlag Berlin Heidelberg 1997

Authors and Affiliations

  • Luc Dehaspe
  1. Department of Computer Science, Katholieke Universiteit Leuven, Heverlee, Belgium