Preprocessing for Optimization of Probabilistic-Logic Models for Sequence Analysis

  • Henning Christiansen
  • Ole Torp Lassen
Part of the Lecture Notes in Computer Science book series (LNCS, volume 5649)

Abstract

A class of probabilistic-logic models is considered which increases expressibility from the regular and context-free languages of HMMs and SCFGs to, in principle, Turing-complete languages. In general, such models are computationally far too complex for direct use, so optimization by pruning and approximation is needed. The first steps are taken towards a methodology for optimizing such models by approximation, either by using auxiliary models for preprocessing or by splitting the models into submodels. Evaluating such approximating models is challenging because authoritative test data may be sparse. On the other hand, the original complex models can be used to generate artificial evaluation data by efficient sampling, although this does not constitute a foolproof test procedure. These models and evaluation processes are illustrated in the PRISM system, developed by other authors, and we discuss their applicability and limitations.
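
The evaluation idea in the abstract, sampling artificial annotated data from the original model and scoring an approximating model against it, can be sketched as follows. This is a minimal illustration in Python rather than PRISM, and it is not the authors' method: the two-state HMM, its probabilities, the state names, and the per-symbol approximation are all hypothetical stand-ins chosen only to make the procedure concrete.

```python
import random

# Hypothetical two-state HMM standing in for the "original complex model".
# All names and probabilities below are illustrative assumptions.
STATES = ["coding", "noncoding"]
START = {"coding": 0.5, "noncoding": 0.5}
TRANS = {
    "coding": {"coding": 0.9, "noncoding": 0.1},
    "noncoding": {"coding": 0.1, "noncoding": 0.9},
}
EMIT = {
    "coding": {"a": 0.1, "c": 0.4, "g": 0.4, "t": 0.1},
    "noncoding": {"a": 0.4, "c": 0.1, "g": 0.1, "t": 0.4},
}


def draw(dist, rng):
    """Draw one outcome from a dict mapping outcomes to probabilities."""
    outcomes = list(dist)
    weights = [dist[o] for o in outcomes]
    return rng.choices(outcomes, weights=weights, k=1)[0]


def sample_annotated(length, rng):
    """Sample a sequence together with the hidden states that produced it."""
    states, symbols = [], []
    state = draw(START, rng)
    for _ in range(length):
        states.append(state)
        symbols.append(draw(EMIT[state], rng))
        state = draw(TRANS[state], rng)
    return symbols, states


def approximate_annotation(symbols):
    """Cheap approximation: label each symbol on its own, ignoring transitions."""
    return [max(STATES, key=lambda s: EMIT[s][x]) for x in symbols]


def accuracy(predicted, truth):
    """Fraction of positions where the predicted label matches the sampled one."""
    return sum(p == t for p, t in zip(predicted, truth)) / len(truth)


def evaluate(n_sequences=200, length=50, seed=0):
    """Score the approximation on artificial data sampled from the full model."""
    rng = random.Random(seed)
    scores = []
    for _ in range(n_sequences):
        symbols, states = sample_annotated(length, rng)
        scores.append(accuracy(approximate_annotation(symbols), states))
    return sum(scores) / len(scores)


if __name__ == "__main__":
    print(f"mean per-position accuracy: {evaluate():.3f}")
```

As the abstract cautions, a score obtained this way only measures agreement with the model that generated the data, so it says nothing about how well either model fits real sequences.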

Keywords

Hidden Markov Model, Logic Program, Parse Tree, Canonical Model, Logical Part
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

Copyright information

© Springer-Verlag Berlin Heidelberg 2009

Authors and Affiliations

  • Henning Christiansen (1)
  • Ole Torp Lassen (1)
  1. Research group PLIS: Programming, Logic and Intelligent Systems, Department of Communication, Business and Information Technologies, Roskilde University, Roskilde, Denmark
