Extracting best consensus motifs from positive and negative examples

  • Erika Tateishi
  • Osamu Maruyama
  • Satoru Miyano
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 1046)

Abstract

We define the best consensus motif (BCM) problem, motivated by the problem of extracting motifs from nucleic acid and amino acid sequences. A type over an alphabet Σ is a family Ω of subsets of Σ*. A motif π of type Ω is a string π = π1⋯πn of motif components, each of which stands for an element of Ω. The BCM problem for Ω is, given a yes-no sample S = {(α(1), β(1)), ..., (α(m), β(m))} of pairs of strings in Σ* with α(i) ≠ β(i) for 1 ≤ i ≤ m, to find a motif π of type Ω that maximizes the number of good pairs in S, where (α(i), β(i)) is good for π if π accepts α(i) and rejects β(i). We prove that the BCM problem is NP-complete even for a very simple type Ω1 = 2^Σ − {∅}, which is used in practice for describing protein motifs in the PROSITE database. We also show that the problem remains NP-complete for the type Ω1 ∪ {Σ+} ∪ {Σ[i,j] | 1 ≤ i ≤ j}, where Σ[i,j] is the set of strings over Σ of length between i and j. Furthermore, for the BCM problem for Ω1, we provide a polynomial-time greedy algorithm based on the probabilistic method. Its performance analysis yields an explicit approximation ratio for the algorithm.
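
The objective defined in the abstract can be made concrete with a small sketch. It assumes the natural reading for the simple type Ω1: each motif component is a nonempty set of symbols, and a motif of length n accepts exactly the strings of length n whose i-th symbol lies in the i-th component. The function names (`accepts`, `good_pairs`, `best_motif_bruteforce`) are hypothetical, and the exhaustive search is only an illustration for tiny inputs; it is not the greedy algorithm of the paper, whose details the abstract does not give.

```python
from itertools import combinations, product


def accepts(motif, s):
    """Assumed semantics for type Omega_1: same length, and each symbol
    of s belongs to the corresponding component set of the motif."""
    return len(motif) == len(s) and all(ch in comp for comp, ch in zip(motif, s))


def good_pairs(motif, sample):
    """Count pairs (alpha, beta) such that the motif accepts alpha and rejects beta."""
    return sum(1 for a, b in sample if accepts(motif, a) and not accepts(motif, b))


def nonempty_subsets(alphabet):
    """All nonempty subsets of the alphabet, i.e. the components allowed by Omega_1."""
    letters = sorted(alphabet)
    return [frozenset(c)
            for r in range(1, len(letters) + 1)
            for c in combinations(letters, r)]


def best_motif_bruteforce(sample, alphabet, length):
    """Exhaustive search over all motifs of a fixed length (exponential time;
    illustration only, not the paper's greedy method)."""
    best, best_score = None, -1
    for motif in product(nonempty_subsets(alphabet), repeat=length):
        score = good_pairs(motif, sample)
        if score > best_score:
            best, best_score = motif, score
    return best, best_score


# A toy yes-no sample over the alphabet {A, C}; every pair has alpha != beta.
sample = [("AC", "CC"), ("AA", "CA"), ("CA", "AA")]
motif = [{"A"}, {"A", "C"}]
print(good_pairs(motif, sample))  # 2: the first two pairs are good, the third is not
print(best_motif_bruteforce(sample, "AC", 2))
```

In this toy sample no motif can make all three pairs good, because the second pair requires rejecting CA while the third requires accepting it, so the optimum is 2.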

Keywords

  • algorithms and computational complexity
  • genome informatics

Copyright information

© Springer-Verlag Berlin Heidelberg 1996

Authors and Affiliations

  1. Department of Information Systems, Kyushu University 39, Kasuga, Japan
  2. Research Institute of Fundamental Information Science, Kyushu University 33, Fukuoka, Japan
