A Frequent Pattern Mining Method for Finding Planted (l, d)-motifs of Unknown Length

  • Caiyan Jia
  • Ruqian Lu
  • Lusheng Chen
Part of the Lecture Notes in Computer Science book series (LNCS, volume 6401)

Abstract

Identification and characterization of gene regulatory binding motifs is one of the fundamental tasks toward systematically understanding the molecular mechanisms of transcriptional regulation. Recently, the problem has been abstracted as the planted (l, d)-motif challenge problem. Previous studies have developed numerous methods to solve it, but most of them require the motif length l to be specified in advance. In this study, we present an exact and efficient algorithm, called Apriori-Motif, that does not require l to be given. The algorithm uses breadth-first search and quickly prunes the search space using the downward closure property employed in Apriori, a classical frequent pattern mining algorithm. An empirical study shows that Apriori-Motif performs better than some existing methods.
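
The level-wise idea mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' Apriori-Motif algorithm, whose details are not given on this page; it is a generic Apriori-style search under stated assumptions: a DNA alphabet, Hamming distance as the mismatch measure, a pattern counted as frequent when at least `min_support` sequences contain a substring within distance `d` of it, and a hypothetical `max_len` bound to stop the search. All function and parameter names are invented for illustration. The pruning rests on downward closure: if a pattern of length k + 1 is frequent, its length-k prefix and suffix must be frequent too, so longer candidates are built only from frequent shorter ones.

```python
from typing import Dict, List, Optional, Set

ALPHABET = "ACGT"  # assumption: DNA sequences


def hamming(a: str, b: str) -> int:
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def support(pattern: str, sequences: List[str], d: int) -> int:
    """Count sequences containing a substring within Hamming distance d of pattern."""
    k = len(pattern)
    return sum(
        any(hamming(pattern, s[i:i + k]) <= d for i in range(len(s) - k + 1))
        for s in sequences
    )


def levelwise_motifs(
    sequences: List[str],
    d: int,
    max_len: int,
    min_support: Optional[int] = None,
) -> Dict[int, Set[str]]:
    """Breadth-first (level-wise) search for frequent patterns of every length.

    Returns a dict mapping each length k to the set of frequent length-k patterns.
    """
    if min_support is None:
        min_support = len(sequences)  # assumption: motif occurs in every sequence
    frequent: Dict[int, Set[str]] = {}
    candidates: Set[str] = set(ALPHABET)
    k = 1
    while candidates and k <= max_len:
        level = {p for p in candidates if support(p, sequences, d) >= min_support}
        if not level:
            break
        frequent[k] = level
        # Downward closure: extend p by c only if the suffix of p + c is also frequent.
        candidates = {p + c for p in level for c in ALPHABET if p[1:] + c in level}
        k += 1
    return frequent


if __name__ == "__main__":
    seqs = ["TTACGTAA", "CCACGTCC", "GGACCTGG"]
    result = levelwise_motifs(seqs, d=1, max_len=5)
    for length, patterns in sorted(result.items()):
        print(length, len(patterns))
```

Because no length is fixed in advance, the result holds frequent patterns for every length reached. With d mismatches allowed, very short patterns are almost always frequent, so the pruning only becomes effective at longer lengths; a practical method would need further refinements that this sketch does not attempt.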

Copyright information

© Springer-Verlag Berlin Heidelberg 2010

Authors and Affiliations

  • Caiyan Jia (1)
  • Ruqian Lu (2, 3)
  • Lusheng Chen (2)
  1. Department of Computer Science, Beijing Jiaotong University, Beijing, China
  2. Shanghai Key Lab of Intelligent Information Processing & Department of Computer Science and Engineering, Fudan University, Shanghai, China
  3. Institute of Mathematics, Chinese Academy of Sciences, Beijing, China
