A Parallel, Distributed-Memory Framework for Comparative Motif Discovery

  • Dieter De Witte
  • Michiel Van Bel
  • Pieter Audenaert
  • Piet Demeester
  • Bart Dhoedt
  • Klaas Vandepoele
  • Jan Fostier (corresponding author)
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 8385)


The increasing number of sequenced organisms has opened new possibilities for the computational discovery of cis-regulatory elements (‘motifs’) based on phylogenetic footprinting. Word-based, exhaustive approaches are among the best-performing algorithms; however, they pose significant computational challenges because the number of candidate motifs to evaluate is very high. In this contribution, we describe a parallel, distributed-memory framework for de novo comparative motif discovery. Within this framework, two approaches to phylogenetic footprinting are implemented: an alignment-based method and an alignment-free method. Using a distributed-memory cluster with 200 CPU cores, the framework can statistically evaluate the conservation of motifs in a search space of over 160 million candidate motifs in a few hours. Software available from


Motif discovery · Phylogenetic footprinting · Parallel computing · Distributed-memory



This work was carried out using the Stevin Supercomputer Infrastructure at Ghent University, funded by Ghent University, the Hercules Foundation and the Flemish Government - department EWI. This research fits in the Multidisciplinary Research Partnership of Ghent University: Nucleotides to Networks (N2N).



Copyright information

© Springer-Verlag Berlin Heidelberg 2014

Authors and Affiliations

  • Dieter De Witte (1)
  • Michiel Van Bel (2, 3)
  • Pieter Audenaert (1)
  • Piet Demeester (1)
  • Bart Dhoedt (1)
  • Klaas Vandepoele (2, 3)
  • Jan Fostier (1)

  1. Department of Information Technology (INTEC), Ghent University - iMinds, Ghent, Belgium
  2. Department of Plant Systems Biology, VIB, Ghent, Belgium
  3. Department of Plant Biotechnology and Bioinformatics, Ghent University, Ghent, Belgium
