A Parallel, Distributed-Memory Framework for Comparative Motif Discovery
The increasing number of sequenced organisms has opened new possibilities for the computational discovery of cis-regulatory elements (‘motifs’) based on phylogenetic footprinting. Word-based, exhaustive approaches are among the best-performing algorithms; however, they pose significant computational challenges because the number of candidate motifs to evaluate is very high. In this contribution, we describe a parallel, distributed-memory framework for de novo comparative motif discovery. Within this framework, two approaches to phylogenetic footprinting are implemented: an alignment-based method and an alignment-free method. Using a distributed-memory cluster with 200 CPU cores, the framework can statistically evaluate the conservation of motifs in a search space containing over 160 million candidate motifs in a few hours. The software is available from http://bioinformatics.intec.ugent.be/blsspeller/
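To make the word-based, alignment-free idea concrete, the following is a minimal sketch and not the framework's actual implementation. It assumes a plain A/C/G/T alphabet, exact word matches, and a simple "present in at least `min_species` species" conservation criterion. All names (`words`, `conserved_words`, `owner`, `count_conserved`) are hypothetical. The distributed-memory aspect is only simulated: each candidate word is assigned to one worker by a deterministic function of the word, so every worker would hold the counts for its own slice of the search space.

```python
from collections import Counter

ALPHABET = "ACGT"  # assumption: non-degenerate alphabet


def words(seq, k):
    """Return the set of distinct length-k words in a sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def conserved_words(family, k, min_species):
    """Words found in at least min_species sequences of one gene family.

    family maps a species name to the promoter sequence of its ortholog.
    """
    species_count = Counter()
    for seq in family.values():
        species_count.update(words(seq, k))  # each species counts once per word
    return {w for w, c in species_count.items() if c >= min_species}


def owner(word, n_workers):
    """Assign a word to a worker via its base-4 encoding (hypothetical scheme)."""
    code = 0
    for ch in word:
        code = code * 4 + ALPHABET.index(ch)
    return code % n_workers


def count_conserved(families, k, min_species, n_workers):
    """Per worker, count in how many families each owned word is conserved."""
    shards = [Counter() for _ in range(n_workers)]
    for family in families:
        for w in conserved_words(family, k, min_species):
            shards[owner(w, n_workers)][w] += 1
    return shards


families = [
    {"sp1": "ACGTAC", "sp2": "TTACGT", "sp3": "GGGGGG"},
    {"sp1": "ACGTTT", "sp2": "CCACGT"},
]
shards = count_conserved(families, k=4, min_species=2, n_workers=3)
```

In a real distributed-memory setting, each shard would live in the memory of a separate process, and the per-family results would be sent to the owning process instead of being written into a local list.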
Keywords: Motif discovery; Phylogenetic footprinting; Parallel computing; Distributed-memory
This work was carried out using the Stevin Supercomputer Infrastructure at Ghent University, funded by Ghent University, the Hercules Foundation and the Flemish Government - department EWI. This research fits in the Multidisciplinary Research Partnership of Ghent University: Nucleotides to Networks (N2N).