Abstract
Searching genomic sequences for motifs that represent functionally important sites is a significant and well-established subfield of bioinformatics. In that context, Position Weight Matrices (PWMs) are a popular way of representing variable motifs, and they have been widely used to describe the binding sites of transcriptional proteins. However, the standard implementation of PWM matching, while adequate on shorter sequences, is too expensive for whole-genome searches. In this paper we present an algorithm we have developed for efficient matching of PWMs in long target sequences. After an initial pre-processing of the matrix, it runs in time linear in the size of the genomic segment.
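For orientation, the standard sliding-window scan that the abstract contrasts against can be sketched as follows. This is a minimal illustration, not the algorithm presented in the paper: the matrix values, the threshold, and the names `PWM` and `naive_pwm_scan` are assumptions made for the example. Each window of length m is scored from scratch, so the cost grows as O(n·m) for a sequence of length n.

```python
from typing import Dict, List, Tuple

# Hypothetical 4-column log-odds matrix (illustrative values only):
# one dict per motif position, mapping a base to its score.
PWM: List[Dict[str, float]] = [
    {"A": 1.0, "C": -1.0, "G": -1.0, "T": -1.0},
    {"A": -1.0, "C": 1.0, "G": -1.0, "T": -1.0},
    {"A": -1.0, "C": -1.0, "G": 1.0, "T": -1.0},
    {"A": -1.0, "C": -1.0, "G": -1.0, "T": 1.0},
]


def naive_pwm_scan(
    sequence: str, pwm: List[Dict[str, float]], threshold: float
) -> List[Tuple[int, float]]:
    """Return (start, score) for every window scoring at least `threshold`.

    Every window is rescored column by column, which is the O(n*m)
    behaviour that becomes costly on whole-genome inputs.
    """
    m = len(pwm)
    hits: List[Tuple[int, float]] = []
    for start in range(len(sequence) - m + 1):
        score = 0.0
        valid = True
        for column, base in zip(pwm, sequence[start:start + m]):
            if base not in column:
                # Skip windows containing symbols the matrix does not cover (e.g. N).
                valid = False
                break
            score += column[base]
        if valid and score >= threshold:
            hits.append((start, score))
    return hits


if __name__ == "__main__":
    for start, score in naive_pwm_scan("TTACGTACGA", PWM, 2.0):
        print(start, score)
```

With a threshold of 2.0 the toy sequence yields two hits, the exact match `ACGT` at offset 2 and the one-mismatch window `ACGA` at offset 6; raising the threshold to 3.0 keeps only the exact match.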
Copyright information
© 2011 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Stojanovic, N. (2011). Efficient Searching for Motifs in DNA Sequences Using Position Weight Matrices. In: Fred, A., Filipe, J., Gamboa, H. (eds) Biomedical Engineering Systems and Technologies. BIOSTEC 2010. Communications in Computer and Information Science, vol 127. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-18472-7_31
DOI: https://doi.org/10.1007/978-3-642-18472-7_31
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-18471-0
Online ISBN: 978-3-642-18472-7
eBook Packages: Computer Science