Efficient Searching for Motifs in DNA Sequences Using Position Weight Matrices

Stojanovic, Nikola

doi:10.1007/978-3-642-18472-7_31

Part of the book series: Communications in Computer and Information Science ((CCIS,volume 127))

Included in the following conference series:

International Joint Conference on Biomedical Engineering Systems and Technologies

Abstract

Searching genomic sequences for motifs that represent functionally important sites is a significant and well-established subfield of bioinformatics. In that context, Position Weight Matrices (PWMs) are a popular way of representing variable motifs, and they have been widely used to describe the binding sites of transcriptional proteins. However, the standard implementation of PWM matching, although adequate for shorter sequences, is too expensive for whole-genome searches. In this paper we present an algorithm we have developed for efficient matching of PWMs in long target sequences. After an initial pre-processing of the matrix, it runs in time linear in the length of the genomic segment.
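The cost difference described in the abstract can be illustrated with a minimal Python sketch. It is not the algorithm presented in the paper, whose details are not given here. The function names, the row-per-position matrix layout (scores ordered A, C, G, T), the integer scores, and the score threshold are all assumptions made for illustration. `scan_naive` is the textbook sliding-window scan, which costs O(n·m) for a sequence of length n and a matrix of width m. `matching_codes` and `scan_preprocessed` show one generic way to move the matrix work into a pre-processing step: enumerate every word that reaches the threshold, then recognise those words with a rolling 2-bit encoding in a single pass over the sequence.

```python
ALPHABET = "ACGT"
INDEX = {base: i for i, base in enumerate(ALPHABET)}


def scan_naive(pwm, seq, threshold):
    """Score every window of width len(pwm); O(n*m) time (baseline)."""
    m = len(pwm)
    hits = []
    for start in range(len(seq) - m + 1):
        window = seq[start:start + m]
        if any(base not in INDEX for base in window):
            continue  # skip windows containing N or other symbols
        score = sum(pwm[j][INDEX[base]] for j, base in enumerate(window))
        if score >= threshold:
            hits.append((start, score))
    return hits


def matching_codes(pwm, threshold):
    """Pre-processing (hypothetical helper): map the 2-bit code of every
    word scoring >= threshold to its score, pruning hopeless prefixes."""
    m = len(pwm)
    # best_rest[j] = highest score still obtainable from positions j..m-1
    best_rest = [0] * (m + 1)
    for j in range(m - 1, -1, -1):
        best_rest[j] = best_rest[j + 1] + max(pwm[j])
    codes = {}
    stack = [(0, 0, 0)]  # (next position, code so far, score so far)
    while stack:
        j, code, score = stack.pop()
        if j == m:
            codes[code] = score
            continue
        for b in range(4):
            s = score + pwm[j][b]
            if s + best_rest[j + 1] >= threshold:
                stack.append((j + 1, (code << 2) | b, s))
    return codes


def scan_preprocessed(codes, m, seq):
    """Single pass over seq with constant work per base."""
    mask = (1 << (2 * m)) - 1
    hits = []
    code = 0
    valid = 0  # consecutive A/C/G/T bases ending at the current position
    for i, base in enumerate(seq):
        b = INDEX.get(base)
        if b is None:
            code, valid = 0, 0
            continue
        code = ((code << 2) | b) & mask
        valid += 1
        if valid >= m and code in codes:
            hits.append((i - m + 1, codes[code]))
    return hits


# Toy matrix (made-up scores) favouring "ACG", with "ACT" as a weaker match.
toy_pwm = [[2, -1, -1, -1], [-1, 2, -1, -1], [-1, -1, 2, 1]]
toy_seq = "TTACGNACTACGA"
print(scan_naive(toy_pwm, toy_seq, 5))
print(scan_preprocessed(matching_codes(toy_pwm, 5), len(toy_pwm), toy_seq))
# both print [(2, 6), (6, 5), (9, 6)]
```

The table built by `matching_codes` can grow to 4^m entries, so this variant is practical only for short matrices or stringent thresholds. With floating-point scores the pruning test would also need a small tolerance.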

Author information

Authors and Affiliations

Department of Computer Science and Engineering, University of Texas at Arlington, Arlington, TX, 76019, U.S.A.
Nikola Stojanovic

Editor information

Editors and Affiliations

IST - Technical University of Lisbon, Av.Rovisco Pais, 1, 1049-001, Lisbon, Portugal
Ana Fred
Department of Systems and Informatics, Polytechnic Institute of Setúbal – INSTICC, Rua do Vale de Chaves - Estefanilha, 2910-761, Setúbal, Portugal
Joaquim Filipe
Institute of Telecommunications, Av. Rovisco Pais, 1, 1049-001, Lisboa, Portugal
Hugo Gamboa

Cite this paper

Stojanovic, N. (2011). Efficient Searching for Motifs in DNA Sequences Using Position Weight Matrices. In: Fred, A., Filipe, J., Gamboa, H. (eds) Biomedical Engineering Systems and Technologies. BIOSTEC 2010. Communications in Computer and Information Science, vol 127. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-18472-7_31

DOI: https://doi.org/10.1007/978-3-642-18472-7_31
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-18471-0
Online ISBN: 978-3-642-18472-7
eBook Packages: Computer Science (R0)
