Efficient Searching for Motifs in DNA Sequences Using Position Weight Matrices

  • Conference paper
Biomedical Engineering Systems and Technologies (BIOSTEC 2010)

Part of the book series: Communications in Computer and Information Science (CCIS, volume 127)

Abstract

Searching genomic sequences for motifs that represent functionally important sites is a significant and well-established subfield of bioinformatics. In that context, Position Weight Matrices (PWMs) are a popular way of representing variable motifs, and they have been widely used to describe the binding sites of transcriptional proteins. However, the standard implementation of PWM matching, while adequate for shorter sequences, is too expensive for whole-genome searches. In this paper we present an algorithm we have developed for efficient matching of PWMs in long target sequences. After an initial pre-processing of the matrix, it runs in time linear in the size of the genomic segment.
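
For orientation, the standard sliding-window PWM scan that the abstract describes as too expensive can be sketched as follows. This is a minimal illustration of that baseline only, not the linear-time algorithm of the paper, whose details are not given here. The function name `naive_pwm_scan`, the list-of-dicts matrix layout, the toy scores and the threshold are all assumptions made for the example. Each of the n - m + 1 windows is scored by summing m matrix entries, so the cost grows as O(n·m) for a sequence of length n and a matrix of length m.

```python
from typing import Dict, List, Tuple

# Assumed layout: one dict of per-base scores for each motif position.
PWM = List[Dict[str, float]]


def naive_pwm_scan(sequence: str, pwm: PWM, threshold: float) -> List[Tuple[int, float]]:
    """Return (start, score) for every window whose score is >= threshold."""
    m = len(pwm)
    hits: List[Tuple[int, float]] = []
    # One iteration per window; range() is empty if the sequence is shorter than the matrix.
    for start in range(len(sequence) - m + 1):
        score = 0.0
        for offset in range(m):
            base = sequence[start + offset]
            column = pwm[offset]
            if base not in column:
                # Symbols the matrix does not score (e.g. N) disqualify the window.
                score = float("-inf")
                break
            score += column[base]
        if score >= threshold:
            hits.append((start, score))
    return hits


# Hypothetical 3-position matrix favouring "ACG", with "T" tolerated at the last position.
toy_pwm: PWM = [
    {"A": 1.0, "C": -1.0, "G": -1.0, "T": -1.0},
    {"A": -1.0, "C": 1.0, "G": -1.0, "T": -1.0},
    {"A": -1.0, "C": -1.0, "G": 1.0, "T": 0.5},
]

hits = naive_pwm_scan("TTACGACTA", toy_pwm, threshold=2.0)
print(hits)  # [(2, 3.0), (5, 2.5)]
```

Every window is rescored from scratch, which is what makes this approach costly on whole genomes.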

Copyright information

© 2011 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Stojanovic, N. (2011). Efficient Searching for Motifs in DNA Sequences Using Position Weight Matrices. In: Fred, A., Filipe, J., Gamboa, H. (eds) Biomedical Engineering Systems and Technologies. BIOSTEC 2010. Communications in Computer and Information Science, vol 127. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-18472-7_31

  • DOI: https://doi.org/10.1007/978-3-642-18472-7_31

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-18471-0

  • Online ISBN: 978-3-642-18472-7

  • eBook Packages: Computer Science, Computer Science (R0)
