Fast Algorithms for Finding Maximum-Density Segments of a Sequence with Applications to Bioinformatics

Conference paper
In: Algorithms in Bioinformatics (WABI 2002)
Part of the book series: Lecture Notes in Computer Science (LNCS, volume 2452)

Abstract

We study an abstract optimization problem arising from biomolecular sequence analysis. For a sequence $A = \langle a_1, a_2, \ldots, a_n \rangle$ of real numbers, a segment $S$ is a consecutive subsequence $\langle a_i, a_{i+1}, \ldots, a_j \rangle$. The width of $S$ is $j - i + 1$, while the density is $\left(\sum_{i \le k \le j} a_k\right)/(j - i + 1)$. The maximum-density segment problem takes $A$ and two integers $L$ and $U$ as input and asks for a segment of $A$ with the largest possible density among those of width at least $L$ and at most $U$. If $U = n$ (or equivalently, $U = 2L - 1$), we can solve the problem in $O(n)$ time, improving upon the $O(n \log L)$-time algorithm by Lin, Jiang and Chao for a general sequence $A$. Furthermore, if $U$ and $L$ are arbitrary, we solve the problem in $O(n + n \log(U - L + 1))$ time. There has been no nontrivial result for this case previously. Both results also hold for a weighted variant of the maximum-density segment problem.
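
To make the problem statement concrete, the following is a minimal brute-force sketch in Python. It is not the linear-time algorithm announced in the abstract: it simply tries every admissible segment using prefix sums, which takes O(n(U - L + 1)) time. The function name `max_density_segment` and the 0-based, inclusive index convention are assumptions made for illustration only.

```python
from itertools import accumulate


def max_density_segment(a, L, U):
    """Return (density, i, j) for a densest segment a[i..j] (0-based,
    inclusive) whose width j - i + 1 satisfies L <= width <= U.

    Brute-force baseline for illustration; NOT the paper's algorithm.
    """
    n = len(a)
    if not 1 <= L <= U or L > n:
        raise ValueError("need 1 <= L <= U and L <= len(a)")
    # prefix[k] = a[0] + ... + a[k-1], so sum(a[i..j]) = prefix[j+1] - prefix[i]
    prefix = [0, *accumulate(a)]
    best = None
    for i in range(n):
        # j runs over widths L..U, clipped at the end of the sequence
        for j in range(i + L - 1, min(i + U, n)):
            density = (prefix[j + 1] - prefix[i]) / (j - i + 1)
            if best is None or density > best[0]:
                best = (density, i, j)
    return best
```

For example, `max_density_segment([3, -1, 4, 1, -5, 9, 2], 2, 3)` selects the segment `[9, 2]` at indices 5 to 6, with density 5.5. A baseline like this can serve as a correctness check against a faster implementation on small inputs.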

Supported in part by NSF grant EIA-0112934.

Supported in part by NSC grant NSC-90-2218-E-001-005.

References

  1. N. N. Alexandrov and V. V. Solovyev. Statistical significance of ungapped sequence alignments. In Proceedings of Pacific Symposium on Biocomputing, volume 3, pages 461–470, 1998.

  2. G. Bernardi. Isochores and the evolutionary genomics of vertebrates. Gene, 241:3–17, 2000.

  3. J. L. Bentley. Programming Pearls. Addison-Wesley, Reading, MA, 1986.

  4. G. Bernardi and G. Bernardi. Compositional constraints and genome evolution. Journal of Molecular Evolution, 24:1–11, 1986.

  5. B. Charlesworth. Genetic recombination: patterns in the genome. Current Biology, 4:182–184, 1994.

  6. L. Duret, D. Mouchiroud, and C. Gautier. Statistical analysis of vertebrate sequences reveals that long genes are scarce in GC-rich isochores. Journal of Molecular Evolution, 40:308–317, 1995.

  7. A. Eyre-Walker. Evidence that both G+C rich and G+C poor isochores are replicated early and late in the cell cycle. Nucleic Acids Research, 20:1497–1501, 1992.

  8. A. Eyre-Walker. Recombination and mammalian genome evolution. Proceedings of the Royal Society of London Series B, Biological Science, 252:237–243, 1993.

  9. C. A. Fields and C. A. Soderlund. gm: a practical tool for automating DNA sequence analysis. Computer Applications in the Biosciences, 6:263–270, 1990.

  10. J. Filipski. Correlation between molecular clock ticking, codon usage fidelity of DNA repair, chromosome banding and chromatin compactness in germline cells. FEBS Letters, 217:184–186, 1987.

  11. M. P. Francino and H. Ochman. Isochores result from mutation not selection. Nature, 400:30–31, 1999.

  12. S. M. Fullerton, A. B. Carvalho, and A. G. Clark. Local rates of recombination are positively correlated with GC content in the human genome. Molecular Biology and Evolution, 18(6):1139–1142, 2001.

  13. P. Guldberg, K. Gronbak, A. Aggerholm, A. Platz, P. thor Straten, V. Ahrenkiel, P. Hokland, and J. Zeuthen. Detection of mutations in GC-rich DNA by bisulphite denaturing gradient gel electrophoresis. Nucleic Acids Research, 26(6):1548–1549, 1998.

  14. R. C. Hardison, D. Drane, C. Vandenbergh, J.-F. F. Cheng, J. Mansverger, J. Taddie, S. Schwartz, X. Huang, and W. Miller. Sequence and comparative analysis of the rabbit alpha-like globin gene cluster reveals a rapid mode of evolution in a G+C rich region of mammalian genomes. Journal of Molecular Biology, 222:233–249, 1991.

  15. W. Henke, K. Herdel, K. Jung, D. Schnorr, and S. A. Loening. Betaine improves the PCR amplification of GC-rich DNA sequences. Nucleic Acids Research, 25(19):3957–3958, 1997.

  16. G. P. Holmquist. Chromosome bands, their chromatin flavors, and their functional features. American Journal of Human Genetics, 51:17–37, 1992.

  17. X. Huang. An algorithm for identifying regions of a DNA sequence that satisfy a content requirement. Computer Applications in the Biosciences, 10(3):219–225, 1994.

  18. K. Ikehara, F. Amada, S. Yoshida, Y. Mikata, and A. Tanaka. A possible origin of newly-born bacterial genes: significance of GC-rich nonstop frame on antisense strand. Nucleic Acids Research, 24(21):4249–4255, 1996.

  19. R. B. Inman. A denaturation map of the λ phage DNA molecule determined by electron microscopy. Journal of Molecular Biology, 18:464–476, 1966.

  20. R. Jin, M.-E. Fernandez-Beros, and R. P. Novick. Why is the initiation nick site of an AT-rich rolling circle plasmid at the tip of a GC-rich cruciform? The EMBO Journal, 16(14):4456–4466, 1997.

  21. Y. L. Lin, T. Jiang, and K. M. Chao. Algorithms for locating the length-constrained heaviest segments, with applications to biomolecular sequence analysis. Journal of Computer and System Sciences, 2002. To appear.

  22. G. Macaya, J.-P. Thiery, and G. Bernardi. An approach to the organization of eukaryotic genomes at a macromolecular level. Journal of Molecular Biology, 108:237–254, 1976.

  23. C. S. Madsen, C. P. Regan, and G. K. Owens. Interaction of CArG elements and GC-rich repressor element in transcriptional regulation of the smooth muscle myosin heavy chain gene in vascular smooth muscle cells. Journal of Biological Chemistry, 272(47):29842–29851, 1997.

  24. S.-i. Murata, P. Herman, and J. R. Lakowicz. Texture analysis of fluorescence lifetime images of AT- and GC-rich regions in nuclei. Journal of Histochemistry and Cytochemistry, 49:1443–1452, 2001.

  25. A. Nekrutenko and W.-H. Li. Assessment of compositional heterogeneity within and between eukaryotic genomes. Genome Research, 10:1986–1995, 2000.

  26. P. Rice, I. Longden, and A. Bleasby. EMBOSS: The European molecular biology open software suite. Trends in Genetics, 16(6):276–277, June 2000.

  27. L. Scotto and R. K. Assoian. A GC-rich domain with bifunctional effects on mRNA and protein levels: implications for control of transforming growth factor beta 1 expression. Molecular and Cellular Biology, 13(6):3588–3597, 1993.

  28. P. H. Sellers. Pattern recognition in genetic sequences by mismatch density. Bulletin of Mathematical Biology, 46(4):501–514, 1984.

  29. P. M. Sharp, M. Averof, A. T. Lloyd, G. Matassi, and J. F. Peden. DNA sequence evolution: the sounds of silence. Philosophical Transactions of the Royal Society of London Series B, Biological Sciences, 349:241–247, 1995.

  30. P. Soriano, M. Meunier-Rotival, and G. Bernardi. The distribution of interspersed repeats is nonuniform and conserved in the mouse and human genomes. Proceedings of the National Academy of Sciences of the United States of America, 80:1816–1820, 1983.

  31. N. Stojanovic, L. Florea, C. Riemer, D. Gumucio, J. Slightom, M. Goodman, W. Miller, and R. Hardison. Comparison of five methods for finding conserved sequences in multiple alignments of gene regulatory regions. Nucleic Acids Research, 27:3899–3910, 1999.

  32. N. Sueoka. Directional mutation pressure and neutral molecular evolution. Proceedings of the National Academy of Sciences of the United States of America, 85:2653–2657, 1988.

  33. Z. Wang, E. Lazarov, M. O’Donnell, and M. F. Goodman. Resolving a fidelity paradox: Why Escherichia coli DNA polymerase II makes more base substitution errors in AT- compared to GC-rich DNA. Journal of Biological Chemistry, 2002. To appear.

  34. K. H. Wolfe, P. M. Sharp, and W.-H. Li. Mutation rates differ among regions of the mammalian genome. Nature, 337:283–285, 1989.

  35. Y. Wu, R. P. Stulp, P. Elfferich, J. Osinga, C. H. Buys, and R. M. Hofstra. Improved mutation detection in GC-rich DNA fragments by combined DGGE and CDGE. Nucleic Acids Research, 27(15):e9, 1999.

  36. S. Zoubak, O. Clay, and G. Bernardi. The gene distribution of the human genome. Gene, 174:95–102, 1996.

Copyright information

© 2002 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Goldwasser, M.H., Kao, M.-Y., Lu, H.-I. (2002). Fast Algorithms for Finding Maximum-Density Segments of a Sequence with Applications to Bioinformatics. In: Guigó, R., Gusfield, D. (eds) Algorithms in Bioinformatics. WABI 2002. Lecture Notes in Computer Science, vol 2452. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-45784-4_12

  • DOI: https://doi.org/10.1007/3-540-45784-4_12

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-44211-0

  • Online ISBN: 978-3-540-45784-8

  • eBook Packages: Springer Book Archive
