Fast Algorithms for Finding Maximum-Density Segments of a Sequence with Applications to Bioinformatics

Goldwasser, Michael H.; Kao, Ming-Yang; Lu, Hsueh-I

doi:10.1007/3-540-45784-4_12

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 2452)

Included in the following conference series:

International Workshop on Algorithms in Bioinformatics

Abstract

We study an abstract optimization problem arising from biomolecular sequence analysis. For a sequence A = 〈a₁, a₂, ..., a_n〉 of real numbers, a segment S is a consecutive subsequence 〈a_i, a_{i+1}, ..., a_j〉. The width of S is j - i + 1, while the density is (∑_{i≤k≤j} a_k)/(j - i + 1). The maximum-density segment problem takes A and two integers L and U as input and asks for a segment of A with the largest possible density among those of width at least L and at most U. If U = n (or equivalently, U = 2L - 1), we can solve the problem in O(n) time, improving upon the O(n log L)-time algorithm by Lin, Jiang and Chao for a general sequence A. Furthermore, if U and L are arbitrary, we solve the problem in O(n + n log(U - L + 1)) time. There has been no nontrivial result for this case previously. Both results also hold for a weighted variant of the maximum-density segment problem.
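To make the problem statement concrete, the following is a minimal brute-force sketch of the definition only. It is not the algorithm of the paper: it checks every admissible segment using prefix sums and so takes O(n(U - L + 1)) time rather than the bounds stated in the abstract. The function name `max_density_segment_bruteforce`, the 1-based index convention in the return value, and the tie-breaking rule (first segment found wins) are assumptions made for illustration.

```python
from itertools import accumulate
from typing import List, Tuple


def max_density_segment_bruteforce(
    a: List[float], L: int, U: int
) -> Tuple[float, int, int]:
    """Return (density, i, j) for a densest segment a_i..a_j (1-based,
    inclusive) whose width j - i + 1 lies between L and U.

    Illustrative baseline only; runs in O(n * (U - L + 1)) time.
    """
    n = len(a)
    if not (1 <= L <= U) or L > n:
        raise ValueError("need 1 <= L <= U and L <= len(a)")

    # prefix[k] = a_1 + ... + a_k, with prefix[0] = 0
    prefix = [0.0] + list(accumulate(a))

    best = None
    for i in range(1, n + 1):
        # Right endpoints giving widths L..U, clipped at the end of A.
        for j in range(i + L - 1, min(i + U - 1, n) + 1):
            density = (prefix[j] - prefix[i - 1]) / (j - i + 1)
            if best is None or density > best[0]:
                best = (density, i, j)
    return best


if __name__ == "__main__":
    # Widths 2..3 over a short sequence; the densest segment is <6, 3>.
    print(max_density_segment_bruteforce([1, 4, 2, 6, 3], L=2, U=3))
```

A baseline of this kind is mainly useful as a reference against which a faster implementation can be checked on small inputs.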

Supported in part by NSF grant EIA-0112934.

Supported in part by NSC grant NSC-90-2218-E-001-005.

References

N. N. Alexandrov and V. V. Solovyev. Statistical significance of ungapped sequence alignments. In Proceedings of Pacific Symposium on Biocomputing, volume 3, pages 461–470, 1998.
G. Bernardi. Isochores and the evolutionary genomics of vertebrates. Gene, 241:3–17, 2000.
J. L. Bentley. Programming Pearls. Addison-Wesley, Reading, MA, 1986.
G. Bernardi and G. Bernardi. Compositional constraints and genome evolution. Journal of Molecular Evolution, 24:1–11, 1986.
B. Charlesworth. Genetic recombination: patterns in the genome. Current Biology, 4:182–184, 1994.
L. Duret, D. Mouchiroud, and C. Gautier. Statistical analysis of vertebrate sequences reveals that long genes are scarce in GC-rich isochores. Journal of Molecular Evolution, 40:308–317, 1995.
A. Eyre-Walker. Evidence that both G+C rich and G+C poor isochores are replicated early and late in the cell cycle. Nucleic Acids Research, 20:1497–1501, 1992.
A. Eyre-Walker. Recombination and mammalian genome evolution. Proceedings of the Royal Society of London Series B, Biological Science, 252:237–243, 1993.
C. A. Fields and C. A. Soderlund. gm: a practical tool for automating DNA sequence analysis. Computer Applications in the Biosciences, 6:263–270, 1990.
J. Filipski. Correlation between molecular clock ticking, codon usage fidelity of DNA repair, chromosome banding and chromatin compactness in germline cells. FEBS Letters, 217:184–186, 1987.
M. P. Francino and H. Ochman. Isochores result from mutation not selection. Nature, 400:30–31, 1999.
S. M. Fullerton, A. B. Carvalho, and A. G. Clark. Local rates of recombination are positively correlated with GC content in the human genome. Molecular Biology and Evolution, 18(6):1139–1142, 2001.
P. Guldberg, K. Gronbak, A. Aggerholm, A. Platz, P. thor Straten, V. Ahrenkiel, P. Hokland, and J. Zeuthen. Detection of mutations in GC-rich DNA by bisulphite denaturing gradient gel electrophoresis. Nucleic Acids Research, 26(6):1548–1549, 1998.
R. C. Hardison, D. Drane, C. Vandenbergh, J.-F. F. Cheng, J. Mansverger, J. Taddie, S. Schwartz, X. Huang, and W. Miller. Sequence and comparative analysis of the rabbit alpha-like globin gene cluster reveals a rapid mode of evolution in a G+C rich region of mammalian genomes. Journal of Molecular Biology, 222:233–249, 1991.
W. Henke, K. Herdel, K. Jung, D. Schnorr, and S. A. Loening. Betaine improves the PCR amplification of GC-rich DNA sequences. Nucleic Acids Research, 25(19):3957–3958, 1997.
G. P. Holmquist. Chromosome bands, their chromatin flavors, and their functional features. American Journal of Human Genetics, 51:17–37, 1992.
X. Huang. An algorithm for identifying regions of a DNA sequence that satisfy a content requirement. Computer Applications in the Biosciences, 10(3):219–225, 1994.
K. Ikehara, F. Amada, S. Yoshida, Y. Mikata, and A. Tanaka. A possible origin of newly-born bacterial genes: significance of GC-rich nonstop frame on antisense strand. Nucleic Acids Research, 24(21):4249–4255, 1996.
R. B. Inman. A denaturation map of the λ phage DNA molecule determined by electron microscopy. Journal of Molecular Biology, 18:464–476, 1966.
R. Jin, M.-E. Fernandez-Beros, and R. P. Novick. Why is the initiation nick site of an AT-rich rolling circle plasmid at the tip of a GC-rich cruciform? The EMBO Journal, 16(14):4456–4466, 1997.
Y. L. Lin, T. Jiang, and K. M. Chao. Algorithms for locating the length-constrained heaviest segments, with applications to biomolecular sequence analysis. Journal of Computer and System Sciences, 2002. To appear.
G. Macaya, J.-P. Thiery, and G. Bernardi. An approach to the organization of eukaryotic genomes at a macromolecular level. Journal of Molecular Biology, 108:237–254, 1976.
C. S. Madsen, C. P. Regan, and G. K. Owens. Interaction of CArG elements and GC-rich repressor element in transcriptional regulation of the smooth muscle myosin heavy chain gene in vascular smooth muscle cells. Journal of Biological Chemistry, 272(47):29842–29851, 1997.
S.-i. Murata, P. Herman, and J. R. Lakowicz. Texture analysis of fluorescence lifetime images of AT- and GC-rich regions in nuclei. Journal of Histochemistry and Cytochemistry, 49:1443–1452, 2001.
A. Nekrutenko and W.-H. Li. Assessment of compositional heterogeneity within and between eukaryotic genomes. Genome Research, 10:1986–1995, 2000.
P. Rice, I. Longden, and A. Bleasby. EMBOSS: The European molecular biology open software suite. Trends in Genetics, 16(6):276–277, June 2000.
L. Scotto and R. K. Assoian. A GC-rich domain with bifunctional effects on mRNA and protein levels: implications for control of transforming growth factor beta 1 expression. Molecular and Cellular Biology, 13(6):3588–3597, 1993.
P. H. Sellers. Pattern recognition in genetic sequences by mismatch density. Bulletin of Mathematical Biology, 46(4):501–514, 1984.
P. M. Sharp, M. Averof, A. T. Lloyd, G. Matassi, and J. F. Peden. DNA sequence evolution: the sounds of silence. Philosophical Transactions of the Royal Society of London Series B, Biological Sciences, 349:241–247, 1995.
P. Soriano, M. Meunier-Rotival, and G. Bernardi. The distribution of interspersed repeats is nonuniform and conserved in the mouse and human genomes. Proceedings of the National Academy of Sciences of the United States of America, 80:1816–1820, 1983.
N. Stojanovic, L. Florea, C. Riemer, D. Gumucio, J. Slightom, M. Goodman, W. Miller, and R. Hardison. Comparison of five methods for finding conserved sequences in multiple alignments of gene regulatory regions. Nucleic Acids Research, 27:3899–3910, 1999.
N. Sueoka. Directional mutation pressure and neutral molecular evolution. Proceedings of the National Academy of Sciences of the United States of America, 85:2653–2657, 1988.
Z. Wang, E. Lazarov, M. O’Donnell, and M. F. Goodman. Resolving a fidelity paradox: Why Escherichia coli DNA polymerase II makes more base substitution errors in AT- compared to GC-rich DNA. Journal of Biological Chemistry, 2002. To appear.
K. H. Wolfe, P. M. Sharp, and W.-H. Li. Mutation rates differ among regions of the mammalian genome. Nature, 337:283–285, 1989.
Y. Wu, R. P. Stulp, P. Elfferich, J. Osinga, C. H. Buys, and R. M. Hofstra. Improved mutation detection in GC-rich DNA fragments by combined DGGE and CDGE. Nucleic Acids Research, 27(15):e9, 1999.
S. Zoubak, O. Clay, and G. Bernardi. The gene distribution of the human genome. Gene, 174:95–102, 1996.

Author information

Authors and Affiliations

Department of Computer Science, Loyola University Chicago, 6525 N. Sheridan Rd., 60626, Chicago, IL
Michael H. Goldwasser
Department of Computer Science, Northwestern University, 60201, Evanston, IL
Ming-Yang Kao
Institute of Information Science, Academia Sinica, 128 Academia Road, Section 2, 115, Taipei, Taiwan
Hsueh-I Lu

Editor information

Editors and Affiliations

IMIM-UPF-CRG, Dr. Aiguader 80, 08003, Barcelona, Spain
Roderic Guigó
Department of Computer Science, University of California, 95616, Davis, CA, USA
Dan Gusfield

Cite this paper

Goldwasser, M.H., Kao, MY., Lu, HI. (2002). Fast Algorithms for Finding Maximum-Density Segments of a Sequence with Applications to Bioinformatics. In: Guigó, R., Gusfield, D. (eds) Algorithms in Bioinformatics. WABI 2002. Lecture Notes in Computer Science, vol 2452. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-45784-4_12

DOI: https://doi.org/10.1007/3-540-45784-4_12
Published: 10 October 2002
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-44211-0
Online ISBN: 978-3-540-45784-8
eBook Packages: Springer Book Archive