An Efficient Algorithm for Finding Long Conserved Regions Between Genes

Ma, Tak-Man; Lyuu, Yuh-Dauh; Ti, Yen-Wu

doi:10.1007/11875741_5


Part of the book series: Lecture Notes in Computer Science (LNBI, volume 4216)

Included in the following conference series:

International Symposium on Computational Life Science

Abstract

We study the problem of extracting approximate non-tandem repeats (conserved regions) among strings (genes). Given a string S over a finite alphabet and thresholds L and D, extracting approximate repeats means finding pairs (β, β′) of substrings of S, subject to some constraints, such that β and β′ have edit distance at most D and each has length at least L. Previous work mainly focuses on the case where D is small, so it is not well suited to extracting approximate repeats with relatively large D. In contrast, this paper focuses on extracting long approximate repeats with large D, and our algorithm is more efficient than previous ones. We also show that our algorithm is time-optimal when D is a constant.
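The acceptance condition in the abstract can be illustrated with a minimal Python sketch. It is not the paper's algorithm: it only checks whether one given pair of substrings meets the stated thresholds, using the textbook unit-cost edit distance. The function names `edit_distance` and `is_approximate_repeat` are hypothetical, and the additional constraints the abstract mentions but does not spell out are not modeled.

```python
def edit_distance(a: str, b: str) -> int:
    """Textbook Levenshtein distance (unit-cost insert, delete, substitute)."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to the empty prefix of b
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # match or substitute
        prev = curr
    return prev[-1]


def is_approximate_repeat(beta: str, beta_prime: str, L: int, D: int) -> bool:
    """Check only the two thresholds from the abstract (hypothetical helper).

    Assumption: any further constraints on the pair are ignored here.
    """
    return (len(beta) >= L
            and len(beta_prime) >= L
            and edit_distance(beta, beta_prime) <= D)
```

For instance, `is_approximate_repeat("ACGTACGT", "ACGAACGT", L=8, D=1)` holds because the two strings differ by a single substitution.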

In this paper, given an input string S and thresholds L and D, we extract all (D, L)-supermaximal approximate repeats (β, β′) of S. One useful application is finding all longest possible substrings β of S for which there exists some other substring β′ of S such that β and β′ have edit distance at most D and each has length at least L. The algorithm extends easily to multiple input strings \(S_1, S_2, \ldots, S_n\): we first concatenate them into one long subject string S, separated by a special symbol \(\sharp\), giving \(S_1\sharp S_2\sharp\ldots\sharp S_n\). The running time of our algorithm is \(O(DN^2)\), where \(N=|S_1|+|S_2|+\cdots+|S_n|\).
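The multi-string reduction described above can be sketched as follows. This is an illustration under stated assumptions, not code from the paper: the ASCII character `#` stands in for the separator symbol, and the helpers `concatenate` and `locate` are hypothetical names.

```python
from bisect import bisect_right


def concatenate(strings, sep="#"):
    """Join the inputs into one subject string and record each start offset.

    Assumption: sep is a single character that occurs in none of the inputs.
    """
    for s in strings:
        if sep in s:
            raise ValueError("separator must not occur in an input string")
    starts = []
    pos = 0
    for s in strings:
        starts.append(pos)
        pos += len(s) + len(sep)  # skip past this string and its separator
    return sep.join(strings), starts


def locate(position, starts):
    """Map a subject-string position back to (input index, offset).

    A separator position maps to an offset equal to that input's length.
    """
    k = bisect_right(starts, position) - 1
    return k, position - starts[k]
```

With inputs `["ACGT", "GG", "TTA"]` the subject string is `ACGT#GG#TTA`, and position 6 maps back to offset 1 of the second input.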



Author information

Authors and Affiliations

Dept. of Computer and Information Science, University of Pennsylvania, Philadelphia, USA
Tak-Man Ma
Dept. of Computer Science and Information Engineering, National Taiwan University, Taipei, Taiwan
Yuh-Dauh Lyuu & Yen-Wu Ti


Editor information

Editors and Affiliations

Berkeley Initiative in Soft Computing (BISC), University of California at Berkeley, USA
Michael R. Berthold
Department of Chemistry, Unilever Centre for Molecular Informatics, Cambridge University, CB2 1EW, Cambridge, UK
Robert C. Glen
ALTANA Chair for Bioinformatics and Information Mining, University of Konstanz, Germany
Ingrid Fischer


About this paper

Cite this paper

Ma, TM., Lyuu, YD., Ti, YW. (2006). An Efficient Algorithm for Finding Long Conserved Regions Between Genes. In: Berthold, M.R., Glen, R.C., Fischer, I. (eds) Computational Life Sciences II. CompLife 2006. Lecture Notes in Computer Science, vol 4216. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11875741_5


DOI: https://doi.org/10.1007/11875741_5
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-45767-1
Online ISBN: 978-3-540-45768-8
eBook Packages: Computer Science, Computer Science (R0)
