Finding Optimal Pairs of Patterns

Bannai, Hideo; Hyyrö, Heikki; Shinohara, Ayumi; Takeda, Masayuki; Nakai, Kenta; Miyano, Satoru

doi:10.1007/978-3-540-30219-3_38

Hideo Bannai²¹,
Heikki Hyyrö²²,
Ayumi Shinohara^22,23,
Masayuki Takeda²³,
Kenta Nakai²¹ &
…
Satoru Miyano²¹

Part of the book series: Lecture Notes in Computer Science ((LNBI,volume 3240))

Included in the following conference series:

International Workshop on Algorithms in Bioinformatics

581 Accesses
8 Citations

Abstract

We consider the problem of finding the optimal pair of string patterns for discriminating between two sets of strings, i.e. finding the pair of patterns that is best with respect to some appropriate scoring function that gives higher scores to pattern pairs which occur more in the strings of one set, but less in the other. We present an O(N ²) time algorithm for finding the optimal pair of substring patterns, where N is the total length of the strings. The algorithm looks for all possible Boolean combination of the patterns, e.g. patterns of the form \(p \land\lnot q\), which indicates that the pattern pair is considered to match a given string s, if p occurs in s, AND q does NOT occur in s. The same algorithm can be applied to a variant of the problem where we are given a single set of sequences along with a numeric attribute assigned to each sequence, and the problem is to find the optimal pattern pair whose occurrence in the sequences is correlated with this numeric attribute. An efficient implementation based on suffix arrays is presented, and the algorithm is applied to several nucleotide sequence datasets of moderate size, combined with microarray gene expression data, aiming to find regulatory elements that cooperate, complement, or compete with each other in enhancing and/or silencing certain genomic functions.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Log in via an institution

Chapter: USD 29.95; Price excludes VAT (USA)

eBook: USD 84.99; Price excludes VAT (USA)

Softcover Book: USD 109.99; Price excludes VAT (USA)

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

Preview

Unable to display preview. Download preview PDF.

References

Brazma, A., Jonassen, I., Eidhammer, I., Gilbert, D.: Approaches to the automatic discovery of patterns in biosequences. J. Comput. Biol. 5, 279–305 (1998)
Article Google Scholar
Marsan, L., Sagot, M.F.: Algorithms for extracting structured motifs using a suffix tree with an application to promoter and regulatory site consensus identification. J. Comput. Biol. 7, 345–360 (2000)
Article Google Scholar
Arimura, H., Wataki, A., Fujino, R., Arikawa, S.: A fast algorithm for discovering optimal string patterns in large text databases. In: Richter, M.M., Smith, C.H., Wiehagen, R., Zeugmann, T. (eds.) ALT 1998. LNCS (LNAI), vol. 1501, pp. 247–261. Springer, Heidelberg (1998)
Chapter Google Scholar
Eskin, E., Pevzner, P.A.: Finding composite regulatory patterns in DNA sequences. Bioinformatics 363, S354–S363 (2002)
Google Scholar
Liu, X., Brutlag, D., Liu, J.: BioProspector: discovering conserved DNA motifs in upstream regulatory regions of co-expressed genes. In: Pac. Symp. Biocomput., pp. 127–138 (2001)
Google Scholar
Shimozono, S., Shinohara, A., Shinohara, T., Miyano, S., Kuhara, S., Arikawa, S.: Knowledge acquisition from amino acid sequences by machine learning system BONSAI. Transactions of Information Processing Society of Japan 35, 2009–2018 (1994)
Google Scholar
Shinohara, A., Takeda, M., Arikawa, S., Hirao, M., Hoshino, H., Inenaga, S.: Finding best patterns practically. In: Arikawa, S., Shinohara, A. (eds.) Progress in Discovery Science. LNCS (LNAI), vol. 2281, pp. 307–317. Springer, Heidelberg (2002)
Chapter Google Scholar
Takeda, M., Inenaga, S., Bannai, H., Shinohara, A., Arikawa, S.: Discovering most classificatory patterns for very expressive pattern classes. In: Grieser, G., Tanaka, Y., Yamamoto, A. (eds.) DS 2003. LNCS (LNAI), vol. 2843, pp. 486–493. Springer, Heidelberg (2003)
Chapter Google Scholar
Shinozaki, D., Akutsu, T., Maruyama, O.: Finding optimal degenerate patterns in DNA sequences. Bioinformatics 19, ii 206–ii214 (2003)
Article Google Scholar
Bussemaker, H.J., Li, H., Siggia, E.D.: Regulatory element detection using correlation with expression. Nature Genetics 27, 167–171 (2001)
Article Google Scholar
Bannai, H., Inenaga, S., Shinohara, A., Takeda, M., Miyano, S.: A string pattern regression algorithm and its application to pattern discovery in long introns. Genome Informatics 13, 3–11 (2002)
Google Scholar
Conlon, E.M., Liu, X.S., Lieb, J.D., Liu, J.S.: Integrating regulatory motif discovery and genome-wide expression analysis. In: Proc. Natl. Acad. Sci., vol. 100, pp. 3339–3344 (2003)
Google Scholar
Bannai, H., Inenaga, S., Shinohara, A., Takeda, M., Miyano, S.: Efficiently finding regulatory elements using correlation with gene expression. Journal of Bioinformatics and Computational Biology (2004) (in press)
Google Scholar
Zilberstein, C.B.Z., Eskin, E., Yakhini, Z.: Using expression data to discover RNA and DNA regulatory sequence motifs. In: The First Annual RECOMB Satellite Workshop on Regulatory Genomics (2004)
Google Scholar
Gusfield, D.: Algorithms on Strings, Trees, and Sequences. Cambridge University Press, Cambridge (1997)
Book MATH Google Scholar
Wang, Y., Liu, C., Storey, J., Tibshirani, R., Herschlag, D., Brown, P.: Precision and functional specificity in mRNA decay. Proc. Natl. Acad. Sci. 99, 5860–5865 (2002)
Article Google Scholar
Yang, E., van Nimwegen, E., Zavolan, M., Rajewsky, N., Schroeder, M., Magnasco, M., J.D., Jr.: Decay rates of Human mRNAs: correlation with functional characteristics and sequence attributes. Genome Res. 13, 1863–1872 (2003)
Google Scholar
Manber, U., Myers, G.: Suffix arrays: a new method for on-line string searches. SIAM Journal on Computing 22, 935–948 (1993)
Article MATH MathSciNet Google Scholar
Kim, D.K., Sim, J.S., Park, H., Park, K.: Linear-time construction of suffix arrays. In: Baeza-Yates, R., Chávez, E., Crochemore, M. (eds.) CPM 2003. LNCS, vol. 2676, pp. 186–199. Springer, Heidelberg (2003)
Chapter Google Scholar
Ko, P., Aluru, S.: Space efficient linear time construction of suffix arrays. In: Baeza-Yates, R., Chávez, E., Crochemore, M. (eds.) CPM 2003. LNCS, vol. 2676, pp. 200–210. Springer, Heidelberg (2003)
Chapter Google Scholar
Kärkkäinen, J., Sanders, P.: Simple linear work suffix array construction. In: Baeten, J.C.M., Lenstra, J.K., Parrow, J., Woeginger, G.J. (eds.) ICALP 2003. LNCS, vol. 2719, pp. 943–955. Springer, Heidelberg (2003)
Chapter Google Scholar
Kasai, T., Arimura, H., Arikawa, S.: Efficient substring traversal with suffix arrays. Technical Report 185, Department of Informatics, Kyushu University (2001)
Google Scholar
Abouelhoda, M.I., Kurtz, S., Ohlebusch, E.: The enhanced suffix array and its applications to genome analysis. In: Guigó, R., Gusfield, D. (eds.) WABI 2002. LNCS, vol. 2452, pp. 449–463. Springer, Heidelberg (2002)
Chapter Google Scholar
Bender, M.A., Farach-Colton, M.: The LCA problem revisited. In: Gonnet, G.H., Viola, A. (eds.) LATIN 2000. LNCS, vol. 1776, pp. 88–94. Springer, Heidelberg (2000)
Chapter Google Scholar
Alstrup, S., Gavoille, C., Kaplan, H., Rauhe, T.: Nearest common ancestors: a survey and a new distributed algorithm. In: 14th annual ACM symposium on Parallel algorithms and architectures, pp. 258–264 (2002)
Google Scholar
Hui, L.: Color set size problem with applications to string matching. In: Apostolico, A., Galil, Z., Manber, U., Crochemore, M. (eds.) CPM 1992. LNCS, vol. 644, pp. 230–243. Springer, Heidelberg (1992)
Google Scholar
Kasai, T., Lee, G., Arimura, H., Arikawa, S., Park, K.: Linear-time longestcommon- prefix computation in suffix arrays and its applications. In: Amir, A., Landau, G.M. (eds.) CPM 2001. LNCS, vol. 2089, pp. 181–192. Springer, Heidelberg (2001)
Chapter Google Scholar
Wilusz, C.J., Wormington, M., Peltz, S.W.: The cap-to-tail guide to mRNA turnover. Nat. Rev. Mol. Cell Biol. 2, 237–246 (2001)
Article Google Scholar
Graber, J.: Variations in yeast 3’-processing cis-elements correlate with transcript stability. Trends Genet 19, 473–476 (2003), http://harlequin.jax.org/yeast/turnover/
Article Google Scholar
Wickens, M., Bernstein, D.S., Kimble, J., Parker, R.: A PUF family portrait: 3’UTR regulation as a way of life. Trends Genet. 18, 150–157 (2002)
Article Google Scholar
Ruiz-Echevarria, M.J., Munshi, R., Tomback, J., Kinzy, T.G., Peltz, S.W.: Characterization of a general stabilizer element that blocks deadenylation-dependent mRNA decay. J. Biol. Chem. 276, 30995–31003 (2001)
Article Google Scholar
Kasprzyk, A., Keefe, D., Smedley, D., London, D., Spooner, W., Melsopp, C., Hammond, M., Rocca-Serra, P., Cox, T., Birney, E.: EnsMart: A generic system for fast and flexible access to biological data. Genome Research 14, 160–169 (2004)
Article Google Scholar

Download references

Author information

Authors and Affiliations

Human Genome Center, Institute of Medical Science, The University of Tokyo, Tokyo, 108-8639, Japan
Hideo Bannai, Kenta Nakai & Satoru Miyano
PRESTO, Japan Science and Technology Agency (JST),
Heikki Hyyrö & Ayumi Shinohara
Department of Informatics, Kyushu University 33, Fukuoka, 812-8581, Japan
Ayumi Shinohara & Masayuki Takeda

Authors

Hideo Bannai
View author publications
You can also search for this author in PubMed Google Scholar
Heikki Hyyrö
View author publications
You can also search for this author in PubMed Google Scholar
Ayumi Shinohara
View author publications
You can also search for this author in PubMed Google Scholar
Masayuki Takeda
View author publications
You can also search for this author in PubMed Google Scholar
Kenta Nakai
View author publications
You can also search for this author in PubMed Google Scholar
Satoru Miyano
View author publications
You can also search for this author in PubMed Google Scholar

Editor information

Editors and Affiliations

Department of Informatics and Computational Biology Unit, HIB, University of Bergen, 5020, Bergen, Norway
Inge Jonassen
Department of Biology,, Penn Center for Bioinformatics, Penn Genomics Institute, 415 S. University Ave., PA 19104, Philadelphia, USA
Junhyong Kim

Rights and permissions

Reprints and permissions

Copyright information

About this paper

Cite this paper

Bannai, H., Hyyrö, H., Shinohara, A., Takeda, M., Nakai, K., Miyano, S. (2004). Finding Optimal Pairs of Patterns. In: Jonassen, I., Kim, J. (eds) Algorithms in Bioinformatics. WABI 2004. Lecture Notes in Computer Science(), vol 3240. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-30219-3_38

Download citation

DOI: https://doi.org/10.1007/978-3-540-30219-3_38
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-23018-2
Online ISBN: 978-3-540-30219-3
eBook Packages: Springer Book Archive

Publish with us

Policies and ethics