Composite Pattern Discovery for PCR Application

Angelov, Stanislav; Inenaga, Shunsuke

doi:10.1007/11575832_19

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 3772)

Included in the following conference series:

International Symposium on String Processing and Information Retrieval

Abstract

We consider the problem of finding pairs of short patterns such that, in a given input sequence of length n, the distance between each pair’s patterns is at least α. The problem was introduced in [1] and is motivated by the optimization of multiplexed nested PCR.

We study algorithms for the following two cases: the special case in which the two patterns in a pair are required to have the same length, and the more general case in which the patterns can have different lengths. For the first case we present an O(αn log log n) time and O(n) space algorithm, and for the general case we give an O(αn log n) time and O(n) space algorithm. The algorithms work for any alphabet size and use asymptotically less space than the algorithms presented in [1]. For alphabets of constant size we also give an \(O(n\sqrt{n}\log^{2} n)\) time algorithm for the general case. We demonstrate that the algorithms perform well in practice and present our findings for the human genome.

In addition, we study an extended version of the problem in which the patterns in a pair occur at certain positions at a distance of at most α, but do not occur α-close anywhere else in the input sequence.
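
To make the basic problem statement concrete, the following is a naive brute-force sketch for the equal-length case. It is not one of the algorithms from the paper and comes nowhere near the stated time bounds. Several details are assumptions made only for illustration: the distance between two patterns is taken to be the smallest difference between their starting positions, a pattern that never occurs is treated as infinitely far from everything, and the function names (`occurrences`, `min_distance`, `far_apart_pairs`) and the default DNA alphabet are hypothetical.

```python
from itertools import combinations_with_replacement, product


def occurrences(seq, pat):
    """Start positions of every (possibly overlapping) occurrence of pat in seq."""
    return [i for i in range(len(seq) - len(pat) + 1) if seq.startswith(pat, i)]


def min_distance(seq, a, b):
    """Smallest |i - j| over occurrences i of a and j of b.

    Assumption: if either pattern does not occur, the distance is infinite.
    """
    pos_a, pos_b = occurrences(seq, a), occurrences(seq, b)
    if not pos_a or not pos_b:
        return float("inf")
    return min(abs(i - j) for i in pos_a for j in pos_b)


def far_apart_pairs(seq, k, alpha, alphabet="ACGT"):
    """Unordered pairs of length-k patterns whose occurrences are never
    closer than alpha in seq (exhaustive enumeration, illustration only)."""
    kmers = ["".join(p) for p in product(alphabet, repeat=k)]
    return [
        (a, b)
        for a, b in combinations_with_replacement(kmers, 2)
        if min_distance(seq, a, b) >= alpha
    ]


if __name__ == "__main__":
    print(far_apart_pairs("ACGTAC", k=1, alpha=2))
```

The enumeration examines every pair of length-k strings over the alphabet, so it is only usable for very small k; it serves to pin down what "distance at least α" means under the assumptions above.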

References

1. Inenaga, S., Kivioja, T., Mäkinen, V.: Finding missing patterns. In: Jonassen, I., Kim, J. (eds.) WABI 2004. LNCS (LNBI), vol. 3240, pp. 463–474. Springer, Heidelberg (2004)
2. Apostolico, A.: Pattern discovery and the algorithmics of surprise. In: Artificial Intelligence and Heuristic Methods for Bioinformatics, pp. 111–127 (2003)
3. Shinohara, A., Takeda, M., Arikawa, S., Hirao, M., Hoshino, H., Inenaga, S.: Finding best patterns practically. In: Arikawa, S., Shinohara, A. (eds.) Progress in Discovery Science. LNCS (LNAI), vol. 2281, pp. 307–317. Springer, Heidelberg (2002)
4. Shimozono, S., Shinohara, A., Shinohara, T., Miyano, S., Kuhara, S., Arikawa, S.: Knowledge acquisition from amino acid sequences by machine learning system BONSAI. Transactions of Information Processing Society of Japan 35, 2009–2018 (1994)
5. Bannai, H., Inenaga, S., Shinohara, A., Takeda, M., Miyano, S.: Efficiently finding regulatory elements using correlation with gene expression. Journal of Bioinformatics and Computational Biology 2, 273–288 (2004)
6. Baeza-Yates, R.A.: Searching subsequences (note). Theoretical Computer Science 78, 363–376 (1991)
7. Hirao, M., Hoshino, H., Shinohara, A., Takeda, M., Arikawa, S.: A practical algorithm to find the best subsequence patterns. In: Morishita, S., Arikawa, S. (eds.) DS 2000. LNCS (LNAI), vol. 1967, pp. 141–154. Springer, Heidelberg (2000)
8. Mannila, H., Toivonen, H., Verkamo, A.I.: Discovering frequent episodes in sequences. In: Proc. 1st International Conference on Knowledge Discovery and Data Mining, pp. 210–215. AAAI Press, Menlo Park (1995)
9. Hirao, M., Inenaga, S., Shinohara, A., Takeda, M., Arikawa, S.: A practical algorithm to find the best episode patterns. In: Jantke, K.P., Shinohara, A. (eds.) DS 2001. LNCS (LNAI), vol. 2226, pp. 435–440. Springer, Heidelberg (2001)
10. Inenaga, S., Bannai, H., Shinohara, A., Takeda, M., Arikawa, S.: Discovering best variable-length-don’t-care patterns. In: Lange, S., Satoh, K., Smith, C.H. (eds.) DS 2002. LNCS, vol. 2534, pp. 86–97. Springer, Heidelberg (2002)
11. Takeda, M., Inenaga, S., Bannai, H., Shinohara, A., Arikawa, S.: Discovering most classificatory patterns for very expressive pattern classes. In: Grieser, G., Tanaka, Y., Yamamoto, A. (eds.) DS 2003. LNCS (LNAI), vol. 2843, pp. 486–493. Springer, Heidelberg (2003)
12. Eskin, E., Pevzner, P.A.: Finding composite regulatory patterns in DNA sequences. Bioinformatics 18, S354–S363 (2002)
13. Marsan, L., Sagot, M.F.: Algorithms for extracting structured motifs using a suffix tree with an application to promoter and regulatory site consensus identification. J. Comput. Biol. 7, 345–360 (2000)
14. Gusfield, D.: Algorithms on Strings, Trees, and Sequences. Cambridge University Press, Cambridge (1997)
15. Carvalho, A.M., Freitas, A.T., Oliveira, A.L., Sagot, M.F.: A highly scalable algorithm for the extraction of cis-regulatory regions. In: Proc. 3rd Asia Pacific Bioinformatics Conference (APBC 2005), pp. 273–282. Imperial College Press, London (2005)
16. Arimura, H., Arikawa, S., Shimozono, S.: Efficient discovery of optimal word-association patterns in large text databases. New Generation Computing 18, 49–60 (2000)
17. Arimura, H., Asaka, H., Sakamoto, H., Arikawa, S.: Efficient discovery of proximity patterns with suffix arrays (extended abstract). In: Amir, A., Landau, G.M. (eds.) CPM 2001. LNCS, vol. 2089, pp. 152–156. Springer, Heidelberg (2001)
18. Liu, X., Brutlag, D., Liu, J.: BioProspector: discovering conserved DNA motifs in upstream regulatory regions of co-expressed genes. In: Pac. Symp. Biocomput., pp. 127–138 (2001)
19. Bannai, H., Hyyrö, H., Shinohara, A., Takeda, M., Nakai, K., Miyano, S.: An O(N²) algorithm for discovering optimal Boolean pattern pairs. IEEE/ACM Transactions on Computational Biology and Bioinformatics 1, 159–170 (2004)
20. Inenaga, S., Bannai, H., Hyyrö, H., Shinohara, A., Takeda, M., Nakai, K., Miyano, S.: Finding optimal pairs of cooperative and competing patterns with bounded distance. In: Suzuki, E., Arikawa, S. (eds.) DS 2004. LNCS (LNAI), vol. 3245, pp. 32–46. Springer, Heidelberg (2004)
21. Alberts, B., Johnson, A., Lewis, J., Raff, M., Roberts, K., Walter, P.: Molecular Biology of the Cell, 4th edn. Garland Science, New York (2002)
22. Karp, R., Rabin, M.: Efficient randomized pattern-matching algorithms. IBM Journal of Research and Development 31, 249–260 (1987)

Author information

Authors and Affiliations

Department of Computer and Information Science, School of Engineering and Applied Sciences, University of Pennsylvania, Philadelphia, PA, 19104, USA
Stanislav Angelov
Department of Informatics, Kyushu University, Fukuoka, 812-8581, Japan
Shunsuke Inenaga

Editor information

Editors and Affiliations

University of Toronto
Mariano Consens
Dept. of Computer Science, University of Chile
Gonzalo Navarro

About this paper

Cite this paper

Angelov, S., Inenaga, S. (2005). Composite Pattern Discovery for PCR Application. In: Consens, M., Navarro, G. (eds) String Processing and Information Retrieval. SPIRE 2005. Lecture Notes in Computer Science, vol 3772. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11575832_19

DOI: https://doi.org/10.1007/11575832_19
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-29740-6
Online ISBN: 978-3-540-32241-2
eBook Packages: Computer Science, Computer Science (R0)