Stochastic k-Tree Grammar and Its Application in Biomolecular Structure Modeling

Ding, Liang; Samad, Abdul; Xue, Xingran; Huang, Xiuzhen; Malmberg, Russell L.; Cai, Liming

doi:10.1007/978-3-319-04921-2_25

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 8370)

Included in the following conference series:

International Conference on Language and Automata Theory and Applications

Abstract

Stochastic context-free grammar (SCFG) has been successful in modeling biomolecular structures, typically RNA secondary structure, for statistical analysis and structure prediction. Context-free grammar rules specify parallel and nested co-occurrences of terminals, and are thus ideal for modeling the canonical nucleotide base pairs that constitute RNA secondary structure. Stochastic grammars that can adequately model biomolecular tertiary structures, which are beyond context-free, have been sought. Some of the existing linguistic grammars, developed mostly for natural language processing, appear insufficient to account for the crossing relationships incurred by distant interactions of bioresidues, while others are overly powerful and incur excessive computational complexity. This paper introduces a novel stochastic grammar, called stochastic k-tree grammar (SkTG), for the analysis of context-sensitive languages. With the new grammar rules, co-occurrences of distant terminals are characterized and recursively organized into k-tree graphs. The new grammar offers a viable approach to modeling context-sensitive interactions between bioresidues because such relationships are often constrained by k-trees, for small values of k, as demonstrated by earlier investigations. This paper shows, for the first time, that probabilistic analysis of k-trees over strings is computable in polynomial time n^O(k). Hence, SkTG permits not only modeling of biomolecular tertiary structures but also efficient analysis and prediction of such structures.
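
To make the graph-theoretic notion in the abstract concrete, the following Python sketch builds a k-tree over string positions using the standard recursive definition: a (k+1)-clique is a k-tree, and attaching a new vertex to all vertices of an existing k-clique yields a larger k-tree. It then lists pairs of edges that cross when positions are read left to right, the kind of relationship that context-free rules cannot express. This is only an illustration of k-trees, not the SkTG rules or the authors' algorithm; the function names `build_k_tree` and `crossing_pairs` and the example positions are assumptions made here.

```python
from itertools import combinations


def build_k_tree(k, base, additions):
    """Build a k-tree and return its edges as sorted (i, j) tuples with i < j.

    base      -- k + 1 distinct vertices forming the initial clique
    additions -- sequence of (new_vertex, clique) pairs; each clique must be
                 k distinct, mutually adjacent vertices already in the graph
    (Hypothetical helper for illustration only.)
    """
    base = set(base)
    if len(base) != k + 1:
        raise ValueError("base must contain exactly k + 1 distinct vertices")
    vertices = set(base)
    edges = {frozenset(p) for p in combinations(base, 2)}
    for v, clique in additions:
        clique = set(clique)
        if v in vertices:
            raise ValueError(f"vertex {v} is already in the graph")
        if len(clique) != k or not clique <= vertices:
            raise ValueError(f"{sorted(clique)} is not k existing vertices")
        if any(frozenset(p) not in edges for p in combinations(clique, 2)):
            raise ValueError(f"{sorted(clique)} is not a clique")
        # Attach the new vertex to every vertex of the chosen k-clique.
        edges |= {frozenset((v, u)) for u in clique}
        vertices.add(v)
    return sorted(tuple(sorted(e)) for e in edges)


def crossing_pairs(edges):
    """Return pairs of edges (i, j), (p, q) with i < p < j < q."""
    return [
        (e, f)
        for e, f in combinations(sorted(edges), 2)
        if e[0] < f[0] < e[1] < f[1]
    ]


# A 2-tree over string positions 0..4 (positions chosen arbitrarily).
edges = build_k_tree(2, base=[0, 2, 3], additions=[(1, [0, 3]), (4, [2, 3])])
print(edges)
print(crossing_pairs(edges))
```

In this example the 2-tree has 7 edges, and three pairs of them cross, for instance (0, 2) with (1, 3). A nested-only structure such as an RNA secondary structure without pseudoknots would produce no crossing pairs.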

Author information

Authors and Affiliations

Department of Computer Science, University of Georgia, GA, 30602, USA
Liang Ding, Abdul Samad, Xingran Xue & Liming Cai
Institute of Bioinformatics, University of Georgia, GA, 30602, USA
Russell L. Malmberg & Liming Cai
Department of Plant Biology, University of Georgia, GA, 30602, USA
Russell L. Malmberg
Dept. of Computer Science, Arkansas State University, Jonesboro, AR, 72467, USA
Xiuzhen Huang

Editor information

Editors and Affiliations

Research Group on Mathematical Linguistics, Rovira i Virgili University, Avinguda Catalunya, 35, 43002, Tarragona, Spain
Adrian-Horia Dediu & Carlos Martín-Vide
School of Computer Science, Department of Software Engineering and Artificial Intelligence, Complutense University of Madrid, Professor José Garcia Santesmases, 9, 28040, Madrid, Spain
José-Luis Sierra-Rodríguez
Fakultät für Informatik, Institut für Wissens- und Sprachverarbeitung, Otto-von-Guericke-Universität Magdeburg, Universitätsplatz 2, 39106, Magdeburg, Germany
Bianca Truthe

Cite this paper

Ding, L., Samad, A., Xue, X., Huang, X., Malmberg, R.L., Cai, L. (2014). Stochastic k-Tree Grammar and Its Application in Biomolecular Structure Modeling. In: Dediu, AH., Martín-Vide, C., Sierra-Rodríguez, JL., Truthe, B. (eds) Language and Automata Theory and Applications. LATA 2014. Lecture Notes in Computer Science, vol 8370. Springer, Cham. https://doi.org/10.1007/978-3-319-04921-2_25

DOI: https://doi.org/10.1007/978-3-319-04921-2_25
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-04920-5
Online ISBN: 978-3-319-04921-2
eBook Packages: Computer Science, Computer Science (R0)