Abstract
The computational identification of conserved motifs in RNA molecules is a major, yet largely unsolved, problem. Structural conservation serves as strong evidence for important RNA functionality. Thus, comparative structure analysis is the gold standard for the discovery and interpretation of functional RNAs.
In this paper we focus on one type of functional RNA motif: sequence-structure motifs, which mark RNA molecules as targets to be recognized by other molecules.
We present a new approach for detecting RNA structure (including pseudoknots) that is conserved among a set of unaligned RNA sequences. Our method extends previous approaches to this problem, which first identified conserved stems and then assembled them into complex structural motifs. The novelty of our approach lies in performing the identification and the assembly of these stems simultaneously. We believe this unified approach offers a more informative model for deciphering the evolution of functional RNAs, where the sets of stems comprising a conserved motif co-evolve as a correlated functional unit.
Since the task of mining RNA sequence-structure motifs can be addressed by solving the maximum weighted clique problem in an n-partite graph, we translate the maximum weighted clique problem into a state graph. Then, we gather and define domain knowledge and low-level heuristics for this domain. Finally, we learn hyper-heuristics for this domain, which can be used with heuristic search algorithms (e.g., A*, IDA*) for the mining task.
The hyper-heuristics are evolved using HH-Evolver, a tool for domain-specific, hyper-heuristic evolution. Our approach is designed to overcome the computational limitations of current algorithms and to remove the need for the assumptions previously used to sparsify the graph.
This is still work in progress, and as yet we have no results to report. However, given the interest in the methodology and its previous success in other domains, we are hopeful that results will be forthcoming soon.
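To make the reduction above concrete, here is a minimal sketch of how a maximum weighted clique problem in an n-partite graph might be cast as a search over a state graph. It is not the authors' formulation: the vertex-weight model, the names `max_weight_clique`, `parts`, `weight`, and `compatible`, and the toy data are all assumptions made for illustration. A state is a partial clique together with the index of the next part to consider, and the guiding heuristic is a simple hand-written optimistic bound standing in for an evolved hyper-heuristic.

```python
import heapq
from itertools import count


def max_weight_clique(parts, weight, compatible):
    """Best-first (A*-style) search for a maximum weighted clique.

    parts      -- list of vertex lists, one per part (hypothetical: e.g. the
                  candidate stems found in each input sequence)
    weight     -- dict mapping each vertex to its weight (assumed model)
    compatible -- function (u, v) -> bool, True if u and v share an edge
    Returns (total_weight, chosen_vertices).
    """
    n = len(parts)
    # Optimistic bound: the most weight that parts i..n-1 could still add.
    # It never underestimates, so the first goal state popped is optimal.
    best_rest = [0.0] * (n + 1)
    for i in range(n - 1, -1, -1):
        best_rest[i] = best_rest[i + 1] + max([0.0] + [weight[v] for v in parts[i]])

    tie = count()  # tie-breaker so the heap never compares vertex tuples
    # Heap entries: (-(g + h), tie, g, index of next part, chosen vertices)
    frontier = [(-best_rest[0], next(tie), 0.0, 0, ())]
    while frontier:
        _, _, g, i, chosen = heapq.heappop(frontier)
        if i == n:  # every part has been considered: goal state
            return g, list(chosen)
        # Successor 1: take no vertex from part i.
        heapq.heappush(frontier, (-(g + best_rest[i + 1]), next(tie), g, i + 1, chosen))
        # Successors 2..k: add a vertex adjacent to every vertex chosen so far.
        for v in parts[i]:
            if all(compatible(u, v) for u in chosen):
                g2 = g + weight[v]
                heapq.heappush(
                    frontier,
                    (-(g2 + best_rest[i + 1]), next(tie), g2, i + 1, chosen + (v,)),
                )
    return 0.0, []  # not reached: the skip successor always leads to a goal


# Toy instance (made-up data): three parts, edges only between different parts.
parts = [["a1", "a2"], ["b1", "b2"], ["c1"]]
weight = {"a1": 3, "a2": 2, "b1": 1, "b2": 4, "c1": 2}
edges = {frozenset(e) for e in [("a1", "b1"), ("a1", "c1"), ("a2", "b2"),
                                ("a2", "c1"), ("b2", "c1")]}


def compatible(u, v):
    return frozenset((u, v)) in edges


best_weight, best_clique = max_weight_clique(parts, weight, compatible)
print(best_weight, best_clique)
```

Because the bound is optimistic, this best-first search returns an optimal clique. In the setting the abstract describes, the heuristic would instead be learned, and such a heuristic need not preserve that guarantee.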
Acknowledgements
This research was supported by the Israel Science Foundation (grant no. 123/11 and grant no. 179/14).
Copyright information
© 2016 Springer International Publishing Switzerland
About this chapter
Cite this chapter
Elyasaf, A., Vaks, P., Milo, N., Sipper, M., Ziv-Ukelson, M. (2016). Learning Heuristics for Mining RNA Sequence-Structure Motifs. In: Riolo, R., Worzel, W., Kotanchek, M., Kordon, A. (eds) Genetic Programming Theory and Practice XIII. Genetic and Evolutionary Computation. Springer, Cham. https://doi.org/10.1007/978-3-319-34223-8_2
DOI: https://doi.org/10.1007/978-3-319-34223-8_2
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-34221-4
Online ISBN: 978-3-319-34223-8
eBook Packages: Computer Science, Computer Science (R0)