Relational Sequence Alignments and Logos
The need to measure sequence similarity arises in many application domains and often coincides with sequence alignment: the more similar two sequences are, the better they can be aligned. Aligning sequences shows not only how similar they are but also where they differ and where they correspond.
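As a concrete illustration of the link between similarity and alignment, here is a minimal sketch of classic global alignment (Needleman-Wunsch) of two sequences of flat symbols. It assumes a simple scoring scheme of +1 for a match and -1 for a mismatch or a gap. These values and the function name `global_align` are illustrative only and are not taken from the paper.

```python
def global_align(a, b, match=1, mismatch=-1, gap=-1):
    """Needleman-Wunsch global alignment of two flat symbol sequences.

    Returns (score, aligned_a, aligned_b), with '-' marking gaps.
    The scoring parameters are illustrative assumptions.
    """
    n, m = len(a), len(b)
    # score[i][j] holds the best score for aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap

    def sub(x, y):
        return match if x == y else mismatch

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(score[i - 1][j - 1] + sub(a[i - 1], b[j - 1]),
                              score[i - 1][j] + gap,   # gap in b
                              score[i][j - 1] + gap)   # gap in a

    # Trace back from the bottom-right corner, preferring diagonal moves
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub(a[i - 1], b[j - 1]):
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append('-'); i -= 1
        else:
            out_a.append('-'); out_b.append(b[j - 1]); j -= 1
    return score[n][m], ''.join(reversed(out_a)), ''.join(reversed(out_b))


print(global_align("ABC", "AC"))  # (1, 'ABC', 'A-C')
```

The alignment score measures similarity, while the aligned strings show where the sequences correspond (`A`, `C`) and where they differ (the gap opposite `B`).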
Traditionally, alignment has been considered only for sequences of flat symbols. Many real-world sequences, however, such as natural language sentences and protein secondary structures, exhibit rich internal structure. This is akin to the problem of dealing with structured examples studied in the field of inductive logic programming (ILP). In this paper, we introduce Real, a powerful yet simple approach to aligning sequences of structured symbols using well-established ILP distance measures within traditional alignment methods. Although the approach is straightforward, experiments on protein data and Medline abstracts show that it works well in practice, that the resulting alignments can indeed provide more information than flat ones, and that they are meaningful to experts when represented graphically.
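The following is a minimal sketch of the idea of plugging an ILP distance measure into a traditional alignment method. It rests on several assumptions. Ground atoms are represented as Python tuples `(predicate, arg1, ..., argn)`. The distance follows the style of Nienhuys-Cheng (1997), which is one well-known ILP distance between ground terms, although the paper may use a different measure, representation or scoring. The names `term_distance`, `substitution_score` and `relational_align`, the mapping `1 - 2d` from distance to score, and the gap penalty are all hypothetical choices for illustration. Under this mapping, identical atoms score +1, atoms with different predicates score -1, and partially matching atoms fall in between.

```python
from typing import List, Optional, Sequence, Tuple, Union

# A ground term is a constant (str/int) or a compound term written as a
# tuple (functor, arg1, ..., argn); e.g. helix(right, 5) -> ("helix", "right", 5).
Term = Union[str, int, tuple]


def term_distance(s: Term, t: Term) -> float:
    """Distance between ground terms in the style of Nienhuys-Cheng (1997):
    0 for identical terms, 1 for distinct constants or different
    functors/arities, otherwise the sum of argument distances scaled by 1/(2n)."""
    if s == t:
        return 0.0
    if not (isinstance(s, tuple) and isinstance(t, tuple)):
        return 1.0
    if s[0] != t[0] or len(s) != len(t):
        return 1.0
    n = len(s) - 1  # arity; n >= 1 here, since equal 0-ary terms returned above
    return sum(term_distance(a, b) for a, b in zip(s[1:], t[1:])) / (2 * n)


def substitution_score(s: Term, t: Term) -> float:
    # Hypothetical mapping: +1 for identical atoms, -1 for different predicates
    return 1.0 - 2.0 * term_distance(s, t)


def relational_align(xs: Sequence[Term], ys: Sequence[Term], gap: float = -1.0
                     ) -> Tuple[float, List[Tuple[Optional[Term], Optional[Term]]]]:
    """Needleman-Wunsch global alignment of two sequences of ground atoms.
    Returns (score, pairs); None in a pair marks a gap."""
    n, m = len(xs), len(ys)
    score = [[0.0] * (m + 1) for _ in range(n + 1)]
    back = [[""] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0], back[i][0] = i * gap, "up"
    for j in range(1, m + 1):
        score[0][j], back[0][j] = j * gap, "left"
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            options = [
                (score[i - 1][j - 1] + substitution_score(xs[i - 1], ys[j - 1]), "diag"),
                (score[i - 1][j] + gap, "up"),
                (score[i][j - 1] + gap, "left"),
            ]
            # max() returns the first maximal option, so ties prefer "diag"
            score[i][j], back[i][j] = max(options, key=lambda o: o[0])

    # Follow the back-pointers from the bottom-right corner
    pairs: List[Tuple[Optional[Term], Optional[Term]]] = []
    i, j = n, m
    while i > 0 or j > 0:
        move = back[i][j]
        if move == "diag":
            pairs.append((xs[i - 1], ys[j - 1])); i -= 1; j -= 1
        elif move == "up":
            pairs.append((xs[i - 1], None)); i -= 1
        else:
            pairs.append((None, ys[j - 1])); j -= 1
    pairs.reverse()
    return score[n][m], pairs


print(relational_align([("helix", "right", 5)], [("helix", "right", 6)]))
```

For instance, the hypothetical atoms `helix(right, 5)` and `helix(right, 6)` share a predicate and differ in one of their two arguments. Their distance is therefore 1/4, and they score 0.5. A flat alignment that treats them as two unrelated symbols would assign the full mismatch score of -1.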
Keywords: Relational Sequence, Alignment Algorithm, Global Alignment, Inductive Logic Programming, Ground Atom