Abstract
We discuss a method for representing arbitrary sequences (strings), with emphasis on its application to analyzing the similarity of sets of proteins expressed as sequences of amino acids. We define a pattern of arbitrary structure, called a metasymbol, and discuss an implementation of a detailed representation. We show that a protein may be expressed as a collection of metasymbols in such a way that the underlying structural similarities are easier to identify.
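The abstract does not spell out the structure of a metasymbol, so the following is only a minimal sketch under one assumed reading: a metasymbol is taken to be a set of symbols at fixed relative offsets, possibly with gaps between them. The names `Metasymbol`, `find_occurrences`, and `represent`, the two toy sequences, and the patterns `M1` and `M2` are all hypothetical and are not taken from the paper. The sketch shows how two sequences that differ residue by residue can still share the same pattern-level representation.

```python
from typing import Dict, List, Tuple

# Assumption: a metasymbol is a list of (offset, symbol) pairs, where each
# offset is relative to the position at which the pattern starts.
Metasymbol = List[Tuple[int, str]]


def find_occurrences(sequence: str, pattern: Metasymbol) -> List[int]:
    """Return every start index at which all (offset, symbol) pairs match."""
    span = max(offset for offset, _ in pattern) + 1
    return [
        start
        for start in range(len(sequence) - span + 1)
        if all(sequence[start + offset] == symbol for offset, symbol in pattern)
    ]


def represent(sequence: str, patterns: Dict[str, Metasymbol]) -> Dict[str, List[int]]:
    """Map each named pattern to the positions where it occurs in the sequence."""
    return {name: find_occurrences(sequence, p) for name, p in patterns.items()}


# Toy amino-acid strings, made up for illustration (not real proteins).
protein_a = "MKTAYIAKQR"
protein_b = "GKSAYLAKQW"

# Hypothetical metasymbols: M1 has a one-symbol gap, M2 is contiguous.
patterns = {
    "M1": [(0, "K"), (2, "A"), (3, "Y")],
    "M2": [(0, "A"), (1, "K"), (2, "Q")],
}

rep_a = represent(protein_a, patterns)
rep_b = represent(protein_b, patterns)
print(rep_a)
print(rep_b)
```

In this toy case the two strings differ at four positions, yet both reduce to the same pattern positions, which is the kind of similarity a pattern-level view is meant to expose. How the patterns themselves are discovered is outside the scope of this sketch.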
© 2005 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Kuri-Morales, A.F., Ortiz-Posadas, M.R. (2005). A New Approach to Sequence Representation of Proteins in Bioinformatics. In: Gelbukh, A., de Albornoz, Á., Terashima-Marín, H. (eds) MICAI 2005: Advances in Artificial Intelligence. MICAI 2005. Lecture Notes in Computer Science, vol 3789. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11579427_90
DOI: https://doi.org/10.1007/11579427_90
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-29896-0
Online ISBN: 978-3-540-31653-4
eBook Packages: Computer Science, Computer Science (R0)