Simplifying Amino Acid Alphabets Using a Genetic Algorithm and Sequence Alignment

  • Conference paper

Part of the Lecture Notes in Computer Science book series (LNTCS, volume 4447)

Abstract

In some areas of bioinformatics, such as protein folding and sequence alignment, the full alphabet of amino acid symbols is not necessary. Simplified alphabets often give better results. Simplified alphabets are generally designed to be as universal as possible. In this paper we show that this approach may not be optimal. We present a genetic algorithm for alphabet simplification and use it in a method based on global sequence alignment. We demonstrate that our algorithm is much faster and produces better results than the previously presented genetic algorithm. We also compare alphabets constructed on the basis of universal substitution matrices, such as BLOSUM, with our alphabets built through sequence alignment, and we propose a new coefficient describing the value of an alphabet in the sequence alignment context. Finally, we show that in sequence classification with a k-NN classifier our simplified alphabets give better results than most previously presented simplified alphabets and than the full 20-letter alphabet.
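To make the notion of a simplified alphabet concrete, the following minimal Python sketch shows how a partition of the 20 amino acid letters into groups turns a protein sequence into a sequence over fewer symbols. The grouping `GROUPS` and the helper names `build_mapping` and `simplify` are hypothetical and chosen for illustration only; they are not the alphabets or the algorithm from the paper.

```python
# Hypothetical 5-group partition of the 20 amino acids, for illustration only.
# It is NOT one of the alphabets produced or evaluated in the paper.
GROUPS = ["AVLIMC", "FWYH", "STNQ", "KR", "DEGP"]


def build_mapping(groups):
    """Map every residue to the first letter of the group that contains it."""
    mapping = {}
    for group in groups:
        representative = group[0]
        for residue in group:
            if residue in mapping:
                raise ValueError(f"residue {residue!r} appears in two groups")
            mapping[residue] = representative
    return mapping


def simplify(sequence, mapping):
    """Rewrite a protein sequence over the simplified alphabet."""
    return "".join(mapping[residue] for residue in sequence)


mapping = build_mapping(GROUPS)
reduced = simplify("MKVLAT", mapping)
print(reduced)
```

A search method such as a genetic algorithm would explore many candidate partitions of this kind and score each one, for example by how well alignments over the reduced sequences perform; the scoring itself is not sketched here.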

Keywords

  • amino acid alphabet
  • sequence alignment
  • substitution matrices
  • protein classification

The research was partially supported by grant No 3 T11C 002 29 from the Polish Ministry of Education and Science.

Editor information

Elena Marchiori, Jason H. Moore, Jagath C. Rajapakse

Copyright information

© 2007 Springer Berlin Heidelberg

About this paper

Cite this paper

Lenckowski, J., Walczak, K. (2007). Simplifying Amino Acid Alphabets Using a Genetic Algorithm and Sequence Alignment. In: Marchiori, E., Moore, J.H., Rajapakse, J.C. (eds) Evolutionary Computation, Machine Learning and Data Mining in Bioinformatics. EvoBIO 2007. Lecture Notes in Computer Science, vol 4447. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-71783-6_12

  • DOI: https://doi.org/10.1007/978-3-540-71783-6_12

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-71782-9

  • Online ISBN: 978-3-540-71783-6

  • eBook Packages: Computer Science, Computer Science (R0)
