Two Proteins for the Price of One: The Design of Maximally Compressed Coding Sequences
The emerging field of synthetic biology moves beyond conventional genetic manipulation to construct novel life forms that do not originate in nature. We explore the problem of designing the provably shortest genomic sequence to encode a given set of genes by exploiting alternate reading frames. We present an algorithm for designing the shortest DNA sequence simultaneously encoding two given amino acid sequences. We show that the coding sequences of naturally occurring pairs of overlapping genes approach maximum compression. We also investigate the impact of alternate coding matrices on overlapping sequence design. Finally, we discuss an interesting application of overlapping gene design, namely the interleaving of an antibiotic resistance gene into a target gene inserted into a virus or plasmid for amplification.
Keywords: Gene Pair, Antibiotic Resistance Gene, Substitution Matrix, Human Disease Gene, Alternate Reading Frame
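To make the idea of encoding two proteins in one stretch of DNA concrete, the following minimal Python sketch reads a single strand in two reading frames and then searches for a coding sequence of one peptide whose shifted frame also contains a second peptide. It is an illustration only and not the algorithm presented in the paper: it assumes the standard genetic code, considers only same-strand frames (no reverse complement), handles only the fully nested case, and uses exhaustive search over synonymous codons, which is feasible only for very short peptides. The names `translate` and `nested_encoding` are hypothetical helpers introduced for this example.

```python
from itertools import product

# Standard genetic code, listed in TCAG order ('*' marks a stop codon).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

# Map each amino acid to the list of codons that encode it.
SYNONYMS = {}
for codon, aa in CODON_TABLE.items():
    SYNONYMS.setdefault(aa, []).append(codon)


def translate(dna, frame=0):
    """Translate dna starting at offset `frame`, ignoring a trailing partial codon."""
    return "".join(
        CODON_TABLE[dna[i:i + 3]] for i in range(frame, len(dna) - 2, 3)
    )


def nested_encoding(outer, inner, frame=1):
    """Brute-force search (illustrative assumption, not the paper's method).

    Tries every choice of synonymous codons for `outer` and returns the first
    DNA string whose translation at offset `frame` contains `inner`, or None
    if no such choice exists.
    """
    for codons in product(*(SYNONYMS[aa] for aa in outer)):
        dna = "".join(codons)
        if inner in translate(dna, frame):
            return dna
    return None


if __name__ == "__main__":
    dna = nested_encoding("MAKG", "WLK", frame=1)
    print(dna, translate(dna, 0), translate(dna, 1))
```

Because the number of candidate sequences grows exponentially with peptide length, a practical design method would need something more efficient than this enumeration; the sketch is meant only to show what "one sequence, two reading frames" means in practice.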