Two proteins for the price of one: the design of maximally compressed coding sequences
The emerging field of synthetic biology moves beyond conventional genetic manipulation to construct novel life forms that do not originate in nature. We explore the problem of designing the provably shortest genomic sequence that encodes a given set of genes by exploiting alternate reading frames. We present an algorithm for designing the shortest DNA sequence that simultaneously encodes two given amino acid sequences. We show that the coding sequences of naturally occurring pairs of overlapping genes approach maximum compression. We also investigate the impact of alternate coding matrices on overlapping sequence design. Finally, we discuss an application of overlapping gene design: interleaving an antibiotic resistance gene into a target gene inserted into a virus or plasmid for amplification.
Keywords: Gene compression, Gene design, Protein design, Overlapping genes, Sequence design algorithms, Synthetic biology, Dynamic programming
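To make the idea of two amino acid sequences sharing one stretch of DNA concrete, the following is a minimal brute-force sketch in Python. It is not the algorithm presented in the paper (the keywords indicate that one uses dynamic programming); it only illustrates the underlying constraint under simplifying assumptions: the standard genetic code, both genes on the same strand, and the shorter peptide fully contained within the coding sequence of the longer one. The names `translate`, `codons_for`, and `find_overlap` are hypothetical helpers introduced here for illustration.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order ('*' marks a stop codon).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(dna):
    """Translate DNA read from position 0, ignoring any trailing partial codon."""
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


def codons_for(amino_acid):
    """Return all codons that encode the given amino acid."""
    return [codon for codon, aa in CODON_TABLE.items() if aa == amino_acid]


def find_overlap(outer, inner, offset):
    """Search for DNA that encodes `outer` in frame 0 and `inner` starting
    `offset` nucleotides in (offset must not be a multiple of 3).

    Illustrative exhaustive search over synonymous codons of `outer`;
    exponential in len(outer), so suitable only for very short peptides.
    Returns one matching DNA string, or None if no encoding exists.
    """
    if offset % 3 == 0:
        raise ValueError("offset must select an alternate reading frame")
    end = offset + 3 * len(inner)
    if offset < 0 or end > 3 * len(outer):
        return None  # inner would not fit inside outer's coding sequence
    for choice in product(*(codons_for(aa) for aa in outer)):
        dna = "".join(choice)
        if translate(dna[offset:end]) == inner:
            return dna
    return None


if __name__ == "__main__":
    # Met-Ala in frame 0 with Trp embedded one nucleotide downstream.
    print(find_overlap("MA", "W", 1))
```

In this toy case the six nucleotides encoding Met-Ala also carry a Trp codon in the +1 frame, so no additional DNA is needed for the second peptide. The degeneracy of the genetic code (several codons per amino acid) is what gives the search room to satisfy both frames at once.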
This research was partially supported by NSF grants EIA-0325123 and DBI-0444815. We thank Eckard Wimmer for his interest and support. We also thank Chen Zhao, Huei-Chi Chen, and Rahul Sinha for discussions and contributions to this research.