Viruses with single-stranded DNA genomes infect hosts in all three domains of life and are considered economically, medically and environmentally important pathogens. Recent studies have shown that these viruses occur in great numbers in highly diverse habitats, ranging from extreme geothermal springs to the gut of humans and other animals. The International Committee on Taxonomy of Viruses (ICTV) currently classifies single-stranded DNA viruses into 10 different taxa. However, several viruses that may belong to additional groups have been isolated, and many of their genomes have been sequenced. All single-stranded DNA viruses that infect eukaryotes possess non-enveloped, icosahedral capsids, as do members of the family Microviridae, which infect bacteria. Single-stranded DNA viruses infecting other prokaryotes have filamentous (Inovirus), rod-shaped (Plectrovirus), coil-shaped (Spiraviridae), or pleomorphic (proposed family “Pleolipoviridae”) virions.
Single-stranded DNA viruses are among the smallest known viruses. Their genomes can be as small as 1–2 kb and encode only two proteins: one for capsid formation and one for genome replication. This irreducible simplicity epitomizes the essence of a virus and makes single-stranded DNA viruses attractive models for investigating virus origins and evolution. Numerous metagenomic studies have revealed a wide range of genetic diversity among single-stranded DNA viruses in the environment, suggesting highly dynamic interactions between these viruses and their hosts. In addition, sequences of single-stranded DNA viruses with the smallest genomes and simplest proteomes have been found to be widespread in cellular chromosomes, providing important new insights into the evolution of these viruses.
1.4.1 Genomes of Bacteriophages
Bacteriophages are small viruses with relatively simple genomes. They were discovered by Frederick Twort in 1915 and by Felix d’Herelle in 1917. Since then they have been studied in many laboratories and used in a variety of practical applications. The density of phages in the oceans is 10⁶–10⁷ particles per ml. The total bacteriophage population has been estimated at 10³¹ particles. Given an estimated 10³⁰ bacterial cells in the biosphere, this puts the ratio of viruses to bacteria in the environment at about 5–10:1. Overall, the prokaryotic population is highly dynamic, with an estimated ~10²³ phage infections per second globally. It has been hypothesized that oceanic bacteriophages infect bacterial cells at a rate of 10²⁹ infections per day, releasing over 10¹¹ kg of carbon from the biological pool per day. Over the past three decades, research has explored the abundance of bacteriophages in nature, their genome diversity, and their impact on the evolution of microbial diversity. It has also examined their use in controlling infectious diseases and their role in regulating the microbial balance of ecosystems, leading to a resurgence of interest in phage research. Phage research played a pivotal role in some of the most significant discoveries in biology, from the identification of DNA as the genetic material to the elucidation of the genetic code, and led to the development of molecular biology. It has continually broken new ground in our understanding of the basic molecular mechanisms of gene structure and expression. More recently, phage genomics has revealed novel biochemical mechanisms for the replication, maintenance, and expression of genetic material. It is also providing new insights into the origins of infectious diseases and into the use of phage gene products, and even whole phages, as agents for gene therapy.
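The abundance figures above can be cross-checked with a few lines of arithmetic. The Python sketch below restates the quoted estimates as constants; the constant and function names are illustrative and do not come from any library. It then derives three values: the virus-to-bacteria ratio, the ocean phage density per litre, and the carbon per infection implied by dividing the two oceanic figures. It is a back-of-the-envelope consistency check, not an independent estimate.

```python
# Order-of-magnitude estimates quoted in the text (illustrative constant names).
PHAGES_IN_BIOSPHERE = 1e31         # total phage particles
BACTERIA_IN_BIOSPHERE = 1e30       # total bacterial cells
OCEAN_PHAGES_PER_ML = (1e6, 1e7)   # phage particles per ml of seawater
OCEAN_INFECTIONS_PER_DAY = 1e29    # hypothesized oceanic infections per day
CARBON_RELEASED_KG_PER_DAY = 1e11  # carbon released from the biological pool per day


def virus_to_bacteria_ratio(phages: float, bacteria: float) -> float:
    """Number of phage particles per bacterial cell."""
    return phages / bacteria


def per_litre(per_ml: float) -> float:
    """Convert a concentration from particles per ml to particles per litre."""
    return per_ml * 1_000


ratio = virus_to_bacteria_ratio(PHAGES_IN_BIOSPHERE, BACTERIA_IN_BIOSPHERE)
low, high = (per_litre(d) for d in OCEAN_PHAGES_PER_ML)
# Carbon per infection implied by dividing the two oceanic figures (1 kg = 1e18 fg).
carbon_per_infection_fg = CARBON_RELEASED_KG_PER_DAY / OCEAN_INFECTIONS_PER_DAY * 1e18

print(f"virus:bacteria ratio ~ {ratio:.0f}:1")
print(f"ocean phage density ~ {low:.0e} to {high:.0e} particles per litre")
print(f"implied carbon per infection ~ {carbon_per_infection_fg:.1f} fg")
```

The ratio of about 10:1 sits at the upper end of the 5–10:1 range quoted above. The implied carbon per infection works out to roughly 1 fg.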
In addition to killing bacterial cells, temperate phages can carry genes for toxins and other critical virulence factors that many bacterial pathogens need to infect humans. Phages also contribute to the diversity of bacterial communities by acting as vectors for the transduction of genetic alleles, such as antibiotic resistance genes, between bacterial cells. Phages also have great medical and nanotechnological potential. Tailed phages have been used for almost 100 years in Russia and Georgia for three purposes: detecting bacteria, curing bacterial diseases through phage therapy, and decontaminating surfaces. In Western countries, phages are currently used to treat agricultural diseases and to prevent food contamination. Phage virions are also being developed as nanocontainers that deliver specific chemical cargoes to specific targets.
Their small size and ease of isolation made bacteriophages the primary choice for complete genome sequencing. Phage φX174 was the first DNA genome to be completely sequenced (5,386 nucleotides of single-stranded DNA). Phage λ was the first double-stranded DNA phage genome to be sequenced (48,502 bp), followed by phage T7 (39,936 bp). The dsDNA tailed mycobacteriophage L5 was the first phage genome from a host other than E. coli to be fully sequenced. Since then, bacteriophage genome sequencing has grown exponentially, driven by two main objectives:
To understand the relationships among phage genomes and the evolutionary mechanisms that shaped bacteriophage populations.
To make greater use of bacteriophages in developing tools, utilities, and techniques for genetics and biotechnology.
Phage genomes vary considerably in size, from Leuconostoc phage L5 (2,435 bp) to Pseudomonas phage 201 (316,674 bp). Tailed phages with double-stranded DNA genomes are rarely smaller than about 15 kbp, consistent with the genes for virion structure and assembly occupying up to 15 kbp of genome space. Siphoviruses have long, flexible, non-contractile tails. They carry a tape measure protein gene whose length (1.5–6 kbp) corresponds to the length of the phage tail, and most phages with siphoviral morphology have genomes longer than 20 kbp. In contrast, phages with genomes larger than 125 kbp are mostly myoviruses with contractile tails, and Bacillus phage SPβc2 (134,416 bp) has the largest siphoviral genome. Why large siphoviruses are absent is still unknown.
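The sketch below gives a rough sense of the size range described above. It computes the ratio between the largest and smallest genomes quoted. It also computes the fraction of a genome that a structural and assembly gene module of up to 15 kbp would occupy at different genome sizes. The dictionary, the helper names, and the 20 kbp example size are hypothetical choices for illustration.

```python
# Genome sizes (bp) quoted in the text; dictionary and helpers are illustrative.
GENOME_SIZES_BP = {
    "Leuconostoc phage L5": 2_435,
    "Bacillus phage SPβc2": 134_416,
    "Pseudomonas phage 201": 316_674,
}
STRUCTURAL_MODULE_BP = 15_000  # upper estimate of genome space for structure/assembly genes


def fold_range(sizes: dict[str, int]) -> float:
    """Ratio of the largest to the smallest genome in the collection."""
    return max(sizes.values()) / min(sizes.values())


def module_fraction(genome_bp: int, module_bp: int = STRUCTURAL_MODULE_BP) -> float:
    """Fraction of a genome that a module of `module_bp` base pairs would occupy."""
    if genome_bp <= 0:
        raise ValueError("genome size must be positive")
    return module_bp / genome_bp


print(f"size range spans ~{fold_range(GENOME_SIZES_BP):.0f}-fold")
for size in (20_000, 134_416, 316_674):  # 20 kbp is a hypothetical small tailed phage
    print(f"{size:>7} bp genome: module occupies {module_fraction(size):.1%}")
```

The two extremes differ about 130-fold. A 15 kbp module would fill three quarters of a 20 kbp genome but less than 5% of the largest one.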
1.4.2 Phage Genome Sequence Diversity
Bacteriophages are estimated to be the most widely distributed biological entities in the biosphere and are found in every habitat where bacteria proliferate. The viral population is dominated by bacteriophages, and double-stranded DNA tailed phages (Caudovirales) account for about 95% of all phages. However, phages of other groups, with different virions, genomes, and lifestyles, are also abundant in the biosphere. Two key approaches are used to study viral diversity. The first is metagenomic analysis of total phage samples concentrated from the environment; the second is a genome-by-genome strategy based on individually isolated phages. The two approaches are complementary and have distinct outcomes. Metagenomics generates large amounts of sequence data, which give good insight into overall diversity. Sequencing individually isolated phages generates smaller data sets that are assembled into complete genomes. Because phage genomes are architecturally mosaic, complete genomes are needed to put the complexities of their relationships in context. Phages whose hosts do not overlap rarely share nucleotide sequence similarity. This is seen when the published genomes of four Streptomyces phages are compared with the collection of 50 mycobacteriophage genomes. Phages that infect a common bacterial host are in genetic contact with one another and can share nucleotide sequences. More than 30 phage genomes each have been isolated and sequenced from Pseudomonas, Staphylococcus, and Mycobacterium hosts, and related sequences occur among them. Nevertheless, many of these phages share little or no nucleotide sequence similarity, as comparisons of mycobacteriophage and Pseudomonas phage genomes illustrate.
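One simple way to quantify whether two phage genomes share nucleotide sequence is to compare their sets of short subsequences (k-mers). The sketch below computes a Jaccard index over k-mer sets. This is a generic technique, not the method used in the studies cited above. The choice of k = 8, the function names, and the toy sequences are assumptions. The sketch also ignores reverse complements and is only meant to show the idea.

```python
def kmers(seq: str, k: int) -> set[str]:
    """Return the set of all length-k substrings (k-mers) of a DNA sequence."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def jaccard_similarity(a: str, b: str, k: int = 8) -> float:
    """Jaccard index of two k-mer sets: 0 = nothing shared, 1 = identical sets."""
    set_a, set_b = kmers(a, k), kmers(b, k)
    union = set_a | set_b
    if not union:
        return 0.0
    return len(set_a & set_b) / len(union)


# Hypothetical toy "genomes": a and b share a 20-nt block; c shares nothing with a.
shared = "ATGCGTACGTTAGCCGATCG"
phage_a = shared + "TTTTGGGGCCCCAAAA"
phage_b = shared + "GACTGACTGACTGACT"
phage_c = "CAGTCAGTCAGTCAGTCAGTCAGT"

print(jaccard_similarity(phage_a, phage_b))  # > 0: shared block detected
print(jaccard_similarity(phage_a, phage_c))  # 0.0: no shared 8-mers
```

Real comparisons use whole genomes, larger k values, and both strands. The toy values only show that a shared block raises the index while unrelated sequences score zero.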
1.4.3 Genome Mosaicism of Phages
Phages evolve not only by accumulating mutations but also through recombination events in which they exchange genetic material with other phages. Such events have been proposed to explain the mosaic structure revealed when two or more phage genomes are compared: nearly identical sequences alternate with merely similar or completely divergent sequences. Exchanges of this type in bacteriophages were first detected by heteroduplex mapping in the early 1970s. Since then, numerous mosaics have been identified by sequence comparison, and the mosaic structure of bacteriophage genomes is now well documented. Mosaicism is also widespread among bacteria, which acquire genes by horizontal genetic exchange, mostly through transduction, transformation, and conjugation. However, the extent of mosaicism in phage genomes is remarkable, as the growing number of genomes available for comparative genomics shows.
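A sliding-window identity profile can make visible the alternation of nearly identical, similar, and divergent segments described above. This sketch assumes the two genomes are already aligned to equal length. The window size, the step, and the 0.95 and 0.6 thresholds used to label windows are hypothetical choices, not values from the text.

```python
def window_identity(a: str, b: str, window: int = 10, step: int = 5) -> list[tuple[int, float]]:
    """Fraction of identical positions in sliding windows along two pre-aligned sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    if window <= 0 or step <= 0:
        raise ValueError("window and step must be positive")
    profile = []
    for start in range(0, len(a) - window + 1, step):
        matches = sum(x == y for x, y in zip(a[start:start + window], b[start:start + window]))
        profile.append((start, matches / window))
    return profile


def label(identity: float, identical: float = 0.95, similar: float = 0.6) -> str:
    """Classify a window using hypothetical identity thresholds."""
    if identity >= identical:
        return "nearly identical"
    if identity >= similar:
        return "similar"
    return "divergent"


# Hypothetical pre-aligned toy genomes built from three 10-nt segments.
seq_a = "ACGTACGTAC" + "GGGGGGGGGG" + "TTTTTTTTTT"
seq_b = "ACGTACGTAC" + "GGGAGGGCGG" + "AAAAAAAAAA"
for start, identity in window_identity(seq_a, seq_b, window=10, step=10):
    print(start, identity, label(identity))
```

On the toy pair, the three windows come out as nearly identical, similar, and divergent, mirroring the pattern seen at mosaic boundaries.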
Genome mosaicism in bacteriophages can be examined at two levels: (1) by DNA heteroduplex mapping, and (2) by direct comparison of DNA sequences. Two models have been proposed for the recombination mechanisms responsible for these patterns. In Model 1, short conserved boundary sequences located at gene junctions target exchange events, which occur by homologous recombination catalyzed by host- or phage-encoded recombinases. In Model 2, recombination events are not specifically targeted but occur essentially at random, with a preference for only a few short sequences, so that most events produce non-functional genomic ‘trash’. Comparing the predicted amino acid sequences of phage gene products is an alternative way of revealing mosaicism. This approach is informative because many phages, including some that infect a common host, share no detectable nucleotide sequence similarity. In such cases, protein sequence comparisons can reveal genes that share a much older common ancestry.
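Protein comparisons can detect relationships that nucleotide comparisons miss because the genetic code is redundant: different codons can encode the same amino acid. The sketch below uses the standard genetic code to translate two hypothetical five-codon sequences. They differ at a third of their nucleotide positions yet encode the same peptide. The sequences and function names are illustrative only.

```python
# Standard genetic code built from the TCAG codon ordering; '*' marks stop codons.
BASES = "TCAG"
CODONS = [a + b + c for a in BASES for b in BASES for c in BASES]
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = dict(zip(CODONS, AMINO_ACIDS))


def translate(dna: str) -> str:
    """Translate an in-frame DNA coding sequence codon by codon."""
    dna = dna.upper()
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna) - 2, 3))


def identity(a: str, b: str) -> float:
    """Fraction of identical positions between two equal-length sequences."""
    if len(a) != len(b) or not a:
        raise ValueError("sequences must be non-empty and of equal length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


# Hypothetical genes that use different synonymous codons for the same peptide.
gene_a = "ATGGCTCGTTTAGGT"  # ATG GCT CGT TTA GGT
gene_b = "ATGGCGAGATTGGGC"  # ATG GCG AGA TTG GGC

print(round(identity(gene_a, gene_b), 3))             # nucleotide identity
print(identity(translate(gene_a), translate(gene_b)))  # amino acid identity
```

The nucleotide identity is about 67%, while the translated peptides are identical.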
1.4.4 Genomes of Enterobacteria Phages M13 and λ
Enterobacteria phage M13 infects E. coli. Its genome is a 6.4-kb single-stranded, (+) sense, circular DNA that encodes 10 genes. Unlike most phages, which have icosahedral capsids, M13 has a filamentous capsid that can be lengthened by adding further protein subunits. Hence, the genome can be enlarged by inserting extra sequences into the nonessential intergenic region without becoming too large to be packaged into the capsid (Fig. 1.3).
In λ phage, packaging constraints are much more rigid: only DNA of ∼46–54 kbp, close to the normal genome size, can be packaged into the capsid. The substrate packaged into phage heads during assembly consists of long concatemers of phage DNA produced during the later stages of vegetative replication. The DNA is apparently reeled into the phage head. Once a complete genome has been incorporated, a phage-encoded endonuclease cleaves the DNA at a specific sequence, the cos site, leaving a 12-nucleotide 5′ overhang at each end. Hydrogen bonding between these ‘sticky ends’ can circularize the molecule (Fig. 1.4).
In a newly infected cell, DNA ligase seals the nicks on either side of the annealed cos site. The resulting circular DNA can then undergo either vegetative replication or integration into the bacterial chromosome.
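The cleavage and circularization steps above can be modeled with strings. In this sketch, a concatemer of a repeat unit is cut at the same position in every repeat. The bottom-strand cut is offset 12 nucleotides from the top-strand cut, so each released molecule carries a 12-nucleotide 5′ extension at both ends. The sketch then checks that the two ends are complementary. The repeat unit is a hypothetical sequence, not the real λ cos sequence, and the model represents sequence only.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA strand written 5'->3'."""
    return seq.translate(COMPLEMENT)[::-1]


def cut_concatemer(unit: str, copies: int, overhang: int = 12) -> list[tuple[str, str]]:
    """Staggered cut at the same site in every repeat of a head-to-tail concatemer.

    The top strand is cut at the start of each repeat and the bottom strand `overhang`
    nucleotides further right, so each molecule has a 5' extension at both ends.
    Molecules are returned as (top strand 5'->3', bottom strand 5'->3').
    """
    if not 0 < overhang < len(unit):
        raise ValueError("overhang must be shorter than the repeat unit")
    length = len(unit)
    concatemer = unit * copies
    molecules = []
    # The final repeat lacks the downstream sequence needed for the bottom-strand cut.
    for k in range(copies - 1):
        start, end = k * length, (k + 1) * length
        top = concatemer[start:end]
        bottom = reverse_complement(concatemer[start + overhang:end + overhang])
        molecules.append((top, bottom))
    return molecules


def cohesive_ends(molecule: tuple[str, str], overhang: int = 12) -> tuple[str, str]:
    """5' extensions at the left end (top strand) and right end (bottom strand)."""
    top, bottom = molecule
    return top[:overhang], bottom[:overhang]


def can_circularize(molecule: tuple[str, str], overhang: int = 12) -> bool:
    """True if the two single-stranded ends can base-pair with each other."""
    left, right = cohesive_ends(molecule, overhang)
    return left == reverse_complement(right)


# Hypothetical 60-nt repeat unit (not the real lambda sequence).
unit = "CATGCTAGGCTA" + "ATCG" * 12
molecules = cut_concatemer(unit, copies=3)
print(len(molecules), [can_circularize(m) for m in molecules])
```

Every cut falls at the same position in the repeat. As a result, the left and right extensions are always reverse complements of each other, which is why the ends can anneal to form a circle.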
1.4.5 The Genome of T4 Phage
Bacteriophages T2 and T4 have been model organisms instrumental in the development of modern genetics and molecular biology since the 1940s. They contributed to many key concepts in the biological sciences, including:
the recognition of nucleic acids as the genetic material
the definition of the gene through structural, mutational, recombinational, and functional analyses
the demonstration of the triplet genetic code
the identification of mRNA
the importance of recombination in DNA replication
light-dependent and light-independent DNA repair mechanisms
restriction and modification of DNA
self-splicing introns in prokaryotes
The main advantage of T4 as a model system is its ability to shut off host gene expression completely, allowing investigators to distinguish phage-specific from host-specific macromolecular synthesis. Analyses of T4 capsid assembly have yielded important insights into macromolecular interactions, substrate channeling, and cooperation between phage and host proteins within such complexes. So have analyses of how its nucleotide-synthesizing complex, replisome, and recombination complexes function.
The T4 genome is considered one of the best opportunities to understand and evaluate the complete genome of a well-characterized biological system. On the basis of all available information, the 168,903-bp T4 genome contains ~300 probable genes: 289 probable protein-coding genes, 8 tRNA genes, and at least 2 genes encoding small, stable RNAs of unknown function. Genes 16, 17, and 49 contain multiple coding regions that encode more than one protein. The gene density of T4 is about four times that of herpesviruses and yeast and twice that of E. coli. Only ~9 kb, or 5.3% of the genome, is non-coding. Regulatory regions are compact and occasionally overlap coding regions. Another significant feature is that the termination codon of one gene often overlaps the start codon of the next, and T4 has several groups of nested genes. Only 62 genes are absolutely essential under standard laboratory conditions (rich medium, aeration, 30 to 37 °C), although mutations in a few other genes produce very small plaques under the same conditions. Many of the 62 essential genes are larger than the average T4 gene, and together they occupy about half of the genome. They encode proteins of the replisome and the nucleotide precursor complex, transcriptional regulatory factors, and proteins involved in the structure and assembly of the phage particle. The T4 genome also illustrates terminal redundancy, a rare molecular feature of certain linear viral genomes. Replication of the T4 genome produces long DNA concatemers. An endonuclease cuts these into pieces slightly longer than one complete genome, so that the same sequence is repeated at both ends of each packaged molecule. This terminally redundant DNA is packed into the phage head.
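Terminal redundancy can be illustrated with a simplified model. It assumes sequence-independent cuts made each time a fixed length longer than one genome (a "headful") has been packaged; this headful assumption is not stated in the text above. Under it, each packaged piece repeats its first few nucleotides at its far end, and successive pieces start at different points in the genome. The 20-nt unit and the 23-nt headful are hypothetical.

```python
def headful_package(unit: str, copies: int, headful: int) -> list[str]:
    """Cut a concatemer of `unit` into successive pieces of `headful` nucleotides.

    Simplified model: cuts are sequence-independent and occur each time a fixed
    length (one "headful") has been packaged.
    """
    if headful <= len(unit):
        raise ValueError("headful must exceed the unit length to give terminal redundancy")
    concatemer = unit * copies
    return [concatemer[start:start + headful]
            for start in range(0, len(concatemer) - headful + 1, headful)]


def terminal_redundancy(piece: str, unit_length: int) -> str:
    """Sequence present at both ends of a packaged piece (empty if not redundant)."""
    extra = len(piece) - unit_length
    if extra > 0 and piece[:extra] == piece[-extra:]:
        return piece[:extra]
    return ""


# Hypothetical 20-nt "genome" packaged 23 nt at a time (3 nt of redundancy).
unit = "ACGTTGCAAAACAAACAAAC"
pieces = headful_package(unit, copies=10, headful=23)
for piece in pieces[:3]:
    print(piece, terminal_redundancy(piece, len(unit)))
```

Each piece begins and ends with the same 3 nt, and the starting point shifts by 3 nt from one piece to the next.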
Three T4 genes contain introns that are spliced out of their transcripts. These genes encode thymidylate synthase (td), a subunit of the aerobic ribonucleotide reductase (nrdB), and the anaerobic ribonucleotide reductase (nrdD). Gene 60 of T4 shows an unusual relationship between nucleic acid and protein sequence that arises from translational bypassing. A 50-nucleotide mRNA segment within the coding region of this gene is not translated by the regular mechanism but is skipped by the ribosome. This segment is the only known high-efficiency translational bypass site in the T4 genome.
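Translational bypassing can be pictured as the ribosome skipping a segment of the mRNA and resuming translation downstream. This toy sketch uses the standard genetic code and hypothetical sequences. A 50-nt segment starts with an in-frame stop codon, and the codons after it are translated only when the segment is skipped. The sketch does not model the actual takeoff and landing signals of gene 60.

```python
# Standard genetic code built from the TCAG codon ordering; '*' marks stop codons.
BASES = "TCAG"
CODONS = [a + b + c for a in BASES for b in BASES for c in BASES]
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = dict(zip(CODONS, AMINO_ACIDS))


def translate_until_stop(mrna: str) -> str:
    """Translate codon by codon from the first base until a stop codon or the end."""
    seq = mrna.upper().replace("U", "T")
    protein = []
    for i in range(0, len(seq) - 2, 3):
        amino_acid = CODON_TABLE[seq[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


def translate_with_bypass(mrna: str, takeoff: int, gap: int) -> str:
    """Toy bypass: skip `gap` nucleotides starting at index `takeoff`, then keep translating."""
    return translate_until_stop(mrna[:takeoff] + mrna[takeoff + gap:])


# Hypothetical toy mRNA (DNA alphabet): 3 codons, a 50-nt segment beginning with an
# in-frame stop codon, then codons that are reached only by bypassing the segment.
upstream = "ATGAAAGGA"
gap_segment = "TAA" + "C" * 47
downstream = "GCTTGGTAA"
mrna = upstream + gap_segment + downstream

print(translate_until_stop(mrna))  # straight read-through
print(translate_with_bypass(mrna, takeoff=len(upstream), gap=len(gap_segment)))
```

Read straight through, the toy message stops after three amino acids. With the 50-nt segment bypassed, translation continues and yields a five-residue product. Because 50 is not a multiple of three, the downstream codons are read in a different frame from the one a straight read-through would use.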
T4 DNA contains only 34.5% GC, compared with 50% GC in the genome of its host, E. coli. Only 18 known or predicted genes have less than 60% AT, and only 4 predicted genes have less than 58% AT. Capsid proteins, which are the most widely conserved proteins among T4-related phages, are encoded by the genes with the lowest AT content. Gene 23, which encodes the major head protein, has the lowest AT content, 55%. There is a substantial asymmetry between G and C in the coding strands of translated regions. Only 4 genes have more than 20% C in the coding strand, whereas more than 130 genes have more than 20% G and 37 genes have more than 22% G. A and T, in contrast, are roughly equally divided between the coding strands. However, some AT bias is present in the T4 genome. It is strongest at the third codon position, as expected for a genome with a high AT content.
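The base-composition measures discussed above are straightforward to compute. The sketch below gives three of them. The first is GC content. The second is GC skew, (G - C)/(G + C), a common measure of G-versus-C asymmetry on one strand that the text itself does not use. The third is the AT fraction at each codon position of an in-frame coding sequence. The function names and the four-codon example are hypothetical.

```python
def gc_content(seq: str) -> float:
    """Fraction of G + C in a DNA sequence."""
    seq = seq.upper()
    if not seq:
        raise ValueError("empty sequence")
    return (seq.count("G") + seq.count("C")) / len(seq)


def gc_skew(seq: str) -> float:
    """(G - C) / (G + C); positive when G outnumbers C on this strand."""
    seq = seq.upper()
    g, c = seq.count("G"), seq.count("C")
    return (g - c) / (g + c) if g + c else 0.0


def at_by_codon_position(cds: str) -> tuple[float, float, float]:
    """AT fraction at codon positions 1, 2 and 3 of an in-frame coding sequence."""
    cds = cds.upper()
    codons = [cds[i:i + 3] for i in range(0, len(cds) - 2, 3)]
    if not codons:
        raise ValueError("sequence shorter than one codon")
    return tuple(sum(codon[p] in "AT" for codon in codons) / len(codons) for p in range(3))


# Hypothetical 4-codon coding strand: AT-poor at positions 1-2, AT-rich at position 3.
cds = "GCAGGTCGTGAA"
print(round(gc_content(cds), 3), round(gc_skew(cds), 3), at_by_codon_position(cds))
```

Run on real T4 coding sequences rather than the toy example, these functions would quantify the G-versus-C asymmetry and the third-position AT bias described above.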