Background

Until fairly recently, it has been customary, in the absence of clinically significant infection, to view the human organism as an isolated entity. In fact, the healthy human body always contains a large number of foreign cells and viruses (Virgin et al., 2009; Dethlefsen et al., 2007; Relman, 2002). There are more viral particles in the human body than microbial cells, which in turn are ten times more numerous than eukaryotic (human) cells. Similarly, only about 1.5% of the human genome encodes recognizable “human proteins,” whereas approximately 45% of our genome consists of retrotransposons, DNA transposons, and viral sequences. Most of the human-associated microbes and viruses, often found on “external” surfaces lining the lumens of organs such as the gut and the oral and nasal cavities, participate in complex commensal or mutualistic relationships with their human host (Dethlefsen et al., 2007; Relman, 2002). Therefore, attempting to eradicate every virus and microbial cell from the body in response to infection is not advantageous. A new medical paradigm is emerging: an illness may be defined by a disruption of the normal “healthy” microbiome and/or virome, and restoration of this state, not elimination of all nonhuman organisms, should be the goal of medical treatment (Harrison, 2007). Current interest in the human microbiome reflects the increasing acceptance of the view that the microbiota should not be seen merely as invasive disease vectors but are in fact an intrinsic part of the human supra-organism (Dethlefsen et al., 2007).

The classical method of viral isolation is culturing. Koch’s postulates (Rivers, 1937) dictate the conditions under which a virus cultured in vitro should be regarded as the cause of an infectious disease, and human viruses are usually cultured only in this context. In addition, culturing succeeds only for the small fraction of viruses for which appropriate culture conditions can be determined. Moving beyond the limited view that all viruses are intrinsically harmful requires new methodologies that enable us to characterize entire uncultured viral communities. A culture-independent metagenomic approach to viral community analysis will yield a broader view of the human virome, just as metagenomic sequencing has revealed a wider range of bacteria in the human microbiome than culture-based methods (Harris et al., 2007; Rogers et al., 2004).

Metagenomics

A viral metagenome or virome is the total genetic (DNA and RNA) sequence derived from a viral community. Mathematically, the structure of a community may be represented by a rank-abundance curve whose functional form (lognormal, power function, etc.) reflects the relative abundance distribution of its members. The evenness of the distribution (the fractional contribution of each genotype) and the richness (the total number of genotypes) are often combined to denote the diversity of a community, as in the Shannon–Wiener index (\(H'\)),

$$H' = -\sum_{i=1}^{S} r_i \ln r_i \qquad (4.1)$$

where \(S\) is the sample richness and \(r_i\) is the relative abundance of genotype \(i\). Viral communities tend to be unevenly distributed, with a small number of species or genotypes dominating in abundance (Fig. 4.1).
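As an illustration of Eq. (4.1), the following minimal Python sketch computes \(H'\) from per-genotype counts. It also computes Pielou’s evenness \(J' = H'/\ln S\), a common way to isolate the evenness component; that index is not part of the original text and is included only as an illustrative assumption, and the function names are hypothetical.

```python
import math


def shannon_wiener(counts):
    """Shannon-Wiener index H' = -sum(r_i * ln r_i) from per-genotype counts."""
    total = sum(counts)
    h = 0.0
    for c in counts:
        if c > 0:  # absent genotypes contribute nothing (0 * ln 0 is taken as 0)
            r = c / total  # relative abundance r_i
            h -= r * math.log(r)
    return h


def pielou_evenness(counts):
    """Pielou's evenness J' = H' / ln S, with S = number of genotypes observed.

    J' is undefined for a single genotype; this sketch returns 0.0 in that case.
    """
    s = sum(1 for c in counts if c > 0)  # sample richness S
    return shannon_wiener(counts) / math.log(s) if s > 1 else 0.0


even = [25, 25, 25, 25]  # four equally abundant genotypes (toy counts)
uneven = [97, 1, 1, 1]   # one dominant genotype (toy counts)
print(shannon_wiener(even), pielou_evenness(even))
print(shannon_wiener(uneven), pielou_evenness(uneven))
```

For four equally abundant genotypes, \(H' = \ln 4 \approx 1.386\) and \(J' = 1\); a community dominated by a single genotype gives a much lower \(H'\), as expected for the uneven distributions typical of viral communities.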

Fig. 4.1

Example of a human viral community rank-abundance curve using sequences from a human oropharyngeal metagenome. Here the relative frequencies of BLASTn hits to a viral sequence database follow a relationship that can be approximated by a power-law equation of the type \(y = a x^{-b}\) (Willner et al., 2010)
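A power law of this type becomes a straight line on log-log axes, since \(\ln y = \ln a - b \ln x\). The sketch below fits \(a\) and \(b\) by ordinary least squares on the log-transformed ranks and frequencies. This is one simple fitting approach, assumed here for illustration; it is not necessarily the procedure used by Willner et al. (2010), and the function name is hypothetical.

```python
import math


def fit_power_law(freqs):
    """Fit y = a * x**(-b) to a rank-abundance curve; returns (a, b).

    freqs are relative frequencies (or hit counts); they are sorted in
    descending order so that rank x = 1 is the most abundant genotype.
    """
    ys = sorted((f for f in freqs if f > 0), reverse=True)
    if len(ys) < 2:
        raise ValueError("need at least two nonzero abundances")
    log_x = [math.log(rank) for rank in range(1, len(ys) + 1)]
    log_y = [math.log(y) for y in ys]
    n = len(ys)
    mean_x = sum(log_x) / n
    mean_y = sum(log_y) / n
    sxx = sum((x - mean_x) ** 2 for x in log_x)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(log_x, log_y))
    slope = sxy / sxx                    # slope of ln y vs ln x equals -b
    intercept = mean_y - slope * mean_x  # intercept equals ln a
    return math.exp(intercept), -slope


# Synthetic check: frequencies generated from a = 0.5, b = 1.2 (not real data).
synthetic = [0.5 * rank ** -1.2 for rank in range(1, 21)]
a, b = fit_power_law(synthetic)
print(f"a = {a:.3f}, b = {b:.3f}")  # a = 0.500, b = 1.200
```

Because the input is sorted before ranks are assigned, frequencies can be passed in any order. For formal analyses, other fitting methods may be preferable.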

Metagenomics has been greatly facilitated by recent advances in sequencing technology. Pyrosequencing (Roche/454 Life Sciences) and other technologies (e.g., Solexa, SOLiD) enable routine DNA sequencing on the scale of \(10^8\) bp. All of these high-throughput methods replace traditional cloning in bacteria with physical separation of individual DNA molecules (e.g., emulsion PCR (emPCR) with DNA immobilized on beads for 454). This allows the creation of a minimally biased DNA library that better reflects the viral community in the sample. At present, the original DNA sample must often be amplified before sequencing, increasing the opportunity for artificial over- or underrepresentation of particular sequences. Despite this limitation, these methods appear to avoid most of the problems associated with conventional cloning, which is strongly biased against some “unclonable” sequences; this bias appears to be particularly pronounced for viral sequences. Microarrays, likewise, remain semi-quantitative detection methods because it is impossible to optimize hybridization conditions simultaneously for thousands of individual sequences (Table 4.1).

Table 4.1 Examples of viruses detected in human samples by metagenomic methods

Approximate Number and Distribution of Viruses in the Human Body

How Many Viruses Are There in a Human?

We can approach this question from two directions: by estimating the number of phages expected from the size of the human microbiome and the typical viral (phage):host ratio, or by directly counting viruses in samples from healthy individuals. The human body is composed of about \(10^{13}\) cells (Savage, 1977), and about ten times this number of microbial cells are associated with the healthy human body (Savage, 1977). The observed ratio of 7–10 virus-like particles per microbial cell in environmental (Rohwer, 2003) and human samples (Furlan, 2009) means that we could expect to find about \(10^{15}\) phages in the body. This prediction can be compared with the results of recent studies. The data in Table 4.2 are from direct counts of viruses using epifluorescence microscopy and indicate the presence of approximately \(3 \times 10^{12}\) viruses in the body.
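The ratio-based estimate above is simple arithmetic. In this sketch the default values are taken from the text (about \(10^{13}\) human cells, a tenfold excess of microbial cells, and 7–10 virus-like particles per microbial cell), and the direct-count total is the approximate figure quoted from Table 4.2; the function and constant names are hypothetical.

```python
def predicted_phage_count(human_cells=1e13, microbes_per_human_cell=10, vlps_per_microbe=10):
    """Expected phages = human cells x microbial excess x VLP:host ratio."""
    microbial_cells = human_cells * microbes_per_human_cell
    return microbial_cells * vlps_per_microbe


OBSERVED_DIRECT_COUNT = 3e12  # approximate total from direct counts (Table 4.2)

for ratio in (7, 10):
    predicted = predicted_phage_count(vlps_per_microbe=ratio)
    print(f"{ratio} VLPs per microbial cell: {predicted:.0e} predicted, "
          f"{predicted / OBSERVED_DIRECT_COUNT:.0f}x the direct count")
```

On these figures, the ratio-based prediction (\(7 \times 10^{14}\) to \(10^{15}\)) is a few hundred times higher than the direct count.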

Table 4.2 The diversity and estimated viral load of nonviremic humans

Abundance of Viruses at Specific Body Sites

Wherever microbes (bacteria and archaea) are present, their viruses will be found. Thus, in the human body, the regions with the highest microbial densities, in particular the gut, also have the highest abundance of viruses. Other organ systems with mucous membranes, such as the nasal and oral cavities and the vagina, harbor smaller but significant viral communities.

What Types of Viruses Inhabit the Human Host?

Compared with environmental viral communities, the diversity of the human virome is low. We estimate that there are 1,500 viral genotypes in a typical healthy human virome. By contrast, 1 kg of marine sediment contains at least ten thousand, and perhaps a million, viral genotypes. The human-associated viruses are unevenly distributed, with the bulk of the virome composed of a handful of dominant species (Table 4.3). From the limited data available to date, a disease state appears to correlate with increased diversity of the virome (Willner et al., 2009a). Most of the viruses are phages. Certain eukaryotic viruses, such as herpesviruses, anelloviruses, and papillomaviruses, are also ubiquitous in the human virome and tend to cause few problems considering their abundance (Virgin et al., 2009). See also Fig. 4.3.

Fig. 4.2

Virus-like particles (VLPs) from asymptomatic individuals. (a) respiratory tract; (b) gut. Viruses were purified and concentrated by CsCl density gradient centrifugation as described in Breitbart et al. (2003). The VLPs were visualized by capturing on a 0.02-μm Anodisc filter, SYBR Gold staining, and viewing by epifluorescence microscopy

Fig. 4.3

Residence time vs. symbiotic modality of selected viruses. EBV, Epstein–Barr virus; HIV, human immunodeficiency virus; HSV, herpes simplex virus; TTV, Torque Teno virus; PMMV, pepper mild mottle virus

Table 4.3 Comparison of diversity indices of typical environmental and human metagenomes. Values calculated using PHACCS (Angly et al., 2005) utilize all data, not merely identified sequences

Phage Community

Commensal microbes are ubiquitous in the healthy human body (Dethlefsen et al., 2007; Wilson, 2005), occupying niches on the skin (Grice et al., 2008, 2009), in the distal gut (Gill et al., 2006; Turnbaugh et al., 2009), and in the vagina (Hyman et al., 2005). As a result, viruses that infect microbes (phages) are numerous (Letarov and Kulikov, 2009) and have been found in the gut (Reyes et al., 2009), nasopharynx (Allander et al., 2005), oropharynx (Willner et al., 2010), oral cavity (Hitch et al., 2004), blood (Breitbart and Rohwer, 2005), and lung secretions (Willner et al., 2009a).

Phages make up by far the majority of the human virome (Willner et al., 2009a, 2010) and can be expected to influence the human microbial community (Gill et al., 2006; Hendrix, 2005) in ways that parallel the interactions observed in a variety of environmental samples (Letarov and Kulikov, 2009; Weinbauer, 2006; Rodriguez-Mueller et al., 2010; Breitbart et al., 2005). By killing specific host organisms, phages regulate the absolute and relative abundance of microbial species (Breitbart et al., 2005). Genetic variation in the hosts is therefore favored as a means of escaping phage predation (Kunin et al., 2008). In addition, phages are major vehicles of DNA transfer to and from host cells (horizontal gene transfer) through both lytic and lysogenic pathways (Little, 2005), potentially conferring new phenotypes that can increase the pathogenicity (Breitbart et al., 2005) or the fitness (Sharon et al., 2009; Wagner and Waldor, 2002) of the host. Analysis of the phage metagenome can thus provide information not only about potential host taxonomy but also about the metabolic pathways potentially available to the microbial community (Willner et al., 2009a; Sharon et al., 2009). Box 4.1 shows the “core” phage metagenome found in the human lower respiratory tract: 19 phage types that were present in all five normal control subjects and all five cystic fibrosis patients (Willner et al., 2009a) (Fig. 4.2).

Eukaryotic Viruses

Viruses capable of infecting the human host (“eukaryotic viruses”), while obviously present in diseased individuals, can also be found in healthy subjects (Virgin et al., 2009; Willner et al., 2009a, 2010). In asymptomatic subjects, these viruses are far less abundant than phages (Willner et al., 2009a, 2010). Depending on the area of the body under examination, eukaryotic viruses may be present because of transient environmental exposure of accessible regions (e.g., the lungs) or because of chronic infections that do not give rise to recognizable clinical symptoms. The lack of symptoms might reflect a low-level viral infection that is successfully suppressed by the immune system at an early stage, or perhaps a commensal virus that causes no apparent harm (Virgin et al., 2009; Stapleton et al., 2004; Okamoto, 2009; Antonsson et al., 2000). An example of the latter is Torque Teno virus (TTV), which was originally thought to be associated with a form of hepatitis but now seems likely to be a ubiquitous, benign commensal virus (Okamoto, 2009). Instances of true viral–human mutualism in this context are not yet well understood, but it has been suggested that co-infection with GB virus type C (originally termed hepatitis G virus) reduces mortality in HIV-infected individuals (Stapleton et al., 2004). Box 4.2 shows the “core” eukaryotic viral metagenome found in the human lower respiratory tract: 20 viruses that were present in all five normal control subjects and all five cystic fibrosis patients (Willner et al., 2009a).

Residence Time and Pathogenicity

We can characterize viruses by their persistence (residence time in the body) and the degree of mutualism they exhibit (Fig. 4.3). The viruses that make up the core human virome are relatively persistent (not cleared from the body), which distinguishes them from pathogenic viruses causing acute, short-lived infections. A number of pathogenic viruses, however, such as herpesviruses, may persist in the body in an intracellular form, only to cause sporadic shedding of viral particles. Still other viruses are transient but common members of the human virome; plant viruses such as PMMV, for example, are taken in with food and pass directly through the digestive tract (Zhang et al., 2005).

Viral Metagenomics Methods

Investigation of the human virome has recently been accelerated by technological and methodological developments. The methods fall into three categories: viral nucleic acid isolation, DNA sequencing, and data analysis. For a review of methods in viral metagenomics see Delwart (2007).

Recovery from Microarrays

In this approach, nucleic acids from a sample are hybridized to an array (Virochip) containing sequences representing all fully sequenced viruses; the annealed DNA is then physically removed from the array and PCR amplified using primers complementary to linkers that had been added (Kistler et al., 2007; Wang et al., 2002; Chiu et al., 2008). The prime example of this approach is the cloning and sequencing of the SARS coronavirus (Ksiazek et al., 2003). Limitations of the method are that it will succeed only with viruses that share significant homology with previously known viruses, and that simultaneous optimization of multiple hybridizations on an array may be impossible.

Random RT-PCR

There are several variations of randomly primed reverse-transcription PCR (RT-PCR) for amplification of RNA viral sequences. Viral RNA is converted to cDNA using primers containing random octamers for both first- and second-strand synthesis, followed by PCR amplification. These methods have been successful in identifying many RNA viruses from human samples. Examples can be found in Victoria et al. (2009), Nakamura et al. (2009), and Jones et al. (2005). The method may be limited by PCR amplification bias, but it is highly sensitive.

Virus Purification and Phi29 Amplification

DNA viral metagenomes, including many phages, have been sequenced by purifying viral particles by CsCl density gradient centrifugation, treating them with DNase, isolating the DNA, and randomly amplifying it with Phi29 DNA polymerase. Examples are respiratory tract metagenomes (mostly phages) from cystic fibrosis (CF) and non-CF subjects (Willner et al., 2009a) and an oropharyngeal metagenome from pooled samples from 19 healthy individuals (Willner et al., 2010). Limitations include potential amplification bias (Phi29 polymerase favors small circular and large linear genomes). This method has proved more successful for DNA than for RNA viruses.

Sequencing Methods

Because of the “untargeted” nature of metagenomics and the often unavoidable contamination of viral nucleic acids with large amounts of human DNA, high-throughput sequencing has been essential. To date, the Roche/454 Life Sciences GS-FLX platform has been at the forefront of this technology, particularly because long sequence reads are necessary for shotgun sequencing. Sequencing technology is currently expanding at an unprecedented rate, however, and further significant changes in sequencing methodology in the near future would not be surprising.

Bioinformatics

Data analysis is often the most challenging aspect of metagenomics research because the results are not pre-filtered by culturing or any other selection process; the desired information must be extracted from a very large data set. Bioinformatics methods can be divided into two broad categories: similarity-dependent and similarity-independent approaches.

Similarity-Dependent Analyses

The original and more conventional approach to sequence data analysis is to search databases for segments of similarity to known sequences. The most common tools are the various versions of BLAST (McGinnis and Madden, 2004), which find local similarities based on the nucleic acid sequence or the deduced amino acid sequence. Microarray hybridization patterns have also been used to characterize novel viral nucleic acids (Urisman et al., 2005). These approaches are limited when the sample contains novel viruses that share little similarity with known viruses; viruses in particular are subject to great variation in sequence composition. A large percentage of the sequences in a typical viral metagenome will not significantly resemble any known sequence.
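BLAST finds local similarities with a fast seed-and-extend heuristic and reports statistical significance as E-values. The exact dynamic-programming formulation it approximates is Smith–Waterman local alignment, sketched below with an assumed, illustrative scoring scheme (match +2, mismatch −1, linear gap −2; these are not BLAST defaults). The helper `classify_read` and its raw-score threshold are hypothetical; real pipelines use E-value cutoffs. A read whose best score falls below the threshold is left unassigned, which is how “unknown” sequences arise in this kind of analysis.

```python
def local_alignment_score(a, b, match=2, mismatch=-1, gap=-2):
    """Best Smith-Waterman local alignment score with a linear gap penalty."""
    prev = [0] * (len(b) + 1)  # scores for the previous row of the DP matrix
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = prev[j] + gap        # gap in sequence b
            left = curr[j - 1] + gap  # gap in sequence a
            # Clamping at 0 lets an alignment start anywhere: the "local" part.
            curr[j] = max(0, diag, up, left)
            best = max(best, curr[j])
        prev = curr
    return best


def classify_read(read, reference_db, min_score):
    """Return the best-matching reference name, or None if no hit reaches min_score."""
    scored = ((name, local_alignment_score(read, seq)) for name, seq in reference_db.items())
    name, score = max(scored, key=lambda hit: hit[1], default=(None, 0))
    return name if score >= min_score else None


db = {"ref_phage": "TTTACGTTTT"}  # hypothetical toy reference, not a real sequence
print(classify_read("GGGACGTGGG", db, min_score=8))   # ref_phage (shared ACGT scores 8)
print(classify_read("GGGACGTGGG", db, min_score=10))  # None: treated as "unknown"
```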

Metabolic Pathways

A metagenome can be characterized not only by taxonomy but also by the cumulative metabolic potential it encodes (Meyer et al., 2008). In viral sequence data derived from the lung sputum of CF patients and healthy subjects, the disease state of individuals correlated more strongly with the metabolic potential of the viral metagenomes than with their taxonomic composition (Willner et al., 2009a). In many cases the phage community appears to carry genes that complement the functions of the microbial community. In particular, phages often seem to carry genes encoding proteins that increase the short-term energy output of the host cells, either to increase host viability (lysogeny) or to boost the production of viral particles (lytic infection). Some bacteria, such as Vibrio cholerae, depend on phage infection to achieve their virulence.
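Comparing metagenomes by metabolic potential amounts to comparing profiles of functional annotations. The sketch below converts counts of reads assigned to functional categories into relative abundances and compares two profiles with the Bray–Curtis dissimilarity (0 for identical profiles, 1 for profiles with no categories in common). The choice of Bray–Curtis, the function names, and the category labels are illustrative assumptions, not the procedure of Meyer et al. (2008) or Willner et al. (2009a).

```python
def relative_profile(counts):
    """Convert {category: read count} into {category: relative abundance}."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("profile has no annotated reads")
    return {category: n / total for category, n in counts.items()}


def bray_curtis(p, q):
    """Bray-Curtis dissimilarity between two abundance profiles (dicts)."""
    categories = set(p) | set(q)
    diff = sum(abs(p.get(c, 0.0) - q.get(c, 0.0)) for c in categories)
    total = sum(p.get(c, 0.0) + q.get(c, 0.0) for c in categories)
    return diff / total


# Hypothetical toy counts, invented for illustration only (not real data).
sample_a = relative_profile({"category_1": 60, "category_2": 30, "category_3": 10})
sample_b = relative_profile({"category_1": 20, "category_2": 30, "category_3": 50})
print(round(bray_curtis(sample_a, sample_b), 2))
```

A matrix of such pairwise dissimilarities can then be examined for grouping by disease state, just as a taxonomic profile can.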

Similarity-Independent Analyses

More recently, similarity-independent methods have been developed that do not require database searches. For example, PHACCS (Angly et al., 2005) uses contig spectra derived from the sequence data to infer the diversity of genotypes present in the original sample. Other methods compare metagenomes on the basis of the relative abundance of shared sequences. These methods do not identify unknown viruses, but they can help to characterize a sample by defining the overall complexity of its community. Still other methods analyze the G+C content of genomes or the relative frequencies of the various dinucleotides (Karlin et al., 1997; Burge et al., 1992; Karlin, 1998; Willner et al., 2009b), which in some cases are diagnostic of particular taxa.
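Two of the composition-based signatures mentioned above are easy to compute. The sketch below returns the G+C fraction of a sequence and the dinucleotide relative abundances \(\rho^*_{XY} = f^*_{XY} / (f^*_X f^*_Y)\) in the spirit of Karlin et al. (1997), with frequencies pooled over the sequence and its reverse complement. Treat it as an illustrative sketch: the function names are hypothetical, and ambiguous bases (e.g., N) are simply skipped.

```python
from collections import Counter
from itertools import product

BASES = "ACGT"
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def gc_content(seq):
    """Fraction of unambiguous bases (A, C, G, T) that are G or C."""
    seq = seq.upper()
    acgt = sum(seq.count(base) for base in BASES)
    if acgt == 0:
        raise ValueError("sequence contains no A, C, G, or T")
    return (seq.count("G") + seq.count("C")) / acgt


def dinucleotide_relative_abundance(seq):
    """rho*_XY = f*_XY / (f*_X * f*_Y), pooling both strands of the sequence."""
    seq = seq.upper()
    strands = (seq, seq.translate(COMPLEMENT)[::-1])  # sequence + reverse complement
    mono, di = Counter(), Counter()
    for strand in strands:
        mono.update(base for base in strand if base in BASES)
        for i in range(len(strand) - 1):
            pair = strand[i:i + 2]
            if pair[0] in BASES and pair[1] in BASES:  # skip pairs containing N, etc.
                di[pair] += 1
    n_mono, n_di = sum(mono.values()), sum(di.values())
    if n_di == 0:
        raise ValueError("sequence too short to contain a dinucleotide")
    rho = {}
    for x, y in product(BASES, repeat=2):
        f_x, f_y = mono[x] / n_mono, mono[y] / n_mono
        if f_x > 0 and f_y > 0:
            rho[x + y] = (di[x + y] / n_di) / (f_x * f_y)
    return rho
```

Values of \(\rho^*\) near 1 indicate dinucleotides occurring at the frequency expected from base composition, while values well below 1 indicate under-representation; CpG (\(\rho^*_{CG}\)), for example, is strongly under-represented in vertebrate genomes.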

Uncharacterized Viral Diversity

When viruses are purified from any human or environmental sample, the extracted DNA inevitably yields a large number of sequences (usually 70–99%) that show no significant similarity to any known sequences (Fig. 4.4) (Willner et al., 2009a, 2010; Jones et al., 2005).

Fig. 4.4

Unidentifiable sequences dominate the typical human viral metagenome. A similar phenomenon is observed in viral metagenomes from environmental samples

Provided that adequate precautions have been taken to avoid contamination with nonviral nucleic acids, this suggests that a very large fraction of existing viral diversity remains uncharacterized. One of the strengths of the “untargeted” approach to viral metagenomics is that these sequences are obtained at all, but understanding the origin and significance of the “unknown” viral sequences is a substantial bioinformatic challenge that has yet to be solved. If a sequence has no similarity to the DNA of known organisms as detected by BLAST (McGinnis and Madden, 2004) or similar search algorithms, other methods must be developed to identify it. For example, genome organization patterns, such as large-scale arrangements of open reading frames or regulatory elements (promoters, enhancers, and origins of replication), may serve as signatures that identify sequences as viral in origin. This approach would likely require long sequences or even complete genomes to be successful.
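As a toy illustration of one such genome-organization feature, the sketch below scans the forward strand for ATG-initiated open reading frames and reports the fraction of the sequence they cover; phage genomes are typically densely packed with genes, so coding density is one conceivable signal. The minimum ORF length, the forward-strand-only scan, and the function names are simplifying assumptions; dedicated gene callers such as Prodigal also scan the reverse strand and use statistical models of coding sequence.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(seq, min_codons=100):
    """Return sorted (start, end, frame) tuples for forward-strand ATG...stop ORFs.

    min_codons counts codons including the stop codon; end is exclusive.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # first in-frame start codon opens an ORF
            elif start is not None and codon in STOP_CODONS:
                if (i + 3 - start) // 3 >= min_codons:
                    orfs.append((start, i + 3, frame))
                start = None  # look for the next start codon in this frame
    return sorted(orfs)


def coding_density(seq, min_codons=100):
    """Fraction of positions covered by at least one forward-strand ORF."""
    covered = set()
    for start, end, _ in find_orfs(seq, min_codons):
        covered.update(range(start, end))
    return len(covered) / len(seq) if seq else 0.0


example = "CCATGAAATAGCC"  # toy sequence, not real data
print(find_orfs(example, min_codons=3))  # [(2, 11, 2)]
print(coding_density(example, min_codons=3))
```

A realistic minimum length (the default here is 100 codons, an arbitrary choice) matters, because short ORFs occur frequently by chance in any sequence.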

Implications for Medical Care

An accurate assessment of the normal human virome provides a reference point against which novel viruses can be detected. Combined with suitable screening programs, this baseline would allow an emerging pathogen or bioterrorism agent to be recognized as it appears in the human population. The health of a human subject should be judged by deviation from the normal state of the “community” that the body truly is, not by the assumption that no nonhuman entities should be present. This is analogous to the restoration of a disturbed ecosystem. Knowledge of the normal viral community and assessment of any perturbations found in patients may enable physicians to diagnose disturbances of the microbiome.