Studying the gut virome in the metagenomic era: challenges and perspectives
The human gut harbors a complex ecosystem of microorganisms, including bacteria and viruses. With the rise of next-generation sequencing technologies, we have seen a quantum leap in the study of human-gut-inhabiting bacteria, yet the viruses that infect these bacteria, known as bacteriophages, remain underexplored. In this review, we focus on what is known about the role of bacteriophages in human health and the technical challenges involved in studying the gut virome, of which they are a major component. Lastly, we discuss what can be learned from studies of bacteriophages in other ecosystems.
Introduction to the virome
With an estimated population of 10³¹, viruses are the most numerous biological entities on Earth, inhabiting diverse environments ranging from the oceans to hydrothermal vents to the human body. The human body is inhabited by both prokaryotic (mostly bacterial) and eukaryotic (mostly human) viruses. Researchers have historically focused on eukaryotic viruses because of their well-known impact on human health, from the influenza virus that causes seasonal flu epidemics to viruses with devastating health consequences, such as HIV and Ebola. However, increasing evidence suggests that prokaryotic viruses can also affect human health by altering the structure and function of the bacterial communities that live in symbiosis with humans [2, 3]. The viruses that infect bacteria, called bacteriophages, can play a key role in shaping community structure and function in ecosystems with high bacterial abundance [4, 5], such as the human gut.
Bacteriophages: dynamic players in ecosystems
Bacteriophages are the most abundant group of viruses and are obligate parasites that propagate in bacterial hosts. Host range is phage-specific and can vary from a single bacterial strain to multiple bacterial species. During infection, a bacteriophage attaches to the bacterial cell surface and injects its genetic material into the cell. The bacteriophage then follows one of two main life cycles: a lytic cycle or a lysogenic cycle.
Lytic cycles are lethal to host cells and culminate in the production of new phages. Well-known examples of phages with lytic cycles are T4 and T7, which mainly infect Escherichia coli. These phages first hijack the bacterial cell machinery to produce virions. The bacterial cell is then lysed, releasing 100–200 virions into the surrounding environment, where they can infect new bacterial cells. Lytic phages can thus play an important role in regulating the abundance of their host bacteria.
In contrast, a lysogenic cycle refers to phage replication that does not directly result in virion production: the phage genome typically integrates into the host chromosome as a prophage and is replicated along with it. A temperate phage is a phage capable of entering a lysogenic cycle. Under certain conditions, such as DNA damage or nutrient limitation, these phages can excise themselves from the host genome and enter the lytic cycle. This excision, called induction, may be accompanied by the capture of specific parts of the bacterial genome. The ability of phages to transfer genes from one bacterium to another by means of lysogenic conversion or transduction (as reviewed in ) can lead to increased diversification of viral species and of their bacterial host species. These phenomena may cause the spread of toxins, virulence genes, and possibly antibiotic resistance genes through a bacterial population. A well-known example of a temperate phage is CTXφ of Vibrio cholerae, which alters the virulence of its bacterial host by introducing the genes encoding cholera toxin, the toxin responsible for the diarrhea of cholera. Phages may thus serve as important reservoirs and transmitters of genetic diversity. The classification of phages based on their life cycle is a topic of much debate, and variations of life cycles such as pseudolysogeny and carrier states have been proposed [11, 12].
In the human gut ecosystem, temperate bacteriophages dominate over lytic bacteriophages [13, 14, 15]. It is believed that the majority of bacterial cells carry at least one phage integrated into their genome, a so-called prophage. Some prophages may remain in bacterial genomes for millions of generations, losing their ability to excise from host genomes because of genetic erosion (degradation and deletion processes). These prophages, called cryptic or defective, have been shown to be important for the fitness of the bacterial host and thus represent an essential part of the bacterial genome.
Major hallmarks of the human gut virome
The human gut virome develops rapidly after birth
During early development, the virome, like the bacteriome, is extremely dynamic [18, 19, 20]. In 2008, Breitbart et al., using direct epifluorescence microscopy, concluded that meconium (the earliest infant stool) contained no phages. Just 1 week later, the infant stool contained 10⁸ viral-like particles (VLPs) per gram of feces. Similar to the bacteriome, the infant virome was found to be less diverse than that of adults. The origin of phages in the infant gut has yet to be identified, although one hypothesis is that they arise from the induction of prophages carried by gut bacteria. Numerous other factors are also thought to shape the infant gut virome, including environmental exposures, diet, host genetics, and mode of delivery [15, 19, 20]. McCann et al. compared the virome of infants born via vaginal delivery with that of infants born via cesarean delivery and found that the alpha- and beta-diversity of the infant virome differed significantly between birth modes. The authors identified 32 contigs that were differentially abundant by birth mode, including several contigs with high nucleotide homology to temperate Bifidobacterium phages. This was thought to reflect differential colonization by Bifidobacterium according to birth mode. Furthermore, an increased abundance of vertebrate ssDNA viruses of the family Anelloviridae was found in infants born via vaginal delivery, suggesting vertical transmission from mother to baby. The abundance of these viruses had previously been shown to decrease after the age of 15 months, but they nonetheless remain highly prevalent in humans worldwide. Diet may also play a role in colonization of the infant gut, as Pannaraj et al. showed that a significant proportion of bacteriophages were transferred from mothers to infants through breast milk. Despite these interesting results, only a few studies to date have investigated the infant virome longitudinally. In 2015, Lim et al. conducted a longitudinal study of the virome and bacteriome in four twin pairs, from birth to 2 years, and found that the expansion of the bacteriome with age was accompanied by a contraction and shift in bacteriophage composition.
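Alpha-diversity (within-sample diversity) and beta-diversity (between-sample dissimilarity) recur throughout this section. As a minimal illustration, assuming per-contig read counts as input, the sketch below computes the Shannon index and the Bray–Curtis dissimilarity, two common choices; the studies cited above may have used other metrics, and the counts are hypothetical.

```python
import math


def shannon_diversity(counts):
    """Shannon index H' = -sum(p_i * ln p_i), skipping taxa with zero counts."""
    total = sum(counts)
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)


def bray_curtis(sample_a, sample_b):
    """Bray-Curtis dissimilarity (0 = identical profiles, 1 = no shared taxa)."""
    if len(sample_a) != len(sample_b):
        raise ValueError("samples must list the same taxa in the same order")
    denominator = sum(sample_a) + sum(sample_b)
    if denominator == 0:
        return 0.0
    return sum(abs(a - b) for a, b in zip(sample_a, sample_b)) / denominator


# Hypothetical read counts for the same four viral contigs in two samples
infant = [90, 10, 0, 0]
adult = [25, 25, 25, 25]
print(round(shannon_diversity(infant), 3), round(shannon_diversity(adult), 3))  # 0.325 1.386
print(bray_curtis(infant, adult))  # 0.65
```

The lower Shannon value for the dominated "infant" profile mirrors the lower diversity reported for infant viromes.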
The human gut virome consists mostly of bacteriophages
The human gut virome is temporally stable in each individual but shows large inter-individual diversity
A study by Minot et al. showed that approximately 80% of the phages in a healthy adult male were maintained over a period of 2.5 years (the entire duration of the study). This was recently also demonstrated by Shkoporov et al., who found that assemblies of the same or very closely related viral strains persisted for as long as 26 months. This compositional stability was further reflected in stable levels of alpha-diversity and total viral counts, suggesting that viral populations are not subject to periodic fluctuations. In a longitudinal study in which six individuals underwent a short-term fat- and fiber-controlled dietary intervention, the gut virome was shown to be relatively stable in each individual. The same study also showed that interpersonal variation was the largest source of variance in the gut virome, even among individuals following the same diet.
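The stability figures above amount to asking what fraction of the phages detected at one time point are still detected later. A minimal sketch, assuming each sample has already been reduced to a set of detected contig IDs; the detection criterion and the function name are illustrative assumptions, not the method used by Minot et al.

```python
def retained_fraction(baseline, follow_up):
    """Share of phages detected at baseline that are still detected at follow-up."""
    baseline = set(baseline)
    if not baseline:
        raise ValueError("no phages detected at baseline")
    return len(baseline & set(follow_up)) / len(baseline)


# Hypothetical contig IDs detected (e.g., above some coverage threshold) at two time points
t0 = ["phage_1", "phage_2", "phage_3", "phage_4", "phage_5"]
t1 = ["phage_1", "phage_2", "phage_3", "phage_4", "phage_9"]
print(retained_fraction(t0, t1))  # 0.8
```

The same set comparison can be applied between two individuals, for example co-twins, to quantify how many virotypes they share.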
The large inter-individual variations in the virome are consistent with those seen in the bacteriome and appear to be largely due to environmental rather than genetic factors. It was recently shown in a cohort of monozygotic twins that co-twins did not share more virotypes than unrelated individuals, and that bacteriome diversity predicts viral diversity.
Interaction of the human gut virome with the bacteriome in relation to health
In recent years, numerous associations have been established between the human intestinal bacteriome and a number of diseases, syndromes, and traits. Support for these associations ranges from anecdotal reports to results from large cohort studies. For example, in their large cohort study, Falony et al. found the core bacterial microbiome (i.e., the genera shared by 95% of samples) to be composed of 17 genera with a median core abundance of 72.20%. Other studies have shown that a large percentage of the gut bacteriome is represented by members of the Firmicutes and Bacteroidetes, and that their relative levels change in individuals with conditions such as obesity, inflammatory bowel disease (IBD), and diabetes [44, 45, 46]. This suggests the existence of a “healthy” bacteriome that is disrupted in disease.
In recent years, there have also been attempts to characterize a “healthy gut phageome”. In 2016, Manrique et al. used ultra-deep sequencing to study the presence of completely assembled phage genomes in 64 healthy people around the world. The authors proposed that the phageome could be split into three parts: (i) the core, composed of at least 23 bacteriophages, including crAssphage, found in more than 50% of all individuals; (ii) the common, shared among 20–50% of individuals; and (iii) the low overlap/unique, found in a small number of individuals. The latter fraction represented the majority of the bacteriophages detected in the whole dataset. This study, among others, suggests that a core virome should not be defined as strictly as the core bacteriome has thus far been defined. Nonetheless, crAssphage, whose abundance was not associated with any health-related variables, is likely to be a core element of the normal human virome.
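The three-tier split proposed by Manrique et al. is a prevalence rule. The sketch below applies the thresholds as described here (core above 50%, common 20–50%, low overlap/unique below 20%); the function name, the input layout, and the assignment of exactly 50% to the common tier are illustrative assumptions.

```python
def classify_by_prevalence(presence, n_individuals, core_min=0.5, common_min=0.2):
    """Assign each phage to a prevalence tier.

    presence: dict mapping phage ID -> set of individuals in which it was detected.
    Core: prevalence > core_min; common: common_min <= prevalence <= core_min;
    low overlap/unique: prevalence < common_min.
    """
    tiers = {"core": [], "common": [], "low_overlap_unique": []}
    for phage, individuals in presence.items():
        prevalence = len(individuals) / n_individuals
        if prevalence > core_min:
            tiers["core"].append(phage)
        elif prevalence >= common_min:
            tiers["common"].append(phage)
        else:
            tiers["low_overlap_unique"].append(phage)
    return tiers


# Hypothetical detections across 10 individuals
presence = {
    "crAssphage": {f"ind{i}" for i in range(7)},  # 70%
    "phage_X": {f"ind{i}" for i in range(5)},     # 50%
    "phage_Y": {"ind0", "ind3", "ind8"},          # 30%
    "phage_Z": {"ind4"},                          # 10%
}
print(classify_by_prevalence(presence, n_individuals=10))
# {'core': ['crAssphage'], 'common': ['phage_X', 'phage_Y'], 'low_overlap_unique': ['phage_Z']}
```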
Gnotobiotic mice offer an attractive model for studying bacteria–phage interactions, because they can be colonized with a limited collection of bacteria that are well characterized yet still complex. Recently, Hsu et al. colonized gnotobiotic mice with a defined set of human gut commensal bacteria and subjected them to predation by cognate lytic phages. This revealed that phage predation not only directly affected susceptible bacteria but also had cascading effects on other bacterial species via interbacterial interactions. Fecal metabolomics in these mice revealed that phage predation in the mouse gut microbiota can potentially affect the mammalian host by changing the levels of key metabolites involved in functions such as gastric motility and ileal contraction.
Bacteriophages and disease
Table 1. Selection of studies on gut virome changes in humans in various disease states
Disease | Cohort | Main findings | Reference
Malnutrition | Healthy twins (n = 8 pairs) versus twins discordant for severe malnutrition (n = 12 pairs) | Bacteriophages, as well as members of the Anelloviridae and Circoviridae families of eukaryotic viruses, discriminate discordant from concordant healthy pairs | Reyes et al. 2015
Clostridium difficile infection (CDI) | CDI patients (n = 24) versus healthy controls (n = 20) | Treatment response to FMT was associated with a high level of colonization by donor-derived Caudovirales taxa in the recipient; Caudovirales bacteriophages may play a role in the efficacy of FMT in CDI | Zuo et al. 2018
Inflammatory bowel disease (IBD) | Crohn’s disease (n = 16), ulcerative colitis (n = 36), and household controls (n = 21) | Enteric virome richness was increased in Crohn’s disease and ulcerative colitis, and both forms of IBD were associated with a significant expansion of Caudovirales bacteriophages | Norman et al. 2015
Colorectal cancer (CRC) | CRC cases (n = 74) and controls without CRC (n = 92) in Hong Kong; validated in three independent European cohorts | Dysbiosis of the gut virome was associated with early- and late-stage CRC; a combination of four taxonomic markers was associated with reduced survival of patients with CRC | Nakatsu et al. 2018
Acquired immune deficiency syndrome (AIDS) | HIV-negative individuals (n = 40), treatment-naïve HIV patients (n = 40), and treated HIV patients (n = 40) in Uganda | Alterations in the enteric virome and bacterial microbiome were associated with low peripheral CD4 T cell counts rather than with HIV infection alone | Monaco et al. 2016
Type 1 diabetes (T1D) | 11 infants from Finland and Estonia recruited at birth based on their HLA risk genotype and followed for 36 months | Significant enrichment of Circoviridae-related sequences in samples from controls compared with cases; higher bacteriophage diversity and richness in controls than in cases | Zhao et al. 2017
Type 2 diabetes (T2D) | T2D patients (n = 71) and non-diabetic Chinese adults (n = 74); validated in an independent cohort | Significant increase in the number of gut phages in the T2D group, with seven phage operational taxonomic units specific to T2D; significant alterations of the gut phageome not explained by co-variation with the altered bacterial hosts | Ma et al. 2018
Hypertension | Healthy controls (n = 41), people with pre-hypertension (n = 56), and hypertension patients (n = 99) in China | Certain viruses could be selected as biomarkers to distinguish healthy people, people with pre-hypertension, and hypertension patients; viruses had better resolution and discrimination power than bacteria for identifying hypertension samples | Han et al. 2018
Parkinson’s disease (PD) | PD patients (n = 31) and control individuals (n = 28) | Shifts in the phage/bacteria ratio of lactic acid bacteria known to produce dopamine and regulate intestinal permeability, both major factors implicated in PD pathogenesis | Tetz et al. 2018
It is important to keep in mind that, although many diseases show associations with various bacteriophages, establishing causality is extremely difficult. In such association studies, it is hard to determine whether alterations in the microbiome and virome are a cause or a consequence of the disease. Koch’s postulates are a set of criteria designed to establish a causal relationship between a microbe and a disease. In 2012, Mokili et al. proposed a metagenomic version of Koch’s postulates. To fulfill these metagenomic Koch’s postulates, the following conditions must be met: (i) the metagenomic traits in diseased subjects must be significantly different from those in healthy subjects; (ii) inoculation of samples from a diseased animal into a healthy control animal must induce the disease state; and (iii) inoculation of the suspected purified traits into a healthy animal must induce disease if the traits form the etiology of the disease. Many studies investigating the role of specific bacteriophages in human disease have fulfilled the first criterion, finding significant differences in viral contigs or specific phages between diseased and healthy individuals (Table 1). However, only a few of these studies are supported by animal experiments, and most of these experiments take the form of fecal microbiota transplantation (FMT) rather than inoculation with specific phages [62, 63]. The question of causality becomes even more complex when, as is often the case, multiple phages are likely to be involved in the etiology of a disease (Table 1).
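The first metagenomic postulate is a statistical comparison between diseased and healthy subjects. One simple way to test it for a single viral contig, assumed here purely for illustration rather than taken from the studies in Table 1, is a two-sided permutation test on the difference in mean relative abundance. The function name and defaults are hypothetical.

```python
import random


def _mean(values):
    return sum(values) / len(values)


def permutation_p_value(diseased, healthy, n_permutations=10_000, seed=0):
    """Two-sided permutation test for a difference in mean abundance between groups."""
    rng = random.Random(seed)
    observed = abs(_mean(diseased) - _mean(healthy))
    pooled = list(diseased) + list(healthy)
    n_diseased = len(diseased)
    at_least_as_extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)  # randomly reassign the disease/healthy labels
        diff = abs(_mean(pooled[:n_diseased]) - _mean(pooled[n_diseased:]))
        if diff >= observed:
            at_least_as_extreme += 1
    # Add-one correction so the estimated p-value is never exactly zero
    return (at_least_as_extreme + 1) / (n_permutations + 1)


# Hypothetical relative abundances (per mille) of one contig in 8 patients and 8 controls
patients = [10, 12, 11, 13, 12, 14, 11, 13]
controls = [1, 2, 1, 3, 2, 1, 2, 3]
print(permutation_p_value(patients, controls))
```

With thousands of contigs tested, the resulting p-values would also need a multiple-testing correction, and a significant result addresses only criterion (i), not causality.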
It is known that both the gut virome and gut microbiome can be pathologically altered in patients with recurrent Clostridium difficile infection, and FMT has rapidly become accepted as a viable and effective treatment. Ott et al. described the efficacy of transferring bacteria-free fecal filtrate, as an alternative to FMT, in reducing symptoms in patients with C. difficile infection. Analysis of VLPs showed that the filtrate recovered from normal stool contains a complex bacteriophage community, suggesting that phages may mediate the beneficial effects of FMT, although these effects could also be due to various metabolites.
Interestingly, phages can also directly influence human immunity. Recent research has shown that phages modulate both innate and adaptive human immunity (reviewed in ). One way in which phages can directly influence host immunity was described by Barr et al. as the Bacteriophage Adherence to Mucus (BAM) model. In this model, phages adhering to mucus reduce bacterial colonization of mucosal surfaces, thereby protecting the host from infection and disease.
Since their discovery in the early twentieth century, lytic bacteriophages have been seen as promising antimicrobial agents, although this potential was largely eclipsed by the rapid development of antibiotics as our main antibacterial agents. Today, the applications of lytic bacteriophages go far beyond their antimicrobial activity: they are engineered as vehicles for drug delivery and vaccines [68, 69] and are widely used in molecular biology and microbiology [70, 71].
In recent years, there have been some attempts to systematically study the effect of phages in trial settings. Yen et al. showed that prophylactic administration of a Vibrio cholerae-specific phage cocktail protects against cholera by reducing both colonization and cholera-like diarrhea in infant mouse and rabbit models. In contrast, Sarker et al. showed that oral coliphages, though safe for use in children with acute bacterial diarrhea, failed to achieve intestinal amplification or to improve diarrhea outcomes. This was possibly due to insufficient phage coverage and low E. coli pathogen titers, suggesting that higher oral phage doses would have been required to achieve the desired effect. These studies show that bacteriophage therapy is still in its infancy despite its long history in medicine [74, 75, 76], and emphasize the need for more systematic fundamental in vitro studies, translational animal studies, and large, properly controlled randomized trials.
Studying the human gut virome
Table 2. Challenges of studying the human gut virome and possible solutions
Nucleic acid extraction
• Challenge: Existence of active and silent fractions of viromes
• Solution: Total nucleic acid isolation (TNAI) protocols:
+ Allow characterization of the microbiome along with the virome potential, giving a holistic picture of all components of the microbiome
– Lead to inflation of false-positive bacterial hits in the subsequent data analysis
• Solution: Viral-like particle (VLP) isolation protocols:
+ Ensure true-positive viral hits because bacteria are physically removed by filtration
– Give a low-concentration output that may complicate the genomic library preparation step
• Solution: Combination of TNAI and VLP isolation approaches
Genomic library preparation
• Challenge: Limited amount of viral genetic material available
• Solution: Use of more sensitive genomic library preparation kits
• Solution: Restricted use of MDA
• Challenge: Studying RNA viruses requires additional effort due to the relative instability of RNA:
- Use of reverse transcriptase to convert RNA to cDNA
- Restricted use of RNase in protocols handling both DNA and RNA viruses
- May require a separate isolation protocol (arising from the previous point) and, therefore, more starting material
• Solution: Metatranscriptomics approaches
• Solution: Use of a reverse transcription step
• Challenge: Studying ssDNA viruses requires additional effort:
- Most current genomic library preparation procedures cannot handle ssDNA genomes because they use dsDNA adapters
- ssDNA viruses have been shown to have higher mutation rates than dsDNA viruses, increasing the microdiversity of the metagenome, which limits reference-based approaches
• Solution: Use of ssDNA adapters in the adapter-ligation reaction at the genomic library preparation step
Sequencing
• Challenge: Selecting an appropriate coverage cut-off is complicated
Data analysis
• Challenge: Removal of bacterial sequences is complicated by viral signals from prophages (both cryptic and inducible) carried by bacterial genomes
• Solution: Use of tools for identifying prophages in bacterial genomes [87, 88, 89], though some are limited to known prophages. Combining multiple methods has been shown to enrich the set of detected prophages and therefore prevent their concurrent removal with bacterial sequences
• Challenge: Existing databases do not fully represent viral diversity
• Solution: Use of de novo assembly approaches
• Challenge: Rapid evolution and diversity of viral genomes limit reference-based approaches
• Solution: Use of a protein-based search
• Solution: Use of profile hidden Markov models based on protein domains, which allow the identification of remote homologs
• Challenge: The de novo assembly approach is sensitive to biases introduced during genomic library preparation and sequencing:
- Shifts in GC content during genomic library preparation affect the completeness of genomes and cause assembly fragmentation
• Solution: Adjustment of the assembly pipeline to the applied genomic library preparation procedure: use of modes suited to uneven read coverage, such as single-cell SPAdes [98, 99] preceded by read de-duplication, or Velvet-SC
Sample collection and storage
The first challenge in gut microbiome studies is the limited number of samples an individual can provide, particularly in the framework of biobanks and large-scale studies. Moreover, with low-biomass samples, such as viral communities from certain environmental ecosystems and human-related specimens, researchers need to be extremely careful to avoid contamination from kits and reagents.
After sampling, bacteria and bacteriophages remain in contact with each other and continue their ecological interactions, so prolonged incubation of samples at room temperature can alter the relative abundances of microbes to the point that they no longer represent in situ conditions. Overcoming this issue requires extracting viral genetic material immediately after collection (if possible) or rapidly freezing samples at −80 °C.
Nucleic acid extraction
Similar to gut microbiome studies, gut virome studies begin by isolating genetic material from intestinal specimens (Fig. 3). Given the perceived predominance of DNA viruses in human stool [14, 15], current virome studies mainly rely on DNA extraction from fecal samples [78, 79, 80]. However, the current conception of gut virome composition might underestimate the abundance of RNA viruses. For example, RNase I is commonly used in VLP isolation protocols to remove free, capsid-unprotected RNA of non-viral origin [78, 79], but it has recently been shown to also affect the RNA fraction of the virome. To obtain a more accurate estimate of the RNA viruses in a sample, the use of RNase I must be restricted, although this may come at the cost of increased contamination (Table 2).
The main hurdle in studying the virome, however, is the parasitic nature of bacteriophages. Their ability to integrate into the host bacterial genome leads to a nominal division of the virome into active (lytic phages) and silent (prophages) fractions (Table 2). Depending on the targeted fraction, DNA extraction protocols may differ substantially. For instance, the active virome is primarily studied by extracting DNA from VLPs obtained through filtration, various chemical precipitations [14, 15, 29, 47], and/or (ultra)centrifugation [106, 107]. In contrast, targeting the silent and active virome concurrently (the so-called “virome potential”) requires total nucleic acid isolation (TNAI) from all the bacteria and viruses in the sample [56, 57, 58]. While both approaches have their pros and cons (Table 2), a combination of the two is desirable, albeit expensive, because it gives a more complete picture of the microbial community.
In addition to RNA viruses being excluded by some common extraction protocols, ssDNA viruses might also be overlooked. Sequencing ssDNA virus genomes is difficult because few genomic library preparation kits represent ssDNA viruses in their in situ proportions without amplification bias (Table 2). Thus, the current conception that the gut virome is predominantly composed of dsDNA viruses might be biased by the relative ease of processing dsDNA.
Genomic library preparation
At the genomic library preparation step, low viral biomass poses a new challenge, since many existing library preparation kits require inputs of up to micrograms of DNA, amounts that are rarely available for virome samples. Taking into account the perceived predominance of bacteriophages in human stool (see “Major hallmarks of the human gut virome” section), the typical input amount of DNA after the extraction step can be estimated as follows: 1 g of human feces contains about 10⁹ bacteriophages [108, 109, 110], and the average bacteriophage genome size is 40 kbp (Fig. 2), so the total amount of bacteriophage DNA in 1 g of human feces is 40 ∙ 10⁹ kbp, weighing about 43.6 ng. Thus, depending on the elution volume (usually 50–200 μl), any VLP isolation protocol for stool will yield a minuscule concentration of bacteriophage DNA: 0.22–0.87 ng/μl. This is also the range observed when benchmarking VLP extraction protocols, although with variations that can reach an order of magnitude in some cases [78, 79, 80]. Therefore, more sensitive kits that can handle nanogram and picogram DNA inputs, or whole-(meta)genome amplification (WGA), are needed (Table 2). Although WGA has been shown to be a powerful tool for studying the human gut virome [19, 20], some WGA techniques, even non-PCR-based methods such as multiple displacement amplification (MDA), amplify linear genome fragments unevenly and might bias the representation of circular ssDNA viruses [82, 85]. Therefore, when MDA is used, downstream analysis of viral community composition might be limited to presence/absence statistics, because relative abundances might be biased towards specific viruses. Another type of WGA, adaptase-linker amplification (A-LA), is preferable for studying differentially abundant viruses, since it keeps them quantifiable and allows unbiased representation.
Moreover, A-LA allows the study of both ssDNA and dsDNA viruses, unlike other quantitative WGA methods such as linker amplification (LA) and tagmentation (TAG), which are mostly focused on dsDNA viruses [77, 85].
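The input estimate above can be reproduced in a few lines. This sketch assumes an average molar mass of 650 g/mol per double-stranded base pair; with that constant it gives about 43.2 ng and 0.22–0.86 ng/μl, marginally below the 43.6 ng and 0.87 ng/μl quoted in the text (those figures correspond to roughly 656 g/mol per base pair). It ignores extraction losses and treats all genomes as dsDNA; the function names are illustrative.

```python
AVOGADRO = 6.022e23        # molecules per mole
BP_MASS_G_PER_MOL = 650.0  # assumed average molar mass of one dsDNA base pair


def phage_dna_ng(phages_per_gram=1e9, genome_kbp=40.0, stool_grams=1.0):
    """Estimated mass (ng) of phage DNA in a stool sample, assuming dsDNA genomes."""
    total_bp = phages_per_gram * stool_grams * genome_kbp * 1_000
    grams = total_bp * BP_MASS_G_PER_MOL / AVOGADRO
    return grams * 1e9  # convert g to ng


def concentration_range(total_ng, elution_ul=(50, 200)):
    """DNA concentration (ng/ul) for the largest and the smallest elution volume."""
    return total_ng / max(elution_ul), total_ng / min(elution_ul)


total = phage_dna_ng()
low, high = concentration_range(total)
print(f"{total:.1f} ng; {low:.2f}-{high:.2f} ng/ul")  # 43.2 ng; 0.22-0.86 ng/ul
```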
At the sequencing step, selecting a coverage cut-off poses an additional challenge (Table 2). As a very complex and diverse community, the virome generally requires ultra-deep sequencing, even though such sequencing might also complicate downstream analysis. Higher coverage generally increases the number of duplicated reads carrying sequencing errors. These duplicated reads can align to each other and create spurious contigs that prevent the assembly of longer contigs [112, 113].
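Read de-duplication, listed in Table 2 as a step before assembly, can be sketched as keeping the first copy of each identical read. This minimal version is an illustrative assumption: it collapses only exact (case-insensitive) single-end duplicates, whereas dedicated tools also handle read pairs and near-identical duplicates that differ by sequencing errors.

```python
def deduplicate_reads(reads):
    """Drop exact duplicate reads (case-insensitive), keeping the first occurrence.

    reads: iterable of (read_id, sequence) tuples from single-end data.
    """
    seen = set()
    unique = []
    for read_id, seq in reads:
        key = seq.upper()
        if key not in seen:
            seen.add(key)
            unique.append((read_id, seq))
    return unique


reads = [("r1", "ACGTTGCA"), ("r2", "ACGTTGCA"), ("r3", "acgttgca"), ("r4", "ACGTTGCC")]
print([read_id for read_id, _ in deduplicate_reads(reads)])  # ['r1', 'r4']
```

Read r4 differs from r1 by a single base, as a sequencing error would, and survives exact de-duplication; this is precisely the kind of erroneous duplicate that can still produce spurious contigs.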
After overcoming the barriers in isolating and sequencing virome communities, new challenges arise in data analysis. First, it is necessary to discard human-host and bacterial-host reads that may bias virome community profiling. While many tools now remove nearly all human-derived reads, filtering bacterial reads may be challenging due to the presence of prophages within bacterial genomes. As inducible and cryptic prophages are important players in the gut ecosystem [16, 17], bacterial reads must be filtered carefully, since they may contain prophage sequences that should be considered in the virome analysis. Several tools can now identify prophage sequences in MGS data (Table 2).
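To make the prophage problem concrete, the sketch below removes reads that share many k-mers with a bacterial genome while exempting k-mers that overlap annotated prophage regions, so prophage-derived reads survive the filter. Everything here is hypothetical: the k-mer approach, the 50% shared-k-mer threshold, the function names, and the assumption that prophage coordinates come from one of the prophage-identification tools in Table 2. Real pipelines typically rely on read aligners or dedicated classifiers.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    return seq.upper().translate(_COMPLEMENT)[::-1]


def canonical(kmer):
    """Strand-independent k-mer: the smaller of the k-mer and its reverse complement."""
    return min(kmer, reverse_complement(kmer))


def read_kmers(seq, k):
    seq = seq.upper()
    return {canonical(seq[i:i + k]) for i in range(len(seq) - k + 1)}


def build_host_index(genome, prophage_regions, k):
    """Canonical k-mers of a bacterial genome, skipping k-mers that overlap a prophage.

    prophage_regions: 0-based, half-open (start, end) intervals.
    """
    genome = genome.upper()
    index = set()
    for i in range(len(genome) - k + 1):
        in_prophage = any(i < end and i + k > start for start, end in prophage_regions)
        if not in_prophage:
            index.add(canonical(genome[i:i + k]))
    return index


def filter_bacterial_reads(reads, host_index, k, max_shared=0.5):
    """Keep reads whose fraction of k-mers shared with the host index is below max_shared."""
    kept = []
    for read_id, seq in reads:
        kmers = read_kmers(seq, k)
        shared = len(kmers & host_index) / len(kmers) if kmers else 0.0
        if shared < max_shared:
            kept.append((read_id, seq))
    return kept


# Toy bacterial genome with a prophage inserted at positions 14-27
backbone_1 = "AGAGGAAGAGGAGA"
prophage = "ACCACAACACCACA"
backbone_2 = "GAGGAGAAGGAGAG"
host_index = build_host_index(backbone_1 + prophage + backbone_2, [(14, 28)], k=5)
reads = [("bacterial", backbone_1), ("bacterial_rc", reverse_complement(backbone_1)), ("prophage", prophage)]
print([read_id for read_id, _ in filter_bacterial_reads(reads, host_index, k=5)])  # ['prophage']
```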
Sequencing reads that pass quality control are then subjected to virome profiling. Currently, there are two general strategies for virome profiling based on MGS data: (i) reference-based read mapping and (ii) de novo assembly-based profiling (Fig. 3). Both strategies face challenges in characterizing viral communities (Table 2). The reference-based read mapping approach, which is widely used in microbiome studies, is limited by the scarcity of annotated viral genomes. However, the enormous viral diversity and viral genetic microdiversity also complicate de novo assembly of metagenomes [115, 116] (Table 2).
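In the reference-based strategy, read counts per viral reference are commonly normalized by genome length before relative abundances are computed, so that long genomes are not over-counted. A minimal sketch, assuming mapping counts have already been produced by an aligner; the function name is illustrative. Reads from viruses absent from the reference database never appear in read_counts, which is the scarcity problem described above.

```python
def relative_abundance(read_counts, genome_lengths):
    """Length-normalised relative abundance of reference genomes from mapped read counts.

    read_counts: reference ID -> number of reads mapped to it.
    genome_lengths: reference ID -> genome length in bp.
    """
    density = {ref: count / genome_lengths[ref] for ref, count in read_counts.items()}
    total = sum(density.values())
    if total == 0:
        return {ref: 0.0 for ref in density}
    return {ref: value / total for ref, value in density.items()}


# Hypothetical counts: equal read numbers, but phage_B has half the genome length
abundance = relative_abundance(
    {"phage_A": 1000, "phage_B": 1000},
    {"phage_A": 100_000, "phage_B": 50_000},
)
print({ref: round(v, 3) for ref, v in abundance.items()})  # {'phage_A': 0.333, 'phage_B': 0.667}
```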
Rapid evolution, an innate feature of viruses that allows them to inhabit almost every ecological niche, leads to substantial intraspecies divergence. Although the human gut virome has been shown to be stable over time, partly because most human gut viruses are temperate, some members of the human gut virome can evolve quickly. For example, it has been shown for lytic ssDNA bacteriophages of the family Microviridae inhabiting the human gut that a 2.5-year period is sufficient for a new viral species to evolve. This may limit the use of reference-based approaches in studying the virome, although some studies have successfully used this method for virome annotation in combination with the de novo assembly-based method [55, 118] (Table 2).
De novo assembly of metagenomes, which was successfully used to discover crAssphage, does not rely on reference databases. De novo assembly-based approaches therefore give a more comprehensive estimate of the complexity of viral communities and of viral dark matter (uncharacterized metagenomic sequences originating from viruses) (Fig. 3). However, the outcome of metagenome assembly is highly dependent on read coverage, since the default assembly workflow assumes an even coverage distribution for each genome. Biases introduced during sample processing can distort the coverage distribution and therefore hamper de novo assembly, reducing genome completeness and increasing assembly fragmentation. Sources of such bias include low DNA input for genomic library preparation [94, 95], use of A-LA [94, 96], and shifted GC content associated with MDA. In addition, it has been shown that the choice of sequencing technology has a minimal effect on the de novo assembly outcome, whereas the choice of assembly software crucially affects results (Table 2).
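Because assemblers assume roughly even coverage, it can help to quantify how uneven the coverage of a contig is. The sketch below uses the coefficient of variation of per-position depth as an evenness measure and computes GC content, the property that MDA is reported to skew. Both are illustrative choices rather than metrics prescribed by the cited studies, and the depth values are hypothetical.

```python
from statistics import mean, pstdev


def coverage_cv(depths):
    """Coefficient of variation of per-position read depth (0.0 means perfectly even)."""
    average = mean(depths)
    if average == 0:
        raise ValueError("contig has no coverage")
    return pstdev(depths) / average


def gc_content(seq):
    """Fraction of G and C among unambiguous bases (A, C, G, T)."""
    seq = seq.upper()
    acgt = sum(seq.count(base) for base in "ACGT")
    return (seq.count("G") + seq.count("C")) / acgt if acgt else 0.0


# Hypothetical per-position depths for two contigs with the same mean depth
even = [10, 10, 10, 10]
patchy = [1, 19, 1, 19]
print(coverage_cv(even), coverage_cv(patchy))  # 0.0 0.9
print(gc_content("GGCCAATT"))  # 0.5
```

Both contigs have a mean depth of 10, yet the patchy one would be far harder to assemble in one piece; mean depth alone hides this difference.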
Regardless of the method chosen for virome annotation, further challenges arise when assigning taxonomy to viral sequences. Currently, only 5560 viral species have been described and deposited with the International Committee on Taxonomy of Viruses (ICTV). Despite the rapid growth of the ICTV database since it began accepting de novo assembled viral sequences that were not cultured or imaged, and despite the application of gene-sharing networks to taxonomic assignment of viral sequences, ranks above genus are still unavailable for many known viruses. Nonetheless, there are reasons to be optimistic. The ICTV recently decided to expand the taxonomic classification of viruses to ranks above order, and the first-ever viral phylum has already been reported. More higher-order ranks can be expected given the increasing pace and uniformity with which novel viral genomes are deposited.
Lessons from other ecosystems
Fortunately, the majority of the technical challenges described in Table 2 have already been addressed in studies of viral communities in other human organs (such as skin [125, 126] and lungs) and in environmental ecosystems (such as seawater [128, 129] and soil). Some of the solutions from environmental studies are now being applied to similar challenges in the human gut (Table 2). However, we still need a systematic approach to studying the gut virome as a complex community. Environmental studies have a long history of taking the entire complex community into account: from the sequencing of the first viral metagenome of an ocean sample in 2002 to the 2019 global ocean survey that revealed almost 200,000 viral populations. This is in striking contrast to human-oriented studies, which have often been limited to identifying specific pathogens in order to combat them. Given this historical context, additional analytical approaches and hypotheses developed in cutting-edge viral ecogenomic studies of environmental samples might also be applicable to the human gut virome.
Many environmental studies have benefited from multi-omics approaches [81, 116, 133]. For example, Emerson et al. showed that bacteriophages have the potential to influence complex carbon degradation in the context of climate change. This was possible partly owing to metatranscriptomics and to the concurrent reconstruction of bacterial and viral genomes from soil metagenomes. Additionally, combining metaproteomic and metagenomic approaches has identified highly abundant viral capsid proteins in the ocean; these proteins may represent the most abundant biological entity on Earth.
In addition to these multi-omics approaches, viral metagenomic assembly can be complemented by single-virus genomics (SVG), in which individual viral particles are isolated and their genomes are amplified and sequenced separately. Because each SVG genome derives from a single particle, de novo assembly of SVG data, unlike de novo assembly of metagenomes, can resolve viral genetic microdiversity and thereby enable the reconstruction of more complete viral genomes. SVG has identified highly abundant marine viral species that have, so far, not been found via metagenomic assembly. These newly identified species possess proteins homologous to the aforementioned abundant capsid proteins, confirming their widespread presence in the oceans. Furthermore, another challenge of de novo assembly, the presence of low-coverage regions, might be overcome with long-read sequencing (> 800 kbp), which was recently shown to recover some complete viral genomes from aquatic samples.
Beyond advances in data generation from viral communities, several environmental studies have suggested ways to overcome the dominance of unknown sequences in viral metagenomes. Brum et al. clustered the proteins predicted from viral genomic sequences by full-length similarity to reveal the set of core viral genes shared by samples from seven oceans, the diversity patterns of marine viral populations, and the ecological drivers structuring these populations. Given the large inter-individual variation of the human gut virome (see “Major hallmarks of the human gut virome” section), a similar approach might help identify the core viral genes of the human gut.
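As a simplified illustration of the core-gene idea, the sketch below assumes that predicted viral proteins have already been grouped into protein clusters (for example by full-length similarity clustering, which is not shown) and reports the clusters detected in at least a chosen fraction of samples. The sample IDs, cluster IDs, and the `min_prevalence` parameter are hypothetical; a prevalence below 1.0 yields a relaxed core, which may be more informative given the large inter-individual variation of the gut virome.

```python
from collections import Counter


def core_clusters(sample_clusters: dict[str, set[str]], min_prevalence: float = 1.0) -> set[str]:
    """Return protein clusters detected in at least `min_prevalence` of the samples.

    `sample_clusters` maps a sample ID to the set of protein-cluster IDs detected in it,
    so each cluster is counted at most once per sample.
    """
    n_samples = len(sample_clusters)
    prevalence = Counter(cluster for clusters in sample_clusters.values() for cluster in clusters)
    return {cluster for cluster, k in prevalence.items() if k / n_samples >= min_prevalence}


samples = {
    "gut_1": {"pcA", "pcB", "pcC"},
    "gut_2": {"pcA", "pcB"},
    "gut_3": {"pcA", "pcD"},
}
print(sorted(core_clusters(samples)))                      # strict core: ['pcA']
print(sorted(core_clusters(samples, min_prevalence=0.6)))  # relaxed core: ['pcA', 'pcB']
```

Counting presence per sample rather than summing abundances keeps a single highly abundant virome from dominating the definition of the core.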
To understand the mechanisms behind phage–host interactions in the gut ecosystem, it might also be useful to analyze virus-encoded auxiliary metabolic genes (AMGs). Analysis of AMGs and their abundance in marine samples helped reveal the role of bacteriophages in nitrogen and sulfur cycling through their effects on host metabolism. Furthermore, a study of viral communities in the polar region of the Southern Ocean highlighted the value of AMG analysis for understanding how lytic and temperate phages survive seasonal changes in the abundance of their bacterial hosts, which follows the availability of nutrients. Another approach, applied by Zeigler Allen et al. to the marine microbial community, uses bacteriophage sequence signatures together with the virus-to-bacteria ratio and bacterial diversity to evaluate the influence of viruses on the bacterial community, rather than directly comparing co-abundance profiles. This method redefined the viral infection potential and confirmed the role of bacteriophages in shaping the structure of the entire marine community.
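Two of the community-level measures mentioned above are simple to compute once counts are available. The sketch below derives a virus-to-bacteria ratio (VBR) and the Shannon diversity of the bacterial community for a single sample. It does not cover the bacteriophage sequence signatures, and it assumes that viral and bacterial abundances are already expressed in comparable units (for example particle and cell counts from microscopy or flow cytometry). The function names and example numbers are hypothetical.

```python
from math import log


def shannon_diversity(taxon_counts: list[int]) -> float:
    """Shannon index H' = -sum(p_i * ln p_i), skipping taxa with zero counts."""
    total = sum(taxon_counts)
    return -sum((c / total) * log(c / total) for c in taxon_counts if c > 0)


def virus_bacteria_summary(viral_count: float, bacterial_taxon_counts: list[int]) -> dict[str, float]:
    """Return the virus-to-bacteria ratio and bacterial Shannon diversity for one sample."""
    bacterial_total = sum(bacterial_taxon_counts)
    return {
        "vbr": viral_count / bacterial_total,
        "shannon": shannon_diversity(bacterial_taxon_counts),
    }


print(virus_bacteria_summary(100, [10, 10]))  # VBR of 5 and H' equal to ln 2
```

Computed per sample, these two values could then be compared across samples alongside viral sequence signatures, as in the approach described above.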
Similarly, in soil ecosystems, where bacteria dominate over archaea and eukaryotes as they do in marine ecosystems, phages have been shown to play an important role in defining ecosystem composition and function [81, 130, 139]. Moreover, in ecosystems such as anaerobic digesters, the presence of certain phages explains more than 40% of the total variation in prokaryotic community composition, far more than is explained by abiotic factors (14.5%). Studies in plants have also demonstrated that phages are a major factor influencing bacterial composition. However, the applicability of these findings to the human gut, which is also a bacteria-dominated ecosystem, has yet to be explored.
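Figures such as "more than 40% of the variation is explained" come from partitioning the variation in community composition among explanatory factors. The exact statistics of the cited anaerobic-digester study are not described here, so the sketch below shows one common, swapped-in way to obtain such a figure: the R² of a PERMANOVA-style partition of Bray–Curtis dissimilarities by a grouping factor (here, hypothetically, presence or absence of a given phage). It computes R² only and omits the permutation test that normally accompanies it, as well as multi-factor variance partitioning. Function names and example profiles are hypothetical.

```python
from itertools import combinations


def bray_curtis(u: list[float], v: list[float]) -> float:
    """Bray-Curtis dissimilarity between two abundance profiles."""
    return sum(abs(a - b) for a, b in zip(u, v)) / sum(a + b for a, b in zip(u, v))


def permanova_r2(dist: list[list[float]], groups: list[str]) -> float:
    """Fraction of total variation explained by `groups`: R^2 = 1 - SS_within / SS_total."""
    n = len(groups)
    # Total sum of squares from all pairwise squared distances.
    ss_total = sum(dist[i][j] ** 2 for i, j in combinations(range(n), 2)) / n
    ss_within = 0.0
    for label in set(groups):
        members = [i for i, g in enumerate(groups) if g == label]
        # Within-group sum of squares, scaled by group size.
        ss_within += sum(dist[i][j] ** 2 for i, j in combinations(members, 2)) / len(members)
    return 1.0 - ss_within / ss_total


profiles = [[30, 5, 1], [28, 6, 2], [4, 25, 20], [5, 22, 18]]  # hypothetical prokaryotic taxon counts
phage_present = ["yes", "yes", "no", "no"]  # hypothetical grouping factor
dist = [[bray_curtis(p, q) for q in profiles] for p in profiles]
print(f"R^2 = {permanova_r2(dist, phage_present):.2f}")
```

In practice, the R² would be reported with a permutation-based p value, and an established implementation (for example the PERMANOVA functions in scikit-bio or the R package vegan) would be preferable to this sketch.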
It is important to bear in mind that ecological concepts from one ecosystem might have limited applicability to another. Even if two ecosystems have similar viral community structures, the underlying ecological relationships may differ. For example, a predominance of temperate viruses was reported in a polar aquatic region, mirroring the predominance of temperate phages in the gut ecosystem. However, in that polar marine ecosystem, temperate phages were shown to switch from lysogeny to lytic infection as bacterial abundance rose. This is the opposite of the Piggyback-the-Winner model observed in the human gut, where temperate phages dominate over lytic phages when their bacterial hosts are abundant [142, 143]. This difference between the gut and the polar marine ecosystem reflects exposure to different environmental factors: the polar aquatic region is periodic owing to the change of seasons, whereas the gut ecosystem can be considered relatively stable (see “Major hallmarks of the human gut virome” section). Therefore, while human gut viromics might benefit from some cutting-edge approaches developed in environmental studies, caution should be exercised in extrapolating ecological concepts from distinct ecosystems to the human gut.
Given the fascinating and challenging nature of viruses, the emerging evidence for the role of gut bacteriophages in health and disease, and the ongoing paradigm shifts in our understanding of the role of certain viruses in other ecosystems, further development of viromics is clearly warranted. Once the current challenges of gut virome research have been overcome, for example through optimized virome isolation protocols and expanded databases of (un)cultivated viruses, future directions in the study of the human gut virome will be: (i) to establish a core gut virome and/or core set of viral genes through large longitudinal cohort studies; (ii) to study the long-term evolution of bacteriome–virome interactions under the influence of external factors; and (iii) to establish whether correlations with host-related phenotypes are causal, using model systems, multi-omics approaches, and novel bioinformatic techniques, possibly including those inherited from environmental studies.
We thank Kate McIntyre for editing this review and Stella Ilchenko for help with graphical design of figures.
SG and TS researched the topics and wrote the manuscript. AK, JF, CW, and AZ gave scientific advice and wrote parts of the manuscript. All authors critically assessed the manuscript and read and approved the final version.
SG and TS hold scholarships from the Graduate School of Medical Sciences, University of Groningen and the Junior Scientific Masterclass, University of Groningen, respectively. AZ holds a Netherlands Organization for Scientific Research (NWO) Vidi grant (NWO-VIDI 016.178.056) and a European Research Council (ERC) Starting Grant (ERC Starting Grant 715772). JF holds an NWO Vidi grant (NWO-VIDI 864.13.013). This work is also supported by a CardioVasculair Onderzoek Nederland (CVON 2018–27) grant to AZ and JF. CW is supported by an ERC advanced grant (FP/2007–2013/ ERC grant 2012–322698), an NWO Spinoza prize (NWO SPI 92–266), the NWO Gravitation Netherlands Organ-on-Chip Initiative (024.003.001), the Stiftelsen Kristian Gerhard Jebsen foundation (Norway), and the RuG investment agenda grant Personalized Health.
The funders had no role in the study design, data collection and analysis, decision to publish, or preparation of the manuscript. The authors declare that they have no competing interests.
- 1. Cobián Güemes AG, Youle M, Cantú VA, Felts B, Nulton J, Rohwer F. Viruses as winners in the game of life. Annu Rev Virol. 2016;3:197–214. https://doi.org/10.1146/annurev-virology-100114-054952.
- 12. Ackermann HW, DuBow MS. Viruses of prokaryotes vol. 1. General properties of bacteriophages. Boca Raton: CRC Press; 1987.
- 24. ICTV. Introduction to the ICTV Online Report, Virus Properties. https://talk.ictvonline.org/ictv-reports/ictv_online_report/introduction/w/introduction-to-the-ictv-online-report/418/virus-properties. Accessed 15 Jul 2019.
- 25. Gregory AC, Zablocki O, Howell A, Bolduc B, Sullivan MB. The human gut virome database. bioRxiv. 2019:655910. https://doi.org/10.1101/655910.
- 29. Shkoporov AN, Khokhlova EV, Fitzgerald CB, Stockdale SR, Draper LA, Ross RP, et al. ΦCrAss001 represents the most abundant bacteriophage family in the human gut and infects Bacteroides intestinalis. Nat Commun. 2018;9:4781. https://doi.org/10.1038/s41467-018-07225-7.
- 30. Castro-Mejía JL, Muhammed MK, Kot W, Neve H, Franz CMAP, Hansen LH, et al. Optimizing protocols for extraction of bacteriophages prior to metagenomic analyses of phage communities in the human gut. Microbiome. 2015;3:64. https://doi.org/10.1186/s40168-015-0131-4.
- 31. EC 50, Washington, DC, July 2018; Email ratification February 2019 (MSL #34). ICTV Taxonomy Release. 2018. https://talk.ictvonline.org/taxonomy/p/taxonomy_releases. Accessed 11 Jul 2019.
- 40. Shkoporov AN, Clooney AG, Sutton TDS, Ryan FJ, Daly KM, Nolan JA, et al. The human gut virome is highly diverse, stable and individual-specific. bioRxiv. 2019:657528. https://doi.org/10.1101/657528.
- 45. Frank DN, St. Amand AL, Feldman RA, Boedeker EC, Harpaz N, Pace NR. Molecular-phylogenetic characterization of microbial community imbalances in human inflammatory bowel diseases. Proc Natl Acad Sci U S A. 2007;104:13780–5. https://doi.org/10.1073/pnas.0706625104.
- 54. Monaco CL, Gootenberg DB, Zhao G, Handley SA, Ghebremichael MS, Lim ES, et al. Altered virome and bacterial microbiome in human immunodeficiency virus-associated acquired immunodeficiency syndrome. Cell Host Microbe. 2016;19:311–22. https://doi.org/10.1016/j.chom.2016.02.011.
- 59. Cornuault JK, Petit M-A, Mariadassou M, Benevides L, Moncaut E, Langella P, et al. Phages infecting Faecalibacterium prausnitzii belong to novel viral genera that help to decipher intestinal viromes. Microbiome. 2018;6:65. https://doi.org/10.1186/s40168-018-0452-1.
- 63. Kau AL, Planer JD, Liu J, Rao S, Yatsunenko T, Trehan I, et al. Functional characterization of IgA-targeted bacterial taxa from undernourished Malawian children that produce diet-dependent enteropathy. Sci Transl Med. 2015;7:276ra24. https://doi.org/10.1126/scitranslmed.aaa4877.
- 80. Conceição-Neto N, Zeller M, Lefrère H, De Bruyn P, Beller L, Deboutte W, et al. Modular approach to customise sample preparation procedures for viral metagenomics: a reproducible protocol for virome analysis. Sci Rep. 2015;5:16532. https://doi.org/10.1038/srep16532.
- 83. Parras-Moltó M, Rodríguez-Galet A, Suárez-Rodríguez P, López-Bueno A. Evaluation of bias induced by viral enrichment and random amplification protocols in metagenomic surveys of saliva DNA viruses. Microbiome. 2018;6:119. https://doi.org/10.1186/s40168-018-0507-3.
- 93. Alves JMP, de Oliveira AL, Sandberg TOM, Moreno-Gallego JL, de Toledo MAF, de Moura EMM, et al. GenSeed-HMM: a tool for progressive assembly using profile HMMs as seeds and its application in alpavirinae viral discovery from metagenomic data. Front Microbiol. 2016;7:269. https://doi.org/10.3389/fmicb.2016.00269.
- 103. Pasolli E, Asnicar F, Manara S, Zolfo M, Karcher N, Armanini F, et al. Extensive unexplored human microbiome diversity revealed by over 150,000 genomes from metagenomes spanning age, geography, and lifestyle. Cell. 2019;176:649–62.e20. https://doi.org/10.1016/j.cell.2019.01.001.
- 122. Siddell SG, Walker PJ, Lefkowitz EJ, Mushegian AR, Adams MJ, Dutilh BE, et al. Additional changes to taxonomy ratified in a special vote by the international committee on taxonomy of viruses (October 2018). Arch Virol. 2019;164:943–6. https://doi.org/10.1007/s00705-018-04136-2.
- 123. Wolf Y, Krupovic M, Zhang YZ, Maes P, Dolja V, Koonin EV, et al. Proposal 2017.016 M.A.v2. Megataxonomy of negative-sense RNA viruses. 2018. https://talk.ictvonline.org/ICTV/proposals/2017.006M.R.Negarnaviricota.zip. Accessed 11 Jul 2019.
- 126. Hannigan GD, Meisel JS, Tyldsley AS, Zheng Q, Hodkinson BP, SanMiguel AJ, et al. The human skin double-stranded DNA virome: topographical and temporal diversity, genetic enrichment, and dynamic associations with the host microbiome. MBio. 2015;6:e01578–15. https://doi.org/10.1128/mBio.01578-15.
- 134. Warwick-Dugdale J, Solonenko N, Moore K, Chittick L, Gregory AC, Allen MJ, et al. Long-read viral metagenomics captures abundant and microdiverse viral populations and their niche-defining genomic islands. PeerJ. 2019;7:e6800. https://doi.org/10.7717/peerj.6800.
- 139. Graham EB, Paez-Espino D, Brislawn C, Hofmockel KS, Wu R, Kyrpides NC, et al. Untapped viral diversity in global soil metagenomes. bioRxiv. 2019:583997. https://doi.org/10.1101/583997.
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.