Background

Beneficial associations between eukaryotic hosts and bacterial partners are ubiquitous, but how these persist stably over evolutionary time remains a source of debate [1,2,3]. One of the factors that plays a central role in maintaining beneficial symbioses is partner fidelity, defined as the stability of the association between a host and its symbiont over multiple host generations [4] (see Table 1). In associations with strict fidelity, genetic variants of hosts and symbionts show phylogenetic concordance. Strict fidelity is favoured in associations in which the symbionts are transmitted vertically, that is directly from hosts to their offspring. However, fidelity in vertically transmitted symbioses can be disrupted by host switching, symbiont displacement, acquisition of novel symbionts from free-living microorganisms and symbiont loss. Fidelity is generally weaker in associations with horizontal symbiont transmission in which the symbionts are acquired from free-living microbial populations or co-occurring hosts [4, 5]. However, strong fidelity can also occur in symbioses with horizontal transmission if genotype-dependent partner choice ensures the faithfulness of the association [4].

Table 1 Definition of key terms used in this study: As many of the terms below are used inconsistently in the literature, we explain here how we interpret them.

Partner fidelity has been well studied in obligate associations with only one or a few symbionts, such as aphids [12,13,14], tsetse flies [15], Riftia tube worms [16], Vesicomyidae clams [17,18,19], Solemya clams [20] and the Hawaiian bobtail squid [21]. However, as the number and diversity of symbiont species in a host increase, analysing partner fidelity over multiple host generations and across hundreds to thousands of microbial species and strains that often evolve rapidly and are in continuous flux is highly challenging [22, 23]. In hosts with highly diverse microbiota, such as sponges [24,25,26], ascidians [26], corals [26, 27], some insects [28,29,30] and mammals [31,32,33], phylosymbiosis is therefore the approach most often used (see Table 1). In phylosymbiosis studies, analyses of microbial metagenomes, microbial marker genes, or parts of them, like the 16S rRNA gene, are used to examine if the composition of a given group of host-associated microbiomes reflects the phylogeny of these hosts [28, 29, 31, 34,35,36]. However, phylosymbiosis is different from partner fidelity. In phylosymbiosis studies, the similarity of microbial communities across hosts is analysed, while in fidelity studies, the coinheritance of symbiont and host genotypes is investigated (see Table 1). By characterizing patterns of host-symbiont coinheritance, partner fidelity can provide more detailed insights into the ecological and evolutionary processes that affect the acquisition and persistence of each single member of complex symbiont communities.

Previous studies of partner fidelity in associations with complex symbiont communities have focussed on terrestrial animals, such as insects [37,38,39,40,41] and humans [42,43,44,45], but the little that is known about partner fidelity in marine animals with a diverse and species-rich microbiome largely stems from studies on sponges and corals (e.g. [46, 47]). Importantly, almost nothing is known about partner fidelity at microevolutionary time scales in wild animals, as past studies on both terrestrial and aquatic animals with species-rich microbiomes have investigated fidelity on larger evolutionary time scales, namely, across populations, species and geographic regions. Microevolutionary studies on single host populations are, however, important because natural selection acts most directly on single populations. Microevolutionary studies can isolate ecological and evolutionary factors that affect the stability and persistence of symbiotic communities from other larger-scale factors such as geographic distance. Moreover, such short time scales matter because strict partner fidelity decreases over evolutionary time in most symbiotic associations [5, 48] and because opportunities for host switching, symbiont displacement and uptake of new symbiont genotypes from the environment increase with time. Furthermore, deleterious genome decay in vertically transmitted symbionts increases with time, making it advantageous for hosts to acquire new symbionts horizontally from the environment or other hosts [49, 50]. In summary, assumptions about partner fidelity based on macroevolutionary scales may not reflect ecologically relevant, microevolutionary interactions between hosts and their microbial communities.

Here we investigated microevolutionary partner fidelity in the marine annelid Olavius algarvensis (Fig. 1a). This gutless marine worm is an excellent model for investigating partner fidelity in hosts with multimember symbiont communities, as it lives in an intimate, beneficial association with six to seven symbiont species. Its microbiota is therefore complex and diverse, yet simple enough for investigating the fidelity of each single symbiont species. All gutless annelid species (Clitellata, Naididae, Phallodrilinae, genera Olavius and Inanidrilus; sensu Erséus et al. [51]) are regularly associated with at least three to seven symbiont species from different genera and phyla that co-occur within single host individuals [52,53,54,55,56,57,58,59]. All symbionts are harboured in an extracellular region immediately under the outer cuticle of the host (Fig. 1b). Over the estimated 50 million years these hosts have evolved from their gut-bearing ancestors [60], they have become so fully dependent on their symbionts for both nutrition and recycling of their waste compounds that they no longer have a mouth, digestive tract and excretory system [52,53,54, 61]. Symbiont transmission occurs vertically through smearing when the parents deposit their eggs in the sediment, based on transmission electron microscopy and fluorescence in situ hybridization (FISH) studies of three host species (Inanidrilus leukodermatus, Olavius planus and Olavius algarvensis) [59, 62,63,64]. However, given the morphological similarity of several members of the bacterial symbiont community, these studies could not resolve if all symbionts are inherited through egg smearing or if some are acquired horizontally from the environment. Furthermore, there is evidence for host switching and displacement in the dominant, sulphur-oxidizing symbiont Candidatus Thiosymbion over longer evolutionary time [65, 66].

Fig. 1
figure 1

The O. algarvensis population in two bays off the island of Elba was dominated by two mitochondrial haplotypes. a Light microscopy image of Olavius algarvensis. b Fluorescence in situ hybridization image of an O. algarvensis cross section, highlighting the symbionts just below the cuticle of the host (gammaproteobacterial symbionts in green and deltaproteobacterial symbionts in red, using general probes for these two phyla). Reproduced with permission from Kleiner et al. [67]. c and d Location of the two collection sites, Sant’ Andrea and Cavoli, two bays off the island of Elba in the Mediterranean. e Haplotype network of mitochondrial cytochrome c oxidase subunit 1 (COI) gene sequences of O. algarvensis individuals from the two collection sites. The two dominant COI haplotypes A and B co-occurred in both bays. The size of the pie charts corresponds to COI haplotype frequencies. Hatch marks correspond to the number of point mutations between COI haplotypes. Nodes depicted by small red points indicate unobserved intermediates predicted by the algorithm in the haplotype network software. The number of individuals identified as COI haplotype A or B in each bay is in parentheses in the box below the network.

In the best-studied gutless marine annelid, O. algarvensis, seven symbiont species have been identified — two sulphur-oxidizing Gammaproteobacteria: Ca. Thiosymbion and Gamma3; four sulphate-reducing Deltaproteobacteria: Delta1a, Delta1b, Delta3 and Delta4; and a spirochete [52,53,54,55, 68, 69]. Of these seven symbionts, not all consistently co-occur in all host individuals, but all hosts harbour the sulphur-oxidizing symbiont Ca. Thiosymbion, the sulphate-reducing Delta1a or Delta1b symbiont and the spirochete [55]. The gammaproteobacterial, and possibly the deltaproteobacterial symbionts, autotrophically fix CO2 and engage in syntrophic cycling of oxidized and reduced sulphur compounds [52, 53, 67]. Nutrient transfer to the host occurs via phagolysosomal digestion of the symbionts in the epidermal cells underneath the symbiont layer [70, 71]. Evidence from genomic, proteomic and stable isotope analyses indicates resource partitioning, with different symbiont species favouring different energy and carbon sources [53, 54, 61, 70]. This metabolic niche differentiation among these co-occurring symbionts, together with the variability in their abundances across host individuals, indicates different levels of selective pressure, which could be reflected in varying degrees of partner fidelity [72].

In this study, we examined partner fidelity in O. algarvensis collected off the island of Elba, Italy, by analysing single-nucleotide polymorphisms (SNPs) across 80 metagenomes. For the hosts, we analysed the phylogeny of their mitochondrial genomes as indicators of vertical symbiont transmission. We then compared mitochondrial phylogeny with that of each symbiont to identify levels of congruence, and correspondingly fidelity, within the microbial consortium. We used a low-coverage sequencing approach for non-model organisms that has not yet been applied to host-microbe associations [73, 74], allowing the analysis of the entire microbial consortium, including low-abundance symbionts.

Results

Two mitochondrial haplotypes dominated the host population

To identify mitochondrial genetic diversity in the O. algarvensis population, we sequenced the mitochondrial cytochrome c oxidase subunit I (COI) gene of 579 O. algarvensis individuals collected over 4 years from two bays approximately 16 km apart on the island of Elba, Italy (Sant’ Andrea and Cavoli; Fig. 1 c and d). A haplotype network, based on 579 COI sequences of 525 bp, revealed two dominant mitochondrial COI haplotypes, here termed A and B, which co-occurred at both locations and were distinct from each other by five nucleotides (Fig. 1e). We sequenced the metagenomes of 20 individuals from the two dominant A and B haplotypes from both locations (in total 80 metagenomes) and assembled complete circular genomes of host mitochondria (mtDNA) from an arbitrarily selected metagenome for A and B haplotypes. The mtDNAs of A and B haplotypes shared 99.3% average nucleotide identity (ANI) and encoded 13 protein-coding genes, 2 ribosomal RNAs and 21 transfer RNAs (A = 15,715 bp and B = 15,730 bp). The close phylogenetic relatedness between these two host mitochondrial haplotypes that co-occurred in both bays allowed us to examine the effects of both genetics and geographic location on partner fidelity in O. algarvensis.

Intraspecific symbiont diversity was low across and within host individuals

We assembled metagenome-assembled genomes (MAGs) for each of the seven symbionts in O. algarvensis and used them as references for all further analyses (Supplementary Table S1; Supplementary Figs. S1 and S2). To ensure that each of the seven symbionts belonged to the same species across all O. algarvensis individuals, we examined average nucleotide identity (ANI) of all MAGs that could be recovered from each symbiont species (Supplementary Fig. S3). MAGs from the same species shared more than 95% pairwise ANI, and their 16S rRNA genes were also more than 99% identical (Supplementary Figs. S3 and S4, the only exception was one Ca. Thiosymbion pair that had a slightly lower ANI of 94.7%). These sequence similarities are widely accepted as thresholds for identifying bacterial species (95% ANI for MAGs [75,76,77] and 98.5 or 99% for 16S rRNA genes [78, 79]).

Symbiont strain diversity within single host individuals was low based on SNP densities in the seven symbiont genomes (0.02–0.49 SNP/kbp; Supplementary Table S2b). These values are lower than or comparable to SNP densities reported for endosymbionts transmitted vertically with repeated events of horizontal transmission, such as the shallow water bivalve Solemya velum and the deep-sea scaly-foot snail Chrysomallon squamiferum (0.1–1 SNP/kbp and 0.004–3.5 SNP/kbp, respectively) [20, 80], and considerably lower than SNP densities in horizontally transmitted symbionts of the giant tubeworm Riftia pachyptila (2.9 SNP/kbp) [81] and deep-sea Bathymodiolus mussels (5–11 SNP/kbp) [82]. Since strain diversity was low within host individuals, we treated each symbiont within an O. algarvensis individual as a single genotype in the analyses described below.

Sulphur-oxidizing, sulphate-reducing and spirochete symbionts were found in all host individuals

We assessed the relative abundance of symbionts within each of the 80 O. algarvensis individuals by quantifying sequencing read abundances for single-copy genes specific to each symbiont species (Fig. 2; Supplementary text 1.1; 162 to 431 single-copy genes per symbiont species). The sulphur-oxidizing symbionts, Ca. Thiosymbion and Gamma3, as well as the spirochete, were present in all individuals (Fig. 2a). All host individuals also always had sulphate-reducing symbionts (Delta1a, Delta1b, Delta3 and Delta4), but these varied across host individuals, and no individuals hosted all of them. Delta3 was detected in only six host individuals, making statistical tests meaningless and therefore excluded from subsequent phylogenetic analyses.

Fig. 2
figure 2

The composition of the symbiont community in 80 O. algarvensis individuals. a The number of O. algarvensis host individuals from two mitochondrial lineages (COI haplotypes A and B) and two locations (Sant’ Andrea and Cavoli) in which the respective symbiont species was detected (n = 80 in total; 20 replicates per location and COI haplotype; Supplementary text 1.1). b Relative read abundances of symbionts in the 80 O. algarvensis individuals. Each column shows the reads from a single host individual. The sulphur-oxidizing symbionts Ca. Thiosymbion (Ca. Thiosym.) and Gamma3 were the most abundant across host individuals, while the abundances of the sulphate-reducing symbionts (Delta1a, Delta1b, Delta3, Delta4) and the spirochete symbiont (Spiro) were consistently lower. Relative abundances of each symbiont were estimated based on metagenomic sequencing reads that mapped to the single-copy genes of each symbiont (Supplementary text 1.1). Relative symbiont abundances based on 16S rRNA gene sequences in the metagenomes were similar (Supplementary text 1.2; Supplementary Fig. S5)

Based on relative read abundances for single-copy genes, Ca. Thiosymbion was the most abundant symbiont across host individuals (41.6 ± 11.6%; mean ± standard deviation; Fig. 2b), while the abundance of Gamma3 symbiont reads varied considerably in host individuals from 0.1 to 61.2% (28.3 ± 11.3%). The summed relative abundances of the sulphur-oxidizing gammaproteobacteria (69.9 ± 7.4%) and sulphate-reducing deltaproteobacteria (26.7 ± 7.2%) showed consistent ratios across the 80 host individuals, regardless of the location, host COI haplotype, or the combination of these two factors (Supplementary Table S3). Read abundances of the spirochete symbiont were consistently low in host individuals (3.4 ± 2.7%), with one exception of 17.3%.

Probabilistic SNP identification increased the number of hosts and SNP sites for phylogenetic reconstruction of low-abundance symbionts

Conventionally, SNPs in symbiont genomes are identified using a deterministic genotype-calling approach that requires a minimum read coverage (e.g. at least fivefold [83]; Supplementary text 1.4). In our dataset, this coverage requirement excluded symbionts that occurred at low relative abundances (Supplementary Table 4a). In addition, the deterministic genotype calling limits the number of available SNP sites for phylogenetic inference, in our case to a maximum symbiont SNP density of 0.102 SNP/kbp (ranging from 0.006 to 0.102 SNP/kbp; Supplementary Table 4b). To circumvent these limitations, we calculated genotype probabilities and inferred genetic distances from the probability matrices for phylogenetic reconstruction. This allowed us to reconstruct phylogenies of symbionts from more O. algarvensis individuals, up to twice as many individuals than using the deterministic approach (between 3 and 131% increase; Supplementary Table S4a). Moreover, the probabilistic approach increased the robustness of our phylogenetic analyses, as it led to a substantial increase in SNP sites (ranging from 0.035 to 0.752 SNP/kbp; Supplementary Table S4b; see Supplementary text 1.4). Our comparison of phylogenies based on deterministic and probabilistic SNP identification showed consistent phylogenetic clustering for the samples that could be analysed using both methods (Fig. 3; Supplementary Fig. S6; Supplementary text 1.4). Therefore, we based our analyses on the probabilistic approach that allowed us to (i) include significantly more samples and (ii) identify more SNP sites to infer more robust phylogenies (Supplementary Table S4).

Fig. 3
figure 3

Comparative phylogenetic analyses of O. algarvensis and its microbial consortium members revealed variable patterns of congruence across the six symbionts. Phylogenetic trees based on SNPs across genomes of a host mitochondria (166 SNPs, n = 80), Ca. Thiosymbion (2872 SNPs, n = 80), c Gamma3 symbiont (618 SNPs, n = 80), d Delta1a symbiont (375 SNPs, n =37), e Delta1b symbiont (624 SNPs, n = 46), f Delta4 symbiont (675 SNPs, n = 67) and g spirochete symbiont (88 SNPs, n = 41). Phylogenies were inferred from genetic distances calculated from posterior genotype probabilities. Scale bars indicate 0.1 substitution per SNP site. Bootstrap support values > 95% are shown in black circles. Support for branches internal to each coloured leaf was omitted for visibility

Congruence of symbiont and mitochondrial phylogenies varied from high to absent

To examine partner fidelity in the O. algarvensis symbiosis, we compared the phylogenies of the six most widespread symbionts with that of their hosts’ mitochondrial genomes. The degree of congruence between symbiont and host phylogenies reflects the degree of fidelity between partners, with high congruence indicating strong fidelity and vice versa [12, 13, 20]. For the host population, their mtDNA phylogeny revealed a clear divergence between two mitochondrial lineages (termed A- and B-hosts), corresponding to the two COI haplotypes A and B (Fig. 3a). In contrast, the two locations appeared to play a smaller role in shaping mitochondrial phylogeny, with only B-hosts from Sant’ Andrea forming a well-supported clade. However, the phylogenetic relationships within the mitochondrial lineages could not be fully resolved due to their limited genetic divergence (Supplementary Fig. S7).

For the six symbionts of O. algarvensis, we found marked differences in the congruence of their phylogenies with that of their hosts’ mitochondrial genomes (Fig. 3 b–g). Congruence was highest in Ca. Thiosymbion, which mirrored the mitochondrial phylogeny of their hosts, with symbionts from A- and B-hosts falling into two separate clades (Fig. 3b). The congruence between the two mitochondrial lineages and the two Ca. Thiosymbion clades was 100% at both locations. That is, O. algarvensis A-hosts always had Ca. Thiosymbion A, and O. algarvensis B-hosts always had Ca. Thiosymbion B, without any exceptions. The slight differences in clade topology in Fig. 3 a and b are due to the lack of sufficient genetic divergence between individual mitochondrial haplotypes and Ca. Thiosymbion genotypes (Supplementary Fig. S7b). The Gamma3 symbionts formed nine groups, most of which were statistically supported clades, with each group containing symbionts from either A- or B-hosts (Fig. 3c). The exception was a Gamma3 symbiont of a B-host from Cavoli that fell in a clade of A-host symbionts from Cavoli (magnified panel in Fig. 3c). Location also appeared to affect the phylogeny of Gamma3 symbionts, with symbionts from the same bay forming distinct groups.

For all other symbionts besides the Ca. Thiosymbion and Gamma3, there was no phylogenetic divergence between symbionts from A- and B-hosts (Fig. 3 d–g). Location affected the phylogeny of the Delta4 symbionts, with two well-supported clades separating symbionts from Cavoli and Sant’ Andrea (Fig. 3f). The Delta1b symbionts also clustered based on their location, but the clades were not statistically supported (Fig. 3e). For the Delta1a and spirochete, neither host mitochondrial lineage nor location affected their phylogenetic clustering (Fig. 3 d–e, g).

We further examined congruence between the phylogenies of O. algarvensis mitochondria and its symbionts using two additional approaches. First, we compared the pairwise genetic distances of hosts and symbionts using three categories: within A- or B-hosts from the same location (within), between A- and B-hosts (mito), and between the two locations Sant’ Andrea and Cavoli (location). We tested if genetic distances were explained by host mitochondrial lineage or by location, by analysing pairwise genetic distances in ‘within’ vs. ‘mito’ and ‘within’ vs. ‘location’ (Fig. 4, Table 2, Supplementary Table S5). For the host, both the mitochondrial lineage and the location had a significant effect on genetic distances, as observed in our phylogenetic SNP analyses. For the symbionts, there was a significant effect of the mitochondrial lineage on genetic distances in Ca. Thiosymbion and Gamma3 symbionts, while the effect of location was well supported for the Gamma3 and Delta4 symbionts, again confirming our phylogenetic analyses. An effect of location on genetic distances was also significant for the Delta1b symbionts, but only for B-host symbionts, as they were not detected in A-hosts from Sant’ Andrea.

Fig. 4
figure 4

Host mitochondrial lineage and geographic location had a significant effect on the genetic divergence of some but not all symbionts. Host mitochondrial lineages explained genetic divergence in Ca. Thiosymbion and Gamma3, while geographic location explained divergence in the Gamma3, Delta1b and Delta4 symbionts. Pairwise genetic distances in O. algarvensis mitochondrial genomes and symbionts were calculated from pairs of O. algarvensis individuals within the same combination of host lineage (A- or B-host) and location (“Within”), between individuals of A- and B-hosts from the same location (“Between A- and B-hosts”) and between individuals from the two locations, Sant’ Andrea and Cavoli, but from the same host lineage (“Between locations”). Pairwise genetic distances were compared among these three categories for a mitochondria, Ca. Thiosymbion, c Gamma3 symbiont, d Delta1a symbiont, e Delta1b symbiont, f Delta4 symbiont and g spirochete symbiont. Genetic distances were normalized per SNP site and log scaled. Thick horizontal lines and grey boxes respectively indicate the median and interquartile range (IQR) of observations. Vertical lines show the IQR ± 1.5 IQR range, and outliers out of this range are shown as circles. Numbers in brackets indicate numbers of pairwise comparisons per category tested. Asterisks respectively denote statistical significance (*p < 0.05, **p < 0.01, see Supplementary Table S5). Orange and blue brackets highlight a significant effect on genetic divergence by the mitochondrial lineage and location, respectively. “N.S.” indicates no significant differences among categories (p > 0.05)

Table 2 Levels of partner fidelity between O. algarvensis and its symbiotic consortia based on analyses in this study

We next examined whether the genetic distance of mtDNA between pairs of O. algarvensis individuals corresponded to that of their symbionts. To test for this pattern suggesting genetic co-divergence, correlations of the pairwise genetic distances were calculated between mtDNA and each of the six symbionts (Supplementary Fig. S8, Supplementary Table S6a). The genetic distances of the Ca. Thiosymbion symbionts had the strongest positive correlation with mtDNA genetic distances (Table 2, Supplementary Fig. S8a, Supplementary Table S6a). For the Gamma3, Delta1a, Delta1b and Delta4 symbionts, we only observed weak positive correlations (Supplementary Fig. S8 b–e). Genetic co-divergence between the spirochete symbionts and host mitochondria was not detectable (Supplementary Fig. S8f, Supplementary Table S6a). Correlation coefficients (Mantel’s R) calculated above for the six symbionts showed a positive correlation with the symbionts’ relative read abundances (Fig. 2b; Supplementary Table S6b). In other words, the higher the relative abundance of a symbiont species in host individuals, the greater the degree of genetic co-divergence with O. algarvensis.

Discussion

Our analyses revealed that fidelity between the gutless annelid O. algarvensis and its endosymbiotic microbial consortium varied from strict to absent. This variability in partner fidelity likely occurred over a short, microevolutionary period, as we analysed individuals within a population of O. algarvensis from two very closely related mitochondrial lineages (0.7% divergence) that co-occurred in two bays separated by only 16 km. Our study highlights the importance of examining fidelity over microevolutionary timescales, as it was central to revealing the broad range of fidelity across the members of the O. algarvensis symbiont community from strict over intermediate to absent. Over macroevolutionary timescales, strict fidelity is disrupted in Ca. Thiosymbion (see below), and it is unlikely that we would have detected the strong to moderate levels of fidelity in other members of the O. algarvensis symbiont community over longer evolutionary time.

Varying degrees of partner fidelity indicate a spectrum of mixed modes between vertical and horizontal transmission

Different degrees of partner fidelity across the microbial consortium of O. algarvensis reflect the faithfulness with which the symbionts are transmitted from one generation to the next. The symbionts of marine gutless annelids are transmitted vertically through egg smearing, during which the egg passes through symbiont-rich tissues termed genital pads [62,63,64]. The egg is then encased in a cocoon and deposited in the sediment. This process offers opportunities for horizontal transmission of bacteria from the surrounding sediment or co-occurring hosts. Indeed, previous studies have shown that over macroevolutionary time, host switching and displacement disrupt fidelity in Ca. Thiosymbion, based on comparative phylogenetic analyses of ribosomal genes in 23 gutless annelid species [65, 66]. At microevolutionary time scales, however, our analyses of O. algarvensis revealed that transmission is strictly vertical for Ca. Thiosymbion, given the strong phylogenetic congruence and genetic co-divergence between these symbionts and their hosts’ mitochondria, independent of their collection site (Supplementary text 1.6). We also observed strong fidelity in the Gamma3 symbiont, with only a single switching event from an A- to a B-host in Cavoli (Fig. 3c). Interestingly, such strong fidelity was not observed in the mono-specific association between the marine clam Solemya velum and its vertically transmitted sulphur-oxidizing symbionts. In these clams, repeated uptake of symbionts from the environment or contemporary hosts occurs within single host populations [20]. This indicates a lesser degree of fidelity between S. velum and its symbionts than between O. algarvensis and its sulphur-oxidizing symbionts, despite the fact that the clam symbionts are likely transmitted via the ovaries versus the presumed less-restrictive mode of egg smearing in O. algarvensis.

Partner fidelity was intermediate to absent for the deltaproteobacterial and spirochete symbionts of O. algarvensis. It is therefore likely that these symbionts are regularly acquired horizontally from a free-living population or other co-occurring host individuals. For the Gamma3, Delta1b and Delta4 symbionts, we also observed an effect of their geographic location on their genetic distances (Fig. 4, Table 2). Explanations for this effect include differences in the geographic distribution of genotypes for these symbionts in the environment, exchange of symbionts between contemporary hosts and isolation-by-distance effects on partner choice, but cannot be resolved without additional in-depth analyses of the free-living symbiont population (Supplementary text 1.6).

Combinations of vertical and horizontal transmission modes in co-occurring symbionts have been shown previously, for example in humans [42, 43, 45], sponges [46] and corals [47]. However, these observations were based on comparisons of microbiota in parents and their immediate offspring, and did not examine partner fidelity over multiple generations. As discussed above, time scales matter, because partner fidelity decreases over evolutionary time in most symbiotic associations [5, 48]. In other words, for any given symbiont that is transmitted vertically across one or a few generations, over longer evolutionary time, horizontal transmission events are likely to disrupt partner fidelity [49, 50]. To our knowledge, the wide range of vertical and horizontal transmission modes of the O. algarvensis co-occurring symbionts has not been previously shown at fine microevolutionary time scales.

How can we explain such different degrees of partner fidelity in the O. algarvensis symbiosis?

Strong partner fidelity is widespread in associations in which the symbionts are critical for the host’s survival and fitness [2, 84]. In O. algarvensis, the relative abundance of symbionts was positively correlated with fidelity, with the highest fidelity detected for their sulphur-oxidizing symbionts Ca. Thiosymbion, followed by Gamma3 (42% and 28% relative read abundance, respectively; Fig. 2b). These symbionts are the primary producers for their gutless hosts by autotrophically fixing CO2 into organic compounds [52, 54, 61], and all O. algarvensis individuals from both bays harboured these symbionts. Given that O. algarvensis gains all of its nutrition by digesting its endosymbionts via endocytosis [70, 71], those symbionts that provide most of its nutrition are likely most strongly selected for, as they most directly affect the fitness of their host.

In contrast, selective pressures on maintaining the symbiotic association are likely relaxed for symbionts that are less critical for their host’s fitness or are functionally redundant. The deltaproteobacterial, sulphate-reducing symbionts play an important role by producing reduced sulphur compounds as energy sources for the sulphur-oxidizing bacteria, particularly when these compounds are limiting in the sediment environment [52]. However, all four deltaproteobacterial symbionts reduce sulphate to sulphide [52,53,54, 68, 69], making this trait functionally redundant among these symbionts. Correspondingly, the presence and abundance of the four sulphate-reducing symbionts varied across host individuals, with different combinations of one to three of these symbionts in each host. This pick and mix pattern indicates that O. algarvensis fitness is not dependent on any particular deltaproteobacterial symbiont species but on the presence of functional roles that they share. The lower relative abundance of reads from deltaproteobacterial symbionts (summed average 27%) compared to those of the sulphur oxidizers (70%), as well as their limited biomass [54, 55], further indicates that the deltaproteobacterial symbionts are not as critical for their host’s nutrition as Ca. Thiosymbion and Gamma3.

The absence of fidelity in the spirochete symbionts was unexpected, given their regular presence in O. algarvensis individuals and in other gutless annelids from around the world [56, 85] (Supplementary Fig. S2). While the metabolism of this symbiont is not yet known, it was by far the least abundant symbiont in O. algarvensis in this study (3% relative read abundance) as well as previous ones [53,54,55, 67, 70], indicating a limited role of these symbionts in their hosts’ nutrition.

In addition to selective forces, less stringent fidelity in low-abundance symbionts could also be caused by stochastic processes, for example poorer chances of faithful vertical transmission during egg smearing. Assuming that the success with which a symbiont is transmitted to the egg depends on its abundance in the parent, symbionts present in low abundance might face greater chances of not being transmitted than high abundance symbionts. Horizontal reacquisition of these symbionts from the sediment or co-occurring hosts would then explain the disruption of genetic congruence between these symbionts and O. algarvensis.

Having your cake and eating it too: the advantage of flexibility in partner fidelity

The lack of stringent partner fidelity between O. algarvensis and its deltaproteobacterial and spirochete symbionts indicates that these hosts regularly acquire novel intraspecific genotypes from the environment or from co-occurring host individuals. These newly acquired intraspecific genotypes, together with the interspecific variability in the deltaproteobacterial community across host individuals, could expand the ecological niche of O. algarvensis and enable the population to adapt to fluctuating environments. For example, new symbiont genotypes could be better adapted to the environment, provide greater flexibility in the use of resources from the environment and enable resilience to seasonal and long-term temperature changes [5, 86,87,88]. On the other hand, weak partner fidelity can be costly for the host, including the failure to find suitable symbionts, acquisition of harmful bacteria, or association with ‘cheaters’ that do not provide mutualistic benefits [89]. However, these costs are likely to be minimal for the deltaproteobacterial symbionts, as their partner quality may depend largely on their ability to produce reduced sulphur compounds, a trait that is critically linked to the energy metabolism of these sulphate-reducing bacteria [52,53,54], making cheating in this trait unviable for the sulphate-reducing bacteria.

For O. algarvensis, fluctuating environmental conditions may be more critical selective forces than the costs of weak partner fidelity. O. algarvensis, like other marine annelids, does not have a pelagic life stage and can likely not disperse as widely as many other infaunal invertebrates [63]. These hosts therefore face the risk of local extinction if environmental conditions become unsuitable for their symbionts. Free-living bacteria rapidly adapt to new environmental conditions [90], and horizontal acquisition of symbionts from the environment can increase the host’s potential to adapt to environmental challenges rapidly [91]. The benefit of long-term survival via an evolutionary bet hedging [92, 93] may therefore outweigh the cost of recruiting less-favourable symbionts, as predicted in a recent theoretical study modelling the selective advantages of imperfect vertical transmission of symbionts in variable environments [72].

Probabilistic approaches to SNP identification expand ecological and evolutionary studies of microbial communities

Metagenomic analyses of intraspecific genetic variability in microbial communities typically rely on deep sequencing of each genotype to obtain sufficient SNPs across their genomes [77], incurring substantial sequencing costs. In this study, we reconstructed phylogenies of symbionts directly from genotype probabilities across their genomes, rather than calling genotypes. This probabilistic approach accurately identifies SNPs from low-coverage sequencing data by accounting for the uncertainties of genotyping [73]. This approach has several advantages, including (i) lower sequencing costs compared to high-coverage sequencing, meaning that for the same price, better population coverage can be achieved by sequencing more individuals, (ii) the ability to analyse the genetic diversity of low-abundance symbionts that are present in only some hosts and (iii) an increased robustness of phylogenetic analyses through the recovery of higher SNP numbers. Our study highlights how approaches using genotype probabilities can be applied to the population genetics of host-associated microbiota with variable abundances. Furthermore, our results indicate that probabilistic approaches can also be used to study evolutionary dynamics in free-living microbial communities, thus greatly expanding our toolbox for understanding non-cultivable microorganisms.

Conclusions

We showed that partner fidelity varies from strict to absent in the association between the gutless marine annelid O. algarvensis and its microbial consortium. This variability in fidelity was unexpected given that these hosts transmit their symbionts vertically via egg smearing. Our results highlight the importance of examining partner fidelity at microevolutionary scales, as over longer evolutionary time, strict vertical transmission is rare in most symbioses [48, 88]. Understanding the processes that drive fidelity within associations over short to long evolutionary time will help identify the benefits and costs in maintaining symbiotic associations. Such efforts should encompass increasing geographic and taxonomic scales, beginning with local host populations and expanding to regional intraspecific analyses to large-scale global analyses across host species (e.g., [39, 44, 94, 95]). Rapid advances in high-throughput sequencing combined with substantial reductions in sequencing costs using probabilistic SNP calling now make such studies feasible and will contribute to revealing the driving forces that shape the complex and fluid nature of multimember symbioses [87, 96,97,98].

Methods

Specimen collection and host mitochondrial lineage screening

A total of 579 O. algarvensis individuals from two locations on Elba, Italy (Sant’ Andrea; 42°48′31″N/10°08′33″E, and Cavoli, 42°44′05″N/10°11′12″E; Fig. 1 c and d; n = 346 and 233, respectively) were screened for their mitochondrial lineages based on their mitochondrial COI gene sequences (COI haplotypes). O. algarvensis individuals were collected between 2010 and 2016 from sandy sediments in the vicinity of Posidonia oceanica seagrass beds at water depths between 7 and 14 m, as previously described [54]. Live specimens were (i) flash-frozen in liquid nitrogen and stored at −80 °C or (ii) immersed in RNAlater (Thermo Fisher Scientific, Waltham, MA, USA) and stored at 4 °C. DNA was individually extracted from single worms using the DNeasy Blood & Tissue Kit (Qiagen, Hilden) according to the manufacturer’s instructions. A region of 670 bp of the COI gene was amplified with PCR using the primer set of COI-1490F (5′-GGT-CAA-CAA-ATC-ATA-AAG-ATA-TTG-G-3′) and COI-2189R (5′-TAA-ACT-TCA-GGG-TGA-CCA-AAA-AAT-CA-3′) as previously described [99]. PCR-amplicons were sequenced using the BigDye Sanger sequencing kit (Life Technologies, Darmstadt, Germany) with the COI-2189R primer on the Applied Biosystems Hitachi capillary sequencer (Applied Biosystems, Waltham, USA) according to the manufacturer’s instructions. COI sequences were quality filtered with a maximum error rate of 0.5% and aligned using MAFFT v7.45 [100] in Geneious software v11.0.3 (Biomatters, Auckland, New Zealand). A COI haplotype network was built on a 525-bp core alignment using the TCS statistical parsimony algorithm [101] implemented in PopART v1.7 [102]. Twenty O. algarvensis individuals from each ‘host group’ (i.e. the combination of sample location, Sant’ Andrea or Cavoli, and COI haplotype A or B; 4 groups in total) were randomly selected for metagenomic sequencing (n = 80 individuals total).

Metagenome sequencing

Sequencing libraries were constructed from the DNA extracted from single worms using a Tn5 transposase purification and tagmentation protocol [103]. The Tn5 transposase was provided by the Protein Science Facility at Karolinska Institutet SciLifeLab (Solna, Sweden). Quantity and quality of DNA samples were checked with the Quantus Fluorometer with the QuantiFluor dsDNA System (Promega Corporation, Madison, WI, USA), the Agilent TapeStation system with the DNA ScreenTape (Agilent Technologies, Santa Clara, CA, USA) and the FEMTO Pulse genomic DNA analysis kit (Advanced Analytical Technologies Inc., Heidelberg, Germany) prior to library construction. Insert template DNA was size-selected for 400–500 bp using the AMPure XP (Beckman Coulter, Indianapolis, IN, USA). Paired-end 150-bp sequences were generated using the Illumina HiSeq3000 System (Illumina, San Diego, CA, USA) with an average total yield of 2.5 Gbp per sample (2548 ± 715 Mbp (mean ± SD)). Construction and quality control of libraries and sequencing were performed at the Max Planck Genome Centre (Cologne, Germany).

Assembly of reference genomes of endosymbionts and O. algarvensis mitochondrion

Metagenome-assembled genomes (MAGs) of the Ca. Thiosymbion, Gamma3, Delta4, and spirochete symbionts were de-novo assembled from a deeply sequenced metagenome of an O. algarvensis individual available at the European Nucleotide Archive (ENA) project PRJEB28157 (specimen ID: OalgB6SA; approx. 7 Gbp; Supplementary Table S1). In addition, MAGs of Delta1a, Delta1b and Delta3 were obtained from the public database deposited under the ENA project accession number PRJEB28157 [68, 69]. For the de-novo assembly, raw metagenomic reads were first adapter-trimmed and quality-filtered with length ≥ 36 bp and Phred quality score ≥ 2 using bbduk of BBTools v36.86 (http://jgi.doe.gov/data-and-tools/bbtools) and corrected for sequencing errors using BayesHammer [104] implemented in SPAdes v3.9.1 [105]. Clean reads were assembled using MEGAHIT v1.0.6 [106], and symbiont genome bins were identified with MetaBAT v0.26.3 [107]. Bins were assigned to the Ca. Thiosymbion, Gamma3, Delta4, and spirochete symbionts based on 99–100% sequence matches with their reference 16S rRNA gene sequences (NCBI accession numbers AF328856, AJ620496, AJ620497, AJ620502 [52, 55], respectively). MAGs were further refined using Bandage v0.08.1 [108] by identifying and inspecting connected contigs on assembly graphs. Completeness and contamination of the MAGs were estimated with CheckM version 1.0.7 [109], and assembly statistics of symbiont genomes were calculated with QUAST v5.0.2 [110] (Supplementary Table S1). For the assessment of overall genetic divergence of symbionts in the 80 O. algarvensis individuals, MAGs of all symbionts were binned as above, wherever their read coverage allowed. Average nucleotide identity (ANI) of these MAGs was calculated using FastANI v1.33 [75]. To ensure robust ANI comparisons, MAGs that had 25% or less homologous genes than those of the reference genome (Supplementary Table S1) were excluded from the analysis. Heatmaps of pairwise ANIs within each of the symbionts were generated with pheatmap 1.0.12 in R [111].

Complete mitochondrial genomes (mtDNA) were assembled from two metagenomes of O. algarvensis from Sant’ Andrea representing the two COI haplotypes A and B (specimen IDs: ‘OalgSANT_A04’ and ‘OalgSANT_B04,’ respectively). Duplicated sequences due to PCR-amplification during the library preparation were first removed from raw sequences using FastUniq v1.1 [112] prior to adapter clipping and quality trimming with Phred quality score ≥ 2 using Trimmomatic v0.36 [113]. A preliminary mtDNA scaffold was first generated by iterative mapping of the clean reads to a reference COI sequence of O. algarvensis (NCBI accession number KP943854, as a bait sequence) with MITObim v1.9 [114]. Mitochondrial reads were then identified by mapping to the mtDNA scaffold using bbmap in BBTools, and mtDNA was assembled from the identified reads with SPAdes. A circular mtDNA was identified on assembly graphs in Bandage and annotated using MITOS2 webserver (http://mitos2.bioinf.uni-leipzig.de) to confirm the completeness [115].

Characterization of symbiont community composition

The taxonomic composition of the O. algarvensis metagenomes was first screened by identifying 16S rRNA gene sequences using phyloFlash v3.3-beta1 [116] with the SILVA SSU database release 132 [117] as reference. The 16S rRNA gene sequences of O. algarvensis symbionts were assembled with SPAdes implemented in phyloFlash and aligned using MAFFT in Geneious to identify SNP sites. Chimeric and incomplete (< 1100 bp) SSU sequences were identified in the alignment and excluded.

Relative abundances of O. algarvensis symbionts were estimated by mapping metagenome reads to a collection of symbiont-specific sequences of single-copy genes extracted from the genome bins. Orthologous single-copy gene sequences were first identified within each of the reference symbiont MAGs with CheckM. To ensure unambiguous taxon differentiation, duplicated genes detected in each symbiont MAG (i.e. those labelled as ‘contamination’ in CheckM) as well as sequences sharing > 90% nucleotide identity between multiple symbiont species (checked with CD-HIT v4.5.4 [118]; 8 cases identified between the Delta1a and Delta1b symbionts) were removed from the final reference sequences of single-copy genes. Metagenomic reads (quality filtered with the same processes as for the mtDNA assembly above) matching to the singe-copy gene sequences were quantified using Kallisto v0.44.0 [119] (Supplementary text 1.1). Symbiont composition estimates were plotted in R using ggplot2 package v3.2.1 [120].

To examine whether our phylogenetic analysis of each O. algarvensis symbiont reflects a single-dominant genotype per host individual, levels of symbiont genotype diversity within a host individual were assessed by calculating SNP densities (the number of SNP sites per kbp of reference genome) with previously established procedures [82]. This analysis was performed using a set of publicly available deeply sequenced metagenomes of O. algarvensis (listed in Supplementary Table S2a) to ensure that SNP densities were estimated using sufficient symbiont read coverages, and that these estimates could be compared to those in other studies performing similar assessments [20, 80,81,82].

Identification of single-nucleotide polymorphisms and phylogenetic reconstruction

For identification of SNPs in the genomes of symbionts and mitochondria, quality-controlled reads were first unambiguously split into different symbiont species using bbsplit of BBTools, using the reference symbiont MAGs described above. Mitochondrial reads were identified with the reference mtDNA sequence derived from the specimen ‘OalgSANT_A04’ (Sant’ Andrea, COI haplotype A) using bbmap with a minimum nucleotide identity of 95%; this step was performed to remove potential contaminations from sequences of nuclear mitochondrial pseudogenes divergent from the mtDNA [121] while ensuring successful mapping of mtDNA reads in the analysis for both A- and B-hosts given the high nucleotide identity of mtDNAs between these two lineages (> 99%; see the “Results”).

To reconstruct phylogenies of symbionts and mitochondria, SNPs in genomes of symbionts and host mitochondria were identified using two approaches: based on (i) posterior genotype probabilities without genotype calling and (ii) deterministic genotyping only at genetic positions that were deeply sequenced in all samples. For the SNP identification without genotyping (i), the symbiont and mitochondrial reads identified above were mapped onto individual reference genomes of symbionts and mitochondria using bbmap. Mapping files were filtered based on mapping quality using samtools v1.3.1 [122] and BamUtil v1.0.14 [123], deduplicated with MarkDuplicates of Picard Toolkit v2.9.2 (https://github.com/broadinstitute/picard) and realigned around indels with Genome Analysis Toolkit v3.7 [124]. Posterior genotype probabilities were calculated with ANGSD v0.929 [125]. ANGSD is widely used for studies of diploid organisms to infer genotypes from low-coverage sequencing data while taking sequencing errors into account [125,126,127]. To deal with haploid genotypes in ANGSD, all genotypes were assumed to be ‘homozygous’ by setting an inbreeding coefficient F of ‘1’, and a uniform prior were specified for posterior probability calculation (T.S. Korneliussen, 2018, pers. comm.). SNP sites were identified as reference nucleotide positions that were covered ≥ 1 × by all samples and showed statistically significant support (SNP p-value < 0.01). When no SNP site was found due to very low or lacking reads from a symbiont, these samples were excluded based on a cut-off of lateral coverages (i.e. % reference genetic sites covered by reads; Supplementary Table S4). For symbionts and mitochondria, their pairwise genetic distances in host individuals were obtained from the matrix of genotype probabilities using NGSdist v1.0.2 [74]. Phylogenetic trees with bootstrap support were computed from the resulting distant matrix using FastME v2.1.5.1 [128] and RAxML v8.2.11 [129].

For the SNP identification by deterministic genotyping (ii), the same symbiont and mitochondrial reads were analysed with the SNIPPY pipeline v3.2 (https://github.com/tseemann/snippy), with the same reference genomes of mtDNA and symbionts as above (Supplementary text 1.4).

Analyses of phylogenetic patterns of symbionts based on host mitochondrial lineages and locations

To examine which factors drive the patterns of genetic divergence in mitochondria and symbionts, their pairwise genetic distances were statistically compared in 3 categories of host pairs: (a) within the same combination of location plus A- or B-hosts, (b) between A- and B-hosts (A vs. B in Cavoli and A vs. B in Sant’ Andrea) and (c) between locations (Cavoli vs. Sant’ Andrea in A-hosts and Cavoli vs. Sant’ Andrea in B-hosts). Sample pairs in each category were randomly selected without replacement to ensure data independence. For the Delta1b symbionts, we separately compared (a) vs. (b) within Cavoli and (a) vs. (c) within B-hosts, because Delta1b did not occur in A-hosts in Sant’ Andrea. The statistical analyses were performed for each of the symbionts and mitochondria using Kruskal-Wallis rank sum test and Dunn post hoc tests with p-value adjustment by controlling the false-discovery rate, using R core package v3.4.2 [130] and FSA package v0.8.25 [131]. To examine genetic co-divergence patterns between a symbiont and host mtDNA, regressions of pairwise genetic distances estimated with NGSdist were examined using Mantel tests implemented in the R-package vegan v2.4.4 [132]. A correlation between Mantel’s Rs and relative abundances of symbionts based on reads mapped to single-copy genes was tested with Kendall’s rank correlation tau using the function test.cor in R’s core package.