Mosaic composition of ribA and wspB genes flanking the virB8-D4 operon in the Wolbachia supergroup B-strain, wStr

The obligate intracellular bacterium, Wolbachia pipientis (Rickettsiales), is a widespread, vertically transmitted endosymbiont of filarial nematodes and arthropods. In insects, Wolbachia modifies reproduction, and in mosquitoes, infection interferes with replication of arboviruses, bacteria and plasmodia. Development of Wolbachia as a tool to control pest insects will be facilitated by an understanding of molecular events that underlie genetic exchange between Wolbachia strains. Here, we used nucleotide sequence, transcriptional and proteomic analyses to evaluate expression levels and establish the mosaic nature of genes flanking the T4SS virB8-D4 operon from wStr, a supergroup B-strain from a planthopper (Hemiptera) that maintains a robust, persistent infection in an Aedes albopictus mosquito cell line. Based on protein abundance, ribA, which contains promoter elements at the 5′-end of the operon, is weakly expressed. The 3′-end of the operon encodes an intact wspB, which encodes an outer membrane protein and is co-transcribed with the vir genes. WspB and vir proteins are expressed at similar, above average abundance levels. In wStr, both ribA and wspB are mosaics of conserved sequence motifs from Wolbachia supergroup A- and B-strains, and wspB is nearly identical to its homolog from wCobU4-2, an A-strain from weevils (Coleoptera). We describe conserved repeated sequence elements that map within or near pseudogene lesions and transitions between A- and B-strain motifs. These studies contribute to ongoing efforts to explore interactions between Wolbachia and its host cell in an in vitro system. Electronic supplementary material The online version of this article (doi:10.1007/s00203-015-1154-8) contains supplementary material, which is available to authorized users.


Introduction
Wolbachia pipientis (Rickettsiales; Alphaproteobacteria) is an obligate intracellular bacterium that infects filarial nematodes and a wide range of arthropods including ≥60 % of insects and ≈35 % of isopod crustaceans, but does not infect vertebrates (Hilgenboecker et al. 2008). Wolbachia is considered to be a single species classified into clades by multilocus sequence typing and designated as supergroups A to N (Baldo et al. 2006b;Comandatore et al. 2013;Lo et al. 2007). The C- and D-strains that infect filarial worms have phylogenies concordant with those of nematode hosts, consistent with strict vertical transmission as obligate mutualists (Comandatore et al. 2013;Dedeine et al. 2003;Li and Carlow 2012;Strubing et al. 2010;Taylor et al. 2005;Wu et al. 2004). Although arthropod-associated A- and B-strains may provide subtle fitness benefits to hosts (Zug and Hammerstein 2014), they are best known as reproductive parasites, causing phenotypes that maintain or increase Wolbachia infection frequencies, including feminization, parthenogenesis, and cytoplasmic incompatibility (Saridaki and Bourtzis 2010;Werren et al. 2008). Interference with host immune mechanisms and replication of arboviruses, bacteria and malarial plasmodia (Kambris et al. 2009;Pan et al. 2012;Zug and Hammerstein 2014) has encouraged efforts to exploit Wolbachia for biocontrol of arthropod vectors of vertebrate pathogens and/or crop pests (Bourtzis 2008;Rio et al. 2004;Sinkins and Gould 2006;Zabalou et al. 2004). An understanding of molecular differences between A- and B-strains, and how they have been influenced by horizontal transmission and genetic exchange (Newton and Bordenstein 2011;Schuler et al. 2013;Werren et al. 2008;Zug and Hammerstein 2014) will facilitate manipulation of Wolbachia.
Wolbachia's interaction with host cells likely involves the type IV secretion system (T4SS), a macromolecular complex that transports DNA, nucleoproteins and "effector" proteins across the microbial cell envelope into the host cell, where they mediate intracellular interactions (Alvarez-Martinez and Christie 2009;Zechner et al. 2012). Homologs of all genes except virB5 of Agrobacterium tumefaciens T4SS have been identified in Wolbachia and other members of the Rickettsiales (Gillespie et al. 2009, 2010), including Anaplasma, Ehrlichia, Neorickettsia, Orientia and Rickettsia. Among sequenced Wolbachia genomes, T4SS genes are organized in two operons: virB3-B6 containing virB3, virB4 and four virB6 paralogs and virB8-D4 containing virB8, virB9, virB10, virB11, virD4 and, in some genomes, the wspB paralog of the wspA major surface antigen (Pichon et al. 2009;Rances et al. 2008). In the supergroup B-strain wPip from Culex pipiens mosquitoes, wspB is disrupted by a transposon and is presumably inactive (Sanogo et al. 2007). T4SS effector proteins that manipulate host cells have been identified from Anaplasma and Ehrlichia (Liu et al. 2012;Lockwood et al. 2011;Niu et al. 2010), and Wolbachia express both vir operons in ovaries of arthropod hosts, wherein T4SS effectors are suspected to play a role in cytoplasmic incompatibility and other reproductive distortions (Masui et al. 2000;Rances et al. 2008;Wu et al. 2004). Although WspA and WspB are likely components of the Wolbachia outer membrane, their functions remain unknown. In the case of wBm, WspB is excreted/secreted into filarial host cells (Bennuru et al. 2009) and co-localizes with the Bm1_46455 host protein in tissues that include embryonic nuclei (Melnikow et al. 2011). WspB is therefore itself a candidate T4SS effector that may play a role in reproductive manipulation of the host.
The Wolbachia strain wStr in supergroup B causes strong cytoplasmic incompatibility in the planthopper, Laodelphax striatellus (Noda et al. 2001a), and in addition maintains a robust, persistent infection in a clonal Aedes albopictus mosquito cell line, C/wStr1 (Fallon et al. 2013;Noda et al. 2002). Because in vitro studies with wStr provide advantages of scale and ease of manipulation for exploring mechanisms that may facilitate transformation and genetic manipulation of Wolbachia, we have undertaken proteomics-based studies that provide strong support for expression of T4SS machinery in cell culture. Here, we report the sequence of the virB8-D4 operon, including flanking genes ribA, upstream of virB8, and wspB downstream of virD4. We show that wspB is intact, describe protein structure predicted from the deduced WspB sequence, and verify co-transcription of wspB with upstream vir genes. Relative abundance levels of WspB and the VirB8-D4 proteins in wStr are well above average, while RibA is among the least abundant of MS-detected proteins. In wStr, ribA and wspB are mosaics of sequence motifs that are differentially conserved in supergroup A-(WOL-A) and B-(WOL-B) strains, and they contain conserved 8-bp repeat elements that may be associated with genetic exchange. Finally, we discuss implications for functional integration of the Wolbachia T4SS with WspB and with the riboflavin biosynthesis pathway enzymes GTP cyclohydrolase II (RibA) and dihydroxybutanone phosphate synthase (RibB).

Cultivation of cells
Aedes albopictus C7-10 and C/wStr1 cells were maintained in Eagle's minimal medium supplemented with 5 % fetal bovine serum at 28-30 °C in a 5 % CO2 atmosphere (Fallon et al. 2013;Shih et al. 1998). Cells were harvested during exponential growth, under conditions favoring maximal recovery of Wolbachia.

Polymerase chain reaction, cloning and DNA sequencing
The polymerase chain reaction (PCR) was used to amplify wStr genes from DNA extracts prepared from Wolbachia enriched by fractionation of C/wStr1 cells on sucrose density gradients and recovered from the interface between 50 and 60 % sucrose. Template DNA was used to obtain 21 PCR products using a panel of 31 primers (Table S1), GoTaq™ DNA polymerase (Promega, Madison, WI), and a Techne TC-312 cycler (Staffordshire, UK). Cycle parameters were: 1 cycle at 94 °C for 2 min, 35 cycles at 94 °C for 35 s, 53 °C for 35 s, 72 °C for 1 min, followed by 1 cycle at 72 °C for 5 min. Extension time was increased to 2 min for products ≥1000 bp. PCR products were cloned in the pCR4-TOPO vector with the TOPO-TA Cloning Kit for Sequencing (Life Technologies, Grand Island, NY), and two or more clones each were sequenced at the University of Minnesota BioMedical Genomics Center.
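The cycling conditions above can be written down as a small data structure, which makes the extension-time rule for long products explicit. The following is only an illustrative sketch: the function names `pcr_program` and `total_hold_seconds` are hypothetical, and the total counts programmed hold times only (ramping between temperatures is ignored).

```python
def pcr_program(product_bp):
    """Return the cycling program as (repeats, [(temp_C, seconds), ...]) blocks.

    Extension at 72 C is 1 min, or 2 min for products of 1000 bp or more,
    as stated in the text.
    """
    extension_s = 120 if product_bp >= 1000 else 60
    return [
        (1, [(94, 120)]),                                # initial denaturation
        (35, [(94, 35), (53, 35), (72, extension_s)]),   # denature, anneal, extend
        (1, [(72, 300)]),                                # final extension
    ]


def total_hold_seconds(program):
    """Sum of programmed hold times in seconds (ramp times are not included)."""
    return sum(repeats * sum(sec for _, sec in steps) for repeats, steps in program)


if __name__ == "__main__":
    for size in (800, 1500):
        print(size, "bp:", total_hold_seconds(pcr_program(size)), "s")
```

Under these assumptions, the longer extension step adds 35 min of hold time to a run.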

Reverse transcriptase polymerase chain reaction
Total RNA was purified from A. albopictus C7-10 and C/wStr1 cells using the PureLink RNA Mini Kit (Life Technologies) and treated with DNase I (RNase-free; Life Technologies) followed by heat inactivation, as suggested by the manufacturer. RT-PCR was executed with primers virD4 F1764-1784 and wspB R152-172 (Table S1) using the RNA PCR Core Kit (Life Technologies) as suggested by the manufacturer with the exception that synthesized cDNA was treated with DNase-inactivated RNaseA before the final PCR reaction. The PCR reaction included 1 cycle at 95 °C for 4 min, 35 cycles at 95 °C for 35 s, 56 °C for 40 s, 72 °C for 40 s, followed by 1 cycle at 72 °C for 3 min. Reaction products were electrophoresed on 1 % agarose gels, cloned, and sequenced as above.

Sequence alignments and protein structure prediction
DNA and protein sequence alignments were executed with the Clustal Omega program (Sievers et al. 2011). Alignments were edited by visual inspection and modified in Microsoft Word. WspB protein structure predictions were obtained using tools available at www.predictprotein.org, including the PROFtmb program (Dell et al. 2010) for prediction of bacterial transmembrane beta barrels (Bigelow et al. 2004) and per-residue prediction of up-strand, down-strand, periplasmic loop and outer loop positions of residues. The PROFisis program (Ofran and Rost 2006) was used to predict WspB amino acid residues that are potentially involved in protein-protein interactions. Trees were produced using PAUP* version 4 (Swofford 2002). Amino acids were aligned with Clustal W, using pairwise alignment parameters of 25/0.5 and multiple alignment parameters of 10/0.2 for gap opening and gap extension, respectively. The protein weight matrix was set to Gonnet. The alignment was saved as a nexus file and loaded into PAUP*, and the trees were created using a heuristic search with the criterion set to parsimony. Bootstrap 50 % majority-rule consensus trees are based on 1000 replicates, with wBm (WOL-D) as the outgroup.

Mass spectrometry, peptide detection, protein identification and statistical analysis
Mass spectrometry data, generated using LC-MS/MS on LTQ and Orbitrap Velos mass spectrometers as four data sets, were described previously. The MS search database was modified to include deduced ORFs from wStr sequence data described herein. All tests of association were performed with SAS version 9.3 (Cary, NC; http://www.sas.com/en_us/home.html/).

Structure of the wStr virB8-D4 operon
The robust, persistent infection of the A. albopictus mosquito cell line, C/wStr1, with B wStr (in the text below, strain designations are denoted by superscripts), isolated from the planthopper L. striatellus, provides an in vitro model to identify proteins that modulate the host-microbe interaction. A potential role for the T4SS is supported by strong representation of peptides from VirB8, VirB9, VirB10, VirB11, VirD4 (Table 1) and associated proteins in the B wStr proteome. Despite its emergence as a useful strain that grows well in vitro, the B wStr genome is not yet available. In Wolbachia strains for which genome annotation is available, gene order within the virB8-D4 operon is conserved. Based on transcriptional analyses in the related genera, Anaplasma and Ehrlichia (Pichon et al. 2009), the promoter likely maps within the 3′-end of ribA extending into the intergenic spacer (Fig. 1a, black horizontal arrow at left) and is followed by five consecutive vir genes (Fig. 1b). In B wPip from Culex pipiens mosquitoes, wspB is disrupted by insertion of an IS256 element that encodes a transposase on the opposite strand (Fig. 1a, at right; Sanogo et al. 2007).
Table 1 footnotes (partial): … (2) designates a refined search in which the database included peptides based on the present wStr nucleotide sequence data; (T) combined total peptides from both searches. c Percent protein sequence coverage represented by detected peptides. d Mean number of peptides from four independent MS data sets. e Studentized residual based on the modified univariable model of the refined search (Table S3, column R); SR value 0 indicates average abundance protein, 0-1 above average, 1-2 abundant and >2 highly abundant. Values below 0 indicate lower than average abundance. f A 94 % confidence peptide indicated in Fig. 1a did not meet the threshold for proteome inclusion in the original search. For VirB10, one originally detected peptide was absent from the refined search.
Because VirB8-D4 proteins were highly similar to homologs from B wPip, we evaluated wspB in B wStr and its potential expression as a virB8-D4 operon member, as is the case in A wMel and A wRi from Drosophila spp. and A wAtab 3 from the wasp Asobara tabida (Rances et al. 2008;Wu et al. 2004). In the original proteomic analysis, three WspB peptides (Fig. 1a, tall black and gray arrows represent 95 and 94 % confidence peptides, respectively) mapped proximal and distal to the transposon insertion in B wPip, while the absence of peptides corresponding to the transposon suggested that wspB is intact in B wStr.

Nucleotide and deduced amino acid sequence comparisons
To examine the virB8-D4 operon in B wStr, we sequenced overlapping PCR products from 20 primer pairs (Table S1) spanning 9.1 kb, beginning 43 bp downstream of the 5′-end of ribA in other Wolbachia strains and ending within topA, encoded immediately downstream of the operon on the opposite strand (Fig. 1b, c). With the notable exception of the B wPip transposon, the nucleotide sequence aligned most … (… Table S2 for GenBank Accessions).
Fig. 1 (legend, partial): … wStr. c Filled lines above the 10-kb scale marker represent cloned PCR amplification products (see Table S1 for primers) that were sequenced and assembled into the B wStr ribB and ribA-topA consensus sequence.
Pairwise sequence comparisons of the virB8-D4 operon from B wStr to homologs from Wolbachia supergroup A, B, C, D and F strains (Table 2) confirm that virB10, with nucleotide identities ranging from 74-99 %, is the least conserved of the five vir genes, and we note that Klasson et al. (2009) attributed divergence of virB10 in A wMel and A wRi to genetic exchange with a WOL-B-strain. Collectively and as individuals, the vir genes from B wStr have the highest nucleotide identities (~99 %) with B wVitB and B wPip. Identities with five A-strains are lower (range 87-91 %), lower yet (range 80-89 %) with the F-strain, F wCle, and fall to a range of 74-88 % with three nematode-associated strains, D wBm, C wOo and C wOv. At the 5′-end of the operon, ribA was distinct, with approximately equivalent nucleotide identity with homologs from A- and B-strains (range 91-94 %), while the partial sequence of topA downstream of the operon had a conservation pattern similar to that of the vir genes. In some comparisons, virB8, virB11, virD4 and topA amino acid identities exceed nucleotide identities. Although ribB is not physically adjacent to the virB8-D4 operon in annotated Wolbachia genomes, ribB from B wStr is most similar to homologs from B wNo (97 % nucleotide identity) and A wMel (90 %), but was exceptional because identities with three other insect-associated A- and B-strains (~80 %) were lower than with F-, C- and D-strains (range 85-87 %). Consistent with earlier proteomic data, in all comparisons that discriminate between A- and B-strains, B wStr resembled WOL-B, while variability in ribA and wspB flanking the virB8-D4 genes exceeded that of the vir genes themselves.

Expression and relative abundances of the B wStr VirB8-D4 proteins
To refine the original proteomic analysis, we added the PCR-amplified B wStr sequences described here to the database for peptide identification [Table 1, see column labeled Pep (2)]. Statistical analysis indicated that in a univariable model, protein molecular weight was weakly (r² = 0.2221) but significantly (p < 0.0001) associated with peptide count: log(peptides) = −0.40247 + 0.4953 × log(MW). Estimations of protein relative abundance levels (RAL) based on peptide counts were therefore normalized to protein length using studentized residuals (SR), a measure of deviance from expected values adjusted for estimated SD from the mean. All peptide data and SR values in the univariable and multivariable models of the original and refined searches are detailed in Table S3.
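The normalization described above can be illustrated with a short sketch. It is not the SAS procedure used in the study: the natural log base, the use of internally studentized residuals, and the function names `expected_log_peptides` and `studentized_residuals` are all assumptions made for illustration, and the demonstration data are invented.

```python
import numpy as np

# Coefficients reported in the text for the univariable model.
INTERCEPT, SLOPE = -0.40247, 0.4953


def expected_log_peptides(mw):
    """Expected log(peptide count) at molecular weight `mw`.

    Assumption: natural log; the text does not state the log base.
    """
    return INTERCEPT + SLOPE * np.log(mw)


def studentized_residuals(log_mw, log_peptides):
    """Fit log_peptides ~ log_mw by least squares and return (beta, SR).

    SR is the internally studentized residual: each raw residual divided by
    its estimated standard deviation, s * sqrt(1 - leverage).
    """
    x = np.asarray(log_mw, dtype=float)
    y = np.asarray(log_peptides, dtype=float)
    n = x.size
    design = np.column_stack([np.ones(n), x])
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ beta
    hat = design @ np.linalg.inv(design.T @ design) @ design.T
    leverage = np.diag(hat)
    s2 = resid @ resid / (n - 2)  # residual variance, two fitted parameters
    return beta, resid / np.sqrt(s2 * (1.0 - leverage))


if __name__ == "__main__":
    # Invented example: six proteins, one with more peptides than expected.
    log_mw = np.log(np.array([10.0, 20.0, 30.0, 40.0, 50.0, 60.0]))
    log_pep = -0.4 + 0.5 * log_mw
    log_pep[2] += 1.0
    beta, sr = studentized_residuals(log_mw, log_pep)
    print(np.round(sr, 2))
```

A positive SR marks a protein with more peptides than its molecular weight predicts, and a negative SR marks one with fewer.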
In the refined search, we identified eight new peptides from Vir proteins [Table 1, compare columns labeled Pep(2) to Pep(1)], including three from the most divergent VirB10. In aggregate, the five Vir proteins had a mean (SD) SR of 0.73 (0.2) and are expressed at above average abundance. We identified five new peptides from RibB, but none from RibA (Table 1). RibB has an SR of 1.2 and is an abundant protein, while RibA has an SR of −2.3 and is among the least abundant of MS-detected proteins. Nine new peptides from the highly divergent WspB (see below) generated an SR of 1.08, slightly above the threshold (>1.0) for an abundant protein and roughly equivalent to SR values (range 1-1.17) of housekeeping proteins such as isocitrate dehydrogenase, FtsZ, ATP synthase F0F1 α subunit, and ribosomal proteins S2, S9, L3, L7/L12 and L14 (Table S3). In comparison, WspA with an SR of 2.17 (Table S3, entry 63) ranked as highly abundant, and the most abundant protein in the proteome was the GroEL chaperone (entry 586), with an SR of 3.66.
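The abundance categories used in this section follow the SR ranges given with Table 1 (0 average, 0-1 above average, 1-2 abundant, >2 highly abundant, below 0 lower than average). A minimal sketch of that binning follows. The function name `classify_sr` is hypothetical, and assigning values of exactly 1 or 2 to the lower category is an assumption, since the source gives only ">1.0" and ">2" as thresholds.

```python
def classify_sr(sr):
    """Map a studentized residual (SR) to the abundance category used in the text."""
    if sr > 2:
        return "highly abundant"
    if sr > 1:
        return "abundant"
    if sr > 0:
        return "above average"
    if sr == 0:
        return "average"
    return "below average"


if __name__ == "__main__":
    # SR values quoted in the text.
    reported = {
        "Vir proteins (mean)": 0.73,
        "RibB": 1.2,
        "RibA": -2.3,
        "WspB": 1.08,
        "WspA": 2.17,
        "GroEL": 3.66,
    }
    for name, sr in reported.items():
        print(f"{name}: SR {sr} -> {classify_sr(sr)}")
```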

Reverse transcriptase PCR confirms co-transcription of wspB with vir genes
Similar SR values for WspB, relative to VirB8-D4, were consistent with evidence that wspB is co-transcribed with virB8-D4 in A wMel, A wRi and A wAtab 3 (Rances et al. 2008;Wu et al. 2004). We used RT-PCR with RNA template verified by PCR to be free of DNA contamination (Fig. 2b, lanes 2 and 3) to amplify a 528-bp product that was produced in reactions containing RNA from C/wStr1 cells (Fig. 2a, lane 4), but not in negative control reactions (lanes 1 and 2) or those with RNA from C7-10 cells (lane 3). Its sequence matched the expected B wStr genomic sequence (Fig. 1c, RT-PCR box at right), confirming that in B wStr, wspB is a member of the virB8-D4 operon.

In B wStr, ribA is a mosaic of conserved WOL-A and WOL-B sequence motifs
The ribA nucleotide sequence has been shown to contain regulatory elements for expression of the T4SS operon in Anaplasma and Ehrlichia (Ohashi et al. 2002;Pichon et al. 2009). In contrast to highest homologies of B wStr virB8-D4 genes to WOL-B-strains, ribA sequence identities showed little difference between WOL-A and -B homologs (Table 2), but the two MS-detected peptides corresponded to A wMel and B wPip homologs, respectively (Fig. 1a). Alignment of amino acids from 10 RibA homologs (Fig. 3; WOL-A and WOL-B-strains are identified at left in red and blue, respectively) suggested that B wStr RibA is a two-part mosaic, with each part containing a protein functional domain.
In the C-terminus, where the amino acid alignment shows an overall higher consensus (Fig. 3), B wStr grouped with the B-strains including B wPip, while B wVulC appears more closely related to A-strains.

Nucleotide alignment and phylogenetic comparisons show that ribA is a mosaic gene in B wStr and B wVulC
A nucleotide alignment (Fig. S1) confirmed that ribA from B wStr is a two-part mosaic of WOL-A and WOL-B sequence motifs that correspond to the N-and C-terminal halves of the protein. In the first 522 nucleotides of ribA, 45 (in red font) of 56 variable nucleotides in B wStr match the A-strain sequences (Fig. S1), but only six (in blue) match the majority of B-strains and two are unique to B wStr (in green). In the downstream 522 nucleotides of ribA, 51 (in blue) of 54 variable nucleotides in B wStr match B-strains, while a single nucleotide (684 in red) matches the A-strains and two (in green) are unique to B wStr. In B wVulC, ribA has a similar two-part mosaic structure but does not firmly transit from the WOL-A to the WOL-B sequence motif until position 775, consistent with the amino acid alignment. Among the A-strains, ribA from A wRi is again most similar to the B-strain sequences. Within nucleotides 387-453 encoding amino acids 129-150 just before the cyclohydrolase domain and the A/B-strain sequence motif transition in B wStr, 13 of 18 WOL-A/B variable nucleotides in A wRi are shared with B wTai, B wPip and B wVitB, but those of B wStr and B wVulC are conserved with the other A-strains (orange and black vs. red residues, respectively).
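The counting behind these comparisons (how many variable positions in a query match the A-strain motif, the B-strain motif, or neither) can be sketched as follows. This is an illustrative reconstruction, not the authors' procedure, which relied on visual inspection of the alignment. The function names, the majority-rule consensus, and the toy sequences are all assumptions.

```python
from collections import Counter


def consensus(column):
    """Most common character in an alignment column, or None if tied."""
    ranked = Counter(column).most_common()
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return None
    return ranked[0][0]


def classify_positions(query, a_seqs, b_seqs):
    """Label each variable alignment column of `query` as 'A', 'B', 'unique' or 'other'.

    All sequences must be pre-aligned and of equal length.
    """
    labels = {}
    for i, base in enumerate(query):
        col_a = [s[i] for s in a_seqs]
        col_b = [s[i] for s in b_seqs]
        if len(set(col_a + col_b + [base])) == 1:
            continue  # invariant column, not informative
        cons_a, cons_b = consensus(col_a), consensus(col_b)
        if base not in col_a + col_b:
            labels[i] = "unique"
        elif cons_a != cons_b and base == cons_a:
            labels[i] = "A"
        elif cons_a != cons_b and base == cons_b:
            labels[i] = "B"
        else:
            labels[i] = "other"
    return labels


def tally(labels, start, end):
    """Count labels for positions in the half-open interval [start, end)."""
    return Counter(lab for pos, lab in labels.items() if start <= pos < end)


if __name__ == "__main__":
    # Toy alignment (invented): the query switches from the A motif to the B motif.
    a_strains = ["ACGTACGT", "ACGTACGT"]
    b_strains = ["ATGTATGC", "ATGTATGC"]
    labels = classify_positions("ACGTATGA", a_strains, b_strains)
    print(tally(labels, 0, 4), tally(labels, 4, 8))
```

Tallying the two halves of a gene separately, as with positions 1-522 and 523-1044 of ribA, makes a switch from one motif to the other visible as a change in the dominant label.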

WspB in B wStr is strikingly similar to a A wCobU4-2 homolog
Having shown that wspB is intact in B wStr, we mapped 11 peptides onto amino acid sequences encoded by 12 homologs (Fig. 5), including sequences deduced from three open reading frames (ORFs) in the wspB pseudogene from B wPip (Sanogo et al. 2007) and two overlapping ORFs in a pseudogene from A wCobU4-2, one of several WOL-A variants associated with the weevil, Ceutorhynchus obstrictus. Of two B wStr peptides (Fig. 5) detected at 95 % confidence in the original search, the first (residues 105-115 in gray) was identical in all strains except B wNo, which has unique M/I and V/I substitutions (residues in green). The second peptide (residues 209-220) is identical in all but the two A wCob strains that share an M/R substitution (215 in orange), while A wCobU4-2 has a unique Y/C substitution (219 in green). Five additional B wStr peptides (highlighted in cyan) were identical with B wVitB and B wMet (residues in blue), but not with B wPip and B wNo, which have many residues that are unique (in green) or shared (in orange) only with A wCobU5-2 and A wAna. Thus, with the exception of A wCobU5-2, cyan peptides of B wStr match other WOL-B-strains.
Figure legend fragments (partial): … Fig. 3; the remainder of the protein was included in the C-terminal alignment. Fig. 5: Cyan designates peptides conserved in B-strains, and yellow, those conserved in B wStr and A wCobU4-2. Olive peptides were unique to B wStr. Residues conserved between B wStr and a majority of A-strains are in red font (a single proline at residue 193) and residues conserved with a majority of B-strains are in blue font. Unique residues are in green font, and residues conserved between two or three homologs are in orange font. Underlined residues below the alignment denote the breakpoints between contiguous peptides within sequence regions. The greater than and less than symbols below the alignment indicate a transposon insertion in the wspB pseudogene of B wPip, followed by two additional deduced ORFs (see Fig. S2). PROFtmb (prediction of transmembrane beta barrels) symbols for individual residues below the alignment are: U, up-strand; D, down-strand; I, periplasmic loop; O, outer loop. PROFisis (prediction of protein-protein interaction residues) symbol P designates interaction residues. Wolbachia strain host associations: A wAtab 3, A. tabida (wasp); A wCob, C. obstrictus (weevil); B wMet, Metaseiulus occidentalis (predatory mite). See Tables 2 and S2 for other host associations and GenBank Accessions. The first 20 residues of the A wCob and B wMet sequences are not available.
Two peptides underscore a striking similarity between the B wStr and A wCobU4-2 homologs. The first (Fig. 5, residues 133-140 highlighted in yellow) contains an alanine residue (138 in bold orange) shared only with A wCobU4-2. The second (residues 169-186 highlighted in olive) has a unique F/L substitution (in green) and a V/I substitution (in orange) shared with A wCobU4-2 and A wAna. Overall, the B wStr and A wCobU4-2 sequences differ at only five residues (59, 172, 193, 215 and 219), of which four occur within hypervariable regions. Throughout the alignment, A wAtab 3, A wKue, A wMel and A wRi form a conserved group, but the divergent A wAna and A wCobU4-2 and U5-2 strains have multiple residues (in blue, as in 42-77 and 224-277) that are conserved with the B-strains, suggesting genetic exchange between supergroups.

WspB domain structure and hypervariable regions (HVRs)
WspB is a paralog of the better-known WspA major surface antigen, which is anchored in the cell envelope by a transmembrane β-barrel domain (Koebnik et al. 2000), while surface-exposed loop domains contain HVRs with high recombination frequencies within and between strains (Baldo et al. 2010). The PROFtmb program predicted 10 transmembrane down (D)-and up (U)-strands and six periplasmic space (I) strands in WspB from B wStr ( Fig. 5; residues indicated by D, U and I, respectively; Z score of 6.8 supports designation as transmembrane β-barrel protein). HVR1 and HVR2 each contain a predicted outer loop (residues 38-86 and 115-156 indicated by O) with high proportions of amino acids that are potentially charged at physiological pH; HVR3 contains two outer loops. Finally, a small predicted loop that is not within an HVR contains a proline (residue 193) that is conserved in B wStr and four WOL-A-strains. It is one of the 20 amino acids, most with hydrophilic or potentially charged side chains and within HVRs or adjacent to periplasmic space strands, predicted by the PROFisis program to be potentially involved in protein-protein interactions (P below alignment).

HVR1 amino acids
In HVR1 (Fig. 5, residues 41-77), eight residues are universally conserved among all homologs, while the majority of variable residues are differentially conserved in the B-strains (residues in blue) versus the A-strains. However, the sequences from the A wAna and A wCobU5-2 A-strains are mosaics in which eight of the first 20 residues (in blue) are conserved with all B-strains, while eight others are either conserved mutually or with B wNo or B wPip (in orange). Within the remaining 17 residues of HVR1, the A wAna and A wCobU5-2 sequences are better conserved with the other A-strains, while B wNo and B wPip have multiple unique residues (in green). The A wCobU4-2 and B wStr sequences differ only at residue 59.

HVR2 amino acids
Within HVR2 (Fig. 5, residues 121-150), A wCobU5-2 and A wAna sequences have alignment gaps at four residues, five or six unique residues respectively (in green), and eight residues that are either conserved mutually (in orange) or with B wNo. The B wPip pseudogene has only the first two residues of HVR2 due to a transposon insertion (indicated below alignment by greater than less than symbols). The A wCobU4-2 pseudogene contains a nucleotide sequence duplication (see below) that results in an overlap of the first and third ORFs beginning at the seventh residue of HVR2, but their spliced sequences, as shown, are identical to that of B wStr. The B wNo sequence has eight alignment gaps and nine unique residues.

HVR3 amino acids
In HVR3, five of 52 residues (Fig. 5, residues 224-277) are conserved among all strains. Throughout HVR3, sequences from the upper cluster of four A-strains are identical, including an alignment gap. However, the A wAna sequence has 22 unique residues (in green) and is partially conserved with B wNo (nine residues in orange). In striking contrast to differences in HVR1 and HVR2, the A wCobU4-2 and U5-2 homologs have identical HVR3 sequences that are conserved with the B-strains, particularly B wStr (residues in blue), differing only at residues 241 and 244.

Nucleotide sequence alignment confirms a mosaic wspB and identifies a conserved repeated sequence
Nucleotide sequence alignment of eleven wspB homologs confirmed that WOL-A/B genetic mosaicism is concentrated in the HVR regions and revealed three copies of a repeated sequence element within or near HVR2. Further analyses identified three copies of the repeated sequence element in ribA at the 5′-end of the virB8-D4 operon and four copies in vir genes.

HVR1
HVR1 (Fig. S2, nucleotides 117-241) from B wStr begins with two nucleotides (117 and 120 in red) that are conserved in B wStr and all WOL-A-strains except A wCobU5-2 and A wCobU4-2. Downstream, the B wStr sequence includes 47 of 48 nucleotides (in blue) within a sequence motif characteristic of B wStr and the other B-strains. The A wCobU5-2 and A wAna sequences are initially similar to the WOL-B motif, but beginning at an alignment gap in the other A-strains they have 11 nucleotides (in orange, nucleotides 152-207) that are conserved with B wNo and B wPip at positions in which those strains diverge from the WOL-B consensus. Thus, HVR1 in B wStr begins with nucleotides from a conserved WOL-A sequence motif but transitions to the conserved WOL-B motif, while HVR1 from the A wCobU4-2 A-strain differs from that WOL-B motif at a single nucleotide (176). In contrast, the A wAna and A wCobU5-2 sequences are mosaics of the WOL-A and WOL-B consensus motifs and share nucleotides with the divergent B wNo and B wPip B-strains, which also closely resemble each other upstream of HVR1 (23 nucleotides in light blue and one in orange).

HVR2 contains conserved repeat elements
HVR2 (Fig. S2, nucleotides 361-450) contains a conserved WOL-B sequence motif that differs at 20 nucleotides (in blue) from the WOL-A motif, while the divergent sequences from B wNo, B wPip, A wAna and A wCobU5-2 share an alignment gap and are again similar (nucleotides in orange). A tandem repeated sequence at nucleotides 365-379, CAAGTAATCAAGTAAC, in the B-strains B wStr, B wVitB and B wMet occurs with slight variation (underlined residues) as CAAGTAGCCAAATAAC in the A-strains A wAtab 3, A wKue, A wMel and A wRi. We designated the eight-bp sequence, CAARTARY, where R = A or G, and Y = C or T, as an HVR2-repeat. The pseudogene from A wCobU4-2 contained a third copy of CAAGTAAT that interrupted ORF1 and was removed from the alignment (indicated by upwards arrow below alignment) to shift to ORF3, which maintains identity to the deduced amino acid sequence from B wStr. Just downstream of HVR2 at nucleotides 457-463, a truncated copy of the HVR2-repeat lacking the 3′-terminal pyrimidine is conserved in B wStr, B wVitB, B wMet and A wCobU4-2 and corresponds to the position (indicated by greater than less than symbols below alignment) of the transposon insertion in B wPip. Finally, we noted that the most divergent HVR2 sequences from A wAna, A wCobU5-2, B wNo and B wPip have T/C and A/G substitutions (in orange, light blue and green) that disrupt the HVR2-repeat consensus.

HVR3
Within HVR3 (Fig. S2, nucleotides 670-831), conserved sequence motifs occur in the upper cluster of four A-strains and in the B-strains (nucleotides in blue), with the exceptions of B wPip (HVR3 absent) and B wNo. Sequences from A wCobU4-2 and A wCobU5-2 are identical despite their major differences in HVR1 and HVR2 and differ from the B-strain consensus only at nucleotides 722 and 773 (in orange). The A wAna and B wNo sequences are the most divergent but share 43 variable nucleotides (in orange) and have 67 and 18 unique residues (in green), respectively.

HVR2-repeats also occur in ribA and ribB
Based on a DNA pattern search (http://bioinformatics.org/sms/), three HVR2-repeats occur in ribA, two in virD4, and single copies in virB8 and virB9 (Table 3). In addition, a reverse complement of the CAARTARY sequence occurs at the same position in ribB from three WOL-A-strains and B wPip (see gray shading in Fig. S3). The B wPip homolog contains a second copy at residues 7-14 just downstream of the start codon (not shown) and is a WOL-A/B mosaic (see below). Although repeat frequencies in individual ribA (0.29) and wspB (0.34) genes are ~sixfold higher than in the whole genomes of A wMel and B wPip (0.05) from flies (Diptera), it will be important to re-evaluate these frequencies when a B wStr genome (Hemipteran host) becomes available.
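A search for the degenerate HVR2-repeat can be sketched with a regular expression built from the IUPAC codes R and Y. This is not the web tool cited above; the helper names are hypothetical. The per-100-nucleotide frequency is likewise an assumption about the units of the reported values, which the text does not state, although three copies in the 1044 aligned ribA nucleotides would give 0.29 on that scale.

```python
import re

# IUPAC codes needed for CAARTARY and its reverse complement.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "R": "[AG]", "Y": "[CT]"}
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C", "R": "Y", "Y": "R"}

HVR2_REPEAT = "CAARTARY"


def reverse_complement(motif):
    """Reverse complement of a motif written with A, C, G, T, R, Y."""
    return "".join(COMPLEMENT[base] for base in reversed(motif))


def find_motif(seq, motif):
    """Return 0-based start positions of all (possibly overlapping) matches."""
    pattern = "".join(IUPAC[base] for base in motif)
    return [m.start() for m in re.finditer(f"(?={pattern})", seq.upper())]


def per_100_nt(copies, length):
    """Copies per 100 nucleotides (assumed unit, see lead-in)."""
    return 100.0 * copies / length


if __name__ == "__main__":
    # Tandem copies quoted in the text for B-strains and A-strains.
    for seq in ("CAAGTAATCAAGTAAC", "CAAGTAGCCAAATAAC"):
        print(seq, find_motif(seq, HVR2_REPEAT))
    print(reverse_complement(HVR2_REPEAT))
```

Searching both the motif and its reverse complement covers occurrences on either strand, as in the ribB example.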
Although RibA and RibB are involved in riboflavin biosynthesis, ribB is not contiguous with ribA and the virB8-D4 operon, and it has higher variability than ribA (Table 2). Among the WOL-B-strains, ribB in B wStr and B wNo is conserved with the A wAu and A wMel A-strains (Fig. S3; note especially the bold blue residues downstream of nucleotide 181, as well as additional residues in orange). In contrast, the B wPip homolog is best-conserved (nucleotides in red) with WOL-A-strains, A wAna, A wHa and A wRi, including an alignment gap at residue 483 encompassing an identical 15-nucleotide "island" with the reverse complement CAARTARY repeat. Downstream of the gap, at residue 511, the B wPip sequence shifts to a predominantly WOL-B motif conserved in B wStr, B wNo, but also in A wMel (nucleotides in blue), while A wAna, A wRi and A wHa are mutually conserved (nucleotides in orange) versus all other strains. Within the 3′-end of the alignment (nucleotides 541-600), the B wPip sequence is conserved with B wStr, B wNo and D wBm (nucleotides in blue), while A wAu and A wMel are the most divergent (nucleotides in green).

Discussion
Although the status of Wolbachia as a species remains unclear (Baldo et al. 2006b;Lo et al. 2007), a notable distinction between WOL-C-/D-strains that associate with nematodes as mutualists and WOL-A-/B-strains that occur as reproductive parasites in insects relates to genome stability and phylogenetic congruence between Wolbachia and its host. In insect hosts, Wolbachia appears to engage in frequent horizontal gene transfer, resulting in a lack of phylogenetic congruence manifested by gene structures that represent mosaic recombinations from genomes now considered distinct strains. Coinfections with two or more Wolbachia strains and activities of bacteriophages that reside in genomes of WOL-A/B-strains likely contribute to this genetic plasticity (Bordenstein and Reznikoff 2005;Newton and Bordenstein 2011), which may reflect what some authors suggest is a worldwide Wolbachia pandemic (Zug et al. 2012). Examples of natural coinfections include A wAlbA and B wAlbB in A. albopictus mosquitoes (O'Neill et al. 1997), A wVitA and B wVitB in the parasitoid wasp, N. vitripennis (Perrot-Minnot et al. 1996;Raychoudhury et al. 2008) and A wHa and B wNo in the phytophagous D. simulans (James et al. 2002). A particularly interesting example in C. obstrictus weevils involves infection with a single A wCob strain, in which polymorphisms in wspA and wspB indicate that three distinct variants coexist in all host populations (Floate et al. 2011); it will be of interest to explore other genetic similarities and differences among these variants following separation in vitro and/or in uninfected hosts. Wolbachia coinfections have also been documented in insects such as fig wasps (Yang et al. 2012), tephritid flies (Morrow et al. 2014) and planthoppers (Zhang et al. 2013) whose interactions with parasitoids, parasites and predator arthropods may facilitate horizontal transmission (Cordaux et al. 2001;Werren et al. 2008;Zug et al. 2012).
In nature, the B wStr strain occurs in two planthopper hosts (Noda et al. 2001a) and in the strepsipteran endoparasite Elenchus japonicus (Noda et al. 2001b;Zhang et al. 2013). In the present study, B wStr has been artificially introduced into a cultured cell line, which has not been achieved with B wPip or nematode-associated strains. Adaptation of B wStr to cell lines (Noda et al. 2002;Fallon et al. 2013) will provide an in vitro system for examining mechanisms of genetic exchange if conditions for maintenance of doubly infected cells can be developed through coinfection or somatic cell fusion. We note that high rates of recombination and transposition in Wolbachia (Baldo et al. 2006a;Cordaux et al. 2008) are consistent with expression of an abundant RecA protein (SR 1.05; Table S3, entry 146) as well as 18 transposases and/or proteins with transposase domains in B wStr.

Genetic plasticity of wspB in the virB8-D4 operon
An intact wspB that maps to the 3′-end of the virB8-D4 operon in most WOL-A genomes (Wu et al. 2004) is absent from 17 of 21 WOL-B-strains, including B wVulC and nearly all other isopod-associated strains (Pichon et al. 2009), and is interrupted by a transposon in B wPip (Sanogo et al. 2007). Here, we verify that in B wStr, an intact wspB is co-transcribed with virD4 and is expressed in C/wStr1 cells as an abundant protein at levels similar to those of many housekeeping proteins. The wspB structure closely resembles that of its better-studied wspA paralog, encoding a major surface antigen that has four HVR regions with sequence motifs that have been shuffled by recombination within and between Wolbachia WOL-A- and -B-strains (Baldo et al. 2005, 2010). Likewise, most sequence variation in wspB alleles occurs in the three HVR regions, with distinctive patterns for each region. HVR1 underscores WOL-A/B mosaicism in A wAna and A wCobU5-2, [...] wCobU5-2 and between B wStr and A wCobU4-2 also occurs in HVR2, while B wNo stands out as distinctive. In B wPip, HVR2 is disrupted by a transposon insertion and we identified an eight-nucleotide HVR2-repeat (CAARTARY) that correlates with transitions between WOL-A-/B-strain motifs and the pseudogene lesions in B wPip and A wCobU4-2. Finally, we noted that high identity of A wCobU5-2, A wCobU4-2 and B wStr is unique to HVR3.
Table 3 notes: Values indicate 5′-nucleotide positions of HVR2-repeats in the 9133-bp ribA to topA sequence from B wStr (see Fig. 1; Acc. KF43064.1). Negative values indicate reverse complement positions. Copy numbers in the complete A wMel (NC_002978.6) and B wPip (NC_010981.1) genomes are shown at right.
a Frequency is defined as number of repeats/total nucleotides in each individual gene (or complete genome) indicated at the top of the panel, ×100.
b See underlined nucleotides 457-463 in Fig. S2, which lack the 3′-terminal pyrimidine.
The remarkable similarity of the wspB homologs from B wStr and A wCobU4-2 (>98 % nucleotide identity; Fig. S2) is consistent with exchange of an apparently intact gene between members of distinct Wolbachia supergroups by a mechanism that requires further investigation. Intensive analysis of the wspA paralog demonstrates that intragenic recombination breakpoints are concentrated in conserved regions outside of the HVRs (Baldo et al. 2005, 2010). CAARTARY repeats are not present in wspA, and in wspB, they occur only within and directly adjacent to HVR2 at positions that correspond to pseudogene lesions in A wCobU4-2 and in B wPip (due to a transposition event in B wPip; Sanogo et al. 2007). Furthermore, Pichon et al. (2009) suggested that transposition events may explain absence of wspB in the virB8-D4 operons of many WOL-B-strains. In a practical sense, CAARTARY repeats at wspB pseudogene lesions and WOL-A/B sequence motif transitions (Figs. S1, S2, S3) suggest their involvement in genetic exchange. Because transformation of Wolbachia has not yet been achieved, engineering of CAARTARY repeats into vectors used successfully to introduce selectable markers into other members of the Rickettsiales (see Beare et al. 2011) merits investigation.
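For readers who wish to reproduce identity comparisons of this kind from their own alignments, the Python sketch below computes percent nucleotide identity between two pre-aligned sequences, plus a sliding-window version that can help locate where identity to a reference rises or falls along a gene. This section does not state how the identities reported here were calculated, so the gap handling (columns with a gap in either sequence are skipped), the window and step sizes, and the function names are all assumptions made for illustration.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identity over columns where neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for x, y in zip(aligned_a.upper(), aligned_b.upper()):
        if x == gap or y == gap:
            continue  # assumption: gapped columns are not scored
        compared += 1
        if x == y:
            matches += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared


def windowed_identity(aligned_a, aligned_b, window=50, step=10):
    """Return (1-based window start, percent identity) along the alignment."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    results = []
    for start in range(0, len(aligned_a) - window + 1, step):
        a = aligned_a[start:start + window]
        b = aligned_b[start:start + window]
        try:
            results.append((start + 1, percent_identity(a, b)))
        except ValueError:
            continue  # window contains only gapped columns
    return results


if __name__ == "__main__":
    # Invented toy alignment, not taken from Fig. S2
    query = "AAAACCCC"
    reference = "AAAAGGGG"
    print(percent_identity(query, reference))
    print(windowed_identity(query, reference, window=4, step=4))
```

Running the windowed comparison of one query against an A-strain reference and a B-strain reference, and comparing the two profiles, is one simple way to visualize a mosaic transition. Dedicated recombination-detection software would be needed for any formal breakpoint analysis.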

Potential functions of WspB
Bacterial outer membrane proteins are important mediators of interactions with host cells. Although specific function(s) of both WspA and WspB remain to be identified, they may have unique functions as porin proteins in Wolbachia, which lack cell walls. The virB8-D4 operons of Wolbachia and its sister genera, Anaplasma and Ehrlichia, are similarly organized (Gillespie et al. 2010;Hotopp et al. 2006) with 3′-terminal genes encoding major surface proteins that, analogous to wspB, are co-transcribed with the vir genes (Ohashi et al. 2002). In A. marginale, a family of msp2 pseudogenes undergoes "combinatorial gene conversion" at the expression site (Brayton et al. 2002) and MSP2 variants change during growth in different host cell types, which likely reflects a response to host immunity mechanisms (Chávez et al. 2012). Similarly, Baldo et al. (2010) proposed that changes in WspA HVR regions play a role in host adaptation and innate immunity interactions, consistent with variation in the higher-order structure of the protein in different hosts (Uday and Puttaraju 2012). HVR sequence changes in the wspB paralog may reflect a similar dynamic. Additional evidence indicates that MSP2 proteins are glycosylated (Sarkar et al. 2008), a post-translational modification that is now well established in bacteria (Dell et al. 2010;Nothaft and Szymanski 2010), and we note that WspB contains potential glycosylation sites. Although an inactivated pseudogene or absence of wspB in virB8-D4 operons of some Wolbachia strains indicates that it is not absolutely required for survival, a secretome analysis of Brugia malayi showed that WspB from D wBm is excreted/secreted into filarial host cells (Bennuru et al. 2009). Furthermore, it co-localizes with the Bm1_46455 host protein in tissues that include embryonic nuclei (Melnikow et al. 2011). WspB is therefore itself a candidate T4SS effector that may play a role in reproductive manipulation of the host.
Mosaicism in wspB and its high rate of evolution (Comandatore et al. 2013) may thus reflect genetic changes that optimize adaptation to particular host cells such as those in reproductive tissues and facilitate exploitation of new arthropod niches by Wolbachia.

Genetic plasticity of ribA in the virB8-D4 operon
Aside from wspB at the 3′-end of the T4SS virB8-D4 operon, ribA exhibits genetic plasticity at its 5′-end. In both B wStr and B wVulC, ribA is a two-part mosaic of N-terminal WOL-A and C-terminal WOL-B motifs. In contrast, the internal virB8-D4 genes have typical B-strain identities, and in some strain comparisons, amino acid identities slightly exceed nucleotide identities, which Pichon et al. (2009) attribute to strong selection against non-synonymous codon substitutions. Among the internal virB8-D4 genes, however, Klasson et al. (2009) suggest that in A wRi, an especially variable region in virB10 is likely derived from genetic exchange with a B-strain. We note here that ribA from A wRi closely resembles B-strain homologs within a variable region that immediately precedes the GTP cyclohydrolase domain, where its homolog in B wStr transitions from WOL-A to WOL-B sequence motifs (Fig. S1, positions 387-450).
In contrast to D wBm, in which ribA and virB8 are co-transcribed and bind common transcription factors (Li and Carlow 2012), relative abundance levels suggest that in B wStr, ribA is transcribed independently of the virB8-D4 operon. Some WOL-B-strains, such as B wVulC, lack wspB at the 3′-terminus of the virB8-D4 operon, while our data confirm that in B wStr, wspB is co-transcribed with the vir genes, consistent with similar relative abundances of WspB and the five Vir proteins. In aggregate, these observations suggest that WOL-D and WOL-A-/B-strains may differ in how RibA and WspB expression interfaces with T4SS-mediated transport of effectors in filarial worms and arthropod hosts (Felix et al. 2008;Masui et al. 2000;Rances et al. 2008;Wu et al. 2004), and it will be of interest to explore whether such differences relate to riboflavin provisioning. In filarial nematodes (Li and Carlow 2012;Strubing et al. 2010;Wu et al. 2009) and bedbugs (Hosokawa et al. 2010), evidence suggests that Wolbachia provisions its host with riboflavin, the precursor of flavin cofactors that are essential for many cellular redox reactions. In contrast, riboflavin depletion reduces B wStr abundance in C/wStr1 cells, suggesting that B wStr utilizes host riboflavin and does not augment riboflavin levels in mosquito host cells.

Potential functions of RibA and RibB
In initial commitment steps in riboflavin biosynthesis, enzymatic activities encoded by the ribA and ribB functional domains use GTP and ribulose-5-phosphate as substrates to catalyze riboflavin biosynthesis, consuming 25 molecules of ATP per molecule of riboflavin (Bacher et al. 2000). We note that in Wolbachia genomes, ribA is the annotated homolog of ribBA in Escherichia coli (Brutinel et al. 2013) and encodes a dihydroxybutanone phosphate synthase domain with putative RibB function near the N-terminus, upstream of a GTP cyclohydrolase II domain with conserved dimerization and active site residues (RibA function). As in E. coli, Wolbachia genomes also encode ribB, but at a distinct chromosomal locus, suggesting that ribA and ribB are not coordinately expressed. In Sinorhizobium meliloti (Rhizobiales; Alphaproteobacteria), knockout mutations of ribBA decreased flavin secretion but did not cause riboflavin auxotrophy or block establishment of symbiosis, suggesting that RibBA may have an undefined role in molecular transport (Yurgel et al. 2014). As is the case with B wStr, RibB is at least threefold more abundant than RibA in the bacterium Acidithiobacillus ferrooxidans (Knegt et al. 2008). In yeast, RibB has thiol-dependent alternative redox states (McDonagh et al. 2011), partially localizes to the mitochondrial periplasm, and has an unexplained function in oxidative respiration that is independent of riboflavin biosynthesis (Jin et al. 2003). These observations raise the possibility that in Wolbachia, RibA and RibB may have functions other than riboflavin biosynthesis that integrate with pathways involved in cellular oxidative state, such as iron metabolism. Intracellular bacteria are challenged by host-imposed oxidative stress and iron starvation (reviewed by Benjamin et al. 2010) and riboflavin biosynthesis is associated with iron acquisition in bacteria such as Helicobacter pylori (Worst et al. 1998) and Campylobacter jejuni (Crossley et al. 2007). 
Wolbachia interferes with iron metabolism and sequestration in insects (Brownlie et al. 2009;Kremer et al. 2009) and influences iron-dependent host processes such as heme metabolism, oxidative stress, apoptosis and autophagy (Gill et al. 2014). We note that the periplasmic iron-binding component of a membrane transporter is an abundant protein in B wStr (Table S3, entry 778 and Baldridge et al. 2014).