Introduction

The most efficient biological nitrogen fixation occurs when bacterial microsymbionts (rhizobia) form an effective symbiotic association with legume host plants. Legumes can develop these interactions with many different species of rhizobia belonging mainly to the Alphaproteobacteria, including Azorhizobium, Allorhizobium, Bradyrhizobium, Ensifer, Mesorhizobium and Rhizobium [1, 2]. At the time of writing, the genus Rhizobium contains 71 species, and a single species may include distinct symbiovars [3].

Within the species Rhizobium leguminosarum, there are three distinct symbiovars [4, 5]: bv. phaseoli, which forms nodules with Phaseolus vulgaris; bv. trifolii, which forms nodules with clover (Trifolium); and bv. viciae, which forms nodules on vetch, pea and lentil (Vicia, Lathyrus, Pisum and Lens). In R. leguminosarum, the nod genes that define these distinct host specificities are mostly located on the symbiotic plasmid, which has generically been designated pSym. The genomes of R. leguminosarum strains are usually large and complex, containing, in addition to pSym, a chromosomal replicon and extra-chromosomal low-copy-number replicons characterized by the presence of repABC replication systems [6–8]. Recent studies have revealed that substantial divergence can occur in this genome organization and in the metabolic versatility of R. leguminosarum isolates [5, 9–12]. Kumar et al. [5] demonstrated that the diversity of R. leguminosarum within a local population of nodule isolates was 10 times higher than that found for Ensifer medicae. The abundance of a particular genotype within a population can vary significantly, and adaptation to the edaphic environment is a sought-after trait, particularly for the development of inoculants [13, 14].

R. leguminosarum bv. viciae GB30 was isolated as the most abundant nodule inhabitant (>42 %) of Pisum sativum cv. Ramrod plants cultivated at a field site in Janow, Poland [10]. In contrast to other abundant isolates, GB30 formed nodules and fixed nitrogen with both P. sativum and Vicia villosa (cv. Wista). Preliminary investigation of the genome architecture using Eckhardt analysis revealed that GB30 contains a multipartite genome consisting of six replicons: one chromosome and five plasmids [10]. The genome of this strain could therefore provide important insights into the mechanisms required by effective R. leguminosarum microsymbionts to adapt to a particular edaphic environment. Here, we present a set of general features for Rhizobium leguminosarum bv. viciae GB30 together with a description of its high-quality permanent draft genome sequence and annotation.

Organism information

Classification and features

R. leguminosarum bv. viciae strain GB30 is a motile, Gram-negative rod in the order Rhizobiales of the class Alphaproteobacteria. The rod-shaped form varies in size, with dimensions of 0.8–1 μm in width and 2.3–2.5 μm in length (Fig. 1 Left and Center). It is fast growing, forming colonies within 3–4 days when grown on half-strength Lupin Agar (½LA) [15] at 28 °C. Colonies on ½LA are white-opaque, slightly domed and moderately mucoid with smooth margins (Fig. 1 Right).

Fig. 1 Images of Rhizobium leguminosarum bv. viciae strain GB30 using scanning (Left) and transmission (Center) electron microscopy and the appearance of colony morphology on ½LA solid media (Right).

Figure 2 shows the phylogenetic relationship of Rhizobium leguminosarum bv. viciae GB30 in a 16S rRNA gene sequence based tree. Based on the 16S rRNA gene alignment, this strain is most closely related to Rhizobium laguerreae FB206T and Rhizobium gallicum R602spT, with sequence identities of 100 % as determined using the EzTaxon-e server [16]. Rhizobium laguerreae FB206T was isolated from effective Vicia faba root nodules in Tunisia [17], whereas Rhizobium gallicum R602spT was isolated from effective Phaseolus vulgaris root nodules in France [18]. Sequence similarity was also investigated with strains from the GEBA-RNB project [12], and GB30 was found to be closely related to R. leguminosarum bv. trifolii WSM1689, with 100 % 16S rRNA gene sequence identity. R. leguminosarum bv. trifolii WSM1689 is a highly effective microsymbiont of the perennial clover Trifolium uniflorum and has been shown to have a remarkably narrow host range [19]. Minimum Information about the Genome Sequence (MIGS) is provided in Table 1 and Additional file 1: Table S1.
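The identity values quoted above can be illustrated with a minimal sketch that counts matching positions in an ungapped pairwise alignment. This is only an illustration: EzTaxon-e applies its own alignment and identity procedure, the function name `percent_identity` is hypothetical, and the sequences are toy fragments rather than 16S rRNA data.

```python
def percent_identity(seq_a: str, seq_b: str) -> float:
    """Percent identity over an ungapped alignment of equal-length sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    if not seq_a:
        raise ValueError("sequences must not be empty")
    # Compare position by position, ignoring case.
    matches = sum(a == b for a, b in zip(seq_a.upper(), seq_b.upper()))
    return 100.0 * matches / len(seq_a)


# Toy fragments for illustration only (assumption: not real 16S rRNA data).
query = "ACGTTGCAAGTC"
identical = "ACGTTGCAAGTC"
one_mismatch = "ACGTTGCAAGTT"

print(percent_identity(query, identical))                # 100.0
print(round(percent_identity(query, one_mismatch), 2))   # 91.67
```

A value of 100 % over the compared 16S rRNA region therefore means that no position differs, which is why this marker alone cannot separate GB30 from these neighbouring strains.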

Fig. 2 Phylogenetic tree highlighting the position of Rhizobium leguminosarum bv. viciae GB30 (shown in blue print) relative to other type and non-type strains in the Rhizobium genus using a 901 bp internal region of the 16S rRNA gene. Bradyrhizobium elkanii ATCC 49852T was used as outgroup. All sites were informative and there were no gap-containing sites. Phylogenetic analyses were performed using MEGA, version 5.05 [36]. The tree was built using the maximum likelihood method with the General Time Reversible model. Bootstrap analysis with 500 replicates was performed to assess the support of the clusters. Type strains are indicated with a superscript T. Strains with a genome sequencing project registered in GOLD [20] are shown in bold and have the GOLD ID mentioned after the strain number; otherwise the NCBI accession number is provided. Finished genomes are designated with an asterisk.
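The bootstrap step mentioned in the caption can be sketched as resampling alignment columns with replacement, once per replicate. This is a simplified illustration and not the MEGA implementation: the function name `bootstrap_replicate`, the toy alignment and the fixed seed are all assumptions, and the tree-building step that would follow each replicate is omitted.

```python
import random


def bootstrap_replicate(alignment: dict, rng: random.Random) -> dict:
    """Resample alignment columns with replacement (one bootstrap replicate)."""
    lengths = {len(seq) for seq in alignment.values()}
    if len(lengths) != 1:
        raise ValueError("all aligned sequences must have the same length")
    n_sites = lengths.pop()
    # Draw as many column indices as there are sites, with replacement.
    columns = [rng.randrange(n_sites) for _ in range(n_sites)]
    return {
        name: "".join(seq[i] for i in columns)
        for name, seq in alignment.items()
    }


# Toy alignment for illustration only (assumption: not the 901 bp 16S data).
toy_alignment = {
    "strain_A": "ACGTACGT",
    "strain_B": "ACGTACGA",
    "outgroup": "TCGAACGA",
}

rng = random.Random(0)  # fixed seed so the sketch is reproducible
replicates = [bootstrap_replicate(toy_alignment, rng) for _ in range(500)]
print(len(replicates), len(replicates[0]["strain_A"]))  # 500 8
```

Each replicate would then be used to infer a tree, and the support for a cluster is the fraction of the 500 trees in which that cluster appears.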

Table 1 Classification and general features of Rhizobium leguminosarum bv. viciae strain GB30 in accordance with the MIGS recommendations [37] published by the Genomic Standards Consortium [38].

Symbiotaxonomy

R. leguminosarum bv. viciae strain GB30 was obtained from nodules of pea (P. sativum cv. Ramrod) growing in sandy loam (N:P:K 0.157:0.014:0.013 %) in Janow near Lublin (Poland). The soil contained a relatively high number of R. leguminosarum bv. viciae, bv. trifolii and bv. phaseoli cells, i.e., 9.2 × 10^3, 4.2 × 10^3 and 1.5 × 10^3 bacteria/g of soil, respectively, as determined by the most probable number (MPN) method [10]. Plants were grown on a 1 m² plot for six weeks between May and June 2008. Five randomly chosen pea plants growing in each other's vicinity were harvested; the nodules were collected and surface-sterilized, and the microsymbionts were isolated [10]. One of the most abundant isolates, GB30, formed nodules (Nod+) and fixed N2 (Fix+) with P. sativum and Vicia villosa (cv. Wista), increasing the wet mass by 54 and 38 %, respectively. Plants inoculated with GB30 also showed a 2.6-fold increase in nodule number and a 2.2-fold increase in seed pod number.

Genome sequencing and annotation information

Genome project history

This organism was selected for sequencing on the basis of its environmental and agricultural relevance to issues in global carbon cycling, alternative energy production, and biogeochemical importance, and is part of the Genomic Encyclopedia of Bacteria and Archaea, The Root Nodulating Bacteria chapter (GEBA-RNB) project at the U.S. Department of Energy, Joint Genome Institute [12]. The genome project is deposited in the Genomes OnLine Database [20] and the high-quality permanent draft genome sequence is deposited in IMG [21]. Sequencing, finishing and annotation were performed by the JGI using state-of-the-art sequencing technology [22]. A summary of the project information is shown in Table 2.

Table 2 Genome sequencing project information for Rhizobium leguminosarum bv. viciae strain GB30

Growth conditions and genomic DNA preparation

R. leguminosarum bv. viciae strain GB30 was grown to mid-logarithmic phase in TY rich medium [23] on a gyratory shaker at 28 °C. DNA was isolated from 60 mL of cells using a CTAB (cetyl trimethyl ammonium bromide) bacterial genomic DNA isolation method [24].

Genome sequencing and assembly

The draft genome of Rhizobium leguminosarum bv. viciae GB30 was generated at the DOE Joint Genome Institute [22]. An Illumina Std shotgun library was constructed and sequenced using the Illumina HiSeq 2000 platform, which generated 25,943,396 reads totaling 3,891.5 Mbp. All general aspects of library construction and sequencing performed at the JGI can be found at the JGI web site [25]. All raw Illumina sequence data were passed through DUK, a filtering program developed at JGI, which removes known Illumina sequencing and library preparation artefacts (Mingkun L, Copeland A, Han J. unpublished). The following steps were then performed for assembly: (1) filtered Illumina reads were assembled using Velvet version 1.1.04 [26]; (2) 1–3 kbp simulated paired-end reads were created from Velvet contigs using wgsim [27]; (3) Illumina reads were assembled with simulated read pairs using Allpaths-LG (version r41043) [28]. Parameters for the assembly steps were: (1) Velvet (velveth: 63 -shortPaired and velvetg: -very_clean yes -exportFiltered yes -min_contig_lgth 500 -scaffolding no -cov_cutoff 10); (2) wgsim (-e 0 -1 100 -2 100 -r 0 -R 0 -X 0); (3) Allpaths-LG (PrepareAllpathsInputs: PHRED_64 = 1 PLOIDY = 1 FRAG_COVERAGE = 125 JUMP_COVERAGE = 25 LONG_JUMP_COV = 50, RunAllpathsLG: THREADS = 8 RUN = std_shredpairs TARGETS = standard VAPI_WARN_ONLY = True OVERWRITE = True). The final draft assembly contained 78 contigs in 78 scaffolds. The total size of the genome is 7.5 Mbp and the final assembly is based on 910.4 Mbp of Illumina data, which provides an average of 121.9× coverage.
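The coverage figure follows directly from the numbers reported here and in the Genome Properties section. The sketch below reproduces that arithmetic; the function name `mean_coverage` is hypothetical, and the mean read length is derived from the reported yield rather than stated in the text.

```python
def mean_coverage(total_bases_bp: float, genome_size_bp: float) -> float:
    """Average sequencing depth: sequenced bases divided by genome length."""
    if genome_size_bp <= 0:
        raise ValueError("genome size must be positive")
    return total_bases_bp / genome_size_bp


# Values reported in the text.
GENOME_SIZE_BP = 7_468_464      # assembled genome size
ASSEMBLY_INPUT_BP = 910.4e6     # Illumina data used in the final assembly
RAW_READS = 25_943_396          # reads generated by the HiSeq 2000 run
RAW_YIELD_BP = 3_891.5e6        # total raw yield

coverage = mean_coverage(ASSEMBLY_INPUT_BP, GENOME_SIZE_BP)
# Derived quantity (assumption: not stated explicitly in the text).
mean_read_length = RAW_YIELD_BP / RAW_READS

print(f"{coverage:.1f}x")             # 121.9x
print(f"{mean_read_length:.0f} bp")   # 150 bp
```

The reported 121.9× depth is thus consistent with 910.4 Mbp of input data over a 7,468,464 bp assembly.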

Genome annotation

Genes were identified using Prodigal [29], as part of the DOE-JGI genome annotation pipeline [30, 31]. The predicted CDSs were translated and used to search the National Center for Biotechnology Information (NCBI) non-redundant database, UniProt, TIGRFam, Pfam, KEGG, COG, and InterPro databases. The tRNAscan-SE tool [32] was used to find tRNA genes, whereas ribosomal RNA genes were found by searches against models of the ribosomal RNA genes built from SILVA [33]. Other non-coding RNAs, such as the RNA components of the protein secretion complex and RNase P, were identified by searching the genome for the corresponding Rfam profiles using INFERNAL [34]. Additional gene prediction analysis and manual functional annotation were performed within the Integrated Microbial Genomes-Expert Review (IMG-ER) system [35] developed by the Joint Genome Institute, Walnut Creek, CA, USA.

Genome Properties

The genome is 7,468,464 nucleotides long with a GC content of 60.81 % (Table 3) and comprises 78 scaffolds of 78 contigs. Of a total of 7,302 genes, 7,227 were protein-encoding genes and 75 were RNA-only encoding genes. The majority of genes (79.57 %) were assigned a putative function, whilst the remaining genes were annotated as hypothetical. The distribution of genes into COG functional categories is presented in Table 4.
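As a rough illustration of how such summary statistics are obtained, the sketch below computes GC content over a set of scaffold sequences and checks the reported gene tally. It is not the IMG pipeline: the function name `gc_percent` and the toy scaffolds are assumptions, and only the gene counts are taken from the text.

```python
from collections import Counter


def gc_percent(sequences) -> float:
    """GC content (%) over all unambiguous bases in a collection of sequences."""
    counts = Counter()
    for seq in sequences:
        counts.update(seq.upper())
    gc = counts["G"] + counts["C"]
    total = gc + counts["A"] + counts["T"]  # ambiguous bases (e.g. N) are ignored
    if total == 0:
        raise ValueError("no unambiguous bases found")
    return 100.0 * gc / total


# Toy scaffolds for illustration only (assumption: not GB30 sequence).
toy_scaffolds = ["GGCC", "ATGC"]
print(gc_percent(toy_scaffolds))  # 75.0

# Gene counts reported in the text.
TOTAL_GENES = 7_302
PROTEIN_GENES = 7_227
RNA_GENES = 75

assert PROTEIN_GENES + RNA_GENES == TOTAL_GENES
print(f"{100 * PROTEIN_GENES / TOTAL_GENES:.2f} % protein-encoding")  # 98.97 %
```

Applied to the 78 GB30 scaffolds, the same calculation would be expected to give the 60.81 % reported in Table 3.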

Table 3 Genome Statistics for Rhizobium leguminosarum bv. viciae strain GB30
Table 4 Number of genes associated with the general COG functional categories.

Conclusion

Rhizobium leguminosarum bv. viciae GB30 belongs to a group of Alpha-rhizobia strains isolated from Pisum sativum in Poland. Strain GB30 is part of the GEBA-RNB project that sequenced 24 R. leguminosarum strains and 12 R. leguminosarum bv. viciae strains [12]. Phylogenetic analysis revealed that GB30 is most closely related to Rhizobium leguminosarum bv. trifolii CB782 and WSM1689, both part of the GEBA-RNB project [12]. Full genome comparison of GB30 and WSM1689 [19] revealed that GB30 has the larger genome (7.5 Mbp), the higher COG count (5,182), the lower Pfam percentage (82.51 %) and the lower TIGRfam percentage (22.13 %). The genome attributes of R. leguminosarum bv. viciae GB30, in conjunction with the other R. leguminosarum genomes, will be important for ongoing comparative and functional analyses of the plant-microbe interactions required for the successful establishment of agricultural crops.