Introduction

The genus Medicago comprises 87 species of annual and perennial legumes, including some that were formerly recognized as Trigonella and Melilotus species [1]. A small number of domesticated annual Medicago species are grown extensively in the sheep-wheat zone of southern Australia, particularly where pasture regeneration after a cropping phase is desirable. Annual Medicago species are grown on more than 20 million ha [2] and are particularly valued for their contribution to farming systems, in which they fix around 25 kg of N per tonne of legume dry matter produced [3].

Medicago species are nodulated by two species of root nodule bacteria (Ensifer medicae and Ensifer meliloti), which are recognized as distinct on the basis of their different nodulation and N2-fixation phenotypes in host interaction studies and of more detailed analyses of their genetics [4,5].

Ensifer medicae strain WSM1115 is used in Australia to produce commercial peat cultures (referred to as Group AM inoculants) for the inoculation of several species of annual Medicago (predominantly M. truncatula, M. polymorpha, M. scutellata, M. sphaerocarpus, M. murex, M. rugosa and M. orbicularis). WSM1115 has been used commercially since 2002 [6], when it replaced strain WSM688. WSM1115 was isolated from a nodule from the roots of burr medic (Medicago polymorpha) collected by Prof. John Howieson (Murdoch University, Australia) on the island of Samothraki, Greece.

WSM1115 was selected for use in commercial inoculants having demonstrated good N2-fixation capacity with the relevant medic hosts and adequate saprophytic competence in moderately acidic soil (pH(CaCl2) 5).

Saprophytic competence in acidic soils is a requirement of strains used to inoculate Medicago because several species (M. murex, M. sphaerocarpus and M. polymorpha) are recommended for, and sown into, soils below pH(CaCl2) 5.5, a level that is known to limit both the survival of medic rhizobia and nodulation processes [7–10]. Useful variation in saprophytic competence occurs between strains of medic rhizobia [9], and valuable insights into the mechanisms that confer acidity tolerance have been provided by studies using strain WSM419 [11], which has recently been sequenced [12]. However, the complex nature of soil adaptation means that in-situ field studies still provide the most reliable means of selecting an inoculant strain, and such studies were used to select WSM1115 for commercial use. In a cross row experiment comparing 15 strains on acidic sand (pH(CaCl2) 5.0; Dowerin, Western Australia), the nodulation of plants inoculated with WSM1115 was equal to or better than that of the other strains. This translated into better plant shoot weights, which were similar to those of plants inoculated with WSM688 (the incumbent inoculant strain at the time of testing) and 48% greater than those of plants inoculated with the former inoculant strain CC169 (J. G. Howieson, unpublished data).

The nitrogen fixation capacity (effectiveness) of Medicago symbioses is characterized by strong interactions between the strain of rhizobia and the species of Medicago [13–16]. Hence, the ability to form an effective symbiosis with the species recommended for inoculation is an important consideration in inoculant strain selection. WSM1115 satisfies this requirement. In greenhouse tests it formed effective symbioses with 16 genotypes of Medicago and, overall, plants inoculated with WSM1115 produced 48% more shoot dry matter than plants inoculated with WSM688, the strain that it replaced (R.A. Ballard and N. Charman, unpublished data).

A limitation of strain WSM1115 is its poor persistence in moderately saline soils (e.g. where summer salinity exceeds an ECe of 10 dS/m). Poor nodulation of regenerating pasture was first noted in 2004 during the field evaluation and domestication of the salt-tolerant annual pasture legume messina (Melilotus siculus syn. Melilotus messanensis). Subsequent studies [17] confirmed that although WSM1115 was able to nodulate and form an effective symbiosis with messina, it did not persist as well as other strains (e.g. SRDI554) through the summer months, when salinity levels increased.

Here we present a preliminary description of the general features of Ensifer medicae strain WSM1115 together with its genome sequence and annotation.

Classification and features

Ensifer medicae strain WSM1115 is a motile, non-sporulating, non-encapsulated, Gram-negative rod in the order Rhizobiales of the class Alphaproteobacteria. The rod-shaped form varies in size with dimensions of approximately 0.5 µm in width and 1.0 µm in length (Figure 1A). It is fast growing, forming colonies within 3–4 days when grown on TY [18] or half strength Lupin Agar (½LA) [19] at 28°C. Colonies on ½LA are opaque, slightly domed and moderately mucoid with smooth margins (Figure 1B).

Figure 1.
Images of Ensifer medicae strain WSM1115 using (A) scanning electron microscopy and (B) light microscopy to show the colony morphology on a solid medium.

Minimum Information about the Genome Sequence (MIGS) is provided in Table 1. Figure 2 shows the phylogenetic neighborhood of Ensifer medicae strain WSM1115 in a 16S rRNA gene sequence based tree. This strain has 100% sequence identity (1,366/1,366 bp) at the 16S rRNA level to the fully sequenced Ensifer medicae strain WSM419 [12] and 99% sequence identity (1,362/1,366 bp) at the 16S rRNA level to the fully sequenced E. meliloti strain Sm1021 [36].
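The identity values above are simple ratios of identical positions to aligned length (1,362/1,366 is about 99.7%, reported as 99%). A minimal Python sketch, assuming a gap-free pairwise alignment of equal-length sequences; `count_identical` and `percent_identity` are hypothetical helpers, not part of any tool used in this study:

```python
def count_identical(seq_a: str, seq_b: str) -> int:
    """Count positions where two gap-free, equal-length aligned sequences agree."""
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    return sum(a == b for a, b in zip(seq_a.upper(), seq_b.upper()))


def percent_identity(identical: int, aligned_length: int) -> float:
    """Percentage of identical positions over the aligned region."""
    if aligned_length <= 0:
        raise ValueError("aligned_length must be positive")
    return 100.0 * identical / aligned_length


# Counts taken from the text (1,366 bp aligned 16S rRNA region).
wsm419 = percent_identity(1366, 1366)
sm1021 = percent_identity(1362, 1366)
print(f"WSM419: {wsm419:.2f}%  Sm1021: {sm1021:.2f}%")
```

With the counts above this prints `WSM419: 100.00%  Sm1021: 99.71%`.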

Figure 2.
Phylogenetic tree showing the relationship of Ensifer medicae WSM1115 (shown in bold print) to other Ensifer spp. in the order Rhizobiales based on aligned sequences of the 16S rRNA gene (1,290 bp internal region). All sites were informative and there were no gap-containing sites. Phylogenetic analyses were performed using MEGA, version 5 [33]. The tree was built using the Maximum-Likelihood method with the General Time Reversible model [34]. Bootstrap analysis [35] with 500 replicates was performed to assess the support of the clusters. Type strains are indicated with a superscript T. Brackets after the strain name contain a DNA database accession number and/or a GOLD ID (beginning with the prefix G) for a sequencing project registered in GOLD [32]. Published genomes are indicated with an asterisk.
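Bootstrap support of the kind reported in Figure 2 is obtained by resampling alignment columns with replacement, re-inferring a tree from each pseudo-replicate, and recording how often each cluster recurs. The sketch below shows only the resampling step, assuming a toy gap-free alignment; tree inference (done in MEGA with Maximum Likelihood and the GTR model) is not reproduced, and `bootstrap_replicate` is a hypothetical helper:

```python
import random


def bootstrap_replicate(alignment: list[str], rng: random.Random) -> list[str]:
    """Build one pseudo-replicate by sampling alignment columns with replacement."""
    n_sites = len(alignment[0])
    if any(len(seq) != n_sites for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")
    # Draw n_sites column indices, allowing repeats.
    cols = [rng.randrange(n_sites) for _ in range(n_sites)]
    # Every sequence uses the same sampled columns, so the columns stay aligned.
    return ["".join(seq[i] for i in cols) for seq in alignment]


# Hypothetical toy alignment, not the 1,290 bp 16S rRNA region.
toy = ["ACGTACGTAC", "ACGTACGTTC", "ACCTACGAAC"]
rng = random.Random(42)
replicates = [bootstrap_replicate(toy, rng) for _ in range(500)]
print(len(replicates), [len(s) for s in replicates[0]])
```

This prints `500 [10, 10, 10]`. Each replicate keeps the original number of sites; only the composition of columns changes.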

Table 1. Classification and general features of Ensifer medicae strain WSM1115 according to the MIGS recommendations [20]

Symbiotaxonomy

Ensifer medicae strain WSM1115 forms nodules (Nod+) and fixes N2 (Fix+) with a range of annual and perennial Medicago species and Melilotus species (Table 2). Levels of N2 fixation with Medicago littoralis are suboptimal; that species generally forms more effective associations with strains of Ensifer meliloti, including strain RRI128 [38]. N2 fixation with Melilotus albus is also recorded as positive, but has been observed to vary markedly between plant accessions.

Table 2. Compatibility of Ensifer medicae WSM1115 with various Medicago and allied genera for nodulation (Nod) and N2-fixation (Fix)

Genome sequencing and annotation information

Genome project history

This organism was selected for sequencing on the basis of its environmental and agricultural relevance to issues in global carbon cycling, alternative energy production, and biogeochemical importance, and is part of the Community Sequencing Program at the U.S. Department of Energy, Joint Genome Institute (JGI) for projects of relevance to agency missions. The genome project is deposited in the Genomes OnLine Database [32], and a high-quality draft genome sequence is deposited in IMG/GEBA. Sequencing, finishing and annotation were performed by the JGI. A summary of the project information is shown in Table 3.

Table 3. Genome sequencing project information for Ensifer medicae strain WSM1115

Growth conditions and DNA isolation

Ensifer medicae strain WSM1115 was cultured to mid-logarithmic phase in 60 ml of TY rich medium on a gyratory shaker at 28°C [39]. DNA was isolated from the cells using a CTAB (cetyltrimethylammonium bromide) bacterial genomic DNA isolation method [40].

Genome sequencing and assembly

The genome of Ensifer medicae strain WSM1115 was sequenced at the Joint Genome Institute (JGI) using Illumina technology [41]. An Illumina standard paired-end library with a minimum insert size of 270 bp was used to generate 23,080,558 reads totaling 3,462 Mbp, and an Illumina CLIP paired-end library with an average insert size of 9,584 ± 2,493 bp was used to generate 2,163,668 reads totaling 324 Mbp of Illumina data (unpublished, Feng Chen).
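The read length is not stated above, but dividing each library's total yield by its read count gives the implied mean read length. A minimal sketch using only the figures in the text; because the Mbp totals are rounded, the result is an approximation, and `mean_read_length` is a hypothetical helper:

```python
def mean_read_length(total_bases: float, n_reads: int) -> float:
    """Average bases per read implied by a library's total yield."""
    return total_bases / n_reads


# (read count, total bases) as reported for each library.
libraries = {
    "standard paired-end": (23_080_558, 3_462e6),
    "CLIP paired-end": (2_163_668, 324e6),
}
for name, (reads, bases) in libraries.items():
    print(f"{name}: ~{mean_read_length(bases, reads):.0f} bp per read")
```

Both libraries come out at roughly 150 bp per read.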

All general aspects of library construction and sequencing performed at the JGI can be found at the JGI user home [40]. The initial draft assembly contained 57 contigs in 11 scaffolds. The initial draft data was assembled with Allpaths, version 38445, and the consensus was computationally shredded into 10 Kbp overlapping fake reads (shreds). The Illumina draft data was also assembled with Velvet, version 1.1.05 [42], and the consensus sequences were computationally shredded into 1.5 Kbp overlapping fake reads (shreds). The Illumina draft data was assembled again with Velvet, using the shreds from the first Velvet assembly to guide the next assembly. The consensus from the second Velvet assembly was shredded into 1.5 Kbp overlapping fake reads. The fake reads from the Allpaths assembly and both Velvet assemblies, together with a subset of the Illumina CLIP paired-end reads, were assembled using parallel phrap, version 4.24 (High Performance Software, LLC). Possible mis-assemblies were corrected with manual editing in Consed [43–45]. Gap closure was accomplished using repeat resolution software (Wei Gu, unpublished) and sequencing of bridging PCR fragments. The estimated total size of the genome is 6.9 Mbp, and the final assembly is based on 3,654 Mbp of Illumina draft data, which provides an average coverage of 530× of the genome.
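Shredding turns an assembled consensus into overlapping pseudo-reads that a read-based assembler such as phrap can take as input, and mean coverage is total sequenced bases divided by genome size. A minimal sketch of both, assuming a hypothetical 1 Kbp step between shred starts (the overlap actually used by the JGI is not stated) and a toy 4 Kbp consensus; `shred` and `mean_coverage` are hypothetical helpers, not JGI software:

```python
def shred(consensus: str, shred_len: int, step: int) -> list[str]:
    """Cut a consensus sequence into overlapping fixed-length fake reads.

    Consecutive shreds start `step` bases apart, so each overlaps the next
    by shred_len - step bases. A final shred is anchored to the sequence
    end so that the 3' terminus is always covered.
    """
    if not 0 < step <= shred_len:
        raise ValueError("require 0 < step <= shred_len")
    if len(consensus) <= shred_len:
        return [consensus]
    starts = list(range(0, len(consensus) - shred_len + 1, step))
    if starts[-1] + shred_len < len(consensus):
        starts.append(len(consensus) - shred_len)
    return [consensus[s:s + shred_len] for s in starts]


def mean_coverage(total_bases: float, genome_size: float) -> float:
    """Average sequencing depth: bases sequenced per base of genome."""
    return total_bases / genome_size


contig = "ACGT" * 1000  # hypothetical 4 Kbp consensus
pieces = shred(contig, shred_len=1500, step=1000)
print(len(pieces), [len(p) for p in pieces])
print(f"{mean_coverage(3_654e6, 6.9e6):.0f}x")  # figures from the text
```

This prints `4 [1500, 1500, 1500, 1500]` and `530x`, matching the coverage reported above.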

Genome annotation

Genes were identified using Prodigal [46] as part of the Oak Ridge National Laboratory genome annotation pipeline, followed by a round of manual curation using the JGI GenePRIMP pipeline [47]. The predicted CDSs were translated and used to search the National Center for Biotechnology Information (NCBI) nonredundant database, UniProt, TIGRFam, Pfam, PRIAM, KEGG, COG, and InterPro databases. These data sources were combined to assert a product description for each predicted protein. Non-coding genes and miscellaneous features were predicted using tRNAscan-SE [48], RNAMMer [49], Rfam [50], TMHMM [51], and SignalP [52]. Additional gene prediction analyses and functional annotation were performed within the Integrated Microbial Genomes (IMG-ER) platform [53].
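The translation step that precedes the database searches can be sketched as follows, assuming the codon assignments of NCBI translation table 11 (bacterial), which match the standard code apart from the set of permitted start codons, and a CDS supplied 5' to 3' on the coding strand including its stop codon. `translate_cds` is a hypothetical helper, not a component of the ORNL or IMG pipelines, and the homology searches themselves are not reproduced:

```python
from itertools import product

BASES = "TCAG"
# Amino acids in NCBI codon order (first base varies slowest); table 11 uses
# these assignments and differs from table 1 only in its alternative starts.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate_cds(cds: str, start_as_met: bool = True) -> str:
    """Translate a complete CDS; '*' marks any internal stop codon."""
    cds = cds.upper()
    if len(cds) % 3 != 0:
        raise ValueError("CDS length must be a multiple of 3")
    protein = [CODON_TABLE[cds[i:i + 3]] for i in range(0, len(cds), 3)]
    if start_as_met and protein:
        # Initiating codons such as GTG or TTG are read as Met at the start.
        protein[0] = "M"
    if protein and protein[-1] == "*":
        protein.pop()  # drop the terminal stop codon
    return "".join(protein)


print(translate_cds("GTGAAAGCTTGGTAA"))
```

The toy CDS above, which starts with GTG, prints `MKAW`.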

Genome properties

The genome is 6,861,065 nucleotides long with 61.16% GC content (Table 4) and comprises 7 scaffolds (Figures 3a–3g). Of a total of 6,872 genes, 6,789 were protein-encoding and 83 were RNA-only encoding genes. The majority of genes (76.25%) were assigned a putative function, whilst the remaining genes were annotated as hypothetical. The distribution of genes into COG functional categories is presented in Table 5.
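GC content is the fraction of G and C among the bases of the assembly, and the gene total is the sum of the protein-coding and RNA genes. A minimal sketch on a toy sequence; conventions differ on how gap characters (N) in draft scaffolds are treated, and this sketch assumes they are excluded from the denominator. `gc_content` is a hypothetical helper:

```python
def gc_content(seq: str) -> float:
    """Percent G+C over unambiguous bases; N and other symbols are ignored."""
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T")
    if gc + at == 0:
        raise ValueError("no unambiguous bases")
    return 100.0 * gc / (gc + at)


print(f"{gc_content('GGCCATNN'):.2f}")  # toy sequence, prints 66.67

# Gene totals reported for WSM1115: protein-coding plus RNA genes.
total_genes = 6_789 + 83
print(total_genes)  # prints 6872
```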

Figure 3a.
Graphical maps of SinmedDRAFT_Scaffold1.2 of the Ensifer medicae strain WSM1115 genome sequence. From bottom to the top of each scaffold: Genes on forward strand (color by COG categories as denoted by the IMG platform), Genes on reverse strand (color by COG categories), RNA genes (tRNAs green, sRNAs red, other RNAs black), GC content, GC skew.

Figure 3b.
Graphical maps of SinmedDRAFT_Scaffold2.1 of the Ensifer medicae strain WSM1115 genome sequence. From bottom to the top of each scaffold: Genes on forward strand (color by COG categories as denoted by the IMG platform), Genes on reverse strand (color by COG categories), RNA genes (tRNAs green, sRNAs red, other RNAs black), GC content, GC skew.

Figure 3c.
Graphical maps of SinmedDRAFT_Scaffold5.3 of the Ensifer medicae strain WSM1115 genome sequence. From bottom to the top of each scaffold: Genes on forward strand (color by COG categories as denoted by the IMG platform), Genes on reverse strand (color by COG categories), RNA genes (tRNAs green, sRNAs red, other RNAs black), GC content, GC skew.

Figure 3d.
Graphical maps of SinmedDRAFT_Scaffold3.7 of the Ensifer medicae strain WSM1115 genome sequence. From bottom to the top of each scaffold: Genes on forward strand (color by COG categories as denoted by the IMG platform), Genes on reverse strand (color by COG categories), RNA genes (tRNAs green, sRNAs red, other RNAs black), GC content, GC skew.

Figure 3e.
Graphical maps of SinmedDRAFT_Scaffold6.5 of the Ensifer medicae strain WSM1115 genome sequence. From bottom to the top of each scaffold: Genes on forward strand (color by COG categories as denoted by the IMG platform), Genes on reverse strand (color by COG categories), RNA genes (tRNAs green, sRNAs red, other RNAs black), GC content, GC skew.

Figure 3f.
Graphical maps of SinmedDRAFT_Scaffold4.6 of the Ensifer medicae strain WSM1115 genome sequence. From bottom to the top of each scaffold: Genes on forward strand (color by COG categories as denoted by the IMG platform), Genes on reverse strand (color by COG categories), RNA genes (tRNAs green, sRNAs red, other RNAs black), GC content, GC skew.

Figure 3g.
Graphical maps of SinmedDRAFT_Scaffold7.4 of the Ensifer medicae strain WSM1115 genome sequence. From bottom to the top of each scaffold: Genes on forward strand (color by COG categories as denoted by the IMG platform), Genes on reverse strand (color by COG categories), RNA genes (tRNAs green, sRNAs red, other RNAs black), GC content, GC skew.
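The GC skew tracks in these maps are sliding-window statistics of (G - C)/(G + C). A minimal sketch, assuming illustrative window and step sizes on a toy sequence; the parameters used by the IMG plotting tool are not stated here, and `gc_skew` is a hypothetical helper:

```python
def gc_skew(seq: str, window: int, step: int) -> list[tuple[int, float]]:
    """Return (window start, (G - C) / (G + C)) for each sliding window."""
    seq = seq.upper()
    result = []
    for start in range(0, len(seq) - window + 1, step):
        win = seq[start:start + window]
        g, c = win.count("G"), win.count("C")
        # Windows without G or C are assigned a skew of 0.
        skew = (g - c) / (g + c) if g + c else 0.0
        result.append((start, skew))
    return result


toy = "GGGGCCAT" + "CCCCGGAT"  # hypothetical 16 bp sequence
for start, skew in gc_skew(toy, window=8, step=8):
    print(start, round(skew, 2))
```

This prints `0 0.33` and `8 -0.33`: the first window is G-rich and the second C-rich. On a real scaffold, shifts in the sign of the skew are often used as hints of replication origin and terminus, although draft scaffolds may not span such regions.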

Table 4. Genome Statistics for Ensifer medicae strain WSM1115
Table 5. Number of protein coding genes of Ensifer medicae strain WSM1115 associated with the general COG functional categories.