Introduction

Legume plants form nitrogen-fixing symbioses with root nodule bacteria, collectively called rhizobia. These legumes are particularly useful crop plants because they do not require exogenous nitrogenous fertilizer to support growth in less fertile, nitrogen-deficient conditions. They include some of our staple food and feed plants, such as beans, peas, soybeans, lentils, clover, peanuts and alfalfa, and are mostly annual crops. In many arid and savannah regions, leguminous trees represent a particularly valuable resource as they are often deep-rooted and drought resistant. They have been used traditionally in the Sahel region as sources of timber and fodder and for soil improvement [1]. Prosopis chilensis, also known as Chilean mesquite, is a tree native to South America that has many uses: its nutritious pods can be ground to produce flour and are also eaten by livestock, and its wood is used for construction and furniture. Chilean mesquite is also used for intercropping with other plants, for which it provides shelter and nutrients (leaf compost, nitrogen). Acacia senegal (recently renamed as Senegalia senegal) is a plant of particular importance in the production of gum arabic in the Sahel region and the Middle East. Its seeds are dried for human consumption, and its leaves and pods serve as feed for sheep, goats and camels. The plant is also used in agroforestry in intercropping with watermelon and grasses, and in rotation systems with other crops (Agroforestree Database [2]).

The microsymbiont of these legume trees from Sudan and Kenya [3] has been renamed as Ensifer arboris [4], of which LMG 14919T (= HAMBI 1552, ORS 1755, TTR38) is the type strain. This strain was isolated from root nodules of Prosopis chilensis from Kosti, Sudan, and shown to effectively nodulate its original host as well as Acacia senegal [5].

Given the drought tolerance of the host trees, it seems fitting that their symbionts are also stress resistant: Ensifer arboris was described as tolerant to temperatures up to 41–43 °C, 3% NaCl, several heavy metals (including Pb, Cd, Hg, Cu) and a wide range of antibiotics [3,5], characteristics that contribute to the success of the rhizobial-legume tree association in challenging environmental conditions [6]. Here we present a summary classification and a set of features for E. arboris strain LMG 14919T (Table 1), together with the description of the complete genome sequence and its annotation.

Table 1. Classification and general features of Ensifer arboris LMG 14919T according to the MIGS recommendations [7]

Classification and features

E. arboris LMG 14919T is a motile, non-sporulating, non-encapsulated, Gram-negative rod in the order Rhizobiales of the class Alphaproteobacteria. The rod-shaped form varies in size with dimensions of approximately 0.25 µm in width and 1.0–1.5 µm in length (Figure 1, Left and Center). The strain is fast-growing, forming colonies within 3–4 days when grown on half strength Lupin Agar (½LA) [19], tryptone-yeast extract agar (TY) [20] or a modified yeast-mannitol agar (YMA) [21] at 28°C. Colonies on ½LA are white-opaque, slightly domed and moderately mucoid with smooth margins (Figure 1 Right).

Figure 1.

Images of Ensifer arboris LMG 14919T using scanning (Left) and transmission (Center) electron microscopy and the appearance of colony morphology on a solid medium (Right).

E. arboris LMG 14919T is capable of using several amino acids, including L-proline, L-arginine, sodium glutamate and L-histidine as sole nitrogen sources and can use a wide range of different carbon sources including L-arabinose, D-galactose, raffinose, L-rhamnose, maltose, lactose, D-fructose, D-mannose, trehalose, D-ribose, xylene, methyl-D-mannoside, sorbitol, dulcitol, meso-inositol, inulin, dextrin, amygdalin, arbutin, sodium citrate, itaconate, α-ketoglutarate, sodium maltose, 1,2-propylene glycol, and 1,2-butylene glycol [5].

Minimum Information about the Genome Sequence (MIGS) is provided in Table 1. Figure 2 shows the phylogenetic neighborhood of E. arboris LMG 14919T in a 16S rRNA sequence-based tree. This strain shares 99% sequence identity (1361/1366 bp in each case) with the 16S rRNA genes of the fully sequenced strains E. meliloti Sm1021 [26] and E. medicae WSM419 [27].
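The identity figure quoted above is a simple ratio of matching positions to aligned length. The following minimal sketch reproduces that arithmetic; the function name `percent_identity` is a hypothetical helper introduced here for illustration and is not part of any tool used in this study.

```python
def percent_identity(matches: int, aligned_length: int) -> float:
    """Return percent identity over an aligned region."""
    if aligned_length <= 0:
        raise ValueError("aligned_length must be positive")
    if not 0 <= matches <= aligned_length:
        raise ValueError("matches must lie between 0 and aligned_length")
    return 100.0 * matches / aligned_length


# Values taken from the text: 1361 matching positions out of 1366 aligned bp.
matches, aligned_length = 1361, 1366
identity = percent_identity(matches, aligned_length)
mismatches = aligned_length - matches

print(f"{identity:.2f}% identity with {mismatches} mismatched positions")
```

With these inputs the unrounded value is about 99.6%, which the text reports as 99%.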

Figure 2.

Phylogenetic tree showing the relationship of Ensifer arboris LMG 14919T (shown in bold print) to other Ensifer spp. in the order Rhizobiales based on aligned sequences of the 16S rRNA gene (1,290 bp internal region). All sites were informative and there were no gap-containing sites. Phylogenetic analyses were performed using MEGA, version 5 [22]. The tree was built using the Maximum-Likelihood method with the General Time Reversible model [23]. Bootstrap analysis [24] with 500 replicates was performed to assess the support of the clusters. Type strains are indicated with a superscript T. Brackets after the strain name contain a DNA database accession number and/or a GOLD ID (beginning with the prefix G) for a sequencing project registered in GOLD [25]. Published genomes are indicated with an asterisk.

Symbiotaxonomy

E. arboris LMG 14919T was initially shown to form nodules (Nod+) and fix nitrogen (Fix+) with two leguminous tree species, P. chilensis and A. senegal. It was unable to elicit nodules on the herbaceous perennials Macroptilium atropurpureum, Trifolium pratense, Medicago sativa, Lotus corniculatus and Galega orientalis [5]. The symbiotic properties of this strain in seedlings of Acacia and Prosopis spp. in Sudan and Senegal have been reported in detail [6]. Indeterminate nodules are induced, mainly on the lateral roots either in clusters or individually. Young nodules are spherical and later become elongated and are commonly branched. LMG 14919T (=HAMBI 1552) was shown to nodulate and fix nitrogen in seedlings of African A. mellifera, A. nilotica, A. oerfota (synonym A. nubica), A. senegal, A. seyal, A. sieberiana, A. tortilis subsp. raddiana, Latin American A. angustissima, P. chilensis and P. pallida, and Afro-Asian P. cineraria. It also effectively nodulates with Latin-American introductions of P. chilensis and P. juliflora in Africa [6]. It induced small ineffective nodules on Australian A. holosericea and African P. africana [6].

Genome sequencing and annotation

Genome project history

This organism was selected for sequencing on the basis of its environmental and agricultural relevance to issues in global carbon cycling, alternative energy production, and biogeochemical importance, and is part of the Community Sequencing Program at the U.S. Department of Energy, Joint Genome Institute (JGI) for projects of relevance to agency missions. The genome project is deposited in the Genomes OnLine Database [25] and an improved-high-quality-draft genome sequence in IMG. Sequencing, finishing and annotation were performed by the JGI. A summary of the project information is shown in Table 2.

Table 2. Genome sequencing project information for E. arboris LMG 14919T.

Growth conditions and DNA isolation

E. arboris LMG 14919T was cultured to mid-logarithmic phase in 60 ml of TY rich medium on a gyratory shaker at 28°C [28]. DNA was isolated from the cells using a CTAB (cetyl trimethyl ammonium bromide) bacterial genomic DNA isolation method [29].

Genome sequencing and assembly

The genome of Ensifer arboris LMG 14919T was sequenced at the Joint Genome Institute (JGI) using Illumina technology [30]. An Illumina short-insert paired-end library with an average insert size of 270 bp generated 19,256,666 reads, and an Illumina long-insert paired-end library with an average insert size of 9,232.94 ± 2,530.88 bp generated 1,365,298 reads, for a total of 3,093.3 Mbp of Illumina data. All general aspects of library construction and sequencing performed at the JGI can be found at the JGI user home.
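As a quick consistency check on the library figures above, the read counts and total yield can be combined as in the sketch below. The variable names are illustrative, and the mean yield per read is an inference from the reported totals rather than a read length stated in the text.

```python
# Read counts and total yield as reported in the text.
short_insert_reads = 19_256_666
long_insert_reads = 1_365_298
total_yield_bp = 3_093.3e6  # 3,093.3 Mbp of Illumina data

total_reads = short_insert_reads + long_insert_reads

# Mean number of bases contributed per read (inferred, not stated in the text).
mean_bp_per_read = total_yield_bp / total_reads

# Share of reads that come from the long-insert library.
long_insert_fraction = long_insert_reads / total_reads

print(f"total reads: {total_reads:,}")
print(f"mean yield per read: {mean_bp_per_read:.1f} bp")
print(f"long-insert share of reads: {100 * long_insert_fraction:.1f}%")
```

The reported totals imply roughly 150 bp per read across both libraries.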

The initial draft assembly contained 27 contigs in 9 scaffolds. The initial draft data was assembled with Allpaths, version r38445, and the consensus was computationally shredded into 10 Kbp overlapping fake reads (shreds). The Illumina draft data was also assembled with Velvet, version 1.1.05 [31], and the consensus sequences were computationally shredded into 1.5 Kbp overlapping fake reads (shreds). The Illumina draft data was assembled again with Velvet using the shreds from the first Velvet assembly to guide the next assembly. The consensus from the second Velvet assembly was shredded into 1.5 Kbp overlapping fake reads. The fake reads from the Allpaths assembly and both Velvet assemblies and a subset of the Illumina CLIP paired-end reads were assembled using parallel phrap, version SPS 4.24 (High Performance Software, LLC). Possible mis-assemblies were corrected with manual editing in Consed [32–34]. Gap closure was accomplished using repeat resolution software (Wei Gu, unpublished) and sequencing of bridging PCR fragments using Sanger technology (Cliff Han, unpublished). For the improved high quality draft, one round of manual/wet lab finishing was completed. A total of 46 additional sequencing reactions were completed to close gaps and to raise the quality of the final sequence. The estimated total size of the genome is 6.9 Mbp and the final assembly is based on 3,093.3 Mbp of Illumina draft data, which provides an average of 448× coverage of the genome.
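Two steps in this paragraph lend themselves to a short illustration: shredding a consensus sequence into overlapping fake reads, and deriving the average coverage from yield and genome size. The sketch below is a generic illustration only and is not the JGI implementation; the function `shred` and its `step` parameter are hypothetical, since the text gives the shred lengths (10 Kbp and 1.5 Kbp) but not the amount of overlap.

```python
from typing import Iterator


def shred(consensus: str, shred_len: int, step: int) -> Iterator[str]:
    """Yield overlapping fixed-length pieces of a consensus sequence.

    `step` is the offset between consecutive shreds; an overlap exists
    whenever step < shred_len. The step value is an assumption, because
    the source does not state the overlap that was used.
    """
    if not 0 < step <= shred_len:
        raise ValueError("step must satisfy 0 < step <= shred_len")
    if len(consensus) <= shred_len:
        yield consensus
        return
    start = 0
    while start + shred_len < len(consensus):
        yield consensus[start:start + shred_len]
        start += step
    # Anchor the last shred at the end so that no trailing bases are lost.
    yield consensus[-shred_len:]


# Toy example: a 10-base consensus cut into 4-base shreds offset by 2 bases.
toy_shreds = list(shred("ACGTACGTAC", shred_len=4, step=2))
print(toy_shreds)

# Average coverage from the figures in the text.
total_yield_mbp = 3_093.3
genome_size_mbp = 6.9
coverage = total_yield_mbp / genome_size_mbp
print(f"average coverage: {coverage:.0f}x")
```

Dividing the 3,093.3 Mbp of draft data by the estimated 6.9 Mbp genome size gives the 448× average coverage stated above.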

Genome annotation

Genes were identified using Prodigal [35] as part of the DOE-JGI annotation pipeline [36], followed by a round of manual curation using the JGI GenePRIMP pipeline [37]. The predicted CDSs were translated and used to search the National Center for Biotechnology Information (NCBI) non-redundant database, UniProt, TIGRFam, Pfam, PRIAM, KEGG, COG, and InterPro databases. These data sources were combined to assert a product description for each predicted protein. Non-protein coding genes and miscellaneous features were predicted using tRNAscan-SE [38], RNAmmer [39], searches against models of the ribosomal RNA genes built from SILVA [40], Rfam [41], TMHMM [42], and SignalP [43]. Additional gene prediction analysis and manual functional annotation were performed within the Integrated Microbial Genomes (IMG-ER) platform [44].

Genome properties

The genome is 6,850,303 nucleotides long with 62.02% GC content (Table 3) and comprises 7 scaffolds (Figure 3) of 12 contigs. Of a total of 6,545 genes, 6,461 were protein encoding and 84 were RNA-only encoding genes. The majority of genes (80.78%) were assigned a putative function, whilst the remaining genes were annotated as hypothetical. The distribution of genes into COGs functional categories is presented in Table 4.
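The gene counts above can be cross-checked with simple arithmetic, and GC content is the percentage of G and C among the unambiguous bases of a sequence. The sketch below illustrates both under stated assumptions: `gc_content` is a hypothetical helper, the toy sequence is invented for demonstration, and the code is not the procedure used by the IMG platform.

```python
def gc_content(seq: str) -> float:
    """Return percent G+C among unambiguous bases (A, C, G, T)."""
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T")
    if gc + at == 0:
        raise ValueError("sequence contains no unambiguous bases")
    return 100.0 * gc / (gc + at)


# Gene counts as reported in the text.
total_genes = 6_545
protein_coding = 6_461
rna_genes = 84

# The two classes should account for every predicted gene.
counts_consistent = protein_coding + rna_genes == total_genes
protein_coding_pct = 100.0 * protein_coding / total_genes

# Toy sequence (invented for illustration): 6 G/C, 2 A/T, 2 ambiguous N.
toy_gc = gc_content("GCGCATGCNN")

print(f"counts consistent: {counts_consistent}")
print(f"protein-coding share: {protein_coding_pct:.2f}%")
print(f"toy GC content: {toy_gc:.1f}%")
```

Ambiguous bases are excluded from the denominator here, which is one common convention; other tools may count them differently.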

Figure 3.

Graphical map of the genome of Ensifer arboris LMG 14919T showing the seven largest scaffolds. From bottom to the top of each scaffold: Genes on forward strand (color by COG categories as denoted by the IMG platform), Genes on reverse strand (color by COG categories), RNA genes (tRNAs green, sRNAs red, other RNAs black), GC content, GC skew.

Table 3. Genome Statistics for Ensifer arboris LMG 14919T
Table 4. Number of protein coding genes of Ensifer arboris LMG 14919T associated with the general COG functional categories.