Introduction

Leguminous pasture species are important in Western Australian agriculture because the soils are inherently infertile. Given these soils and changing rainfall patterns, this agricultural system cannot continue to rely on the annual legumes currently in commercial use. Deep-rooted herbaceous perennial legumes, including Rhynchosia and Lebeckia species from the Cape Floristic Region (CFR) of South Africa, have therefore been investigated because of their adaptation to acid and infertile soils [1–3]. These plants occur naturally in the CFR, one of the richest plant regions in the world, which covers 553,000 ha of land protected by UNESCO as a World Heritage Site. Elevations in this area range from 2077 m in the Groot Winterhoek to sea level in the De Hoop Nature Reserve, and much of the area is characterized by mountains, rivers, waterfalls and pools. Where Lebeckia ambigua is native, annual rainfall ranges between 150 and 400 mm. Parts of the CFR thus have soil and climate conditions similar to those of Western Australia.

In four expeditions to the Western Cape of South Africa between 2002 and 2007, nodules and seeds were collected and stored as previously described [4]. Isolation of bacteria from these nodules yielded a collection of 23 strains identified as Burkholderia. Unlike most previously studied rhizobial Burkholderia strains, this South African group appears to associate with papilionoid forage legumes rather than with Mimosa species. WSM4176 belongs to a subgroup of strains isolated in 2004 from Lebeckia ambigua nodules collected near Nieuwoudtville in the Western Cape of South Africa [3]. The collection site was a moderately grazed rangeland field owned by the Louw family, and the soil was a stony sand with a pH of 6. Burkholderia sp. strain WSM4176 is highly effective at fixing nitrogen with Lebeckia ambigua, on which it forms indeterminate, crotaloid nodules [3].

WSM4176 is thus a potential inoculant-quality strain for Lebeckia ambigua, which is being developed as a grazing legume for infertile soils receiving 250–400 mm of annual rainfall, in regions where climate change has necessitated the domestication of agricultural species with altered characteristics. This strain is therefore of special interest to the IMG/GEBA project. Here we present a summary classification and a set of general features for Burkholderia sp. strain WSM4176, together with a description of its high-quality permanent draft genome sequence and annotation.

Organism information

Classification and features

Burkholderia sp. strain WSM4176 is a motile, Gram-negative, non-spore-forming rod (Fig. 1 Left, Center) in the order Burkholderiales of the class Betaproteobacteria. Cells vary in size, measuring 0.1–0.2 μm in width and 2.0–3.0 μm in length (Fig. 1 Left). The strain is fast growing, forming colonies 0.5–1 mm in diameter after 24 h at 28 °C on half-strength Lupin Agar (½LA) [5] and on TY [6]. Colonies on ½LA are white-opaque, slightly domed and moderately mucoid, with smooth margins (Fig. 1 Right).

Fig. 1

Images of Burkholderia sp. strain WSM4176 using scanning (Left) and transmission (Center) electron microscopy and the appearance of colony morphology on solid media (Right)

Figure 2 shows the phylogenetic relationship of Burkholderia sp. strain WSM4176 in a tree based on 16S rRNA gene sequences. This strain clusters most closely with Burkholderia tuberum STM678T and Burkholderia phenoliruptrix AC1100T, with 99.86 % and 97.28 % sequence identity, respectively. Minimum Information about the Genome Sequence (MIGS) is provided in Table 1.
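Pairwise identity values such as these are the share of aligned positions at which two sequences carry the same base. The sketch below is a minimal illustration, not the MEGA computation used for Fig. 2. It assumes two pre-aligned, equal-length sequences and (our assumption) excludes gap columns, a choice that makes no difference for the gap-free alignment described in the Fig. 2 caption. The function name `percent_identity` is hypothetical.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity between two gapped, equal-length aligned sequences.

    ASSUMPTION: columns where either sequence has a gap ('-') are skipped;
    other tools may count gaps differently.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = matches = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == "-" or b == "-":
            continue  # ignore gap-containing columns
        compared += 1
        matches += a == b  # True counts as 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared


# Toy example: 5 ungapped columns, 4 identical -> 80.0
print(percent_identity("ACGT-A", "ACGTTC"))
```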

Fig. 2

Phylogenetic tree highlighting the position of Burkholderia sp. strain WSM4176 (shown in blue print) relative to other type and non-type strains in the genus Burkholderia (1322 bp internal region). Cupriavidus taiwanensis LMG 19424T was used as the outgroup. All sites were informative and there were no gap-containing sites. Phylogenetic analyses were performed using MEGA, version 5.05 [27]. The tree was built using the maximum likelihood method with the General Time Reversible model. Bootstrap analysis with 500 replicates was performed to assess the support of the clusters. Type strains are indicated with a superscript T. Strains with a genome sequencing project registered in GOLD [9] are in bold print, and the GOLD ID is given after the NCBI accession number. Published genomes are designated with an asterisk.
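The bootstrap procedure in the Fig. 2 caption resamples alignment columns with replacement, rebuilds a tree from each pseudo-alignment and reports, for each cluster, the percentage of replicate trees that recover it. The Python sketch below covers only the resampling and the support tally. Tree inference itself (maximum likelihood under GTR in MEGA) is not reproduced, and the replicate trees are assumed to be supplied as sets of clades, each clade being a set of taxon names. All names here are hypothetical.

```python
import random

Alignment = dict[str, str]  # taxon name -> aligned sequence


def bootstrap_alignment(alignment: Alignment, rng: random.Random) -> Alignment:
    """One pseudo-alignment built by sampling columns with replacement."""
    lengths = {len(seq) for seq in alignment.values()}
    if len(lengths) != 1:
        raise ValueError("all sequences must share one aligned length")
    n_sites = lengths.pop()
    # The same column indices are applied to every taxon, so columns stay intact.
    cols = [rng.randrange(n_sites) for _ in range(n_sites)]
    return {name: "".join(seq[i] for i in cols) for name, seq in alignment.items()}


def bootstrap_replicates(alignment: Alignment, n_replicates: int = 500,
                         seed: int = 1) -> list[Alignment]:
    """Default of 500 replicates matches the number stated in the caption."""
    rng = random.Random(seed)
    return [bootstrap_alignment(alignment, rng) for _ in range(n_replicates)]


def clade_support(clade: frozenset[str],
                  replicate_clades: list[set[frozenset[str]]]) -> float:
    """Percentage of replicate trees whose clade set contains `clade`."""
    hits = sum(clade in clades for clades in replicate_clades)
    return 100.0 * hits / len(replicate_clades)
```

Each replicate would then be passed to a tree-building program, and the clades of the resulting trees fed to `clade_support`.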

Table 1 Classification and general features of Burkholderia sp. strain WSM4176 in accordance with the MIGS recommendations [28] published by the Genome Standards Consortium [29]

Symbiotaxonomy

Burkholderia sp. strain WSM4176 belongs to a group of Burkholderia strains that nodulate papilionoid forage legumes rather than the classical Burkholderia hosts, Mimosa spp. (Mimosoideae) [7]. Nodulation and nitrogen fixation by WSM4176 were assessed on three L. ambigua genotypes (CRSLAM-37, CRSLAM-39 and CRSLAM-41) [3]. The strain nodulated and fixed nitrogen effectively with CRSLAM-39 and CRSLAM-41 but was only partially effective with CRSLAM-37 [3].

Genome sequencing information

Genome project history

This organism was selected for sequencing on the basis of its environmental and agricultural relevance to global carbon cycling, alternative energy production and biogeochemistry, and is part of the Genomic Encyclopedia of Bacteria and Archaea, Root Nodulating Bacteria chapter project at the U.S. Department of Energy Joint Genome Institute (JGI), which sequences organisms of relevance to agency missions [8]. The genome project is deposited in the Genomes OnLine Database (GOLD) [9], and the high-quality permanent draft genome sequence is deposited in IMG [10]. Sequencing, finishing and annotation were performed by the JGI using state-of-the-art sequencing technology [11]. A summary of the project information is shown in Table 2.

Table 2 Genome sequencing project information for Burkholderia sp. strain WSM4176

Growth conditions and genomic DNA preparation

Burkholderia sp. strain WSM4176 was grown to mid-logarithmic phase in TY rich medium [6] on a gyratory shaker at 28 °C. DNA was isolated from 60 mL of cells using a CTAB bacterial genomic DNA isolation method [12].

Genome sequencing and assembly

The genome of Burkholderia sp. strain WSM4176 was sequenced at the DOE Joint Genome Institute (JGI) using Illumina technology [13]. For this genome, we constructed and sequenced an Illumina short-insert paired-end library with an average insert size of 270 bp, which generated 7,496,994 reads, and an Illumina long-insert paired-end library with an average insert size of 6899.89 ± 882.09 bp, which generated 11,773,350 reads, totaling 2891 Mbp of Illumina data (unpublished, Feng Chen). All general aspects of library construction and sequencing performed at the JGI can be found on the JGI's web site [11].

The initial draft assembly contained 66 contigs in eight scaffolds. The initial draft data were assembled with Allpaths, version r41554 [14], and the consensus was computationally shredded into 10 kbp overlapping fake reads (shreds). The Illumina draft data were also assembled with Velvet, version 1.1.05 [15], and the consensus sequences were computationally shredded into 1.5 kbp overlapping fake reads. The Illumina draft data were then assembled again with Velvet, using the shreds from the first Velvet assembly to guide the assembly, and the consensus from this second Velvet assembly was likewise shredded into 1.5 kbp overlapping fake reads. The fake reads from the Allpaths assembly and both Velvet assemblies, together with a subset of the Illumina CLIP paired-end reads, were assembled using parallel phrap, version 4.24 (High Performance Software, LLC). Possible mis-assemblies were corrected by manual editing in Consed [16–18].

Gap closure was accomplished using repeat resolution software (Wei Gu, unpublished) and by sequencing bridging PCR fragments with Sanger and/or PacBio (unpublished, Cliff Han) technologies. For improved high-quality draft and non-contiguous finished projects, one round of manual/wet lab finishing may have been completed. Primer walks, shatter libraries and/or subsequent PCR reads may also be included for a finished project.
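The shredding step, in which each assembly's consensus is cut into overlapping fixed-length pseudo-reads so that phrap can co-assemble it with real reads, can be sketched as follows. This is an illustrative Python sketch only, not the JGI implementation: the overlap between consecutive shreds is not stated in the text, so the step size is a free parameter here, and the function `shred` is hypothetical.

```python
def shred(consensus: str, shred_len: int, step: int) -> list[str]:
    """Cut a consensus sequence into overlapping pseudo-reads ("shreds").

    Consecutive shreds start `step` bases apart, so neighbours overlap by
    shred_len - step bases. ASSUMPTION: the last shred is anchored to the end
    of the sequence so that the 3' terminus is always covered.
    """
    if shred_len <= 0 or not 0 < step <= shred_len:
        raise ValueError("need shred_len > 0 and 0 < step <= shred_len")
    if len(consensus) <= shred_len:
        return [consensus]  # short contigs become a single shred
    starts = list(range(0, len(consensus) - shred_len + 1, step))
    if starts[-1] + shred_len < len(consensus):
        starts.append(len(consensus) - shred_len)
    return [consensus[s:s + shred_len] for s in starts]


print(shred("ACGTACGTAC", 4, 3))  # ['ACGT', 'TACG', 'GTAC']
```

For instance, `shred(consensus, 10_000, step)` would mimic the 10 kbp Allpaths shreds and `shred(consensus, 1_500, step)` the 1.5 kbp Velvet shreds, with `step` chosen by the user.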
A total of 11 PCR PacBio consensus sequences were completed to close gaps and to raise the quality of the final sequence. The total size of the genome is 9.1 Mb and the final assembly is based on 2891 Mbp of Illumina draft data, which provides an average 318× coverage of the genome.
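The average coverage quoted above is the total amount of sequence divided by the genome length. The quick check below uses only figures given in this paper; the function name `mean_coverage` is hypothetical. Dividing by the exact assembly size (9,065,247 bp) gives about 318.9×, while the rounded 9.1 Mb gives about 317.7×, so the stated 318× lies between the two.

```python
def mean_coverage(total_bases: float, genome_size: float) -> float:
    """Average sequencing depth = total sequenced bases / genome length."""
    if genome_size <= 0:
        raise ValueError("genome_size must be positive")
    return total_bases / genome_size


ILLUMINA_BASES = 2_891_000_000  # 2891 Mbp of Illumina data
GENOME_BP = 9_065_247           # assembled genome length

print(f"exact size:  {mean_coverage(ILLUMINA_BASES, GENOME_BP):.1f}x")  # 318.9x
print(f"9.1 Mb size: {mean_coverage(ILLUMINA_BASES, 9_100_000):.1f}x")  # 317.7x
```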

Genome annotation

Genes were identified using Prodigal [19] as part of the DOE-JGI annotation pipeline [17], followed by a round of manual curation using the JGI GenePRIMP pipeline [20]. The predicted CDSs were translated and used to search the National Center for Biotechnology Information (NCBI) nonredundant database and the UniProt, TIGRFam, Pfam, PRIAM, KEGG, COG and InterPro databases. These data sources were combined to assign a product description to each predicted protein. Non-coding genes and miscellaneous features were predicted using tRNAscan-SE [21], RNAmmer [22], Rfam [23], TMHMM [24] and SignalP [23]. Additional gene prediction analyses and functional annotation were performed within the Integrated Microbial Genomes (IMG) platform [24].

Genome properties

The genome is 9,065,247 nucleotides long, with 62.89 % GC content (Table 3), and comprises 13 scaffolds and 65 contigs (Fig. 3). Of a total of 8497 genes, 8369 were protein-coding and 128 were RNA-only genes. The majority of genes (75.46 %) were assigned a putative function, whilst the remaining genes were annotated as hypothetical. The distribution of genes into COG functional categories is presented in Table 4.
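The headline gene counts can be cross-checked with simple arithmetic. The sketch below assumes (our assumption, not stated in the text) that the 75.46 % of genes with a putative function is expressed relative to all 8497 genes; the function name `gene_breakdown` is hypothetical.

```python
def gene_breakdown(total: int, protein_coding: int, rna_only: int,
                   pct_with_function: float) -> dict[str, float]:
    """Cross-check headline gene counts and derive a few simple figures."""
    if protein_coding + rna_only != total:
        raise ValueError("protein-coding + RNA-only genes should equal the total")
    return {
        "protein_coding_pct": 100.0 * protein_coding / total,
        "rna_only_pct": 100.0 * rna_only / total,
        # ASSUMPTION: the putative-function percentage is relative to all genes.
        "genes_with_function": round(total * pct_with_function / 100.0),
    }


stats = gene_breakdown(total=8497, protein_coding=8369, rna_only=128,
                       pct_with_function=75.46)
for name, value in stats.items():
    print(f"{name}: {value:.2f}")
```

Under that assumption, roughly 6412 genes carry a putative function, and protein-coding genes make up about 98.5 % of the gene set.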

Table 3 Genome statistics for Burkholderia sp. strain WSM4176
Fig. 3

Graphical map of the genome of Burkholderia sp. strain WSM4176. The four largest scaffolds are shown, ordered by size. From the bottom to the top of each scaffold: genes on the forward strand (colored by COG category as denoted by the IMG platform), genes on the reverse strand (colored by COG category), RNA genes (tRNAs green, sRNAs red, other RNAs black), GC content and GC skew.
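The GC content and GC skew tracks in Fig. 3 rest on two standard formulas: GC content is (G + C) / (A + C + G + T), and GC skew is (G − C) / (G + C), usually computed in sliding windows along the sequence. The Python sketch below is illustrative only. The window and step sizes used by the IMG plotting tool are not given in the text, so they are free parameters here, and the function names are hypothetical.

```python
def gc_content(seq: str) -> float:
    """Percent G+C among unambiguous bases; non-ACGT characters (e.g. N) are ignored."""
    seq = seq.upper()
    acgt = sum(seq.count(base) for base in "ACGT")
    if acgt == 0:
        return 0.0
    return 100.0 * (seq.count("G") + seq.count("C")) / acgt


def gc_skew_windows(seq: str, window: int, step: int) -> list[tuple[int, float]]:
    """(start, (G - C) / (G + C)) per window; 0.0 when a window has no G or C."""
    if window <= 0 or step <= 0:
        raise ValueError("window and step must be positive")
    seq = seq.upper()
    out = []
    # A sequence shorter than `window` yields one window covering all of it.
    for start in range(0, max(len(seq) - window, 0) + 1, step):
        win = seq[start:start + window]
        g, c = win.count("G"), win.count("C")
        out.append((start, (g - c) / (g + c) if g + c else 0.0))
    return out


print(gc_skew_windows("GGGCAT", 3, 3))  # [(0, 1.0), (3, -1.0)]
```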

Table 4 Number of protein coding genes of Burkholderia sp. strain WSM4176 associated with the general COG functional categories

Conclusion

Burkholderia sp. WSM4176 belongs to a group of beta-rhizobia isolated from Lebeckia ambigua in the fynbos biome of South Africa [3]. WSM4176 is phylogenetically most closely related to Burkholderia tuberum STM678T, and the two strains have comparable genome sizes (8.3 and 9.1 Mbp, respectively). Recently, the genomes of two more strains originating from Lebeckia ambigua, Burkholderia dilworthii WSM3556T and Burkholderia sprentiae WSM5005T, were investigated [25]. Both of these strains have a genome size of 7.7 Mbp, which is considerably smaller than that of WSM4176. All four strains (STM678T, WSM3556T, WSM4176 and WSM5005T) contain a large number of genes assigned to amino acid transport and metabolism (9.79–10.94 %), carbohydrate transport and metabolism (7.93–8.38 %) and transcription (9.55–9.94 %). Interestingly, STM678T was initially isolated from an Aspalathus species but does not nodulate this host; however, it has been shown to nodulate Cyclopia species from the same fynbos biome in South Africa as Lebeckia ambigua [26]. Consistent with their ability to nodulate legumes and fix nitrogen effectively, these strains share many of the genes responsible for the nitrogenase pathway (IMG pathway number 798). The genome sequence of WSM4176 thus provides an unprecedented opportunity to study the host range and nitrogen fixation capacities of these fynbos bacteria.