Introduction

Agrobacterium radiobacter DSM 30147T (= ATCC 19358T) was first isolated from saprobic soil in 1902 as Bacillus radiobacter [1] and did not receive its current name until Conn established the genus Agrobacterium in 1942 [2]. Based on phytopathogenic properties, Conn divided Agrobacterium into three species: A. radiobacter, A. tumefaciens and A. rhizogenes [2]. Subsequently, A. rubi, A. vitis and A. larrymoorei were also assigned to the genus [3–6]. More recently, A. rhizogenes was transferred to the genus Rhizobium as Rhizobium rhizogenes, based on multilocus sequence analysis (MLSA) of several housekeeping genes (rrs, atpD and recA) [7,8]. In addition, Young et al. proposed that A. radiobacter should have priority over A. tumefaciens, so that A. tumefaciens may not officially represent a species [8,9]. Thus, the genus Agrobacterium currently contains four validly named species: A. radiobacter, A. vitis, A. rubi and A. larrymoorei [7–9].

A taxonomic classification that relies on phytopathogenic phenotypes may not accurately reflect the phylogenetic relationships of strains within Agrobacterium [10]. Accordingly, an alternative classification was applied, which divides most Agrobacterium strains into three biovariants (Biovars I, II and III) [10]. Of these, Biovar I is the most complex group and includes several members (genomovars), designated genomovars G1 through G9 and G13 [8,11]. At present, two Biovar I strains have been completely sequenced: Agrobacterium sp. H13-3 (G1) and A. tumefaciens C58 (G8). Genome sequencing revealed that both strains contain two chromosomes and differing numbers of plasmids. A. radiobacter DSM 30147T also belongs to Biovar I (it is classified as a member of genomovar G4), which indicates its close relationship to A. tumefaciens C58 and Agrobacterium sp. H13-3 [12].

Most strains in the genus Agrobacterium are phytopathogens that induce crown gall tumors or hairy root disease in their host plants [2]. A. radiobacter is an exception, because it lacks the tumor-inducing (Ti) plasmid that confers pathogenicity [13–16]. Members of A. radiobacter have been widely found in soil, in the rhizosphere of plants and in clinical specimens [17]. One strain of A. radiobacter was reported to enhance soil arsenic phytoremediation, indicating a potential application in bioremediation [18]. However, some members have been identified as opportunistic human pathogens [19]. So far, a total of 11 Agrobacterium genomes (3 finished and 8 draft genomes, listed in Table 1) have been sequenced, but no genome of A. radiobacter has been reported. Considering its important biological features and phylogenetic position in the genus Agrobacterium, we present the genome sequence of A. radiobacter DSM 30147T, the first sequenced strain of this species.

Table 1. General information and comparison of the 14 Agrobacterium-related genomes (12 Agrobacterium strains and 2 Rhizobium strains)

Descriptions of A. radiobacter were published in 1902 [1], 1942 [2], 1980 [21] and 1993 [22]. Subsequently, the fatty acid profile and the utilization of additional carbon and nitrogen sources were tested, showing that the major fatty acids (> 5%) are 16:0, 19:0 cyclo ω8c, summed feature 2 (one or more of 12:0 aldehyde, iso-16:1 I and 14:0 3-OH) and summed feature 8 (18:1ω7c and/or 18:1ω6c) [23]. The strain can utilize adonitol, D-fructose, D-galactose, D-mannitol, lactose and raffinose as sole carbon sources and L-ornithine, L-proline and L-serine as sole nitrogen sources [23]. Citrate utilization, nitrate reduction and urease activity are all positive [23]. In this study, we performed additional physiological and biochemical analyses and present an emended description of A. radiobacter.

Classification and features

Genome sequences and 16S rRNA genes were used for phylogenetic analysis. In view of the close evolutionary relationship and the inconsistent phylogeny between Agrobacterium and Rhizobium [12], we pre-analyzed all sequenced strains in these two genera and found that two “Rhizobium” members were very closely related to the 12 Agrobacterium members (including strain DSM 30147T). Thus, all 12 Agrobacterium members with sequenced genomes, two Rhizobium strains [R. lupini HPC(L) and Rhizobium sp. PDO1-076] (Table 1) and an out-group strain, R. rhizogenes K84 [7,8], were included in the phylogenetic analysis. A comparison of the 15 genomes revealed a total of 370 proteins shared across all of them. A rooted neighbor-joining (NJ) phylogenetic tree was constructed from the shared amino acid sequences. As shown in Figure 1a, A. radiobacter DSM 30147T fell in the same cluster as the Biovar I members Agrobacterium sp. H13-3 (G1) and A. tumefaciens C58 (G8), and showed the closest relationship with A. tumefaciens str. Cherry 2E-2-2. An NJ phylogenetic tree was also constructed from the 16S rRNA genes (Figure 1b). Comparison of the tree generated from the core protein sequences with that generated from the 16S rRNA gene sequences revealed small topological differences between them. In contrast to the tree generated from the 370 conserved proteins, the 16S rRNA gene tree could not clearly distinguish some strains. Therefore, phylogenomic analysis was considered a more robust approach than 16S rRNA gene analysis for inferring the phylogeny, especially for closely related strains [21,25,26].
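The neighbor-joining procedure mentioned above can be illustrated with a minimal sketch. This is a textbook version of the algorithm, not the MEGA implementation used in this study; the taxon names and pairwise distances are hypothetical toy values, and the helper name neighbor_joining is our own.

```python
from itertools import combinations


def neighbor_joining(dist):
    """Run neighbor joining on a symmetric distance matrix.

    dist: dict of dicts, dist[a][b] = distance between taxa a and b (no diagonal).
    Returns (joins, last_edge): joins is a list of
    (child_1, child_2, branch_1, branch_2, new_node), and last_edge is the
    single edge connecting the final two nodes of the unrooted tree.
    """
    d = {a: dict(row) for a, row in dist.items()}  # work on a copy
    joins = []
    next_id = 1
    while len(d) > 2:
        n = len(d)
        # r[a] is the sum of distances from a to every other node.
        r = {a: sum(d[a].values()) for a in d}
        # Choose the pair minimising Q(i, j) = (n - 2) * d(i, j) - r_i - r_j.
        i, j = min(
            combinations(sorted(d), 2),
            key=lambda p: (n - 2) * d[p[0]][p[1]] - r[p[0]] - r[p[1]],
        )
        len_i = 0.5 * d[i][j] + (r[i] - r[j]) / (2 * (n - 2))
        len_j = d[i][j] - len_i
        new = f"node{next_id}"
        next_id += 1
        joins.append((i, j, len_i, len_j, new))
        # Distances from the new internal node to each remaining node.
        new_row = {
            k: 0.5 * (d[i][k] + d[j][k] - d[i][j]) for k in d if k not in (i, j)
        }
        del d[i], d[j]
        for k, row in d.items():
            del row[i], row[j]
            row[new] = new_row[k]
        d[new] = new_row
    a, b = sorted(d)
    return joins, (a, b, d[a][b])


# Hypothetical toy distances among five taxa (not data from this study).
names = ["a", "b", "c", "d", "e"]
matrix = [
    [0, 5, 9, 9, 8],
    [5, 0, 10, 10, 9],
    [9, 10, 0, 8, 7],
    [9, 10, 8, 0, 3],
    [8, 9, 7, 3, 0],
]
dist = {
    x: {y: matrix[i][j] for j, y in enumerate(names) if i != j}
    for i, x in enumerate(names)
}
joins, last_edge = neighbor_joining(dist)
for join in joins:
    print(join)
print(last_edge)
```

In practice the input distances would be estimated from the concatenated alignment of the shared proteins, and bootstrap support would be obtained by repeating the procedure on resampled alignment columns.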

Figure 1.

Phylogenetic trees highlighting the relationships among A. radiobacter DSM 30147T and other closely related sequenced strains. (a) Tree built from the 370 conserved proteins shared among the 15 genomes (12 Agrobacterium strains, 2 Rhizobium strains very closely related to Agrobacterium and one out-group strain, R. rhizogenes K84); (b) tree inferred from the 16S rRNA genes of the same strains. The phylogenies were inferred with MEGA 5.05 using the neighbor-joining algorithm [20,24], and 1,000 bootstrap replicates were computed to estimate the reliability of the branching order. Genome accession numbers of the strains used in the phylogenetic reconstructions: A. albertimagni AOL15, ALJF00000000; Rhizobium sp. PDO1-076, AHZC00000000; A. radiobacter DSM 30147T, ASXY01000000; A. vitis S4, GCA_000016285; Agrobacterium sp. H13-3, GCA_000192635; Agrobacterium sp. 10MFCol1.1, ARLJ00000000; A. tumefaciens 5A, AGVZ00000000; A. tumefaciens F2, AFSD00000000; A. tumefaciens C58, GCA_000092025; Agrobacterium sp. ATCC 31749, AECL00000000; R. lupini HPC(L), AMQQ00000000; A. tumefaciens str. Cherry 2E-2-2, APCC00000000; Agrobacterium sp. 224MFTsu3.1, ARQL00000000; A. tumefaciens CCNWGS0286, AGSM00000000; and R. rhizogenes K84, GCA_000016265.

Strain DSM 30147T is rod-shaped (0.6–0.8 × 1.5–1.8 µm) (Figure 2). The enzyme activities and carbon source utilization of strain DSM 30147T were tested using the API ZYM, API 20 NE and API ID 32 GN systems; the results are shown in Table 2 and in the emended description of A. radiobacter.

Figure 2.

Transmission electron micrograph of A. radiobacter DSM 30147T, obtained with an FEI Tecnai G2 20 TWIN transmission electron microscope (USA) operated at 200 kV. The scale bar represents 1 µm.

Table 2. Classification and general features of Agrobacterium radiobacter DSM 30147T according to the MIGS recommendations [27,28]

Genome sequencing and annotation

Genome project history

To enable a comprehensive comparison of Agrobacterium genomes, the whole genome sequence of A. radiobacter DSM 30147T was determined. This draft genome sequence has been deposited at DDBJ/EMBL/GenBank under accession number ASXY00000000. The version described in this study is the first version, ASXY01000000. The project information is summarized in Table 3.

Table 3. Project information

Growth condition and DNA isolation

A. radiobacter DSM 30147T was grown aerobically in LB medium [38] at 28 °C for 24 h. The DNA was extracted, concentrated and purified using the QIAamp kit according to the manufacturer’s instructions (Qiagen, Germany).

Genome sequencing and assembly

The whole-genome sequence of A. radiobacter DSM 30147T was determined on an Illumina HiSeq 2000 using a paired-end library strategy (300 bp insert size), which yielded a total of 15,140,909 reads (1.41 Gb of data). The detailed methods of library construction and sequencing can be found on Illumina’s official website [39]. Using SOAPdenovo v1.05 [40], these reads were assembled into 612 contigs (> 200 bp) with a total genome size of 7,122,065 bp and an average coverage of 196.3 ×.
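As a rough consistency check on these figures, the mean read length and sequencing depth can be estimated from the reported totals. The sketch below assumes that 1 Gb means 10^9 bases and that depth is simply total bases divided by assembly size; both are our assumptions, and the function names are illustrative.

```python
def mean_read_length(total_bases, n_reads):
    """Average read length in bases."""
    return total_bases / n_reads


def mean_depth(total_bases, assembly_size):
    """Average sequencing depth (fold coverage) over the assembly."""
    return total_bases / assembly_size


# Values as reported in the text; 1 Gb taken as 1e9 bases (assumption).
TOTAL_BASES = 1.41e9
N_READS = 15_140_909
ASSEMBLY_SIZE = 7_122_065

read_len = mean_read_length(TOTAL_BASES, N_READS)
depth = mean_depth(TOTAL_BASES, ASSEMBLY_SIZE)
print(f"mean read length: {read_len:.1f} bp")
print(f"estimated depth: {depth:.1f}x")
```

This estimate gives a mean read length of about 93 bp and a depth of about 198 ×, close to the reported 196.3 ×. The small difference presumably reflects rounding of the data volume or read filtering before assembly, neither of which is specified here.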

Genome annotation

The draft genome of A. radiobacter DSM 30147T was annotated using the National Center for Biotechnology Information (NCBI) Prokaryotic Genome Annotation Pipeline (PGAP) [41], which combines the gene caller GeneMarkS+ [42] with a similarity-based gene detection approach. Protein function classification was performed by searching all predicted coding sequences of strain DSM 30147T against the Clusters of Orthologous Groups (COGs) protein database [43] using the BLASTP algorithm with an E-value cutoff of 1e-10.
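The E-value filtering step can be sketched as follows, assuming the BLASTP results are available in the standard 12-column tabular format (-outfmt 6), in which the E-value and bit score are columns 11 and 12. Keeping only the highest-scoring hit per query is our assumption; the text does not state how multiple hits were resolved. All sequence and COG identifiers below are hypothetical.

```python
def best_hits(lines, max_evalue=1e-10):
    """Return {query: (subject, evalue, bitscore)} for hits passing the cutoff.

    lines: iterable of tab-separated BLAST tabular records (12 standard columns).
    For each query, the passing hit with the highest bit score is kept.
    """
    best = {}
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        query, subject = fields[0], fields[1]
        evalue, bitscore = float(fields[10]), float(fields[11])
        if evalue > max_evalue:
            continue  # fails the E-value cutoff
        if query not in best or bitscore > best[query][2]:
            best[query] = (subject, evalue, bitscore)
    return best


# Hypothetical example records (not results from this study).
rows = [
    ["cds_0001", "COG0001", "62.5", "320", "120", "0", "1", "320", "1", "320", "1e-120", "350"],
    ["cds_0001", "COG0002", "30.1", "100", "70", "2", "5", "104", "3", "102", "2e-12", "60.5"],
    ["cds_0002", "COG0003", "28.0", "80", "58", "1", "1", "80", "10", "89", "3e-05", "40.2"],
    ["cds_0003", "COG0004", "45.0", "200", "110", "0", "1", "200", "1", "200", "1e-10", "150"],
]
hits = best_hits("\t".join(row) for row in rows)
for query, (subject, evalue, bitscore) in sorted(hits.items()):
    print(query, subject, evalue, bitscore)
```

In this toy input, cds_0002 is dropped because its only hit has an E-value above the cutoff, and cds_0001 is assigned to the higher-scoring of its two passing hits.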

Genome properties

The draft genome of A. radiobacter DSM 30147T is 7,122,065 bp in length, with an average GC content of 59.9%, and is distributed across 612 contigs. By comparison with the complete reference genome of A. tumefaciens C58 [44] (which also belongs to Biovar I, Figure 1), the genome of strain DSM 30147T could be clearly divided into 2 replicons, a circular chromosome and a linear chromosome (Figure 3). In accordance with its non-phytopathogenic phenotype, strain DSM 30147T did not contain a Ti plasmid. Of the 6,894 genes predicted, 6,853 were protein-coding genes (CDSs) and 41 were RNA genes. A total of 5,320 CDSs (77.85%) were assigned putative functions, and the remaining CDSs were annotated as hypothetical proteins. The genome properties and statistics are summarized in Table 4 and Figure 3. The distribution of the genes into COG functional categories is shown in Table 5.
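For readers unfamiliar with the sequence statistics used here, the sketch below shows how GC content and GC skew (the quantity plotted in the innermost ring of Figure 3) are commonly computed, using the usual definitions (G+C)/(A+C+G+T) and (G−C)/(G+C). The window size and the short example sequence are arbitrary illustrations, not values used in this study.

```python
def gc_content(seq):
    """Fraction of unambiguous bases (A, C, G, T) that are G or C."""
    seq = seq.upper()
    g, c = seq.count("G"), seq.count("C")
    total = g + c + seq.count("A") + seq.count("T")
    return (g + c) / total if total else 0.0


def gc_skew(seq):
    """GC skew, (G - C) / (G + C); 0.0 if the sequence has no G or C."""
    seq = seq.upper()
    g, c = seq.count("G"), seq.count("C")
    return (g - c) / (g + c) if (g + c) else 0.0


def windows(seq, size, step):
    """Yield (start, subsequence) for consecutive full-length windows."""
    for start in range(0, len(seq) - size + 1, step):
        yield start, seq[start:start + size]


# Arbitrary toy sequence (not from the DSM 30147T genome).
toy = "GGGCATGCGC"
print(gc_content(toy), gc_skew(toy))
profile = [(start, gc_content(w), gc_skew(w)) for start, w in windows(toy, 5, 5)]
print(profile)
```

For a draft assembly, the genome-wide GC content would be obtained by applying the same count over all contigs combined rather than by averaging per-contig values.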

Figure 3.

Circular representation of the A. radiobacter DSM 30147T circular chromosome (left) and linear chromosome (right). From outside to center: rings 1 and 4 show protein-coding genes colored by COG category on the forward and reverse strands; rings 2 and 3 denote genes on the forward and reverse strands; ring 5 shows the G+C content plot; and the innermost ring shows the GC skew.

Table 4. Genome statistics
Table 5. Number of protein-coding genes associated with the general COG functional categories in A. radiobacter DSM 30147T genome

Comparative genome analysis of A. radiobacter DSM 30147T with the other related genomes

Strain DSM 30147T has the largest genome of the 12 Agrobacterium strains sequenced to date, and its genome is also larger than those of the 2 very closely related Rhizobium strains (Table 1). OrthoMCL [45] was used to perform ortholog clustering analysis of the 14 genomes (Table 1). The results indicate that A. radiobacter DSM 30147T shares 1,636 genes with the other 13 strains and contains 548 strain-specific genes (Table 1), which potentially encode products that contribute to the species-specific features differentiating A. radiobacter from other Agrobacterium species [46]. In addition, on average, only 31% of the genes in each genome were core genes shared among all 14 genomes, which reveals a high degree of diversity within the genus Agrobacterium.
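The distinction between core and strain-specific genes can be made concrete with a small sketch. It assumes the ortholog clusters have already been parsed into a mapping from cluster to strain to gene identifiers; it does not use OrthoMCL itself, and the strain names, gene identifiers and function name are hypothetical.

```python
def summarize_clusters(clusters, strains, focal):
    """Count core and strain-specific genes of one strain.

    clusters: dict cluster_id -> dict strain -> list of gene IDs.
    strains: set of all strain names in the comparison.
    focal: strain whose genes are counted.
    Returns (n_core, n_specific, n_total) for the focal strain.
    """
    n_core = n_specific = n_total = 0
    for members in clusters.values():
        genes = members.get(focal, [])
        n_total += len(genes)
        if set(members) >= strains:
            n_core += len(genes)  # cluster present in every strain
        elif set(members) == {focal}:
            n_specific += len(genes)  # cluster found only in the focal strain
    return n_core, n_specific, n_total


# Hypothetical toy clusters for three strains (not data from this study).
strains = {"strain_A", "strain_B", "strain_C"}
clusters = {
    "og1": {"strain_A": ["a1"], "strain_B": ["b1"], "strain_C": ["c1"]},
    "og2": {"strain_A": ["a2", "a3"], "strain_B": ["b2"], "strain_C": ["c2"]},
    "og3": {"strain_A": ["a4"], "strain_B": ["b3"]},
    "og4": {"strain_A": ["a5"]},
    "og5": {"strain_C": ["c3", "c4"]},
}
n_core, n_specific, n_total = summarize_clusters(clusters, strains, "strain_A")
core_fraction = n_core / n_total
print(n_core, n_specific, n_total, core_fraction)
```

Averaging the core fraction computed in this way over all strains gives a figure comparable in kind to the average core-gene percentage quoted above, although the exact counting rules used in this study (for example, the treatment of paralogs and unclustered genes) are not specified here.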

Emended description of Agrobacterium radiobacter (Beijerinck and van Delden 1902) Conn 1942 (Approved Lists 1980) emend. Sawada et al. 1993

This emended description is based on those given by Beijerinck and van Delden 1902, Conn 1942 (Approved Lists 1980) and Sawada et al. 1993, with the following changes. Positive results are observed for acid phosphatase, α-glucosidase, alkaline phosphatase, arginine dihydrolase, β-glucosidase, citrate utilization, esterase (C4), leucine arylamidase, N-acetyl-β-glucosaminidase, naphthol-AS-BI-phosphohydrolase, nitrate reduction, urease and valine arylamidase, and negative results for α-galactosidase, α-mannosidase, β-fucosidase, β-galactosidase, β-glucuronidase, chymotrypsin, cystine arylamidase, esterase lipase (C8), lipase (C14) and trypsin. Arabinose, D-glucose, D-melibiose, D-ribose, D-sorbitol, gluconates, histidine, 4-hydroxybenzoate, 3-hydroxybutyrate, inositol, 2-ketogluconate, L-alanine, L-fucose, L-lactate, L-rhamnose, malate, maltose, mannose, N-acetyl glucosamine, propionate, salicin, sodium acetate and sucrose are utilized as sole carbon sources, whereas adipate, caprate, 3-hydroxy-benzoate, itaconic acid, glycogen, 5-ketogluconate, phenylacetate, potassium, sodium malonate, suberate and valerate are not assimilated. L-ornithine, L-proline and L-serine are utilized as nitrogen sources. The major fatty acids (> 5%) are 16:0, 19:0 cyclo ω8c, summed feature 2 (one or more of 12:0 aldehyde, iso-16:1 I and 14:0 3-OH) and summed feature 8 (18:1ω7c and/or 18:1ω6c). Members of this species are non-phytopathogenic, but in individual cases some members have been detected as possible human pathogens.