Introduction

Strain IMRU 509T (= DSM 43111 = ATCC 23218 = JCM 7437) is the type strain of Nocardiopsis dassonvillei, which in turn is the type species of the genus Nocardiopsis. Currently, N. dassonvillei is one of 40 validly published species belonging to the genus. The genus name derives from the Greek word opsis, appearance, and from Edmond Nocard, who first described in 1888 the type species of the genus Nocardia, N. farcinica [1,2]. Nocardiopsis means “that which has the appearance of Nocardia”. The species epithet is chosen in honor of Charles Dassonville, a contemporary French veterinarian [3]. The genus Nocardiopsis was first described by Meyer in 1976 [4] for bacteria that were previously classified as either Streptothrix dassonvillei (Brocq-Rousseau 1904) [3], Nocardia dassonvillei [5], or Actinomadura dassonvillei [6] on the basis of their morphological characteristics and cell wall type [4]. The strain IMRU 509T is the neotype of the species N. dassonvillei (Brocq-Rousseau 1904). Databases provide contradictory speculations on the ecological and geographical origin of strain IMRU 509T (e.g., soil from Paris, France; mildewed grain of unspecified geographical origin); however, solid information could not be extracted from the original literature [4,5,7–9]. Members of this species can be isolated from a variety of different habitats, including mildewed grain and fodder [3], different soils [10–13], an Antarctic glacier [14], marine sediments [10,15], the rhizosphere of actinorhizal plants [16], the gut tract of animals [17], active stalactites [18], cotton waste and occasionally hay [19], the air of a cattle barn [20], the atmosphere of a composting facility [21], salterns [22] and patients suffering from conjunctivitis [23] or cholangitis [8]. N. dassonvillei strains were also isolated from nodules and draining sinuses associated with an actinomycetoma of the anterior aspect of the right leg below the knee of a 39-year-old man [24].
A microorganism identical to Streptothrix dassonvillei was isolated two years later, but was placed in the genus Nocardia and designated N. dassonvillei [23]. Subsequently, the genus Actinomadura was described to accommodate, among other species, N. dassonvillei (Brocq-Rousseau) Liegard and Landrieu [4,8]. Further analysis supplied evidence that A. dassonvillei is not related to nocardiae [7]. Therefore, a new genus was created for A. dassonvillei on the basis of the characteristic development of spores, including the specific zig-zag formation of aerial hyphae before spore dispersal and the lack of madurose [4]. In 1976, A. dassonvillei was transferred to this new genus and was designated Nocardiopsis dassonvillei [4]. Also, N. dassonvillei is an earlier heterotypic synonym of N. alborubida [25]. The species epithet alborubida was considered orthographically incorrect and was corrected by Evtushenko to albirubida [10]. Subsequently, the species N. dassonvillei has been divided into three subspecies, namely subsp. prasina [26], subsp. albirubida (Grund and Kroppenstedt 1990) [10] and subsp. dassonvillei (Brocq-Rousseau 1904) [4,27], which is an earlier heterotypic synonym of Streptomyces flavidofuscus Preobrazhenskaya 1986 [28]. DNA-DNA hybridization data, as well as the results of biochemical tests, indicated that N. alborubida DSM 40465, N. antarctica DSM 43884, and N. dassonvillei DSM 43111 represent a single species designated N. dassonvillei [25]. Here we present a summary classification and a set of features for N. dassonvillei strain IMRU 509T, together with the description of the complete genomic sequencing and annotation.

Classification and features

The 16S rRNA gene sequences of strain IMRU 509T share 95.9 to 99.5% sequence similarity with the 16S rRNA gene sequences of the type strains from the other members of the genus Nocardiopsis [29]. The 16S rRNA gene of strain IMRU 509T also shares 99% similarity with an uncultured 16S rRNA gene sequence of the clone AKIW919 from urban aerosol in the USA [30], but none of the sequences in metagenomic libraries (env_nt) shares more than 89% sequence identity, indicating that members of the species, genus and even family are poorly represented in the habitats screened thus far (as of November 2010). A representative genomic 16S rRNA sequence of N. dassonvillei was compared with the most recent release of the Greengenes database [31] using NCBI BLAST under default values, and the relative frequencies of taxa and keywords, weighted by BLAST scores, were determined. The three most frequent genera were Nocardiopsis (91.1%), Streptomyces (7.1%) and Prauseria (1.8%). The species yielding the highest score was N. dassonvillei (including hits to N. dassonvillei subsp. dassonvillei, formerly also known as Streptomyces flavidofuscus [9,28]). The five most frequent keywords within the labels of environmental samples which yielded hits were ‘soil(s)’ (15.4%), ‘algeria, nocardiopsis, saccharothrix, saharan’ (5.7%), ‘source’ (2.0%) and ‘alkaline’ (2.0%). These keywords are consistent with the morphology of the type strain as well as with the ecology of the habitats from which the type strain and other members of the species were isolated. The single most frequent keyword within the labels of environmental samples which yielded hits of a higher score than the highest scoring species was ‘desert/soil’ (50.0%).
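
The score-weighted frequencies reported above can be illustrated with a minimal Python sketch. It assumes that each BLAST hit contributes its score to the genus (or keyword) it is labeled with and that the totals are then normalized to percentages; the exact weighting scheme used in the analysis is not described here, so this is one plausible reading. The function name and the example hits are hypothetical and are not the data behind the percentages given in the text.

```python
from collections import defaultdict


def weighted_frequencies(hits):
    """Return {label: percent}, where each hit adds its BLAST score to its label.

    `hits` is an iterable of (label, score) pairs. This is an assumed weighting
    scheme, not necessarily the one used for the published figures.
    """
    totals = defaultdict(float)
    for label, score in hits:
        totals[label] += score
    grand_total = sum(totals.values())
    if grand_total == 0:
        return {}
    return {label: 100.0 * total / grand_total for label, total in totals.items()}


# Hypothetical (genus, bit score) pairs, for illustration only.
example_hits = [
    ("Nocardiopsis", 2600.0),
    ("Nocardiopsis", 2500.0),
    ("Streptomyces", 900.0),
]
freqs = weighted_frequencies(example_hits)
for genus, percent in sorted(freqs.items(), key=lambda item: -item[1]):
    print(f"{genus}: {percent:.1f}%")
```

Weighting by score rather than counting hits means that a few high-scoring matches can outweigh many weak ones, which is why the percentages need not match simple hit counts.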

Figure 1 shows the phylogenetic neighborhood of N. dassonvillei strain IMRU 509T in a 16S rRNA-based tree. The sequences of the five 16S rRNA gene copies in the genome differ from each other by up to ten nucleotides, and differ by up to eight nucleotides from the previously published 16S rRNA sequence (X97886).
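
A comparison of this kind can be sketched in Python as follows. The sketch assumes that the gene copies have already been aligned to equal length and that every mismatching column, including a gap against a base, counts as one difference; how differences were counted for the figures above is not stated. The sequences shown are short toy strings, not the actual 16S rRNA copies.

```python
from itertools import combinations


def count_differences(seq_a, seq_b):
    """Count mismatching columns between two pre-aligned sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(1 for a, b in zip(seq_a, seq_b) if a != b)


def max_pairwise_differences(aligned):
    """Return the largest difference count over all pairs of sequences."""
    return max(
        count_differences(a, b) for a, b in combinations(aligned.values(), 2)
    )


# Toy aligned sequences standing in for rRNA gene copies (hypothetical data).
toy_copies = {
    "copy1": "ACGTACGT",
    "copy2": "ACGTACGA",
    "copy3": "ACCTACGA",
}
print(max_pairwise_differences(toy_copies))
```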

Figure 1.

Phylogenetic tree highlighting the position of N. dassonvillei strain IMRU 509T relative to the type strains of the other species within the genus and to the type strains of the other genera within the family Nocardiopsaceae. The tree was inferred from 1,442 aligned characters [32,33] of the 16S rRNA gene sequence under the maximum likelihood criterion [34] and rooted in accordance with the current taxonomy [35]. The branches are scaled in terms of the expected number of substitutions per site. Numbers above branches are support values from 750 bootstrap replicates [36] if larger than 60%. Lineages with type strain genome sequencing projects registered in GOLD [37] are shown in blue, published genomes in bold [38]. Note that the tree is more in accordance with the view of Grund and Kroppenstedt (1990) [39] to treat N. alborubida as a species of its own, rather than with the view of Yassin et al. (1997) [25] and Evtushenko et al. (2000) [10] to regard it as a subspecies of N. dassonvillei based on a 71% DDH value [10].

The cells of strain IMRU 509T are aerobic and Gram-positive [4] (Table 1). Aerial mycelia are long, moderately branched, and, at the beginning of sporulation, more or less zig-zag-shaped (Figure 2). Later, the hyphae are straight or somewhat coiled [4]. They then divide into long segments which subsequently subdivide into smaller spores of irregular size [4]. Spores are elongated and smooth. Depending on the medium used, the color of the substrate mycelium is either yellowish-brown or olive to dark brown [4]. The aerial mycelium varies from a sparse coating to a thick, farinaceous to woolly cover of the colonies on oatmeal agar, oatmeal-nitrate agar, Bennett agar, Czapek-sucrose agar, inorganic salt-starch agar, yeast extract-malt extract agar, and complex organic medium 79 [4] of Prauser and Falta [51]. The color of the aerial mycelium is white or yellowish to grayish [4]. Colonies of substrate mycelia have dense filamentous margins [4]. Hyphae of the substrate mycelium fragment into coccoid elements after 3 to 4 weeks, depending on the medium used [4]. Soluble pigment is not produced [4]. Melanoid pigments are not produced on ISP 6 or tyrosine agar [4]. Growth of strain IMRU 509T was tested on basal medium with and without carbohydrates. No growth was detected in the absence of carbohydrates. Strain IMRU 509T was able to use N-acetyl-D-glucosamine, p-arbutin, D-galactose, gluconate, D-maltose, D-ribose, salicin, D-trehalose, maltitol, putrescine, 4-aminobutyrate, azelate, citrate, fumarate, DL-lactate, L-alanine, β-alanine, L-aspartate, L-leucine and phenylacetate as sole carbon sources, but not α-D-melibiose, acetate, propionate, glutarate, L-malate, mesaconate, oxoglutarate, pyruvate, suberate, L-histidine, L-phenylalanine, L-proline, L-serine, L-tryptophan and 4-hydroxybenzoate [52]. However, L-arabinose, D-xylose, D-mannose, D-glucose, L-rhamnose, maltose, D-mannitol, D-fructose, sucrose and glycerol are the main carbohydrates used [4].
Acid is produced from L-arabinose, galactose, mannitol, sucrose and D-xylose [8]. Moreover, adonitol, dulcitol and i-inositol are not utilized [4]. L-alanine, proline and serine are also used as sole carbon as well as nitrogen sources, although proline and serine are weakly utilized [25]. Strain IMRU 509T was found to hydrolyze p-nitrophenyl α-D-glucopyranoside, p-nitrophenyl β-D-glucopyranoside, p-nitrophenyl phenylphosphonate and L-alanine p-nitroanilide, but not aesculin, bis-p-nitrophenyl phosphate, p-nitrophenyl phosphorylcholine, L-glutamate-γ-3-carboxy-p-nitroanilide and L-proline p-nitroanilide [52]. Meyer (1976) reported that IMRU 509T was not able to liquefy gelatin [4], while Yassin et al. (1997) reported the opposite [25]. Strain IMRU 509T is able to hydrolyze starch, to peptonize milk, to decompose esculin and to reduce nitrate to nitrite [4]. Strains of N. dassonvillei test positive for the decarboxylation of lactate, oxalate and propionate [8]. They also decompose casein, tyrosine and Tween 85. They show optimal growth under mildly alkaline conditions (pH 8) and at a salinity of 0% NaCl [8]. No growth is observed at 20% NaCl or at 45°C [8]. The catalase test is positive [4]. Strain IMRU 509T hydrolyzes adenine, xanthine and hypoxanthine [25].

Figure 2.

Scanning electron micrograph of N. dassonvillei strain IMRU 509T

Table 1. Classification and general features of N. dassonvillei strain IMRU 509T according to the MIGS recommendations [40]

Chemotaxonomy

The cell wall of strain IMRU 509T belongs to chemotype III, which corresponds to the peptidoglycan type A1γ [53], i.e., N-acetyl-muramic acid, N-acetyl-glucosamine, alanine, glutamic acid, and meso-2,6-diaminopimelic acid [4,8]. The products of the degradation of the cell wall are glycerol and glucose [8]. Strain IMRU 509T is susceptible to lysozyme [4]. The polar lipids found in strain IMRU 509T are phosphatidylinositol mannosides (PIM), phosphatidylinositol (PI), phosphatidylcholine (PC), monomannosyl diglyceride (MDG), phosphatidylglycerol (PG), phosphatidylmethylethanolamine (PME), monoacetylated glucose (AG), diphosphatidyl-glycerol (DPG), unknown phospholipids specific for Nocardiopsis, and β-lipids of unknown structure (PL) [8]. The menaquinone type 4C2 was detected [8]. The menaquinone patterns of strain IMRU 509T contain menaquinones from MK-10 to MK-10(H8) and sugar type C [8]. Small amounts of the MK-9 and/or MK-12 series are also found [8]. The main fatty acids detected in strain IMRU 509T were iso-C16:0 (26.7%), anteiso-C17:0 (19.8%) and C18:1 (18.3%). Minor fatty acids detected included C18:0 (5.8%), C17:1 (5.2%), anteiso-C15:0 (3.2%), C16:0 (2.2%), iso-C17:0 (2.1%), C16:1 (1.2%) and iso-C15:0 (0.8%) [10].

Genome sequencing and annotation

Genome project history

This organism was selected for sequencing on the basis of its phylogenetic position [54], and is part of the Genomic Encyclopedia of Bacteria and Archaea project [55]. The genome project is deposited in the Genome OnLine Database [37] and the complete genome sequence is deposited in GenBank. Sequencing, finishing and annotation were performed by the DOE Joint Genome Institute (JGI). A summary of the project information is shown in Table 2.

Table 2. Genome sequencing project information

Growth conditions and DNA isolation

N. dassonvillei strain IMRU 509T, DSM 43111, was grown in DSMZ medium 65 (GYM Streptomyces medium) [56] at 28°C. DNA was isolated from 0.5–1 g of cell paste using the Qiagen Genomic 500 DNA Kit (Qiagen, Hilden, Germany) following the standard protocol recommended by the manufacturer, with modification st/DALM for cell lysis as described in Wu et al. [55].

Genome sequencing and assembly

The genome was sequenced using a combination of Sanger and 454 sequencing platforms. All general aspects of library construction and sequencing can be found at the JGI website [57]. Pyrosequencing reads were assembled using the Newbler assembler version 2.1-PreRelease (Roche). Large Newbler contigs were broken into 6,356 overlapping fragments of 1,000 bp and entered into the assembly as pseudo-reads. The sequences were assigned quality scores based on Newbler consensus q-scores with modifications to account for overlap redundancy and to adjust inflated q-scores. A hybrid 454/Sanger assembly was made using the PGA assembler. Possible mis-assemblies were corrected and gaps between contigs were closed by editing in Consed and by custom primer walks from sub-clones or PCR products. A total of 462 Sanger finishing reads were produced to close gaps, to resolve repetitive regions, and to raise the quality of the finished sequence. Illumina reads were used to improve the final consensus quality using an in-house developed tool (the Polisher) [58]. The error rate of the completed genome sequence is less than 1 in 100,000. Together, the combination of the Sanger and 454 sequencing platforms provided 28.77× coverage of the genome. The final assembly contains 68,385 Sanger reads and 1,376,163 pyrosequencing reads.
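
The step in which large contigs are broken into overlapping 1,000 bp pseudo-reads can be sketched in Python as a sliding window. This is an illustration only and not the JGI pipeline: the amount of overlap between fragments is not given in the text, so the `step` parameter (500 bp here) is an assumption, as are the function name and the handling of the final fragment, which is shifted back so that the end of the contig is always covered.

```python
def shred_contig(sequence, fragment_len=1000, step=500):
    """Split a contig into overlapping fragments of `fragment_len` bases.

    Returns a list of (start, fragment) tuples. `step` is the distance between
    consecutive fragment starts; it must be smaller than `fragment_len` so that
    neighboring fragments overlap. Both defaults are illustrative assumptions.
    """
    if fragment_len <= 0 or step <= 0:
        raise ValueError("fragment_len and step must be positive")
    if step >= fragment_len:
        raise ValueError("step must be smaller than fragment_len to overlap")
    if len(sequence) <= fragment_len:
        return [(0, sequence)]
    last_start = len(sequence) - fragment_len
    starts = list(range(0, last_start + 1, step))
    if starts[-1] != last_start:
        # Add a final window flush with the contig end so no bases are lost.
        starts.append(last_start)
    return [(s, sequence[s:s + fragment_len]) for s in starts]


# Toy 2,700 bp contig (hypothetical sequence).
toy_contig = "ACGT" * 675
fragments = shred_contig(toy_contig)
print(len(fragments), [start for start, _ in fragments])
```

Feeding such fragments into a second assembler as pseudo-reads lets the 454 consensus be combined with Sanger reads without passing every raw pyrosequencing read through again.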

Genome annotation

Genes were identified using Prodigal [59] as part of the Oak Ridge National Laboratory genome annotation pipeline, followed by a round of manual curation using the JGI GenePRIMP pipeline [60]. The predicted CDSs were translated and used to search the National Center for Biotechnology Information (NCBI) nonredundant database, UniProt, TIGRFam, Pfam, PRIAM, KEGG, COG, and InterPro databases. Additional gene prediction analysis and functional annotation was performed within the Integrated Microbial Genomes - Expert Review (IMG-ER) platform [61].

Genome properties

The genome consists of a 5,767,958 bp long chromosome with a 73% GC content, and a 775,354 bp long plasmid with a 72% GC content (Table 3 and Figure 3a and Figure 3b). Of the 5,647 genes predicted, 5,570 were protein-coding genes and 77 were RNA genes; 73 pseudogenes were also identified. The majority of the protein-coding genes (69.6%) were assigned a putative function, while the remaining ones were annotated as hypothetical proteins. The distribution of genes into COGs functional categories is presented in Table 4.
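
The figures in this paragraph can be cross-checked with a few lines of Python arithmetic. The input values are taken from the text above; the derived quantities (total genome size, plasmid share, and the approximate number of genes with a putative function) are computed here for illustration and are not values quoted from Table 3. The variable names are hypothetical.

```python
# Values as stated in the text.
chromosome_bp = 5_767_958
plasmid_bp = 775_354
total_genes = 5_647
protein_coding = 5_570
rna_genes = 77
fraction_with_function = 0.696  # 69.6% of protein-coding genes

# Derived quantities (computed, not quoted from the tables).
total_bp = chromosome_bp + plasmid_bp
plasmid_percent = 100.0 * plasmid_bp / total_bp
genes_with_function = round(protein_coding * fraction_with_function)

# Consistency check: protein-coding and RNA genes add up to the total.
counts_consistent = (protein_coding + rna_genes) == total_genes

print(f"total genome size: {total_bp:,} bp")
print(f"plasmid share of genome: {plasmid_percent:.1f}%")
print(f"approx. genes with putative function: {genes_with_function}")
print(f"gene counts consistent: {counts_consistent}")
```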

Figure 3a.

Graphical circular map of the chromosome (not drawn to scale with plasmid). From outside to the center: Genes on forward strand (color by COG categories), Genes on reverse strand (color by COG categories), RNA genes (tRNAs green, rRNAs red, other RNAs black), GC content, GC skew.

Figure 3b.

Graphical circular map of the plasmid (not drawn to scale with chromosome). From outside to the center: Genes on forward strand (color by COG categories), Genes on reverse strand (color by COG categories), RNA genes (tRNAs green, rRNAs red, other RNAs black), GC content, GC skew.

Table 3. Genome Statistics
Table 4. Number of genes associated with the general COG functional categories