Isolating new microbial strains and submitting the corresponding manuscript to a scientific journal for peer review is an exciting step for any microbiologist. After a series of dedicated experiments to characterize the new isolates, writing sessions and feedback iterations with your co-authors, you are ready to enter the editorial system of the journal of your choice. Scientific journals provide their authors with clear advice and directives on aims and scope, style and format requirements, permissions, etc. in their ‘Submission Guidelines’. These instructions may differ among journals because of differences in content focus and in the history of each journal, and regular updates are necessary to keep up with new developments in methodologies, concepts, analytical software and publication formats. Despite this assistance in matching a journal’s style and facilitating the work of editors and reviewers, authors do not always follow these guidelines consistently. As a consequence, too many manuscripts are returned for revision by handling editors before they can be sent to reviewers for evaluation of their scientific merit, which delays the reviewing process. To avoid these time-consuming, ‘unnecessary’ iterations, editors of a few Springer Nature microbiology journals summarize here some of the main omissions and offer advice that should be read as an explanatory note to the ‘Guidelines’.

The following short text is meant for authors attempting to affiliate environmental isolates or clone sequences to species with validly published names. In contrast to the formal naming of novel strains at the rank of species [International Code of Nomenclature of Prokaryotes (Parker et al. 2019)] and the guidance provided by minimal standards for the description of new taxa (https://lpsn.dsmz.de/text/minimal-standards), the assignment of names to other microbial resources is not regulated. This aspect, however, deserves just as much attention, so as not to create databases in which a defined name is pinned to a genetic entity that may differ significantly from the one whose name it bears.

By the end of August 2021, the List of Prokaryotic names with Standing in Nomenclature (LPSN; Parte et al. 2020) contained 21,261 species with validly published names. The recent release of the All-species Living Tree Project (LTP) (www.imedea.uib-csic.es/mmg/ltp; Ludwig et al. 2021) contains more than 17,500 curated complete or almost complete 16S rRNA gene sequences of archaeal and bacterial type strains, indicating that such sequences are available for comparison for more than 80% of the type strains. In contrast, the number of 16S rRNA gene sequences deposited in GenBank (NCBI), the majority of them short variable regions of less than 300 nucleotides, is as high as 40 million (Marcela Borba, pers. communication).

1. In manuscripts that give names to bacterial and archaeal isolates, a major issue is unfamiliarity with the meaning of a ‘type strain’ or with how to use this category for comparative purposes. A type strain is the strain on which the description of a species is based: the name-bearing strain and hence the point of reference (Lapage et al. 1990; Parker et al. 2019). As phenotypic and genomic variability among strains of a species is often wide, it is the type strain, and not the strain with the highest similarity (e.g. the best BLAST score), that must be used for comparison. Hence, when attempting to affiliate strains to species, i.e. to give a name to a hitherto unassigned strain, new isolates must be compared with the phenetic and genetic properties of the type strain. While this is mandatory for any new prokaryotic species description, environmental isolates are in most cases affiliated not with a type strain but with whichever strain shows the highest 16S rRNA gene similarity value. As the vast majority of sequences are named without reference to type strains, the credibility of such species affiliations is not assured. Comparison with type strains is mandatory for any methodological approach, as type strains are the only strains that are available from public resource centers for comparative purposes, whereas for the vast majority of other strains only the gene sequence accession number, not the strain itself, is available.

Advice to authors: when using NCBI BLAST, tick the option ‘Limit to sequences from type material’ or, in the case of EzBioCloud (Yoon et al. 2017a), restrict the search to type strains only. The type strain should also be denoted by a superscript T after the strain designation.
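The difference between the overall best hit and the closest type strain can be illustrated with a minimal Python sketch. The `Hit` record, its `is_type` flag, and all names and identity values below are hypothetical; in practice, type-material status comes from the search set (e.g. the type-material option in NCBI BLAST or a type-strain-only database such as EzBioCloud), not from a field computed here.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    """One database hit for a query 16S rRNA gene sequence (hypothetical record)."""
    subject: str     # name attached to the database sequence
    identity: float  # percent identity to the query
    is_type: bool    # True if the sequence derives from a type strain


def closest_type_strain(hits):
    """Return the highest-identity hit from type material, or None if there is none."""
    type_hits = [h for h in hits if h.is_type]
    return max(type_hits, key=lambda h: h.identity, default=None)


# Invented hits: the overall top hit is a non-type environmental sequence.
hits = [
    Hit("uncultured bacterium clone 1", 99.9, False),
    Hit("Genus sp. strain 2", 99.6, False),
    Hit("Genus alpha strain A1T", 98.9, True),
    Hit("Genus beta strain B1T", 98.1, True),
]

top = max(hits, key=lambda h: h.identity)
ref = closest_type_strain(hits)
print(f"Top hit:             {top.subject} ({top.identity}%)")
print(f"Closest type strain: {ref.subject} ({ref.identity}%)")
```

Only the second line is a legitimate point of reference for affiliating the isolate; the first merely reflects whatever happens to be deposited in the database.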

2. Wet-lab DNA-DNA hybridization (DDH) has been used for several decades as the basis for recognizing a genomospecies, originally defined by Wayne et al. (1987) as strains sharing DDH values above a 70% threshold.

    With the introduction of ribosomal RNA gene sequencing, the measure of relatedness was placed on a more objective basis. For complete or almost complete 16S rRNA gene sequences (> 1,300 nt), a 97% similarity threshold between an isolate and type strains was originally recommended for recognizing a novel genomospecies (Stackebrandt and Goebel 1994). Above this value, because of the evolutionarily constrained nature of the 16S rRNA gene (Woese et al. 1990), DNA hybridization still had to be performed to determine whether the isolate could be affiliated to a species with a validly published name or represented a novel species. This threshold was later raised to a species cut-off of around 98.7% 16S rRNA gene similarity (Stackebrandt and Ebers 2006).

    Some years ago, an even more advanced step in the definition of a genomospecies was reached with in-silico methods based on the comparison of genome sequences, which made laborious wet-lab DNA-DNA hybridization studies between an isolate and the most closely related type strains unnecessary. Auch et al. (2010) and Meier-Kolthoff et al. (2013) introduced digital DDH (dDDH), an in-silico genome-to-genome comparison that retains the 70% species boundary and showed a better correlation with 16S rRNA gene sequence distances than wet-lab DDH values; Meier-Kolthoff et al. (2013) also derived a range of 98.2–99.0% 16S rRNA gene similarity as the cut-off corresponding to this boundary. As an alternative measure of relatedness, average nucleotide identity (ANI, OrthoANI) values of around 95% between genome sequences were defined as the species boundary (Konstantinidis and Tiedje 2005; Rodriguez-R and Konstantinidis 2014; Lee et al. 2016; Yoon et al. 2017b; Jain et al. 2018). Thus, in groups of species that were described on the basis of DNA hybridization values below 70% but that share more than about 98.7% 16S rRNA gene sequence similarity with each other (prominent, but not the only, examples are members of the Bacillus cereus and B. subtilis groups and many species of Streptomyces, Vibrio, Rhizobium and Pseudomonas), an isolate cannot be assigned a species name without additional genomic evidence.

Advice to authors: Before a name is given to an isolate, additional evidence must be provided, such as that generated by DDH, dDDH or ANI analysis between the isolate and the type strains of the most closely related species. MALDI-TOF mass spectrometry (Shah et al. 2020) and multilocus sequence analysis (MLSA; Glaeser and Kämpfer 2015) are also helpful for assigning an isolate to a validly named species. In the absence of any of these data, authors should be prudent and refrain from naming the isolate at the species level, using instead the name of a species group or the category ‘Genus sp.’
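The logic of this advice can be summarized as a small decision rule. The sketch below is an illustration under stated assumptions, not a formal standard: it hard-codes the approximate thresholds quoted above (about 98.7% 16S rRNA gene similarity, 70% DDH or dDDH, about 95% ANI) as defaults, and the function name `species_assignment` and the example values are hypothetical.

```python
from typing import Optional

# Approximate thresholds quoted in the text; assumed defaults, not fixed rules.
SSU_THRESHOLD = 98.7  # % 16S rRNA gene similarity (Stackebrandt and Ebers 2006)
DDH_THRESHOLD = 70.0  # % wet-lab or digital DDH (Wayne et al. 1987)
ANI_THRESHOLD = 95.0  # % ANI ("around 95%")


def species_assignment(genus: str, species: str, ssu: float,
                       ani: Optional[float] = None,
                       ddh: Optional[float] = None) -> str:
    """Suggest a label for an isolate compared with the type strain of `genus species`.

    ssu, ani and ddh are percent values between the isolate and that type strain;
    None means the measurement was not made.
    """
    if ssu < SSU_THRESHOLD:
        # Too distant from this type strain to carry its name.
        return f"{genus} sp."
    # Collect only the genomic measures that were actually determined.
    genomic = [value >= cutoff
               for value, cutoff in ((ani, ANI_THRESHOLD), (ddh, DDH_THRESHOLD))
               if value is not None]
    if genomic and all(genomic):
        return f"{genus} {species}"
    # No genomic evidence, or evidence below (or conflicting about) the boundary.
    return f"{genus} sp."


# Invented values for an isolate close to the Bacillus cereus type strain.
print(species_assignment("Bacillus", "cereus", ssu=99.6))            # Bacillus sp.
print(species_assignment("Bacillus", "cereus", ssu=99.6, ani=97.1))  # Bacillus cereus
print(species_assignment("Bacillus", "cereus", ssu=99.6, ani=93.0))  # Bacillus sp.
```

The design is deliberately conservative: high 16S rRNA gene similarity alone never yields a species name, and if ANI and DDH disagree the sketch falls back to ‘Genus sp.’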

3. The situation is more complex in mycology, as the 18S rRNA gene does not discriminate between closely related species and the size of fungal genomes makes routine genome sequencing hardly feasible. DNA barcode sequences of the internal transcribed spacer regions (ITS1, ITS2) and the 5.8S rRNA gene, together with flanking regions of the 18S (SSU) and 28S (LSU) rRNA genes, are the preferred choice. While for yeasts the combination of LSU and ITS discriminates well among species (with threshold values of 98.4–99.5%), this approach cannot be applied to the entire fungal kingdom, as the intraspecific variability of the ITS regions varies considerably and no single range of threshold values can be applied at the species level (Raja et al. 2017). The underlying results and the problems associated with sequence-based methods in mycology are well outlined by Borman and Johnson (2020). As species resolution is poor in many fungal taxa, the discriminatory power of protein-coding gene regions was compared (Stielow et al. 2015) to select specific barcodes for species identification; among these, TEF1α, but also COX1, RPB2, ACT, TUB2 and RPL10, ranked high in certain fungal lineages. Genes and primer sequences for DNA-based identification are summarized, among others, by Borman and Johnson (2020) and Tekpinar and Kalmer (2019).

Advice to authors: As ITS regions alone are insufficient to unambiguously name a new fungal isolate in a given genus or genus complex, the literature must be searched for more discriminating barcode regions to support the name. In the absence of such data, authors must refrain from applying an explicit species name to an isolate.
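The requirement for more than one barcode can be sketched as a concordance check: a species name is returned only if every required locus has a type-material match above a lineage-specific threshold and all loci point to the same species. Locus labels, species names and threshold values are hypothetical placeholders; as noted above, no single threshold range applies across the fungal kingdom.

```python
from typing import Dict, Optional, Sequence, Tuple


def concordant_name(best_matches: Dict[str, Tuple[str, float]],
                    thresholds: Dict[str, float],
                    required_loci: Sequence[str]) -> Optional[str]:
    """Return a species name only if all required loci support the same one.

    best_matches maps locus -> (species of best type-material match, % identity).
    thresholds maps locus -> minimum % identity accepted for that locus.
    """
    names = set()
    for locus in required_loci:
        if locus not in best_matches:
            return None  # barcode not sequenced: no explicit name
        species, ident = best_matches[locus]
        if ident < thresholds[locus]:
            return None  # below the (assumed) lineage-specific threshold
        names.add(species)
    # Exactly one supported species is required; none or several means no name.
    return names.pop() if len(names) == 1 else None


thresholds = {"ITS": 99.0, "TEF1a": 98.5}  # hypothetical values
discordant = {"ITS": ("Genus alpha", 99.7), "TEF1a": ("Genus beta", 99.2)}
concordant = {"ITS": ("Genus alpha", 99.7), "TEF1a": ("Genus alpha", 99.2)}

print(concordant_name(discordant, thresholds, ["ITS"]))           # Genus alpha
print(concordant_name(discordant, thresholds, ["ITS", "TEF1a"]))  # None
print(concordant_name(concordant, thresholds, ["ITS", "TEF1a"]))  # Genus alpha
```

The first two calls show the point of the advice: ITS alone appears to support a name, while adding a protein-coding barcode reveals that the evidence is not concordant.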

4. Yet another source of premature assignment of species names to 16S rRNA gene sequences is the attempt to categorize operational taxonomic units (OTUs) generated by next-generation sequencing of variable regions of this gene (usually V3 and V4). Because of the massive volume of such short sequences, various protocols were published (see e.g. Caporaso et al. 2010) that allow, for example, subsampled open-reference OTU picking. Following the early paper by Stackebrandt and Goebel (1994), who proposed a 97% threshold of almost complete (> 1,300 nt) 16S rRNA gene sequence similarity to separate species (see paragraph 2), this value has been applied to define species and genera on the basis of short-sequence OTUs (250–500 nt) in metagenomic studies (e.g. Schloss and Handelsman 2005; Rideout et al. 2014). This value has since been re-assessed by an analysis of high-quality 16S rRNA gene sequences, which placed the optimal species delineation threshold at about 99% for full-length sequences (in accord with the revised values; see paragraph 2) and at 100% for the V4 region (Edgar 2018). Regrettably, these more recent values are ignored in the majority of submissions, and species names are still assigned to short-stretch OTUs sharing more than 97% 16S rRNA gene similarity, in most studies without reporting intra-OTU (‘species’) similarity values. Along the same lines, functions are linked to 97%-defined OTUs via previously sequenced genomes (e.g. PICRUSt; Langille et al. 2013), generating a mélange of genomic properties from a group of species that often differ significantly in genomic makeup. In the absence of cultured organisms, evaluating such genomic properties of environmental microorganisms is indeed helpful for understanding the function of an ecosystem, but the OTU threshold should then be raised to 99% to ensure that only the genomes of the closest relatives at the intra- and infraspecific levels are included.

Advice to authors: Taken together with the arguments under points 1 and 2, the names given today to OTU-defined ‘species’ are highly speculative; they are not re-assessed by different approaches but, once published, are taken for granted. The threshold value for short-stretch environmental OTUs should be increased from 97% to 99% to narrow the species diversity within each OTU. As sequences with such species names attached now enter databases, an even higher rate of erroneous identification in the future is predictable; authors should therefore refrain from naming species on such shaky scientific grounds.
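Why a 97% threshold lumps taxa that a 99% threshold keeps apart can be shown with a simplified greedy centroid clustering of pre-aligned, equal-length toy sequences. This is a didactic sketch only, not the algorithm of any particular OTU-picking tool: real pipelines align reads, usually sort them by abundance and handle indels, and the sequences below are invented.

```python
from typing import Dict, List


def identity(a: str, b: str) -> float:
    """Percent identity of two pre-aligned, equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to equal length")
    matches = sum(x == y for x, y in zip(a, b))
    return 100.0 * matches / len(a)


def greedy_otus(seqs: Dict[str, str], threshold: float) -> List[List[str]]:
    """Each sequence joins the first OTU whose centroid it matches at
    >= threshold % identity; otherwise it founds a new OTU (input order matters)."""
    centroids: List[str] = []
    otus: List[List[str]] = []
    for name, seq in seqs.items():
        for i, centroid in enumerate(centroids):
            if identity(seq, centroid) >= threshold:
                otus[i].append(name)
                break
        else:
            centroids.append(seq)
            otus.append([name])
    return otus


def mutate(seq: str, positions: List[int]) -> str:
    """Return a copy of seq with a substitution at each given position (toy data)."""
    shift = {"A": "C", "C": "G", "G": "T", "T": "A"}
    chars = list(seq)
    for p in positions:
        chars[p] = shift[chars[p]]
    return "".join(chars)


# Toy 100-nt "amplicons": b differs from a at 2 sites, c at 1, d at 5.
a = "ACGT" * 25
seqs = {"a": a, "b": mutate(a, [10, 50]), "c": mutate(a, [20]),
        "d": mutate(a, [60, 70, 80, 90, 95])}

for t in (97.0, 99.0):
    print(t, greedy_otus(seqs, t))
# 97.0 [['a', 'b', 'c'], ['d']]
# 99.0 [['a', 'c'], ['b'], ['d']]
```

At 97%, `a`, `b` and `c` collapse into one OTU; at 99%, `b` becomes its own OTU. For a 250-nt V4 read, 97% identity tolerates up to 7 mismatches (243/250 = 97.2%), whereas 99% tolerates only 2 (248/250 = 99.2%).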

We hope this advice helps you design your studies, eases the preparation of manuscripts and makes the submission process more agreeable. Fingers crossed for a constructive peer-review process, and good luck with your next isolation attempt.