A proposal for a portal to make earth’s microbial diversity easily accessible and searchable
Estimates of the number of bacterial species range from 10^7 to 10^12. At the pace at which descriptions of new species are currently being published, the description of all bacterial species on earth will only be completed in thousands of years. However, even if one day all species were named and described, these names and descriptions would still be of little practical value unless they could be easily searched and accessed, so that novel strains could be easily identified as members of any of these species. To complicate the situation further, many of the currently known species contain significant genotypic and phenotypic diversity that would still be missed if description of microbial diversity were limited to species. The solution to this problem could be a database in which every bacterial species and every intra-specific group is anchored to a genome-similarity framework. This ideal database should be searchable using complete or partial genome sequences as well as phenotypes. Moreover, the database should include functions to easily add newly sequenced novel strains, automatically place them into the genome-similarity framework, identify them as members of an already named species, or tag them as members of yet to be described species or new intra-specific groups. Here, we propose the means to develop such a database by taking advantage of the concept of genome sequence similarity-based codes, called Life Identification Numbers or LINs.
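To make the idea of placing a new strain into a genome-similarity framework more concrete, the following is a minimal Python sketch of how a similarity-based code might be assigned. It is an illustration only, not the procedure proposed by the authors: the threshold values, the tuple representation of a LIN, and the function names `assign_lin` and `same_group` are all assumptions. The sketch also assumes that the average nucleotide identity (ANI) between the new genome and its most similar genome in the database has already been computed elsewhere.

```python
from typing import List, Optional, Sequence, Tuple

LIN = Tuple[int, ...]

# Hypothetical ANI thresholds (percent), one per LIN position, left to right.
# These values are illustrative assumptions, not taken from the article.
THRESHOLDS: List[float] = [70.0, 80.0, 90.0, 95.0, 99.0, 99.9]


def assign_lin(best_match: Optional[LIN], ani: float,
               existing: Sequence[LIN],
               thresholds: Sequence[float] = THRESHOLDS) -> LIN:
    """Derive a code for a new genome from its most similar database genome."""
    n = len(thresholds)
    if best_match is None:
        # First genome in the database: all positions start at zero.
        return tuple([0] * n)
    # Count the leading thresholds that the observed ANI meets or exceeds.
    shared = 0
    for t in thresholds:
        if ani >= t:
            shared += 1
        else:
            break
    if shared == n:
        # Indistinguishable from the best match at this resolution.
        return tuple(best_match)
    prefix = tuple(best_match[:shared])
    # Take the next unused value at the first position where the genomes differ.
    used = [lin[shared] for lin in existing if tuple(lin[:shared]) == prefix]
    next_value = max(used, default=-1) + 1
    return prefix + (next_value,) + (0,) * (n - shared - 1)


def same_group(a: LIN, b: LIN, threshold: float,
               thresholds: Sequence[float] = THRESHOLDS) -> bool:
    """True if two codes agree at every position up to the given threshold."""
    k = list(thresholds).index(threshold) + 1
    return a[:k] == b[:k]


db: List[LIN] = [assign_lin(None, 0.0, [])]
db.append(assign_lin(db[0], 96.5, db))  # meets the 95% but not the 99% threshold
db.append(assign_lin(db[0], 85.0, db))  # meets the 80% but not the 90% threshold
print(db)  # [(0, 0, 0, 0, 0, 0), (0, 0, 0, 0, 1, 0), (0, 0, 1, 0, 0, 0)]
print(same_group(db[0], db[1], 95.0))  # True
print(same_group(db[0], db[2], 95.0))  # False
```

Under these assumptions, a named species or intra-specific group would correspond to a shared prefix of the code, so identifying a new strain as a member of a group reduces to a prefix comparison, and a strain whose prefix matches no named group can be tagged as belonging to a group that has yet to be described.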
Keywords: Genome sequences, Average nucleotide identity, Database, Bacterial species
LSH was supported by National Science Foundation grant DBI-1062472. BAV and LT were supported by National Science Foundation grant IOS-1354215. Funding for work in the Vinatzer laboratory was also provided in part by the Virginia Agricultural Experiment Station and the Hatch Program of the National Institute of Food and Agriculture, US Department of Agriculture.