Current Microbiology, Volume 66, Issue 1, pp 96–101

A New Vector for Identification of Prokaryotes and Their Variable-Size Genomes

  • Tao Hou
  • Fu Liu
  • Caixia X. Lin
  • Dingyuan Y. Li


A large number of prokaryotes have been produced, so providing a means to describe and distinguish them accurately is becoming a key issue in prokaryotic taxonomy. We proposed an efficient algorithm that filters out most horizontally transferred genome fragments, and we extracted a new genome vector (GV). To demonstrate the power of GV, we applied it to the identification of prokaryotes and their variable-size genome fragments. The results indicate that the new vector, used as a species tag, can accurately identify genome fragments as short as 3,000 bp at the species level.


Genome Fragment, Identification Accuracy, Dissimilarity Index, Genome Vector, Gray Strip
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.



This research was supported by the Graduate Innovation Fund of Jilin University (20121101). We thank the anonymous reviewers for their helpful comments on our work, and Dr. Y. Xu and Dr. F. Zhou for their helpful discussions.

Supplementary material

  • Supplementary material 1: 284_2012_246_MOESM1_ESM.xls (XLS, 85 kb)
  • Supplementary material 2: 284_2012_246_MOESM2_ESM.xls (XLS, 5594 kb)
  • Supplementary material 3: 284_2012_246_MOESM3_ESM.xlsx (XLSX, 20 kb)



Copyright information

© Springer Science+Business Media New York 2012

Authors and Affiliations

  • Tao Hou (1)
  • Fu Liu (1, email author)
  • Caixia X. Lin (2)
  • Dingyuan Y. Li (1)

  1. College of Communications Engineering, Jilin University, Changchun, China
  2. College of Information, Hainan University, Haikou, China
