An Integrative Approach for Genomic Island Prediction in Prokaryotic Genomes

  • Han Wang
  • John Fazekas
  • Matthew Booth
  • Qi Liu
  • Dongsheng Che
Part of the Lecture Notes in Computer Science book series (LNCS, volume 6674)

Abstract

A genomic island (GI) is a segment of genomic sequence that has been horizontally transferred from another genome. Detecting genomic islands is important to medical research. Most current computational approaches that use sequence composition to predict genomic islands suffer from low prediction accuracy. In this paper, we report, for the first time, that gene information and intergenic distance differ between genomic islands and non-genomic islands. Using these two sources together with sequence information, we trained a decision-tree based bagging model for genomic island prediction on genomic island datasets from 113 genomes. To test the performance of our approach, we applied it to three genomes: Salmonella typhimurium LT2, Streptococcus pyogenes MGAS315, and Escherichia coli O157:H7 str. Sakai. The performance metrics show that our approach outperforms other sequence-composition-based approaches. We conclude that incorporating gene information and intergenic distance can improve genomic island prediction accuracy. Our prediction software, Genomic Island Hunter (GIHunter), is available at http://www.esu.edu/cpsc/che_lab/software/GIHunter.
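
The general idea of a decision-tree based bagging model can be illustrated with a minimal sketch. This is not the GIHunter implementation: the base learner here is a one-level decision tree (a stump) rather than a full tree, the three feature columns (a sequence composition score, a mean intergenic distance, and a count of mobility-related genes) are hypothetical stand-ins for the three feature sources named in the abstract, and all numeric values are invented purely so that the example runs.

```python
import random
from collections import Counter

def train_stump(rows, labels):
    """Fit a one-level decision tree: the (feature, threshold) split with fewest errors."""
    best = None
    for f in range(len(rows[0])):
        for t in sorted({r[f] for r in rows}):
            for left_label in (0, 1):
                # Rows with value <= t get left_label, all others the opposite label.
                errors = sum(
                    (left_label if r[f] <= t else 1 - left_label) != y
                    for r, y in zip(rows, labels)
                )
                if best is None or errors < best[0]:
                    best = (errors, f, t, left_label)
    _, f, t, left_label = best
    return lambda r: left_label if r[f] <= t else 1 - left_label

def train_bagging(rows, labels, n_models=25, seed=0):
    """Train each base model on a bootstrap sample (drawn with replacement)."""
    rng = random.Random(seed)
    n = len(rows)
    models = []
    for _ in range(n_models):
        idx = [rng.randrange(n) for _ in range(n)]
        models.append(train_stump([rows[i] for i in idx], [labels[i] for i in idx]))
    return models

def predict(models, row):
    """Combine the base models by majority vote."""
    votes = Counter(m(row) for m in models)
    return votes.most_common(1)[0][0]

# Invented toy data: (composition score, mean intergenic distance, mobility gene count).
# Label 1 = genomic island segment, label 0 = non-island segment.
rows = [
    (0.80, 140.0, 3), (0.90, 160.0, 5), (0.85, 150.0, 2), (0.70, 130.0, 4),
    (0.10, 40.0, 0), (0.20, 55.0, 0), (0.15, 45.0, 1), (0.12, 50.0, 0),
]
labels = [1, 1, 1, 1, 0, 0, 0, 0]

models = train_bagging(rows, labels)
print(predict(models, (0.95, 150.0, 4)))
print(predict(models, (0.05, 30.0, 0)))
```

Because each base model sees a different bootstrap sample, an individual tree may split poorly, but the majority vote across the ensemble tends to be more stable than any single tree, which is the motivation for bagging.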

Keywords

Genomic islands; gene information; intergenic distance; sequence composition

Copyright information

© Springer-Verlag Berlin Heidelberg 2011

Authors and Affiliations

  • Han Wang (1)
  • John Fazekas (1)
  • Matthew Booth (1)
  • Qi Liu (2)
  • Dongsheng Che (1)
  1. Department of Computer Science, East Stroudsburg University, East Stroudsburg, USA
  2. College of Life Science and Biotechnology, Tongji University, Shanghai, P.R. China
