Identifying Bacterial Strains from Sequencing Data

  • Tommi Mäklin
  • Jukka Corander
  • Antti Honkela
Part of the Methods in Molecular Biology book series (MIMB, volume 1807)


Environmental and clinical settings can host a wide variety of both bacterial species and strains in a single colony, but accurately identifying these organisms is difficult. We describe BIB, a probabilistic method for estimating the relative abundances of species or strains in mixed samples analyzed by short-read high-throughput sequencing. By grouping closely related strains into clusters, the BIB pipeline estimates the relative abundance of each cluster in a sequencing sample.
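The abstract states the goal (relative abundances of strain clusters from short reads) but not the inference procedure. The Python sketch below illustrates the general idea with a plain maximum-likelihood expectation-maximization (EM) mixture model. This is a swapped-in, simplified technique and not BIB's own Bayesian inference. It assumes that reads have already been aligned to reference sequences and that, for every read, the likelihood of it originating from each cluster has been summarized in a matrix. That input format and the function name `estimate_abundances` are hypothetical.

```python
import numpy as np


def estimate_abundances(read_likelihoods, n_iter=1000, tol=1e-10):
    """Estimate relative cluster abundances with maximum-likelihood EM.

    Hypothetical input: read_likelihoods[n, k] is assumed to hold
    p(read n | cluster k), e.g. derived from alignment scores.
    Returns non-negative abundances that sum to one.
    """
    L = np.asarray(read_likelihoods, dtype=float)
    # Discard reads that are incompatible with every cluster; they carry
    # no information about the mixture proportions.
    L = L[L.sum(axis=1) > 0]
    n_clusters = L.shape[1]
    theta = np.full(n_clusters, 1.0 / n_clusters)  # uniform starting point
    for _ in range(n_iter):
        # E-step: probability that each read came from each cluster,
        # given the current abundance estimates.
        weighted = L * theta
        resp = weighted / weighted.sum(axis=1, keepdims=True)
        # M-step: each abundance is the mean responsibility over all reads.
        new_theta = resp.mean(axis=0)
        converged = np.abs(new_theta - theta).max() < tol
        theta = new_theta
        if converged:
            break
    return theta


if __name__ == "__main__":
    # Toy example with two clusters: reads 1-2 align only to cluster 0,
    # read 3 only to cluster 1, and read 4 equally well to both.
    toy = [[1.0, 0.0],
           [1.0, 0.0],
           [0.0, 1.0],
           [1.0, 1.0]]
    print(np.round(estimate_abundances(toy), 3))  # prints [0.667 0.333]
```

In the toy example the ambiguous fourth read is shared between clusters in proportion to their current abundances, so the estimate converges to 2/3 and 1/3 rather than splitting that read evenly. A Bayesian treatment such as the one BIB is described as using would additionally quantify the uncertainty of these estimates, which this point-estimate sketch does not.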

Key words

Bacteria, Strain identification, Abundance estimation, Metagenomics, Probabilistic modelling



This work was supported by the Academy of Finland [259440 to A.H., 251170 to J.C.].



Copyright information

© Springer Science+Business Media, LLC, part of Springer Nature 2018

Authors and Affiliations

  • Tommi Mäklin (1)
  • Jukka Corander (1, 2)
  • Antti Honkela (1, 3), corresponding author
  1. Helsinki Institute for Information Technology HIIT, Department of Mathematics and Statistics, University of Helsinki, Helsinki, Finland
  2. Department of Biostatistics, University of Oslo, Oslo, Norway
  3. Department of Public Health, University of Helsinki, Helsinki, Finland
