Methods for Data Analysis

  • Protocol
Molecular Epidemiology of Microorganisms

Part of the book series: Methods in Molecular Biology™ (MIMB, volume 551)

Abstract

The molecular epidemiology of infectious diseases uses a variety of techniques to assay the relatedness of disease-causing organisms, both to identify strains responsible for outbreaks or associated with particular phenotypes of interest (such as antibiotic resistance) and, it is hoped, to provide insights into where and how these strains have emerged. The correct analysis of such data requires that we understand how the assayed variation accumulates. We discuss this with specific reference to three classes of methods: those based on gel electrophoresis of fragments generated by restriction enzymes or polymerase chain reaction (PCR), those based on microsatellites and other repeat elements, and those based on raw sequence data from protein-coding genes. We also provide a simple example of how the likely origin of an apparently novel antibiotic-resistant strain may be identified, and conclude with a discussion of some popular analysis packages and the more interesting prospects for the future in this rapidly developing field.

Acknowledgments

W. P. H. gratefully acknowledges the support of the Royal Society. D. M. A. is funded by a Wellcome Trust program grant awarded to Brian Spratt. We would like to thank Mat Fisher for helpful discussions.

Copyright information

© 2009 Humana Press, a part of Springer Science+Business Media, LLC

About this protocol

Cite this protocol

Hanage, W., Aanensen, D. (2009). Methods for Data Analysis. In: Caugant, D. (ed.) Molecular Epidemiology of Microorganisms. Methods in Molecular Biology™, vol 551. Humana Press, Totowa, NJ. https://doi.org/10.1007/978-1-60327-999-4_20

  • DOI: https://doi.org/10.1007/978-1-60327-999-4_20

  • Publisher Name: Humana Press, Totowa, NJ

  • Print ISBN: 978-1-60327-998-7

  • Online ISBN: 978-1-60327-999-4

  • eBook Packages: Springer Protocols
