A Hidden Markov Model Web Application for Analysing Bacterial Genomotyping DNA Microarray Experiments

  • Bioinformatics of Infectious Disease
Applied Bioinformatics

Abstract

Whole genome DNA microarray genomotyping experiments compare the gene content of different species or strains of bacteria. A statistical approach to analysing the results of these experiments was developed based on a hidden Markov model (HMM) that takes the adjacency of genes along the genome into account when calling genes present or absent. The model was implemented in the statistical language R and applied to three datasets. The method is numerically stable and has good convergence properties. Error rates are reduced compared with approaches that ignore spatial information. Moreover, the HMM circumvents a problem encountered in conventional analyses: choosing the cut-off value used to classify a gene as absent. An Apache Struts web interface to the R script was created for users unfamiliar with R.
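To make the idea of using gene adjacency concrete, here is a minimal sketch in Python. It is not the authors' R code. It assumes a two-state HMM (present/absent) with Gaussian emissions on each gene's log2 test/reference ratio, genes ordered by genomic position, and hand-picked parameters. The names `posterior_absent` and `gaussian_pdf` and the values of `means`, `sds`, `stay` and `prior_absent` are hypothetical, not taken from the paper. The sketch computes each gene's posterior probability of absence with a scaled forward-backward pass, so the call becomes a question about posterior probability rather than a choice of log-ratio cut-off.

```python
import math
from typing import List, Sequence, Tuple


def gaussian_pdf(x: float, mean: float, sd: float) -> float:
    """Normal density N(x; mean, sd^2), floored so it never underflows to 0."""
    z = (x - mean) / sd
    density = math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))
    return max(density, 1e-300)


def posterior_absent(
    log_ratios: Sequence[float],
    means: Tuple[float, float] = (0.0, -2.0),  # hypothetical (present, absent) means
    sds: Tuple[float, float] = (0.5, 0.8),     # hypothetical (present, absent) SDs
    stay: float = 0.95,                        # hypothetical P(neighbour shares state)
    prior_absent: float = 0.1,                 # hypothetical P(first gene absent)
) -> List[float]:
    """Two-state HMM over genes in genome order; state 0 = present, 1 = absent.

    Returns P(gene absent | all log ratios) for each gene, computed with a
    scaled forward-backward pass.
    """
    n = len(log_ratios)
    if n == 0:
        return []
    trans = [[stay, 1.0 - stay], [1.0 - stay, stay]]
    init = [1.0 - prior_absent, prior_absent]
    emis = [[gaussian_pdf(x, means[s], sds[s]) for s in (0, 1)] for x in log_ratios]

    # Forward pass: rescale alpha at each gene so it sums to 1 (avoids underflow).
    alpha = [[init[s] * emis[0][s] for s in (0, 1)]]
    scale = [sum(alpha[0])]
    alpha[0] = [a / scale[0] for a in alpha[0]]
    for t in range(1, n):
        row = [emis[t][s] * sum(alpha[t - 1][r] * trans[r][s] for r in (0, 1))
               for s in (0, 1)]
        c = sum(row)
        scale.append(c)
        alpha.append([a / c for a in row])

    # Backward pass, reusing the forward scaling factors.
    beta = [[1.0, 1.0] for _ in range(n)]
    for t in range(n - 2, -1, -1):
        beta[t] = [sum(trans[r][s] * emis[t + 1][s] * beta[t + 1][s] for s in (0, 1))
                   / scale[t + 1] for r in (0, 1)]

    # Posterior state probabilities: gamma_t(s) is proportional to alpha_t(s) * beta_t(s).
    posteriors = []
    for t in range(n):
        present, absent = (alpha[t][s] * beta[t][s] for s in (0, 1))
        posteriors.append(absent / (present + absent))
    return posteriors


if __name__ == "__main__":
    # Toy log2(test/reference) ratios, ordered by genomic position.
    ratios = [0.1, -0.2, 0.0, -1.0, 0.2, -0.1, -2.1, -2.4, -1.9, -2.2, 0.0, 0.1]
    cutoff = -0.9  # hypothetical fixed cut-off, for comparison only
    for i, (x, p) in enumerate(zip(ratios, posterior_absent(ratios))):
        print(f"gene {i:2d}  ratio {x:5.2f}  P(absent)={p:.3f}  "
              f"HMM call={'absent' if p > 0.5 else 'present'}  "
              f"cut-off call={'absent' if x < cutoff else 'present'}")
```

In this toy series the isolated ratio of -1.0 (gene 3) falls below the hypothetical cut-off of -0.9. However, its confidently present neighbours pull its posterior probability of absence well below 0.5, while the run of strongly negative ratios (genes 6 to 9) is called absent. A real analysis would estimate the parameters from the data, for example with the Baum-Welch (EM) algorithm, rather than fixing them by hand.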

The application is available at http://hmmgd.cryst.bbk.ac.uk/ and can also be installed locally if required. Source code illustrating how to run R scripts from an Apache Struts-based web application is available from the corresponding author on request.

Fig. 1
Fig. 2
Table I
Fig. 3


Acknowledgements

The Mycobacterium tuberculosis dataset was provided by Jaqueline Inwald (Veterinary Laboratories Agency, UK), the Yersinia pestis dataset by Stewart Hinchliffe (London School of Hygiene and Tropical Medicine, UK) and the Staphylococcus aureus dataset by Jodi Lindsay (St. George’s Hospital Medical School, UK). Richard Newton was funded as part of a Wellcome Trust Functional Genomics programme grant. The Wellcome Trust funds the BμG@S (Bacterial Microarray Group at St. George’s Hospital Medical School) multi-collaborative microbial pathogen microarray facility under its Functional Genomics Resources Initiative. The authors have no conflicts of interest that are directly relevant to the content of this study.

Author information

Corresponding author

Correspondence to Richard Newton.

About this article

Cite this article

Newton, R., Hinds, J. & Wernisch, L. A Hidden Markov Model Web Application for Analysing Bacterial Genomotyping DNA Microarray Experiments. Appl Bioinformatics 5, 211–218 (2006). https://doi.org/10.2165/00822942-200605040-00003

  • DOI: https://doi.org/10.2165/00822942-200605040-00003
