Abstract
Whole-genome DNA microarray genomotyping experiments compare the gene content of different species or strains of bacteria. A statistical approach to analysing the results of these experiments was developed, based on a hidden Markov model (HMM) that takes the adjacency of genes along the genome into account when calling genes present or absent. The model was implemented in the statistical language R and applied to three datasets. The method is numerically stable with good convergence properties. Error rates are reduced compared with approaches that ignore spatial information. Moreover, the HMM circumvents a problem encountered in conventional analysis: determining the cut-off value used to classify a gene as absent. An Apache Struts web interface for the R script was created for the benefit of users unfamiliar with R.
The application may be found at http://hmmgd.cryst.bbk.ac.uk/. The source code illustrating how to run R scripts from an Apache Struts-based web application is available from the corresponding author on request. The application is also available for local installation if required.
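To illustrate why using gene adjacency can change present/absent calls, the following is a minimal sketch in Python, not the authors' R implementation. It assumes a two-state HMM (present, absent) with Gaussian emissions on log-ratios and decodes the most likely state path with the Viterbi algorithm. The function names, the emission means and standard deviations, and the transition probability are all hypothetical values fixed by hand for illustration. Nothing here is meant to reflect the published model's actual emission distributions or how its parameters are fitted.

```python
import math


def gaussian_logpdf(x, mean, sd):
    """Log density of a normal distribution at x."""
    return -0.5 * math.log(2 * math.pi * sd * sd) - (x - mean) ** 2 / (2 * sd * sd)


def viterbi_presence_calls(log_ratios, means=(0.0, -2.0), sds=(0.5, 0.5),
                           stay_prob=0.9, start_probs=(0.5, 0.5)):
    """Most likely state per gene in genome order: 0 = present, 1 = absent.

    All parameter defaults are illustrative assumptions, not fitted values.
    """
    states = range(2)
    # Log transition matrix: neighbouring genes tend to share a state.
    log_trans = [[math.log(stay_prob if i == j else 1.0 - stay_prob)
                  for j in states] for i in states]
    # Best log score of any path ending in each state at the first gene.
    score = [math.log(start_probs[s]) + gaussian_logpdf(log_ratios[0], means[s], sds[s])
             for s in states]
    backptrs = []
    for x in log_ratios[1:]:
        new_score, ptr = [], []
        for j in states:
            best_i = max(states, key=lambda i: score[i] + log_trans[i][j])
            ptr.append(best_i)
            new_score.append(score[best_i] + log_trans[best_i][j]
                             + gaussian_logpdf(x, means[j], sds[j]))
        score = new_score
        backptrs.append(ptr)
    # Trace back from the best final state.
    state = max(states, key=lambda s: score[s])
    path = [state]
    for ptr in reversed(backptrs):
        state = ptr[state]
        path.append(state)
    path.reverse()
    return path


def threshold_calls(log_ratios, cutoff=-1.0):
    """Conventional per-gene call that ignores neighbours: 1 = absent."""
    return [1 if x < cutoff else 0 for x in log_ratios]


if __name__ == "__main__":
    ratios = [0.1, -0.2, -1.1, 0.0, 0.2, -2.1, -0.9, -1.9, -2.2, 0.1]
    print("threshold:", threshold_calls(ratios))
    print("HMM:      ", viterbi_presence_calls(ratios))
```

On this toy input, the fixed cut-off calls the isolated borderline gene (log-ratio -1.1) absent and the borderline gene inside the low-ratio run (-0.9) present, whereas the HMM path follows the neighbouring genes in both cases. The sketch still needs hand-chosen emission parameters, so it shows only the spatial-smoothing idea and not how a cut-off choice is avoided.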
Acknowledgements
The Mycobacterium tuberculosis dataset was provided by Jaqueline Inwald (Veterinary Laboratories Agency, UK), the Yersinia pestis dataset by Stewart Hinchliffe (London School of Hygiene and Tropical Medicine, UK) and the Staphylococcus aureus dataset by Jodi Lindsay (St. George’s Hospital Medical School, UK). Richard Newton was funded as part of a Wellcome Trust Functional Genomics programme grant. The Wellcome Trust funds the BμG@S (Bacterial Microarray Group at St. George’s Hospital Medical School) multi-collaborative microbial pathogen microarray facility under its Functional Genomics Resources Initiative. The authors have no conflicts of interest that are directly relevant to the content of this study.
About this article
Cite this article
Newton, R., Hinds, J. & Wernisch, L. A Hidden Markov Model Web Application for Analysing Bacterial Genomotyping DNA Microarray Experiments. Appl-Bioinformatics 5, 211–218 (2006). https://doi.org/10.2165/00822942-200605040-00003