Genomic epidemiology of bacteria: promise and potential pitfalls

Let's start with the positive. The signs are good that rapid, cost-effective whole-genome sequencing will replace most existing molecular typing methods within public health reference laboratories over the next 5 to 10 years [1]. In this setting, genome sequencing represents the ultimate epidemiological typing method - a universally applicable, digital, 'library' typing method, portable internationally and across time. Its key selling point is that it is far more informative than any other approach, capable of distinguishing strains that differ by as little as a millionth of a genome [2]. On top of that, it is clear that microbial genome sequencing can shed light on the evolution of virulence or the molecular basis of antimicrobial resistance [3, 4].
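To put the 'millionth of a genome' figure in concrete terms, here is a minimal Python sketch. It assumes two genomes that have already been aligned to the same coordinates; the function names and the 5 Mb genome size are illustrative assumptions, not taken from any particular pipeline.

```python
def snp_distance(seq_a: str, seq_b: str) -> int:
    """Count positions where two aligned sequences carry different bases.

    Positions where either sequence has an ambiguous or missing call
    (anything other than A, C, G or T) are ignored rather than counted.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    valid = set("ACGT")
    return sum(
        1
        for a, b in zip(seq_a.upper(), seq_b.upper())
        if a in valid and b in valid and a != b
    )


def fraction_differing(snps: int, genome_length: int) -> float:
    """Express a SNP count as a fraction of the genome length."""
    return snps / genome_length


# Illustrative assumption: a 5 Mb genome, so five SNPs is one millionth of it.
example_fraction = fraction_differing(5, 5_000_000)
```

On these assumed numbers, two isolates that differ at only five positions remain distinguishable, a resolution well beyond methods that sample a handful of loci.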

Proof-of-principle studies now demonstrate the value and utility of such approaches in real-world, real-time situations, including pandemic influenza and the recent German Escherichia coli O104:H4 outbreak [4-6]. A heady combination of rapid sequencing, prompt data release and a willingness of scientists across the world to collaborate in crowd-sourced analyses has given birth to an exciting new paradigm of 'open-source genomics' or 'Public Health 2.0'.

However, caveats remain. Bacterial genomic epidemiology is currently overly reliant on the identification of single nucleotide polymorphisms (SNPs) in draft genome sequences, largely because of the historical limitations of short-read technologies. It remains an open question whether, for every bacterial lineage, SNP calling across the whole genome will always prove more informative than probing variation in the highly dynamic repetitive regions sampled by existing typing methods. The adoption of single-molecule long-read approaches, such as those offered by Pacific Biosciences, may also help wean us off a dependence on SNPs and reveal more large-scale genomic changes [6].

Although billed as a one-size-fits-all approach, the comparability and reliability of draft genome analyses remain critically dependent on the sequencing technology and analytical pipelines that are used; a draft genome sequenced today in Europe on one platform may not be easily compared with a draft genome sequenced half a world away on a different instrument in a few years' time. And how easy will it be, in heavily unionized public health laboratories, to redeploy staff who were employed to use traditional approaches and re-equip them for the era of whole-genome sequencing?

Another important lesson comes from the deliberate release of anthrax into the US postal system in 2001 and the investigation that followed. Solving this case relied primarily on detection of rare colonial morphotypes in culture; genome sequencing had only a subsidiary role [7]. Crucially, this incident highlights the potential for apparently clonal bacterial cultures to contain mixtures of closely related but distinct genotypes. Imagine the following scenario. Patients A and B both carry an identical mixed population of genotypes X and Y. From patient A's sample, you pick a single colony representing genotype X, whereas from patient B you propagate a colony from the Y genotype. In such a situation, you might draw erroneous conclusions as to the relationship between the two infections and chains of transmission between these and other patients. This also highlights the problem that up until now genomic epidemiology has relied on isolation of organisms in pure culture.
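The size of this sampling problem can be estimated with a short Python sketch. It assumes, purely for illustration, that both patients carry the same mixture, that genotype X makes up a fraction `frac_x` of colonies, and that one colony is picked at random from each patient; the function names are hypothetical.

```python
import random


def discordance_probability(frac_x: float) -> float:
    """Chance that single colonies picked from two patients differ in genotype.

    Both patients are assumed to carry the same X/Y mixture, with genotype X
    making up `frac_x` of colonies. The picks differ when A yields X and B
    yields Y, or the other way round, giving 2 * p * (1 - p).
    """
    if not 0.0 <= frac_x <= 1.0:
        raise ValueError("frac_x must lie between 0 and 1")
    return 2 * frac_x * (1 - frac_x)


def simulate_discordance(frac_x: float, trials: int = 100_000, seed: int = 0) -> float:
    """Estimate the same probability by repeatedly picking one colony per patient."""
    rng = random.Random(seed)
    discordant = 0
    for _ in range(trials):
        pick_a = "X" if rng.random() < frac_x else "Y"
        pick_b = "X" if rng.random() < frac_x else "Y"
        if pick_a != pick_b:
            discordant += 1
    return discordant / trials
```

Under these assumptions, an even mixture gives a one-in-two chance that the two picked colonies look like different strains, even though the patients carry identical populations. Sequencing several colonies per sample lowers this risk but does not remove it.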

Towards a culture-independent approach

Can we progress to a culture-independent approach? Can we apply high-throughput sequencing not just to epidemiology, but also to the detection, and even discovery, of microbial pathogens? The answer is a qualified yes, with four approaches jostling for our attention.

Firstly, high-throughput sequencing has already breathed new life into well-established community-profiling approaches that exploit amplification of molecular bar codes, such as 16S ribosomal RNA gene sequences [8]. This is delivering ever more detailed surveys of the various human- and animal-associated microbiomes. However, such techniques often fail to distinguish pathogenic species or strains from their closest non-pathogenic relatives (such as Shigella from E. coli; Streptococcus pneumoniae from Streptococcus mitis; enterovirulent E. coli from commensal strains), so, in diagnostic terms, this approach is probably best seen as a way-station to more informative methods that will emerge in the future.

Secondly, we can use metagenomics for diagnostic purposes. This approach involves extracting and sequencing all the DNA from a sample. Clinical specimens will contain variable amounts of human DNA, which may create genetic privacy issues. Human DNA may also swamp the microbial DNA, although with extremely high depth of coverage, sufficient microbial DNA sequences could, at least theoretically, be recovered to reconstruct genomes. However, using current sequencing technologies, metagenomics is still a long way from providing genome-scale information for each member of a microbial community equivalent to that obtained from microorganisms isolated in pure culture. For this approach to come of age, we need a sequencing platform that combines speed and cost-effectiveness with very long read lengths and extremely high throughput - a plausible, but not certain, prospect for the coming decade.
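A back-of-the-envelope calculation shows why human DNA matters so much here. The Python sketch below assumes that reads are drawn in proportion to the DNA present and that the target organism accounts for a known fraction of sequenced bases; the function names and all numbers are illustrative assumptions rather than measured values.

```python
def expected_coverage(
    total_bases: float, target_fraction: float, genome_size_bp: float
) -> float:
    """Mean depth of coverage of the target genome.

    Only `target_fraction` of the sequenced bases come from the organism of
    interest; the remainder is human or other background DNA.
    """
    if not 0.0 <= target_fraction <= 1.0:
        raise ValueError("target_fraction must lie between 0 and 1")
    return total_bases * target_fraction / genome_size_bp


def required_total_bases(
    genome_size_bp: float, target_coverage: float, target_fraction: float
) -> float:
    """Total sequencing output needed to reach a given depth on the target genome."""
    if not 0.0 < target_fraction <= 1.0:
        raise ValueError("target_fraction must be greater than 0 and at most 1")
    return genome_size_bp * target_coverage / target_fraction


# Illustrative assumption: a 5 Mb bacterial genome, 30-fold depth wanted,
# and only 1% of the sequenced DNA coming from that organism.
bases_needed = required_total_bases(5_000_000, 30, 0.01)
```

On these assumed figures, roughly 15 billion bases must be sequenced to recover 150 million bases of useful data, and the requirement scales inversely with the microbial fraction of the sample.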

Metatranscriptomics - extracting RNA and then creating and sequencing cDNA - provides a third way forward. This approach has already proven successful in the discovery of new viral pathogens and can also be used to investigate the activities of bacterial communities [9]. Human RNA will also be found in clinical samples sent for microbiological investigation. However, instead of dismissing this as mere contamination, analysis of human transcripts in such samples is likely to provide additional information about a patient's condition. For example, a transcriptional profile associated with inflammation might provide evidence of infection rather than just colonization at a given body site.

A fourth option is to abandon attempts to sequence nucleic acids isolated from whole populations, but instead to use cell sorting or allied techniques to isolate and genome-sequence sub-populations or, in extremis, single cells. Such approaches have proven successful in the research environment, but seem a remote prospect in the clinical laboratory [10]. How soon before we can obtain a chlamydial genome sequence from a urine sample or a mycobacterial genome sequence from a few millilitres of cerebrospinal fluid?

So, will high-throughput sequencing render culture-based approaches redundant? Here, we face another chorus of caveats, centred on the fact that it is not always possible to predict phenotype from genotype. In some cases - for example, sensitivity testing in tuberculosis - we already have a good evidence base on which to judge whether a given mutation is likely to lead to resistance. Similarly, detection of sequences encoding enzymes associated with virulence (such as Shiga toxin) or resistance (such as an extended-spectrum beta-lactamase) will have reasonable predictive value. But one has to remember that many differences in phenotype rely on subtle changes in expression, often of multiple genes, and that a single base-pair change in a promoter or coding sequence can ablate the function of a gene or its associated protein. Therefore, it would be foolish to imagine that sequencing will replace culture for every application in clinical bacteriology. Instead, just as radio survived the advent of TV, culture will remain a part of the discipline, but widespread adoption of sequence-based approaches might mean that it becomes restricted to a limited number of settings.

In conclusion, there are many rivers to cross before medical microbiology becomes simply a branch of genomic medicine, but perhaps the promised land is in sight.