Hospital-acquired infections (HAIs) are infamous and widespread. A variety of microbial organisms can give rise to HAI and, with emerging and increasing antibiotic resistance among these nosocomial pathogens, the importance of HAI has been increasing steadily as well. An HAI outbreak is defined as an increase in the frequency of hospital-acquired or healthcare facility-acquired cases of disease among patients or staff, over and above the expected baseline number of cases in a given setting. Healthcare facilities where HAI outbreaks can occur include hospitals, nursing homes, rehabilitation centres, (private) clinics, dialysis or cancer treatment centres, ambulatory surgery centres, physician offices, dental clinics, laboratories and any other facilities which provide healthcare or diagnostic services to individuals, whether publicly or privately owned [1]. To control HAI effectively, a combination of strategies must be used, including preventive measures such as education and environmental control, active surveillance of infection events, and recognition of and intervention in ongoing disease outbreaks (e.g. [2]). Molecular epidemiological investigations, critical to surveillance and intervention, rely on rapid and effective assessment of phenotypic and/or genetic relatedness among strains and help to distinguish outbreak situations from non-outbreak-related colonisation or infection events.

The impact and importance of HAI will continue to rise in the years to come, and both the detection of HAI agents and their more detailed (clinical) epidemiological analysis need to be improved. Classical typing technologies need to be replaced by ones with improved reproducibility, data exchangeability and, most importantly, resolution. Only then will it be feasible to trace HAIs better, block their further spread and eliminate them from the hospital ecosystem. Whereas phenotyping was most important in the past (bacteriophage susceptibility testing, serological and antigen-based typing and more), different molecular typing methods have been developed over the past decades. These range from plasmid characterisation, restriction enzyme analyses, various DNA amplification-mediated methods and multi-locus sequence typing (MLST) to the reigning gold standard technology for many species, pulsed-field gel electrophoresis (PFGE). Most recently, high-throughput next-generation nucleic acid sequencing was introduced, which initially focused on specific, pre-selected gene targets. Today, full microbial genome analysis by whole-genome sequencing (WGS) is feasible and is, by many, considered to be the Holy Grail of typing. Many individual studies have already shown the feasibility of this approach [3, 4] and there are now even (commercial) services intended to provide extensive WGS characterisation of HAI isolates, including direct typing for epidemiological purposes but also the characterisation of antibiotic resistance and virulence factors [5, 6].

Here, we will describe the most important species involved in HAI, the gold standard typing technologies for their epidemiological typing and the perspective for fighting HAI caused by multidrug-resistant organisms (MDRO) with the newest generation of WGS-based methods. We exemplify several of these features using Clostridium difficile as a model organism.

Epidemiological context

Modern healthcare employs many types of invasive procedures and devices to treat (increasingly older and frailer) patients. Infections are often associated with the devices used in medical procedures, such as catheters or ventilators. These device-associated HAIs include central line-associated bloodstream infections, catheter-associated urinary tract infections, ventilator-associated pneumonia and several others. Infections may also occur at surgical sites (surgical site infections, SSIs). The most common HAIs have been reviewed and discussed before [7]. The major HAIs include infections of the lung (22%), surgical sites (22%) and the gastro-intestinal system (13%).

In 2014, the results of a project known as the HAI Prevalence Survey were published. The survey described the burden of HAIs in US hospitals and reported that, in 2011, there were an estimated 722,000 HAIs occurring in US acute-care hospitals [8]. About 90,000 patients with HAIs died during hospitalisation. More than half of all HAIs occurred outside of the intensive care unit (ICU). This clearly shows the medical importance of HAI, which also applies on a global scale. Hence, HAI is a troublesome issue for the healthcare industry. The Centers for Disease Control and Prevention (CDC) estimates that 1 in 20 patients will contract an HAI each day and estimates suggest that the economic burden of HAIs will soon reach $35.7 billion a year [9]. Costs are largely due to an extended hospital stay of around 17–18 days when an HAI is acquired. In the USA, close to 100,000 people die annually due to HAI (CDC general publication, 2009). Clinicians and organisations focused on hospital care quality are collaborating to improve infection control and reduce the number of patients who contract HAI. However, outbreaks of HAI continue to occur.

Several studies show the distribution of pathogens involved in HAI. Depending on the country, the medical unit or the type of infection observed, several microorganisms are identified as major causative agents for HAIs. The more frequently detected outbreak-associated species are the following: Pseudomonas aeruginosa, Escherichia coli, Klebsiella pneumoniae, Clostridium difficile, Acinetobacter spp., Enterobacter spp., Enterococcus spp., Serratia marcescens, Staphylococcus aureus and several less prominent species [1]. The majority of the problematic strains of these species have the capacity to quickly evolve (multiple) resistance to antibiotics. In this outbreak-associated, epidemiological context, cure of the infected is urgent, science should be translational and infection prevention is of utmost importance. New initiatives are constantly needed.

Molecular bacterial typing methods

The goals of HAI outbreak investigation are to identify the pathogen that caused the outbreak, identify the source of the infection and, most importantly, control and prevent further spread of the infection. To establish the relatedness between strains, several molecular methods have been developed to measure and define genetic differences. Technologies have been optimised with respect to (minimal) costs, efficiency and discriminatory power. The more classical molecular methods most often used and cited for HAI are ribotyping, PFGE and MLST. Table 1 summarises the (mostly molecular) bacterial typing methods found in the scientific literature for the major species involved in HAI. The principles and (dis)advantages of a few frequently used and popular molecular typing methods, including WGS, will be briefly described below. Of course, there is a plethora of additional methods available and technical reviews have been published by many (e.g. [10,11,12]).

Table 1 Bacterial typing methods for some of the major species involved in hospital-acquired infections (HAIs)

Ribotyping


The currently most popular format of ribotyping is based on polymerase chain reaction (PCR)-mediated detection of polymorphism in the 16S–23S intergenic spacer region [13]. Recently, it has been mostly used to investigate outbreaks due to C. difficile infection. However, the original method generates bands of high and closely spaced molecular masses, which are difficult to separate by agarose gel electrophoresis. To improve the reading of banding patterns of PCR-ribotyping applied to C. difficile, partial sequencing of the rRNA genes (16S and 23S) and the intergenic spacer region was performed, and a new set of primers located closer to the intergenic spacer region was then defined [14]. The new PCR gave reproducible patterns of bands which were easier to separate by agarose gel electrophoresis. Two major kinds of PCR-based ribotyping exist: PCR amplification followed by agarose gel electrophoresis and PCR amplification followed by sequencer-based capillary separation. This typing method has shown major qualities such as ease of use, speed and reproducibility. A global reference library has been established and is available. Ribotyping is accurate and the obtained data can be shared between laboratories [15].
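The comparison of two banding patterns can be sketched in a few lines of Python. This is an illustrative sketch only, not the procedure of the cited studies: the Dice coefficient, the 3% size tolerance and all fragment sizes below are assumptions chosen for the example.

```python
def match_bands(bands_a, bands_b, tolerance=0.03):
    """Count bands of pattern A that pair with an unused band of pattern B.

    Two bands match when their sizes (bp) differ by at most `tolerance`,
    relative to the larger band. Matching is greedy on the sorted sizes.
    """
    a = sorted(bands_a)
    b = sorted(bands_b)
    i = j = matches = 0
    while i < len(a) and j < len(b):
        larger = max(a[i], b[j])
        if abs(a[i] - b[j]) <= tolerance * larger:
            matches += 1
            i += 1
            j += 1
        elif a[i] < b[j]:
            i += 1
        else:
            j += 1
    return matches


def dice_similarity(bands_a, bands_b, tolerance=0.03):
    """Dice coefficient: 2 * shared bands / total bands in both patterns."""
    total = len(bands_a) + len(bands_b)
    if total == 0:
        return 1.0
    return 2 * match_bands(bands_a, bands_b, tolerance) / total


# Hypothetical fragment sizes (bp), invented for illustration only.
isolate_1 = [250, 310, 365, 420, 480]
isolate_2 = [252, 309, 366, 421, 479]
isolate_3 = [250, 340, 400, 480, 560, 610]

print(dice_similarity(isolate_1, isolate_2))  # near-identical patterns
print(dice_similarity(isolate_1, isolate_3))  # only two bands in common
```

Capillary-based separation yields more precise fragment sizes than agarose gels, so a tighter tolerance than the one assumed here would be plausible for such data.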

Pulsed-field gel electrophoresis (PFGE)

Using this technique, the restriction pattern of the complete bacterial genome is visualised. Specific degradation of the bacterial chromosome is performed by infrequently cutting restriction enzymes. No mechanical damage occurs because of prior immobilisation of the DNA in agarose blocks. After enzymatic restriction, the fragments are separated by gel electrophoresis. PFGE is a highly discriminatory method: point mutations, deletions, insertions and loss or acquisition of plasmids might account for minor differences in profiles within a subtype or among epidemiologically related strains [16]. The PFGE method, however, is laborious and the results cannot be easily compared between laboratories, although the PulseNet project is proving that such comparison is possible [17]. Essentially, PFGE can be done for all bacterial species as long as the DNA is susceptible to restriction. However, the technology appears to be in decline and is likely to be replaced by sequencing-based technologies, even for the “big” public health applications.
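The effect of a single point mutation on a restriction pattern can be illustrated with a small in silico digest. The sketch below is an assumption-laden toy, not a PFGE analysis tool: the 82-bp circular "genome" is invented, and the recognition site CCCGGG (cut in the middle, as for SmaI) is used purely as an example of a six-base cutter.

```python
def digest_circular(sequence, site, cut_offset):
    """Fragment lengths after cutting a circular molecule at every `site`.

    `cut_offset` is the position of the cut within the recognition site.
    """
    n = len(sequence)
    # Extend the sequence so that sites spanning the origin are also found.
    extended = sequence + sequence[: len(site) - 1]
    cuts = []
    start = extended.find(site)
    while start != -1:
        cuts.append((start + cut_offset) % n)
        start = extended.find(site, start + 1)
    cuts = sorted(set(cuts))
    if not cuts:
        return [n]  # uncut circle
    # Each fragment spans two consecutive cuts, wrapping around the origin.
    # A single cut linearises the circle into one fragment of full length.
    fragments = [
        (cuts[(k + 1) % len(cuts)] - cuts[k]) % n or n
        for k in range(len(cuts))
    ]
    return sorted(fragments, reverse=True)


# Hypothetical toy genome with two recognition sites (invented sequence).
genome = "AT" * 10 + "CCCGGG" + "AT" * 20 + "CCCGGG" + "AT" * 5
# A point mutation (G -> A) destroys the first recognition site.
mutant = genome.replace("CCCGGG", "CCCAGG", 1)

print(digest_circular(genome, "CCCGGG", 3))  # two fragments
print(digest_circular(mutant, "CCCGGG", 3))  # one fragment
```

Losing one site merges two fragments into a larger one, which is why a single mutation can shift several bands in a profile at once.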

Multi-locus sequence typing (MLST)

This method determines the genetic relatedness among strains by analysing the sequences of multiple genes that are compared for single nucleotide polymorphisms (SNPs) [18]. Since genes display varying degrees of genetic drift, housekeeping genes are most often sequenced because they are present in all isolates within a species and, genetically speaking, relatively stable. However, since they are under strong selective and functional pressure, their rate of genetic variability is relatively low and may not always provide adequate discrimination among unrelated isolates. For MLST to be effective as an epidemiological tool, the selection of genes and their number need to be adequate to distinguish among isolates which diverged more recently. Each sequenced gene is assigned an allele number, and the combination of allele numbers (the allelic profile) defines the sequence type (ST) of the isolate. Scientists can register their strains and corresponding STs in the globally used PubMLST database, hosted at the University of Oxford [19], where one can find the definition of even the most recently described STs.
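The step from gene sequences to an ST can be sketched as two table lookups. The example below is a minimal illustration under stated assumptions: it uses three loci instead of the seven typically sequenced, and the locus names, allele sequences, allele numbers and ST labels are all invented rather than taken from PubMLST.

```python
# Hypothetical toy scheme: invented loci, allele sequences and ST labels.
LOCI = ("locus1", "locus2", "locus3")

ALLELES = {
    "locus1": {"ATGGCA": 1, "ATGGCG": 2},
    "locus2": {"TTGACC": 1, "TTGATC": 2},
    "locus3": {"GGCTAA": 1, "GGCTGA": 2},
}

PROFILES = {
    (1, 1, 1): "ST-1",
    (1, 2, 1): "ST-2",
    (2, 2, 2): "ST-3",
}


def allelic_profile(isolate_sequences):
    """Map each locus sequence to its allele number (None if unknown)."""
    return tuple(
        ALLELES[locus].get(isolate_sequences[locus]) for locus in LOCI
    )


def sequence_type(isolate_sequences):
    """Look up the ST for an isolate; unknown profiles are flagged as novel."""
    return PROFILES.get(allelic_profile(isolate_sequences), "novel")


def shared_alleles(profile_a, profile_b):
    """Number of loci at which two profiles carry the same known allele."""
    return sum(
        a == b and a is not None for a, b in zip(profile_a, profile_b)
    )


known = {"locus1": "ATGGCA", "locus2": "TTGATC", "locus3": "GGCTAA"}
# A single SNP in locus3 yields a sequence absent from the allele table.
variant = {"locus1": "ATGGCA", "locus2": "TTGATC", "locus3": "GGCTCA"}

print(allelic_profile(known), sequence_type(known))
print(allelic_profile(variant), sequence_type(variant))
print(shared_alleles(allelic_profile(known), allelic_profile(variant)))
```

Because any single nucleotide change creates a new allele, MLST counts differing loci rather than differing bases, which is one reason its resolution is limited for recently diverged isolates.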

Table 2 shows the housekeeping genes used to determine the ST of various bacterial species. For Acinetobacter baumannii, two MLST schemes have been described, one by Bartual et al. [20] in Oxford (UK) and the second by researchers from the Institut Pasteur. The two schemes correlate well [21,22,23,24]. For Escherichia coli, as many as three schemes have been described: Pasteur’s (France) [25], Warwick’s [26] and the rarely used “st7” scheme [27]. Again, there is good correlation between the different nomenclatures despite the obvious use of different categorisation codes [28,29,30].

Table 2 Definition of the multi-locus sequence typing (MLST) schemes for major species involved in HAI

Although typing of C. difficile strains is usually done by ribotyping [31], more and more scientists use MLST for this species as well [32]. Correlation between ribotype (RT) and ST results is possible (Table 3). MLST is a highly discriminative method for typing microorganisms and has been applied successfully for the epidemiological characterisation of a variety of clinically important bacterial pathogens. This method offers users near-perfect stability and transferability of data. As described above, the PubMLST database allows for international sharing of results and a worldwide view of the distribution of strains by simply tracing their STs. The drawback is that costs are high due to the need for DNA sequencing [34].

Table 3 Correlation between ribotypes and sequence types (STs) for Clostridium difficile [31, 33]

Major clones involved in HAI outbreaks

Bibliographic research was performed using targeted PubMed searches (September 2017) to better understand the involvement of bacterial species in HAI events and identify the ribotyping- and serotyping-defined major clones involved in local and more extensive outbreaks. In Table 4, we present both the most and least frequently reported STs and ribotypes involved in HAI and their worldwide geographic distribution on a per-publication basis. We included data from the German outbreaks database (Institute for Medical Microbiology and Hospital Epidemiology, Medical School Hannover; Schülke & Mayr Company; Institute for Hygiene and Environmental Medicine, University Medicine of Berlin). This is an online database documenting outbreaks that took place in the healthcare setting and were published in the peer-reviewed scientific literature. This database currently contains 3536 outbreaks published from 1936 to 2016, involving 305 different pathogens. No results are presented for Serratia marcescens and Enterobacter aerogenes (no formally accepted MLST scheme exists and only limited numbers of outbreaks have been documented). Note that the data displayed in Table 4 have been calculated using numbers of publications, assuming that these approximately equal the numbers of outbreak events. The table does not provide exact information on absolute numbers of bacterial strains but sketches a more relative global picture. For that reason, we also decided not to use the bibliographic data to depict historic timelines. The table does, however, underscore the fact that, if one uses a universal typing language, a global picture of the clonal dissemination of outbreak-related bacterial strains can be sketched. The use of such a language, which is feasible with ribotyping and serotyping, should be core to future developments using alternative technologies, including WGS.

Table 4 The most and least frequently reported sequence types (STs) and ribotypes (RTs) with their worldwide geographic distribution for important species involved in HAIs. (a) Representation by order of the most and least reported STs and RTs. (b) The geographic distribution represented by a colour code for the different STs and RTs in Europe and Africa, America, Asia and Oceania. Data were collected by review of the scientific literature in the field of HAIs; the number of publications used for this synthesis is shown as n. * indicates unique STs and RTs reported in a continent deriving from different studies and distinct outbreaks

Typing by WGS

WGS technology allows for the precise and rapid sequencing of the full genome of bacteria and defines, essentially, the ultimate global typing language using solely the four base characters: G, A, T and C. Different techniques are being continuously developed and refined by technology leaders in the sequencing market, including Illumina, PacBio, Oxford Nanopore and several others. For epidemiological investigation of potential outbreak situations, most clinicians and laboratory scientists combine clinical data related to the strains and the patient with the results obtained by conventional typing methods. A combination of such data and high-resolution WGS has been shown to be valuable for fine-tuning the investigations into outbreaks for many, if not all, bacterial pathogens [3]. This technology is being transformed into an automated process and will offer accurate and reproducible digital data with very high discriminatory power. Data can be shared between laboratories and easily interpreted by software analysis. WGS analyses provide comprehensive information on genes, resistance markers, virulence factors and global genomic characteristics, including all types of mutations and detailed differences between the genomes of different but possibly very closely related strains within a microbial species [35]. To obtain this important information, the need for (bio-)informatics expertise is obvious, since there are many possible ways to extract comprehensive information from massive amounts of raw data. Quality filtering, assembly and many other steps are needed before a sequenced genome can be used. It would be beyond the scope of the present manuscript to survey all the published applications of WGS for analysis of the epidemiology of the HAI pathogens. However, we will briefly summarise the current state of affairs in this field for C. difficile, which presents an important and clinically highly relevant example.
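One common way to express "detailed differences between genomes" is a pairwise SNP distance computed on aligned sequences. The following is a minimal sketch under simplifying assumptions: the sequences are already aligned to equal length, positions with ambiguous bases or gaps are simply skipped, and the four short isolate sequences are invented for illustration.

```python
from itertools import combinations

VALID_BASES = set("ACGT")


def snp_distance(seq_a, seq_b):
    """Count positions where two aligned sequences carry different bases.

    Positions with an ambiguous base (e.g. N) or a gap in either sequence
    are ignored rather than counted as differences.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(
        1
        for a, b in zip(seq_a.upper(), seq_b.upper())
        if a in VALID_BASES and b in VALID_BASES and a != b
    )


def distance_matrix(alignment):
    """Pairwise SNP distances for a {name: aligned sequence} mapping."""
    return {
        (x, y): snp_distance(alignment[x], alignment[y])
        for x, y in combinations(sorted(alignment), 2)
    }


# Hypothetical aligned fragments (invented; real alignments span megabases).
alignment = {
    "isolate_A": "ACGTACGTACGT",
    "isolate_B": "ACGTACGTACGA",
    "isolate_C": "ACGTTCGNACGA",
    "isolate_D": "TCGAACCTAGGT",
}

for pair, distance in distance_matrix(alignment).items():
    print(pair, distance)
```

In practice, the alignment would come from read mapping or assembly after quality filtering, and recombinant regions are often masked before distances are interpreted.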

After comparative technology papers [36], the first papers on the epidemiological tracing of C. difficile using WGS started appearing in 2013. Eyre et al. [37] developed interpretational software not only for genomic comparison but also for the assessment of mixed infections. Mixed infections were reliably identified and new strain transmission events were documented. Data presented by the same group [38] showed that, besides patient-to-patient transmission, many nosocomial infections were due to as yet unidentified reservoirs. A little later, it was also shown that WGS made it easier to define overall genetic diversity in more detail [39,40,41] and to distinguish relapsing infection from re-infection [42]. WGS also allowed for the detection of circulating sub-clones and straightforward confirmation of ongoing, longitudinal, inter-institutional, sometimes geographically dispersed C. difficile outbreaks [43,44,45,46,47]. Finally, Stoesser et al. [48] demonstrated that WGS may lead to the identification of new reservoirs (pets in this particular case), reveal overlap between different community-based reservoirs and adequately highlight colonisation events and transfer of strains.
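How SNP distances might be turned into putative transmission clusters, or used to separate relapse from re-infection, can be sketched as follows. This is a hypothetical illustration rather than the method of the cited studies: the single-linkage rule, the threshold of 2 SNPs, the patient labels and all distances are assumptions, and real cut-offs depend on the species, its mutation rate and the sampling interval.

```python
def transmission_clusters(isolates, distances, threshold):
    """Single-linkage clusters: link isolates whose distance <= threshold."""
    parent = {name: name for name in isolates}

    def find(name):
        # Follow parent links to the cluster root, shortening paths as we go.
        while parent[name] != name:
            parent[name] = parent[parent[name]]
            name = parent[name]
        return name

    for (a, b), distance in distances.items():
        if distance <= threshold:
            parent[find(a)] = find(b)

    clusters = {}
    for name in isolates:
        clusters.setdefault(find(name), []).append(name)
    return sorted(sorted(members) for members in clusters.values())


def classify_recurrence(snp_distance, relapse_max):
    """Label a second episode by its distance to the first-episode isolate."""
    return "relapse" if snp_distance <= relapse_max else "re-infection"


# Hypothetical pairwise SNP distances between isolates from four patients.
isolates = ["P1", "P2", "P3", "P4"]
distances = {
    ("P1", "P2"): 1,
    ("P1", "P3"): 2,
    ("P2", "P3"): 1,
    ("P1", "P4"): 40,
    ("P2", "P4"): 41,
    ("P3", "P4"): 39,
}

print(transmission_clusters(isolates, distances, threshold=2))
print(classify_recurrence(1, relapse_max=2))
print(classify_recurrence(40, relapse_max=2))
```

An isolate that clusters with no other sampled case, such as P4 here, is the kind of observation that points towards an unsampled source or reservoir, although genomic distance alone does not prove or exclude transmission.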

Pathogenicity markers can be catalogued based on WGS as well [49]. C. difficile virulence is largely attributed to its toxins. However, upon the emergence of more virulent clones, WGS showed that additional factors may drive invasive potential as well. Quesada-Gómez et al. [50] demonstrated that a particularly virulent type harboured gyrA resistance mutations, 10% more predicted genes including phages and mobile genetic elements and a deletion in the tcdC gene. The assessment of phenotypes based on genome sequences was investigated in a large study from the UK [51]. A combination of clinical, resistance and genomic data led to the conclusion that restriction of fluoroquinolone usage resulted in a decline in the regional incidence of certain clonal types of C. difficile. Confirmatory observations were published by Caspers et al. [52] using alternative models. Data such as those presented in this manuscript should be used to fine-tune local and regional antibiotic stewardship.

The example of C. difficile WGS highlights the added value that WGS has for more refined studies of the colonisation, infection, epidemiology and molecular characterisation of medically important bacterial pathogens. This technology will continue to have an increasing impact on clinical decision-making.


Classical typing technologies have led the way towards a universal typing language. Serotyping and ribotyping have been instrumental, although the language as such was still rather “primitive”. Even though it uses a four-letter alphabet only, whole-genome sequencing (WGS) has the clear capacity to develop a better and even more broadly accepted universal dialect. It has provided spectacular opportunities for the investigation of organisms causing hospital-acquired infections (HAIs) through resistance prevalence studies in combination with genomic epidemiology. Despite this progress, one has to realise that, for many clinicians and clinical microbiologists, such methods will remain out of reach for the decade to come. This obliges us to maintain a close link between data generated using more classical technologies and those generated by WGS. In the end, only the close “linguistic” coordination of datasets will provide global insights into the international dissemination of pathogens of concern. Special attention should be given to those pathogens that carry multi-antibiotic resistance markers in order to bring non-treatable infections to a full stop.