Introduction

The advent and continuous improvement of sequencing technologies, especially the shift to next-generation sequencing (NGS), provide many opportunities for the management of infectious diseases. Sequence information can identify a pathogen and its specific characteristics, as well as its relatedness to other pathogens. Compared to Sanger sequencing, NGS technologies offer a faster and cheaper way to sequence large numbers of nucleotides. As such, NGS can be viewed as a tool that makes whole-genome sequencing (WGS) accessible [1]. In contrast to genotyping, where only small parts of the genome are assessed, WGS characterizes the entire genome of infectious isolates, thereby combining maximal strain discrimination with the ability to link the genotype to clinically and epidemiologically relevant phenotypes [2, 3]. Sequence variations, such as single-nucleotide polymorphisms (SNPs), insertions/deletions, and accessory genes, can be identified through bioinformatics analyses [1]. Highly discriminatory subtyping from WGS data is based on either SNPs or allelic variation [5]. With decreasing costs and increasing laboratory and bioinformatics capacities, we are currently transitioning to genomic epidemiology, as whole pathogen genomes are becoming available at the population level [3]. Adding genomic data to epidemiological analyses of infectious diseases greatly benefits disease prevention and control [1, 2]. Over the last decade, NGS has moved beyond research settings and is being rapidly translated into public health practice [4, 5].
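To make the two kinds of distance concrete, a minimal Python sketch follows. It assumes that isolates have already been reduced either to aligned core-genome sequences (for SNP-based comparison) or to allele numbers per locus (for allele-based comparison, as in core-genome MLST). The function names and toy inputs are hypothetical, and real pipelines add read mapping, variant filtering, and recombination handling that are omitted here.

```python
def snp_distance(seq_a: str, seq_b: str) -> int:
    """Count positions at which two aligned core-genome sequences differ.

    Positions where either sequence has an ambiguous base ('N') or a gap ('-')
    are skipped, one common (but not universal) convention.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    ignore = {"N", "-"}
    return sum(
        1
        for a, b in zip(seq_a.upper(), seq_b.upper())
        if a != b and a not in ignore and b not in ignore
    )


def allelic_distance(profile_a: dict[str, int], profile_b: dict[str, int]) -> int:
    """Count loci with different allele numbers (cgMLST-style comparison).

    Loci absent from either profile are skipped rather than counted.
    """
    shared = profile_a.keys() & profile_b.keys()
    return sum(1 for locus in shared if profile_a[locus] != profile_b[locus])


if __name__ == "__main__":
    # One SNP (position 5); the 'N' at position 9 is ignored.
    print(snp_distance("ACGTACGTAC", "ACGTTCGTNC"))  # 1
    # locus2 differs; locus3 and locus4 are not shared and are skipped.
    print(allelic_distance({"locus1": 1, "locus2": 4, "locus3": 7},
                           {"locus1": 1, "locus2": 5, "locus4": 2}))  # 1
```

Skipping ambiguous positions and missing loci, rather than counting them as differences, is only one choice; pipelines differ here, which is one reason distances from different methods are not directly interchangeable.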

This review focuses on the applications of NGS to the population-level management of bacterial infections (Fig. 1). This includes the use of WGS to study the relatedness of isolates in order to understand transmission dynamics, to detect and control outbreaks, to monitor trends, and to identify the emergence of new threats. More specifically, this review discusses the applications of pathogen genomics that lead to actionable results from a public health point of view.

Fig. 1

Focus of the scoping review on pathogen genomics for public health practice. Different domains in the field of infectious diseases require access to the same pathogen genomic data. Whole-genome sequencing (WGS) can inform and improve individual patient care by identifying the species, determining its pathogenic potential, and testing its susceptibility to antimicrobial drugs. WGS also provides data for public health surveillance on the relatedness of the pathogen to other strains, in order to investigate transmission routes, monitor trends over time, and identify and control outbreaks and new threats. Research is a knowledge driver, providing the other domains with reference data, methods, and a deeper understanding of the underlying biological mechanisms. The focus of this scoping review is on the use of WGS as a public health tool, i.e., at the level of the population.

Public health activities related to infectious diseases can be classified as outbreak investigations, control-oriented surveillance, and strategy-oriented surveillance [3, 6]. The main objective of an outbreak investigation is to investigate the possible source(s) of infection and to implement effective and appropriate control measures to stop its further spread. Outbreak investigations are often hypothesis-driven and a reaction to a sudden increase in the number of cases [7]. In contrast, surveillance is the systematic collection, analysis, and dissemination of data for the planning, implementation, and evaluation of public health programs [8]. Baker et al. [6] differentiate between control-oriented and strategy-oriented surveillance, thereby providing a meaningful way to categorize the applications of molecular/genomic tools for disease surveillance [9]. This framework has also been adopted in the European Centre for Disease Prevention and Control (ECDC) roadmap for integration of molecular and genomic typing into European level surveillance and epidemic preparedness [10]. As defined by Baker et al. [6], the purpose of control-oriented surveillance is “to identify each occurrence of a particular disease, hazard, or other health-related event that requires a specific response, and to support the delivery of an effective intervention”. For example, control-oriented surveillance aims at the detection of outbreaks that require a specific response. Early outbreak detection can be achieved by prospectively genotyping as many consecutive cases in a population as possible to identify clusters of clonally linked isolates [3]. Baker et al. state that strategy-oriented surveillance aims “to provide information to support prevention strategies to reduce population risk” [6]. The aim is often to monitor long-term changes in epidemiology over larger geographic and population scales, requiring study designs that have a high degree of representativeness [9]. 
Strategy-oriented surveillance can, for example, detect the emergence of strains with enhanced virulence or drug resistance, help to identify risk factors associated with the transmission of specific strains, or predict the effectiveness of control programs such as vaccination campaigns [3].

Collective experience on the use of pathogen genomics for routine public health practice is scattered across the literature. WGS has been frequently used to aid outbreak investigations and routine surveillance at various levels (i.e., local, national, and international) and in different temporal scenarios (i.e., retrospective and prospective). The aim of this scoping review is to identify and characterize the recent literature concerning the application of NGS for public health practice, by (1) conducting a systematic search of the published literature, (2) mapping the characteristics of the identified studies, (3) describing the range of applications identified, and (4) assessing the added value, challenges, and requirements related to its implementation. The purpose is to provide an epidemiologist’s perspective on the use of bacterial pathogen genomics in a public health context, complementing previous reviews that focused on technical aspects, bioinformatics, diagnostics, or microbiology (i.e., the perspective of microbiologists, bioinformaticians, and clinicians) [2, 5, 11,12,13,14,15,16]. This review aims to summarize the experience gained and use it to further advance the implementation of pathogen genomics in routine public health.

Methods

A scoping review methodology was chosen to provide an overview of the nature and extent of the literature on this topic via systematically searching, selecting, and summarizing evidence, rather than a traditional systematic review that often focuses on specific outcomes [17]. This review was conducted according to the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines [see Additional file 1] [18], adapted for use in a scoping review as appropriate, and adhering to the methodology outlined in The Joanna Briggs Institute Manual for Scoping reviews [19]. In addition, the framework outlined by Arksey and O’Malley [20] in their methodological paper on scoping reviews was followed. A scoping review protocol was developed a priori to ensure reproducibility and transparency of the review methods [see Additional file 2].

Eligibility criteria

The inclusion criteria were organized following the PCC (Population, Concept, and Context) elements. Studies had to include at least two individuals with a bacterial infection, and NGS had to be applied to the bacterial isolates. Consequently, non-human studies, case reports involving only one patient, and studies focusing on the host genome were excluded. Studies had to describe the application of NGS from a public health (i.e., population-level) perspective. Therefore, the main study aims had to fall within the context of an outbreak investigation, control-oriented surveillance, or strategy-oriented surveillance. Studies focusing on technical aspects, applying NGS solely for individual patient care, or using NGS primarily for research purposes were excluded. Further, only studies applying NGS within a real-life public health setting, as opposed to an experimental setting, and producing an output that can be directly translated into actionable results to benefit public health, were included. This also included proof-of-concept studies mimicking real-life public health situations. Studies published between January 2015 and September 2018 were included to capture the most current activities in this fast-evolving field. A full list of inclusion and exclusion criteria and a decision tree for study selection are provided in the additional material [see Additional files 3 and 4].

Searching

The PubMed search engine was used to identify manuscripts in English published between January 1, 2015, and September 4, 2018. In addition, reference lists of included studies and other reviews were examined (i.e., backward snowballing). Forward snowballing was also performed by using the Google Scholar search engine to identify relevant documents that cited the included studies.

Three domains were included in the search using the PubMed search engine: “bacterial infections,” “next generation sequencing,” and “public health.” Each domain had several search terms. Free text search and MeSH term search were combined. The search was pre-tested to determine the most effective balance of sensitivity and specificity in the identification of potentially relevant citations. The ability of the electronic search to capture all relevant primary research was verified by hand-searching reference lists from other reviews on the topic. The final search string is reported in the additional material [see Additional file 5]. The initial search was conducted on March 24, 2018, and was updated on September 4, 2018, selecting the date range “March 1, 2018, to September 4, 2018.”

Screening

A first screening phase based on titles and abstracts was conducted [NVG], and out-of-topic studies were excluded. A second screening stage based on the full texts was conducted in duplicate by two independent reviewers [NVG, TD] using a standardized eligibility form. If no consensus could be reached between the two reviewers, a third reviewer [NB] helped to resolve the disagreement.

Data extraction

Data extraction was performed by [NVG] using an extraction form that was designed for the purpose of this review through an iterative process [see Additional file 6]. Information regarding the studied pathogen, country, year of publication, number of isolates, sampling fraction and time orientation of the NGS analyses, setting, public health application, study aim(s), and level of implementation was extracted from each included study. In addition, key findings related to the use of NGS were summarized for every study.

Data synthesis

The main characteristics were summarized in tabular form (as per data extraction pro-forma as well as a numerical summary), with an accompanying narrative summary, based on the key findings extracted from every study, describing how the results relate to the review objective and question. The studies were categorized based on the public health application and study aim in order to structure the narrative summary. For the study aim, multiple classifications per article were allowed.

Results

Search results

The study selection process is summarized as a PRISMA flow diagram in Fig. 2. A total of 1549 studies were identified through the initial database search, hand searching, reference checking, and other reviews. The search was updated in September 2018, and an additional 142 articles were identified (of which 19 were included). A total of 275 studies were included in the review.

Fig. 2

Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) flow diagram

Study characteristics

Out of the 275 included articles, 164 (60%) were outbreak investigations, 70 (25%) focused on strategy-oriented surveillance, and 41 (15%) on control-oriented surveillance. Almost all studies (274 out of 275) applied NGS technologies in the context of WGS, which will consequently be the focus for the remainder of this review. Table 1 gives an overview of the general characteristics of the included studies. Table 2 gives a deeper insight into the different study aims. The completed extraction form describing the characteristics of all included studies is available in the additional material [see Additional file 7].

Table 1 Characteristics of included studies (Jan 2015 to Sep 2018: n = 275)
Table 2 Study aims (applications of NGS) of included studies (Jan 2015 to Sep 2018: n = 275)

Outbreak investigations of food- and waterborne pathogens mainly focused on source tracing (n = 78, 48%), in order to identify and eliminate the source as quickly as possible. In case of person-to-person transmission, the outbreak investigations focused on understanding transmission dynamics and revealing the spread of the pathogen in the population, in order to interrupt transmission chains and prevent further spread (n = 85, 52%). Eleven studies (7%) reported the use of WGS to provide feedback on key phenotypic attributes, such as virulence genes or antibiotic resistance, in order to inform outbreak management. The majority of the outbreak investigations were performed retrospectively (n = 97, 59%), i.e., as a proof-of-concept and/or to improve future preparedness by addressing a specific public health problem from the past. Fifty-eight studies (35%) applied WGS in quasi-real time, i.e., directly impacting the ongoing outbreak. In the majority of outbreak investigations, WGS was used on a subset of available samples (n = 107, 65%), for example, to further differentiate between isolates assigned to the same subtype by conventional characterization methods. Outbreak investigations using WGS were mainly applied to Staphylococcus aureus (n = 12) and multidrug-resistant (MDR) Gram-negative bacteria (n = 27) in a hospital setting, to Salmonella spp. (n = 33) and Listeria monocytogenes (n = 13) during foodborne outbreaks, and to Mycobacterium tuberculosis (n = 14).

Studies were mainly classified as control-oriented when they aimed to detect events requiring immediate action (e.g., early outbreak detection) or when they were initiated in response to a specific public health problem. These studies were performed retrospectively (n = 18, 44%), i.e., as a proof-of-concept, and/or prospectively (n = 25, 61%). Six studies (15%) reported a nation-wide implementation of prospective genotyping into routine public health practice.

Strategy-oriented studies are generally conducted over longer time periods and larger geographical areas, in order to better understand the behavior of a given pathogen within a population and to plan future prevention and control programs. Twenty-nine studies (41%) applied WGS to assess the impact of prevention and control programs, mostly to evaluate vaccination programs. Genomics-informed strategy-oriented surveillance has also frequently been applied to monitor long-term changes over larger geographic and population scales (n = 38, 54%), in order to detect the emergence of strains with enhanced virulence, monitor drug resistance, detect cross-border transmission events, or identify zoonotic pathogens. Studies describing the use of WGS for strategy-oriented surveillance were often performed retrospectively (n = 59, 84%) on a historical subset of samples to answer a specific public health question.

For control- and strategy-oriented surveillance activities, WGS was mainly applied to S. aureus (n = 23), MDR Gram-negative bacteria (n = 16), Neisseria meningitidis (n = 14), Salmonella spp. (n = 13), and M. tuberculosis (n = 9).

Results of individual studies

Outbreak investigations

Compared to conventional typing methods that use only a small fraction of the genome (e.g., pulsed-field gel electrophoresis [PFGE], multiple-locus variable-number tandem-repeat analysis [MLVA], and multi-locus sequence typing [MLST]), WGS provides increased resolution for case ascertainment and for linking possible sources to cases during outbreak investigations of food- and waterborne pathogens. The discriminatory power of WGS allows clusters of isolates that are indistinguishable by these typing methods, but genetically heterogeneous, to be split into smaller groups of cases that are more likely to originate from a common source [21,22,23,24,25,26,27,28,29,30,31]. The increased resolution of WGS is particularly useful for clonal pathogens or serotypes that show little genetic variation [32,33,34,35]. Investigations applying WGS during the course of an outbreak were able to identify the likely source of infection and rapidly implement control measures to stop further spread [36,37,38,39,40,41,42]. On several occasions, WGS data combined with epidemiological investigations enabled food authorities to intervene based on strong evidence and to recall potentially contaminated food in a timely manner [43,44,45]. Moreover, the digital and universal nature of WGS allows data to be exchanged and analyzed across countries during multi-national outbreaks [34, 44, 46,47,48,49]. However, this requires internationally standardized protocols and nomenclature, as well as meaningful interpretation guidelines [32]. In addition to rapid source tracing, WGS provides insights into the virulome of certain pathogenic clades [50,51,52,53]. For example, real-time WGS can generate timely information on the presence of virulence genes during a Shiga toxin-producing Escherichia coli (STEC) outbreak [51]. WGS was also often used to guide nosocomial outbreak investigations.
Several studies reported that transmission events suspected on the basis of epidemiological data alone, or of low-resolution strain typing methods such as antibiogram profiles, were disproved once WGS data were integrated [54,55,56,57]. The ability to quickly exclude a patient or potential source during an outbreak investigation is as important for infection control as confirming that isolates are related [39], as it prevents inappropriate, costly, and ineffective control measures [55, 56].

The most frequently highlighted issue is that WGS, like conventional typing methods, cannot stand on its own: epidemiological data (including time, place, and exposure data) should complement the WGS results to identify a common source or link cases during outbreak investigations [24, 58,59,60,61,62,63,64]. Relying on WGS data alone could lead to false conclusions, since epidemiologically unrelated isolates can be highly similar at the SNP level [58, 65,66,67]. Another reported issue was the potential misinterpretation of isolate relationships given the diversity of isolates that can be found within a single host (e.g., following long-term carriage) or environmental reservoir. Several studies stressed the importance of accounting for this “cloud of diversity” by increasing the number of samples taken from the suspected source [55, 66,67,68,69,70,71,72,73]. On the other hand, within-host diversity can help to identify long-term carriers [74].

Control-oriented surveillance

Several public health agencies launched pilot projects to implement WGS in routine practice for control-oriented surveillance purposes, such as early outbreak detection, and evaluated its performance [75,76,77]. In 2013, a multi-agency collaboration in the USA prospectively performed WGS on all available L. monocytogenes isolates collected from patients, food, and food processing environments. Integrating WGS data into their surveillance activities led to the detection of more outbreak clusters. In addition, combining WGS data with robust epidemiological information solved more outbreaks than had previously been solved with PFGE [77]. Retrospective comparisons during the transition period from traditional to WGS-based characterization were not considered an obstacle, since traditional typing information for L. monocytogenes can be accurately extracted from WGS data [75]. For STEC O157, too, the added value of WGS as a tool to inform national surveillance was demonstrated: it enabled early and accurate outbreak detection [78], as well as the extraction of information on important virulence determinants and the monitoring of emerging hyper-virulent strains [79, 80]. Similarly, a prospective trial in Australia that sequenced all Salmonella Typhimurium isolates concurrently with conventional MLVA typing demonstrated that the higher resolution of WGS led to better source attribution and more targeted epidemiological investigations [81]. However, several challenges related to the interpretation of WGS data remain. As for outbreak investigations, several studies reported that WGS results should not be interpreted on their own. A single SNP cutoff cannot consistently predict whether isolates are epidemiologically linked [77, 81,82,83].
Field data (e.g., demographic data or exposure histories), however, are only valuable when organized in a standardized format, which requires a more systematic approach to epidemiological data collection [84]. Another reported implementation barrier was the limited capacity of public health units to understand and use WGS data, implying an increased need for collaboration and exchange of expertise between microbiologists, bioinformaticians, and epidemiologists [81].
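As a hedged illustration of prospective cluster detection, the sketch below links two isolates only when they are both genetically close (a SNP distance at or below a cutoff) and close in time (sampling dates within a window), and reports the connected groups as candidate clusters. The cutoffs, the dictionary layout, and the function name are hypothetical; consistent with the studies above, the output is meant to flag signals for epidemiological follow-up, not to confirm epidemiological links.

```python
from datetime import date
from itertools import combinations


def detect_clusters(isolates, snp_dist, max_snps, max_days):
    """Return candidate clusters as sets of isolate IDs (single linkage).

    isolates: dict isolate ID -> sampling date
    snp_dist: dict frozenset({id1, id2}) -> pairwise SNP distance
    max_snps, max_days: hypothetical cutoffs chosen by the user
    """
    parent = {iso: iso for iso in isolates}

    def find(x):
        # Follow parent pointers to the group representative (path halving).
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in combinations(isolates, 2):
        genetically_close = snp_dist[frozenset((a, b))] <= max_snps
        temporally_close = abs((isolates[a] - isolates[b]).days) <= max_days
        if genetically_close and temporally_close:
            parent[find(a)] = find(b)  # merge the two groups

    groups = {}
    for iso in isolates:
        groups.setdefault(find(iso), set()).add(iso)
    # Singletons are not reported as clusters.
    return [members for members in groups.values() if len(members) > 1]


if __name__ == "__main__":
    sampled = {
        "A": date(2018, 3, 1),
        "B": date(2018, 3, 10),
        "C": date(2018, 3, 12),
        "D": date(2017, 1, 5),
    }
    distances = {
        frozenset(("A", "B")): 2, frozenset(("A", "C")): 5,
        frozenset(("A", "D")): 1, frozenset(("B", "C")): 3,
        frozenset(("B", "D")): 3, frozenset(("C", "D")): 4,
    }
    print(detect_clusters(sampled, distances, max_snps=3, max_days=30))
```

In this toy data set, isolate D is only one SNP from A but was sampled more than a year earlier, so it is not linked; widening the time window to cover that gap merges it into the cluster. The result is sensitive to both cutoffs, which illustrates why no single value fits all pathogens and settings.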

NGS has been applied to monitor antimicrobial resistance in hospital-acquired infections. Genotypic prediction of resistance in S. aureus appears at least as reliable as routine phenotypic testing. However, genotype-based prediction of the phenotype cannot yet replace phenotypic testing, as the present understanding of the genetic basis of resistance and the associated databases are not comprehensive [85]. This limitation was illustrated by a WGS-based surveillance of antimicrobial resistance determinants in Klebsiella pneumoniae, in which phenotypically determined resistance was higher than sequence-based resistance [86].
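The comparison described above can be summarized with the usual categorical-agreement metrics, treating phenotypic susceptibility testing as the reference. The sketch below assumes that each isolate has one phenotypic call and one genotype-based prediction for a given drug, both coded 'R' or 'S'; the function name and data layout are hypothetical. A very major error (phenotype R, prediction S) is the pattern expected when a resistance mechanism is missing from the reference database, which makes phenotypic resistance exceed sequence-based resistance, as reported for K. pneumoniae.

```python
def compare_amr_predictions(pairs):
    """Summarize agreement between phenotypic AST and WGS-based prediction.

    pairs: list of (phenotype, genotype_prediction) tuples, each 'R' or 'S'.
    The phenotype is treated as the reference:
      very major error (VME): phenotype R, genotype predicts S
      major error (ME):       phenotype S, genotype predicts R
    """
    if not pairs:
        raise ValueError("no isolates to compare")
    n = len(pairs)
    agree = sum(1 for pheno, geno in pairs if pheno == geno)
    n_res = sum(1 for pheno, _ in pairs if pheno == "R")
    n_sus = n - n_res
    vme = sum(1 for pheno, geno in pairs if pheno == "R" and geno == "S")
    me = sum(1 for pheno, geno in pairs if pheno == "S" and geno == "R")
    return {
        "categorical_agreement": agree / n,
        "very_major_error_rate": vme / n_res if n_res else None,
        "major_error_rate": me / n_sus if n_sus else None,
        "phenotypic_resistance": n_res / n,
        "predicted_resistance": sum(1 for _, geno in pairs if geno == "R") / n,
    }


if __name__ == "__main__":
    result = compare_amr_predictions([("R", "R"), ("R", "S"), ("S", "S"), ("S", "S")])
    print(result)
```

With this toy input, categorical agreement is 0.75, half of the phenotypically resistant isolates are missed by the genotype (very major error rate 0.5), and observed phenotypic resistance (0.5) exceeds predicted resistance (0.25).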

WGS has proven to be a more reliable tool for predicting epidemiological links between tuberculosis cases than conventional variable-number tandem repeat (VNTR) genotyping, which often leads to false cluster identification [87]. WGS may be particularly useful for identifying tuberculosis outbreaks in settings where genetic diversity is expected to be low, such as geographically restricted M. tuberculosis populations [88], genetically closely related genotypes imported from a high-incidence region [89], or highly monomorphic M. tuberculosis lineages [90].

Strategy-oriented surveillance

Several studies showed the value of WGS in understanding the impact of vaccination on circulating pathogen populations, potentially resulting in antigenic drift to escape vaccine-mediated immune selective pressure (i.e., strain replacement) [91,92,93,94,95,96,97,98,99,100]. Such insights are gained by comparing the incidence of infections caused by vaccine-targeted serotypes before and after the introduction of the vaccine, which can also reveal the proliferation of non-vaccine-targeted strains. The adoption of WGS methods to monitor pathogen populations during immunization programs has proven useful and could identify differential impacts on distinct serotypes [91]. Genomic surveillance provides the resolution required to develop targeted interventions [92] and to predict the impact of implementing a vaccination program in a given population [101]. The routine use of WGS for surveillance purposes can also inform antibiotic stewardship. One advantage of introducing WGS to inform treatment guidelines is the ability to identify genetically linked resistance that can be co-selected by multiple drugs, as opposed to phenotypic resistance rates that consider each antimicrobial class as a discrete unit [102]. In addition, resistance rates can vary significantly by clone, implying that monitoring changes in population structure using WGS is useful to guide antibiotic usage policies [103]. Besides informing vaccination programs and antibiotic stewardship, WGS can provide a detailed understanding of transmission dynamics within and between healthcare settings, the community, and individual households, to appropriately direct control programs and decolonization strategies [104,105,106,107,108,109,110,111,112].
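Once each isolate has been assigned a serotype (which WGS can provide in silico), the before-and-after comparison described above reduces to a simple incidence calculation. The sketch below is a minimal illustration assuming hypothetical serotype labels, case counts, and person-years at risk; it computes incidence per 100,000 person-years and the post-versus-pre incidence rate ratio (IRR) separately for vaccine-targeted and non-targeted serotypes. It ignores confidence intervals, secular trends, and changes in surveillance practice, which a real evaluation would need to address.

```python
def incidence_per_100k(cases, person_years):
    """Incidence per 100,000 person-years."""
    return cases * 100_000 / person_years


def vaccine_impact(pre_counts, post_counts, pre_py, post_py, vaccine_types):
    """Compare incidence before and after vaccine introduction.

    pre_counts, post_counts: dict serotype -> number of cases in each period
    pre_py, post_py: person-years at risk in each period
    vaccine_types: set of serotypes targeted by the vaccine (hypothetical)
    """
    result = {}
    for label, targeted in (("vaccine_types", True), ("non_vaccine_types", False)):
        pre = sum(n for s, n in pre_counts.items() if (s in vaccine_types) == targeted)
        post = sum(n for s, n in post_counts.items() if (s in vaccine_types) == targeted)
        pre_inc = incidence_per_100k(pre, pre_py)
        post_inc = incidence_per_100k(post, post_py)
        result[label] = {
            "pre": pre_inc,
            "post": post_inc,
            # IRR is undefined when there were no cases in the baseline period.
            "irr": post_inc / pre_inc if pre_inc else None,
        }
    return result


if __name__ == "__main__":
    impact = vaccine_impact(
        pre_counts={"A": 40, "B": 10},
        post_counts={"A": 10, "B": 25},
        pre_py=1_000_000,
        post_py=1_000_000,
        vaccine_types={"A"},
    )
    print(impact["vaccine_types"]["irr"])      # 0.25
    print(impact["non_vaccine_types"]["irr"])  # 2.5
```

In this toy example, incidence of the vaccine-targeted type falls to a quarter while the non-targeted type rises 2.5-fold, the kind of pattern that would prompt a closer genomic look at possible strain replacement.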

Several studies highlighted the public health benefits of WGS-guided surveillance to monitor the spread of multidrug-resistant isolates and mobile genetic elements, including resistance-carrying transposons and plasmids that can transfer resistance between bacterial species [113,114,115,116,117,118]. The zoonotic potential of clinically relevant multidrug-resistant bacteria underscores the importance of the ‘One Health’ approach. In general, WGS-informed surveillance that includes isolates from different hosts and settings can provide evidence for interspecies transmission and focus control efforts on important reservoirs [119,120,121,122,123,124,125,126].

Reported challenges, issues, and obstacles from an epidemiologist’s perspective

Several studies reported challenges, issues, and obstacles (not exclusively) related to the integration of pathogen genomics within the activities of epidemiologists. For example, the inability to link laboratory and contextual data due to missing unique identifiers [127] impedes proper data integration. Further, contextual data was often missing, limited, or unstandardized [30, 34, 38, 49, 62, 71, 84, 94, 108, 128,129,130,131,132,133,134,135,136,137,138,139,140]. Regarding the sampling strategy, selection bias might arise when WGS has been performed on a small proportion of cases/isolates, severe cases are overrepresented among sequenced isolates, asymptomatic cases/carriers are excluded, or certain geographical regions or time periods are overrepresented [24, 30, 32, 52, 59, 85, 91, 93, 106, 116, 128, 141,142,143,144]. In addition, there might be insufficient statistical power to detect associations due to a low number of sequenced strains [81, 128, 144, 145].
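The dependency on a shared identifier can be illustrated with a small sketch that joins sequencing records to case reports and counts what cannot be linked or is missing. It assumes, hypothetically, that the laboratory export maps a case identifier to an isolate ID and that case reports are keyed by the same identifier; the function and field names are illustrative. Counts of unlinked isolates and missing fields are one simple way to quantify the data-quality problems listed above before any joint analysis.

```python
def link_records(sequences, case_reports, required_fields):
    """Join sequenced isolates to case reports on a shared case identifier.

    sequences: dict case ID -> isolate ID (laboratory side)
    case_reports: dict case ID -> dict of contextual fields (epidemiology side)
    required_fields: contextual variables expected for every linked case
    """
    linked = {}
    unlinked = []
    missing = {field: 0 for field in required_fields}
    for case_id, isolate_id in sequences.items():
        report = case_reports.get(case_id)
        if report is None:
            # A sequence exists, but no case report carries the same identifier.
            unlinked.append(isolate_id)
            continue
        for field in required_fields:
            if report.get(field) in (None, ""):
                missing[field] += 1
        linked[isolate_id] = report
    return {"linked": linked, "unlinked": unlinked, "missing_by_field": missing}


if __name__ == "__main__":
    summary = link_records(
        sequences={"C1": "S1", "C2": "S2", "C3": "S3"},
        case_reports={
            "C1": {"onset": "2018-05-01", "exposure": "chicken"},
            "C2": {"onset": "", "exposure": "salad"},
        },
        required_fields=["onset", "exposure"],
    )
    print(summary["unlinked"])          # ['S3']
    print(summary["missing_by_field"])  # {'onset': 1, 'exposure': 0}
```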

Discussion

Applications

Within the set of studies included in this scoping review, NGS was mainly used as a tool to provide information on the whole genome of the bacterial pathogens. WGS has useful applications in both outbreak investigations and surveillance activities. Outbreak investigations benefit from the increased resolution offered by WGS for case ascertainment, linking cases to the possible sources, defining transmission clusters, and providing rapid feedback on key phenotypic attributes of the involved pathogens. The application of WGS during control-oriented surveillance was mainly aimed at early outbreak detection by accurately defining transmission clusters among circulating strains, unraveling transmission chains to guide targeted interventions, and identifying the emergence of new threats. The use of WGS during strategy-oriented surveillance seemed particularly useful to assess the impact of prevention and control programs, such as vaccination campaigns and antibiotic stewardship.

Level of implementation

WGS has been increasingly used as a typing tool for comparing isolates during outbreak investigations. Most published outbreak investigations were retrospective (59%), but an increasing number applied WGS in quasi-real time. For surveillance activities of certain pathogens, there has been a shift from proof-of-concept studies to routine use of WGS. In several countries, public health agencies and regulatory bodies [e.g., Public Health England, US Centers for Disease Control and Prevention (CDC), European Food Safety Authority (EFSA)] have implemented WGS as a routine typing tool for surveillance activities of selected pathogens. Although WGS is (or was) mainly used in parallel with conventional testing in many European countries [76, 81, 82, 87, 146], countries such as Denmark, France, and the UK have already transitioned completely to WGS for certain pathogens [34, 43, 46, 76, 84, 147,148,149,150]. According to a survey conducted by ECDC, 20 countries (i.e., two thirds of European Union and European Economic Area countries) were routinely using WGS in 2017 for national surveillance of at least one human pathogen [151].

Added value

WGS has shown superior sensitivity and specificity for identifying transmission clusters compared to traditional subtyping methods such as PFGE, MLVA, and MLST, which often do not provide the resolution required to discriminate between outbreak-related and sporadic cases [21,22,23,24,25,26,27,28,29,30,31, 59]. Thanks to its greater specificity, WGS allows false hypotheses of transmission generated by conventional methods to be rejected, thereby avoiding inappropriate, costly, and ineffective follow-up investigations and control measures [39, 55, 56, 78, 152]. More targeted interventions can save resources at the health protection and local authority level [78]. The major advantage of implementing WGS during surveillance activities or outbreak investigations therefore lies in the higher resolution of the WGS output itself. It should be noted that the utility of WGS varies depending on the public health objective (discriminating between closely related individual cases during a point-source outbreak versus national surveillance purposes) [153], as well as on the population structure (high-incidence versus low-transmission settings) [88, 89] and the clonality of the pathogen [32,33,34,35, 90]. A stepwise implementation of typing methods has also proven to be a useful approach: conventional molecular methods can serve as a first-level classification to confine possible outbreak isolates, after which WGS can provide deeper and more comprehensive insights [130, 154].
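The stepwise approach can be sketched as a two-level filter. The example below assumes, hypothetically, that each isolate already carries a conventional type label (such as an MLVA profile) and that pairwise SNP distances are available only within each first-level group, mirroring the sequencing of a subset of isolates. Within each group, it keeps the isolates that lie within a chosen SNP cutoff of at least one other member. It narrows a conventional cluster down to candidates for investigation rather than performing formal cluster definition, and all names and values are illustrative.

```python
from collections import defaultdict


def stepwise_typing(conventional_type, snp_dist, max_snps):
    """Two-level typing sketch.

    Level 1: group isolates by a conventional type label.
    Level 2: within each group, keep isolates within max_snps of at least
    one other member of the same group.
    conventional_type: dict isolate ID -> conventional type label
    snp_dist: dict frozenset({id1, id2}) -> SNP distance (within-group pairs only)
    """
    groups = defaultdict(list)
    for iso, ctype in conventional_type.items():
        groups[ctype].append(iso)

    result = {}
    for ctype, members in groups.items():
        candidates = {
            iso
            for iso in members
            if any(snp_dist[frozenset((iso, other))] <= max_snps
                   for other in members if other != iso)
        }
        result[ctype] = {"level1_size": len(members), "level2_candidates": candidates}
    return result


if __name__ == "__main__":
    types = {"I1": "MLVA-1", "I2": "MLVA-1", "I3": "MLVA-1",
             "I4": "MLVA-1", "I5": "MLVA-2"}
    dist = {
        frozenset(("I1", "I2")): 1, frozenset(("I1", "I3")): 40,
        frozenset(("I1", "I4")): 38, frozenset(("I2", "I3")): 41,
        frozenset(("I2", "I4")): 39, frozenset(("I3", "I4")): 45,
    }
    print(stepwise_typing(types, dist, max_snps=5))
```

Here four isolates share a conventional type, but only two are genomically close; the other two would be set aside as likely sporadic, the kind of split described for PFGE- or MLVA-defined clusters.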

In terms of technical advantages, WGS is a universal test applicable to all organisms [155] and can provide multiple tests in silico (e.g., antibiotic resistance, serotype, virulence genes) from a single assay, thereby replacing several conventional methods and/or providing additional information on the studied pathogen [57, 79, 156, 157]. NGS can therefore replace current time-consuming and labor-intensive methods with a single, all-inclusive diagnostic test [94, 157, 158]. Moreover, the digital nature and reliability of WGS data allow data to be exchanged and compared across countries [44, 46, 47]. The development of shared databases will make it increasingly possible to establish links between sequences from different countries and sources.
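How a single assay can yield several "tests" is sketched below: one table of gene hits, as might be produced by aligning an assembly against a reference gene database, is screened for markers belonging to different categories. The marker catalogue, the function name, and the identity and coverage thresholds (90% and 60%) are illustrative assumptions rather than the settings of any specific tool; real workflows also cover point mutations, serotype-determining loci, and curated databases.

```python
# Hypothetical marker catalogue: gene name -> category of in silico test.
MARKERS = {
    "blaKPC": "resistance",  # carbapenemase gene
    "mecA": "resistance",    # methicillin resistance gene
    "stx2": "virulence",     # Shiga toxin 2 gene
    "eae": "virulence",      # intimin gene
}


def in_silico_profile(hits, min_identity=90.0, min_coverage=60.0):
    """Group confidently detected marker genes by category.

    hits: iterable of (gene, percent_identity, percent_coverage) tuples.
    A gene is called present only if both thresholds are met.
    """
    profile = {"resistance": set(), "virulence": set()}
    for gene, identity, coverage in hits:
        category = MARKERS.get(gene)
        if category is None:
            continue  # not a marker of interest
        if identity >= min_identity and coverage >= min_coverage:
            profile[category].add(gene)
    return profile


if __name__ == "__main__":
    hits = [
        ("stx2", 99.5, 100.0),
        ("eae", 98.0, 100.0),
        ("blaKPC", 100.0, 40.0),  # partial hit: below the coverage threshold
        ("geneX", 100.0, 100.0),  # not in the catalogue, ignored
    ]
    print(in_silico_profile(hits))
```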

Challenges, issues, and obstacles

Definition of a cluster

The main issue reported when using WGS data to detect and confirm transmission between isolates was the difficulty, if not impossibility, of defining with a single SNP/allele threshold how much genetic variation can exist within an epidemiologically related cluster [1, 23, 28, 57, 58, 65, 67, 159]. The number of SNPs within a cluster often depends on various factors, such as the genetic diversity within each species, its molecular clock, evolutionary forces, the nature of the outbreak (point-source, long-lasting, multinational, etc.), the extent of diversity in the background population, within-host diversity, the population bottleneck during transmission, the level of asymptomatic infections, the number of isolates included in the analysis, and the methods used for genomic analysis [1, 25, 67, 74, 81, 83, 147, 160,161,162]. Many studies stress that we cannot rely solely on genomic information during outbreak investigations or surveillance activities and that epidemiological data describing the temporal and spatial dynamics of infection should always be considered [59,60,61, 63, 67, 77, 81,82,83]. Contextual data should therefore be collected carefully and combined with WGS data for proper interpretation, which leads directly to the challenge of data integration.
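Why a single threshold cannot fit every situation can be illustrated with a deliberately simplified model. Assume a strict molecular clock with a rate of r SNPs per genome per year and two isolates whose most recent common ancestor existed t years ago. Both lineages accumulate mutations independently, so the expected pairwise distance is 2rt, and a Poisson distribution with that mean gives an upper bound on the distances expected within a transmission cluster. The rates and durations below are hypothetical, and the model ignores within-host diversity, transmission bottlenecks, recombination, and methodological differences, all of which the studies above identify as relevant.

```python
import math


def expected_snps(rate_per_genome_per_year, years):
    """Expected pairwise SNPs under a strict clock (both lineages mutate)."""
    return 2 * rate_per_genome_per_year * years


def poisson_threshold(mean, coverage=0.95):
    """Smallest k such that P(X <= k) >= coverage for X ~ Poisson(mean).

    Suitable for small means; exp(-mean) underflows to zero for means in
    the high hundreds.
    """
    k = 0
    pmf = math.exp(-mean)  # P(X = 0)
    cdf = pmf
    while cdf < coverage:
        k += 1
        pmf *= mean / k  # P(X = k) obtained from P(X = k - 1)
        cdf += pmf
    return k


if __name__ == "__main__":
    # Hypothetical rates (SNPs per genome per year) and outbreak durations (years).
    for rate, years in [(0.5, 1), (1.0, 1), (0.5, 2)]:
        mean = expected_snps(rate, years)
        print(rate, years, mean, poisson_threshold(mean))
```

In this toy model, a rate of 0.5 SNPs per genome per year over one year gives a cutoff of 3 SNPs, while doubling either the rate or the time span raises it to 5, mirroring the dependence on the molecular clock and on the nature of the outbreak described above.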

Data integration

Useful interpretation of genomic data depends heavily on the epidemiological and clinical metadata [1, 160, 163, 164]. The integration of laboratory and epidemiological data is often hampered by the incomplete and/or unstructured nature of the contextual data [13, 84, 128, 165]. During a multi-country outbreak investigation, for example, it is important to develop a codebook for uniform and standardized data entry between countries [34, 49]. To maximize the potential of WGS, public health professionals have to identify a minimum set of variables (such as time, place of infection, host characteristics, clinical presentation, and exposures) that should be incorporated into the surveillance activities for a particular pathogen [165].
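A codebook can also be enforced in software at data entry. The sketch below assumes a hypothetical minimum record built from the variables listed above (time, place, host characteristics, clinical presentation, exposures) and a hypothetical controlled vocabulary for clinical presentation. It reports missing fields and values that fall outside the agreed codebook. The field names and allowed terms are illustrative only.

```python
from dataclasses import dataclass, fields
from datetime import date
from typing import List, Optional


@dataclass
class CaseRecord:
    """Hypothetical minimum metadata set linked to one sequenced isolate."""
    isolate_id: str
    sampling_date: Optional[date]         # time
    country: Optional[str]                # place of infection (e.g., ISO 3166-1 alpha-2 code)
    host_age_years: Optional[int]         # host characteristic
    clinical_presentation: Optional[str]  # must use the codebook vocabulary
    exposures: Optional[List[str]]        # e.g., reported food items or contacts


# Hypothetical controlled vocabulary agreed in a shared codebook.
ALLOWED_PRESENTATIONS = {"gastroenteritis", "bacteremia", "meningitis", "asymptomatic"}


def validate(record: CaseRecord) -> List[str]:
    """Return a list of problems; an empty list means the record passes."""
    problems = [f"missing: {f.name}" for f in fields(record)
                if getattr(record, f.name) is None]
    if (record.clinical_presentation is not None
            and record.clinical_presentation not in ALLOWED_PRESENTATIONS):
        problems.append(
            f"not in codebook: clinical_presentation={record.clinical_presentation!r}")
    return problems


r = CaseRecord("ISO-001", date(2019, 6, 3), "BE", 34, "diarrhea", None)
print(validate(r))
# ['missing: exposures', "not in codebook: clinical_presentation='diarrhea'"]
```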

Although WGS data has the potential to support phenotypic predictions of virulence and resistance based on the genotype, phenotypic data will still be needed to identify new resistance/virulence mechanisms and to keep the databases up-to-date [12, 85, 86, 157, 166, 167]. Therefore, phenotypic testing results and clinical data have to be collected in a standardized manner alongside the sequence data to feed the databases from which associations between genotype and phenotype can be observed [160].
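One way that standardized phenotypic data feed back into genotype-based prediction is through a concordance check. The sketch below assumes each isolate is summarized as a pair (resistance determinant detected in silico, resistant by phenotypic testing) and computes sensitivity and specificity of the genotypic prediction. The counts are invented for illustration. Discordant isolates, especially resistant isolates without a known determinant, are the ones that point to mechanisms missing from the database.

```python
from typing import Dict, List, Tuple


def concordance(pairs: List[Tuple[bool, bool]]) -> Dict[str, float]:
    """pairs: (gene_detected, phenotypically_resistant) for each isolate."""
    tp = sum(1 for g, p in pairs if g and p)          # predicted and observed resistant
    fp = sum(1 for g, p in pairs if g and not p)      # gene found, but phenotype susceptible
    fn = sum(1 for g, p in pairs if not g and p)      # resistant without a known gene
    tn = sum(1 for g, p in pairs if not g and not p)
    return {
        "sensitivity": tp / (tp + fn) if tp + fn else float("nan"),
        "specificity": tn / (tn + fp) if tn + fp else float("nan"),
        "discordant": fp + fn,
    }


# Hypothetical results for 20 isolates.
pairs = ([(True, True)] * 8 + [(False, True)] * 2
         + [(False, False)] * 9 + [(True, False)] * 1)
result = concordance(pairs)
print(result)  # {'sensitivity': 0.8, 'specificity': 0.9, 'discordant': 3}
```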

More recently, digital data streams (including those from the “Internet of things”) are being used as input for surveillance systems (i.e., digital epidemiology). Examples include search engines, social media, mobile phones, and health trackers. These novel data streams, generated outside public health, could enrich epidemiology by providing information on natural and social phenomena [168].

One Health, the concept of structured collaboration and coordination between human, animal, and ecosystem health systems, has become a growing focus owing to an increased understanding of how strongly animal and ecological reservoirs influence human health [169]. The management of infectious diseases therefore requires sampling from different hosts and sources. As indicated by Rantsiou et al., WGS is currently less developed in the food industry than in public health agencies [170]. The outputs produced by the different sectors should remain comparable at all times to ensure that isolates can be linked.

An overview of data integration is presented in Fig. 3.

Fig. 3

Integration of multiple data types. The anticipated workflow of infection prevention and control is as follows: (1) samples are obtained from cases infected with a given pathogen, as well as from other sources such as the environment, food, and/or animals, following the One Health approach; (2) pathogens are isolated, and information on their biological characteristics is obtained through classical microbiological testing. Phenotypic tests are still required to feed databases and confirm genotype-phenotype associations. Culturing (isolation) usually precedes genome sequencing; however, sequencing directly from clinical samples is also possible using culture-independent methods (metagenomics); (3) high-throughput sequence data is generated (other -omics technologies such as transcriptomics, proteomics, and metabolomics can complement the genomic information); (4) relationships among isolates and their specific characteristics are inferred from the sequence information using bioinformatics tools; (5) to reach a meaningful outcome (i.e., transmission chains, cluster identification, source tracing, key phenotypic attributes), the genomic evidence is combined with epidemiological metadata (time, place, exposures, etc.) from field epidemiological investigations, clinical data obtained through the healthcare system, biological characteristics obtained through classical microbiological methods, and big data on natural and social factors. Finally, infection prevention and control measures can be implemented on the basis of this aggregated information.

Collaboration between the different stakeholders

Following the previous section on the importance of data integration, it is clear that the switch to WGS requires more multidisciplinary work [9, 81, 148, 171]. For data interpretation in particular, expertise in bioinformatics and in the biological, epidemiological, and microbiological sciences needs to be combined. Infectious disease epidemiologists incorporating WGS data into their routine workflow may need training in genomics as well as in analyzing high-dimensional data sets. In addition to interdisciplinary and intersectoral (One Health) collaboration, the implementation of WGS should be coordinated internationally, as infectious diseases do not respect national boundaries [49].

Sampling frame

As with any type of epidemiological study, genomics-informed surveillance activities should be based on robust sampling strategies that define the required sample size and the number of samples needed from each source. The sampling framework will vary with the type of public health application (e.g., investigating an explosive outbreak often requires dense sampling, including multiple samples from single sources, whereas strategy-oriented surveillance requires representative coverage of the population) [171]. Selection bias can be introduced when the subset of samples selected for WGS is not representative [128, 163]. To assemble a representative sample efficiently, it is often necessary to develop a stratified sampling scheme according to time, place, and person, or to apply normalizations that preserve the original sampling fractions.
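A stratified scheme of this kind can be sketched in a few lines. The example below assumes isolates are stratified by place (region) and time (quarter), uses proportional allocation with at least one isolate per stratum, and returns a weight per stratum equal to the inverse of its sampling fraction, one possible form of the normalization mentioned above. The stratum labels, population sizes, and target sample size are hypothetical. Because of rounding, the achieved total can differ slightly from the target.

```python
import random
from collections import defaultdict
from typing import Dict, List, Tuple

Isolate = Tuple[str, str, str]  # (isolate_id, region, quarter), hypothetical layout


def stratified_sample(isolates: List[Isolate], n_total: int, seed: int = 1):
    """Proportional allocation over (region, quarter) strata.

    Returns the selected isolates and, per stratum, a weight equal to the
    inverse sampling fraction (stratum size / number sampled)."""
    strata: Dict[Tuple[str, str], List[Isolate]] = defaultdict(list)
    for iso in isolates:
        strata[(iso[1], iso[2])].append(iso)
    rng = random.Random(seed)  # fixed seed for a reproducible selection
    n_pop = len(isolates)
    selected: List[Isolate] = []
    weights: Dict[Tuple[str, str], float] = {}
    for key, members in sorted(strata.items()):
        k = max(1, round(n_total * len(members) / n_pop))  # at least one per stratum
        k = min(k, len(members))
        selected.extend(rng.sample(members, k))
        weights[key] = len(members) / k
    return selected, weights


# Hypothetical surveillance population of 100 isolates.
isolates = ([(f"N{i}", "north", "Q1") for i in range(60)]
            + [(f"S{i}", "south", "Q1") for i in range(30)]
            + [(f"T{i}", "south", "Q2") for i in range(10)])
sample, w = stratified_sample(isolates, 20)
print(len(sample), w)
# 20 {('north', 'Q1'): 5.0, ('south', 'Q1'): 5.0, ('south', 'Q2'): 5.0}
```

Under proportional allocation the weights come out equal. If small strata are deliberately oversampled instead, the weights diverge, and estimates for the whole population must be weighted accordingly.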

Translational research and information overload

Genomic information must be interpreted and translated meaningfully into both immediate public health action and longer-term prevention programs [2, 57, 172]. Moving from low-resolution typing methods to WGS-based typing will increase the number of hazards detected. The higher resolution of WGS will not only increase the number of clusters detected [76, 77] but also provide additional information on the presence of resistance genes, virulence genes, etc. It will be important to single out the hazards that are truly relevant from a public health point of view (i.e., separate the “signal” from the “noise”) and that subsequently require a public health action.

Limitations of the scoping review

A possible limitation is that only studies published after 2015 were included. However, NGS is a fast-evolving technique, and we were mainly interested in its state-of-the-art applications. Another potential limitation is that only one database (PubMed) was searched; hand searching and reference checking were applied to partly account for this. Some relevant publications have likely been missed, but the aim was to provide a systematic overview of the field rather than an exhaustive capture of every published article. A difficulty encountered during the selection process was the subjective nature of the inclusion and exclusion criteria, mainly in classifying the studies according to their context (public health, research, or diagnostics). We tried to account for this by having two independent reviewers screen the full texts. Further, data extraction was based only on the data provided in the individual studies and, if available, their supplementary information. Although data extraction followed a standard procedure, some information may have been misinterpreted. Given the large number of included studies, we chose not to contact the investigators to retrieve additional data.

Future perspectives

Currently, the most common strategy for integrating WGS into routine public health surveillance is to add genomic typing to conventional surveillance activities where increased resolution is considered necessary, i.e., on a selected subset of isolates. As costs drop, microbiology laboratories may implement WGS in their routine workflow. In this scenario, WGS is foreseen for all clinical isolates, and the number of isolates sequenced would no longer be driven by predetermined study designs [157, 163]. Sequence data gathered for diagnostic purposes could then be accumulated for public health activities. In addition, large-scale research into genotype-phenotype associations from routinely collected data will become possible [157].

Conclusions

This scoping review addresses the current state and potential of implementing pathogen genomics in routine public health practice. Main applications include the use of WGS data for (1) source tracing during outbreak investigations, (2) early outbreak detection, (3) unraveling transmission dynamics in order to implement targeted interventions, (4) monitoring drug resistance, (5) detecting cross-border transmission events, (6) identifying the emergence of strains with enhanced virulence or strains with zoonotic potential, and (7) assessing the impact of prevention and control programs, such as vaccination campaigns. The main added value of WGS reported by the included studies is its superior resolution compared with conventional methods, and consequently its ability to accurately confirm or rule out transmission events. However, it should be emphasized that WGS cannot stand on its own and should be integrated with other data types. High-quality epidemiological data and study designs are needed to realize the full potential of WGS. Collaborations between infectious disease epidemiologists, public health practitioners, microbiologists, and bioinformaticians are key to successful genomics-informed surveillance.