1 First Insights from the Complete Genome of Mycobacterium ulcerans

The first genome sequence of an M. ulcerans isolate was published in 2007 [1]. This finished and completely annotated genome represented an African clinical isolate (strain name Agy99), which was obtained in 1999 from a BU patient living in the Amansie West District of Ghana. An unexpected feature of the genome was the presence of a circular 174 kbp megaplasmid (named pMUM001) [1, 2]. The plasmid harboured three unusually large genes encoding the polyketide synthases (PKS) required for the biosynthesis of the major virulence factor, mycolactone [2, 3]. The 5.6 Mbp circular chromosome of the Agy99 genome also held some surprises, with the architecture of a bacterium undergoing reductive evolution. There was an abundance of pseudogenes (15% of the predicted ancestral protein-coding genes had been inactivated by accumulated mutations), evidence of large chromosome deletions and rearrangements, and extensive proliferation of two insertion sequences (IS2404 and IS2606) [1]. Collectively, these features pointed to a bacterial population that had ‘recently’ passed through an evolutionary bottleneck and was adapting to a changed environment. Scrutiny of the types of genes lost by mutation or DNA deletion in the Agy99 genome suggested a mycobacterium adapting to an environment where the proteins and pathways needed both to survive in diverse aquatic environments and to persist intracellularly are no longer required. This assessment also showed that M. ulcerans has lost many of the proteins and cell-wall associated molecules known to be potent antigens in other notable mycobacterial pathogens such as Mycobacterium tuberculosis, Mycobacterium kansasii and Mycobacterium marinum. These observations, combined with the presence of pMUM001 and the specific ability of M. ulcerans to make the immunosuppressive small molecule mycolactone, suggested a mycobacterium adapting to a protected niche environment where extracellular persistence and an immune evasion phenotype provide a survival advantage [4, 5].

2 An Aquatic Origin and Two Bottlenecks for a Recently Emerged and Globally Distributed Pathogen

The initial M. ulcerans genome assessment was based on a comparison with the complete genome sequence of a single strain of the fish and human pathogen, M. marinum. From the mid-2000s onwards, scientists began reporting the presence of mycolactone-producing mycobacteria (MPMs) infecting fish, frogs and other ectotherms worldwide [6,7,8,9,10,11]. Subsequent comparative genetic and then comparative genomic studies of these mycobacteria with strains of M. marinum and M. ulcerans confirmed that all MPMs likely emerged during a single evolutionary event when a population of M. marinum-like bacteria acquired a pMUM plasmid and the specific ability to make mycolactones and then spread worldwide (Fig. 1) [12, 13]. The key genetic signatures of extensive pseudogene accumulation, expansion of IS2404 and pMUM plasmid acquisition were present in the ancestor of all MPMs before they radiated around the world, represented by three main lineages, called lineages 1–3 (Fig. 1) [12]. The MPMs have been given a variety of species names including M. pseudoshottsii, M. liflandii, M. shinshuense and (confusingly) M. marinum. Based on their extensive shared genomic features and recent common heritage, it has been proposed that all MPMs should be considered under the single species banner of M. ulcerans [14]. The collective term M. ulcerans–M. marinum complex (abbreviated as MuMC) has also been proposed for all M. marinum and M. ulcerans as they share >97% nucleotide identity across a substantial 4.3 Mbp conserved core genome [12].

Fig. 1 A recently emerged and globally distributed pathogen

The first major evolutionary bottleneck, which saw the emergence of M. ulcerans, has been followed by at least one further population constriction that gave rise to lineage 3. This lineage, also designated the ‘classical’ lineage in the literature [15], represents the M. ulcerans isolates causing Buruli ulcer in Australia, Papua New Guinea (PNG), and Africa, and is thus the lineage that accounts for most of the human cases of Buruli ulcer. This lineage is characterized by additional reductive evolution and expansion of IS2606 to high copy numbers [12]. Interestingly, the ‘success’ of this lineage as the major human pathogen among all the MPMs is not due to the acquisition of additional genes. Lineage 3 contains no additional DNA sequences compared to isolates in lineages 1 and 2, so more subtle genetic changes might underlie lineage 3 dominance. One potential genomic region for such changes is the mycolactone PKS gene locus on the pMUM plasmid. The different lineages of M. ulcerans produce different mycolactone sidechain structural variants with varying biological potencies (discussed in more detail in chapter “Mycolactone: More than Just a Cytotoxin” of this book). The genetic basis for some of these variants has been determined. Changes in acyltransferase domain substrate specificity within a particular PKS extension module, or loss of an extension module within mlsB, the gene required for synthesis of the mycolactone sidechain, lead to biosynthesis of different mycolactones [16,17,18].

Given that the ancestor of all M. ulcerans was an aquatic mycobacterium (M. marinum) and many of the MPMs are recovered from fish, frogs, and turtles, it seems likely that all MPMs (including those lineages associated with Buruli ulcer) originated in association with an aquatic animal. These ideas and inferences should be used to frame thinking around potential reservoirs of M. ulcerans in BU endemic areas. Of note too, the phylogeny of M. ulcerans from Australia and Papua New Guinea is ancestral to that of M. ulcerans from African countries, indicating that the spread of M. ulcerans throughout Australasia likely predates the spread of M. ulcerans across Africa [33].

3 New Understandings from Genomics on the Spread of M. ulcerans Across Africa

African countries carry the highest burden of BU, but until recently, knowledge of the spread of M. ulcerans across the continent has been sparse. Early studies of M. ulcerans populations using traditional genotyping techniques readily identified a highly clonal population structure, showing that M. ulcerans isolates had very conserved genomes that associated strongly with their geographic origin [19,20,21,22,23,24]. However, these techniques sampled only a small proportion of the mycobacterial genome, and thus lacked sufficient discriminatory power to crack the clonal population structure of this species at even the scale of a country, let alone at the scale of a BU endemic village. The advent of low-cost, high-throughput DNA sequencing has given researchers access to all the potential genetic variation arising within an M. ulcerans population, and several studies have capitalised on this advance. Most recently, the genomes of an extraordinary collection of 165 M. ulcerans clinical isolates spanning 48 years and representing 11 endemic countries across Africa were sequenced and compared. Assessment of these genomes has produced the first detailed understanding of the introduction and continental spread of this pathogen. Key findings included the establishment of a molecular clock signal in the sequence data, suggesting that M. ulcerans is accumulating mutations at a rate of 0.33 SNPs per chromosome per year (approx. 6 × 10⁻⁸ substitutions/site/year). This rate of genetic change is comparable to that of M. tuberculosis, but 10–100 times slower than that of other pathogens such as Staphylococcus aureus [25]. Combining these temporal data with the high-resolution phylogeny inferred from genome comparisons and the geographic origins of the 165 isolates has permitted a reconstruction of the spread of M. ulcerans across Africa.
The data suggest that there have been two distinct introductions of the bacteria to the African continent. The first occurred around 68 BC, with a likely origin in the area around current day Cameroon and Gabon, followed by spread outwards from these regions [26] (Fig. 2). The second, far more recent introduction occurred during the 1800s. For both the early and recent sub-lineages (called Mu_A1 and Mu_A2 respectively; both representing sub-lineages of lineage 3), there were geographically localized but substantial clonal expansions of M. ulcerans populations in four particular hydrological basins (Congo, Kouffo, Oueme, and Nyong). These expansions occurred contemporaneously from the late 1800s onwards and in waves that mimicked interference in those specific regions by colonial powers during the ‘scramble for Africa’ [26].
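The conversion between the two quoted rate units can be checked with simple arithmetic. The sketch below is illustrative only. It assumes the 5.6 Mbp Agy99 chromosome length as the denominator, and the function names and the strict-clock divergence estimate are hypothetical simplifications rather than the dating method used in the cited study.

```python
# Values quoted in the text.
SNPS_PER_CHROMOSOME_PER_YEAR = 0.33
# Assumption: the 5.6 Mbp Agy99 chromosome is used as the genome length.
CHROMOSOME_LENGTH_BP = 5.6e6


def per_site_rate(snps_per_chromosome_per_year, chromosome_length_bp):
    """Convert SNPs/chromosome/year into substitutions/site/year."""
    return snps_per_chromosome_per_year / chromosome_length_bp


def years_since_common_ancestor(pairwise_snp_distance, snps_per_chromosome_per_year):
    """Rough divergence time under a strict clock (hypothetical helper).

    Both lineages accumulate mutations independently after they split,
    so the pairwise distance grows at twice the per-lineage rate.
    Assumes the two isolates were sampled at about the same time.
    """
    return pairwise_snp_distance / (2 * snps_per_chromosome_per_year)


rate = per_site_rate(SNPS_PER_CHROMOSOME_PER_YEAR, CHROMOSOME_LENGTH_BP)
print(f"{rate:.1e} substitutions/site/year")  # 5.9e-08, i.e. approx. 6 x 10^-8

# Two contemporaneous isolates differing by 66 SNPs (made-up distance).
print(round(years_since_common_ancestor(66, SNPS_PER_CHROMOSOME_PER_YEAR)))  # 100
```

At this rate a single lineage gains roughly one SNP every three years, which is why whole-genome comparisons are needed to resolve recent spread.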

Fig. 2 Spread of two distinct M. ulcerans lineage 3 genotypes across Africa

Interestingly, the Mu_A2 genotype was also found in PNG, and phylogenetic inference suggests PNG may have been the origin of the Mu_A2 genotype. How the bacteria were transported across oceans and continents from south east Asia to Africa in the mid-1800s is not clear, although, again, several European colonial powers were active in both these regions of the world at that time.

The presence of the Mu_A2 genotype in Africa indicates M. ulcerans can be mobilised and displaced across large distances, but the dominant characteristic of M. ulcerans populations is their strong, geographically restricted genotypes. In Africa, these constrained genotypes align with different hydrogeological basins [26]. This observation is consistent with the well-described epidemiology of BU, where human disease is strongly associated with lentic and lacustrine environments, and also supports the notion of an aquatic reservoir for the bacterium (or at least a reservoir species restrained somewhat by these basins), with the hydrogeography likely providing a major barrier to reservoir (and therefore pathogen) movement [26, 27].

In South Eastern Australia, the native possum is a wildlife reservoir of M. ulcerans and there is a strong association between possums harbouring the bacteria and human Buruli ulcer [28, 29]. Genomics has shown that possum and patient M. ulcerans isolates have identical genotypes [29] indicating that humans and possums are part of the same transmission network.

4 Genomic Approaches to Micro-Molecular Epidemiological Investigations of BU

There were expectations at the dawn of the genomic revolution that comparisons of M. ulcerans genomes collected from BU patients in endemic areas would reveal striking patterns of bacterial spread that in turn would lead to a substantially deeper understanding of how this enigmatic disease is transmitted [15]. Several M. ulcerans population genomic studies have now been conducted in Africa and Australia at the descending scales of country, region and village. Important new insights have been made, whilst also raising new questions about how M. ulcerans is spreading.

Several teams have explored genomic variation of M. ulcerans populations in West Africa, and all have confirmed the strong association between genotype and region described above. This relationship has a fractal quality, where genotypes continue to associate with region across large spatial scales, from continental, to country, to regional levels. However, somewhat unexpectedly, microepidemiological observations powered by the extreme resolution offered by whole genome sequence comparisons have shown that this association breaks down at very local scales (buffer size <50 km²), where the distribution of M. ulcerans genotypes appears to become random. This finding is illustrated in several independent studies from Ghana and Cameroon that report clonal complexes aligned with specific river systems, as predicted from earlier pre-genomic studies, but also mixing of genotypes at district and village levels [30,31,32]. Conclusions from these observations include the possibility that (1) the disease is being vectored by a somewhat mobile entity, perhaps an insect, or (2) people are moving to different local areas and acquiring the infection. While the data are not conclusive for any specific hypothesis, there are examples of very young infants with no history of any travel developing BU and becoming infected with genotypes present in more distant villages, thus suggesting something(s) is vectoring M. ulcerans to susceptible human hosts [32].

In addition to the genotypic mixing at local scales, there are now also several examples of very distinct genotypes co-circulating with a ‘local’ endemic M. ulcerans clone. This was first described in the Asante Akim North district of Ghana, where a genotype reminiscent of strains originating from Nigeria was detected alongside strains representing a local endemic clone [32]. The same phenomenon was recently described in south east Australia, with a distinct M. ulcerans clone from the far east of the country detected in a highly BU endemic area, co-circulating with a local clone [33]. These studies demonstrate the potential for M. ulcerans to be mobilized and spread over large distances. The mechanisms of pathogen dispersal remain to be discovered, but they could perhaps be linked to the movement of human or other mammalian reservoirs [30, 31, 33].

M. ulcerans population genomic studies of highly active BU endemic areas in south east Australia have revealed some interesting features of the pathogen and the disease [33]. A study of 178 M. ulcerans genomes, collected over 70 years, has provided a compelling reconstruction of the temporal and spatial spread of the pathogen in that region. The disease appears to have emerged in the early 1800s in the east of the country and then spread suddenly westward, moving into areas around the major population centre of Melbourne in the 1980s. Comparison of the temporal phylogeny with epidemiological data indicates that arrival of the bacteria in a specific region predates the appearance of human disease by 7–10 years. Similar to the African genomic studies, these observations and inferences strongly suggest that M. ulcerans is spreading into new areas, rather than a reservoir of quiescent bacteria being awakened, perhaps following environmental disturbance. This information could help inform control strategies for BU, where environmental surveillance of the pathogen might help predict the occurrence of disease in humans.

5 Distinguishing Relapse from Reinfection and Familial Studies

One application of genomics has been to establish whether a patient presenting twice with BU has suffered a relapse of the previous infection or has been unlucky enough to be reinfected with a different population of bacteria. Distinguishing between these two scenarios is important for informing treatment options. Work in this area is just beginning, but in a study of four patients with recurrent BU episodes, three were concluded to have relapsed, with 0–1 SNP differences between the first and last isolates obtained from each patient, while the fourth patient was a possible example of re-infection, with 20 SNP differences between first and last isolates [34]. This was the first study to deploy genomics in this way for informing BU treatment. The correct interpretation of these comparisons obviously depends on the availability of isolates from the patient over time, but equally important is a detailed understanding of the local population structure of M. ulcerans in that region. This understanding is necessary to give proper context, and therefore interpretation, to any SNP differences discovered over time within a single patient. Another focused genomics study with similar aims was reported from Australia, where the research team sequenced and compared six M. ulcerans genomes recovered from three familial clusters [35]. The sequence data, combined with epidemiological information, argued against person-to-person transmission and suggested that there was a relatively short time window of 1–2 months when family members were exposed to the pathogen [35].
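The comparison of first and last isolates from a patient can be sketched as a pairwise SNP count over aligned sequences. This is a minimal illustration, not the pipeline used in the cited study: the function names, the toy sequences, and the `relapse_max_snps` cut-off are all assumptions made here. As noted above, any real threshold would need to be set against the local population structure of M. ulcerans.

```python
VALID_BASES = set("ACGT")


def snp_distance(seq_a, seq_b):
    """Count differing positions between two aligned sequences.

    Positions where either sequence has a gap or ambiguous base
    (anything other than A, C, G, T) are ignored.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(
        1
        for a, b in zip(seq_a.upper(), seq_b.upper())
        if a in VALID_BASES and b in VALID_BASES and a != b
    )


def classify_recurrence(first_isolate, last_isolate, relapse_max_snps=2):
    """Label a recurrent episode from the SNP distance between isolates.

    relapse_max_snps is a hypothetical cut-off chosen for illustration;
    it is not a validated clinical threshold.
    """
    distance = snp_distance(first_isolate, last_isolate)
    if distance <= relapse_max_snps:
        return distance, "likely relapse"
    return distance, "possible reinfection"


# Toy aligned sequences (made up, far shorter than a real core-genome alignment).
first = "ACGTACGTAC"
print(classify_recurrence(first, "ACGTACGTAT"))  # (1, 'likely relapse')
print(classify_recurrence(first, "TCGAACCTAT"))  # (4, 'possible reinfection')
```

Ignoring ambiguous positions avoids counting low-coverage sites as differences, at the cost of possibly undercounting true SNPs.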

6 Summary and Future Perspectives

M. ulcerans is likely a niche-adapted mycobacterium. Its ability to make mycolactone is almost certainly critical to the persistence of the bacterium in that niche. The bacterium’s primary host and reservoir is likely aquatic. This reservoir is probably restrained within lentic systems. BU spreads when the bacterium moves from one region to another. Pathogen spread can occur rapidly. These are some of the useful insights that genomic approaches have yielded. From glimpses into the unusual biology of the pathogen when the genome was first revealed, to the large, population-based studies that reconstruct pathogen spread, genomics has been the key tool driving these new understandings. Future research should capitalize on the advent of cost-effective long-read sequencing to focus on establishing and comparing complete M. ulcerans genomes. The additional DNA sequence variation that potentially lies within complex regions of the M. ulcerans genome could prove very informative for efforts to track pathogen spread. More effort should also be made to interrogate and predict metabolic pathways from genome sequence data, which can then be exploited to improve efforts to detect and isolate M. ulcerans in pure culture from environmental sources. The pMUM plasmid in particular should be a focus for complete sequence assembly, as assessment of the mycolactone PKS genes will shed light on the basis for the biosynthesis of mycolactone structural variants and differential pathogenesis.