Introduction, methods and results

Contagious caprine pleuropneumonia (CCPP) is a devastating disease affecting domestic goats and several wild ungulate species in arid and semiarid regions of Africa, the Middle East and Asia, where goat rearing plays an essential role in food security and poverty alleviation [1]. Owing to its high contagiousness, morbidity and mortality, CCPP is included in the list of notifiable diseases of the World Organisation for Animal Health (WOAH, founded as OIE [2]). Its etiologic agent, a fastidious bacterium known as Mycoplasma capricolum subsp. capripneumoniae (Mccp), is very rarely isolated, and CCPP is hardly ever reported. As a consequence, the distribution, prevalence and impact of CCPP are not well established [3].

To improve our understanding of the epidemiology of CCPP, a molecular typing scheme based on the analysis of eight genetic markers, known as Multi-Locus Sequence Analysis (MLSA), was developed in 2011 [4]. This tool proved highly robust and allowed genotyping directly from infected tissues from which Mccp could not be isolated. The scheme was applied to 27 strains of diverse origins, resulting in the identification of two lineages and five groups, which correlated with the geographic origin of the strains (with the notable exception of the Arabian Peninsula, where strains from four of the five groups were found). Notably, the identification of a distinct Asian cluster represented by two recent strains from Tajikistan and China (the sole representatives of Central and East Asia available at the time) indicated local evolution of the strains and ruled out a recent introduction of CCPP into the continent.

Thanks to the democratisation of high-throughput sequencing technologies, more sophisticated Mccp genotyping methods have been developed, from a multi-gene scheme [5] to a whole-genome sequence (WGS) analysis pipeline [6], achieving optimal strain typing for molecular epidemiology studies and outbreak investigations. However, WGS-based genotyping is not readily available to diagnostic laboratories, particularly in the regions where CCPP is prevalent, and MLSA may still be a valuable alternative, especially when isolation cannot be achieved. Since the 2011 MLSA study and subsequent reports on Mccp strains from wildlife in the United Arab Emirates [7, 8], only a few Mccp isolates and WGS have been made available. Nevertheless, MLSA was conducted during investigations of CCPP outbreaks in Tibetan wild ungulates, first identified in 2012 [9], and, more recently, in Pakistani goats in 2019 [10]. The objective of our study was therefore to explore the diversity of Mccp strains in Asia by analysing new MLSA data from Pakistan and China, including strains originating from wildlife. This was also an opportunity to update the global Mccp MLSA by including all the data generated since 2011, and to assess its value and performance in comparison with subsequent WGS-based typing techniques.

The 43 Mccp strains and/or corresponding genomic sequences analysed in this study, including eight strains from wild ungulate species, are presented in Table 1. Thirty-three of them were included in subsequent typing schemes [5, 6], and the corresponding phylogenetic groups are shown when available. Sixteen new strains were added to the 27 previously published [4]. MLSA data for six of the new strains were extracted from available WGS, while those for the remaining ten were obtained by PCR amplification and sequencing of the eight loci as previously described [4], except that Sanger sequencing was performed by Macrogen (South Korea) and sequences were assembled and aligned with Geneious 10.2.6 [11].
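The text does not say how the MLSA loci were extracted from the WGS. One common approach, assumed here purely for illustration, is an "in silico PCR": locate the forward primer and the reverse-complemented reverse primer on either strand of the genome and take the product between them. The sketch below uses exact string matching only (real tools usually tolerate primer mismatches), returns the product with primers included (trimming to the locus coordinates used in the scheme is not shown), and the primers and genome fragment are hypothetical, not the published MLSA primers.

```python
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Reverse complement of a DNA string (non-ACGT characters are left unchanged)."""
    return seq.translate(COMPLEMENT)[::-1]


def in_silico_amplicon(genome, fwd_primer, rev_primer, max_len=5000):
    """Return the first product (primers included) that the primer pair would amplify
    from either strand of a genome, using exact matches only; None if there is none."""
    fwd = fwd_primer.upper()
    # On the strand that carries the forward primer, the reverse primer site
    # reads as the reverse complement of the reverse primer.
    rev_site = reverse_complement(rev_primer.upper())
    for strand in (genome.upper(), reverse_complement(genome.upper())):
        start = strand.find(fwd)
        while start != -1:
            end = strand.find(rev_site, start + len(fwd))  # nearest downstream site
            if end != -1 and end + len(rev_site) - start <= max_len:
                return strand[start:end + len(rev_site)]
            start = strand.find(fwd, start + 1)
    return None


# Hypothetical primers and genome fragment (illustration only).
print(in_silico_amplicon("TTTTACGTACGGGCCCGATTACTTTT", "ACGTAC", "GTAATC"))
# ACGTACGGGCCCGATTAC
```

Searching both strands matters because an assembled genome may carry a given locus on either strand; the same product is returned whichever orientation the assembly uses.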

Table 1 List of Mccp strains and genomes analysed in this study and corresponding MLSA types.

The sequences of epidemiologically related strains collected in nearby locations during CCPP epizootics in Uganda, Tunisia and Tibet, or obtained by in vitro passage (Table 1), were identical, showing that the molecular markers were stable and that no variation was introduced in the laboratory. Furthermore, MLSA results obtained by locus amplification and sequencing and those extracted from WGS data were also identical for the 15 strains analysed by PCR and sequencing for which WGS was available in GenBank (Table 1). The only exception was strain F38, for which a single nucleotide polymorphism (SNP) in the H2 locus distinguished the MLSA sequences obtained by the two methods. However, since two different laboratory stocks of this strain were used for PCR and sequencing (CIRAD) and for WGS (NCTC 10192T), this SNP may result from divergent evolution of the two stocks derived from the original 1974 isolate [12]. When the scheme was applied to the 39 epidemiologically unrelated strains in Table 1, 24 sequence types (ST), 9 of them new, were discriminated based on 68 polymorphic positions (16 new), which are shown in Table 2 with the locus sequences of the Mccp type strain F38 as reference. This resulted in a Simpson's index of diversity of 0.970 (confidence interval 0.953–0.987), which expresses the probability that two unrelated strains sampled at random are assigned to different types [13, 14]. All the strains analysed by WGS [6] could be discriminated individually, and all but two of those analysed by Dupuy et al. [5] yielded distinct genotypes (Table 1). These two isolates were, however, discriminated by MLSA, which also allowed typing of non-viable strains (n = 6, Table 1) with no added difficulty or cost.
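The typing arithmetic in this paragraph (concatenating loci, calling identical concatenated sequences the same ST, counting polymorphic positions against F38, and computing Simpson's index of diversity) can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the loci `locA`/`locB`, strains `S1` to `S4` and their sequences are hypothetical; sequences are assumed to be already aligned with no indels; and the index is computed in the Hunter-Gaston form, D = 1 − Σ n_j(n_j − 1) / (N(N − 1)), with the approximate interval D ± 2√σ² of Grundmann et al. (2001). That this is the exact computation behind [13, 14] is an assumption.

```python
from collections import Counter
from math import sqrt

LOCI = ["locA", "locB"]  # hypothetical stand-ins for the eight MLSA loci


def concatenate(profile, loci=LOCI):
    """Join one strain's aligned locus sequences in a fixed locus order."""
    return "".join(profile[locus] for locus in loci)


def assign_sequence_types(concatenated):
    """Give identical concatenated sequences the same ST label, numbered by first appearance."""
    label_of_seq, types = {}, {}
    for strain, seq in concatenated.items():
        if seq not in label_of_seq:
            label_of_seq[seq] = f"ST{len(label_of_seq) + 1}"
        types[strain] = label_of_seq[seq]
    return types


def polymorphic_positions(reference, sequences):
    """1-based alignment positions at which any sequence differs from the reference."""
    return sorted({i + 1 for seq in sequences
                   for i, (r, s) in enumerate(zip(reference, seq)) if r != s})


def simpson_diversity(st_labels):
    """Simpson's index of diversity (Hunter-Gaston form) for a list of ST labels,
    one per unrelated strain, with the approximate interval D +/- 2*sqrt(var)."""
    counts = list(Counter(st_labels).values())
    n = sum(counts)
    d = 1 - sum(c * (c - 1) for c in counts) / (n * (n - 1))
    p2 = sum((c / n) ** 2 for c in counts)
    p3 = sum((c / n) ** 3 for c in counts)
    variance = 4 / n * max(0.0, p3 - p2 ** 2)  # guard against tiny negative rounding error
    half_width = 2 * sqrt(variance)
    return d, (d - half_width, d + half_width)


# Hypothetical toy data (not real Mccp sequences); "F38" plays the reference role.
profiles = {
    "F38": {"locA": "ACGT", "locB": "GGCC"},
    "S1": {"locA": "ACGT", "locB": "GGCC"},
    "S2": {"locA": "ACTT", "locB": "GGCC"},
    "S3": {"locA": "ACTT", "locB": "GGCA"},
    "S4": {"locA": "ACTT", "locB": "GGCC"},
}
concatenated = {strain: concatenate(p) for strain, p in profiles.items()}
types = assign_sequence_types(concatenated)
print(types)  # {'F38': 'ST1', 'S1': 'ST1', 'S2': 'ST2', 'S3': 'ST3', 'S4': 'ST2'}
print(polymorphic_positions(concatenated["F38"], concatenated.values()))  # [3, 8]
print(simpson_diversity(list(types.values())))  # D = 0.8 with its interval
```

Because D is the probability that two randomly drawn unrelated strains fall into different types, a value close to 1 means the scheme rarely lumps unrelated strains together; the interval narrows as the number of typed strains grows.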

Table 2 Sequence polymorphisms found among the Mccp strains analysed.

A robust tree (Figure 1) was obtained by distance analysis of MLSA data using DARwin 6 [15] as previously described. Seven genotyping groups were identified, distributed between the two previously described lineages. Pre-existing MLSA groups 1–5 were unchanged, except that several additional ST were identified in group 1, corresponding to East African strains from domestic goats and Emirati strains from wild ungulates, respectively. The remaining new ST identified in this work corresponded to Asian strains and clustered in two additional groups positioned within lineage II. A highly variable cluster located near the centre of the tree and represented by Chinese strains from Shandong and Tibet was designated group 6, whereas the Pakistani strain was the single representative of group 7. All Asian strains (excluding those from the Middle East) were distributed among three clusters (groups 3, 6 and 7) within lineage II, together with group 4 (represented by strains from North Africa, the Arabian Peninsula and Turkey) and group 5 (comprising mainly East African strains). As shown in Figure 2, ST generally remained well correlated with geographic origin, with the exception of the Arabian Peninsula, where animals of diverse origins are imported every year, particularly for Muslim festivals [4]. A similar situation was now observed in Turkey, where strains from Thrace and Elazig (eastern Turkey) were placed in groups 3 and 4, respectively.
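The distance analysis behind Figure 1 (a neighbour-joining tree on the concatenated loci, with bootstrap support from 1000 resamples of alignment columns, as stated in the figure legend) can be sketched as below. The exact DARwin 6 settings are not given here, so this assumes uncorrected p-distances (which match a scale expressed in substitutions per nucleotide position) and the standard Saitou and Nei Q-criterion; only the topology is tracked, as a set of splits, since that is all bootstrap support depends on. The four-taxon alignment is hypothetical, not Mccp data.

```python
import random
from itertools import combinations


def p_distance(a, b):
    """Proportion of aligned sites at which two equal-length sequences differ."""
    return sum(x != y for x, y in zip(a, b)) / len(a)


def nj_splits(names, seqs):
    """Neighbour-joining on p-distances. Returns the tree's non-trivial splits, each
    stored as the taxon set on the side that does not contain names[0].
    Branch lengths are not kept, because bootstrap support needs only the topology."""
    active = [frozenset([n]) for n in names]  # each node = set of taxa below it
    seq_of = dict(zip(names, seqs))
    dist = {node: {} for node in active}
    for a, b in combinations(active, 2):
        (x,), (y,) = a, b
        dist[a][b] = dist[b][a] = p_distance(seq_of[x], seq_of[y])
    everyone, ref = frozenset(names), names[0]
    splits = set()
    while len(active) > 3:
        r = len(active)
        net = {a: sum(dist[a][b] for b in active if b != a) for a in active}
        # Join the pair minimising Q(a, b) = (r - 2) d(a, b) - R(a) - R(b).
        a, b = min(combinations(active, 2),
                   key=lambda p: (r - 2) * dist[p[0]][p[1]] - net[p[0]] - net[p[1]])
        u = a | b
        dist[u] = {}
        for c in active:
            if c not in (a, b):
                # Distance from the new internal node to each remaining node.
                dist[u][c] = dist[c][u] = (dist[a][c] + dist[b][c] - dist[a][b]) / 2
        active = [c for c in active if c not in (a, b)] + [u]
        side = u if ref not in u else everyone - u
        if 1 < len(side) < len(names) - 1:
            splits.add(side)
    return splits


def bootstrap_support(names, seqs, replicates=1000, seed=1):
    """Percentage of column-resampled replicates whose NJ tree contains each split
    of the tree built from the original alignment."""
    rng = random.Random(seed)
    reference = nj_splits(names, seqs)
    hits = dict.fromkeys(reference, 0)
    length = len(seqs[0])
    for _ in range(replicates):
        columns = [rng.randrange(length) for _ in range(length)]  # sample sites with replacement
        resampled = ["".join(s[i] for i in columns) for s in seqs]
        for split in nj_splits(names, resampled) & reference:
            hits[split] += 1
    return {split: 100 * count / replicates for split, count in hits.items()}


# Hypothetical four-taxon alignment (illustration only).
names = ["A", "B", "C", "D"]
seqs = ["AAAAAAAAAA", "AAAAAAAAAC", "CCCCCAAAAA", "CCCCCAAAAC"]
print(nj_splits(names, seqs))          # the single split {C, D} versus {A, B}
print(bootstrap_support(names, seqs))  # support for that split, in percent
```

In this toy alignment five sites support grouping C with D and one site conflicts, so the split is recovered in most but not all replicates; the legend's rule of showing only values above 80% would then be applied to these percentages.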

Figure 1

Tree derived from distance analysis of the eight concatenated MLSA loci. Neighbour-joining tree (DARwin 6) based on a 6753-bp sequence resulting from concatenation of the eight MLSA loci, showing the 24 sequence types identified among the 43 (39 unrelated) strains (Table 1). Genotypes are colour-coded according to their geographic origin. Bootstrap percentages were calculated from 1000 resamples, and values above 80% are shown. The scale bar represents 1 substitution per 1000 nucleotide positions.

Figure 2

Geographic distribution of the strains analysed in this study. Each strain is represented by a symbol corresponding to its MLSA group (circles and diamonds of various colours represent lineages I and II, respectively), and its sequence type is indicated next to it. Strains whose precise location was not known are indicated by barred symbols placed arbitrarily within the country of origin. Question marks indicate areas for which no recent data are available.

Discussion

The MLSA scheme is clearly relevant for Mccp genotyping and epidemiological analyses, particularly in view of its accessibility, affordability, ease of use and superior typeability, which allows direct genotyping from poor-quality samples. Furthermore, its stability, regardless of the method used to obtain the data, was remarkable, and MLSA clustering was highly congruent with both epidemiological and phylogenetic analyses [5, 6]. Finally, its discriminatory power (Simpson's index of diversity of 0.970) was high enough for epidemiological investigations.

Analysis of new strains from Pakistan and China provided a better representation of the spread of CCPP in Asia (Figure 2) and revealed unexpected diversity on this continent (Figure 1). The Pakistani strain was the sole representative of a new cluster (group 7), whose diversity and distribution remain to be determined. Unfortunately, this was the only strain available from South Asia, where the occurrence of CCPP was documented as early as 1914 [16] and where CCPP is known to be prevalent [10, 17, 18]. The new Chinese strains constituted a distinct cluster (group 6), separate from the previously described Tajik and Chinese strains (group 3). A strain from Tibetan sheep collected in the Nagqu region of Tibet (Table 1), where devastating CCPP outbreaks have been reported in both domestic goats and antelopes since 2012 [9], was placed at the base of this group. This strain was more closely related to strains from domestic goats collected in Shandong than to strains from Tibetan antelopes collected in Nagqu. This may be explained by the wide distribution of domestic and wild ungulate species across the Qinghai-Tibetan plateau and its peripheral mountains. It has been assumed that Tibetan antelopes were infected through close contact with domestic goats, which are progressively encroaching on their habitat [9]. Furthermore, strains from CCPP outbreaks affecting domestic sheep in Uganda [19] and four different wild ungulate species in the Middle East [7, 8, 20] (Table 1) were placed in group 1, very distant from those from Tibetan wildlife, and shared ST with, or were closely related to, goat isolates, indicating that the same strains can affect a wide variety of species. Again, the assumption was that domestic goats were the source of infection in sheep and wildlife, although direct Mccp transmission between wild ungulates of different species has been demonstrated, at least in captivity [8, 20].

Analysis of new Emirati strains from wildlife and additional strains from East Africa resulted in the identification of three new ST in group 1, revealing greater diversity in this cluster, which is spreading in the region. The strain introduced into Mauritius in 2009 [21] shared its ST with a highly virulent Kenyan isolate from 2012 [22, 23] and was closely related to a strain responsible for CCPP outbreaks across Tanzania in 2013 [6]. The relatively low diversity of this group, and of lineage I in general compared with lineage II, deserves further investigation. Similarly, the presence in East Africa of two distant genotyping groups (one from each lineage), suggesting two separate introductions of CCPP into this region, needs to be elucidated for a better understanding of the origin and evolution of CCPP in Africa.

CCPP has been suspected in India and China since the beginning of the twentieth century [16, 24], but its presence in Asia was only confirmed in 2007 [25]. As early as 2011, MLSA genotyping suggested that CCPP had long been present in Asia [4], which was substantiated by subsequent large-scale genomic analyses [5, 6]. The great genetic diversity observed here among Asian Mccp strains despite the limited number of samples analysed, together with the position of MLSA groups 3 and 6 (represented by Central and East Asian strains) at the centre of the tree, points towards a possible Asian origin of CCPP. Again, the scarcity of Mccp strains hampers a precise determination of the emergence, diversity and distribution of Mccp.

A better assessment of the molecular evolution and epidemiology of CCPP in Asia and Africa calls for renewed efforts to substantially enlarge the collection of strains from diverse origins, so that it represents the real distribution of CCPP, which is yet to be established (Figure 2). MLSA could be an excellent tool for this purpose, provided that CCPP cases are investigated, since the analysis can be performed on simple samples such as dried filter papers impregnated with infected material, which can be easily stored and shipped at room temperature.