Background

Enterocytozoon bieneusi is an obligate intracellular pathogen infecting humans, domestic animals and wildlife [1]. Since the detection of E. bieneusi in AIDS patients in 1985 for the first time, this pathogen has been found in other immunocompromised/immunodeficient individuals, such as organ transplant recipients, children, and life-threatening and persistent diarrhea often occur in these populations [2,3,4,5,6,7]. It has been recently identified in several species of non-human primates (NHPs) in China and elsewhere [8,9,10,11,12,13,14].

Approximately 500 E. bieneusi genotypes have been identified based on sequence analysis of the internal transcribed spacer (ITS) region of the rRNA gene [15, 16]. They belong to 11 phylogenetic groups. Among them, ITS genotypes in Group 1 and Group 2 have been found in a broad range of hosts including humans and are probably responsible for most zoonotic or cross-species E. bieneusi infections whereas host adaptation seems to be more common in genotypes of Groups 3 to 11 according to a recent review of E. bieneusi [6]. To date, more than 100 E. bieneusi genotypes have been detected in NHPs, mostly belonging to Group 1 [6, 9, 10, 13, 14, 17,18,19,20,21]. Among them, Type IV and Peru8 are common ITS genotypes in humans and various animals in many countries, including China [22]. Conversely, Macaque3 has been mostly found in monkeys and occasionally in dogs in China [1, 21, 23,24,25,26].

Multilocus sequence typing (MLST) based on sequence analysis of four markers (MS1, MS3, MS4 and MS7) with simple tandem repeats and the ITS has been used in investigating the transmission dynamics of E. bieneusi in humans and some animals, including pigs, sheep, rabbits, calves, black bears, giant pandas and horses [27,28,29,30,31,32,33,34,35]. Thus far, only three studies have used the MLST analysis to characterize E. bieneusi isolates in NHPs, and only one study has found the presence of an overall clonal population structure and an epidemic population structure in subpopulations of this pathogen in NHPs [11, 36, 37].

In this study, 88 E. bieneusi-positive samples from crab-eating macaques (Macaca fascicularis) in Hainan Province, China were used for MLST analysis and population genetic characterization of E. bieneusi. The present study aims at evaluating the genetic diversity and population structure of E. bieneusi in NHPs to better understand the transmission and epidemiology of E. bieneusi in these animals.

Methods

Source of E. bieneusi-positive samples

The 88 E. bieneusi-positive samples used in this study were from crab-eating macaques in Hainan, China. They consisted of ITS genotypes Type IV (n = 44), Macaque3 (n = 24) and Peru8 (n = 20), which were identified by sequence analysis of the ITS region of the rRNA gene in a previous study [8]. They were collected from different animals and used for further molecular characterization in this study. Other minor E. bieneusi genotypes from that study were not included in the present study, as the small numbers of samples for each of them made examinations of intra-genotypic variations of them difficult.

MLST analysis of E. bieneusi

Four established MLST loci were targeted in this study, including three microsatellite loci (MS1, MS3 and MS7) and one minisatellite locus (MS4). These loci were amplified by nested PCR using published primers and conditions [32]. A positive control (DNA of genotype BEB4 which usually infects cattle rather than NHPs) and a negative control (reagent water without DNA) were used in all PCR tests. PCR was performed twice on each DNA sample at each genetic locus. The secondary PCR products generated were bi-directionally sequenced on an ABI 3730 Genetic Analyzer (Applied Biosystems, Foster City, CA). The nucleotide sequences obtained were assembled using software ChromasPro 1.32 (http://technelysium.com.au/ChromasPro.html), and aligned with each other and homologous sequences downloaded from GenBank using ClustalX (http://clustal.org). The sequence alignments at the four genetic loci (MS1, MS3, MS4 and MS7) were used in the assessment of sequence polymorphism within the ITS genotypes, determination of MLGs and phylogenetic analyses of the MLST data.

Population genetic analysis of sequence data

The multilocus sequences data (concatenated sequences from all five markers, including the ITS) generated were used in the analysis of the population genetics of E. bieneusi using the software DnaSP version 5.10.01 (http://www.ub.edu/dnasp/) [38]. The population structure of E. bieneusi was assessed by intragenic and intergenic linkage disequilibrium (LD), neutrality (Fu’s statistics or Fs values on selective neutrality based on genotype frequency), and recombination events (Rms).

Population genetic analysis of allelic profile data

Pairwise intergenic LD based on allelic profiles at the five genetic loci was evaluated using the exact test and Markov chain parameters implemented in Arlequin version 3.5.1.2 [39]. The standardized index of association (ISA) was measured using LIAN 3.7 [40].

Subpopulation structure analysis

The phylogenetic relationship among MLGs including both SNPs and INDELs was assessed using the maximum likelihood (ML) analysis and general time reversible model implemented in MEGA 6 (https://www.megasoftware.net). The reliability of clusters formed was assessed using bootstrap analysis with 1000 replicates. An un-weighted pair group mean average (UPGMA) tree of the allelic data was generated using MEGA 6. Wright’s fixation index (FST) of E. bieneusi isolates was calculated using software Arlequin version 3.5.2.2.

STRUCTURE version 2.3.1 was used to identify distinct sub-populations, with K-values of 2–4 [41]. FST was calculated to assess the robustness of the sub-structuring and to determine genetic distances between subpopulations.

Results

PCR amplification efficiency at the MLST loci

The PCR amplification efficiencies were 81.8%, 80.7%, 86.4% and 89.8% at the MS1, MS3, MS4 and MS7 loci, respectively (Table 1). For ITS genotypes Type IV and Peru8, the amplification efficiencies were similar at the MS1 (88.7% and 80.0%), MS3 (81.8% and 85.0%), MS4 (86.4% and 85.0%) and MS7 (91.2% and 95.0%) loci. For Macaque3, however, the amplification efficiencies at the MS1 (70.8%), MS3 (75.0%), MS4 (87.5%) and MS7 (79.2%) loci were correspondingly lower than those of ITS genotypes Type IV and Peru8 at the three loci except MS4.

Table 1 PCR amplification efficiency of DNA from four Enterocytozoon bieneusi ITS genotypes at the other four MLST loci

Multilocus genotypes

Genetic diversity at the MLST loci was in the form of both SNPs and indels, with 24, 5, 25, 4 and 3 segregation sites at the MS1, MS3, MS4, MS7 and ITS loci, respectively (Table 2). Altogether, 6, 4, 6, 2 genotypes were detected at the MS1, MS3, MS4 and MS7 loci, respectively. The intragenic LD among segregating sites for each locus was calculated using the linear regression equation. The MS3, MS7 and ITS loci were in complete linkage disequilibrium (LD = 1). In contrast, the remaining two loci (MS1 and MS4) had incomplete LD, indicating the potential presence of genetic recombination. This was confirmed by further assessment of intragenic recombination, which showed the presence of two and five Rms at the MS1 and MS4 loci, respectively.

Table 2 Intragenic linkage disequilibrium and recombination events in individual genetic loci among 53 E. bieneusi samples

Of the 88 samples analyzed by MLST, 53 were positive at all four genetic loci. To determine the MLGs, sequences of the five genetic loci (MS1, MS3, MS4, MS7 and ITS) from each of the 53 samples were concatenated. Genetic diversity was assessed based on both total polymorphic sites (finite population variance estimates) and segregating sites only (infinite population variance estimates). Together they formed 33 MLGs with an Hd of 0.98 (Table 3). The mean numbers of pairwise differences in polymorphic and segregating sites were 16 and 14, respectively. The frequency of the MLGs ranged from 9.4% (one MLG with five samples), 7.5% (four MLGs each with three samples), 3.8% (eight MLGs each with two samples), to 1.9% (20 MLGs each with one sample) (Table 4).

Table 3 Genetic diversity in 53 Enterocytozoon bieneusi samples based on the analysis of concatenated multilocus sequences (2661 bp in length)
Table 4 Multilocus sequence types of ITS genotypes Type IV and Macaque3 and Peru8 of Enterocytozoon bieneusi in crab-eating macaques

Population structure

Inter-locus LD over all segregating sites within the multilocus contig was assessed for all 53 samples. This produced a Zns value of 0.185; the probability P-values for expected Zns ≤ 0.185 was 0.976. The incomplete LD among these 53 samples (|D′| Y = 0.8679 − 0.2781X) indicated a possible occurrence of genetic recombination, which was confirmed by results of the recombination analysis (10 Rms) (Table 3). Thereafter, we used Fu’s statistic test to evaluate the genotype frequency in this population. In this test, we obtained Fs values of -5.56 (P (Fs < − 5.56) = 0.158) and − 6.91 (P (Fs < − 6.91) = 0.122) based on polymorphic sites and segregating sites, respectively (Table 3). The results of Fu’s statistic tests indicated the presence of an excessive number of alleles.

We calculated ISA to assess the LD between alleles in pairwise combinations of genetic loci [39]. Results of this analysis of allelic profile data showed a positive ISA value together with VD > L. Furthermore, a significant PMC value (< 0.001) was obtained using the Monte Carlo method based on the null hypothesis of panmixia (Table 5). All these results indicated that the population was in LD. However, values of VD < L and PMC > 0.001 were obtained for subpopulations identified in this study or ITS genotype. Thus, these subpopulations and ITS genotypes were in linkage equilibrium (LE) and had the epidemic population structure.

Table 5 Results of linkage disequilibrium analysis of allelic profile data

Subpopulation structure

Maximum likelihood analysis of the concatenated sequences of the 33 MLGs showed four main subpopulations (SPs). SP1 contained samples of three MLGs of ITS genotype Type IV and two MLGs of ITS genotype Peru8, SP2 contained six MLGs of ITS genotype Type IV and two MLGs of ITS genotype Peru8, SP3 contained seven MLGs of ITS genotype Type IV, four MLGs of ITS genotype Peru8, and five MLGs of ITS genotype Macaque3, while SP4 contained two MLGs of ITS genotype Type IV and two MLGs of ITS genotype Macaque3 (Fig. 1). Similarly, four major clades were formed in UPGMA analysis of 33 MLGs in agreement with results of ML analysis of the concatenated sequence data (Fig. 2).

Fig. 1
figure 1

Phylogenetic relationships of multilocus genotypes (MLGs) of ITS genotypes Type IV, Macaque3 and Peru8 of Enterocytozoon bieneusi inferred using the maximum likelihood analysis of concatenated ITS, MS1, MS3, MS4 and MS7. Bootstrap values greater than 50% from 1000 replicates are shown at the nodes

Fig. 2
figure 2

An unweighted pair-group mean average (UPGMA) tree for multilocus genotypes (MLGs) of ITS genotypes Type IV, Macaque3 and Peru8 of Enterocytozoon bieneusi. Bootstrap values greater than 50% from 1000 replicates are shown at the nodes

In STRUCTURE analysis of the MLST data, we used the smallest value of K (K = 2) to measure population differentiation within the E. bieneusi population. Two subpopulations (SPs) with alpha (α) value of 0.104 were produced (1, 2; Fig. 3a). SP1 contained 14 MLGs, while SP2 contained the remaining 19 MLGs (Fig. 3a). When the K value was increased to 3, three clear SPs were formed (α = 0.074; Fig. 3b). SP1 included 8 MLGs belonging to SP1 at K = 2, SP2 included 3 MLGs belonging to SP1 and 7 MLGs belonging to SP2 at K = 2, while SP3 included 3 MLGs belonging to SP1 and 12 MLGs belonging to SP2 at K = 2 (Fig. 3a, b). When using K = 4, four clear SPs formed, with individuals in SP1 being similar to SP1 at K = 2 and 3 (Fig. 3). In addition, the α value of 0.061 was similar to the values generated at K = 2 and 3, suggesting the substructures at K = 4 could be realistic.

Fig. 3
figure 3

Subpopulation structure of 33 Enterocytozoon bieneusi MLGs. Different subpopulation patterns are shown depending on the setting of K values. a The generation of two subpopulations using K = 2. b The presence of three subpopulations using K = 3. c The appearance of four subpopulations using K = 4

FST analysis was used to assess the sub-structuring within the E. bieneusi population. For SP1 and SP2 generated by STRUCTURE at K = 2, the differentiation was significant (FST = 0.161, P < 0.0001). At K = 3, the differentiation between SP1 and SP2 (FST = 0.362, P < 0.0001), SP1 and SP3 (FST = 0.274, P < 0.0001), and SP2 and SP3 (FST = 0.234, P < 0.0001) were also significant. Similarly at K = 4, the differentiation between SP1 and SP2 (FST = 0.349, P < 0.0001), SP1 and SP3 (FST = 0.345, P < 0.0001), SP1 and SP4 (FST = 0.313, P < 0.0001), SP2 and SP3 (FST = 0.293, P < 0.0001), SP2 and SP4 (FST = 0.233, P < 0.0001), and SP3 and SP4 (FST = 0.254, P < 0.0001) were all significant.

Discussion

Currently, multilocus sequence typing has been used to analyze the genetic diversity of E. bieneusi around the world [15, 42, 43]. In Peru, India and Nigeria, 66 MLGs were identified in HIV-infected individuals [33, 37]. In China, 59 and 44 MLGs were identified in NHPs and pigs, respectively, while some genetic diversity based on MLST data was also observed in E. bieneusi isolates from domestic, wild, and zoo animals [29, 36, 42,43,44,45,46,47,48]. However, few studies analyzed the MLGs of this pathogen in NHPs in China. In the present study, 33 MLGs were identified from 53 E. bieneusi-positive samples from macaque monkeys, indicating a high genetic diversity within some common ITS genotypes (Type IV, Macaques3 and Peru8), and supporting high-resolution of the multilocus sequencing tool established by Feng et al. [32].

A strong and significant intragenic LD was observed in the analysis of the MLST data, indicating an overall clonal population structure of E. bieneusi in this study. In previous studies, the strong and significant intragenic LD obtained was used as an indication for the presence of a clonal population structure of E. bieneusi in the AIDS patients, giant pandas, pigs, fur animals and NHPs [36, 37, 47, 49, 50]. Therefore, there was a clonal endemicity of E. bieneusi in the study population.

Nevertheless, the MLGs found in crab-eating macaques were not segregated by ITS genotypes, indicating the existence of genetic recombination in the E. bieneusi population studied. In previous investigations, E. bieneusi MLGs were segregated by ITS genotypes for some genotypes with narrower host ranges, such as A, EbpA, SCO2 and Cs-4, but not for zoonotic genotypes such as Type IV, D, WL11 and Peru11 [15, 16]. Similarly, in this study, MLGs of E. bieneusi in crab-eating macaques were also not segregated by ITS genotypes, especially for the zoonotic genotypes Type IV and Peru8. Genetic recombination appears to be common in some microsporidia of insects, including Kneallhazia solenopsae in fire ants and Andreanna caspii in mosquitoes. In a previous study in China, the MLGs of some common ITS genotypes (Type IV, Peru8, D, Macaque3, CM2, Peru11, Henan V, PigEBITS7) in NHPs were not segregated by ITS genotypes in phylogenetic analysis, suggesting a possible occurrence of sexual recombination in E. bieneusi [36]. Therefore, data from both studies have indicated that there could be some sexual recombination among common E. bieneusi genotypes in NHPs.

Four SPs were obtained in this study among MLGs isolates within Group 1, according to results of phylogenetic and subpopulation genetics analyses. The four SPs all appear to have an epidemic population structure. In previous studies, the ITS genotype Macaque3 was mostly found in NHPs, and only occasionally in dogs [8,9,10, 26, 51]. Therefore, the clustering of the host-adapted ITS genotype Macaque3 in SP3 and SP4 suggests the NHP-specific nature of these two subpopulations. Both potentially zoonotic and host-adapted subpopulations have been identified in Group 1 in previous studies. The former included MLGs of ITS genotypes Type IV, D, Peru8, WL11, and Peru11 in humans, NHPs, pigs and companion animals, whereas the latter included MLGs of ITS genotypes A in humans, EbpA and CS-4 in pigs, and Macaque3 in NHPs [1, 15, 21, 36, 52, 53]. Therefore, SP3 and SP4 including MLGs of Macaque3 were potentially NHPs-adapted subpopulations, while SP1 and SP2 only including MLGs of ITS genotype Type IV and Peru8 were potentially zoonotic subpopulations.

Conclusions

A much higher genetic diversity was revealed in E. bieneusi isolates from crab-eating macaques by multilocus sequence typing (33 MLGs) than by ITS genotyping (three ITS genotypes). This appears to have resulted from genetic recombination among common ITS genotypes despite an overall clonal population structure in E. bieneusi. Nevertheless, the MLGs generated from the analysis have formed four subpopulations with different host ranges and an epidemic population structure. These results provide data on the population and subpopulation structure of E. bieneusi in NHPs, and would improve our perception of E. bieneusi epidemiology in these animals.