Introduction

Ocular infection with Chlamydia trachomatis (Ct) causes trachoma, the leading infectious cause of blindness1. Both ocular Ct infection and active disease prevalence decline from their peaks in pre-school children (one to four years old) to older children (five to fourteen years old) and from this group to adults (fifteen years or older)2, 3. This suggests that partial Ct immunity develops with increasing age in endemic communities, notwithstanding reduced exposure to Ct with increasing age4.

Conjunctival Ct infection induces a strong pro-inflammatory response marked by production of cytokines5 and recruitment of neutrophils, macrophages and NK-cells6, 7. Induction and proliferation of CD4+ T-cells and production of interferon-gamma (IFNγ) have been implicated in successful resolution of infection in animal models and human infections8,9,10,11. Ocular Ct infection additionally induces local and systemic antibodies12. Neutralising antibodies against Ct have been demonstrated in animal models13 and in vitro 14. Paradoxically, in a longitudinal study from The Gambia, higher IgG responses against the immunodominant major outer membrane protein (MOMP) were associated with higher rates of infection, and higher titres increased the associated risk15. Ocular Ct infection clearly induces a strong humoral immune response, but its role in protection or pathology in humans in vivo remains unclear.

Screening whole-proteome arrays has become an effective method to describe the complete profile of antibody responses in infection and disease16. Studies utilising these proteome arrays have highlighted some common themes in humoral immune targets of human pathogens including functions in protein binding and catalytic activities, early or late expression in the developmental cycle and membrane localisation17.

Genome-wide analyses of the type or mechanisms of selection have been conducted on a number of human pathogens and have identified known and novel targets under immune selective pressure. Studies in many human pathogens, including P. falciparum 18, 19, Helicobacter pylori 20, Staphylococcus aureus 21 and Streptococcus pneumoniae 22, have highlighted cellular and humoral immune targets, cell surface proteins and known host-interactors as overrepresented in genes under both balancing (selection and maintenance of multiple alleles) and positive selection (selection of advantageous alleles).

Until recently, the number of sequenced Ct genomes has been low and they have been derived from specimens that were collected from ecologically disparate sites over a timespan >50 years (Harris et al.23). A study by Thomson et al.24 utilising 3 genomes, and more recent studies by Joseph et al.25, 26 utilising 12 and 32 genomes respectively, followed by Borges et al.27 (n = 59 genomes), identified positive selection in known host-interactors that were either surface-exposed or secreted into the host cytosol. Most recently, Hadfield et al. analysed 563 Ct genomes28. Similar to the prior studies, they focussed on global diversity of the species and its evolutionary history, rather than within-population dynamics. In trachoma, previous studies have focused on the population genetics of ompA, which encodes the immunodominant MOMP. Two populations of Ct sequences from trachoma-endemic communities in The Gambia29 and Tanzania30 suggested ompA was under purifying (selection against deleterious alleles) and positive selection; similar variation in selection pressure has been found in urogenital Ct sequences31. A lack of balancing selection in this immunodominant antigen, which is a target of neutralising antibodies, is in contrast to other pathogens and highlights the need for further population-based studies of Ct genes under selection to better understand the interactions between Ct and the host immune system.

We used sera, collected from Gambian children at the baseline point of a six-month longitudinal cohort, to screen a protein microarray of 894 genomic ORFs from serovar D Ct32. The complete profile of responses for each sample was determined and used to investigate the differential recognition of individual antigens and estimate the diversity and evenness of the antibody response associated with the frequency and duration of Ct infection. A population of 126 complete genome sequences of ocular Ct samples obtained from discrete communities of four Bijagos Islands (Guinea-Bissau) collected in a single survey33, 34 was used in tests of population genetic selection. Genome-wide evidence of selection and genome-wide screening of the antibody response in the context of susceptibility to ocular infection observed over a 6-month period were overlaid. This enabled us to identify new targets of humoral immunity and we uncovered two complementary immune evasion tactics that may support Ct survival and promote recurrent infection.

Results

Immunity defined by susceptibility to infection

After normalisation and filtering to remove infrequently recognised antigens, responses of 90 individuals covering 441 antigens were included in the analysis. Individuals were divided into those resistant or susceptible based on observed median duration of infections over the six-month study period (Supplementary Figure 1). The study-wide median duration of infection was 2 weeks (or one visit). Those with no infections or short duration infections were combined (≤2 weeks; resistant) and compared to those with long duration infections (>2 weeks; susceptible). The demographic similarity of resistant and susceptible individuals indicated that history of ocular Ct exposure was similar (Table 1).

Table 1 Age, gender and village membership in resistant and susceptible groups screened on the micro-array.

Diversity of the antibody response and infection susceptibility

The complete profile of antibody responses in resistant and susceptible individuals was used to investigate differences in diversity or evenness of anti-Ct responses. Breadth, defined as the number of antigens recognised by an individual, was higher in susceptible individuals but this was not significant (p = 0.088) (Fig. 1A). Diversity indices, which incorporate breadth and the relative strength of responses against an antigen, were higher in susceptible individuals. This reached significance for Shannon’s (p = 0.024) but not Simpson’s diversity indices (p = 0.080) (Fig. 1B and C). Together these indicate broader, less focussed antibody responses in susceptible individuals.

Figure 1

Breadth and diversity of responses in resistant and susceptible individuals. Notched boxplots of breadth/diversity of responses (x-axis) in resistant and susceptible individuals (y-axis). (A) Breadth measured as the number of positive responses within individuals (p = 0.088). Diversity measured using (B) Shannon’s diversity index (p = 0.024) and (C) Simpson’s diversity index (p = 0.080). Medians are shown as red lines; notches were calculated as the median ± 1.57 × IQR/√n, where IQR is the interquartile range and n is the number of samples. The IQR times 1.5 was added to the 75th percentile and subtracted from the 25th percentile to determine the whiskers. Dots are outliers.
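The notch and whisker definitions given in the caption can be written out as a short Python sketch. The function name and the example values are illustrative assumptions, and the quartiles are computed with the standard library's inclusive method, which may differ slightly from the plotting software actually used.

```python
import math
import statistics

def notched_box_stats(values):
    """Median, notch limits and whisker fences as defined in the Figure 1 caption."""
    n = len(values)
    # Quartile method is an assumption; plotting packages differ in the details.
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    half_notch = 1.57 * iqr / math.sqrt(n)
    return {
        "median": median,
        "notch": (median - half_notch, median + half_notch),
        # Points outside these fences would be drawn as outlier dots.
        "whisker_fences": (q1 - 1.5 * iqr, q3 + 1.5 * iqr),
    }

# Hypothetical breadth values (antigens recognised per individual), not study data.
example_breadth = [10, 12, 14, 16, 18, 20, 22, 24, 26]
box = notched_box_stats(example_breadth)
```

Non-overlapping notches between two groups are conventionally read as a rough indication that the medians differ; the p-values quoted in the caption come from the formal tests, not from the notches.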

Individual antibody responses were associated with increased susceptibility

Association between responses to individual antigens and infection frequency and duration was determined using a generalised linear model adjusting for other major risk factors (age, gender and village of residence). Forty-two antigens were identified as targets of differential antibody responses between resistant and susceptible individuals (p ≤ 0.05) (Table 2). Higher responses to each of these antigens were associated with susceptibility to infection.

Table 2 Forty-two differentially recognised antigens between resistant and susceptible individuals.

To examine when and how these antigens might be targeted by the host immune response during the Ct developmental cycle, their expression stage from Belland et al.35 and predicted localisation from three computational tools (loctree3, Cello and pSORTB) were compared between the 42 differentially recognised antigens and the 441 antigens (Supplementary Table 1). A χ2 test was used to quantify over-representation or under-representation of expression stage or predicted localisations in the differentially recognised antigens, compared to all 441 antigens. Very early (1 hour post infection [HPI]) and very late (24–36 HPI) expressed genes were significantly over-represented (p = 0.007). Secreted, inner membrane and periplasmic proteins were weakly over-represented in these antigens (p = 0.056). Experimentally defined localisation of these antigens showed mixed agreement with software predictions: a number of those known to be secreted or to reside in the outer membrane were incorrectly classified as remaining inside the inclusion. Fewer individually determined data were available for expression stage, although some antigens predicted to peak at 40 HPI have since been shown to peak at 1–6 HPI. These disagreements in classification of expression stage and localisation strengthen the over-representation of antigens expressed early or late and localised to interact directly with the host.
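As an illustration of the over-representation test described above, the following Python sketch computes a Pearson χ2 statistic for a single 2 × 2 table (category versus all other categories). The counts are hypothetical, the comparison is set up as selected antigens versus the remaining antigens (an assumption made so that the two rows are independent), and no continuity correction is applied; the tests reported here may have used a larger contingency table.

```python
import math

def chi2_2x2(a, b, c, d):
    """Pearson chi-square for the table [[a, b], [c, d]] with 1 degree of freedom.

    Returns (statistic, p_value). No continuity correction is applied.
    """
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    if 0 in (row1, row2, col1, col2):
        raise ValueError("every row and column needs a non-zero total")
    stat = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    # For 1 degree of freedom the upper tail equals erfc(sqrt(stat / 2)).
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value

# Hypothetical counts, not study data:
# 15 of 42 differentially recognised antigens fall in the category of interest,
# compared with 70 of the remaining 399 antigens.
stat, p = chi2_2x2(15, 27, 70, 329)
```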

Genome-wide evidence of purifying and positive selection

In general, pathogen immunogenic proteins are under natural selection, due to their impact on pathogen survival and transmission. To validate identified immune targets and identify further targets, sequence data from 126 ocular Ct samples from the Bijagos Islands, Guinea-Bissau were examined for evidence of departure from neutrality. These samples were collected and sequenced as described elsewhere33, 34 (details in Supplementary Methods). Currently, no Ct whole-genome sequences are available from the villages included in this study. To determine the relevance of evidence of selection in samples from the Bijagos Islands, they were compared with five historical isolates from The Gambia (Supplementary Figure 2). The Bijagos Islands samples were separated from each other by 1–1119 SNPs, the Gambian samples were separated by 161–1704 SNPs and the two populations of samples were separated by 487–1019 SNPs. As expected, based on previous Ct sequences, this suggests the geographically-distinct populations are closely related genetically, supporting the use of the Bijagos Islands samples in predicting genes under selection within those circulating in The Gambia. Therefore we tested for evidence of departure from neutrality using a number of tests including Tajima’s D, Fay and Wu’s H, and the integrated haplotype score.

Tajima’s D can distinguish between directional selection (positive or purifying selection [D ≤ −1.8]) and balancing selection (D ≥ 2.044), by comparing the levels of low and medium frequency alleles36. Fay and Wu’s H can distinguish between balancing/positive selection and balancing/purifying selection, by comparing the levels of low and high frequency alleles. By combining D- and H-values, balancing (D ≥ 2.044 and −3.85 ≤ H ≤ 0.72), positive (D ≤ −1.8 and H ≤ −3.85) and purifying selection (D ≤ −1.8 and H ≥ 0.72) can be differentiated.
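The combined decision rule can be expressed as a small Python function. This is a sketch only: the function and constant names are illustrative, and the balancing condition is read as H lying between −3.85 and 0.72, which is an interpretation of the thresholds stated above.

```python
# Thresholds taken from the text; the names are illustrative.
D_DIRECTIONAL = -1.8   # D at or below this suggests directional selection
D_BALANCING = 2.044    # D at or above this suggests balancing selection
H_POSITIVE = -3.85     # H at or below this suggests positive selection
H_PURIFYING = 0.72     # H at or above this suggests purifying selection

def classify_selection(d, h):
    """Combine Tajima's D and Fay and Wu's H into a single call."""
    if d >= D_BALANCING and H_POSITIVE <= h <= H_PURIFYING:
        return "balancing"
    if d <= D_DIRECTIONAL and h <= H_POSITIVE:
        return "positive"
    if d <= D_DIRECTIONAL and h >= H_PURIFYING:
        return "purifying"
    return "no clear evidence"
```

For example, `classify_selection(-2.1, -5.0)` returns `"positive"`, whereas a gene with D = −2.1 and H = 0 is left unclassified because the two statistics do not support each other.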

Nineteen genes with evidence of selection by Tajima’s D were supported by Fay and Wu’s H. Ten of these genes had evidence of positive selection and 9 genes had evidence of purifying selection (Fig. 2A and Table 3 [‘Windows under selection’ = 0]). D- and H-values were then determined using a 42 base pair sliding window analysis; this window equates to the most frequent length of an antibody epitope (16 amino acids). Seventy-six windows across 12 genes had evidence of positive selection and 61 windows across 8 genes had evidence of purifying selection (Fig. 2B and Table 3 [‘Windows under selection’ >0]). Evidence of natural selection acting on/within these genes suggests they may influence Ct survival and transmission.
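To make the sliding-window analysis concrete, the sketch below computes Tajima's D from an alignment using the standard formula and then applies it across windows. It is not the Variscan implementation used in this study: the function names are illustrative, the step size of 1 bp is an assumption (the step is not stated here), and gaps or ambiguous bases are naively treated as additional alleles. Fay and Wu's H is omitted because it additionally requires an outgroup to polarise alleles.

```python
import math
from collections import Counter

def tajimas_d(seqs):
    """Tajima's D for equal-length aligned sequences; None if it is undefined."""
    n = len(seqs)
    pairs = n * (n - 1) / 2
    seg_sites = 0
    pi = 0.0  # mean number of pairwise differences
    for column in zip(*seqs):
        counts = Counter(column)
        if len(counts) > 1:
            seg_sites += 1
            identical_pairs = sum(c * (c - 1) / 2 for c in counts.values())
            pi += (pairs - identical_pairs) / pairs
    if seg_sites == 0:
        return None
    a1 = sum(1 / i for i in range(1, n))
    a2 = sum(1 / i ** 2 for i in range(1, n))
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n * n + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1 = c1 / a1
    e2 = c2 / (a1 ** 2 + a2)
    variance = e1 * seg_sites + e2 * seg_sites * (seg_sites - 1)
    if variance <= 0:
        return None
    return (pi - seg_sites / a1) / math.sqrt(variance)

def sliding_windows(seqs, width=42, step=1):
    """Yield (start, D) for each window; step=1 is an assumed default."""
    length = len(seqs[0])
    for start in range(0, length - width + 1, step):
        yield start, tajimas_d([s[start:start + width] for s in seqs])
```

An excess of rare variants (for example, every segregating site being a singleton) gives a negative D, while variants at intermediate frequency give a positive D.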

Figure 2

Correlation between Tajima’s D and Fay and Wu’s H. Genome-wide correlation of D and H values at (A) gene-level and (B) epitope-level (sliding windows of 42 bp). Values significantly different from zero are indicated for each measure (dashed red lines). Genes with evidence of positive (blue), purifying (red) and balancing selection (green) are highlighted.

Table 3 Genes under selection identified by Tajima’s D and Fay and Wu’s H.

Integrated haplotype scores identify three genomic regions under positive selection

Identification of positive selection by Tajima’s D and Fay and Wu’s H is most powerful when an allele is close to fixation (only one allele at a given site). A genome-wide scan was performed to calculate integrated haplotype scores (iHS) for SNPs, to identify genes and regions under positive selection that have not yet reached fixation.

The median iHS score was 0.66 (95% CI 0.03–2.18), with 20 SNPs in the top 1% of the genome-wide distribution of iHS (Fig. 3). The top 1% of SNPs highlighted three loci that showed evidence of recent positive selection: CT048-CT074, CT154-CT155 and CT456-CT625. These loci include a region covering tarP and pmp family members and a region within the Ct plasticity zone.

Figure 3

Evidence of positive selection using the integrated haplotype score. Genes in the top 1% of values (dashed red line) had the strongest evidence of positive selection. Regions (blue lines) and individual SNPs (blue shading) under positive selection are indicated.

Ct genes important for Ct survival and pathogenesis were the focus of natural selection

In this population of ocular Ct samples, 48 genes were identified with evidence of selection by either a combination of Tajima’s D and Fay and Wu’s H, at the gene or epitope level, or by iHS. Expression stage of these genes and localisation of the translated proteins were examined to discover common patterns in genes with evidence of selection (Supplementary Table 2). Secreted and outer membrane proteins were significantly over-represented in these targets (p = 0.004), as were genes with peak expression levels very early or very late in the developmental cycle (p = 0.0005). Expression at these pivotal stages of infection and extra-inclusion localisations suggest these genes are important factors in Ct survival and pathogenesis. Similar to the 42 antigens identified in this study, comparison of the array-determined expression stage and predicted localisation to more recent experiments supported these findings.

Variable evidence of selection acting on genes associated with infection frequency and duration

Evidence of selection is a common marker of pathogen immunogenic proteins due to their interactions with the host and impact on pathogen survival and transmission. Therefore, we examined the 42 antigens associated with susceptibility to infection for evidence of selection, as one means of validation as important immune targets (Fig. 4 and Supplementary Table 3).

Figure 4

Evidence of selection in antibody targets associated with susceptibility to infection. Evidence of selection was determined at the gene-level (A) and the epitope-level (B) by Tajima’s D and Fay and Wu’s H. Evidence of positive selection on SNPs and larger genomic regions was independently determined by iHS (C). Association with susceptibility to infection is indicated (red). Thresholds for genes considered under selection (values significantly different from zero) are indicated for each measure (dashed red lines). Genes or SNPs (C) with evidence of positive (blue), purifying (red) and balancing selection (green) are highlighted.

Four of 42 targets associated with susceptibility to infection had gene-level evidence of selection by Tajima’s D, supported by Fay and Wu’s H. CT694 and CT695 had evidence of positive selection at the epitope level supported by D and H values. CT545 and CT806 had evidence of purifying selection at the epitope level.

Ten genes were within regions under positive selection by iHS. One target, susceptibility-associated CT228, contained a SNP under positive selection by iHS.

The majority of the antigens associated with susceptibility had no evidence of selection in this population, suggesting they are evolving under neutral selection.

Discussion

There is considerable evidence from animal models suggesting antibody responses are necessary for long-term protection and immunity from chlamydial infection37,38,39. In mice, the breadth of the antibody response is higher in strains that are more susceptible to chlamydial infection and associated pathology40,41,42. Furthermore, in non-human primates, partial immunity to ocular infection was consistent with development of a focussed antibody recall response43.

Previous work on the Gambian six-month longitudinal cohort utilised in this study focussed on systemic and local cell-mediated immune responses44, 45. To gain a more complete picture of the immune responses underlying differential outcomes, we screened serum samples against a Ct protein microarray to examine the relationship between individual serological immune responses and the acquisition and resolution of ocular Ct infection. Antibody responses were more focused in children who were able to resolve infection, while heightened responses to 42 antigens were associated with susceptibility to infection and longer durations of infection.

Tests for selection identified a number of genes and regions of the Ct genome under purifying and positive selection, using 126 ocular samples collected from the Bijagos Islands (Guinea-Bissau). These regions were focused in immunogenic proteins and those that interact directly with the host. This result was further strengthened by using experimentally derived expression stage and localisation of individually investigated genes, as opposed to array-based profiling of expression and bioinformatics prediction of localisation. Evidence for selection within the potential immune targets identified from the proteome array analysis was variable: 5/42 targets had significant evidence of selection.

Genes important in host-pathogen interactions are under natural selection. Most frequently, selection is observed in pathogen immune targets, where balancing and/or positive selection can aid immune evasion46. We found evidence of selection in Ct genes that code for known or putative immune targets or virulence factors, similar to previously described Ct genomics studies24,25,26,27. Genes coding for proteins involved in cell entry, cell exit and intracellular interactions via the inclusion membrane encompassed the majority of genes under selection. Additionally, 17 of 48 genes with evidence of selection have no known function, suggesting these genes may be important for Ct survival and transmission. Genome-wide scans for evidence of selection therefore have the ability to highlight important, previously uncharacterized genes.

Seventeen of the genes under selection are known to be immunogenic; this supports immune-recognition as a key driving factor of selection within this population of ocular Ct samples. However, there was limited evidence of balancing selection, a common mechanism of immune evasion employed by human pathogens18, 21, 47, 48. This method of immune evasion relies on cyclical presentation of alternative forms of immunodominant, primarily surface-exposed, antigens to the host immune system to avoid recognition and clearance.

Typically, in trachoma endemic communities where immunity develops slowly, Ct is able to reinfect individuals within the same households or village. The lack of protective antibodies, combined with the absence of balancing selection, suggests that Ct employs a different strategy for immune evasion. The results suggest two potential routes for immune evasion: a) blocking and/or invasion-enhancing antibodies against surface antigens, or b) masking of protective humoral immune responses through heightened responses against a large number of immunogenic but non-protective antigens.

Blocking antibodies are induced by surface antigens of P. falciparum 49, Candida albicans and Neisseria gonorrhoeae and can inhibit protective responses50, 51. Antibody-dependent enhancement is a well-described process in viral infections52, 53, involving cross-linking of host-cell surface receptors by virus-antibody complexes leading to enhanced infectivity. Similar observations in P. vivax 54 suggest this mechanism may be more widely utilized by pathogens.

Through Ct-protein array screening of serum from children resistant or susceptible to frequent and/or prolonged Ct infection, we identified three susceptibility-associated surface antigens: CT017 (Ctad1), CT541 (MIP) and CT579. Antibodies generated against non-protective surface antigens provide a survival advantage for Ct, and the genes encoding such antigens would be expected to be under purifying selection. There was no clear evidence of selection in these antigens. However, three of four outer membrane proteins (ompA [MOMP], pmpC and pmpH) with strong evidence of selection were under purifying selection, suggesting these antigens are targets of blocking and/or invasion-enhancing antibodies. In support of this, MOMP can induce blocking55 and invasion-enhancing antibodies14, 56 in vitro and in mouse models.

The majority of the 42 susceptibility-associated antigens were not localised to the EB surface; therefore their role in Ct infection and disease cannot be explained by a hypothesis that solely involves non-protective surface antigens. The breadth of the antibody response shows that a high number of Ct antigens are accessible to the host immune system, likely through release of Ct proteins as a result of inclusion/host cell lysis or recently described chlamydial extrusions57, 58. Antibody responses against the majority of Ct proteins are not protective, leading to a hypothesis that heightened responses to a large number of immunogenic but not protective antigens mask protective humoral immune responses.

A diverse antibody profile would provide an advantage for the bacteria; therefore, targeted genes would be expected to be under purifying selection. However, the number of antigens targeted and their inherently random nature imply that no one gene would show strong evidence of selection. In support of evasion by decoy, 37/42 of the antigens identified as susceptibility-associated had no strong evidence of selection. Of the five under selection, two had evidence of purifying selection (CT545 and CT806). The remaining three were under positive selection (CT228, CT694 and CT695); however, they are known virulence factors59,60,61,62,63, so immune evasion may not be the only driving force of selection.

Ideally, samples from the Gambian villages utilised in this study would have been screened for evidence of selection; however, whole-genome sequences are currently unavailable. The phylogeny of the Bijagos Islands samples and the previously published Gambian samples suggests they are closely related. This relatedness and the similar community-level endemicity of active trachoma (Gambian cohort 21.5% and Bijagos Islands 22% [one to nine year olds]) and ocular Ct infection (Gambian cohort 9.9% and Bijagos Islands 25% [one to nine year olds]) support the assertion that the genes under selection may be shared between these geographically-separated Ct populations. Additionally, evidence of diversity within the Bijagos Islands samples and historical Gambian samples suggests observed differences in infection frequency and duration between villages are not due to the presence of a single dominant Ct clone.

These data demonstrate the value of genomic information in the identification of immune targets and virulence factors, particularly in combination with proteome-wide antibody responses. Less focussed antibody responses in susceptible individuals and evidence of purifying selection in Ct surface antigens support an evasion strategy in which Ct presents a number of non-protective, irrelevant antigens to the immune system to block or misdirect protective responses. These results will allow targeted development of vaccine candidates and may explain the observed success of single and multi-antigen vaccines against Ct, as they would promote a focussed, protective humoral immune response. In summary, our data support the importance of antibody responses in human immunity to Ct infection.

Methods

Ethics statement

The Gambian longitudinal cohort was conducted in accordance with the Declaration of Helsinki. The study and its procedures were approved by the Gambia Government/Medical Research Council The Gambia Unit Joint Ethics Committee and by the Ethics Committee of the London School of Hygiene and Tropical Medicine (LSHTM). Verbal consent was obtained from community leaders. Written informed consent was obtained from all study participants’ guardians on their behalf. A signature or thumbprint is considered an appropriate record of consent in this setting by the above ethical bodies.

The Bissau-Guinean cross-sectional survey was conducted in accordance with the Declaration of Helsinki. The study and its procedures were approved by the Comitê Nacional de Ética e Saúde (Guinea-Bissau) and the LSHTM Ethics Committee. Verbal consent was obtained from community leaders. Written informed consent was obtained from all study participants or their guardians on their behalf if participants were children. A signature or thumbprint is considered an appropriate record of consent in this setting by the above ethical bodies.

Clinical cohort study and participants

A six-month longitudinal study was previously conducted and recruited 345 children, aged four to fifteen years old, from nine villages in The Gambia11. Villages were selected following initial trachoma rapid assessment screening that found clinical signs of active trachoma in 20% of school age children. At baseline and at each fortnightly visit for six months, children were examined for clinical signs of active trachoma. Trachoma was graded according to the simplified WHO grading system64. Two conjunctival swabs were collected in duplicate to test for Ct infection and tear fluid was collected for mucosal cytokine and antibody assays. Venous blood samples were collected at baseline and cessation of the study; 186 samples were collected. Villages 3 and 9 declined to give venous blood samples and could not be included in this study.

Chlamydia trachomatis antigen microarrays

Ninety sera from baseline and 33 sera from the end of the study were screened against a previously published protein microarray covering 894 genomic open reading frames (ORFs) from serovar D Ct32. Briefly, 894 ORFs were cloned into the pXT7 expression vector, and proteins were expressed and printed on glass slides. After blocking, the arrays were probed with diluted sera, and bound antibody was detected using a biotin-conjugated anti-human antibody followed by a streptavidin-conjugated secondary antibody.

Proteome microarray normalisation, filtering and clustering

The raw signal intensity data (Supplementary Table 4) from the protein micro-array was transformed by inverse hyperbolic sine transformation and normalised by mean-centring. To filter out infrequently recognised antigens, post-normalisation, the global median of the data was calculated and individual antigens whose median was lower than the global median were excluded from further analysis.
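The normalisation and filtering steps can be sketched in Python as follows. The data layout (a dictionary of samples, each mapping antigen identifiers to raw intensities), the function names and the example intensities are all illustrative assumptions. Mean-centring is applied within each sample here, which is one plausible reading; the text does not state whether centring was per sample or per antigen.

```python
import math
import statistics

def normalise(raw):
    """arsinh-transform each intensity, then centre each sample on its own mean.

    Per-sample centring is an assumption; see the lead-in.
    """
    normalised = {}
    for sample, row in raw.items():
        transformed = {antigen: math.asinh(v) for antigen, v in row.items()}
        mean = statistics.fmean(transformed.values())
        normalised[sample] = {a: v - mean for a, v in transformed.items()}
    return normalised

def filter_antigens(normalised):
    """Keep antigens whose median is not lower than the global median."""
    all_values = [v for row in normalised.values() for v in row.values()]
    global_median = statistics.median(all_values)
    antigens = list(next(iter(normalised.values())))
    return [
        a for a in antigens
        if statistics.median(row[a] for row in normalised.values()) >= global_median
    ]

# Hypothetical raw intensities for three samples and three antigens.
raw = {
    "s1": {"CT001": 100, "CT002": 5000, "CT003": 20000},
    "s2": {"CT001": 150, "CT002": 4000, "CT003": 30000},
    "s3": {"CT001": 80, "CT002": 6000, "CT003": 25000},
}
norm = normalise(raw)
kept = filter_antigens(norm)
```

The inverse hyperbolic sine behaves like a logarithm for large intensities but remains defined at zero and for negative background-subtracted values, which is a common reason for preferring it over a log transform for array data.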

Different methods to identify positive-negative breakpoints in the distribution of the data were tested. Silhouette width was used to quantify the best method. Cluster separation per antigen was derived from the average silhouette width. To classify positive responses, two clusters were identified using the method that had the highest average silhouette width for each antigen.
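The selection of a breakpoint by average silhouette width can be illustrated with the sketch below. The candidate breakpoint methods are not named in the text, so the two used here (the mean and the midpoint of the range) are purely hypothetical stand-ins, as are the function names and the example values.

```python
import statistics

def average_silhouette(values, breakpoint):
    """Average silhouette width of the split into <= breakpoint and > breakpoint."""
    negative = [v for v in values if v <= breakpoint]
    positive = [v for v in values if v > breakpoint]
    if not negative or not positive:
        return -1.0  # no two-cluster split exists; score it as the worst case
    widths = []
    for own, other in ((negative, positive), (positive, negative)):
        for i, v in enumerate(own):
            if len(own) == 1:
                widths.append(0.0)  # convention for single-member clusters
                continue
            a = sum(abs(v - w) for j, w in enumerate(own) if j != i) / (len(own) - 1)
            b = sum(abs(v - w) for w in other) / len(other)
            widths.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return statistics.fmean(widths)

def best_breakpoint(values, candidates):
    """Return (method_name, breakpoint, score) with the highest average width."""
    scores = {name: average_silhouette(values, bp) for name, bp in candidates.items()}
    best = max(scores, key=scores.get)
    return best, candidates[best], scores[best]

# Hypothetical normalised responses to one antigen across seven individuals.
responses = [-1.2, -1.0, -0.9, -1.1, 2.0, 2.2, 1.9]
candidates = {
    "mean": statistics.fmean(responses),
    "range_midpoint": (min(responses) + max(responses)) / 2,
}
method, cut, score = best_breakpoint(responses, candidates)
positive_calls = [v > cut for v in responses]
```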

Diversity metrics

Breadth of response was defined as the number of antigens to which each individual had a positive response. Diversity was calculated using Shannon’s entropy (H) and Simpson’s diversity index (D).
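A minimal Python sketch of these metrics is given below. It assumes that each individual's positive responses are supplied as non-negative strengths and converted to proportions of that individual's total response. Simpson's index is computed in its Gini-Simpson form (1 − Σp²), so that higher values mean greater diversity; this form is an assumption, since the text does not specify which variant was used.

```python
import math

def _proportions(strengths):
    """Convert non-negative response strengths to proportions of the total."""
    if any(s < 0 for s in strengths):
        raise ValueError("strengths must be non-negative")
    total = sum(strengths)
    if total <= 0:
        raise ValueError("at least one strength must be positive")
    return [s / total for s in strengths if s > 0]

def breadth(strengths):
    """Number of antigens with a positive response."""
    return sum(1 for s in strengths if s > 0)

def shannon_entropy(strengths):
    """Shannon's H = -sum(p * ln p)."""
    return -sum(p * math.log(p) for p in _proportions(strengths))

def simpson_diversity(strengths):
    """Gini-Simpson form, 1 - sum(p^2); the variant used is an assumption."""
    return 1 - sum(p * p for p in _proportions(strengths))

# Hypothetical profiles: one even response and one focussed response.
even_profile = [1.0, 1.0, 1.0, 1.0]
focussed_profile = [9.7, 0.1, 0.1, 0.1]
```

Both profiles have the same breadth (four antigens), but the focussed profile scores lower on both diversity indices because most of the response is directed at a single antigen.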

Statistical analyses

Intensity of responses was compared using a generalised linear model (glm) and the number of positive responses using Fisher’s exact test. For the glm, 10,000 permutations of the outcome variable were performed to generate an adjusted p-value. Adjusted odds ratios (OR*) and adjusted confidence intervals (CI*) were calculated as the unadjusted OR/CI exponentiated by half the range for a given antigen.
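The permutation procedure and the range adjustment can be sketched as follows. To keep the example self-contained, the glm coefficient is replaced by a simpler statistic (the difference in mean response between susceptible and resistant individuals); this substitution, the function names and the example data are assumptions, and covariate adjustment is omitted. The adjusted odds ratio is read here as the unadjusted OR raised to the power of half the response range, i.e. the odds ratio for a half-range increase in response.

```python
import random
import statistics

def mean_difference(responses, outcome):
    """Mean response in susceptible (1) minus resistant (0) individuals."""
    case = [r for r, o in zip(responses, outcome) if o == 1]
    control = [r for r, o in zip(responses, outcome) if o == 0]
    return statistics.fmean(case) - statistics.fmean(control)

def permutation_p(responses, outcome, n_perm=10_000, seed=0):
    """Two-sided p-value from shuffling the outcome labels n_perm times."""
    rng = random.Random(seed)
    observed = abs(mean_difference(responses, outcome))
    shuffled = list(outcome)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if abs(mean_difference(responses, shuffled)) >= observed:
            hits += 1
    # Add one to numerator and denominator so the p-value is never exactly zero.
    return (hits + 1) / (n_perm + 1)

def adjusted_odds_ratio(odds_ratio, responses):
    """OR raised to half the range of the responses for this antigen."""
    half_range = (max(responses) - min(responses)) / 2
    return odds_ratio ** half_range

# Hypothetical responses to one antigen; 1 = susceptible, 0 = resistant.
example_responses = [0.1, 0.2, 0.15, 0.3, 2.0, 2.2, 2.1, 1.9]
example_outcome = [0, 0, 0, 0, 1, 1, 1, 1]
p_adj = permutation_p(example_responses, example_outcome, n_perm=2000)
```

The same confidence-interval bounds can be passed through `adjusted_odds_ratio` to obtain CI* on the same scale.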

Chlamydia trachomatis population genetic metrics

As part of a cross-sectional population-based survey in trachoma-endemic communities on the Bijagós Archipelago of Guinea-Bissau, upper tarsal conjunctival swabs were taken33. Whole-genome sequence data (Supplementary Table 5) were obtained from 126 Ct-positive swabs using SureSelect enrichment and Illumina paired-end DNA sequencing technology and assembled using A/Har13 as the reference genome. Aligned multi-fasta files for each gene were used as input for Variscan-2.0.3 65 to calculate Tajima’s D and Fay and Wu’s H. Integrated haplotype scores (iHS) were calculated using the rehh package in R. Missing base-calls were imputed using a genetic-distance based method.