Introduction

The involvement of the adherent-invasive Escherichia coli (AIEC) pathotype in Crohn’s disease (CD) pathogenesis has been extensively supported: many researchers have reported higher AIEC prevalence in CD patients than in controls1,2,3,4,5,6,7,8,9, and mechanisms of pathogenicity have been linked with CD pathophysiology10,11,12,13,14,15,16,17. The ability to adhere to and invade intestinal epithelial cells, as well as to survive and replicate inside macrophages, are key characteristics of AIEC strains2. No gene or sequence exclusive to the AIEC pathotype has been identified, and AIEC identification remains challenging; the only way to identify an AIEC strain is by assessing bacterial infection in cell culture assays, which are non-standardised and highly time-consuming2.

AIEC strains isolated to date are clonally diverse and belong to distinct serotypes. Although AIEC primarily fall into the B2 phylogroup, AIEC strains belonging to the A, B1, and D phylogroups have also been isolated1,3,4,6,9,18,19,20,21,22,23. In terms of virulence genes, AIEC resemble extraintestinal pathogenic E. coli (ExPEC), although most ExPEC are non-invasive and do not behave like AIEC3,24,25,26, with the exception of some isolates26,27.

To date, six genetic elements (pduC, lpfA, lpfA + gipA, chuA, 29 point mutations and 3 genomic regions) have been suggested as putative AIEC molecular markers6,21,23,28,29; however, they either present low sensitivity or have been studied in a small number of strains. In a previous study conducted in our research group30, we designed a classification algorithm based on the identification of the nucleotides present in three Single Nucleotide Polymorphisms (SNPs). This algorithm displayed 82.1% specificity, 86.4% sensitivity and 84.0% accuracy within our Spanish strain collection. Given the high genotypic variability of AIEC, our aim was to validate the previously presented tool in AIEC/non-AIEC strains from distant geographical origins and in ExPEC strains, in order to assess the usefulness of these SNPs as molecular signatures for AIEC screening in external collections.

Results

We assessed the validity of the algorithm30 in additional geographically distant AIEC/non-AIEC strains and in ExPEC strains.

When all AIEC/non-AIEC strains from Girona, Mallorca, France, Chile and Australia, as well as the ExPEC strains, were analysed, 73/98 of the non-AIEC strains were correctly classified but only 39/86 of the AIEC strains were appropriately predicted, resulting in a high probability of obtaining false negatives (54.6%). Therefore, in comparison to the values obtained within our strain collection (82.1% specificity, 86.4% sensitivity and 84.0% accuracy), the global accuracy was significantly reduced (60.9%), with decreased specificity (74.5%) and especially lower sensitivity (45.4%) (Table 1, Fig. 1). In contrast to the previous study30, the SNPs that were found to be differentially distributed among our AIEC and non-AIEC strains (E3-E4_4.4 and E5-E6_3.16 = 3.22(2)) showed similar frequencies according to phenotype when all the strains were considered (Table 2). According to the algorithm30, strains displaying guanine (G) in SNP E3-E4_4.4 are classified as non-AIEC, as are strains that lack the gene in which SNP E3-E4_4.4 is located (−) and display a nucleotide other than G at SNP E5-E6_3.16 = 3.22(2). Indeed, most AIEC strains (54.6%) were incorrectly classified because they met these conditions (Fig. 2). Other possible SNP combinations were considered for all the strains included in the study, but none improved the precision of the algorithm.
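The two non-AIEC conditions described above can be written as a small decision function. This is only a minimal sketch of the branches stated in this paragraph, not the full algorithm of the previous study30: the function name, the argument names and the "-" placeholder for no amplification are hypothetical, and every SNP combination that the text does not describe (including the role of the third SNP, E5-E6_3.12) is deliberately left undecided.

```python
from typing import Optional

# Hypothetical placeholder for the "(−)" result, i.e. no amplification.
NO_AMPLIFICATION = "-"


def stated_non_aiec_rule(e3_e4_4_4: str, e5_e6_3_16: str) -> Optional[str]:
    """Apply only the two non-AIEC conditions stated in the text.

    Returns "non-AIEC" when one of them holds, and None for any other
    combination, because those branches are not described in this section
    (they are shown in Fig. 2 of the article).
    """
    # Condition 1: guanine at SNP E3-E4_4.4.
    if e3_e4_4_4 == "G":
        return "non-AIEC"
    # Condition 2: the gene carrying E3-E4_4.4 is absent and the second SNP
    # shows a result other than G. Assumption: a missing result at the
    # second SNP is treated as "not covered" rather than as "other than G".
    if e3_e4_4_4 == NO_AMPLIFICATION and e5_e6_3_16 not in ("G", NO_AMPLIFICATION):
        return "non-AIEC"
    return None


if __name__ == "__main__":
    for first, second in [("G", "A"), ("-", "A"), ("-", "G"), ("A", "A")]:
        print(first, second, stated_non_aiec_rule(first, second))
```

Returning None rather than "AIEC" for the remaining cases keeps the sketch from implying more of the decision tree than the text states.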

Table 1 Summary table of the accuracy of the tool in each strain collection analysed.
Figure 1

Geographical distribution of the isolates assessed in four groups of analysis and the percentage of strains that are correctly (green) or incorrectly (red) predicted by the SNP algorithm in comparison with their previous phenotypic characterisation. (A) AIEC/non-AIEC strains from Girona (Spain)30, including LF82 as a reference strain; (B) ExPEC (Spain and USA)26,35,36 and AIEC/non-AIEC strains from Girona (Spain)30; (C) AIEC/non-AIEC strains from Girona (Spain)30 and AIEC/non-AIEC from France, Chile6, Spain (Mallorca)6, Australia33 and ExPEC-Spain26,36 and ExPEC-America35; (D) AIEC/non-AIEC strains from Girona30 and Mallorca6 (Spain).

Table 2 Frequency of particular nucleotide variants in SNP E3-E4_4.4 and E5-E6_3.16 = 3.22(2) with respect to phenotype in two collections of AIEC/non-AIEC strains. Values are given in percentages with respect to the total number of AIEC or non-AIEC strains.
Figure 2

Classification algorithm for AIEC identification. Assessed in our collection and external strain collections (France, Chile6, Spain (Mallorca)6, Australia33 and ExPEC-Spain26,36 and ExPEC-America35). Percentages represent the proportion of strains that are correctly predicted as AIEC or non-AIEC based on the result for each SNP combination. The number of total strains corresponding to each condition is indicated. (−): no amplification; other: a nucleotide different from guanine (G) or overlapping peaks.

Despite global accuracy of the algorithm being much lower when all strains were considered, the method was suitable for geographically close strain collections. Indeed, if only Spanish strains (Girona and Mallorca) (N = 63) were considered, the accuracy of the tool was maintained (80.9%) (Table 1). Specificity was also good (82.3%), meaning there was a low probability of false positives (17.7%) (Fig. 1). Therefore, strains from different laboratory collections, but of similar geographical origin, were suitable for screening by this method.

The inclusion of ExPEC strains (N = 45) revealed that the tool was also useful for distinguishing the ExPEC and AIEC pathotypes, since 84.6% of strains displaying the AIEC phenotype were correctly classified, with a global accuracy of 78.9% (Table 1, Fig. 1).

These results demonstrated that the classification algorithm presented has limited applicability across the full set of E. coli strains assessed. However, this novel molecular tool showed promising results for Spanish AIEC and ExPEC strains.

Discussion

The identification of molecular tools or rapid tests to easily identify the AIEC pathotype would be of great interest to scientists studying the epidemiology of the pathotype, as well as clinicians hoping to detect which patients are colonised by AIEC to apply personalised treatments. Although several studies have been conducted with this aim in mind, there is still no molecular signature specific to AIEC6,21,23,28,29.

In a previous study, we performed comparative genomics of three AIEC/non-AIEC clone pairs and presented a classification algorithm that combines three SNPs, allowing for the classification of phylogenetically and phenotypically diverse E. coli isolates with a high accuracy rate in our strain collection30. Since the application of a molecular tool could assist in overcoming the problem of AIEC identification, we further tested the specificity and sensitivity of the tool in additional geographically distant and phylogenetically diverse AIEC strains, as well as in ExPEC strains, which share genetic and phenotypic features with AIEC3,24,25,26.

The tool was found to be accurate enough to distinguish between AIEC and ExPEC strains, since the sensitivity was 84.6% and the accuracy was 78.9%. In this case, we assessed both AIEC/non-AIEC strains from Girona (Spain) and ExPEC strains, the latter being mostly Spanish isolates. These results indicated that, for a given geographic origin, this algorithm could be applied to differentiate ExPEC from AIEC. So far, most of the studies looking for AIEC biomarkers have not included ExPEC strains in their analysis6,21,23,28. Only one study focused on synonymous and non-synonymous SNPs along the genome of four B2-AIEC strains that could differentiate them from other B2-non-AIEC and B2-ExPEC genomes available in databases. Using a bioinformatics approach, that study found 29 SNPs that could separate AIEC from non-AIEC; these did not include the three SNPs of the algorithm presented here, and no signature sequence distinguishing AIEC from ExPEC was found29. It is not possible to determine whether the high accuracy value we reported is due to the similar geographic origin of the ExPEC strains (40 from Spain and 5 from the USA). Thus, the inclusion of other ExPEC strains would be needed to validate the tool further. Unfortunately, the predictive values of the tool decreased considerably (60.9% accuracy) when strains from several geographic regions were considered. AIEC isolates from France, Chile and Australia were poorly discriminated by the SNP algorithm presented, resulting in significantly reduced sensitivity values (32.3, 0 and 15.4%, respectively). Of note, this algorithm may be suitable for Spanish strains, because the accuracy was still high when two different collections of strains were studied (Girona and Mallorca) (80.9% accuracy). Given that the gene content of E. coli is highly variable across different geographic regions31, this variation contributes to the algorithm not being applicable across geographically diverse regions, and its accuracy may vary from one country to another.

In conclusion, the molecular tool that we previously proposed30 is not universal, since its accuracy was reduced to 60.9% once a larger strain collection from different geographic locations and pathotypes was screened. We suspect it might be a good discrimination tool for a particular geographic location, in this case Spain. However, this observation should be confirmed with the addition of other Spanish strain collections including AIEC, non-AIEC, and other E. coli intestinal and extraintestinal pathotypes. The study of new SNPs that could be useful to distinguish between AIEC/non-AIEC strains from different geographical origins might be time-consuming and unprofitable, and should consider many aspects that make it even more complicated (for example, the moment of strain isolation and the patient’s treatment). Therefore, we believe that new approaches (e.g. transcriptomics, metabolomics or epigenetics) should be applied to find a universal AIEC biomarker that could be used as a rapid standardised method for detecting AIEC among E. coli isolates, or perhaps just E. coli isolates that have a strong colonising ability. Nonetheless, it is possible that no universal marker exists, in which case it would be interesting to look for a biomarker that encompasses the majority of AIEC strains32. In any case, this work highlighted the importance of validating putative molecular markers in a strain collection that is diverse in terms of geographic origin and pathotype, in order to assess whether or not they could be used universally.

Methods

The SNPs included in the algorithm (E3-E4_4.4, E5-E6_3.16 = 3.22(2) and E5-E6_3.12) were screened by PCR and Sanger sequencing. Primers and PCR conditions are indicated in Table 3. Apart from the strains assessed in the previous study (22 AIEC and 28 non-AIEC, including the LF82 strain)30, this collection comprised 60 AIEC and 29 non-AIEC strains mainly isolated from CD patients and controls from distinct geographical origins (Spain (Mallorca)6, Chile6, France and Australia33) (Table S1). Most of these strains were phenotypically characterised in previous studies6,33. The adhesion and invasion indices of 25/33 Australian strains were measured in this study as previously described1,30,34 in order to classify them phenotypically as AIEC or non-AIEC. In addition, 45 strains isolated from patients with extraintestinal diseases were also included; these were previously isolated from American patients with meningitis35, and from Spanish patients with sepsis26 or urinary tract infection36 (Table S1). Phenotypic characterisation of these strains was performed by Martinez-Medina et al.26: four strains presented the AIEC phenotype and were considered as such in the analysis, and the remaining 41 were classified as non-AIEC.
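As a cross-check, the group sizes given in this paragraph can be tallied to recover the denominators used in the Results (86 AIEC and 98 non-AIEC strains). The short sketch below only adds up the counts stated above; the dictionary keys and variable names are hypothetical labels chosen for illustration.

```python
# (AIEC, non-AIEC) counts per group, as stated in the Methods text.
groups = {
    "previous study (Girona, incl. LF82)": (22, 28),
    "additional intestinal strains": (60, 29),
    "ExPEC strains": (4, 41),
}

total_aiec = sum(aiec for aiec, _ in groups.values())
total_non_aiec = sum(non_aiec for _, non_aiec in groups.values())
total_strains = total_aiec + total_non_aiec

print(total_aiec, total_non_aiec, total_strains)  # prints: 86 98 184
```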

Table 3 Primers and PCR conditions used to amplify fragments of the genes in which the confirmed SNPs were located.

Strains analysed in this study were previously isolated under the approval of the Ethics Committee of Clinical Investigation of the Hospital Josep Trueta of Girona on May 22 2006; Ethical committee of Hospital Saint-Louis (CPP#2009/17); Institutional Review Board of Clínica Las Condes, Faculty of Medicine, Universidad de Chile; Ethics Committee of the Northern Metropolitan Health Service, Santiago, Chile; the Balearic Islands’ Ethical Committee, Spain; and ACT Health Human Research Ethics Committee (ETH.5.07.464). Subjects gave written informed consent in accordance with the Declaration of Helsinki.

The differences in the distribution of nucleotides present at each polymorphic site between phenotypes were calculated using the χ² test. To establish the usefulness of the algorithm for AIEC identification, the specificity, sensitivity and accuracy values were calculated as follows: Sensitivity (%) = (true positives/(true positives + false negatives)) × 100; Specificity (%) = (true negatives/(true negatives + false positives)) × 100; and Accuracy (%) = ((true positives + true negatives)/(total number of cases)) × 100. A p-value ≤ 0.05 was considered statistically significant in all cases.
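The three formulas can be illustrated with the counts reported in the Results for the full collection (39 of 86 AIEC strains and 73 of 98 non-AIEC strains correctly classified). The following is a minimal sketch rather than the code used in the study; the function name diagnostic_metrics and its arguments are hypothetical. AIEC is treated as the positive class, so a misclassified AIEC strain is a false negative.

```python
from typing import Tuple


def diagnostic_metrics(tp: int, fn: int, tn: int, fp: int) -> Tuple[float, float, float]:
    """Return (sensitivity, specificity, accuracy) as percentages."""
    sensitivity = 100 * tp / (tp + fn)
    specificity = 100 * tn / (tn + fp)
    accuracy = 100 * (tp + tn) / (tp + fn + tn + fp)
    return sensitivity, specificity, accuracy


# Counts taken from the Results: 39/86 AIEC and 73/98 non-AIEC strains
# were correctly classified when all collections were pooled.
true_pos, false_neg = 39, 86 - 39
true_neg, false_pos = 73, 98 - 73

sens, spec, acc = diagnostic_metrics(true_pos, false_neg, true_neg, false_pos)
print(f"sensitivity={sens:.2f}% specificity={spec:.2f}% accuracy={acc:.2f}%")
```

The computed values agree with the reported 45.4% sensitivity, 74.5% specificity and 60.9% accuracy to within rounding.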