Background

Cystic fibrosis (CF) is the most common lethal autosomal recessive disease in Caucasian populations [1]. Most CF patients die in their third or fourth decade from complications of chronic pulmonary infection. Pseudomonas aeruginosa is the predominant pathogen and, once established within the lungs of CF patients, it is rarely eradicated, resulting in increased treatment requirements and an accelerated decline in lung function, quality of life and survival [2]. While many CF patients acquire P. aeruginosa from their natural environment, there is also evidence of person-to-person transmission [3]. Delaying or even preventing P. aeruginosa infection is an important management goal. Consequently, determining P. aeruginosa acquisition pathways and conducting longitudinal surveillance using molecular-based typing techniques are critical steps for developing novel interventions and evidence-based infection control policies to interrupt the spread of transmissible strains within the CF community [4–6].

Recently, multi-locus sequence typing (MLST) has emerged as an important epidemiological tool for investigating temporally and geographically diverse bacteria [7]. It offers a standardised, reproducible and portable typing approach that allows reliable data comparisons by way of a publicly accessible web-based database [7, 8]. However, when applied to large-scale investigations involving many hundreds or thousands of isolates, it is limited by cost and complexity [9]. To circumvent these problems, some researchers have utilised defined sets of informative single nucleotide polymorphisms (SNPs) derived from MLST data to infer genetic relationships between isolates. In essence, this is a narrowed MLST approach and it has been applied to various organisms, including pathogens relevant to CF, such as methicillin-resistant Staphylococcus aureus and P. aeruginosa [10–13]. Selection of appropriate SNPs, including SNP location and total numbers, is an integral facet of an informative SNP strategy to ensure a discriminatory, yet cost-effective, typing scheme. However, once an informative SNP approach tailored to a particular purpose is implemented, it will theoretically have limitations in terms of discriminatory power if used beyond its original objectives.

Previously, we have shown that SYBR Green-based real-time polymerase chain reaction (PCR) assays and high-resolution melting (HRM) curve analysis targeting 10 key SNPs in five housekeeping genes (HRM10SNP) can detect the major P. aeruginosa strains shared by CF patients in Queensland, Australia [10]. Furthermore, we demonstrated recently that this form of typing can be adapted to the iPLEX MassARRAY platform to allow high-throughput genotyping [14]. However, based on the high levels of genetic diversity observed amongst shared P. aeruginosa strains in the national Australian CF study [15] and also internationally amongst patients attending CF clinics [16], we sought to reassess the HRM10SNP and investigate alternative SNP-based typing strategies for identifying a broader range of P. aeruginosa strains.

Methods

Clinical isolates

To ensure a representative and geographically diverse sample, 506 clinical isolates were sourced randomly from a biobank of CF isolates collected as part of an ongoing national study of shared P. aeruginosa strains involving patients attending 11 CF clinics in Australia’s five largest cities [15] (Additional file 1: Table S1). Isolates were incubated on horse blood agar plates for 24 hours at 37°C. Once purity was confirmed, heat-denatured suspensions of each isolate were prepared as described previously [10].

HRM10SNP assay

The HRM10SNP assay was performed for each isolate as described previously [10]. Briefly, each heat-denatured isolate was tested in 10 individual PCR reactions using the qPCR SuperMix-UDG (Invitrogen Australia, Mulgrave, NSW, Australia) on the Rotorgene-6000 (Qiagen, Doncaster, Victoria, Australia). Results from each reaction were compiled to provide a 10-SNP profile for each isolate. As reported previously, isolates with 10-SNP profiles of CTCCTCGGCA, TCTTTCGGTA and CCTCCTGATG were determined to be AUST-01, AUST-02 and AUST-06, respectively [10].
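The mapping from a compiled 10-SNP profile to a strain call can be illustrated with a short Python sketch. The three profiles are those given above; the function names and the idea of counting mismatching positions (to flag near-miss profiles) are illustrative assumptions rather than part of the published assay.

```python
# Illustrative sketch only; not the authors' analysis software.
# The three reference profiles are taken from the text above.
KNOWN_PROFILES = {
    "CTCCTCGGCA": "AUST-01",
    "TCTTTCGGTA": "AUST-02",
    "CCTCCTGATG": "AUST-06",
}


def call_strain(profile):
    """Return the strain name for an exact 10-SNP match, otherwise None."""
    return KNOWN_PROFILES.get(profile.upper())


def snp_differences(profile_a, profile_b):
    """Count positions at which two equal-length SNP profiles differ."""
    if len(profile_a) != len(profile_b):
        raise ValueError("profiles must have the same length")
    return sum(a != b for a, b in zip(profile_a.upper(), profile_b.upper()))


def nearest_known(profile):
    """Return (strain, number of differing SNPs) for the closest reference profile."""
    return min(
        ((strain, snp_differences(profile, ref)) for ref, strain in KNOWN_PROFILES.items()),
        key=lambda item: item[1],
    )


if __name__ == "__main__":
    print(call_strain("CTCCTCGGCA"))
    # A hypothetical profile differing from AUST-01 at the last position only.
    print(call_strain("CTCCTCGGCG"), nearest_known("CTCCTCGGCG"))
```

A profile that is one SNP away from a reference, as in the hypothetical second example, is reported as unmatched by the exact lookup but can be flagged for repeat testing.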

20-SNP iPLEX MassARRAY (iPLEX20SNP)

The iPLEX20SNP assay was based on the Sequenom MassARRAY platform (Sequenom, Brisbane, Queensland, Australia) and was a modification of a method described previously [14]. Here, SNPs were derived by analysing sequence data on the P. aeruginosa PubMLST website [17]. Briefly, 1070 concatenated sequences of P. aeruginosa housekeeping genes (acsA, aroE, guaA, mutL, nuoD, ppsA, and trpE) were downloaded (12 January, 2012) and investigated for informative SNPs with the aid of the Minimum SNPs software version 2043 [18] and by manual sorting (using BioEdit version 7.0.9.0). Overall, 20 SNPs were identified and SNP positions based on the 2882 bp concatenated P. aeruginosa MLST sequence are listed in Tables 1 and 2. Of these 20 SNPs, four were identical to SNPs used in the HRM10SNP assay; SNPs at sites 7, 322, 1152 and 2551 of the iPLEX20SNP assay (Tables 1 and 2) overlapped with SNPs 1, 2, 5 and 10 from the HRM10SNP assay.
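As a minimal sketch of how an SNP profile can be read from a concatenated MLST sequence, the Python snippet below extracts the bases at a list of positions. Only the four positions named above (7, 322, 1152 and 2551) are used; the function name, the 1-based position convention and the toy sequence are assumptions made for illustration and are not taken from the published method.

```python
# Illustrative sketch only; not the authors' pipeline.
CONCAT_LENGTH = 2882  # length of the concatenated MLST sequence stated in the text
SHARED_POSITIONS = [7, 322, 1152, 2551]  # the four SNP sites shared with the HRM10SNP assay


def snp_profile(concatenated_seq, positions):
    """Return the bases found at the given positions (assumed 1-based) as a profile string."""
    if len(concatenated_seq) != CONCAT_LENGTH:
        raise ValueError("expected a %d bp concatenated sequence" % CONCAT_LENGTH)
    return "".join(concatenated_seq[p - 1].upper() for p in positions)


# Toy sequence (hypothetical): all 'A' except at the four SNP sites.
bases = ["A"] * CONCAT_LENGTH
for pos, base in zip(SHARED_POSITIONS, "CTGC"):
    bases[pos - 1] = base
toy_sequence = "".join(bases)

if __name__ == "__main__":
    print(snp_profile(toy_sequence, SHARED_POSITIONS))
```

Passing all 20 positions from Tables 1 and 2 in place of the four shown here would yield the full 20-SNP profile for a sequence.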

Table 1 Primers for primary PCR reaction for the iPLEX20SNP
Table 2 Extension primers used for the iPLEX20SNP

Primers and extension primers for each of the 20 SNPs in the iPLEX20SNP were designed as reported previously [14]. All 20 target SNPs were designed for use in a single multiplex well using Assay Designer 4.0 software (Sequenom, Herston, Queensland, Australia). The 24 amplification primers and 21 extension primers used for SNP detection are listed in Tables 1 and 2. Two extension primers with overlapping mass were used for SNP site 416 to accommodate a known proximal SNP variation (Table 2).

SNP detection by MassARRAY was performed as outlined previously [14], with the following modifications: (1) following the initial PCR, residual PCR Taq polymerase was removed by protease digestion; 1 μl of protease solution (1.07 AU, Qiagen, Doncaster, Victoria, Australia) was added to each PCR reaction and the mixture incubated at 55°C for 30 min followed by inactivation at 95°C for 5 min; and (2) the single base extension step was performed using the iPLEX Pro Extension Reaction Kit (Sequenom, Herston, Queensland, Australia) following the manufacturer’s instructions. SNPs were coded from 1 to 20 to generate a 20-SNP code. The 20-SNP profiles were then interpreted using the data compiled from in-silico analysis of the P. aeruginosa MLST database, as described below, to provide predicted sequence types (STs). Characterised isolates representative of each SNP were used as reference controls for each test run.

In-silico analysis of 20-SNP profiles from the MLST database

For final result analyses, 1779 concatenated sequences of P. aeruginosa housekeeping genes were again downloaded (13th December 2012) and reanalysed. The 1779 P. aeruginosa sequences yielded 1401 different STs. The predicted ability of the 20-SNP profile to distinguish these 1401 STs was investigated, as was its ability to distinguish STs of national importance: AUST-01 (ST-649), AUST-02 (ST-775), AUST-03 (ST-242), AUST-04 (ST-788), AUST-05 (STs 274 and 781), AUST-06 (ST-801), AUST-07 (ST-262), AUST-08 (STs 782, 783, 784 and 785), AUST-09 (STs 274 and 1043), AUST-10 (STs 155 and 179), AUST-11 (STs 803, 1034, 1037, 508, 804, 822 and 882), AUST-12 (ST-179), AUST-13 (STs 389 and 800), AUST-14 (STs 155 and 179), AUST-15 (ST-17), AUST-16 (ST-905), AUST-17 (ST-810), AUST-18 (ST-274), AUST-19 (STs 155 and 786), AUST-20 (ST-655), AUST-21 (ST-808), AUST-22 (ST-809), AUST-23 (ST-833), AUST-24 (ST-308), AUST-25 (ST-274), AUST-26 (ST-179), AUST-27 (ST-455), AUST-28 (ST-241), AUST-29 (ST-261), AUST-30 (ST-1036), AUST-31 (ST-274), AUST-32 (ST-236), AUST-33 (ST-12), AUST-34 (ST-1038), AUST-35 (ST-553), AUST-36 (ST-277), AUST-37 (ST-155) and AUST-38 (STs 254 and 1041) [15]; and STs of international importance: LES (ST-146), Manchester (ST-217), DK2 (ST-386), PAO1 (ST-549), PA14 (ST-253), M18 (ST-1239), PACS2 (ST-1394), NCGM2.S1 (ST-235), PA7 (ST-1195), Clone C (ST-17), Dutch-1 (ST-406), Dutch-2 (ST-497) and Midlands (ST-148) [17, 19, 20].

Statistical analysis

Discriminatory power and the quantitative measure of congruence between the HRM10SNP and iPLEX20SNP methods and corresponding 95% confidence intervals (CI) were determined by calculating the Simpson’s Index of Diversity and the adjusted Wallace coefficients respectively using the online analysis tool at http://darwin.phyloviz.net/ComparingPartitions/index.php?link=Tool. The 20-SNP profile in-silico data were used to predict STs for the 506 clinical isolates utilising the experimental results from the iPLEX20SNP assay.
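For readers who wish to reproduce these measures offline, the following Python sketch computes Simpson's Index of Diversity and the adjusted Wallace coefficient from two lists of type assignments. It assumes the commonly used definitions, D = 1 − Σ n_j(n_j − 1) / [N(N − 1)] and AW(A→B) = (W − W_i) / (1 − W_i) with W_i = 1 − D_B; whether the online tool uses exactly these formulas is an assumption, and confidence intervals are not computed here. The function names and toy data are hypothetical.

```python
# Illustrative sketch under the stated assumptions; not the online tool's code.
from collections import Counter


def simpson_diversity(labels):
    """Simpson's Index of Diversity: probability that two isolates drawn
    without replacement have different types."""
    n = len(labels)
    same_pairs = sum(c * (c - 1) for c in Counter(labels).values())
    return 1.0 - same_pairs / (n * (n - 1))


def wallace(labels_a, labels_b):
    """Wallace coefficient A->B: of the isolate pairs sharing a type by
    method A, the fraction that also share a type by method B."""
    pairs_a = sum(c * (c - 1) // 2 for c in Counter(labels_a).values())
    pairs_ab = sum(c * (c - 1) // 2 for c in Counter(zip(labels_a, labels_b)).values())
    if pairs_a == 0:
        raise ValueError("method A places no two isolates in the same type")
    return pairs_ab / pairs_a


def adjusted_wallace(labels_a, labels_b):
    """Wallace coefficient corrected for the agreement expected by chance."""
    expected = 1.0 - simpson_diversity(labels_b)  # W_i under independence
    return (wallace(labels_a, labels_b) - expected) / (1.0 - expected)


if __name__ == "__main__":
    # Toy assignments for four isolates typed by two hypothetical methods.
    method_a = ["x", "x", "y", "y"]
    method_b = ["p", "q", "r", "r"]
    print(simpson_diversity(method_a), simpson_diversity(method_b))
    print(adjusted_wallace(method_a, method_b), adjusted_wallace(method_b, method_a))
```

The coefficient is directional: in the toy data, method B fully predicts method A's groupings, whereas method A only partly predicts method B's.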

Results

In-silico analysis of MLST data

Analysis of the P. aeruginosa PubMLST website [17] (13th December 2012) showed that the 1401 STs could be divided into 927 different 20-SNP profiles (Additional file 2: Table S2). Overall, 711 STs could be distinguished individually by the 20-SNP profile, whereas the remaining 690 STs shared their 20-SNP profile with one (n = 120), two (n = 51), three (n = 13), four (n = 11), five (n = 6), six (n = 2), seven (n = 2), eight (n = 1), nine (n = 2), ten (n = 4), 11 (n = 2), 12 (n = 1) or 13 (n = 1) other STs, where n is the number of shared profiles in each category (Additional file 2: Table S2). In total, 486/690 (70.4%) of the STs with overlapping 20-SNP profiles were closely related single- or double-locus variants. Based on these data, the D-value for the 20-SNP profiling method was calculated as 0.999 (95% CI 0.998, 0.999). For the STs of national and international importance, nine could be distinguished individually by the 20-SNP profile, whereas the remainder exhibited overlapping 20-SNP profiles. Most of the latter were again single- or double-locus variant STs (Table 3 and Additional file 2: Table S2).
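The counts reported above can be cross-checked with simple arithmetic. The Python sketch below assumes that each n value is the number of 20-SNP profiles shared by a given number of STs (an interpretation, although one that reproduces the reported totals of 690 overlapping STs, 927 profiles and 1401 STs); the variable names are illustrative.

```python
# Illustrative arithmetic check of the counts reported in the text.
UNIQUE_STS = 711  # STs with a 20-SNP profile shared by no other ST

# Key: number of *other* STs sharing the profile; value: number of such profiles (n).
SHARED_PROFILE_COUNTS = {
    1: 120, 2: 51, 3: 13, 4: 11, 5: 6, 6: 2, 7: 2,
    8: 1, 9: 2, 10: 4, 11: 2, 12: 1, 13: 1,
}

# Each shared profile covers (others + 1) STs.
overlapping_sts = sum((others + 1) * n for others, n in SHARED_PROFILE_COUNTS.items())
shared_profiles = sum(SHARED_PROFILE_COUNTS.values())
total_profiles = UNIQUE_STS + shared_profiles
total_sts = UNIQUE_STS + overlapping_sts

if __name__ == "__main__":
    print(overlapping_sts, shared_profiles, total_profiles, total_sts)
```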

Table 3 P. aeruginosa MLST data from the P. aeruginosa MLST database website (13th December 2012) and associated SNP profiles for STs of national and international importance

HRM10SNP and iPLEX20SNP typing of the 506 clinical isolates

Application of the HRM10SNP assay provided complete 10-SNP profiles for 494/506 isolates (typeability = 97.6%), among which 92 different 10-SNP profiles were observed; 12 isolates were not typed using the HRM10SNP method as one or more SNPs failed to be called by the HRM analysis (Additional file 3: Table S3). The iPLEX20SNP assay provided complete 20-SNP profiles for 471/506 isolates (typeability = 93.1%), among which there were 147 distinct 20-SNP profiles; 35 isolates failed to provide complete 20-SNP profiles because the iPLEX20SNP assay failed to characterise one or more SNPs (Additional file 3: Table S3). When the 147 complete 20-SNP profiles (471 isolates) from the iPLEX20SNP assay were used to predict an MLST type (based on the data provided in Additional file 2: Table S2), 124 of 147 (84.4%) profiles matched profiles obtained from the MLST website and could therefore provide a predicted MLST type or types. Twenty-three 20-SNP profiles from 28 isolates did not match any of the listed 20-SNP profiles in Additional file 2: Table S2, and therefore an MLST type could not be predicted.
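The typeability figures and the ST prediction step can be sketched in Python as follows. The percentages use the counts reported above; the lookup table is a hypothetical placeholder standing in for Additional file 2: Table S2, and the function names are assumptions made for illustration.

```python
# Illustrative sketch only; the lookup table below is a hypothetical placeholder.
def typeability(n_complete, n_total):
    """Percentage of isolates that yielded a complete SNP profile."""
    return 100.0 * n_complete / n_total


def predict_sts(profile, profile_to_sts):
    """Return the list of STs sharing this profile, or an empty list if unmatched."""
    return profile_to_sts.get(profile, [])


# Hypothetical placeholder entries (NOT real profiles or real ST assignments).
toy_lookup = {
    "A" * 20: ["ST-x"],          # profile unique to one ST
    "C" * 20: ["ST-y", "ST-z"],  # profile shared by two STs
}

if __name__ == "__main__":
    print(round(typeability(494, 506), 1))  # HRM10SNP
    print(round(typeability(471, 506), 1))  # iPLEX20SNP
    print(round(100.0 * 124 / 147, 1))      # profiles matching the MLST database
    print(predict_sts("C" * 20, toy_lookup), predict_sts("G" * 20, toy_lookup))
```

An empty result corresponds to the situation described above in which a complete 20-SNP profile has no counterpart in the database and no MLST type can be predicted.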

Overall, 470 isolates provided complete SNP profiles by both the HRM10SNP and iPLEX20SNP assays. Simpson’s Index of Diversity and adjusted Wallace coefficients between the HRM10SNP and iPLEX20SNP methods were calculated using these 470 isolates (Table 4). Simpson’s Index of Diversity of the iPLEX20SNP (0.947) was similar to that of the HRM10SNP method (0.944). However, when concordance between the assays was assessed using the adjusted Wallace coefficient, the iPLEX20SNP method (94.9%) was a better predictor of the HRM10SNP method than vice versa (89%). To investigate the latter further, we identified all 10-SNP profiles that were further discriminated by the 20-SNP profiles (Additional file 4: Table S4); 34 HRM10SNP profiles were further distinguished into 101 20-SNP profiles using the iPLEX20SNP method. Of note, these involved 30 STs associated with CF strains of local or international importance (Additional file 4: Table S4). In contrast, there were only 11 iPLEX20SNP profiles that were further discriminated by the HRM10SNP assay (Additional file 3: Table S3).

Table 4 Number of types, Simpson’s index of diversity and adjusted Wallace coefficients for the HRM10SNP and iPLEX20SNP assays calculated from application to the 470 isolates providing complete SNP profiles by both methods

Given the high prevalence of AUST-01, AUST-02 and AUST-06 in Australia, and that the HRM10SNP assay was primarily designed to target these strains, we compared the ability of both assays to distinguish these strains. For isolates identified as AUST-01, AUST-02 or AUST-06 by either method (Additional file 3: Table S3), the results of the two methods were in agreement for 80/81 (98.8%), 48/49 (97.8%) and 11/12 (91.7%) isolates, respectively. The two isolates giving discrepant results for AUST-01 and AUST-06 were identified as AUST-01 and AUST-06, respectively, by the iPLEX20SNP method, but not by the HRM10SNP assay. For both of these isolates, the 10-SNP profiles obtained by the HRM10SNP assay differed by only one SNP from the expected profiles of AUST-01 and AUST-06. Upon repeat testing in the HRM10SNP assay, the isolates typed as AUST-01 and AUST-06, respectively, suggesting that there was an error in the original HRM10SNP testing. The discordant result for AUST-02 was associated with a different ST; one isolate was identified as AUST-02 by the HRM10SNP method, but differed by two SNPs from the expected 20-SNP profile for AUST-02 in the iPLEX20SNP assay (predicted MLST type of 778).

Discussion

The in-silico analyses of sequence data from the P. aeruginosa PubMLST website showed that more than half of recognised STs could be distinguished individually by the 20-SNP profile of the iPLEX20SNP assay. Furthermore, the recognised STs that could not be distinguished by this assay were typically single- or double-locus variants. Hence, theoretically the iPLEX20SNP method has considerable potential for broader-based MLST-focused studies of P. aeruginosa, here and elsewhere. As the iPLEX20SNP is also based on the Sequenom MassARRAY platform, it is particularly suitable for high-throughput investigations [14]. Using this technology, up to 384 isolates can be tested within one working day for less than $AUD 10 per isolate [14], which compares favourably with other technologies. For example, for our 506 test isolates we estimate that classical DNA sequencing-based MLST would have cost approximately $AUD 60,720 ($AUD 120 per isolate), whereas costs for the HRM10SNP and iPLEX20SNP methods were approximately $AUD 10,120 ($AUD 20 per isolate) [13] and $AUD 5,060, respectively.
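The cost comparison above reduces to multiplying a per-isolate cost by the 506 test isolates, as the short Python sketch below shows. The per-isolate figure of 10 for the iPLEX20SNP is the upper bound quoted in the text ("less than $AUD 10"), which is consistent with the reported total of 5,060; the variable names are illustrative.

```python
# Illustrative arithmetic using the per-isolate costs (in AUD) quoted in the text.
N_ISOLATES = 506

COST_PER_ISOLATE = {
    "MLST sequencing": 120,
    "HRM10SNP": 20,
    "iPLEX20SNP": 10,  # upper bound; the text states "less than" 10 per isolate
}

totals = {method: cost * N_ISOLATES for method, cost in COST_PER_ISOLATE.items()}

if __name__ == "__main__":
    for method, total in totals.items():
        print(f"{method}: {total:,} AUD")
```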

Compared to the HRM10SNP, the iPLEX20SNP method clearly provided better discrimination when applied to the P. aeruginosa test isolates used in this study. Of note, the HRM10SNP assay grouped numerous unrelated isolates, including STs of shared strains in the CF patient population, while the iPLEX20SNP method was able to distinguish between these isolates (Additional file 4: Table S4). This was likely due to the higher number of SNPs and to the fact that SNP selection for the iPLEX20SNP was based on a large international MLST database. These observations provide experimental data to support the above in-silico analyses. Indeed, in the clinical context, attaining optimal discriminatory power is particularly important when trying to identify new or emerging shared P. aeruginosa strains in CF patients. Consequently, the iPLEX20SNP is ideally suited for broader, investigatory studies of P. aeruginosa-infected patients.

While the HRM10SNP lacked overall discriminatory power, it nevertheless proved to be well-suited for detecting AUST-01, AUST-02 and AUST-06 amongst P. aeruginosa isolates from a broad range of Australian CF clinics. AUST-01 and AUST-02 are the shared P. aeruginosa strains of greatest concern in Australia [15], and therefore simple methods for detecting these strains remain of local clinical and research interest. The one key benefit of the HRM10SNP method is that it is based on real-time PCR technology, which is now commonplace in most clinical microbiology laboratories. Hence, the HRM10SNP method may still be a useful diagnostic tool locally for laboratories with no access to specialised equipment such as the Sequenom MassARRAY platform.

Limitations in terms of typeability (i.e., the number of isolates providing complete SNP profiles) were observed, however, with 2.4% and 6.9% of isolates failing to give complete profiles in the HRM10SNP and the iPLEX20SNP assays respectively. Typically these problems are caused by poor isolate preparation (i.e., insufficient DNA) or otherwise sequence variation in primer targets [10, 14]. Given the sheer diversity amongst the P. aeruginosa MLST housekeeping genes, it is highly likely that sequence variation would account for a large proportion of the problems observed here. In any event, we do not see this as an important limitation affecting the broader utility of the assays given that other methods, such as DNA sequencing, could be applied if necessary to the small numbers of untypeable isolates.

Conclusions

In summary, molecular typing is an integral part of investigating the development and spread of shared P. aeruginosa strain genotypes in patients with CF. The iPLEX20SNP is a superior new method providing sufficient throughput and discriminatory power for broader SNP-based MLST-style investigations of P. aeruginosa, whereas the HRM10SNP method remains a convenient technique for screening CF clinical isolates for the current most commonly shared Australian P. aeruginosa strains and should be able to be performed by most clinical microbiology laboratories.

Availability of supporting data

The data sets supporting the results of this article are included within the article and its additional files.

Authors’ information

Melanie W Syrmis and Timothy J Kidd are equal first authors.