Introduction

Human papillomavirus (HPV) is now understood to be necessary but insufficient for the development of cervical cancer [1]. There are more than 100 known types of HPV, of which over 40 infect the female genital tract. Of these, at least 15 are denoted as ‘high risk’ (HR) [2] for cervical cancer.

The recent development of vaccines against two (HPV types 16 and 18) [3, 4] or four (HPV types 16, 18, 6, and 11) [5–7] HPV types has highlighted the need for timely population-based HPV prevalence data for Canada. Such data can be used to estimate the expected effectiveness of these vaccines in reducing conditions and procedures arising from those HPV types. They can also establish a baseline from which to monitor potential changes in HPV prevalence and type distribution after uptake of a vaccine. Our objective was to establish baseline HPV prevalence and HPV type distribution in women who participated in the population-based cervical cytology screening program in British Columbia in 2004, to enable optimal public health decision-making regarding prevention of cervical cancer and related conditions.

Methods

Study population

The centralized cervical cancer screening program (CCSP) of BC has been operational since 1960. It processes every Pap smear done in BC at a single facility; all cytology results are stored in a single database. More than half a million women participate in the CCSP each year and over 70% of eligible women in BC are screened, on average, every 30 months.

Specimen collection and cytological interpretation

A flowchart summarizing sample collection and experiments is shown in Supplemental Online Figure A. The 8,700 samples used in this study were derived from a feasibility study of liquid-based smears collected by 99 high-volume smear-takers from different parts of BC within the CCSP between March and July 2004 [8]. The sample included women aged 13–86; median age was 38. About 98.2% of the smears in this study were from the cervix or endocervix and 1.8% were vaginal samples. Practitioners were instructed to obtain the sample from the transformation zone of the cervix using a Rovers® Cervex Brush. Swabs were placed in SurePath® media. TriPath Imaging Inc. equipment was used to process samples according to the manufacturer’s instructions. Cervical smears were interpreted by Canadian-registered CCSP cytotechnologists and the BC Cancer Agency-based cytopathologists. Cytological interpretation was reported using the British Society of Clinical Cytology terminology currently in use in BC. For this analysis, however, results were reclassified using the Bethesda system. Negative and benign changes were kept as originally categorized. Mild dyskaryosis was classified as low-grade intraepithelial lesions (LGIL) of the squamous or glandular type; moderate or severe dyskaryosis and suspicious smears were classified as high-grade intraepithelial lesions (HGIL) of the squamous or glandular type. Smears showing squamous (87.7%) and glandular (12.3%) abnormalities were not separated in our main analysis of LGIL or HGIL for simplicity of data presentation. Individual typing data have been separated by glandular or squamous type and are included in a separate table (Table 1). The categories of ASCUS and AGUS were not used.

Table 1 HPV type distribution according to cellular origin of abnormality, 95% CI shown in brackets

This study was approved by the joint Clinical Research Ethics Board of the BC Cancer Agency and the University of British Columbia. Use of specimens for this study was performed according to the ‘Secondary Use of Personal Information in Health Research: Case Studies’ (Canadian Institutes of Health Research, November 2002). Cytology results were recorded in the CCSP database. Each sample was assigned a study number, and the data including the age of the participant, geographic region of the smear taker, cytology result and previous screening history were attached to the study number. Subsequently, the remainder of each sample and the data were stripped of potential patient identifiers. The data and samples left over after cytology were then transferred to the Genome Sciences Centre at the BC Cancer Research Centre for HPV analysis.

Study sample selection

From the total study sample set of 8,700, forty samples were repeat smears from the same women and were excluded, leaving 8,660 independent samples. PCR analysis was performed on 4,980 samples, including all 614 cytologically abnormal samples and a random selection (every second sample by study number) of 4,366 normal and benign cytology smears. The selected smears were representative of the remaining smears: neither normal nor benign smears showed a statistically significant difference in age distribution or geographic location between selected and unselected smears. Age was tested using the t-test, and also using Mantel–Haenszel chi-square analysis with six age categories (<20, 20–29, 30–39, 40–49, 50–59 and 60+). Geographic location was tested using the chi-square test.

DNA extraction, quantification and quality control

The portion of each sample remaining after cytology (1–6 ml) was pelleted by centrifugation, re-suspended in 300 μl of phosphate-buffered saline, and stored at −80°C. DNA was extracted from 150 μl of thawed re-suspended cellular material using the PureGene DNA isolation kit (Gentra Systems, MN, USA). DNA samples were quantified by fluorometry and 10 ng aliquots arrayed in 96-well plates for PCR analysis. Plates were arrayed according to sample number and were not separated according to cytology. The β-globin gene primers were used to confirm the competence of each DNA sample to support PCR. Of the samples tested, 96.8% (4,821 samples) passed this quality control test; samples that did not pass were not included in HPV testing (see Supplemental Online Figure A).

HPV testing and HPV type determination

Tagged GP5+/GP6+ consensus primers [9, 10] were used to detect HPV by amplifying a 150 bp sequence of the viral L1 gene from virtually any HPV type, and bi-directional sequencing was used to determine HPV type(s) present in each sample. The GP5+/6+ primers [9, 10] were modified by the addition of SeqA2 (GAATTCTCTAGATGATCAGCGGC) or SeqB2 (CGAACTTTATTCGGTCGAAAAGG) tags to their 5′ ends to simplify later sequencing. Testing of known HPV types mixed with genomic DNA demonstrated effectiveness of the tagged primers in detecting various HPV types. PCR analysis was carried out as previously described [9] with minimal changes (95°C 30 s, 40°C 1 min, 68°C 30 s for 40 cycles). An aliquot of each PCR product was separated on a 3% agarose gel for visualization. Samples that showed the expected 150 bp band were designated as HPV positive. Aliquots of PCR products from HPV positive samples were then re-arrayed into 96-well plates and purified by the AMPure magnetic bead system (Agencourt Bioscience Corporation, Beverly, Massachusetts, USA). Purified PCR products were bi-directionally sequenced using BigDye 3.1 at 1/24 chemistry and run on 3730xl capillary sequencers (Applied Biosystems, Foster City, California). Sequence traces that produced apparent multiple overlapping sequences were flagged as possible multiple infections (MI). PCR products from such samples were phosphorylated with polynucleotide kinase (New England BioLabs, MA, USA) and subcloned by blunt end ligation into pUC19. Sixteen clones of each putative MI were bi-directionally sequenced using the -21 M13 Forward (TGTAAAACGACGGCCAGT) and M13 Reverse (CAGGAAACAGCTATGAC) primers. Sequences were aligned to a database of all known HPV L1 sequences using local BLAST alignment, and the best match scored as a specific HPV type if it had greater than 95% similarity over more than 50 bases.
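The type-calling rule at the end of this paragraph can be illustrated with a minimal sketch. It assumes that each BLAST hit has already been parsed into a record; the `BlastHit` fields, the function name and the use of bit score to pick the "best match" are illustrative assumptions, not details of the study's pipeline.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class BlastHit:
    """One parsed alignment against an HPV L1 reference (hypothetical record)."""
    hpv_type: str
    percent_identity: float   # percent similarity over the aligned region
    alignment_length: int     # aligned bases
    bit_score: float          # assumed here to rank hits


MIN_IDENTITY = 95.0  # "greater than 95% similarity"
MIN_LENGTH = 50      # "over more than 50 bases"


def call_hpv_type(hits: Sequence[BlastHit]) -> Optional[str]:
    """Return the HPV type of the best hit if it passes both thresholds."""
    if not hits:
        return None
    # Assumption: the best match is the hit with the highest bit score.
    best = max(hits, key=lambda h: h.bit_score)
    if best.percent_identity > MIN_IDENTITY and best.alignment_length > MIN_LENGTH:
        return best.hpv_type
    return None
```

Both thresholds are strict inequalities, so a hit at exactly 95% similarity or exactly 50 aligned bases is left untyped in this sketch.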

For this study types 16, 18, 26, 31, 33, 35, 39, 45, 51, 52, 53, 56, 58, 59, 66, 68, 73, and 82 were considered high-risk HPV types.

Statistical analyses

Statistical analyses were performed using the SAS package (SAS Institute Inc., Cary, NC, USA). All cytologically abnormal samples were HPV typed, but not all normal or benign samples were typed; it was, therefore, necessary to weight by cytology in the final prevalence analyses. Weighting was performed as follows. The “Study Sample” column in Tables 2 and 3 adjusts the prevalence estimated from successful HPV tests to reflect the cytology distribution of the study sample. The weight for each cytology category (normal, benign, LGIL and HGIL) is the proportion that category constitutes of the study sample, divided by the proportion it constitutes of successful HPV tests. Multiply infected samples were defined as samples in which two or more HPV types were detected. In calculating the prevalence of each HPV type, such samples were counted as positive for every type detected in them.
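The cytology weighting can be sketched as follows. This is an illustration of the stated rule only (the analysis itself was run in SAS); the function names are hypothetical and the counts in the usage example are made up, not study data.

```python
from typing import Dict


def cytology_weights(study_counts: Dict[str, int],
                     tested_counts: Dict[str, int]) -> Dict[str, float]:
    """Weight = share of the study sample / share of successful HPV tests."""
    study_total = sum(study_counts.values())
    tested_total = sum(tested_counts.values())
    return {
        group: (study_counts[group] / study_total)
        / (tested_counts[group] / tested_total)
        for group in study_counts
    }


def weighted_prevalence(study_counts: Dict[str, int],
                        tested_counts: Dict[str, int],
                        positive_counts: Dict[str, int]) -> float:
    """HPV prevalence re-weighted to the cytology mix of the study sample."""
    weights = cytology_weights(study_counts, tested_counts)
    weighted_positives = sum(weights[g] * positive_counts[g] for g in study_counts)
    weighted_tested = sum(weights[g] * tested_counts[g] for g in study_counts)
    return weighted_positives / weighted_tested


# Made-up counts: group "b" is fully tested, group "a" only half tested.
example = weighted_prevalence(
    study_counts={"a": 90, "b": 10},
    tested_counts={"a": 45, "b": 10},
    positive_counts={"a": 9, "b": 5},
)
```

With these weights the estimate equals the average of the category-specific prevalences, each weighted by that category's share of the study sample, so over-sampling of abnormal cytology among tested samples does not inflate the overall figure.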

Table 2 HPV prevalence and type distribution, shown by cytology group (95% CI shown in parentheses)
Table 3 Multiple infections, shown by cytology group (95% CI shown in parentheses)

The Cochran–Armitage trend test was used on HPV prevalence rates by 5-year age groups shown in Figs. 1 and 2a.
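For readers unfamiliar with the trend test, a minimal sketch of the standard statistic is given below. It is not the SAS procedure used in the study; the function name, the default equally spaced scores and the two-sided normal approximation are assumptions made for illustration.

```python
import math
from typing import Optional, Sequence, Tuple


def cochran_armitage_trend(positives: Sequence[int],
                           totals: Sequence[int],
                           scores: Optional[Sequence[float]] = None
                           ) -> Tuple[float, float]:
    """Return (z, two-sided p) for a linear trend in proportions."""
    if scores is None:
        scores = list(range(len(totals)))  # equally spaced group scores
    if not (len(positives) == len(totals) == len(scores)):
        raise ValueError("positives, totals and scores must have equal length")

    n_all = sum(totals)
    p_bar = sum(positives) / n_all  # pooled proportion positive

    # Score-weighted sum of (observed - expected) positives per group.
    t_stat = sum(s * (r - n * p_bar) for s, r, n in zip(scores, positives, totals))

    sum_ns = sum(n * s for n, s in zip(totals, scores))
    sum_ns2 = sum(n * s * s for n, s in zip(totals, scores))
    variance = p_bar * (1.0 - p_bar) * (sum_ns2 - sum_ns ** 2 / n_all)
    if variance <= 0:
        raise ValueError("trend is undefined when the variance is zero")

    z = t_stat / math.sqrt(variance)
    p_value = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided normal p-value
    return z, p_value
```

Here `positives` and `totals` would hold the HPV-positive and tested counts for each ordered age group; a negative `z` indicates prevalence falling with age.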

Fig. 1 Prevalence of individual HPV types and categories by cytology. Data are not adjusted for age. 95% confidence intervals are shown

Fig. 2 HPV prevalence by 5-year age strata. 95% confidence intervals are shown. In BC, cervical cancer screening is not recommended for women over 70, so those over 70 in this sample represent a nonrandom set of women who likely presented with symptoms and for whom cytology was conducted as part of a diagnostic work-up

Results

Study participants are representative of the CCSP in terms of age, cytology and geographical distribution

Table 4 shows the age distribution of the study sample set compared to that of the entire CCSP in 2004, including 95% CI. While the difference in some categories is statistically significant, differences are small in practical terms. Furthermore, adjustment by age, by cytology or both has been included among HPV prevalence estimates. The age distribution is comparable, except that the study sample shows modest over-recruitment of women in the two youngest (<20 and 20–24 year old) age groups. The distribution of cytology is similar to that of the CCSP (Table 4). Of the 8,660 independent samples, 9 were unsatisfactory for interpretation. Of the interpretable samples, 65 showed high-grade intraepithelial lesions (HGIL, 0.8%), 549 showed low-grade intraepithelial lesions (LGIL, 6.3%), 413 showed benign changes (4.8%), and the remaining 7,624 were cytologically normal (88.1%) [8]. Table 4 also shows that the study sample is distributed, by health authority region, comparably to the CCSP.

Table 4 Comparison of study sample composition to the cervical cancer screening program in BC, data for 2004

Prevalence and type distribution

Table 2 summarizes both the overall HPV prevalence and type distribution by cytology, with 95% confidence intervals. Figure 1 illustrates these data graphically. The overall HPV prevalence of the study population, adjusted from the data for the 4,980 tested samples as described earlier, was 16.8%. High risk HPV was found in 13.9% of the study population, and 11.6% had the high risk types 16 or 18 that are targeted by vaccines. HPV prevalence increases with each more severe cytological category. HPV16 is the most common type, found in 10.7% of samples. The HPV16 prevalence generally increases with the severity of the abnormalities that are precursors to cervical cancer; it is present in 8.7% of cytologically normal samples, 35.2% of LGIL and 52.4% of HGIL. Age-adjusted data can be found in Supplemental Online Table A. Adjusting for age slightly reduces the prevalence of HPV overall (to 15.5%) and in normal cytology (to 12.1%), as well as the overall prevalence of HPV16 (to 10.2%).

Figure 1 illustrates a striking difference between HR and LR HPV types. HR types (16, 18 or all HR types as a group) show higher prevalence in samples with HGIL than in those with LGIL, whereas the reverse is true for LR types (6, 11, all LR types together). This is consistent with the findings that LR types are less likely to be associated with progression to cervical cancer.

Figure 2a, b shows HPV prevalence and type distribution by age. Overall HPV positivity and both high risk and low risk types are most prevalent in women under age 20, with decreasing prevalence seen up to approximately age 60. The trend was highly significant for any HPV type, any high risk type, and any low risk type (p < 0.0001 for each), significant for HPV 16/18 (p = 0.0004) and HPV 16 (p = 0.0153), and not significant for HPV 18 (p = 0.2217).

Table 3 shows the rates of multiple infections (MI) involving different combinations of HPV types, by cytology; Supplemental Online Table B lists age-adjusted MI rates. While the percentage of samples that have MI increases with the severity of the cytological abnormality, dividing the MI rate by the percentage of HPV positives in each category illustrates that the percentage of HPV positive samples that have MI decreases with increasing severity of the lesions (from 39.8% in normal samples to 19.8% in LGIL and 17.2% in HGIL). HGIL have a higher percentage of MI only because they have more HPV; the HPV infections they have are more likely to be single HPV types.

Discussion

This study provides an estimate of the prevalence of HPV in women participating in routine cytology screening in BC. This is the largest typing study of its kind in Canada to date and one of the largest single-center studies worldwide. While the HPV prevalence of screened women is not necessarily equivalent to that of all women [11] in BC, the high participation rate of the CCSP (70% [12]) argues that it provides a good estimate for the province. The use of direct sequencing theoretically allows the detection of all known HPV types and provides a level of detail not attainable using existing hybridization probe sets or the Digene Hybrid Capture 2 system.

Several international studies have examined the prevalence of HPV in women. The diversity of population samples, sample media and HPV typing methods makes it difficult to identify studies that are exactly comparable to each other. It is not surprising that our HPV type distribution differs from that of a large US study based on self-sampled vaginal swabs [13]. Low-risk HPV types that are more prevalent in the vagina and vulva [14] will not be well represented in our samples, as these are almost exclusively cervical smears. It is also reported that self-collected vaginal sampling methods are generally less sensitive than cervical smears for the detection of HPV [15, 16]. Population-based studies in which all women were included found overall HPV prevalence rates ranging from 2% in Hanoi, Vietnam [17], to 40% in Mozambique [18]. Our overall HPV infection rate was 16.8%, close to that of an Ontario study (13.3%) [19]. A recent large study in The Netherlands typing high-risk HPV in the population found a rate of 5.6% [20] and shows a similar trend for age as observed in our study (Fig. 2a, b). Our prevalence is also at a similar level to that seen in a recent, large meta-analysis for Asian women at 14.4% for cytologically normal samples [21]. Prevalence of HR HPV ranged from 4.4% [22] to almost 20% [23] in these studies, in keeping with our rate of 13.9%. As in many other studies, HPV16 was the most common high-risk type in BC. We found a higher prevalence of HPV16 (10.7%) than other studies, which showed less than 1% to just over 5%. Several studies [17, 22–28] used GP5+/6+ primers, but detected HPV types by hybridization followed by enzyme-based immunoassay. Our method subjects the PCR products to an additional, albeit linear, amplification in the sequencing reaction, likely enhancing the sensitivity of detection. Comparison of cycle sequencing, line blotting and hybrid capture showed that sequencing is the most sensitive [29].

HPV positivity increases, as expected, from normal (12.3%), to benign (19.6%), to LGIL (69.3%), to HGIL (81.0%). HPV16 prevalence follows a broadly similar pattern: normal (8.7%), benign (7.6%), LGIL (35.2%) and HGIL (52.4%). The relative proportion of HPV16 (HPV16/total HPV), however, is unexpectedly high in normal samples (71%), when compared to benign (39%), LGIL (51%) and HGIL (65%). It is unlikely that contamination could account for this difference, because all samples were processed on multi-well plates and were not separated according to cytology. We propose that there is a real biological explanation for this observation that likely relates to the sensitivity of PCR and sequencing to detect HPV. We may be detecting transient, sub-clinical HPV exposures in addition to overt HPV16 infections that would be detected with less sensitive techniques.
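The relative proportions quoted in this paragraph follow directly from the two sets of prevalences given in it. The short calculation below reproduces them; the variable and function names are illustrative only.

```python
# (any-HPV prevalence %, HPV16 prevalence %) by cytology group, as quoted in the text.
prevalence_by_cytology = {
    "normal": (12.3, 8.7),
    "benign": (19.6, 7.6),
    "LGIL": (69.3, 35.2),
    "HGIL": (81.0, 52.4),
}


def hpv16_share(any_hpv_pct: float, hpv16_pct: float) -> float:
    """HPV16 prevalence as a percentage of overall HPV prevalence."""
    return 100.0 * hpv16_pct / any_hpv_pct


hpv16_shares = {
    group: round(hpv16_share(any_hpv, hpv16))
    for group, (any_hpv, hpv16) in prevalence_by_cytology.items()
}
print(hpv16_shares)  # {'normal': 71, 'benign': 39, 'LGIL': 51, 'HGIL': 65}
```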

We found that 33% of HPV positive samples contained multiple HPV types, within the range of 12–62% seen in other studies [28, 30]. Direct sequencing may underestimate the MI rate; however, our conservative over-selection of potential MIs (all sequence traces with any sign of mixed types were subcloned) should compensate for this. Our higher observed prevalence of HPV16 is not likely to be a result of our intensive characterization of MI samples.

The percentage of HPV positive samples that had multiple infections was higher in the cytologically normal HPV positive samples, possibly reflecting clonal outgrowth of cells infected with a single HPV type in the pre-cancerous lesions. This may imply that multiple types of HPV simultaneously infect the same woman but not necessarily the same individual cells, or it may reflect the preferential persistence of one HPV type. Multiple infections may be more recent infections that have had less time for one or some of the types involved to be cleared.

We did not exclude women who were tested as a follow-up to a previous abnormal smear. The smear-takers were high-volume sites; this could bias toward young sexually active women who are seeking birth control. Compared to the CCSP in 2004, our sample has a higher proportion of young women, who would be more likely to have current HPV infections. Sellors and colleagues showed a lower rate (9.6% for high-risk HPV) in older women than in younger women [31]; we also observe this trend. Thus, our study could slightly overestimate the prevalence of HPV infection relative to the general female population of BC.

Differences between recruitment methods can complicate direct comparison of our findings to those of other Canadian studies [11, 19, 32, 33]. An Ontario study [19] using Digene Hybrid Capture 2 and PCR showed a prevalence ranging from 25% to 6% depending on age; Montreal University students had an HPV rate of 29% [33], similar to the prevalence we observed in this age group. A recent international analysis by IARC [28] illustrates the differences in type distribution in different countries. We detect HPV90, but this type was not included in the probe set used by IARC [28]. Types seen more commonly in Asian countries (such as 51, 52 and 58) were not significantly increased in BC, despite its large Asian population. Comparison to worldwide data [28] demonstrates our ability to detect most or all known HPV types, despite the tendency of the GP5+/6+ primer set to underestimate HPV 52 [34].

Prophylactic vaccines are now available against HR HPV types 16 and 18. Efficacy evaluations to date for these vaccines show 100% protection against development of LGIL and HGIL associated with the HPV types targeted. A minimum estimate of the impact of a vaccine protecting against HPV16 and HPV18 is the proportion of samples of a given lesion grade that are positive for HPV16, HPV18, or both, but not for other HR types of HPV. In our data, this proportion is 29.9% for LGIL and 55.6% for HGIL (data not shown). Conservatively, we predict that vaccinating against HPV16 and 18 would, in an effectively vaccinated group, prevent the development of almost one-third of LGIL and more than half of HGIL, and an even larger proportion of cervical cancer. Expressing these estimates as a percentage of those LGIL and HGIL that had detectable HR HPV gives a likely more realistic estimate (57.2% of LGIL and 70.0% of HGIL) of the percentage of these lesions that are attributable to vaccine-related HR types. Including additional HPV types in future vaccines (such as 56 and 90 in BC) would further increase the percentage of cervical lesions prevented. These data provide a baseline from which to monitor changes in HPV prevalence that result from future use of HPV vaccines in BC and may inform the use of HPV testing as a first-line screening alternative to cytology.
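The two vaccine-impact estimates described above can be sketched as a simple count over per-sample type calls. The high-risk list is the one defined in the Methods; the function name and the representation of each sample as a set of type numbers are assumptions for illustration, and the sketch is not the analysis code used in the study.

```python
from typing import AbstractSet, Optional, Sequence, Tuple

# High-risk types as defined in the Methods of this study.
HIGH_RISK = frozenset({16, 18, 26, 31, 33, 35, 39, 45, 51, 52,
                       53, 56, 58, 59, 66, 68, 73, 82})
VACCINE_TYPES = frozenset({16, 18})


def vaccine_attributable_fractions(
    samples: Sequence[AbstractSet[int]],
) -> Tuple[float, Optional[float]]:
    """Return (fraction of all samples, fraction of HR-positive samples)
    that carry HPV16 and/or HPV18 and no other high-risk type."""
    if not samples:
        raise ValueError("at least one sample is required")

    # High-risk types detected in each HR-positive sample.
    hr_positive = [s & HIGH_RISK for s in samples if s & HIGH_RISK]
    # Low-risk co-infections are allowed; other high-risk types are not.
    vaccine_only = [hr for hr in hr_positive if hr <= VACCINE_TYPES]

    of_all = len(vaccine_only) / len(samples)
    of_hr = len(vaccine_only) / len(hr_positive) if hr_positive else None
    return of_all, of_hr
```

Applied to the samples of one lesion grade, the first value corresponds to the minimum estimate and the second to the estimate expressed relative to lesions with detectable HR HPV.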