Background

Cervical cancer (CC) remains a leading cause of gynaecological cancer-related mortality worldwide and constitutes the second most common malignancy in women.1 Although CC varies in clinical behaviour between patients, infection by high-risk human papillomavirus (HPV) is an important initiating event in CC tumorigenesis2 and one of the most important risk factors for developing CC.3 Most HPV infections are cleared spontaneously by the immune system, but in some cases the infection persists and can lead to cancer.4 Following infection, the virus can remain in its episomal form or become integrated into the host genome; both forms may also coexist (episomal/integrated).5 The longer half-life of integrated viral transcripts, compared with that of episomal transcripts, is thought to favour cellular immortalisation and transformation into cancer cells while also providing a selective growth advantage.6 Most often, the integration breakpoint lies within the viral E2 gene, resulting in de-repression of the E6 and E7 viral oncogenes. When the virus remains episomal, E6 and E7 expression may result from leaky expression or epigenetic dysregulation. The E6 and E7 proteins impair the function of p53 and pRb, respectively, allowing squamous cell tumorigenesis.6

Several mechanisms of integration have been reported in the literature; the “looping” model of HPV integration following DNA replication and recombination (resulting in DNA concatemers)7 is the most widely accepted, although it has not been experimentally reconstituted. HPV DNA integration into the human genome triggers various genetic alterations, such as oncogene amplification, tumour suppressor gene inactivation, inter- or intrachromosomal rearrangements and genetic instability.6,8 Genes located near viral integration sites can show altered RNA and protein expression levels, leading to over- or under-expression. In 2015, whole-genome sequencing and high-throughput viral integration detection methods identified as many as 3667 HPV integration breakpoints in cervical neoplastic lesions. Frequent integration sites have been reported in genes relevant to the neoplastic process, such as the MYC oncogene.9 Loss of function (LOF) of the RAD51B tumour suppressor gene following HPV DNA insertion was reported to affect the DNA repair pathway and genomic instability in CC.10

HPV DNA integration occurs as a single copy or as multiple repeats (in tandem or dispersed).11 In 2016, Holmes et al. developed the Capture HPV method and used it to identify five different HPV signatures in 72 CC samples. The first two signatures contain two hybrid chromosomal–HPV junctions, which are either co-linear (2 Junctions Colinear, “2J-COL”) or non-linear (2 Junctions Non-Linear, “2J-NL”) depending on their relative orientations. These reflect two modes of viral integration, associated with chromosomal deletion and amplification events, respectively. The third and fourth signatures exhibit several hybrid junctions, either clustered in one chromosomal region (Multiple Junctions Clustered, “MJ-CL”) or scattered at distinct loci (Multiple Junctions Scattered, “MJ-SC”), while the fifth signature consists of episomal forms of HPV (EPI).12
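The five signatures can be read as a decision rule over the hybrid junctions detected in a tumour. The sketch below is a simplified illustration under stated assumptions, not the Capture HPV implementation of Holmes et al.: the `Junction` record, its `orientation` field, the rule that matching orientations mean co-linear, and the 1 Mb clustering window (`CLUSTER_WINDOW_BP`) are all hypothetical choices made for the example. A two-junction case whose orientation cannot be resolved is returned as plain "2J", the label used in the Results.

```python
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical window (in bp) within which multiple junctions are called "clustered".
CLUSTER_WINDOW_BP = 1_000_000


@dataclass
class Junction:
    """A hybrid HPV-chromosome junction (hypothetical, simplified record)."""
    chrom: str
    pos: int
    # Hypothetical orientation label ("+" or "-"); None if it could not be resolved.
    orientation: Optional[str] = None


def classify_signature(junctions: List[Junction]) -> str:
    """Assign a simplified integration signature from a tumour's hybrid junctions."""
    if not junctions:
        return "EPI"  # no hybrid junction: episomal HPV only
    if len(junctions) == 2:
        a, b = junctions
        if a.orientation is None or b.orientation is None:
            return "2J"  # two junctions, relative orientation unresolved
        # Simplifying assumption: matching orientations -> co-linear, else non-linear.
        return "2J-COL" if a.orientation == b.orientation else "2J-NL"
    if len(junctions) > 2:
        chroms = {j.chrom for j in junctions}
        positions = [j.pos for j in junctions]
        if len(chroms) == 1 and max(positions) - min(positions) <= CLUSTER_WINDOW_BP:
            return "MJ-CL"  # all junctions within one chromosomal region
        return "MJ-SC"  # junctions scattered across distinct loci
    return "UNCLASSIFIED"  # a single junction falls outside the five signatures


if __name__ == "__main__":
    # Hypothetical junction coordinates.
    tumour = [Junction("8", 1_000_000, "+"), Junction("8", 1_005_000, "-")]
    print(classify_signature(tumour))  # 2J-NL
```

Real junction calling and orientation assignment are considerably more involved; the point here is only how the number, orientation and spread of junctions map onto the five labels.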

Hypothesising that HPV integration signatures might predict clinical outcome, we analysed the associations between viral integration signatures, clinical and pathological parameters and outcome in a large cohort of 272 HPV-positive CC patients enrolled in the prospective BioRAIDs study [NCT02428842].

Methods

Patients and samples

Patients included in this study were enrolled in the prospective CC BioRAIDs study [NCT02428842] of the EU-funded RAIDs Network (Rational Molecular Assessment and Innovative Drug Selection, www.raids-fp7.eu). The clinical protocol, tumour sampling procedures, sample quality control and treatment in 18 European centres (seven European countries), as well as the study results, have been published previously.13,14,15

HPV typing

All samples included in this study were analysed for HPV type using the SPF10 primer set and the INNO-LiPA HPV Genotyping Extra line probe assay (Fujirebio Europe, Gent, Belgium) according to the manufacturer’s protocol. For DNA isolation, one to five 10 μm tissue sections were cut, depending on the size of the tumour biopsy. DNA was isolated using the automated Tissue Preparation System (Siemens Healthcare Diagnostics, NY, USA).

PIK3CA mutation detection

A mutational analysis of the PIK3CA gene had previously been carried out on all tumour samples.15 In summary, paired-end whole-exome sequencing was performed on a HiSeq2500 platform after capture with the Agilent SureSelectXT Human kit, to an average depth of coverage of at least 80× per sample. Dedicated filtering strategies were applied to somatic variants depending on the functional category of each gene: oncogene, tumour suppressor gene or uncharacterised. For oncogenes such as PIK3CA, hotspot missense mutations known in the COSMIC database were considered. Among the 87 patients with PIK3CA mutations, three had an H1047R mutation (exon 20) and 84 had an E542K or E545K mutation (exon 9).
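The oncogene filtering step amounts to keeping only missense calls that match a known hotspot. The sketch below illustrates that idea and is not the study's pipeline: the `Variant` record and the three-entry `PIK3CA_HOTSPOTS` table are hypothetical stand-ins for annotated variant calls and the full COSMIC hotspot catalogue. The table lists E542K and E545K (the canonical exon 9 hotspots) and H1047R (exon 20), using the exon numbering of the text.

```python
from typing import Dict, List, NamedTuple, Tuple

# Hypothetical hotspot table: three well-known PIK3CA hotspots, standing in
# for the much larger COSMIC catalogue. Values are exon numbers.
PIK3CA_HOTSPOTS: Dict[str, int] = {"E542K": 9, "E545K": 9, "H1047R": 20}


class Variant(NamedTuple):
    gene: str
    protein_change: str  # e.g. "E545K"
    consequence: str     # e.g. "missense"


def pik3ca_hotspot_calls(variants: List[Variant]) -> List[Tuple[str, int]]:
    """Keep PIK3CA missense variants that match a known hotspot, with their exon."""
    return [
        (v.protein_change, PIK3CA_HOTSPOTS[v.protein_change])
        for v in variants
        if v.gene == "PIK3CA"
        and v.consequence == "missense"
        and v.protein_change in PIK3CA_HOTSPOTS
    ]


if __name__ == "__main__":
    calls = [
        Variant("PIK3CA", "E545K", "missense"),
        Variant("PIK3CA", "R88Q", "missense"),  # not in the hypothetical table
        Variant("TP53", "R273H", "missense"),   # tumour suppressor: other filter
        Variant("PIK3CA", "H1047R", "missense"),
    ]
    print(pik3ca_hotspot_calls(calls))  # [('E545K', 9), ('H1047R', 20)]
```

An allow-list of this kind is deliberately conservative for oncogenes: it trades sensitivity to rare activating variants for confidence that each retained call is functionally relevant.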

DNA library preparation

The DNA libraries were prepared from 500 ng of genomic DNA (extracted from frozen tissue), starting with ultrasonication (Covaris) to produce double-stranded DNA fragments of approximately 280 bp. End repair and A-tailing were performed to allow ligation of Illumina-specific adapters carrying a unique barcode for each sample, for subsequent amplification and sequencing. The KAPA Hyper Prep kit was used according to the manufacturer’s instructions.

HPV double capture method

The double capture was carried out using the SeqCap EZ Rapid Library Small Target Capture method (Roche), which is adapted to capturing small DNA targets. The DNA libraries were multiplexed in pools of 12 and hybridised for 16 h with biotinylated HPV oligonucleotide probes recognising all HPV genotypes. The hybridised sequences were then captured with streptavidin beads and amplified by PCR. We performed a double capture (i.e. two rounds of hybridisation and capture) to improve efficiency and specificity. Post-capture libraries were sequenced on an Illumina MiSeq system (Illumina, San Diego, CA, USA) in 2 × 150 bp paired-end mode, with 24 samples multiplexed on a V2 Micro flow cell.

The HPV copy number expresses the abundance of HPV sequences relative to an endogenous control gene (KLK3), which normalises for the starting amount and quality of genomic DNA. Similar results were obtained with other endogenous diploid controls (GAPDH, RAB7A).
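The normalisation is a ratio of read counts, HPV reads over reads of the control gene. A minimal sketch, assuming per-sample read counts for HPV and for each control gene are already available; no further correction (for example for target length) is applied because none is described, and the counts in the example are hypothetical. Computing the ratio against several controls at once is one way to run the KLK3 versus GAPDH/RAB7A consistency check mentioned above.

```python
from typing import Dict


def hpv_copy_number(hpv_reads: int, control_reads: int) -> float:
    """HPV copy number as the ratio of HPV reads to reads of one endogenous control gene."""
    if control_reads <= 0:
        raise ValueError("the control gene must have at least one read")
    return hpv_reads / control_reads


def copy_number_by_control(hpv_reads: int, control_reads: Dict[str, int]) -> Dict[str, float]:
    """Compute the ratio against each control gene (e.g. KLK3, GAPDH, RAB7A)."""
    return {gene: hpv_copy_number(hpv_reads, n) for gene, n in control_reads.items()}


if __name__ == "__main__":
    # Hypothetical read counts for a single sample.
    ratios = copy_number_by_control(1200, {"KLK3": 300, "GAPDH": 310, "RAB7A": 290})
    for gene, ratio in ratios.items():
        print(f"{gene}: {ratio:.2f}")
```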

Bioinformatics analyses

To analyse our HPV capture data, we set up a new bioinformatics pipeline, nf-VIF (available at https://github.com/bioinfo-pf-curie/nf-vif/), which implements the methods previously described in Holmes et al.12 Briefly, nf-VIF performs (i) quality control and cleaning of raw Illumina sequencing data, (ii) HPV genotyping and (iii) detection of HPV insertion sites within the human genome. nf-VIF is implemented in the Nextflow workflow management system, ensuring high portability, reproducibility and scalability (see Supplementary materials for details).

Statistical analysis

Correlations between HPV integration signatures and clinical, biological and molecular features were analysed using chi-square tests, chi-square tests with Yates’ correction or Fisher’s exact tests, as appropriate. Progression-free survival (PFS) was defined as the time interval from the date of CC diagnosis to progression. Survival data were censored at the date of last follow-up. To visualise the ability of a molecular marker (i.e., HPV copy number) to discriminate between two populations (patients who progressed and those who did not) without an arbitrary cut-off value, data were summarised in a receiver operating characteristic (ROC) curve.16,17 The area under the curve (AUC) was calculated as a single measure of discriminative performance. Survival curves were estimated by the Kaplan−Meier method and compared using the log-rank test. For all statistical tests, the significance level was set at p < 0.05.
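The Kaplan−Meier estimator and the two-group log-rank test are standard; the plain-Python sketch below only makes the censoring conventions explicit. It is an illustration, not the software used in the study (which the text does not name), and the example data are hypothetical. Patients censored at an event time are counted as at risk at that time, and the log-rank p-value uses the chi-square distribution with one degree of freedom.

```python
import math
from typing import List, Optional, Sequence, Tuple


def kaplan_meier(times: Sequence[float], events: Sequence[int]) -> List[Tuple[float, float]]:
    """Kaplan-Meier estimate as (time, survival) steps at each distinct event time.

    events[i] is 1 if progression was observed at times[i], 0 if the patient was
    censored at last follow-up.
    """
    if len(times) != len(events):
        raise ValueError("times and events must have the same length")
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve: List[Tuple[float, float]] = []
    i = 0
    while i < len(data):
        t = data[i][0]
        n_events = n_leaving = 0
        while i < len(data) and data[i][0] == t:  # group tied times
            n_events += data[i][1]
            n_leaving += 1
            i += 1
        if n_events:
            survival *= 1.0 - n_events / at_risk
            curve.append((t, survival))
        at_risk -= n_leaving  # events and censorings leave the risk set after t
    return curve


def median_survival(curve: List[Tuple[float, float]]) -> Optional[float]:
    """Earliest time at which the estimated survival drops to 0.5 or below."""
    for t, s in curve:
        if s <= 0.5:
            return t
    return None  # median not reached


def log_rank_test(times_a: Sequence[float], events_a: Sequence[int],
                  times_b: Sequence[float], events_b: Sequence[int]) -> Tuple[float, float]:
    """Two-group log-rank test; returns (chi-square statistic, p-value with 1 df)."""
    data = [(t, e, 0) for t, e in zip(times_a, events_a)]
    data += [(t, e, 1) for t, e in zip(times_b, events_b)]
    observed_minus_expected = 0.0
    variance = 0.0
    for t in sorted({t for t, e, _ in data if e}):
        n_a = sum(1 for ti, _, g in data if g == 0 and ti >= t)  # group A at risk
        n = sum(1 for ti, _, _ in data if ti >= t)                # total at risk
        d_a = sum(1 for ti, e, g in data if g == 0 and ti == t and e)
        d = sum(1 for ti, e, _ in data if ti == t and e)
        observed_minus_expected += d_a - d * n_a / n
        if n > 1:
            variance += d * (n_a / n) * (1 - n_a / n) * (n - d) / (n - 1)
    if variance == 0:
        raise ValueError("log-rank variance is zero")
    chi2 = observed_minus_expected ** 2 / variance
    # Chi-square survival function with 1 df: P(X > x) = erfc(sqrt(x / 2)).
    return chi2, math.erfc(math.sqrt(chi2 / 2))


if __name__ == "__main__":
    # Hypothetical PFS data in months (1 = progression, 0 = censored).
    low_t, low_e = [3, 5, 8, 12, 15], [1, 1, 1, 0, 1]
    high_t, high_e = [10, 18, 24, 30, 36], [1, 0, 1, 0, 0]
    print(kaplan_meier(low_t, low_e))
    print(median_survival(kaplan_meier(low_t, low_e)))
    print(log_rank_test(low_t, low_e, high_t, high_e))
```

The pairwise scans in `log_rank_test` are quadratic, which is fine for a sketch; a production analysis would rely on an established survival package.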

Results

Patient characteristics

The clinical, histological and biological characteristics (including PIK3CA mutational status) and outcomes of the 272 HPV-positive CC patients from the BioRAIDs European study are presented in Table 1. All samples were obtained prior to treatment. Median PFS of the whole cohort was 20.15 months. Fifty-four (20%) patients were treated with upfront surgery, 42 (15%) with neoadjuvant chemotherapy and 176 (65%) with external beam radiation therapy and concomitant platinum-based chemotherapy. The majority of patients (230, 85%) had squamous cell carcinoma. Classical prognostic factors such as FIGO stage (2018) and lymph node involvement (FIGO III/IV) correlated with PFS in the study population (Table 1).

Table 1 Clinical and biological characteristics of 272 patients with HPV-positive cervical cancer, in relation to progression-free survival.

HPV16 was the most common genotype (n = 155, 57%), followed by HPV18 (n = 36, 13%) and HPV45 (n = 27, 10%) (Table 1). Eighty-seven patients (32%) harboured a PIK3CA mutation, which on its own was not correlated with PFS in this population.

Integration mechanisms

The breakpoints identified on the HPV genome and the HPV statuses are reported in Table 2 and Supplementary Table 1. In the absence of integration (n = 33, 12%), no HPV-chromosomal breakpoint was observed and the viral genome persisted in an episomal form (EPI). Five HPV integration patterns were observed: 2J-COL (n = 30, 11%), 2J-NL (n = 53, 20%), 2J (n = 34, 12%), MJ-CL (n = 27, 10%) and MJ-SC (n = 95, 35%). The distribution in the BioRAIDs CC series differed significantly from that of HPV-positive anal carcinoma recently published by our team17 (p < 0.0001) (Fig. 1): episomal HPV was much less frequent in CC than in anal carcinoma, while “2J” signatures (2J, 2J-COL and 2J-NL) were more often represented in CC. The results were similar in HPV16-positive cancers, which represent the majority of cases in both cervical and anal cancers (Supplementary Fig. 1).

Table 2 Relationship between mechanisms of integration of HPV and clinical, biological and pathological characteristics of the 272 patients with HPV-positive cervical cancer.
Fig. 1: Distribution of the HPV integration signatures according to the location of HPV-positive squamous cell carcinomas.

a Cervical cancer; b anal cancer; p < 0.0001. 2J-COL 2 Junctions Colinear, 2J-NL 2 Junctions Non-Linear, MJ-CL Multiple Junctions Clustered, MJ-SC Multiple Junctions Scattered, EPI episomal and 2J 2 Junctions.

Interestingly, coinfections were observed in 12 CC patients (Supplementary Table 2). In these tumours, each HPV genotype had a unique integration site, and the HPV breakpoints differed in each case.

Most frequent HPV integration sites

We identified >300 different HPV-chromosomal junctions (intergenic or intragenic) (Fig. 2 and Supplementary Table 3). The most frequent integration site was in the MACROD2 gene (n = 7) (Supplementary Fig. 2), followed by MIPOL1/TTC6 (n = 5) and TP63 (n = 5); several other genes were also affected, such as ERBB2 (two sites) and KLF12 and RAD51B (a single site each) (Fig. 2 and Supplementary Table 3). The two tumours with ERBB2 integration sites had undergone whole-exome sequencing, and both showed ERBB2 amplification.15

Fig. 2

Distribution of HPV insertion sites in the genome of patients with HPV-positive CC. Each dot represents an HPV integration site.

Association of HPV insertion mechanisms with clinical and biological parameters

The distribution of HPV integration signatures according to clinical, biological and pathological characteristics is presented in Table 2 and Supplementary Table 1. While episomal forms were more frequent in PIK3CA-mutated tumours (p = 0.023), HPV integration signatures were not associated with histological subtype, FIGO stage/lymph node involvement (now classified as FIGO stage III) or treatment assignment, but they were associated with HPV genotype (p < 0.0001). HPV18 and HPV45 genotypes were always integrated (most frequently as 2J). Multiple-junction (MJ) viral integration signatures were predominant in HPV16-positive samples (n = 86/155, 55%) as compared to other HPV genotypes (n = 36/117, 31%) (Table 2; p < 0.0001).
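For a 2 × 2 comparison such as episomal versus integrated HPV by PIK3CA status, Fisher's exact test sums hypergeometric probabilities over all tables with the same margins. The sketch below uses the common two-sided rule of summing the probabilities of every table no more likely than the observed one; it is an illustration rather than the study's code, and the counts in the example are hypothetical, since the cell counts behind p = 0.023 are given in Table 2 rather than in the text.

```python
from math import comb


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    n = row1 + row2

    def table_prob(x: int) -> float:
        # Hypergeometric probability that the top-left cell equals x, margins fixed.
        return comb(row1, x) * comb(row2, col1 - x) / comb(n, col1)

    p_observed = table_prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)  # feasible top-left values
    # Small relative tolerance guards against floating-point ties with the observed table.
    p = sum(
        table_prob(x) for x in range(lo, hi + 1)
        if table_prob(x) <= p_observed * (1 + 1e-7)
    )
    return min(1.0, p)


if __name__ == "__main__":
    # Hypothetical counts: rows = PIK3CA mutated / wild type, columns = EPI / integrated.
    print(round(fisher_exact_two_sided(3, 1, 1, 3), 3))  # 0.486
```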

Association between the insertion mechanisms and progression-free survival

There was no significant correlation between HPV integration signatures (EPI, 2J and MJ) and PFS (Supplementary Fig. 3a, 3b). Similarly, there was no significant association between HPV integration signatures and PFS in the subgroup of HPV16-positive patients (data not shown).

Patients with HPV integration sites in the MACROD2 gene (n = 7; introns 5, 6 and 7) (Supplementary Fig. 2) did not have a significantly poorer outcome (p = 0.38, Supplementary Fig. 3c), although the numbers are too small to draw any conclusions. In an exploratory analysis, the outcome of patients harbouring several viral types did not differ significantly from that of patients with single viral infections (p = 0.09, Supplementary Fig. 3d).

Comparison of HPV copy number to HPV subtypes, insertion patterns and outcome

The HPV copy number was estimated as the ratio of the number of HPV reads to the number of reads of the control human gene KLK3. The optimal cut-off, determined by ROC analysis (see Methods), was 4. Patients were classified as having a low (ratio < 4, n = 145) or high HPV copy number (ratio ≥ 4, n = 127). HPV16-positive patients more often had a high HPV copy number (n = 95/155, 61%) than patients with other HPV subtypes (n = 32/117, 27%) (p < 0.0001) (Table 3). Samples with 2J type insertions displayed a low HPV copy number, while MJ type insertions were associated with a high HPV copy number (p < 0.0001). Furthermore, patients with a low HPV copy number had a poorer outcome than patients with a high HPV copy number (p = 0.011) (Fig. 3).
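The text does not state which criterion was used to choose the cut-off of 4 on the ROC curve. One common choice is Youden's J (sensitivity + specificity − 1); the sketch below assumes that criterion, treats a value below the threshold as predicting progression (low copy number being associated with poorer PFS), and computes the AUC through its Mann–Whitney interpretation. Function names and example data are hypothetical.

```python
from typing import List, Optional, Sequence, Tuple

CUT_OFF = 4.0  # cut-off reported in the text: ratio < 4 is "low", >= 4 is "high"


def _split(values: Sequence[float], progressed: Sequence[bool]) -> Tuple[List[float], List[float]]:
    """Separate values of patients who progressed (pos) from those who did not (neg)."""
    pos = [v for v, y in zip(values, progressed) if y]
    neg = [v for v, y in zip(values, progressed) if not y]
    if not pos or not neg:
        raise ValueError("need at least one patient in each group")
    return pos, neg


def auc_low_predicts_event(values: Sequence[float], progressed: Sequence[bool]) -> float:
    """AUC as P(a progressing patient has a lower value than a non-progressing one); ties count 0.5."""
    pos, neg = _split(values, progressed)
    score = 0.0
    for p in pos:
        for q in neg:
            score += 1.0 if p < q else 0.5 if p == q else 0.0
    return score / (len(pos) * len(neg))


def youden_cutoff(values: Sequence[float], progressed: Sequence[bool]) -> Tuple[Optional[float], float]:
    """Threshold t maximising sensitivity + specificity - 1, predicting progression when value < t."""
    pos, neg = _split(values, progressed)
    best_t, best_j = None, -1.0
    for t in sorted(set(values)):  # candidate thresholds: the observed values
        sensitivity = sum(v < t for v in pos) / len(pos)
        specificity = sum(v >= t for v in neg) / len(neg)
        j = sensitivity + specificity - 1.0
        if j > best_j:
            best_t, best_j = t, j
    return best_t, best_j


def copy_number_class(ratio: float, cut_off: float = CUT_OFF) -> str:
    """Classify an HPV/KLK3 ratio as "low" (< cut-off) or "high" (>= cut-off)."""
    return "high" if ratio >= cut_off else "low"


if __name__ == "__main__":
    ratios = [1.0, 2.0, 3.0, 5.0, 6.0, 7.0]  # hypothetical HPV/KLK3 ratios
    progressed = [True, True, False, False, False, True]
    print(auc_low_predicts_event(ratios, progressed))
    print(youden_cutoff(ratios, progressed))
    print([copy_number_class(r) for r in ratios])
```

With this convention, an AUC above 0.5 means that lower ratios are more frequent among patients who progressed, matching the direction of the association reported above.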

Table 3 Clinical and biological characteristics of 272 HPV-positive cervical cancer, in relation to HPV copy number.
Fig. 3

Progression-free survival of the 272 HPV-positive cervical cancer patients according to HPV copy number.

Discussion

In this CC patient population from the prospective BioRAIDs study, we identified >300 HPV-chromosomal (intergenic or intragenic) junctions, the MACROD2 gene being the most frequent integration site (n = 7), followed by MIPOL1/TTC6 (n = 5) and TP63 (n = 5). Notably, our data identified a new CC-related recurrent integration site in the MACROD2 (mono-ADP ribosylhydrolase 2) gene. Non-coding and structural mutations/variations in the germline MACROD2 gene have been associated with psychiatric disorders, obesity and cancer predisposition.18,19,20 Deletions in the MACROD2 gene are frequent in colorectal cancer21,22 and are reported to alter DNA repair and sensitivity to DNA damage, and consequently to impact colorectal tumorigenesis.23 Neither RNA expression nor functional studies support a tumour suppressor role for the MACROD2 gene. This gene spans more than 2 Mb and constitutes a common fragile site contributing to increased genomic instability.24,25 We report intronic integration sites in the MACROD2 gene, but evidence concerning the functional consequences of these intronic integrations is still lacking. Functional analyses are not straightforward because of the high rate of splicing in MACROD2 and the large number of alternative transcripts (coding and non-coding) of variable size. MACROD2 deletions and haploinsufficiency have been linked to impaired PARP1 activity and chromosomal instability in colorectal cancer26 and in liver cancer,27 suggesting a tumour suppressor function of this gene. Importantly, the present study identifies HPV integration as a new molecular mechanism of MACROD2 alteration, likely causing loss of function; however, the seven patients in our cohort with HPV integration in the MACROD2 gene are too few to discern a meaningful impact on CC evolution, albeit such integration is likely to contribute to genomic instability.

Frequent integrations in other SCCs have previously been reported in the MYC, TMEM49, FANCC and RAD51B genes,28,29,30 as well as in POU5F1B, FHIT, KLF12, KLF5, HMGA2, LRP1B, LEPREL1, DLG2 and SEMA3D. Slightly less common integration sites were reported in AGTR2, DMD, CDH7, DCC, HS3ST4, CPNE8, C9orf85, MSX2 and CADM2.9 Several of these previously reported integration sites, such as FHIT, KLF12 and RAD51B, were detected in one or two patients of the present CC cohort. HPV integration in MIPOL1/TTC6 and TP63 was observed in five patients each. Concordant with our results, Parfenov et al. reported, in a head and neck squamous cell carcinoma, a rearrangement between chromosomes 3 and 13 close to an HPV integration site located in a non-coding region, within the region of chromosome 3 where TP63 is located.31 p63 plays a key role in epidermal keratinocyte proliferation and differentiation and is a master regulator of the gene expression pattern and epigenetic landscape that define epidermal fate.32 TP63-driven enhancer reprogramming promotes aggressive tumour phenotypes in primary pancreatic ductal adenocarcinomas.33 HPV integration in TP63 was recently reported in patients with HPV-positive vulvar cancer.34 In another study of HPV-positive head and neck squamous cell carcinoma, HPV integration sites in MIPOL1/TTC6 were identified in more than one tumour sample. HPV integration into ERBB2 was observed in two patients in association with ERBB2 amplification, in concordance with previous reports.10

Twelve percent of CC patients did not display any HPV integration, while 43% had double-junction and 45% multiple-junction signatures. The distribution of HPV signatures in our CC cohort differed from that previously described in HPV-positive anal squamous cell carcinoma, with a lower rate of episomal HPV than in anal cancer (12% vs. 45%).17

No significant association was observed between HPV integration signatures and treatment type, histological subtype or FIGO stage. MJ viral integration signatures were predominant in HPV16-positive samples, and tumours with viral integration (2J or MJ) less frequently had activating PIK3CA mutations than those harbouring episomal HPV, confirming previously reported data.12 Similar results were observed when considering only HPV16-positive patients (data not shown). This is in accordance with the literature, in which HPV integration is reported to provide a selective growth advantage to cancer cells.6 CC patients with a high HPV copy number had significantly better PFS than patients with a low HPV copy number, consistent with other reports in the literature.35,36

In conclusion, while HPV integration is thought to be a random event, our results suggest that some hotspots may influence cancer evolution, which needs to be confirmed in larger aggregated datasets. The episomal form of HPV was less frequent in cervical carcinoma than in another anogenital carcinoma (anal carcinoma), and its presence was significantly associated with high HPV copy number, suggesting a decrease in viral replication upon integration. PIK3CA mutations were significantly associated with high HPV copy number and with the episomal form of HPV. PIK3CA mutation status alone was not associated with poor outcome; in a prior analysis of the BioRAIDs dataset, however, the combination of PIK3CA mutations with epigenetic alterations was associated with shorter PFS.15

To our knowledge, this is the first study to assess the prognostic value of HPV integration in a prospectively annotated patient cohort and to report HPV integration in the MACROD2 gene, whose alteration has been implicated in impaired PARP1 activity and chromosomal instability.