Characteristics of HPV integration in cervical adenocarcinoma and squamous carcinoma

Purpose HPV integration usually occurs in HPV-related cancers and is regarded as a main cause of carcinogenesis. However, the carcinogenic mechanism of HPV integration is unclear. This study aims to provide a theoretical basis for understanding the pathogenesis of cervical adenocarcinoma (AC) and cervical squamous carcinoma (SCC).
Methods We used HPV capture sequencing to obtain HPV integration sites in AC and SCC. We analyzed the integrated cytobands, the distribution of breakpoints across genetic and genomic elements, integration hotspot genes, clinicopathological parameters, and HPV16 breakpoints, and performed pathway analysis. We then conducted immunohistochemical (IHC) assays to preliminarily verify the expression of the genes most frequently integrated in AC, STARD3 and ERBB2.
Results The most frequently integrated cytoband was 17q12 in AC and 21p11.2 in SCC. In both AC and SCC, breakpoints tended to occur within gene regions rather than intergenic regions. Compared to SCC samples, AC samples had a higher prevalence of breakpoints in genomic elements. In AC, HPV integration was not significantly associated with clinicopathological parameters, whereas in SCC integration correlated with differentiation (P < 0.05). HPV breakpoints were located in the LCR more frequently in SCC than in AC, which disrupts activation of the p97 promoter. The hotspot genes of HPV integration were STARD3 and ERBB2 in AC, and RNA45S rDNA and MIR3648-1 in SCC. We also obtained preliminary evidence that the expression of STARD3 and ERBB2, the most frequently integrated genes, increases after integration.
Conclusion These results suggest that in SCC, HPV may utilize strong host promoters to express viral oncogenes, and that overexpression of viral oncogenes plays a significant role in carcinogenesis. In AC, HPV integration may affect host oncogenes, and dysregulation of these oncogenes may be the primary contributor to progression.
Supplementary Information The online version contains supplementary material available at 10.1007/s00432-023-05494-4.


Introduction
Cervical cancer is a common gynecological tumor [1], with approximately 604,000 new cases and 342,000 deaths reported worldwide in 2020 [2]. Squamous carcinoma (SCC) and adenocarcinoma (AC) are the two main pathological types of cervical cancer. With widespread cervical cancer screening, the mortality of SCC has significantly decreased. However, the mortality of AC remains high because of low detection sensitivity, young age of onset, low survival rate, and high rate of metastasis [3][4][5].
Persistent HPV infection is the main cause of cervical cancer. HPV is a small nonenveloped double-stranded DNA virus [6] that establishes its genome as episomes in proliferating basal cells [7].
The expression of viral oncoproteins E6 and E7 promotes the progression of cervical cancer. Although HPV integration is critical, the carcinogenic mechanism of integration is unclear. Studies have reported that HPV integration sites tend to be located in repetitive regions of the host genome, with some sites near cancer-relevant genes [8], and have identified integration hotspot genes, including POU5F1B, FHIT, and KLF12 [9]. Kamal et al. studied the pattern of HPV integration in SCC and AC and reported that HPV integration signatures are not associated with histological subtypes [10]. Because the number of AC cases is limited, most studies on HPV integration have focused on SCC [8,9,11,12]; however, none have evaluated the discrepancy in integration features between SCC and AC. We therefore systematically compared HPV integration characteristics between SCC and AC. The aim of this study is to understand HPV integration-mediated carcinogenesis in AC and SCC and to provide useful information for research on the treatment and screening of cervical cancer.

Sample collection
A total of 66 AC samples were collected from the Maternal and Child Health Hospital of Hubei Province between 2015 and 2020. The clinicopathological parameters of AC samples included age, HPV type, FIGO stage, differentiation, and lymph node metastasis.

HPV capture sequencing
HPV capture sequencing was employed to identify breakpoints of the virus and host. Integration sites supported by ≥ 7 reads were deemed valid integration sites. The sequencing data for 87 cases of SCC, sequenced using the same method, were provided by Tongji Hospital, which is affiliated with the Tongji Medical College of Huazhong University of Science and Technology (doi:10.1038/ng.3178).
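The read-support filter described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the record layout (`IntegrationSite` and its field names) is hypothetical, the toy records are not data from the study, and only the threshold of 7 reads comes from the text.

```python
from dataclasses import dataclass

MIN_READS = 7  # read-support threshold stated in the text


@dataclass(frozen=True)
class IntegrationSite:
    """Hypothetical record layout for one candidate integration site."""
    sample: str    # sample identifier (hypothetical)
    chrom: str     # host chromosome
    host_pos: int  # breakpoint position in the host genome
    hpv_pos: int   # breakpoint position in the HPV genome
    reads: int     # number of supporting reads


def valid_sites(sites, min_reads=MIN_READS):
    """Keep only sites supported by at least `min_reads` reads."""
    return [s for s in sites if s.reads >= min_reads]


# Toy records for illustration only (not data from the study).
candidates = [
    IntegrationSite("S1", "chr17", 39_703_981, 3_100, 12),
    IntegrationSite("S1", "chr8", 127_500_000, 880, 6),
    IntegrationSite("S2", "chr21", 8_400_000, 7_500, 7),
]
kept = valid_sites(candidates)
print([(s.sample, s.chrom, s.reads) for s in kept])
```

A site with exactly 7 reads is retained because the criterion is "≥ 7".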

Analysis of integration sites
The breakpoints were annotated using the latest version of ANNOVAR in hg38 coordinates. Annotated genes with frequency ≥ 3 were defined as HPV integration hotspots. An integration site was considered adjacent to a genomic element when the element overlapped with its flanking region (200 bp) [9]. Cytobands, transcription factor binding sites (TFBSs), repetitive regions, and CpG islands were downloaded from UCSC (http://hgdownload.cse.ucsc.edu/goldenPath/hg38/database). DNaseI hypersensitivity sites (DHSs), methylated or acetylated histone binding sites, and chromatin states of HeLa-S3 cells were downloaded from ENCODE (https://www.encodeproject.org). Functional annotation analysis of breakpoints was performed using R packages, and a heatmap was plotted using an online platform for data analysis and visualization (https://www.bioinformatics.com.cn). A protein-protein interaction (PPI) network was constructed using the online tool STRING (https://www.string-db.org/), with the medium confidence threshold set at 0.400 and irrelevant genes excluded.
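The two rules in this paragraph (adjacency to a genomic element within a 200 bp flank, and hotspot genes with frequency ≥ 3) can be illustrated with a short Python sketch. It is not the authors' pipeline, which used ANNOVAR and R. The function names are hypothetical, and two points are assumptions: that the 200 bp flank extends on each side of the breakpoint, and that frequency is counted per annotated integration site.

```python
from collections import Counter

FLANK = 200  # bp; flanking distance stated in the text (assumed to apply on each side)


def overlaps_flank(pos, elem_start, elem_end, flank=FLANK):
    """True if the closed interval [elem_start, elem_end] overlaps
    the window [pos - flank, pos + flank] around a breakpoint."""
    return elem_start <= pos + flank and elem_end >= pos - flank


def hotspot_genes(gene_per_site, min_count=3):
    """Return genes annotated at `min_count` or more integration sites."""
    counts = Counter(gene_per_site)
    return {gene: n for gene, n in counts.items() if n >= min_count}


# Toy inputs for illustration only (not data from the study).
near = overlaps_flank(1_650, 1_000, 1_500)  # window 1450..1850 reaches the element
far = overlaps_flank(1_701, 1_000, 1_500)   # window 1501..1901 misses the element
toy_annotations = ["ERBB2"] * 3 + ["STARD3"] * 3 + ["MYC"] * 2
hotspots = hotspot_genes(toy_annotations)
print(near, far, hotspots)
```

Interval trees or sorted sweeps would be preferable for genome-scale element tables; the pairwise check here only makes the overlap rule explicit.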

Generation of random breakpoints
Using R, 1000 random breakpoints were generated within the chromosome, serving as a random control.
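A random control of this kind can be sketched as follows. The study used R; this Python version is a stand-in, and it assumes that positions are drawn uniformly along the genome, so that each chromosome is chosen with probability proportional to its length. The chromosome names and lengths below are hypothetical placeholders, not hg38 values.

```python
import random

# Hypothetical toy chromosome lengths (bp); real hg38 lengths would be substituted.
CHROM_LENGTHS = {"chrA": 1_000_000, "chrB": 500_000, "chrC": 250_000}


def random_breakpoints(n, chrom_lengths, seed=0):
    """Draw `n` breakpoints uniformly over the genome.

    A chromosome is picked with probability proportional to its length,
    then a 1-based position is drawn uniformly within it.
    """
    rng = random.Random(seed)  # fixed seed so the control set is reproducible
    chroms = list(chrom_lengths)
    weights = [chrom_lengths[c] for c in chroms]
    picks = rng.choices(chroms, weights=weights, k=n)
    return [(c, rng.randint(1, chrom_lengths[c])) for c in picks]


control = random_breakpoints(1000, CHROM_LENGTHS)
print(control[:3])
```

The resulting list can then be annotated with the same procedure as the observed breakpoints to obtain the expected proportions per region or element.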

Statistical analysis
SPSS 25.0 was used for statistical analysis. P-values were calculated using the χ² test and corrected using Fisher's exact test.
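For readers without SPSS, Fisher's exact test on a 2 × 2 table can be written directly from the hypergeometric distribution. The sketch below is a generic textbook formulation, not the SPSS procedure itself; the function name is hypothetical and the example table is a toy, not data from the study.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact p-value for the 2x2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of all tables with the same
    margins that are no more likely than the observed table.
    """
    row1, row2, col1 = a + b, c + d, a + c
    n = row1 + row2
    denom = comb(n, col1)

    def prob(x):
        # Probability that the top-left cell equals x given fixed margins.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Small relative tolerance guards against floating-point ties.
    return sum(p for p in (prob(x) for x in range(lo, hi + 1))
               if p <= p_obs * (1 + 1e-9))


# Toy table for illustration only.
p_toy = fisher_exact_two_sided(3, 1, 1, 3)
print(round(p_toy, 4))
```

For the toy table, the probabilities of the five possible tables are 1/70, 16/70, 36/70, 16/70, and 1/70, so the two-sided p-value is 34/70.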

Distribution of breakpoints in exonic, intronic, and intergenic regions
HPV integration is known to favor genes, but the positional relationship between integration sites and known human genes is unclear. In AC, we identified 169 valid integration sites, including 4 (2.4%) in exons, 101 (59.8%) in introns, and 64 (37.8%) in intergenic regions. In SCC, we identified 685 valid integration sites, including 29 (4.2%) in exons, 348 (50.8%) in introns, and 308 (45%) in intergenic regions. The results indicated that, compared to the randomized control, integration sites in both AC and SCC tended to fall within genes, with integration sites in SCC more likely to be in exons (Fig. 3, **P < 0.01).
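The counts above are enough to illustrate how a χ² test on a region distribution works. The sketch below compares AC with SCC directly, which is not the comparison reported in the text (each group against the randomized control), because the control counts are not given here. Function names are hypothetical. The p-value uses the fact that, for 2 degrees of freedom, the χ² survival function is exp(-x/2).

```python
from math import exp

# Counts of valid integration sites per region, as reported in the text.
REGIONS = ("exonic", "intronic", "intergenic")
AC = (4, 101, 64)     # 169 sites
SCC = (29, 348, 308)  # 685 sites


def percentages(counts):
    """Share of each category, in percent."""
    total = sum(counts)
    return [100 * c / total for c in counts]


def chi_square_2xk(row_a, row_b):
    """Pearson chi-square statistic for a 2 x k contingency table."""
    n_a, n_b = sum(row_a), sum(row_b)
    n = n_a + n_b
    stat = 0.0
    for a, b in zip(row_a, row_b):
        col = a + b
        for observed, row_total in ((a, n_a), (b, n_b)):
            expected = row_total * col / n
            stat += (observed - expected) ** 2 / expected
    return stat


chi2 = chi_square_2xk(AC, SCC)
# A 2 x 3 table has (2 - 1) * (3 - 1) = 2 degrees of freedom,
# and the chi-square survival function with df = 2 is exp(-x / 2).
p_value = exp(-chi2 / 2)
print(dict(zip(REGIONS, percentages(AC))), chi2, p_value)
```

Replacing one of the rows with the counts from the 1000 random breakpoints would give the kind of comparison against the control that the text describes.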

Distribution of breakpoints in genomic elements
Next, we examined the distribution of breakpoints for AC and SCC in several genomic features, including repetitive elements, chromatin genome segmentations, methylated or acetylated histone binding sites, CpG-enriched regions, and DHSs. This helps to identify common molecular features that may favor HPV integration into the human genome [13] and that may underlie the differences between AC and SCC. ANNOVAR was used to perform region-based annotation. Repetitive elements are necessary for accurate genome replication and transmission to progeny cells [14]. In this study, the integration sites in AC tended to be located in simple repeat elements (18.8%), whereas those in SCC showed a higher tendency for Alu elements (27.9%). In AC, breakpoints occurred less frequently in LINE elements (21.4%) than in the random control (RANDOM) (**P < 0.01, Fig. 4A).
Open chromatin regions, which are associated with transcriptional regulation, provide insights into genome function. Based on ENCODE chromatin genome segmentation information for HeLa-S3, the integration sites in AC were located in enhancers (13.9%) more often than those in RANDOM (**P < 0.01). Compared to the randomized control, HPV tended to integrate in transcription start site flanking regions (frequency rate = 3.5% and 4.7% for SCC and AC, respectively) and in Tx regions (frequency rate = 18.7% and 24.3% for SCC and AC, respectively) (Fig. 4B).
Additionally, we examined other important genomic features, such as CpG enriched regions, TFBSs, and DHSs.
CpG islands help identify promoters. TFBSs play a crucial role in regulating transcription [17]. DHSs reflect chromatin accessibility [18]. In our study, integration events in AC showed a preference for DHSs (69.2%) and TFBSs (60.9%), whereas in SCC, breakpoints tended to be located in CpG islands (3.6%). This indicated that HPV has a higher propensity for integration in regulatory regions in AC than in SCC.
To analyze the relationship between these genomic elements and HPV integration sites with greater confidence, we examined the co-occurrence of integration sites in DHSs and other genomic elements in AC and SCC (Table 1).
We found that breakpoints co-occurred in DHSs and enhancers (11.8%) or TFBSs (46.7%) more frequently in AC than in SCC (**P < 0.01). This suggests a strong association between breakpoints in AC and these elements. Both AC and SCC breakpoints showed a tendency to co-occur in DHSs and Tss (frequency rate = 3.5% and 4.7% for SCC and AC, respectively), but the difference between AC and SCC was not statistically significant.

Clinical characterization
To determine the relationship between HPV integration and the clinicopathological parameters of AC and SCC, we analyzed the characteristics of the samples. A total of 66 AC samples were collected, of which 11 had invalid integration sites (frequency ≤ 3). HPV integration did not occur in two samples; therefore, 53 samples had valid breakpoints. The results revealed that many variables, including age, HPV type, FIGO stage, differentiation, and lymph node metastasis, were not related to HPV integration (Supplementary data, Table S2).

Pathway analysis
To clarify the function of genes affected by HPV integration in AC and SCC, we performed GO pathway and PPI network analyses. GO analysis of hotspot genes revealed significant differences in enriched pathways between AC and SCC (Fig. 5A): the transmembrane receptor protein kinase pathway was enriched in AC, whereas cell junction and cell-cell adhesion pathways were enriched in SCC. To understand the interactions among the corresponding genes, a PPI network was constructed. Because most hotspot genes (frequency ≥ 3) in SCC were noncoding genes, we did not examine their interactions in SCC. A total of 150 genes affected by HPV integration in AC samples were analyzed, and 115 genes were included in the network after disconnected genes were removed (Fig. 5B). The results indicated that in AC, ERBB2 and MYC were closely related to the other genes affected by integration.

Discussion
HPV is the main driver of cervical cancer and contributes to cancer progression through several mechanisms. First, expression of the E6 and E7 oncogenes, which target p53 and Rb family proteins, respectively, and lead to their degradation, results in uncontrolled proliferation [19,20]. In addition, HPV integration induces structural alterations and instability in the host genome [21]. These alterations may include gene amplification, deletion, extrachromosomal circular DNA formation, and focal rearrangement [21]. This study provides insights into HPV integration-mediated carcinogenesis in cervical cancer by analyzing the characteristics of HPV integration in SCC and AC from different genetic perspectives.
We examined the cytobands of HPV integration in AC and SCC samples. In AC, 8q24.21 was the most frequently integrated band. This result agrees with previous studies [9,13,22]. Among the 22 integration sites on band 8q24.21, 7 were located 2 kb to 460 kb upstream of the MYC gene and 15 were located 13 kb to 366 kb downstream of it. These are short distances in genomic terms. Peter et al. proposed that viral and MYC sequences are co-amplified in an amplicon ranging from less than 50 kb to 800 kb in size [23]. Amplification of the MYC oncogene has been shown to be significantly associated with heavily rearranged extrachromosomal circular DNA (ecDNA) amplicons that contain distal (> 1 Mb) amplified segments [24]. We therefore propose that an ecDNA amplicon could promote co-amplification of viral oncogenes and the MYC gene, which may play an important role in HPV-driven carcinogenesis.
Consistently, Hu et al. reported an association between mutations in cytoband 17q12 and cervical cancer [22]. Within cytoband 17q12, ERBB2 has been recognized as a potential target for treating cervical cancer [23]. In this study, HPV breakpoints were found at specific sites in the human genome, such as chr17:39703981/39695556 (ERBB2) and chr17:39647045/39661406 (STARD3). Furthermore, we found that integration occurred in an intron of ERBB2, consistent with our finding that integration sites tended to be located within introns. Schmitz et al. found that HPV integration sites were located within the intron sequences of known genes and that insertional mutagenesis could influence the function of genes [24].
21p11.2 (16%) was the most frequently integrated cytoband in SCC. In this cytoband, genes encoding 45S ribosomal RNA (RNA45S rDNA), which harbor strong promoters, are abundant. Viral sequences could utilize host promoters to express their proteins. Inagaki et al. reported that two distinct cDNA clones of the HPV type 18 transcript are present in HeLa cells [25]. These clones contain the viral genes E6 and E7, as well as a shared human sequence located at chr8:127228560-127229294 [25]. They proposed that HPV18 integrates into a lncRNA and utilizes its promoter to express the viral oncoproteins E6 and E7 in HeLa cells [25]. In our study, viral sequences integrated in cytoband 21p11.2 could likewise utilize the promoters of RNA45S rDNA to express viral proteins in SCC.
Another cytoband integrated in SCC, 3q28, has been reported in a previous study [13]. In 3q28, mutations in TP63, a member of the p53 family, are associated with various cancers, notably lung and bladder cancers [26,27]. Liu et al. hypothesized that highly expressed HPV-human fusion transcripts could promote overexpression of E6*I and E7 and inhibit the transcription of the tumor suppressor genes TP63 and P3H2 [28]. In our study, only HPV16 integrated in cytoband 3q28 in SCC, and three breakpoints were located in the intergenic region between TP63 and P3H2. This suggests that HPV-human fusion transcripts may play an important role in the carcinogenesis of SCC by promoting overexpression of viral oncoproteins and inhibiting the transcription of tumor suppressor genes. In conclusion, we propose that HPV shows a tendency to integrate into specific cytobands and that integration in different cytobands promotes cancer progression through different mechanisms.
We identified integration sites within regions containing various elements, such as repetitive elements, open chromatin (DHSs), and chromatin genome segmentation (the 18-state chromatin model of HeLa-S3). The findings revealed that breakpoints in AC were located in simple repeat elements, DHSs, TFBSs, and H3K4 methylated histone binding sites more often than those in SCC. Among these elements, DHSs serve as a global measure of chromatin accessibility [29], providing insights into the functional elements within the genome. We identified Enh, TFBS, and Tss with greater confidence by intersecting DHSs with chromatin state regions identified in the ENCODE project (Table 1). This suggested that in AC, HPV has a higher propensity for integration in regulatory regions, potentially influencing host gene expression and cellular processes.
HPV integration at recurrent loci may provide a selective advantage to host cells [24], leading to the recurrence of hotspot genes. We found that most hotspots were coding genes (8/12) in AC and noncoding genes (8/15) in SCC. In AC samples, ERBB2 (n = 9) and STARD3 (n = 9) were the most frequently integrated genes. The mutation rate of ERBB2 was reported to be significantly higher in AC than in SCC [30]. Moreover, ERBB2 protein was positive in more than half of AC cases [31]. Through the PPI network for AC, we found that ERBB2 was closely related to other genes affected by HPV integration. Therefore, ERBB2 may be a potential target for treating AC. STARD3 is a member of the lipid transporter subfamily and plays a critical role in the synthesis and maintenance of cholesterol balance [32].
Studies on STARD3 have focused on gastric carcinoma and breast cancer [32,33]; however, studies on cervical cancer are lacking. Previous studies have shown that STARD3 is co-amplified and co-expressed with HER2 in breast cancer [34,35]. Reducing STARD3 expression in HER2-positive cancer cell lines inhibits their growth [36,37]. In this study, STARD3 was closely related to ERBB2 in the PPI network. Therefore, this gene may be a novel therapeutic target for AC. RAD51B (n = 3) is known as a hotspot gene [38]. It plays a critical role in homologous recombinational repair of DNA double-strand breaks (important for genomic stability and a hallmark of cancer) [39]. In this study, HPV integration sites were found in intron 7 of RAD51B-204 (the main transcript), and disruption of this gene may promote cancer progression. Therefore, we hypothesized that HPV integration may lead to the dysregulation of oncogenes, which plays a more important role in HPV-related AC.

Distribution of integration sites in exonic, intronic, and intergenic regions. The random integrated ratio of each region is counted according to the random distribution of breakpoints in the human genome. P-values were calculated using the χ² test and corrected by Fisher's exact test.

Figures

Figure 1 Distribution

Figure 2 Frequency