Background

Cervical cancer remains the fourth most common cancer among women worldwide and is a particular burden in China. Persistent high-risk HPV infection is considered the primary aetiological factor for cervical cancer [1]. HPV is a small, double-stranded, circular DNA virus that exclusively infects epithelial cells of the skin or mucosa. The HPV genome is approximately 8.0 kb in size and contains six early genes (E1, E2, E4, E5, E6, E7), two late genes (L1, L2), and a long control region (LCR). The oncoproteins encoded by the E6 and E7 genes play a crucial role in HPV-driven carcinogenesis [2, 3]. E6 binds to the tumour suppressor protein p53 and prevents its translocation, and mediates cellular transformation by inhibiting p53 activation. E7 binds to the retinoblastoma protein (Rb) and induces cells to enter S-phase prematurely by disrupting Rb-E2F complexes. These processes impair p53 and Rb functions, which are involved in DNA repair, cell cycle control, and apoptosis, and ultimately result in immortalization of HPV-infected cells [4].

To date, over 200 HPVs have been identified, which are divided into five main genera referred to as alpha (α)-, beta (β)-, gamma (γ)-, mu (μ)-, and nu (ν)-papillomaviruses [5]. In addition, these HPVs are divided into high- and low-risk types based on their carcinogenic potential. At least 14 are classified as high-risk types: HPV16, 18, 31, 33, 35, 39, 45, 51, 52, 56, 58, 59, 66 and 68. It is worth noting that almost all high-risk HPV types belong to the α genus, including α-5 (HPV51), α-6 (HPV56, 66), α-7 (HPV18, 39, 45, 59, 68), and α-9 (HPV16, 31, 33, 35, 52, 58). Of these, the α-9 species causes approximately 75% of invasive cervical cancers worldwide [6]. Our previous study suggested that women infected with HPV16, 18, 31, 33, 52, or 58 are more likely to have CIN2+ lesions [7]. Moreover, the incidence of CIN2+ caused by HPV33, 52, and 58 was greater than that caused by HPV18 in Taizhou [7]. Different HPV types differ in their immunogenicity, adaptability, and carcinogenicity, which may be due to different genetic variations and selection pressures during the evolutionary process [8, 9]. Therefore, focus should be placed on these common carcinogenic HPV types, such as the α-9 species, in the Taizhou region.

Persistent infection of human epithelial cells by HPV leads to integration of the viral DNA into the host genome, usually disrupting the E1 and/or E2 genes [10]. HPV integration is a key event in cervical carcinogenesis, leading to structural aberrations in the host genome or abnormal expression of target genes. In particular, expression of the E6 and E7 oncoproteins causes cellular immortalization and neoplastic transformation. The most frequent integration sites include SHKBP1, ERBB3, CASP8, HLA-A, HLA-B, TGFBR2, PIK3CA, EP300, FBXW7, PTEN, NFE2L2, ARID1A, KRAS, and MAPK1 [11, 12]. The transformed cervical cells express HPV E6 or E7, with antigenic epitopes on their protein surface that stimulate B lymphocytes to produce antibodies [13, 14]. Viral antigen peptides are presented at the cell surface through human leukocyte antigen (HLA) and are recognized by CD8+ cytotoxic T lymphocytes (CTLs) [15,16,17]. Therefore, E6 or E7 molecules are considered ideal targets for HPV therapeutic vaccines, which induce cell-mediated immunity by stimulating CTLs or elicit humoral immunity by activating B lymphocytes to produce specific antibodies [18, 19].

Unfortunately, although cervical cancer is a major health issue, local data on HPV therapeutic vaccines are still limited in China, and almost no consideration is given to E6 or E7 gene mutations. Hence, there is an urgent need to further study the genetic variation, positive selection sites, protein structure, and antigenic epitopes of α-9 HPV E6 or E7 to provide data for exploring effective treatment of HPV-induced cervical lesions in Taizhou, China.

Materials and methods

Study population

This study was ethically approved by the Institutional Medical Ethics Review Board of Taizhou Hospital, China. Cervical exfoliated cells were collected from women who underwent cervical cancer screening at Taizhou Hospital. The specimens were collected by cervical scraping and stored in 2.5 ml of cell preservation buffer at -20 °C. Before specimen collection, written informed consent was obtained from all participants, and the participants’ privacy was strictly protected.

HPV genotyping

HPV types were identified using the HPV Genotyping Kit (Tellgen Corporation, China), which was approved by China’s FDA (Certified No. (2014): 3400847). The protocol for HPV genotyping has been described in detail in our previous studies [7, 20]. Briefly, the kit uses a set of biotinylated amplimers and multiplex HPV genotyping methods with bead-based Luminex suspension array technology, which can simultaneously identify 14 high-risk HPV types (16, 18, 31, 33, 35, 39, 45, 51, 52, 56, 58, 59, 66, and 68) together with the β-globin gene as an internal control.

PCR amplification and sequencing

Based on the reference sequences of α-9 HPV types in GenBank, specific primer pairs for the entirety of the E6 and E7 regions were designed using the Primer-BLAST tool (ncbi.nlm.nih.gov/tools/primer-blast). The primers, PCR conditions, amplicon size, and reference sequences are listed in Table S1. Genomic DNA was extracted using a DNA Extraction Kit (#GK0122, GENEray, China). PCR products were purified and sequenced at BGI, and all data were confirmed by repeating PCR and sequencing reactions at least twice. In this study, genetic variant data for HPV31 and 35 were combined with data from our previous studies on HPV16, 33, 52 and 58 [21,22,23,24].

Phylogenetic analysis and homology comparison

All successfully acquired nucleotide sequences were aligned by BioEdit. Then, a phylogenetic tree of α-9 HPV E6 and E7 variation patterns was constructed by the maximum-likelihood method with one thousand bootstrap replicates using MEGA X. The phylogenetic tree was constructed using 201 complete α-9 HPV E6 and E7 sequences, including 64 HPV16, 16 HPV31, 15 HPV33, 5 HPV35, 27 HPV52, 25 HPV58, and 49 reference sequences downloaded from NCBI.

Selective pressure analysis

The CodeML program in PAML (abacus.gene.ucl.ac.uk/software/paml.html) was used to calculate the nonsynonymous (dN)/synonymous (dS) ratio (ω) for selective pressure analysis. If nonsynonymous mutations are favoured by Darwinian selection, they will be fixed at a higher rate than synonymous mutations, resulting in dN > dS, ω = dN/dS > 1 [25].
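As a rough illustration of the distinction behind ω, the sketch below classifies codon differences between two aligned, in-frame coding sequences as synonymous or nonsynonymous. It is not the CodeML procedure used in this study: the function name and toy sequences are hypothetical, and raw counts of differences are not estimates of dN or dS, which also require normalising by the numbers of nonsynonymous and synonymous sites.

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def classify_codon_changes(ref, var):
    """Count (synonymous, nonsynonymous) codon differences (hypothetical helper)."""
    if len(ref) != len(var) or len(ref) % 3 != 0:
        raise ValueError("sequences must be aligned, in frame, and of equal length")
    syn = nonsyn = 0
    for i in range(0, len(ref), 3):
        ref_codon, var_codon = ref[i:i + 3], var[i:i + 3]
        if ref_codon == var_codon:
            continue
        if CODON_TABLE[ref_codon] == CODON_TABLE[var_codon]:
            syn += 1      # amino acid unchanged
        else:
            nonsyn += 1   # amino acid (or stop) changed
    return syn, nonsyn


# Toy sequences, not study data: ATG GAT CTG versus ATG GAA CTC.
# GAT (D) -> GAA (E) is nonsynonymous; CTG (L) -> CTC (L) is synonymous.
print(classify_codon_changes("ATGGATCTG", "ATGGAACTC"))
```

A site-level ω, as reported by CodeML, comes from a codon substitution model fitted over the phylogeny rather than from pairwise counts like these.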

Protein structure analysis

The obtained nucleotide sequences were translated into amino acid sequences (protein primary structure) using MEGA X. Then, the secondary and tertiary structures of the E6 and E7 proteins of α-9 HPV were predicted using PSIPred (bioinf.cs.ucl.ac.uk/psipred) and Swiss-model (swissmodel.ExPASy.org), respectively.

Prediction of linear and conformational B-cell epitopes

Linear B-cell epitopes were predicted from the primary sequence of the E6 or E7 protein using sequence-based methods (Kolaskar and Tongaonkar’s antigenicity, tools.immuneepitope.org/bcell/). Conformational B-cell epitopes were predicted from the tertiary structure of E6 or E7 using ElliPro, which identifies protrusions in antigen surfaces (tools.immuneepitope.org/ellipro).

Prediction of cytotoxic T-cell epitopes

CTL epitopes were predicted from the E6 or E7 protein of α-9 HPV types using the NetCTL server, which accepts the FASTA format (tools.immuneepitope.org/netchop). NetCTL gives results for 9-mer peptides together with their predicted MHC binding affinity, binding affinity rescale value, C-terminal cleavage affinity, and TAP transport efficiency. C-terminal cleavage weights and TAP transport efficiency were calculated using the default values 0.15 and 0.05, respectively; 9-mer peptides with a prediction score > 0.75 were considered to be potential CTL epitopes.
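The scoring step can be sketched as follows, assuming that the overall prediction score is a weighted sum of the rescaled MHC binding value, the C-terminal cleavage value, and the TAP transport value, with the weights given above. The function names, the toy sequence, and the input values are hypothetical and are not NetCTL output.

```python
CLEAVAGE_WEIGHT = 0.15   # default weight on C-terminal cleavage
TAP_WEIGHT = 0.05        # default weight on TAP transport efficiency
SCORE_THRESHOLD = 0.75   # cut-off used in this study


def nine_mers(protein):
    """Yield (start, end, peptide) for every 9-mer, with 1-based inclusive positions."""
    for i in range(len(protein) - 8):
        yield i + 1, i + 9, protein[i:i + 9]


def combined_score(rescaled_affinity, cleavage, tap):
    """Weighted sum assumed here for the overall score (illustrative only)."""
    return rescaled_affinity + CLEAVAGE_WEIGHT * cleavage + TAP_WEIGHT * tap


def is_candidate(score):
    """A peptide is kept when its score exceeds the threshold."""
    return score > SCORE_THRESHOLD


# Toy 11-residue sequence: three overlapping 9-mers.
for start, end, peptide in nine_mers("ACDEFGHIKLM"):
    print(start, end, peptide)

# Hypothetical component values for one peptide.
score = combined_score(rescaled_affinity=0.70, cleavage=0.90, tap=1.00)
print(round(score, 3), is_candidate(score))
```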

Results

The prevalence of α-9 HPV types in Taizhou

A total of 60,259 women underwent the first round of HPV screening in Taizhou Hospital from December 2012 to June 2023. The overall rate of high-risk HPV infection was 17.3% (10,423/60,259), including infection with multiple HPV types (4.5%). All high-risk HPV-positive samples belonged to the α genus, including α-5 (782, 7.5%), α-6 (1466, 14.1%), α-7 (3521, 33.8%), and α-9 (7815, 75.0%) (Fig. 1, Table S2). α-9 HPV, alone or in combination with other types, accounted for 75.0% of the high-risk HPV-positive samples in Taizhou. In this study, we selected the α-9 HPV single-type-positive samples (HPV16, 31, 33, 35, 52, 58) for subsequent analyses, including 298 HPV16, 149 HPV31, 185 HPV33, 123 HPV35, 325 HPV52, and 199 HPV58 samples. The clinicopathological information of the study population is shown in Table S3.
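The percentages above can be recomputed from the reported counts with a short check. The variable names are illustrative; the counts are those given in the text.

```python
screened = 60259
hr_positive = 10423
species_counts = {"α-5": 782, "α-6": 1466, "α-7": 3521, "α-9": 7815}


def percent(part, whole):
    """Percentage rounded to one decimal place."""
    return round(100 * part / whole, 1)


overall = percent(hr_positive, screened)
by_species = {name: percent(n, hr_positive) for name, n in species_counts.items()}

print(overall)
print(by_species)
# Samples positive for several species are counted once per species,
# so the species counts add up to more than the number of positive samples.
print(sum(species_counts.values()), hr_positive)
```

The species percentages sum to more than 100% because samples with multiple infections contribute to more than one species.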

Fig. 1

Distribution of different high-risk HPV genotypes in Taizhou from 2012 to 2023

Nucleotide variations of α-9 HPV types

Compared with the reference sequences from GenBank, 53 nucleotide mutations were detected in HPV16, of which 34 (34/53, 64.2%) were nonsynonymous (24 in E6 and 10 in E7). In addition, 12 (12/24, 50.0%), 10 (10/20, 50.0%), 2 (2/7, 28.6%), 17 (17/35, 48.6%), and 17 (17/26, 65.4%) nonsynonymous nucleotide mutations were detected in the E6 and E7 genes of HPV31, 33, 35, 52, and 58, respectively. A summary of nucleotide variations and amino acid substitutions in the E6 and E7 regions of α-9 HPV is shown in Tables 1 and 2.

Table 1 Nucleotide variations and amino acid substitutions in α-9 HPV E6 region
Table 2 Nucleotide variations and amino acid substitutions in α-9 HPV E7 region

Of all genetic variations in our study, 31 nonsynonymous mutations have not been reported previously: T105G(M8R), C110G(Q10E), G126A(R15Q), A134C(K18Q), T137G(L19V), A152G(T24A), A296C(K72Q), T310G(F76L), T434G(C118G) in HPV16 E6, and A619T(T20S), A647C(N29T), G676C(D39H), T730C(F57L), G791T(R77L), G823T(G88*) in HPV16 E7; A299G(T64A), A446C(E113D), C537G(P143G) in HPV31 E6, and C788T(R77C) in HPV31 E7; T196G(C30G), G458T(R117L) in HPV33 E6; G108C(E3Q), G267A(D56N), T292G(I64S) in HPV52 E6; A194C(K29Q), G225A(R39Q), C228T(S40F), A390C(K94T), G432A(R108K), and C534A(P142H) in HPV58 E6 and T678A(D35E) in HPV58 E7.
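The mutation labels used here combine a nucleotide change with the resulting amino acid change, for example T105G(M8R), with * marking a stop codon. A minimal sketch of how such labels could be parsed for downstream tabulation is given below; the class, pattern, and function names are hypothetical.

```python
import re
from typing import NamedTuple


class Mutation(NamedTuple):
    ref_base: str
    nt_position: int
    alt_base: str
    ref_residue: str
    aa_position: int
    alt_residue: str  # "*" marks a stop codon


# Matches labels such as "T105G(M8R)"; a space before the bracket is tolerated.
MUTATION_PATTERN = re.compile(r"([ACGT])(\d+)([ACGT])\s*\(([A-Z])(\d+)([A-Z*])\)")


def parse_mutation(label):
    """Split a label into its nucleotide and amino acid components."""
    match = MUTATION_PATTERN.fullmatch(label.strip())
    if match is None:
        raise ValueError(f"unrecognised mutation label: {label!r}")
    ref_base, nt_pos, alt_base, ref_res, aa_pos, alt_res = match.groups()
    return Mutation(ref_base, int(nt_pos), alt_base, ref_res, int(aa_pos), alt_res)


print(parse_mutation("T105G(M8R)"))
print(parse_mutation("G823T(G88*)").alt_residue == "*")
```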

Phylogenetic construction

All nucleotide sequences from this study were submitted to GenBank, and accession numbers were obtained (HPV16E6E7: MT681266-MT681329, HPV31E6E7: OR540563-OR540578, HPV33E6E7: OQ672665-OQ672679, HPV35E6E7: OR540579-OR540583, HPV52E6E7: ON529577-ON529603, HPV58E6E7: MH348918-MH348942). A phylogenetic tree was constructed from 201 complete α-9 HPV E6 and E7 nucleotide sequences (152 obtained from our study and 49 downloaded from GenBank) (Fig. 2). Furthermore, we constructed phylogenetic trees based on the E6 and E7 amino acid sequences (Figures S1-S2).

Fig. 2

Phylogenetic tree of α-9 HPV variants. The maximum-likelihood tree (MEGA X) of E6 and E7 nucleotide sequences was inferred from 201 complete α-9 HPV E6/E7 sequences, including 64 HPV16, 16 HPV31, 15 HPV33, 5 HPV35, 27 HPV52, 25 HPV58, and 49 reference sequences

Selective pressure analysis

The results of dN/dS analysis for α-9 HPV E6 and E7 genes showed a high nonsynonymous mutation rate for positively selected sites. Nineteen positive selection sites were detected, including 32E of HPV16 E6, 93K of HPV33 E6, 93R of HPV52 E6, 29S*, 77R*, 88-** of HPV16 E7, 45A, 97Q of HPV33 E7, 63E of HPV35 E7, 52S, 64D of HPV52 E7, 9R, 20T, 35D, 41G, 61T, 63G, 64T, and 77V of HPV58 E7. However, no reliable HPV31 E6, HPV35 E6, HPV58 E6, or HPV31 E7 positive selection sites were detected (Table 3).

Table 3 Positive selection sites of α-9 HPV E6 and E7 sequences

Protein structure analysis and homology modelling

Nonsynonymous nucleotide substitutions change the amino acid composition, which can affect the structure and function of proteins. Our analysis showed that the α-9 HPV E6 and E7 proteins are composed of 148–158 and 97–99 residues, respectively. Details of the template-target pairwise sequence alignments for α-9 HPV E6 and E7 are shown in Table S1 and Figures S3-S4. All amino acid substitutions in E6 and E7 are shown in Fig. 3. As depicted in Figure S5, the six E6 or E7 proteins of α-9 HPV are highly homologous, so their correlations are included in our research. As shown in Fig. 3, the majority of amino acid substitutions are located on the outer edge of the E6 or E7 proteins and near the zinc granule, which is situated in the active site. Interestingly, we found that the 93rd residue is not only a common site of nonsynonymous mutation in the E6 region but also a positive selection site.

Fig. 3

3D structural models of α-9 HPV E6 or E7 proteins

We selected the HPV variant with the highest infection rate in each type for homology modelling. The template coverage for the E6 protein of α-9 HPV included most of the protein sequence, excluding only a short N-terminal stretch (see Figure S5, A-F for the homology models and Figure S3 for the template-target pairwise sequence alignments). However, the template coverage for the E7 protein was low, excluding approximately 50 residues at the N-terminal stretch (see Figure S5, G-L for the homology models and Figure S4 for the template-target pairwise sequence alignments).

Prediction of linear and conformational B-cell epitopes

The predicted linear B-cell epitopes of α-9 HPV E6 or E7 proteins are presented in Table 4. Conformational B-cell epitopes were predicted from protein tertiary structure models using ElliPro. The amino acid residues, the number of residues, the sequence location, and the PI score of the predicted conformational epitopes are given in Table 5, and graphical depictions of these epitopes are provided in Figure S6. Immunoinformatics predicted 57 potential linear and 59 conformational B-cell epitopes, many of which were also predicted as CTL epitopes (Tables 4, 5 and 6).

Table 4 Predicted linear B-cell epitopes of E6 and E7 proteins of α-9 genus HPV
Table 5 ElliPro predicted conformational B-cell epitopes of E6 and E7 proteins of α-9 genus HPV
Table 6 NetCTL predicted CTL epitopes of E6 and E7 proteins of α-9 genus HPV

Prediction of cytotoxic T-cell epitopes

Based on the literature, we selected the HLA-A*02 and HLA-B*62 supertypes for NetCTL epitope prediction, as well as CTL epitopes that overlap with the sites for the predicted B-cell epitopes (linear and/or conformational). The predicted CTL epitope 9-mer peptides with their predicted MHC-I binding affinity, rescaled binding affinity, proteasomal C-terminal cleavage affinity, and TAP transport efficiency are indicated in Table 6, with an overall prediction score threshold of 0.75.

Discussion

Cervical cancer remains a major challenge for women’s health, with approximately 600,000 new cases and 340,000 deaths worldwide every year [1, 26]. High-risk HPV genotypes are the main cause of cervical cancer, which can be largely prevented through HPV vaccination and cervical screening. However, the cervical cancer burden for women remains heavy in China, with only 3% of women aged 9–45 years receiving complete HPV vaccination [27, 28]. Although prophylactic vaccines are effective at protecting against approximately 90% of HPV infections, their benefits in eliminating preexisting infection are limited. Therefore, potential therapeutic vaccines need to be developed for treatment of persistent HPV infection or cervical lesions, with the goal of activating adaptive immune responses by presenting viral antigen peptides to the immune system. However, because the genetic variations of HPV genotypes show some degree of geographical differences, it is recommended that the ideal therapeutic vaccine be based on a local type.

The Taizhou area is located along the central coast of Zhejiang Province in China, with high prevalence and pathogenicity of α-9 HPV (HPV16, 31, 33, 35, 52, 58), especially HPV52 and 58 [20]. According to our previous findings, the odds ratio (OR) for CIN2+ in women infected with α-9 HPV is 3.2 when compared to women infected with α-5, α-6, or α-7 [7]. The genetic variation of E6 or E7 genes may correlate highly with cancer risk in Taizhou [21,22,23]. Therefore, E6 or E7 molecules might be regarded as ideal targets for HPV therapeutic vaccine development and cervical cancer treatment. The purpose of this study was to provide data for effective prevention and treatment of HPV-induced cervical lesions in Taizhou through phylogenetic analysis and epitope prediction of α-9 HPV E6 or E7.

In Taizhou, we found that the most common nucleotide mutations observed in the E6 gene were T178G (D32E, 192/298, positive selection) of HPV16; C520T (I138V, 63/149) of HPV31; A387C (K93N, 45/185, positive selection) of HPV33; T341C (W78R, 122/123) of HPV35; A379G (K93R, 313/325, positive selection) of HPV52; and A388C (K93N, 87/199, positive selection) of HPV58. Most high-frequency nonsynonymous mutations occur at positive selection sites, which contribute to HPV adaptation. In addition, the corresponding genetic variants are widespread in Taizhou, including MT681266 (38.9%, 116/298, HPV16), OR540577 (24.8%, 37/149, HPV31), OQ672675 (20.0%, 37/185, HPV33), OR540579 (62.6%, 77/123, HPV35), ON529581 (60.9%, 198/325, HPV52), and MH348927 (22.1%, 44/199, HPV58). Interestingly, we found that the 93rd residue is not only a common site of nonsynonymous mutation in the E6 region but also a positive selection site, which may have evolutionary significance in rendering HPV33, 52, and 58 adaptive to their environment. Notably, three amino acid substitutions at the 93rd residue were detected in HPV52 E6: K93R (96.3%), K93G (2.5%), and K93Q (2.2%). In addition, the K93N substitution was observed in both HPV33 and HPV58. These substitutions at the 93rd residue may lead to differences in the ability of these viruses to bind to p53, thus affecting their carcinogenicity [29]. The D32E substitution in HPV16 E6 has been confirmed to be associated with cancer risk, possibly because the amino acid substitution affects the E6-p53 interaction [30].

In the E7 gene, the most common nucleotide mutation observed was A647G (N29S, 195/298, positive selection) of HPV16. Similarly, there were three amino acid substitutions at the 29th residue in HPV16 E7: N29S (65.44%), N29H (9.40%), and N29T (0.34%). In addition, all HPV58 E7 samples with G63S carry T20I, and all HPV58 E7 samples with G63D carry G41R. Our previous study reported that HPV58 E7 T20I/G63S substitutions increase the risk of HPV carcinogenesis [24]. Boon et al. [31] reported that E7 carrying the T20I/G63S substitutions has a greater ability to degrade Rb and to immortalize and transform primary cells.

Consistent with our selective pressure analysis, the positive selection sites 32E in HPV16 E6 and 29S in HPV16 E7 were found in Kunming, Southwest China [32]. The positive selection sites 93K in HPV33 E6 and 45A and 97Q in HPV33 E7 were found in Sichuan province, Southwest China [33]. Notably, the positive selection site 63G in HPV58 E7 has been widely reported in other regions of China, including Sichuan province [33] and Yunnan province [34, 35] in Southwest China and Hubei province in central China [36], as well as in the present study (Southeast China). However, there were no reports of positive selection sites for HPV58 E6 in China [33,34,35,36]. No positive selection sites for the HPV16 E6 and E7 genes have been reported in central China [37]. No positive selection sites for the HPV52 E6 and E7 genes have been reported in central and Southwest China [38, 39]. In addition, selective pressure analysis of non-Asian populations showed different results, with 10R, 14H, and 83V in HPV16 E6 and 85G in HPV16 E7 under positive selective pressure [40, 41]. These results indicate that genetic variations among HPV types may lead to biological advantages through fixed mutations in their genomes and that even small variations might lead to minor adaptive improvements. Furthermore, HPV genetic variants may differ in terms of infectivity, carcinogenic potential, and host immune response. Therefore, the data provided in this study may have significant implications for understanding the biological differences of HPVs in Taizhou, as well as for developing local therapeutic vaccines.

HLA participates in the local immune response to viral infection through its target recognition function, blocking HPV infection or preventing tumour cell invasion and metastasis. However, a minority of infected cells can escape host immune surveillance, causing persistent HPV infection. Immunoinformatics provides new strategies for identifying ideal epitopes as HPV therapeutic vaccine targets. Our predicted T- and B-cell epitopes may be used for development of vaccines targeting specific HPV variants, and our results suggest that amino acid substitutions may influence these epitopes. For example, the prediction score of the HPV16 E6 CTL epitope 25-33ELQTTIHDI was 1.0904, and with the D32E substitution the score of the epitope 25-33ELQTTIHEI increased to 1.3272; the HPV16 E6 predicted epitope 31-39HEIILECVY became a new epitope because of the D32E mutation. Therefore, the substitution at the positive selection site D32E in HPV16 E6 influences its antigenic epitopes, which may make it more difficult for the immune system to detect, thereby enhancing adaptability of HPV to the environment. In addition, Li et al. [37] suggested that non-conservative amino acid substitutions should be fully considered when developing therapeutic vaccines, such as H31Y, D32N, D32E, I34M, L35V, E36Q, L45P, N65S, and K75T in HPV16 E6. Chen et al. [33] suggested that K93N in HPV33 E6, Q97L in HPV33 E7, R145K in HPV58 E6, and T20I in HPV58 E7 belong to ideal B-cell epitopes.
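The effect of D32E on the two HPV16 E6 peptides mentioned above can be reproduced with a short sketch that maps a protein-level substitution onto a peptide with a known start position. The function name is hypothetical, and the reference form of the 31-39 peptide (HDIILECVY) is inferred here from the variant peptide and the D32E label rather than quoted from the tables.

```python
def apply_substitution(peptide, peptide_start, substitution):
    """Apply a label such as 'D32E' to a peptide whose first residue
    sits at protein position peptide_start (1-based)."""
    ref, position, alt = substitution[0], int(substitution[1:-1]), substitution[-1]
    index = position - peptide_start
    if not 0 <= index < len(peptide):
        raise ValueError("substitution lies outside the peptide")
    if peptide[index] != ref:
        raise ValueError(f"expected {ref} at position {position}, found {peptide[index]}")
    return peptide[:index] + alt + peptide[index + 1:]


# HPV16 E6 peptide 25-33: residue 32 is the eighth residue of the peptide.
print(apply_substitution("ELQTTIHDI", 25, "D32E"))
# HPV16 E6 peptide 31-39 (reference form inferred): residue 32 is the second residue.
print(apply_substitution("HDIILECVY", 31, "D32E"))
```

Variant peptides produced this way would still need to be re-scored by the prediction server, since the sketch only edits the sequence.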

In addition, our results suggest that the CTL epitope of HPV16 E6 is 75-83KFYSKISEY; homologous epitopes were identified in HPV31 E6 68-76RFYSKVSEF, HPV33 E6 68-76RFLSKISEY, HPV35 E6 68-76KFYSKISEY, HPV52 E6 68-76RFLSKISEY, and HPV58 E6 68-76RLLSKISEY. Homologous epitopes were also identified in HPV16 E7 11-19YMLDLQPET, HPV31 E7 11-19YVLDLQPEA, HPV35 E7 11-19YVLDLEPEA, and HPV52 E7 11-19YILDLQPET. These E7 epitopes are restricted to HLA-A*02, which is the first noteworthy allele of the HPV-restricted epitope [42,43,44]. Riemer et al. [45] found that CTLs can recognize and lyse target cells presenting HPV16 E7 11-19YMLDLQPET. However, further experiments are required to validate the potential HPV therapeutic vaccines predicted through immunoinformatics.

Conclusions

In summary, this is the first relatively comprehensive study to explore the genetic variations, phylogenetics, positive selection sites, and antigenic epitopes of α-9 HPV E6 and E7 molecules in Taizhou, China, and the results will be helpful for local HPV therapeutic vaccine development.