Introduction

Cervical cancer is the second most common cancer among women aged 15–44, and 99.7% of cervical cancers have been found to be associated with persistent infection by high-risk (HR) human papillomavirus (HPV) [1]. The three main HPV genera are α, β, and γ, of which the α genus is associated with anal and oral mucosal infection. Almost all α-9 HPV types are carcinogenic, causing 75% of invasive cervical cancers worldwide, and their carcinogenicity is mainly mediated by the early proteins E6 and E7, encoded by the HPV E6 and E7 genes. The carcinogenicity of the E6 protein is more evident than that of E7 in terms of cell-cycle changes and the efficiency with which HPV-infected cells undergo permanent transformation [2, 3]. Even without E7, E6 can bind the p53 protein and target it for ubiquitin-mediated degradation via E6AP, interfere with the cell cycle, and activate telomerase and reverse transcriptase, allowing mutations to accumulate and infected cells to become and remain immortalized; these activities are closely related to HPV-driven immortalization, cell transformation, and carcinogenesis [4,5,6,7].

At present, there is no specific drug for treating HPV infection, and clearance relies mainly on the body's immune system to detect and eliminate the virus. Human leukocyte antigen (HLA) molecules bind antigenic peptides and present them to CD8+ cytotoxic T lymphocytes (CTL) and CD4+ helper T lymphocytes (Th), thereby distinguishing self from non-self, regulating the immune response, and helping to control and eliminate HPV infection [8,9,10,11]. Antigen epitopes, which are composed of specific amino acid sequences, are the targets of immune recognition [12,13,14]. The HPV E6 protein has been considered a potential target for T-cell activation in immune response strategies and may be an ideal target for HPV therapeutic vaccines [15, 16].

HPV is a highly infectious and mutable virus, with different epidemic trends and mutation types in different regions and populations [17]. The HPV E6 oncogene is highly polymorphic, and its non-synonymous mutations change the amino acid composition of the E6 protein, which may relate to differences in immune response and pathogenicity [18]. Some mutations can even become fixed in variant strains, enhancing their adaptability to the environment and changing their infection rate [19]. For example, L83V (L90V) of HPV-16 E6 in Swedish and Italian populations and D25E (D32E) of HPV-16 E6 in Japanese populations have been shown to be associated with the progression of cervical cancer [20,21,22,23]. Current HPV vaccine designs targeting the E6 protein have almost never considered E6 mutants. In the epitope-specific vaccine designed by Kelly L, a long-lasting T-cell response to HPV-18 was induced by targeting the reference sequences of HPV E6 and E7. Owing to the relatively rapid mutation rate of HPV, the host response capacity and the preventive and therapeutic efficacy of such vaccines against malignant tumors can change greatly [15].

HPV shows strong regional and population differences, the prevalence of α-9 HPV and the harmfulness of the E6 oncoprotein are extremely high, and E6 polymorphism is closely related to differences in immunogenicity, adaptability, and pathogenicity. It is therefore urgent to study the genetic diversity, positive selection sites, antigen epitopes, and protein structure of α-9 HPV E6 to provide data for the effective prevention and control of the disease in Sichuan, China.

Materials and methods

Sample sources

The study was ethically approved by the Education and Research Committee and Ethics Committee of Sichuan University, Sichuan, China. A total of 18,067 specimens were randomly collected from January 2012 to December 2017 at Chengdu Women and Children's Center Hospital, Chengdu Jinjiang District Women and Children's Hospital, Angel Women's and Children's Hospital, Affiliated Hospital of Sichuan Reproductive Health Research Center, Sichuan Reproductive Health Research Center Affiliated Hospital, Shuangnan Hospital, Chengdu Song zi niao Sterility Hospital, Infertility Hospital Affiliated to Chengdu Medical College and Chengdu Jinsha hospital. Before sample collection, written informed consent was obtained from all patients or their guardians, and patient privacy was strictly protected. The cell specimens were collected by cervical scraping and stored at −20 °C in antiseptic buffer (9 g NaCl, 10 g C6H5CO2Na, 1 L H2O).

Genomic DNA extraction and HPV typing

HPV DNA was extracted and evaluated using the Human Papillomavirus Genotyping Kit For 23 Types (Yaneng Bio, Shenzhen, China) according to the manufacturer’s guidelines.

PCR amplification and variant identification

The primers for α-9 HPV E6 were designed with PRIMER version 5.0 and NCBI (National Center for Biotechnology Information) Primer-BLAST based on the reference sequences and were synthesized by TSINGKE (Chengdu, China); the primers and reference sequences used for the molecular characterization of α-9 HPV E6 are shown in Additional file 1: Table S1. The PCR reaction system consisted of 5 µl HPV DNA, 13.1 µl ddH2O, 1 µl primers, 0.4 µl TransTaq DNA polymerase, 2.5 µl dNTPs, and 3 µl buffer. The reaction conditions are shown in Additional file 1: Table S1. The PCR products were visualized by electrophoresis in a 2% agarose gel (Sangon Biotech Co., Ltd.). The target E6 products were purified and sequenced at least twice by TSINGKE (Chengdu, China).
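
As a quick consistency check on the reaction system described above, the listed component volumes can be summed. The sketch below is illustrative only: the volumes are copied from the text, while the variable and key names are assumptions made for this example.

```python
# Component volumes (in µl) of the PCR reaction system, copied from the text.
# Variable and key names are illustrative, not part of the original protocol.
pcr_mix_ul = {
    "HPV DNA": 5.0,
    "ddH2O": 13.1,
    "primers": 1.0,
    "TransTaq DNA polymerase": 0.4,
    "dNTPs": 2.5,
    "buffer": 3.0,
}

# Round to one decimal place to avoid floating-point noise in the sum.
total_ul = round(sum(pcr_mix_ul.values()), 1)
print(f"Total reaction volume: {total_ul} µl")
```

The components add up to a 25 µl reaction.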

Sequence analysis

Genetic polymorphisms analysis of α-9 HPV E6 gene

The successfully amplified products were sequenced, and the sequences were analyzed with NCBI BLAST, Premier5, and DNAMAN5.2.2. Nucleotide mutations in the α-9 HPV E6 sequences were identified by comparison with the reference sequences in GenBank (Additional file 1: Table S1). The chi-square test was used to assess the significance of differences in the data, and P < 0.05 was considered statistically significant.
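
The text does not state which comparisons were tested or which software was used, so the following is only a minimal sketch of a Pearson chi-square test on a 2 × 2 table. The function name is an assumption, and the example compares variant and non-variant counts for two HPV types purely for illustration, using counts reported in the Results.

```python
import math

def chi_square_2x2(a, b, c, d):
    """Pearson chi-square test (no continuity correction) for [[a, b], [c, d]].

    Returns (chi2, p_value); the p-value assumes 1 degree of freedom.
    """
    n = a + b + c + d
    row_totals = (a + b, c + d)
    col_totals = (a + c, b + d)
    observed = ((a, b), (c, d))
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed[i][j] - expected) ** 2 / expected
    # With 1 degree of freedom, P(X >= chi2) = erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value

# Illustrative comparison only: variants vs. non-variants for
# HPV-16 E6 (162 of 250) and HPV-33 E6 (76 of 216).
chi2, p = chi_square_2x2(162, 250 - 162, 76, 216 - 76)
print(f"chi2 = {chi2:.2f}, significant at P < 0.05: {p < 0.05}")
```

A statistics package would normally be used in practice; the hand-written version is shown only to make the calculation explicit.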

Selective pressure analysis of α-9 HPV E6

Phylogenetic Analysis by Maximum Likelihood version 4.8 (PAML 4.8, http://abacus.gene.ucl.ac.uk/software/paml.html) was used to calculate the ratio (ω = dN/dS) of the non-synonymous substitution rate (dN) to the synonymous substitution rate (dS) in order to identify positive selection sites in the α-9 HPV E6 gene.
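
By the usual convention, ω > 1 suggests positive selection, ω close to 1 suggests neutral evolution, and ω < 1 suggests purifying selection. The sketch below only illustrates that interpretation rule; it does not reproduce the maximum-likelihood site models fitted by PAML, and the function name, tolerance, and example rates are assumptions.

```python
def classify_omega(dn, ds, tol=1e-9):
    """Return (omega, label) for a non-synonymous rate dn and synonymous rate ds."""
    if ds <= 0:
        raise ValueError("dS must be positive for omega to be defined")
    omega = dn / ds
    if abs(omega - 1.0) <= tol:
        label = "neutral"
    elif omega > 1.0:
        label = "positive"
    else:
        label = "purifying"
    return omega, label

# Hypothetical rates, chosen only to show the three outcomes.
for dn, ds in ((0.5, 0.25), (0.3, 0.3), (0.25, 0.5)):
    print(classify_omega(dn, ds))
```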

Amino acid composition and protein structure analysis of α-9 HPV E6

Mega6.0 software was used to translate the E6 nucleotide sequences into E6 protein sequences. PSIPred (http://bioinf.cs.ucl.ac.uk/psipred/) and Swiss-model were used to analyze the secondary and tertiary structures of the E6 protein, respectively.
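
The translation step, and the way a nucleotide change is read as an amino acid substitution, can be sketched as follows. This is not the Mega6.0 procedure itself but a minimal illustration with the standard genetic code; the function names and the toy sequences are hypothetical and are not real E6 sequences.

```python
from itertools import product

BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# Standard genetic code: codons enumerated in TCAG order map onto AMINO_ACIDS.
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

def translate(nt_seq):
    """Translate an in-frame nucleotide sequence into one-letter amino acids."""
    nt_seq = nt_seq.upper()
    usable = len(nt_seq) - len(nt_seq) % 3  # ignore a trailing partial codon
    return "".join(CODON_TABLE[nt_seq[i:i + 3]] for i in range(0, usable, 3))

def substitutions(ref_nt, var_nt):
    """List amino acid substitutions (e.g. 'D2E') between two aligned sequences."""
    pairs = zip(translate(ref_nt), translate(var_nt))
    return [f"{r}{pos}{v}" for pos, (r, v) in enumerate(pairs, start=1) if r != v]

# Toy sequences (hypothetical): the second codon changes GAT -> GAA (non-synonymous).
print(substitutions("ATGGATAAA", "ATGGAAAAA"))
# GAT -> GAC leaves the amino acid unchanged (synonymous), so nothing is reported.
print(substitutions("ATGGATAAA", "ATGGACAAA"))
```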

T-cell antigen epitope prediction for α-9 HPV E6 protein

Thirteen HLA-I and six HLA-II alleles were selected according to the average frequency of HLA alleles in the Chinese population in the major histocompatibility complex database (dbMHC) (Additional file 1: Table S2). Based on the selected HLA alleles, the T-lymphocyte epitopes of the α-9 HPV E6 proteins were predicted with the IEDB resource (http://www.iedb.org/). Following the method recommended by IEDB, a lower percentile rank (PR) indicates higher binding affinity; peptides with PR < 1.0 for HLA-I and PR < 5.0 for HLA-II were deemed meaningful and selected for further analysis.
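
The selection rule above amounts to a class-specific threshold on the percentile rank. A minimal sketch, assuming the predictions have already been exported as records with a PR value, is given below; the record layout, peptide labels, alleles, and PR values are all hypothetical placeholders rather than IEDB output.

```python
# Percentile-rank cut-offs stated in the text (lower PR = stronger predicted binding).
PR_CUTOFF = {"HLA-I": 1.0, "HLA-II": 5.0}

def select_epitopes(predictions):
    """Keep predictions whose percentile rank is below the class-specific cut-off."""
    return [p for p in predictions if p["pr"] < PR_CUTOFF[p["hla_class"]]]

# Hypothetical records: peptide labels, alleles and PR values are made up.
predictions = [
    {"peptide": "peptide_1", "allele": "HLA-A*02:01", "hla_class": "HLA-I", "pr": 0.4},
    {"peptide": "peptide_2", "allele": "HLA-A*02:01", "hla_class": "HLA-I", "pr": 1.0},
    {"peptide": "peptide_3", "allele": "HLA-DRB1*09:01", "hla_class": "HLA-II", "pr": 3.2},
    {"peptide": "peptide_4", "allele": "HLA-DRB1*09:01", "hla_class": "HLA-II", "pr": 7.5},
]

selected = select_epitopes(predictions)
print([p["peptide"] for p in selected])
```

Because the comparison is strict, a peptide with PR exactly equal to the cut-off (peptide_2 here) is excluded, matching the "PR < 1.0" wording.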

Results

The prevalence of α-9 HPV in Sichuan

Of the 18,067 samples, 6092 were HPV positive, and 4466 of these were HR HPV, all of which belonged to the α genus. The numbers of HPV-positive samples for α-1, α-3, α-5, α-6, α-7, α-8, α-9, α-10, and α-11 were 167 (2.74%), 25 (0.41%), 137 (2.25%), 438 (7.19%), 571 (9.37%), 413 (6.78%), 3270 (53.68%), 1021 (16.76%), and 50 (0.82%), respectively (Fig. 1). α-9 HPV accounted for 73.22% of HR HPV-positive samples. Because the numbers of HPV-35- and HPV-67-positive samples were small, the five other α-9 HPV types (HPV-16, HPV-31, HPV-33, HPV-52, and HPV-58) were selected for subsequent study.
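
The percentages above follow directly from the reported counts. The short sketch below recomputes them as a check; the counts and totals are taken from the text, and the variable names are illustrative.

```python
# Positive-sample counts per α group, as reported in the text.
counts = {
    "α-1": 167, "α-3": 25, "α-5": 137, "α-6": 438, "α-7": 571,
    "α-8": 413, "α-9": 3270, "α-10": 1021, "α-11": 50,
}
total_positive = 6092  # all HPV-positive samples
hr_positive = 4466     # high-risk HPV-positive samples

# The per-group counts should account for every positive sample.
assert sum(counts.values()) == total_positive

share_of_positive = {g: round(100 * n / total_positive, 2) for g, n in counts.items()}
alpha9_share_of_hr = round(100 * counts["α-9"] / hr_positive, 2)
print(share_of_positive["α-9"], alpha9_share_of_hr)
```

The recomputed α-9 shares match the reported 53.68% of all positive samples and 73.22% of HR HPV-positive samples.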

Fig. 1 Distribution of positive samples of different HPV types from 2012 to 2017

Nucleotide polymorphisms and selective pressure analysis of α-9 HPV E6

A total of 250 HPV-16 E6 sequences were successfully amplified, of which 162 (64.80%) were variants, and 17 non-synonymous mutations were detected. For HPV-31 E6, 96 sequences were successfully amplified, with 68 (70.83%) variants and 6 non-synonymous mutations. For HPV-33 E6, 216 sequences were successfully amplified, with 76 (35.19%) variants and 6 non-synonymous mutations. For HPV-52 E6, 288 sequences were successfully amplified, with 250 (86.81%) variants and 13 non-synonymous mutations. For HPV-58 E6, 405 sequences were successfully amplified, with 356 (87.90%) variants and 4 non-synonymous mutations. Details of the α-9 HPV E6 nucleotide polymorphisms are shown in Tables 1, 2, 3, 4 and 5. All sequences were submitted to GenBank, and accession numbers were obtained (HPV16E6: MZ803036-MZ803058, HPV31E6: MZ803026-MZ803035, HPV33E6: MZ576479-MZ576485, HPV52E6: MZ803059-MZ803078, HPV58E6: MZ803079-MZ803087).

Table 1 Nucleotide mutation and amino acid substitution in HPV-16 E6
Table 2 Nucleotide mutation and amino acid substitution in HPV-31 E6
Table 3 Nucleotide mutation and amino acid substitution in HPV-33 E6
Table 4 Nucleotide mutation and amino acid substitution in HPV-52 E6
Table 5 Nucleotide mutation and amino acid substitution in HPV-58 E6

Using the Naive Empirical Bayes (NEB) and Bayes Empirical Bayes (BEB) methods in Codeml, seven positive selection sites were detected in α-9 HPV E6: D32E of HPV-16 E6; K35N, K93N, and R145I of HPV-33 E6; K93R of HPV-52 E6; and K93N and R145K of HPV-58 E6. In contrast, no reliable positive selection site was identified in HPV-31 E6 (Table 6).

Table 6 Positive selection sites of α-9 HPV E6

The protein structure analysis of α-9 HPV E6

Non-synonymous nucleotide mutations change the amino acid composition of a protein and can thereby affect its structure, and protein function is mainly determined by structure. With the help of Mega6.0, PSIPred, and Swiss-model, the differences in primary, secondary, and tertiary structure between the reference and variant sequences of the α-9 HPV E6 proteins were revealed.

In HPV-16 E6, I34R, L35V, R62I, P66A, and L90V were all located in β-sheets, whereas E120D, D32N, and D32E were located at the periphery of the protein's spatial structure, close to the active region around the zinc granules. The numbers of amino acids in the α-helix and β-sheet regions differed between the reference and variant sequences. Details are shown in Figs. 2 and 3.

Fig. 2 Secondary structure of HPV-16 E6, comparing the reference with the variant sequences. Note: a is the secondary structure diagram based on the HPV-16 E6 reference sequence; b is the secondary structure diagram based on the HPV-16 E6-1 variant sequence; c is the secondary structure diagram based on the HPV-16 E6-2 variant sequence; b and c differ at the 32nd amino acid. The black boxes mark the regions where the secondary structure differs between the reference and variant sequences.

Fig. 3 Tertiary structure of HPV-16 E6, comparing the reference with the variant sequence. Note: a is the homology model of the HPV-16 E6 reference sequence; b is the Ramachandran plot for the homology model of the HPV-16 E6 reference sequence; c is the homology model of the HPV-16 E6 variant sequence; d is the Ramachandran plot for the homology model of the HPV-16 E6 variant sequence.

In HPV-31 E6, T64A, K65R, and F69L were located in an α-helix, T60Y in the β-sheet region, and K123R and A138V in the coil. The amino acid substitutions had no influence on the secondary or tertiary structure (Figs. 4, 5).

Fig. 4 Secondary structure of HPV-31 E6, comparing the reference with the variant sequence. Note: a is the secondary structure diagram based on the HPV-31 E6 reference sequence; b is the secondary structure diagram based on the HPV-31 E6 variant sequence.

Fig. 5 Tertiary structure of HPV-31 E6, comparing the reference with the variant sequence. Note: a is the homology model of the HPV-31 E6 reference sequence; b is the Ramachandran plot for the homology model of the HPV-31 E6 reference sequence; c is the homology model of the HPV-31 E6 variant sequence; d is the Ramachandran plot for the homology model of the HPV-31 E6 variant sequence.

S74T and Q113R were located in α-helices of the HPV-33 E6 protein, and K93N was located on the outer edge of the E6 protein near the zinc granule; all of these substitutions lie in the active region of the protein. The substitutions changed the number of amino acids in the α-helix and β-sheet regions and increased the exposure of the E6 protein to its environment (Figs. 6, 7).

Fig. 6 Secondary structure of HPV-33 E6, comparing the reference with the variant sequence. Note: a is the secondary structure diagram based on the HPV-33 E6 reference sequence; b is the secondary structure diagram based on the HPV-33 E6 variant sequence. The black boxes mark the regions where the secondary structure differs between the reference and variant sequences.

Fig. 7 Tertiary structure of HPV-33 E6, comparing the reference with the variant sequence. Note: a is the homology model of the HPV-33 E6 reference sequence; b is the Ramachandran plot for the homology model of the HPV-33 E6 reference sequence; c is the homology model of the HPV-33 E6 variant sequence; d is the Ramachandran plot for the homology model of the HPV-33 E6 variant sequence.

R77K and E89K of HPV-52 E6 were located in an α-helix, N127I in the β-sheet region, and K93R on the outer edge of the E6 protein close to the zinc granules; all of these substitutions lie in the active region of the protein. The substitutions increased the number of amino acids in the α-helix and β-sheet regions and decreased the number of buried amino acids (Figs. 8, 9).

Fig. 8 Secondary structure of HPV-52 E6, comparing the reference with the variant sequence. Note: a is the secondary structure diagram based on the HPV-52 E6 reference sequence; b is the secondary structure diagram based on the HPV-52 E6 variant sequence. The black boxes mark the regions where the secondary structure differs between the reference and variant sequences.

Fig. 9 Tertiary structure of HPV-52 E6, comparing the reference with the variant sequence. Note: a is the homology model of the HPV-52 E6 reference sequence; b is the Ramachandran plot for the homology model of the HPV-52 E6 reference sequence; c is the homology model of the HPV-52 E6 variant sequence; d is the Ramachandran plot for the homology model of the HPV-52 E6 variant sequence.

E32Q, D86E, K93N, and R145K were located in the coil of the HPV-58 E6 protein; E32Q and K93N lie on the outer edge of the E6 protein close to the zinc granule, within the active region of E6. The substitutions increased the number of amino acids in the α-helix region and decreased the number in the coil region (Figs. 10, 11).

Fig. 10 Secondary structure of HPV-58 E6, comparing the reference with the variant sequence. Note: a is the secondary structure diagram based on the HPV-58 E6 reference sequence; b is the secondary structure diagram based on the HPV-58 E6 variant sequence. The black boxes mark the regions where the secondary structure differs between the reference and variant sequences.

Fig. 11 Tertiary structure of HPV-58 E6, comparing the reference with the variant sequence. Note: a is the homology model of the HPV-58 E6 reference sequence; b is the Ramachandran plot for the homology model of the HPV-58 E6 reference sequence; c is the homology model of the HPV-58 E6 variant sequence; d is the Ramachandran plot for the homology model of the HPV-58 E6 variant sequence.

The antigen epitopes analysis of α-9 HPV E6 protein

In the HPV-16 E6 reference sequence, 97 HLA-I and 25 HLA-II epitopes were selected, and the epitope predictions for the variants differed; details are shown in Additional file 1: Tables S3 and S4. M1K increased epitope affinity; R17G, D32N, D32E, I34R, L35V, P66A, H85Y, and L90V changed epitope number and affinity; and E120D and R151T gave rise to new epitopes. The effects of amino acid substitutions on HPV-16 E6 epitopes are summarized in Table 7.

Table 7 Effect of amino acid substitution on T-cell epitopes of HPV-16 E6

For the HPV-31 E6 reference sequence, 125 HLA-I and 43 HLA-II epitopes were selected, and the epitope predictions for the variants differed (Additional file 1: Tables S5, S6). H60Y and K65R changed epitope number and affinity, T64A decreased epitope number, and K123R and A138V gave rise to new epitopes. The effects of amino acid substitutions on HPV-31 E6 epitopes are summarized in Table 8.

Table 8 Effect of amino acid substitution on T-cell epitopes of HPV-31 E6

For the HPV-33 E6 reference sequence, 109 HLA-I and 41 HLA-II epitopes were selected, and the epitope predictions for the variants differed (Additional file 1: Tables S7, S8). K35N decreased epitope number and affinity; S74T, N86H, K93N, and R145I changed epitope number and affinity; and Q113R increased epitope affinity. The effects of amino acid substitutions on HPV-33 E6 epitopes are summarized in Table 9.

Table 9 Effect of amino acid substitution on T-cell epitopes of HPV-33 E6

For the HPV-52 E6 reference sequence, 95 HLA-I and 50 HLA-II epitopes were selected, and the epitope predictions for the variants differed (Additional file 1: Tables S9, S10). E21K, L46V, E89K, K93R, and N127I changed epitope number and affinity; 105 M increased epitope affinity; N122K decreased epitope number; and E138K decreased epitope affinity. The effects of amino acid substitutions on HPV-52 E6 epitopes are summarized in Table 10.

Table 10 Effect of amino acid substitution on T-cell epitopes of HPV-52 E6

For the HPV-58 E6 reference sequence, 113 HLA-I and 44 HLA-II epitopes were selected, and the epitope predictions for the variants differed (Additional file 1: Tables S11, S12). D86E and K93N changed epitope number and affinity, and R145K changed HLA-I epitopes. The effects of amino acid substitutions on HPV-58 E6 epitopes are summarized in Table 11.

Table 11 Effect of amino acid substitution on T-cell epitopes of HPV-58 E6

Discussion

Cervical cancer is the second most common malignant tumor in women of childbearing age and seriously threatens women's health. Persistent HR-HPV infection is closely related to the occurrence and development of cervical cancer and other malignant diseases. Almost all α-9 HPV types are carcinogenic, and they are associated with 75% of cervical cancers. Sichuan is a multi-ethnic region with a high prevalence of α-9 HPV. From 2012 to 2017, α-9 HPV-positive samples accounted for 53.68% of all HPV-positive samples and 73.22% of high-risk HPV-positive samples, showing an increasing trend.

α-9 HPV is highly prevalent and pathogenic; its carcinogenicity is mainly mediated by the E6 oncoprotein, encoded by the E6 gene, which triggers the immortalization of infected cells. As an early gene of the virus, HPV E6 has a high mutation risk. In Sichuan, 21, 13, 8, 21, and 8 nucleotide mutations were detected in HPV-16, HPV-31, HPV-33, HPV-52, and HPV-58 E6, respectively, of which 17, 6, 6, 13, and 4 were non-synonymous.

Non-synonymous mutations change the amino acid composition and structure of a protein, and protein function is mainly determined by structure. HPV E6 consists of three kinds of domain: an N-terminal domain (residues 1–36), a C-terminal domain (residues 147–158), and two zinc fingers (residues 37–73 and 110–146, CxxC-(29x)-CxxC). The two zinc finger binding domains form a deep pocket that binds the "LXXLL" sequence of the E6AP protein and thereby mediates ubiquitin-dependent degradation of p53, the most important tumor suppressor protein [24, 25]. Residues 145–149 form the region that binds PDZ domain-containing proteins, which are targets of E6 in cellular transformation, while the carboxy-terminal half is principally involved in p53 binding [26]. K93N of HPV-33 E6, K93R of HPV-52 E6, and K93N of HPV-58 E6 are located at the outer edge of the E6 protein, near the zinc granule [27]. N86H and R145I of HPV-33 E6 occurred at the same positions as D86E and R145K of HPV-58 E6, and K93N of HPV-33 E6, K93R of HPV-52 E6, and K93N of HPV-58 E6 all occurred at residue 93 of the E6 protein. These amino acid substitutions lie in the active region of the protein and can cause structural disorder at the E6 termini, with the carboxyl end tending toward disorder. Such conformational changes may alter the ability of E6 to bind host p53 and other potential partner proteins, thus affecting the pathogenicity of α-9 HPV [29].
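
Using the residue ranges quoted above, a substitution can be placed in its E6 region by position alone. The sketch below is a simple lookup under that assumption; the region labels and function name are illustrative, and it ignores any numbering differences between HPV types and says nothing about three-dimensional proximity to the zinc sites.

```python
# Residue ranges (inclusive) as given in the text; labels are illustrative.
E6_REGIONS = {
    "N-terminal": (1, 36),
    "zinc finger 1": (37, 73),
    "zinc finger 2": (110, 146),
    "C-terminal": (147, 158),
    "PDZ-binding region": (145, 149),
}

def regions_for(substitution):
    """Return the named regions containing a substitution such as 'R145I'."""
    position = int(substitution[1:-1])  # strip reference and variant residues
    return [name for name, (start, end) in E6_REGIONS.items() if start <= position <= end]

for sub in ("D32E", "K35N", "K93N", "R145I"):
    # Residues 74-109 fall between the two zinc fingers and have no named range.
    print(sub, regions_for(sub) or ["between the zinc fingers (74-109)"])
```

On these ranges, residue 145 falls in both the second zinc finger and the PDZ-binding region, whereas residue 93 lies in the segment between the two zinc fingers.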

Positive selection gradually stabilizes the frequency of the corresponding amino acid variants and enhances the species' adaptability to the environment [28]. According to the calculation, the positive selection site of HPV-16 E6 was D32E (128/250); those of HPV-33 E6 were K35N (42/216), K93N (42/216), and R145I (33/216); that of HPV-52 E6 was K93R (252/288); and those of HPV-58 E6 were K93N (111/405) and R145K (16/405). These positive selection sites are all among the high-frequency non-synonymous mutations, suggesting that these sites, which contribute to the adaptation of α-9 HPV E6, have spread widely.
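
The fractions quoted above can be converted to percentages to compare how common each positive selection site is. The counts in the sketch below are taken from the text; the data layout and variable names are assumptions for illustration.

```python
# (samples carrying the substitution, sequenced samples) per site, from the text.
site_counts = {
    ("HPV-16", "D32E"): (128, 250),
    ("HPV-33", "K35N"): (42, 216),
    ("HPV-33", "K93N"): (42, 216),
    ("HPV-33", "R145I"): (33, 216),
    ("HPV-52", "K93R"): (252, 288),
    ("HPV-58", "K93N"): (111, 405),
    ("HPV-58", "R145K"): (16, 405),
}

# Frequency of each site as a percentage of the sequenced samples of that type.
frequencies = {key: round(100 * k / n, 2) for key, (k, n) in site_counts.items()}

# Print the sites from most to least frequent.
for (hpv_type, site), pct in sorted(frequencies.items(), key=lambda kv: -kv[1]):
    print(f"{hpv_type} {site}: {pct}%")
```

On these counts the frequencies range from 87.5% for K93R of HPV-52 E6 down to about 4% for R145K of HPV-58 E6.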

The HPV E6 protein plays a key role in cervical cancer development. During HPV infection, the immune system presents the E6 protein as an antigen in order to eliminate the infection and reduce the risk of HPV-related diseases [30]. Some specific mutations in HPV E6 may lead to differences in the infectivity and pathogenicity of the virus. In HPV-16 E6, the positive selection site D32E and the substitution D32N at the same residue are located at the outer edge of the protein, next to the zinc granules, and six HLA-II epitopes disappeared because of D32E/D32N. In Japan, D32E has been confirmed to be associated with the development of cervical cancer [31]. D32E and D32N reduced the affinity of T-cell antigen epitopes, which may lead to persistent infection and promote the development of cervical cancer. The positive selection sites K35N and K93N of HPV-33 E6 are close to the zinc granules, while R145I is located in the E6 PDZ binding domain. K35N and R145I abolished the epitopes 35-43KPLQRSEVY for HLA-C*03:02 and 141-149RSRRRETAL for HLA-C*01:02, respectively, and K93N changed epitope number and affinity. These three positive selection sites of HPV-33 E6 are located in the active region of the E6 protein; they affect protein conformation and function and, to some extent, reduce the immunogenicity of the peptides containing them. The positive selection site K93R of HPV-52 E6 changed the epitope number and decreased the affinity of high-affinity epitopes. K93N and R145K of HPV-58 E6 reduced the affinity of high-affinity HLA-I antigen epitopes. Overall, these positive selection sites reduced the immunogenicity of E6, which may make HPV-infected cells harder for the immune system to detect and enhance the adaptability of HPV to the environment. No positive selection site was identified in HPV-31 E6, and its high-frequency non-synonymous mutations increased the affinity and number of E6 epitopes, which may be related to its extremely low prevalence.

Studies have found that mutations affect the efficacy of HPV vaccines [32]. Here, bioinformatic prediction of protein structure and antigen epitopes was used to analyze the influence of HPV E6 mutations on protein conformation and immunogenicity. The relationship among protein structure, positive selection sites, antigen epitopes, and pathogenicity of the α-9 HPV E6 protein in Sichuan was discussed for the first time. Amino acid substitutions at positive selection sites may affect viral infection efficiency, immunogenicity, and pathogenicity by altering T-cell epitope affinity, improving the survival of α-9 HPV as an evolutionary adaptation. These results help explore the relationship between HPV E6 polymorphism and HPV infection capacity, as well as the underlying mechanism, and may help improve therapeutic vaccines against α-9 HPV in the Sichuan region of China.

Conclusion

α-9 HPV is extremely prevalent in Sichuan, China. The positive selection sites K93N of HPV-33 E6, K93R of HPV-52 E6, and K93N of HPV-58 E6 all occurred at residue 93 of the E6 protein. N86H and R145I (a positive selection site) of HPV-33 E6 occurred at the same positions as D86E and R145K (a positive selection site) of HPV-58 E6. The α-9 HPV E6 positive selection sites D32E, K35N, K93N, R145I, K93R, and R145K, which are adaptive to the environment, have spread widely. They are all located in the active region of the E6 protein, alter its structure, and overall reduce its immunogenicity, so that HPV-infected cells are harder for the immune system to detect and the adaptability of α-9 HPV to the environment is enhanced.

E6 mutations at positive selection sites may affect viral infection efficiency and immunogenicity by altering protein structure and epitope affinity, thereby improving the survival of HPV.