Background

Currently, over 200 types of HPV have been fully characterized, of which the great majority clusters phylogenetically within three genera of the Papillomaviridae family: alpha (α), beta (β), and gamma (γ) [1]. The α genus contains HPV types that infect mostly mucosal and genital regions, including 65 papillomavirus types from humans, and this group of viruses constitutes 14 species groups [2]. Persistent HPV infections are considered the material cause of cervical cancer, where greater than 99% of cervical cancer lesions contain HPV DNA [3]. At least 3 ancestral papillomaviruses are responsible for the current heterogeneous groups of genital HPV genomes, including low-risk (LR)1 (α1, 8, 10 and 13), LR2 (α2, 3, 4 and 14) and high-risk (HR) (α5, 6, 7, 9 and 11) [2].

However, why only a small proportion of HPV infections progressed to precancer and cancer is unclear [4]. In addition to the pathogenic heterogeneity of distinct HPV types, previous studies indicate that HPV variants are also associated with different risks of cancer progression. For example, the HPV16 variant has significantly different risks of HPV persistent infection, progression to cervical intraepithelial neoplasia (CIN) and cervical cancer [5, 6]. Lisa Mirabello observed that compared to the most frequent A1/A2 sublineages, the A4, C, D2 and D3 sublineages conferred a higher hazard of CIN and cervical cancer [7]. The C variant (vs. B variant) of HPV52 was associated with an increased prevalence of cytologically diagnosed and histologically confirmed HSIL or worse lesions [8]. These data indicate that HPV variants have different phenotypic characteristics, including carcinogenicity.

HPV E6 and E7 are the major oncogenes, which are highly expressed in tumours and are related to inducing cellular immortalization, transformation, and carcinogenesis through protein–protein interactions with tumour suppressor proteins [9]. For example, E6 binds the conserved LxxLL consensus sequences of the ubiquitin ligase E6-associated protein (E6-AP), which works as a connecting bridge between E6 and p53, leading to its subsequent degradation [10]. Similarly, E7 targets and promotes the inactivation of RB1, thus inducing cell-cycle progression through activation of E2F-driven transcription [11].

In this study, we focused on the phylogeny and polymorphism of E6 and E7 gene variants for 14 common HPV types (HPV16, 31, 33, 52, 58, 51, 53, 66, 18, 39, 59, 68,6, 44) in Shanghai women with cervical lesions. This comprehensive analysis will help us understand the clinical and biological role of sequence variation.

Materials and methods

Study population

In total, 262 HPV-positive patients (mean age 38.34 ± 10.52 years, 21–78) with histopathologically confirmed cervical lesions, including 92 nonneoplastic, 69 low-grade squamous intraepithelial lesion (LSIL) and 101 high-grade squamous intraepithelial lesion (HSIL), were recruited from the Cervical Disease Centre at the Shanghai First Maternity and Infant Hospital, Tongji University School of Medicine in Shanghai, China. Histopathological findings are divided into certain groups as nonneoplastic (chronic cervicitis and inflammation-related regenerative changes), LSIL (CIN I/mild dysplasia), HSIL (CIN II and CIN III/moderate and severe dysplasia) and invasive carcinoma. CIN I refers to mildly atypical cellular changes in the lower third of the epithelium, CIN II refers to moderately atypical cellular changes confined to the basal two-thirds of the epithelium (formerly called moderate dysplasia) with preservation of epithelial maturation. CIN III refers to severely atypical cellular changes encompassing greater than two-thirds of the epithelial thickness and includes full-thickness lesions (previous terms were severe dysplasia or carcinoma in situ).

The criteria for the inclusion of patients enrolled into their current study: HPV single infection; Histopathologically confirmed by Colposcopy biopsy. The exclusion: Co-infected with different HPV types; Not histopathologically confirmed; the patients with vaginitis or other bacterial/virus infection.

Genomic DNA isolation and HPV typing

DNA from exfoliated cervical cells was extracted using the TIANamp Genomic DNA Kit (No: 3304–9) according to the manufacturer’s instructions. HPV genotyping was conducted using an HPV GenoArray Test Kit (HybriBio Ltd).

Amplification and sequencing

After HPV testing, the remaining DNA samples were stored at − 80 °C and used to amplify E6 and E7 using specific primers (Table 1). Subsequently, PCR products excised from 1.5% agarose gel were sequenced bidirectionally by SAIYIN Gene Biotechnology Company, Shanghai, China.

Table 1 Primers used for the molecular characterization of fourteen human papillomavirus E6 and E7 genes

Phylogenetic tree analysis and sequence analysis

The neighbour-joining phylogenetic tree of the HPVs was constructed by MEGA 7.0 using the maximum composite likelihood estimate [12]. To construct distinct phylogenetic branches, the reference HPV sequences were obtained from the GenBank database. The phylogenetic trees were visualised in FigTree v1.4.3 and online Evolview [13, 14].

The sequences were subsequently analysed by NCBI Blast, and all unique sequences were compared pairwise using the ClustalW tool of MEGA 7.0. The nucleotide positions of HPV were numbered on the basis of the reference sequence KU298876.1 (HPV6), NC_001526 (HPV16), NC_001526 (HPV18), J04353.1 (HPV31), M12732.1 (HPV33), M62849.1 (HPV39), U31788.1 (HPV44), KU298901.1 (HPV51), NC_001592.1 (HPV52), GQ472849.1 (HPV53), D90400 (HPV58), X77858.1 (HPV59), U31794.1 (HPV66), and DQ080079 (HPV68).

Statistical analysis

Fisher’s exact test was chosen for statistical analysis. P < 0.05 was used as the threshold to indicate statistical significance. All the P values in the present study were two-sided. The power calculation was performed by G*power software [15].

Results

In this study, total DNA was extracted from exfoliated cervical cell samples from 262 HPV-positive patients. In total, 233 E6 and 212 E7 sequences were successfully amplified by PCR. Based on the reference sequences, we confirmed that these sequences were divided into 14 types of HPV (16, 31, 33, 52, 58,51,53, 66,18, 39, 59, 68, 6, 44) and 5 species groups (alpha-5, alpha-6, alpha-7, alpha-9, alpha-10) using phylogenetic tree analysis, where alpha-10 was a low-risk (LR) clade (Fig. 1 and Fig. 2).

Fig. 1
figure 1

Phylogenetic tree of Alphapapillomavirus based 233 nucleotide sequence alignments of HPV E6. The maximum likelihood tree was constructed using MEGA7.0. Phylogenetic trees were visualised in FigTree v1.4.3 and Evolview. These sequences were divided into 5 species groups (alpha-5, alpha-6, alpha-7, alpha-9, alpha-10), of which alpha-10 was a low-risk (LR) clade. Green, grey and red circle represent cervicitis, low-grade squamous intraepithelial lesion, high-grade squamous intraepithelial lesion, respectively; the star represents nonsynonymous mutation, and blue stars are insertion/deletions

Fig. 2
figure 2

Phylogenetic tree of Alphapapillomavirus based 156 nucleotide sequence alignments of HPV E7. The maximum likelihood tree was constructed using MEGA7.0. Phylogenetic trees were visualised in FigTree v1.4.3 and Evolview. These sequences were divided into 5 species groups (alpha-5, alpha-6, alpha-7, alpha-9, alpha-10), of which alpha-10 was a low-risk (LR) clade. Green, grey and red circle represent cervicitis, low-grade squamous intraepithelial lesion, high-grade squamous intraepithelial lesion, respectively; the star represents nonsynonymous mutations, and blue stars are insertion/deletions

Table 2 shows the distribution of the sublineage-specific infections for individual types in cervicitis, LSIL and HSIL groups. The incidence of HSIL was significantly increased in patients infected with alpha-9 HPV compared with other species (P < 0.0001), especially HPV16 (P < 0.0001). There was no statistically significant difference in the severity of CIN for all types of lineages. For HPV16, 37.5% HPV16 A1–3 sub-lineage caused HSIL, as well as 75.9% A4. Among 54 determinable samples of HPV 52, the A, B and C variants were found in 1 (1.85%), 52 (96.3%) and 1 (1.85%) samples, respectively, and lineage B was the most common. Among 45 determinable samples of HPV 58, sublineages A1, A2 and A3 variants were found in 57.8, 22.2 and 17.8% of all HPV58 samples, respectively. The nonprototype-like variant (sublineage B1) of HPV58 was rare in our study. A2 (69.23%, 9/13) and A1 (66.67%, 10/15) were common sublineages for HPV31 and HPV33, respectively.

Table 2 Distribution of lineage-specific human papillomavirus infections in samples from Shanghai

Interestingly, we observed that one variant represented four out of 13 HPV59-positive samples that appeared to form a new candidate, sublineage B1–2 (Fig. 3a). A 9-base sequence (AGTGAAACT) was inserted after position 519 of the E6 sequence, and 9 inserted bases were translated into 3 amino acids SET (Fig. 3b and c). These diagnostic SNPs were unique to the B1–2 sublineage.

Fig. 3
figure 3

Phylogenetic tree and schematic representation of 4 novel HPV59 E6 variants. a Phylogenetic tree of the HPV59 variants based on E6 sequences. b An insertion (AGTGAAACT) at nucleotide sites 519 in 4 variants. c The insertion sequence was translated into SET. Four variants are marked by red circles

Nonsynonymous mutations for the E6 and E7 genes within all types of HPV were evaluated. The A burden test was used to determine if the variant distribution was different between the IF, LSIL and HSIL groups by viral region (Table 3, Fig. 1, and Fig. 2). Despite nearly equal numbers of E6 and E7 sequences among three groups (IF, 159; LSIL, 121; HSIL, 165), the IF group overall had a significantly higher number of variants compared to the LSIL and HSIL groups (P = 3.83× 10− 4). Strikingly, the E7 gene had significantly fewer nonsynonymous variants in the HSIL compared to LSIL and IF groups (P = 3.17× 10− 4).

Table 3 Rare variant burden analysis for nonsynonymous variants within all types of HPV for the cervicitis, LSIL and HSIL groups

Moreover, we confirmed that the incidence of HSIL in patients infected with the alpha-9 HPV group was significantly increased compared with the other groups (P < 0.0001). We then further analysed nonsynonymous mutations of the alpha-9 HPV (HPV16, 31, 33, 52, 58) E6 and E7 genes in the HSIL case and control groups (Table 4). In the case group, 13 variations were observed in the E6 gene, and 19 mutations were observed in the E7 gene. In the control group, 17 and 14 variations were found in the E6 and E7 genes, respectively. For HPV16, the distribution of T7220G (D32E) variation in E6 and A7689G (N29S) in E7 showed a different trend between the case group and control group (P = 0.036 and 0.022) (Table 4), power (1-β) 0.562 and 0.629. For HPV58, A388C (K93 N) variation can significantly reduce the risk of HSIL and was a protective factor (P = 0.015), power (1-β) 0.624. In the remaining three types of alpha-9 HPV, no significant differences in the distribution of other variations between the case group and the control group were found. In addition, we performed co-variation analysis of five HPVs E6 and E7 genes in the α-9 group. But there was no significant correlation between E6 and E7 covariation and cervical lesions (Additional file 1: Table S1).

Table 4 HPV E6/E7 gene variations and amino acid substitutions in the case and control groups

Discussion

Persistent infection with HPV is the most important risk factor for cervical cancer [16]. According to their oncogenic potential, HPV types are divided into high-risk HPV types (16, 18, 31, 33, 35, 39, 45, 51, 52, and 58) associated with cervical cancers, and low-risk types (6, 11, 40, 42, 43, 44, and 54) associated with genital warts [17]. The E6 and E7 oncoproteins of HPV contribute to oncogenesis by associating with the tumour suppressor proteins p53 and pRb, respectively [18]. In this report, we describe the E6 and E7 genes of 14 conventional HPV species (HPV16, 31, 33, 52, 58,51,53, 66, 18, 39, 59, 68,6, 44) in Shanghai women with cervical lesions. This work provides basic information and reference variant sequences for future investigation of viral-host evolution and viral pathogenesis.

In this study, the α-9 (HPV16, 31, 33, 52, 58), α-5 (HPV51), α-6 (HPV53, 66), α-7 (HPV18, 39, 59, 68) and α-10 (HPV6, 44) were were detected and analyzed. 79.26% of α-9 HPV infection caused CIN confirmed histopathologically, 55.49% of which were HSIL. HPV16 A4, HPV31 A2, HPV33 A1, HPV52 B and HPV58 A1 were the most common sublineages in the α-9 HPV group. In China, the A1-A3 sublineage of HPV16 was predominant in northeast China [19], and A4 was common in central and south China [20, 21]. Globally, the risk of cervical cancer caused by the A3, A4 and D sublineages was significantly higher compared with HPV16 A1 [22]. In our study, HPV 31/33/52/58 had variant lineages similar to those reported by previous studies, and sublineages associated with CIN and/or cervical cancer were HPV52 C and HPV33 A1 [8, 23,24,25,26]. We should improve the screening of cervical cancer based on HPV pathogenic sub-lineages in different regions. This also reduces the rate of colposcopy biopsy, which can reduce the burden on patients and reduce the waste of medical resources. Simple infections of HPV16 carcinogenic subtypes or low-grade lesions caused by them should be intervened as early as possible rather than just follow-up. However, the sample size should be expanded to further confirm our research results.

The genome variations in humans and HPV may influence any stage of HPV infection by inducing cervical cancer [27]. For E6, the T7220G (D32E) variation in HPV16 E6 was a risk factor that increased the incidence of HSIL, whereas A388C(K93 N) variation in HPV58 E6 significantly reduced the risk of HSIL. Previous studies have shown that the susceptibility to cervical disease is increased by the specific protein interaction HPV16 E6 (L83 V)-p53 (Arg-72, [28]. Moreover, the gene variant T350G of HPV-16 was found to display more efficient degradation of Bax and binding to the E6 binding protein [29]. We found that E7 was highly conserved in the HSIL group compared to the <HSIL group, and A7689G (N29S) in E7 significantly increased the risk of HSIL. While the HPV16 A4 sublineage (P < 0.0001) and HPV16 E7 29S (P = 0.0002) rarely occurred in cancer patients compared to women with cervicitis in Vietnam [30]. HPV16 E7 S63F was significantly different between the case and control groups (P = 4.861 × 10− 10) in a Han Chinese population [31]. The T20I/G63S substitutions in HPV16 A3 E7 significantly increased the risk for HSIL in Taizhou area, China [32]. In one word, HPV sub-lineage and variation dispersal was population-specific, and we should develop different screening and treatment schemes according to the distribution of HPV variation in different regions. Due to the limitation of sample capacity, we should increase the sample size to confirm the role and mechanism of these mutations in the development of cervical cancer in Shanghai area or south China.

In current study, the E7 gene had significantly fewer nonsynonymous variants in the HSIL compared to LSIL and IF groups (P = 3.17× 10–4). Lisa Mirabello et al. confirmed hypovariation in that E7 had significantly fewer, rare non-silent genetic variants in cancers (P = 6.13× 10− 5) compared to E6 [33]. Previous studies have reported that the HPV16 E7 protein leading to cervical cancer is virtually invariant, and E7 displayed a fully conserved sequence [34, 35]. In summary, E7 variation greatly decreases the risk of CIN and invasive cancer.

Conclusions

In this study, we focused on the phylogeny and polymorphism of 14 HPV variants based on the E6 and E7 genes. In addition, we also found that the E7 gene lacked significant genetic variation in CIN, and which was strict conservation in the HSIL. This comprehensive analysis will help us understand the clinical and biological effects of sequence changes and provide preventative/therapeutic interventions for HPV-related CIN and cervical cancer.