Background

Cervical cancer continues to be a major public health problem particularly in less-resourced countries with an estimated 570,000 cases diagnosed, and 311,000 death rates recorded in 2018 [1]. In sub-Saharan Africa, cervical cancer is the most common cancer and its accounts for 22.2% of all the cancers in women and it is the also the most common cause of cancer death among women [2]. It is one of the most common cancers in women living with human immunodeficiency virus (HIV) and presents a significant public health threat to women especially on the African continent. In sub-Saharan Africa, human papillomavirus (HPV)-associated cervical cancer is an important cause of morbidity and mortality [3]. Cervical cancer has been recognized as a rare outcome of a sexually transmitted infection, and the etiology is limited to a few HPV genotypes. HPV infections can be facilitated by co-infection with HIV [4]. To date, over 200 HPV genotypes have been identified and classified according to their mucosal types, they are divided into high risk (HR), possible or probable high risk and low risk (LR) HPV genotypes, depending on their association with the development of cancer. For decades, 14 highly carcinogenic HPV genotypes (-16, -18, -31, -33, -35, -39, -45, -51, -52, -56, -58, -59, -66 and -68) have been recognized as the causative agent of cervical cancer [5, 6]. However, HPV genotypes-16 and -18 have been demonstrated and identified as the predominant oncogenic genotypes of all cervical cancer cases worldwide [7]. Despite having the highest burden of risk factors associated with HPV infection, persistence, and progression to cervical cancer [8], comprehensive data on HPV genotypes and profiles of HPV mutations associated with different cervical cancer risk groups such as HIV/HPV co-infected in Africa are lacking.

Molecular epidemiological studies assessing the phylogenetic association of HPVs based on oncogenic risk and supporting specific biological and pathological traits distinctive to HPV genera, species and types are still limited. Previous studies in Africa have explored the frequency and distribution of the HPV genotypes in HIV-infected women [9], with consistent results about the influence of HIV on HPV genotype distribution. Several studies in Africa have also explored the genomic diversity of HPV variants [10,11,12,13,14,15]. HPV genotypes-16, -18, -45, and -58 were observed among most cervical samples studied from Botswana and Kenya [3, 14, 16,17,18,19,20,21,22,23,24,25,26].

The objective of this study was to use Sanger genotyping protocol to detect and genotype HPV isolates in cervical cancer specimens of patients’ living- and not living- with HIV obtained from Botswana and Kenya. Apart from determining the prevalence and distribution of HPV genotypes (LR and HR) among the two groups (HPV versus HIV/HPV-coinfected), the mutation profiles (nucleotide) and amino acid changes) within L1 region of HPV genotypes were used to determine the mean pairwise distances and assessing for any signature mutations associated with cervical cancer among the HIV/HPV-coinfected patients. Phylogenetic analyses were used to (i) assign HPV genotypes, and (ii) determine clustering of sequences within and between the two countries.

Methods

Study design and population

Formalin-fixed paraffin-embedded (FFPE) tissue specimens from women living with HIV (WLWHIV) and -women not living with HIV (WNLWHIV), diagnosed with invasive cervical cancer previously typed using the Abbot Real-Time polymerase chain reaction (PCR) and Linear Array HPV Genotyping Test, LA-HPV (Roche Applied Sciences, Indianapolis, IN) from prior retrospective cross-sectional studies [19, 21] from Botswana and Kenya were utilized.

We proposed to describe the sequence variation of samples with single HPV infections for the two HR-HPV genotypes (HPV-16 and -18) which are responsible for the majority of cervical cancer cases reported in both countries within the ~ 450 base pairs (bp) region amplified by these HPV genotyping systems. Only samples harbouring HPV-16 or -18 were used in this study, samples with mixed HPV infection were excluded from this study. Demographic and clinical data including age, cancer stage, country of origin and HIV status were obtained for samples from Botswana. However, only HIV status was available for Kenya samples. Cervical cancer staging was performed as previously described [21].

DNA extraction

DNA extraction was performed from the tissue samples archived using a previously established protocol [27]. The extracted DNA was stored at − 80 °C prior to analysis. For the samples from Botswana, the presence of HR-HPV DNA was detected using Abbot Real-Time PCR (Abbot Molecular Inc., Chicago) [21]. While specimens from Kenyan women, HR-HPV detection and genotyping had been previously performed using the LA-HPV [19].

Polymerase chain reaction (PCR) and sequencing

To isolate a region of L1 targeted by the LA-HPV and the Abbott Real-Time PCR method, conventional PCR for HPV DNA viral amplification of a ~ 450 bp HPV-specific segment from the L1 gene covering nucleotide positions (5722-6162) numbered according to NC001526 HPV-16 reference genome was performed using the forward primer MY09 5′-CGTCCMARRGGAWACTGATC-3′ and the reverse primer MY11 5′-GCMCAGGGWCATAAYAATGG-3′. Five microliters of DNA were added to 15 μL of reaction mix containing 1 × PCR buffer, 0.2 mM dNTPs, 4 mM MgCl2, 0.3 μΜ of each primer, and 2 U/μL of Platinum Taq DNA Polymerase, High fidelity (Invitrogen, USA).

The thermocycling conditions were denaturing at 96 °C for 10 s, annealing at 50 °C for 5 s, and final extension at 60 °C for 4 min for 25 cycles. PCR products were subjected to electrophoresis in 4% agarose (Applichem) using 1 × TBE buffer (Applichem) and visualized under UV light. PCR products were then purified using commercially available SureClean Plus (Bioline) and the purified products were sequenced directly via automated sequencing using two overlapping PCR primers (both forward and reverse). The BigDye Terminator v3.0 kit (Applied Biosystems; Foster City, CA, USA) was used for sequencing using the automated Sequencer (ABI PRISM 3130xl; Applied Biosystems).

Sequencing and sequence editing

The generated chromatographs were quality assessed using Sequencher v5.0 software (Gene Codes Corp., Ann Arbor, MI, USA) followed by generation of consensus sequences in FASTA file format, which were used, for downstream analysis. An online L1 Taxonomy Tool Analysis Results was utilized to check for similarity with the known HPV reference sequences. Additionally, the BLAST tool was used to determine the subset of sequences with high EO value that can be included in the phylogenetic tree. Finally, referenced alignments per each genotype were performed using AliView version 1.26 and were used for phylogenetic analyses while the translated alignment was used for mutation analysis. We developed a specific workflow to answer the study objectives (Fig. 1).

Fig. 1
figure 1

Scheme used to achieve study objectives

Phylogenetic analysis

Phylogenetic inference was performed using the Bayesian Markov chain Monte Carlo (MCMC) approach implemented in the Bayesian Evolutionary Analysis with the Sampling Trees software (BEAST v1.10.2) with an uncorrelated log-normal relaxed molecular clock, General Time Reversible substitution model, and gamma site heterogeneity. The MCMC was set at a chain length of 500,000,000 with parameters logged every 10,000. The tree was visualized in FigTree v1.4.3 after a 10% burn-in using Tree Annotator v1.8.4. Posterior probabilities 0.90 and above were noted as statistically significant. The sequences generated in this study are available in GenBank under accession numbers (awaiting accession numbers).

HPV-specific mutation analysis

To assess for any known mutations or signatures amino acids that may be associated with immune pressure among HPV patients in Botswana and Kenya, 95 aligned sequences with high coverage were included in the analysis (N1 dataset). Here, the top 100 sequences after sorting by E value and covering the full L1 region were obtained from the NCBI blast search and compared to the N1 dataset. Comparisons were done at the aa level to exclude synonymous polymorphisms. Genetic diversity and any signature mutations associated with HIV/HPV co-infection were investigated by comparing the HPV sequences isolated from patients with HPV mono-infection versus those with without. Mutations that had not been noted previously in the literature were termed “candidate novel mutations”. The candidate novel mutations were further assessed using Protein Variation Effect Analyzer (PROVEAN) [28] and SNAP2 [29] to determine their impact at the gene level. Briefly, PROVEAN classifies mutations with a negative impact on protein biological function as deleterious and SNAP2 uses ‘effect’ or ‘neutral’ to indicate the presence or absence of change in protein function caused by a mutation.

Statistical analysis

Raw data were collected, processed and coded using Excel. Study demographics were presented as percentages for categorical variables and compared among participants with HR-HPV and LR-HPV HPV genotype using the Chi-square test. Wilcoxon rank-sum test was utilised to compare continuous variables. Differences among the prevalence of HPV genotypes in Botswana and Kenya was assessed using a comparison of proportion test. All the statistical analysis was done using Stata version 15 (Stata Corp, College Station, TX, USA) and p-value < 0.05 was considered as statistically significant.

Results

Population characteristics

The present study analysed 98 samples positive for HPV-16 or -18 only from 2 previous studies [19, 21]. All samples with mixed HPV infection were excluded from the study. Study participants were a subpopulation of samples from Botswana (n = 72) and another from Kenya (n = 26). Baseline demographics and clinical characteristics of the study participants are summarised in Table 1. The distribution of HPV genotypes stratified by country are shown in Fig. 2. Out of 98 generated sequences, 83.7% (82/98) participants had HR-HPV genotypes while 16.3% (16/98) had LR-HPV genotypes (Table 1). However, in some patients, we identified other additional HPV genotypes which were previously missed when the same samples were genotyped using Abbott and LA-HPV. Among participants with HR-HPV genotypes, 51.2% (42/82) were coinfected with HIV (HIV/HPV), while 47.5% (39/82) were from Botswana and 3.7% (3/82) were from Kenya based on MY09/11 Sanger sequencing-based method. The most predominant cancer stage among participants from Botswana was stage 2. Amongst individuals with LR-HPV genotypes, 12.5% were infected with HIV, cancer stage 3 was the most predominant cancer stage. The prevalence of HR-HPV genotypes was significantly higher in the Botswana population compared to Kenya (p-value < 0.001). We did not record any statistical significance in other variables among HR and LR-HPV genotypes among the Botswana population.

Table 1 Baseline demographics and clinical characteristics for participants
Fig. 2
figure 2

HPV genotypes among women with cervical cancer from Botswana and Kenya

HPV genotypes

Human papillomavirus genotypes were identified using both phylogenetic tree (Fig. 3) and online L1 Taxonomy Tool Analysis. After quality control (QC), 3 sequences were excluded for mutation and phylogenetic analyses because sequences were too short (< 450 bp). In total, 19 HPV genotypes were determined, -6, -11, -16, -18, -35, -39, -42, -45, -53, -54, -55, -56, -58, -59, -73, -81, -89, -122, and -159 (Fig. 2). Out of 19 different genotypes found in Botswana and Kenya populations, five showed statistically significant differences in their frequency (p-value < 0.05) between countries (Fig. 2). Thus, HPV genotypes-6, -54 and -73 were higher in Kenyan population while genotypes-16 and -18 were recorded higher in Botswana population. The WLWHIV had multiple HPV genotypes compared to WNLWHIV even though the difference was not statistically significant (p-value > 0.05).

Fig. 3
figure 3

A phylogenetic tree of the L1 region (~ 450 bp) covering nucleotide positions (5722-6162) numbered to NC001526 HPV-16 reference genome) used to assign HPV types to sequences in this study. Trees were constructed using BEAST method. Strains from Botswana sequenced in the present study are shown in the tree (left), while Kenya sequences are shown in the tree (right). Reference strains are designated by their accession number. All positions with less than 95% site coverage were eliminated, i.e., fewer than 5% alignment gaps, missing data, and ambiguous bases were allowed at any position (partial deletion option). There were a total of 339 positions in the final dataset. Evolutionary analyses were conducted in BEAST v1.2

Phylogenetic analysis

Phylogenetic analysis included all the 95 HPV sequences obtained in this study that were adequate for phylogeny (> 400 bp sequence length). All the studied HPV sequences clustered with reference and had posterior probabilities > 90%, and could be used to assign the genotypes with confidence. Phylogenetic inference with maximum likelihood and MCMC methods showed that there were no isolated clusters among HPV sequences from both countries. This was also true when trees for HPV-16 (n = 29) and HPV-18 (n = 12) sequences from this study and the respective GenBank references for the L1 gene region were constructed (Fig. 4). However, there was a general increase in nucleotide genetic diversity among the HPV sequences isolated from WLWHIH as opposed to WNLWHIV (Fig. 4) as shown by multiple branches within the phylogenetic tree. The overall mean pairwise distances of HPV-16 and HPV-18 sequences isolated from WLWHIV versus WNLWHIV were not statistically significant.

Fig. 4
figure 4

Phylogenetic analysis using BEAST tree for HPV-16 (A) and -18 (B) sequences from the present study. Mean pairwise distribution of HPV 16 sequences isolated from WLWHIV versus WNLWHIV (C)

Mutations analysis

Nucleotide diversity

In total, 68 nucleotide base substitutions were detected using reference sequences for each genotype. Additional file 1: Table S1 summarises the different polymorphisms detected per genotype and only genotypes with counts greater than 1 per strata were considered. Although there were more HPV types among WLWHIV, most of the HPV genotypes among WLWHIV were HR-HPV genotypes.

Mutations at the amino acid level

The present study defined mutations as amino acid changes that differed from that of reference sequence and candidate escape mutations were those that have not yet been reported in 1000 most similar sequences. Six (6) new variants were identified based on the sequencing of the L1 region, HPV-16 (L441P, S343P), HPV-18 (S424P), HPV-45 (Q366H, Y365F), belonging to the HPV HR-HPV group and HPV-84 (F458L) belonging to the LR-HPV group.

Discussion

Given the uniquely large and diverse collection of HPV-genotyped cervical samples worldwide, we were able to evaluate the genetic diversity within HPV genotypes and report on the geographic distribution of HPV variants, as well as measure their association with cervical cancer in Botswana and Kenya. Several studies have investigated HPV prevalence and type-specific distribution in cervical cancer patients from Botswana and Kenya, and found that HPV-16 or -18 (HPV-16/18) were identified in 93.8% from Kenya, and 61.8% from Botswana. Non-HPV-6/18 genotypes were detected in 17.2% from Kenya, and 47.8% from Botswana [18, 19]. Again some studies have shown HPV prevalence in Botswana and Kenya [3, 14, 16,17,18,19,20,21,22,23,24,25,26]. However, the numbers of studies investigating the variants within HPV genotypes are limited. In the present study, the molecular characterization of HPV variants was performed in sequences from Botswana and Kenya. By sequencing the L1 region of HPV positive cervical samples, we were able to confirm the majority of the previously reported HPV genotypes [14, 16,17,18,19,20,21, 24,25,26]. However, other HPV genotypes” were identified through sequencing in samples that were previously positive for HPV-16 or -18 only in both countries. This is likely due to the fact that the MY09/11 primers are universal primers and sequencing is not limited by the presence of HPV type-specific probes as in the LA-HPV and Abbott realtime PCR methods. Again, detection from FFPE samples could have been suboptimal due to nonuniform coverage of the tissue. In some cases, low-level of HPV replication may be missed because of thresholds standards. Overall, the frequency of the genotypes identified in both countries are the same as previously reported, having HPV-16 as the most frequently detected genotype followed by HPV-18 and the rest. The sequencing analysis illustrated that multiple variants were identified in both countries. Variants of HPV that changed the protein sequence of the capsid protein encoded by L1 were found. These observations, in light of the slow evolution of HPV at the population level, suggest that strong selection pressures are at play in each infection cycle. We analyzed the mutation frequency in the panel for HPV genotypes in WLWHIV and WNLWHIV and found mutations within HPV genotypes. We could not assess the clinical relevance (i.e., association with HIV infection or cancer stage at presentation) of the detected variants due to small sample size. However, it is interesting that the new variations in high-risk genotypes were only identified in samples from HIV patients. The most striking finding in this report is the high proportion of mutations in the HPV‐genotypes (including both those arising among WLWHIV and those arising among WNLWHIV). Amino acid variations were found in the L1 region of the HPV genotypes (Additional file 1: Table S1). In this study, 6 novel amino acids variations, which were previously unreported in the literature, were found in HPV-16, HPV-18, HPV-45 and HPV-84 (Table 2). However, two of them were found in only one sample (Y365F and F458L) and may have occurred by PCR amplification. These variations could be related to the early promoter activation of HPV and may play a crucial role in the transcriptional modulation of the HPV L1 oncogenes via the promoter. Variants identified based on the sequencing of the L1 region are; HPV-16 (L441P, S343P), HPV-18 (S424P), HPV-45 (Q366H, Y365F), belonged to the HPV HR-HPV group and HPV-84 (F458L) for the LR-HPV group. These alterations are described for the first time, and functional implications resulting from this variation need further analyses. Changes in the L1 region of the HPV genome may be important for discriminating the infectious potential of different variants, as well as in defining epitopes relevant to vaccine design. Some previous studies investigating HPV-16 full-length sequences in cervical specimens have shown that the contiguous deletions identified to be highly associated with cancer are suggestive of a pattern of HPV integration [30,31,32,33,34,35,36,37]. The findings of this study indicate that there could be variants of HPV circulating within sub-Saharan Africa. Further studies are needed to confirm the presence of new HPV variants and genotypes and to understand the evolution of HPV isolates in Botswana and Kenya by analyzing the complete HPV genome or different regions of HPV genes such as L2, LCR, E6 and E7.

Table 2 Novel genetic mutations found in the 5′ and 3′-ends of HPV-16, -18, -45, -84 -L1 regions

Our work showed genetic variability of L1, making it essential to take into account the HPV variants lineages or population stratification when developing vaccines. The limitation of this study was the small sample size for samples from Botswana and Kenya; we could not determine whether the synonymous mutations that are described have any impact on the protein expression. We attempted to investigate HPV full-length sequences in FFPE cervical cancer specimens, but were unable to do so because of highly fragmented samples. Further study is required to determine whether variants represent a higher risk for cervical cancer. Additionally, functional studies regarding HPV polymorphisms across the HPV genome variants should be conducted to explore the biological evidence of carcinogenicity.

Conclusion

We demonstrated the genomic diversity of HPV sequences and phylogenesis of HPV genotypes in Botswana and Kenya samples giving important information as six new L1 single amino acids changes were identified. New variations in high-risk genotypes were only identified in samples from HIV patients. However, it was not possible to correlate disease severity with any particular variant. The results illustrated that multiple HPV variants are available in both countries. Further studies with complete sequencing of HPV genomes from large population-based and case–control studies of cervical pre-cancer and cancer are required to understand viral carcinogenesis and possibly to improve preventive and therapeutic strategies in the future.