Introduction

Hepatitis B virus (HBV) infection is one of the main global health problems: two billion people have been infected, and 350 million of them are chronically infected [1]. HBV is the prototype member of the family Hepadnaviridae and has a compact, circular DNA genome of about 3.2 kb in length, with four overlapping open reading frames: the large S region (PreS/S), PreC/C, X and P [2, 3]. The overlapping regions of the genome are helpful for studying the evolution of the virus through its point mutations, because recombination is rare and any point mutation can affect the genetic characteristics of two overlapping genes [3]. The evolution of HBV is therefore expected to be interactive and constrained by the overlap of genes [4]. In some cases, one overlapping-gene protein may evolve more rapidly as a consequence of negative selection on the other [5], and the overlapping genes might be subject to different selection pressures [6]. Furthermore, independent adaptive selection for both overlapping genes has been reported [7]. One of the main features of HBV is its genetic heterogeneity [8]. There are four main subtypes, namely ayw, adw, adr and ayr [9]. According to phylogenetic analysis of complete HBV genomic sequences, nine genotypes of HBV, from A to I, have been determined and divided into approximately twenty-five subgenotypes [10–14]. HBV genotypes, which differ from each other by more than 8% at the nucleotide level, show distinct geographical distributions [11, 15, 16]. The nucleotide composition of HBV coding sequences with various genetic diversities appears to be selective rather than random, because natural selection from the host acts on the various strains generated by mutation. In previous reports, translation selection and compositional constraints under mutational pressure are thought to be the major factors accounting for codon usage variation among genomes in microorganisms [17–24].
In some RNA viruses, mutation pressure plays a more important role than natural selection in shaping the synonymous codon usage pattern [25, 26]. Although compositional constraints and translation selection are the more generally accepted mechanisms accounting for codon usage bias [27–30], other selective forces have also been proposed, such as fine-tuning of translation kinetics and escape from cellular antiviral responses [23, 31–34]. Thus, the codon usage pattern may be important in disclosing the molecular mechanism and evolutionary process by which HBV avoids the host cell response. To our knowledge, this is the first systematic study to analyze the synonymous codon usage pattern and evolutionary dynamics of HBV as well as the relationship between the codon usage pattern of HBV and that of its host.

Results

Synonymous codon usage in HBV

The C% and U% were higher than A% and G%, and C3% and U3% were higher than A3% and G3% in HBV (Table 1).

Table 1 The overall nucleotide contents and nucleotide contents at the synonymous third position of sense codons in the whole coding sequence of HBV

The overall nucleotide composition appears to influence the nucleotide contents at the third codon position in the HBV coding sequence, suggesting that compositional constraints may be one of the factors affecting the codon usage pattern of HBV. Over-represented synonymous codons are rare in the HBV coding sequence and include only UCU for Ser, whereas the under-represented codons are AUA for Ile, CCC for Pro, ACC for Thr, GCC for Ala, and CGU and CGG for Arg (Table 2).

Table 2 The relationship of the synonymous codon usage pattern between HBV and human cell

The codon usage bias of HBV suggests that some synonymous codons are not chosen equally and randomly.

Genetic relationship based on synonymous codon usage in HBV

PCA detected a first principal component (f′1) that accounts for 23.65% of the total synonymous codon usage variation and a second principal component (f′2) that accounts for 19.47% of the total variation. Considering the potential influence of geography on HBV evolution, a geographical distribution is apparent. For example, the overall codon usage pattern of HBV isolated from the Philippines and South Korea is far from that of isolates from China and Indonesia, and HBV isolated from Germany and Iran has a genetic diversity similar to that of HBV isolated from South Africa (Figure 1).

Figure 1 The genetic characteristics of HBV isolated from different countries.

When the strains were grouped by subtype, the points for subtype adw were generally divided into two groups, while the other three subtypes seemed to share similar genetic characteristics (Figure 2).

Figure 2 The genetic characteristics of HBV based on the four main subtypes.

It is worth noting that the points for different HBV genotypes were generally separated from each other. Moreover, genotypes A and B have genetic characteristics that differ clearly from those of the rest, while genotypes C, D and G appear to be evolutionarily related (Figure 3).

Figure 3 The genetic characteristics of HBV based on different genotypes.

These results indicate that geographic distribution might be only a limited factor affecting the codon usage of the whole HBV coding sequence, and that the subtypes do not fully reflect the characteristics of HBV evolution. In this case, codon usage variation might be one of the factors driving HBV evolution.

The effect of mutation pressure on codon usage of HBV

To analyze whether the evolution of HBV is shaped by mutation pressure from the virus itself or by translation selection from the host, the G+C content at the first and second codon positions (GC12%) was compared with that at synonymous third codon positions (GC3%) (Figure 4).

Figure 4 Correlation between GC content at the first and second codon positions (GC12%) and that at synonymous third codon positions (GC3%).

A highly significant correlation was observed (r = 0.432, P < 0.01), implying that mutation pressure arising from the base composition of HBV is a main factor shaping the genetic diversity of this virus, since its effects are present at all codon positions. In addition, ENC values were calculated for each strain and plotted against GC3% (Figure 5).

Figure 5 Distribution of the codon usage index, ENC, and GC content at synonymous third codon positions (GC3%). The curve shows the expected ENC values if GC compositional constraints alone account for codon usage bias.

Figure 5 shows that the points for HBV aggregated below the expected curve, suggesting that other selective forces take part in the process of HBV evolution.
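The expected curve in an ENC versus GC3 plot is commonly computed with the approximation given in Wright's original description of ENC. The following minimal Python sketch assumes that approximation and takes GC3 as a fraction between 0 and 1; the function names are illustrative and are not taken from this study.

```python
def expected_enc(gc3):
    """Expected ENC if GC3 composition alone shaped codon usage.

    Assumes Wright's approximation ENC = 2 + s + 29 / (s^2 + (1 - s)^2),
    where s is the GC3 content expressed as a fraction (0 to 1).
    """
    s = gc3
    return 2.0 + s + 29.0 / (s * s + (1.0 - s) ** 2)


def lies_below_curve(observed_enc, gc3):
    """True if an observed (GC3, ENC) point falls below the expected curve."""
    return observed_enc < expected_enc(gc3)
```

A strain whose observed ENC falls below `expected_enc` at its GC3 value shows more codon usage bias than GC3 composition alone would predict, which is the pattern described for Figure 5.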

Comparative analysis of the RSCU values between HBV and human cell

The synonymous codon usage pattern of this virus resembles that of human cells; for example, the usage patterns are similar for all synonymous codons for Phe, Ile, Val, Ser, Ala, Tyr, His, Lys, Asp, Cys and Gly (Table 2). A possible explanation is that adaptation of HBV codon usage to its host under translation selection promotes the multiplication of progeny virus, implying that the resemblance is favorable for HBV replication in human cells. However, CCG for Pro, ACG for Thr, CAA for Gln and CUA for Leu, which are under-represented in human cells, are highly used in HBV (Table 2). This result suggests that these codons could influence the translation rate of the sequence flanking them, allowing the viral products to fold correctly.

Discussion

The ENC values calculated for HBV indicated that, although the codon usage bias in HBV is significantly low, the codon usage is mainly affected by mutation pressure. For some viruses, previous studies reported that the major factor shaping codon usage patterns appears to be mutation pressure rather than natural selection [19, 21, 24, 35]. However, the comparison of synonymous codon usage between HBV and human cells suggested that mutation pressure interacts with translation selection in the process of HBV evolution, even though the ENC values for the whole HBV coding sequence indicate that mutation pressure is one of the factors influencing the codon usage pattern. This characteristic of HBV confers adaptive advantages that result in highly efficient dissemination of the virus through different routes of transmission.

The pattern of codon usage has been described as a genetic characteristic of various organisms in previous studies [19, 20, 27, 31, 32, 35, 36]. Because C%, U%, U3% and C3% play roles in the formation of the different optimal codons, whichever nucleotide they end in, the codon usage pattern of HBV is likely influenced by compositional constraints. The codon usage pattern of poliovirus (PV) is mostly coincident with that of its host, while the codon usage pattern of HAV is antagonistic to that of its host [37, 38]. The codon usage pattern of HBV is a mixture of these two types. The coincident portion of the HBV codon usage pattern enables the corresponding amino acids to be translated rapidly, whereas the antagonistic portion likely enables viral proteins to fold properly, although the translation efficiency of the corresponding amino acids is decreased. Latent genes in Epstein-Barr virus deoptimize codon usage in order to evade competition for host protein translation [28], and attenuation of PV was achieved with rare codon pairs that induce poor translation of viral protein sequences [27]. These results suggest that disfavored codons may not be a deleterious factor for viruses adapting to their host cells.

According to the codon usage data for HBV isolated from different countries, the geographic factor has little influence on the formation of the codon usage pattern of HBV. With the development of international communication and the highly efficient dissemination of HBV through various routes of transmission, the effect of geography in limiting the distribution of HBV among countries seems to be weak. Interestingly, the four main subtypes of HBV show no significant difference in genetic characteristics that could be attributed to different human races. This result might suggest that translation selection from the human host is not the single factor shaping the overall codon usage pattern of this virus and that mutation pressure from HBV itself is a main force driving HBV evolution. Genotyping of HBV is of high interest because there is increasing evidence that HBV genotypes may be associated with HBeAg seroconversion rates, mutations occurring in the precore and core promoter regions, severity of liver disease and treatment response [15, 16, 39, 40]. There is a significant difference in the overall codon usage pattern of HBV between genotypes A, B and E and genotypes C, D and G. HBV genotypes and subgenotypes have been associated with differences in clinical and virological characteristics, showing that they may play a role in the virus-host relationship [41]. It has been shown that genotypes C and D are associated with more serious liver injury and with a higher incidence of HCC than genotypes A and B [42–44]. In addition, patients infected with genotype C or D have a much lower rate of response to interferon therapy than those infected with genotype A or B [40, 45]. Moreover, subtle differences in the frequency and type of lamivudine-resistant variants occur between genotype A and D infections [15].
An evolutionary approach to HBV infection, based on the principles of natural selection, may offer an explanation for how modes of transmission may favor some genotypes and subgenotypes over others and influence HBV virulence.

The genetic diversity and codon usage patterns described here are helpful for understanding the processes of HBV evolution, especially the roles played by translation selection from the host and mutation pressure from the virus. Additionally, such information might help to clarify the roles of geographic and subtype factors in the process of HBV evolution.

Materials and methods

Sequence data

The 58 complete genome sequences of HBV were downloaded from the National Center for Biotechnology Information (NCBI) (http://www.ncbi.nlm.nih.gov/Genbank/), and detailed information about the viruses is listed in Table 3.

Table 3 The information of HBV strains in this study

The overall nucleotide composition (U%, A%, C% and G%) and the nucleotide composition at the third codon position (U3%, A3%, C3% and G3%) of the HBV coding sequence were calculated with the software DNAStar 7.0 for Windows.
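For readers without access to that software, the same quantities can be approximated with a short script. The sketch below is an illustration only and is not the DNAStar procedure; it assumes an in-frame coding sequence, counts third positions over codons that have synonymous alternatives (excluding AUG, UGG and stop codons), and computes GC12% over all sense codons. All function and variable names are hypothetical.

```python
from collections import Counter

STOP_CODONS = {"UAA", "UAG", "UGA"}
SINGLE_CODON = {"AUG", "UGG"}  # Met and Trp have no synonymous alternatives


def percentages(bases):
    """Percentage of U, C, A and G among the given bases."""
    counts = Counter(bases)
    total = sum(counts[b] for b in "UCAG")
    return {b: 100.0 * counts[b] / total for b in "UCAG"}


def composition(cds):
    """Overall and third-position base composition, plus GC12% and GC3%."""
    seq = cds.upper().replace("T", "U")  # accept DNA or RNA notation
    codons = [seq[i:i + 3] for i in range(0, len(seq) - len(seq) % 3, 3)]
    sense = [c for c in codons if c not in STOP_CODONS]
    synonymous = [c for c in sense if c not in SINGLE_CODON]

    overall = percentages(seq)
    third = percentages(c[2] for c in synonymous)
    first_two = percentages("".join(c[:2] for c in sense))
    return {
        "overall": overall,
        "third": third,
        "GC12": first_two["G"] + first_two["C"],
        "GC3": third["G"] + third["C"],
    }
```

For example, `composition("ATGGCTGCCTTTTGA")["GC12"]` evaluates the first and second positions of the four sense codons AUG, GCU, GCC and UUU.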

The calculation of the relative synonymous codon usage (RSCU)

The RSCU values for the coding sequences of all 58 HBV strains were calculated as previously described [46]. RSCU values do not depend on amino acid composition or on the size of the coding sequence, because both factors are eliminated in the calculation. An RSCU value of 1.0 means that the codon is chosen equally and randomly. An RSCU value greater than 1.0 or less than 1.0 indicates that the codon is used more or less frequently than expected, respectively. Synonymous codons with RSCU values greater than 1.6 were regarded as over-represented, while those with RSCU values less than 0.6 were regarded as under-represented [47].
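As an illustration of this calculation, RSCU for a codon is its observed count divided by the count expected if all synonymous codons for that amino acid were used equally. The Python sketch below assumes the standard genetic code and an in-frame coding sequence; it is a minimal example with hypothetical function names, not the implementation used in this study.

```python
from collections import Counter
from itertools import product

# Standard genetic code, with codons enumerated in U, C, A, G order.
BASES = "UCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def rscu(cds):
    """RSCU for the 59 codons that have synonymous alternatives."""
    seq = cds.upper().replace("T", "U")
    counts = Counter(seq[i:i + 3] for i in range(0, len(seq) - len(seq) % 3, 3))

    # Group codons into synonymous families, skipping stops, Met and Trp.
    families = {}
    for codon, aa in CODON_TABLE.items():
        if aa not in "*MW":
            families.setdefault(aa, []).append(codon)

    values = {}
    for family in families.values():
        total = sum(counts[c] for c in family)
        for c in family:
            # Observed count over the count expected under equal usage.
            values[c] = counts[c] * len(family) / total if total else float("nan")
    return values


def classify(value, high=1.6, low=0.6):
    """Label a codon using the thresholds stated in the text."""
    if value > high:
        return "over-represented"
    if value < low:
        return "under-represented"
    return "neutral"
```

Amino acids that do not occur in the sequence yield NaN here; how such cases are best handled is a design choice left open in this sketch.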

Analysis of codon usage bias

The 'effective number of codons' (ENC), a useful estimator of absolute codon usage bias, was used to quantify the codon usage bias of the whole coding sequence of HBV. The ENC value ranges from 20 (when only one synonymous codon is used for each amino acid) to 61 (when all synonymous codons are used equally) [48].
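A minimal Python sketch of an ENC calculation is given below. It assumes Wright's homozygosity-based formulation, ENC = 2 + 9/F2 + 1/F3 + 5/F4 + 3/F6, where F is averaged over amino acids of the same degeneracy class; whether the software used in this study applies exactly these conventions is not stated, so the details (skipping amino acids with fewer than two codons, capping at 61) should be read as assumptions.

```python
from collections import Counter
from itertools import product

# Standard genetic code, with codons enumerated in U, C, A, G order.
BASES = "UCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def enc(cds):
    """Effective number of codons for one in-frame coding sequence."""
    seq = cds.upper().replace("T", "U")
    counts = Counter(seq[i:i + 3] for i in range(0, len(seq) - len(seq) % 3, 3))

    families = {}
    for codon, aa in CODON_TABLE.items():
        if aa not in "*MW":
            families.setdefault(aa, []).append(codon)

    # Homozygosity F per amino acid, grouped by degeneracy class (2, 3, 4, 6).
    by_class = {2: [], 3: [], 4: [], 6: []}
    for family in families.values():
        n = sum(counts[c] for c in family)
        if n > 1:
            sum_p2 = sum((counts[c] / n) ** 2 for c in family)
            f = (n * sum_p2 - 1.0) / (n - 1.0)
            if f > 0:  # skip undefined contributions (assumption)
                by_class[len(family)].append(f)

    mean_f = {k: sum(v) / len(v) for k, v in by_class.items() if v}
    if 3 not in mean_f and 2 in mean_f and 4 in mean_f:
        # If Ile is unusable, approximate F3 by the mean of F2 and F4.
        mean_f[3] = (mean_f[2] + mean_f[4]) / 2.0
    if set(mean_f) != {2, 3, 4, 6}:
        raise ValueError("sequence too short or too sparse to estimate ENC")

    value = 2.0 + 9.0 / mean_f[2] + 1.0 / mean_f[3] + 5.0 / mean_f[4] + 3.0 / mean_f[6]
    return min(value, 61.0)
```

A sequence in which every amino acid is always encoded by the same codon gives the lower bound of 20 described above.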

Principal component analysis

Principal component analysis (PCA), a commonly used multivariate statistical method [24], was carried out to analyze the major trends in codon usage pattern among different strains of HBV. PCA involves a mathematical procedure that transforms a set of correlated variables (RSCU values) into a smaller number of uncorrelated variables called principal components. Each strain was represented as a 59-dimensional vector, in which each dimension corresponded to the RSCU value of one sense codon; only codons with synonymous alternatives were included, excluding AUG, UGG and the three stop codons.
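The idea can be sketched in a few lines of Python. The example below is not the SPSS procedure used in this study; it assumes PCA on the covariance matrix of the RSCU vectors and extracts the leading components by power iteration with deflation, which keeps the sketch free of external libraries. The function name and parameters are hypothetical.

```python
def pca(rows, n_components=2, iterations=500):
    """Leading principal components of a data matrix.

    rows: one equal-length list of values (e.g. RSCU values) per strain.
    Returns (components, explained), where explained holds the fraction
    of total variance accounted for by each component.
    """
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    cov = [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(d)]
           for i in range(d)]
    total_variance = sum(cov[i][i] for i in range(d))

    components, explained = [], []
    for _ in range(n_components):
        v = [1.0 / (i + 1) for i in range(d)]  # arbitrary deterministic start
        for _step in range(iterations):
            w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
            norm = sum(a * a for a in w) ** 0.5
            if norm == 0.0:
                break
            v = [a / norm for a in w]
        cv = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        eigenvalue = sum(a * b for a, b in zip(v, cv))
        components.append(v)
        explained.append(eigenvalue / total_variance)
        # Deflate: remove the variance captured by this component.
        cov = [[cov[i][j] - eigenvalue * v[i] * v[j] for j in range(d)]
               for i in range(d)]
    return components, explained
```

With one 59-dimensional RSCU vector per strain as input, the two values in `explained` correspond to the proportions of variation attributed to the first and second principal components.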

Correlation analysis

The relationship between the overall nucleotide composition (U%, A%, C% and G%) and the nucleotide composition at the third codon position (U3%, A3%, C3% and G3%) of the HBV coding sequence, and the relationship between U3%, A3%, C3%, G3% and the codon usage pattern of HBV, were evaluated by Spearman's rank correlation analysis.
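A rank correlation of this kind can be sketched as follows. The example assumes Spearman's coefficient, computed as the Pearson correlation of the ranks with tied values given their average rank; it does not compute a P value, and the function names are illustrative.

```python
def ranks(values):
    """Ranks starting at 1; tied values receive their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            result[order[k]] = average
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def spearman(x, y):
    """Spearman's rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))
```

For instance, `spearman(c_percent, c3_percent)` over all strains would relate overall C% to C3%, where both list names are placeholders for per-strain values.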

All statistical analyses were carried out with the statistical software SPSS 11.5 for Windows.

Author details

Experimental Center of Medicine, Lanzhou General Hospital, Lanzhou Military Area Command; Key lab of Stem cells and Gene Drugs of Gansu Province, Lanzhou 730000, China