Introduction

The burden of chronic infection due to hepatitis B virus (HBV) is currently estimated at around 250 million worldwide (WHO; fact-sheet no. 204, [1]), while deaths as a result of the chronic sequelae of the disease, such as cirrhosis, hepatocellular carcinoma (HCC) or liver failure, stand at 800,000 annually [2, 3].

Studies performed by Krugman and colleagues during the 1960s, particularly at the Willowbrook State School for mentally handicapped children, established the distinctive clinical, epidemiological, and immunological differences between infectious hepatitis and serum hepatitis [4, 5]. These two entities were soon revealed to be due to infection with hepatitis A virus (HAV) or HBV, respectively. During the same period, work by Blumberg and colleagues led to the description of the Australia antigen detected in the sera of patients with leukemia, Down’s Syndrome and hepatitis [6, 7]. The connection between the Australia antigen, or hepatitis B surface antigen (HBsAg) as it is now known, and HBV, became apparent by 1970 [8]. Electron microscopic studies in the early 1970s led to the visualisation of the infectious virion or Dane particle [9], while detergent treatment of such particles exposed the nucleocapsid core of the virus [10]. This was followed by the characterisation of the virus genome, the virion-associated proteins and the detailed definition of the serological profiles in acute and chronic HBV infection [10,11,12]. Seminal work in Taiwan by Beasley and co-workers established the connection between the virus and the development of HCC [13].

Another historical landmark in HBV research was the introduction of the plasma-derived vaccine in the early 1980s, following extensive evaluation first in chimpanzees and then in the gay community of New York [14]. Concerns about the human immunodeficiency virus which appeared during that time and the safety of plasma-derived products in general, led to the development of a recombinant vaccine, which since its introduction has effectively reduced the prevalence of HBV infection in many countries of the world where the virus is still endemic [15].

Finally, great strides have been made in the treatment of chronic HBV carriers through the use of immunomodulators such as pegylated interferon-alpha (PegIFNα), nucleoside analogues such as lamivudine (3TC), entecavir (Baraclude) and telbivudine (Tyzeka or Sebivo), or the nucleotide analogues adefovir dipivoxil (Hepsera) and tenofovir disoproxil fumarate (Viread) [16]. PegIFNα, given for a finite period of 1 year, acts by promoting lysis of infected hepatocytes through cytotoxic T cell action, while the cytokines released control viral replication. Nucleos(t)ide analogues, on the other hand, are given long term and act as chain terminators at the stage of DNA synthesis. Viral suppression is achieved in about 25% of those treated with PegIFNα, and is more profound with the newer nucleos(t)ide analogues, as they are more potent and have a high barrier to resistance [17].

Absence of a cell culture system for the propagation of the virus impeded the study of its life cycle and molecular biology for many years. This, however, changed in the early 1980s with the advent of genetic engineering techniques which made possible the cloning of the viral genome and its sequencing, investigation of the function of its proteins and unravelling the unique and fascinating mechanism of its replication strategy. The animal models of chimpanzees in the early days, and those of ducks and woodchucks, and more recently chimeric mice, have allowed the study of aspects of the life cycle, molecular biology and immune pathogenesis of the virus, and facilitated vaccine development and antiviral testing. What follows is a synopsis of the findings of many studies carried out using the above-described approaches, which have greatly improved our understanding of this very important pathogen.

Virological perspectives

Classification

The Hepadnaviridae family, with HBV as the prototype, is comprised of a group of hepatotropic DNA viruses which on the whole are species-specific and are divided into two genera. The Orthohepadnavirus genus includes members that infect mammals (woodchucks, ground squirrels, bats and primates) and have about 70% nucleotide homology between them. The Avihepadnavirus genus infects birds such as ducks (duck hepatitis B virus, DHBV), herons, storks, geese and parrots with about 80% homology between them. The homology, however, between the two genera is around 40%, but nevertheless they share a common genomic organisation [18].

Nucleotide sequencing studies of human HBV isolates from around the world have established, based on sequence divergence of > 8%, eight genotypes designated A–H, with characteristic geographical distribution. Genotypes A and D are frequently found in Africa, Europe, and India, genotypes B and C in Asia, genotype E is restricted to West Africa, and genotype F is found in Central and South America. The distribution of genotypes G and H is less clear, but isolates have been reported from Central America and southern Europe [18, 19]. Two possible new genotypes have also been described: genotype I, from Vietnam, Laos and Eastern India, appears to be an intergenotypic recombinant between genotypes A, C, and G [20], and genotype J, isolated from a Japanese man who lived in Borneo, appears to be a recombinant between genotype C and gibbon HBV [21]. Genotypes A, B, C, D, F and I can be further subdivided based on nucleotide divergence of 4% into at least 44 sub-genotypes: A1–6, B1–9, C1–16, D1–7, F1–4 and I1–2 [19, 22]. In addition, other than the above-mentioned recombinants, B/C and C/D recombinants represent the majority of such isolates, while other intergenotypic recombinants that occur less frequently involve most of the other genotypes [23].
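The divergence thresholds quoted above can be illustrated with a minimal sketch. It assumes that the two sequences have already been aligned to equal length, that gap positions are simply skipped, and that the 4% sub-genotype cut-off is applied as "at least 4%"; these choices, and all function and constant names, are illustrative assumptions and not the procedure used in the cited studies.

```python
GENOTYPE_THRESHOLD = 0.08      # "> 8%" divergence separates genotypes (from the text)
SUBGENOTYPE_THRESHOLD = 0.04   # 4% divergence separates sub-genotypes (from the text)

# Number of sub-genotypes per genotype as listed in the text (A1-6, B1-9, ...).
SUBGENOTYPE_COUNTS = {"A": 6, "B": 9, "C": 16, "D": 7, "F": 4, "I": 2}


def divergence(seq_a: str, seq_b: str) -> float:
    """Fraction of mismatched positions between two pre-aligned sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    mismatches = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == "-" or b == "-":   # assumption: alignment gaps are ignored
            continue
        compared += 1
        if a != b:
            mismatches += 1
    if compared == 0:
        raise ValueError("no comparable positions")
    return mismatches / compared


def relationship(seq_a: str, seq_b: str) -> str:
    """Classify a pair of isolates using the two thresholds above."""
    d = divergence(seq_a, seq_b)
    if d > GENOTYPE_THRESHOLD:
        return "different genotypes"
    if d >= SUBGENOTYPE_THRESHOLD:  # assumption: cut-off treated as inclusive
        return "same genotype, different sub-genotypes"
    return "same sub-genotype"


if __name__ == "__main__":
    reference = "A" * 100                      # placeholder, not a real HBV sequence
    print(relationship(reference, "C" * 10 + "A" * 90))
    print(sum(SUBGENOTYPE_COUNTS.values()))    # tally of the sub-genotypes listed
```

In practice, genotyping is carried out on full-length genomes with proper alignment and phylogenetic analysis; the sketch only shows how the two percentage cut-offs partition pairwise comparisons.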

Virion structure and genome organisation

The infectious virion or Dane particle of 42 nm in diameter is comprised of an outer envelope made of HBsAg in a lipid bilayer [9], wrapped around the nucleocapsid core of the virus, which in turn encloses the viral genome and a copy of its polymerase [9, 24]. In addition, there is an abundance of sub-viral particles circulating in serum consisting entirely of HBsAg and devoid of any nucleic acid containing cores. These are the 25-nm spheres and the 22-nm diameter filaments, which outnumber infectious virions by 100- to 10,000-fold [25].

The partially double-stranded DNA genome of the virus is a relaxed circle of 3.2 kb in length (rcDNA) [26, 27]. In view of this, it is the smallest among DNA viruses and one of the most compact as the 4 open reading frames (ORFs) that the genome contains are either wholly or partially overlapping. Thus, every nucleotide of the genome forms part of an ORF. In addition, all regulatory elements such as the two enhancers (Enh1, Enh2), the four promoters (core, S1, S2 and X), polyadenylation, encapsidation (ε) and replication (DR1, DR2) signals lie within these ORFs (Fig. 1).

Fig. 1

The cccDNA, depicted here in linear outline, is the transcriptionally active form of HBV. The enhancers and promoters involved in transcript synthesis are also shown as are the transcripts themselves, the ORFs which they contain and the proteins which they encode, their lengths and their co-terminal nature at a common polyadenylation signal (An). Modified from Baltayiannis and Karayiannis [28]

RNA transcription and protein translation

The four ORFs are those of the surface (PreS/S), core (C), polymerase (P) and X genes, which encode in total 7 proteins translated from six co-terminal, unspliced and capped mRNAs ending at a common polyadenylation signal, which is situated in the core ORF (Fig. 1, [29]). The aforementioned promoters and enhancers direct the synthesis of the mRNA transcripts through the recruitment of transcription factors which are particularly enriched in hepatocytes (reviewed in [30]). This in part also explains the liver tropism of the virus.

The core promoter is responsible for the synthesis of two longer than genome length mRNAs (3.5 kb) which differ with respect to the start of their 5′ end. The longer of the two, by a small number of ribonucleotides, is the precore mRNA containing the initiation codon for synthesis of the precore protein, which is the precursor for hepatitis B e antigen (HBeAg). The protein undergoes proteolytic processing at its N-terminus for the removal of a signal peptide of 19 amino-acids in length, which targets the protein to the ER, and a furin cleavage for the removal of an arginine-rich domain at its C-terminus [31,32,33]. What is left is the 15-kD HBeAg, a secreted protein that is dispensable for replication and appears to act as a tolerogen in newborns and chronic infection [34]. The other transcript is the pregenomic RNA (pgRNA), which is bicistronic in nature and encodes the core (21 kD) or hepatitis B core antigen (HBcAg) and the polymerase (90 kD) proteins. The pgRNA in addition constitutes the template for reverse transcription during the replication of the viral genome, as explained below. The core has the capacity to dimerise and form nucleocapsids by self-assembly consisting of 240 copies (120 dimers) of the protein [35], while the polymerase is a multifunctional protein which acts as a reverse transcriptase (rt) and as a DNA polymerase, and also has RNase H activity [36]. Production of these two proteins is regulated so as to favour the generation of the 240 core molecules required for nucleocapsid formation per single molecule of polymerase packaged with the pgRNA.

The Pre-S/S ORF of the genome encodes the three envelope glycoproteins produced by differential initiation of translation at each of three in-frame initiation codons. These are known as the large (L), middle (M) and small (S) HBsAgs, with the latter being the most abundant constituent of the viral envelope. Two transcripts of 2.4 and 2.1 kb are involved, the synthesis of which is controlled by the S1 and S2 promoters respectively. L-HBsAg is translated from the 2.4 kb transcript while the M- and S-HBsAgs are translated from the 2.1 kb transcript, the latter through leaky ribosome scanning. The S-HBsAg is thus the smallest of the three co-terminal proteins and its 226 amino-acids are shared by the other two at their C-terminus. The M protein has an additional 55 amino-acids at its N-terminus encoded by the Pre-S2 region, while the L protein includes in addition another 107–118 amino-acids (depending on genotype) from the Pre-S1 region [29]. The first 48 N-terminal amino-acids of Pre-S1 were originally thought (see later) to contain the region responsible for attachment of the virus to its hepatocyte receptor [37, 38]. Moreover, myristylation of L-HBsAg appears to be essential for infectivity [39]. All three proteins are glycosylated, while the L and S proteins may also be present in an unglycosylated form in particles. They are synthesised at the ER and maintain a transmembrane configuration that enables budding of the virus through the ER during maturation [40].
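Because the three envelope proteins share a common C-terminus, their lengths follow by simple addition of the domain sizes quoted above. The short sketch below works through that arithmetic using only the residue counts given in the text; the function and constant names are illustrative assumptions, and published lengths for individual genotypes should be taken from the primary literature.

```python
S_LENGTH = 226               # amino-acids in S-HBsAg (from the text)
PRE_S2_LENGTH = 55           # extra N-terminal residues in M-HBsAg (from the text)
PRE_S1_RANGE = (107, 118)    # genotype-dependent Pre-S1 contribution (from the text)


def envelope_lengths(pre_s1_length: int) -> dict:
    """Return the lengths of S, M and L for a given Pre-S1 length."""
    low, high = PRE_S1_RANGE
    if not low <= pre_s1_length <= high:
        raise ValueError(f"Pre-S1 length expected between {low} and {high}")
    m_length = S_LENGTH + PRE_S2_LENGTH      # M = Pre-S2 + S
    l_length = m_length + pre_s1_length      # L = Pre-S1 + Pre-S2 + S
    return {"S": S_LENGTH, "M": m_length, "L": l_length}


if __name__ == "__main__":
    for pre_s1 in PRE_S1_RANGE:
        print(pre_s1, envelope_lengths(pre_s1))
```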

The envelopes of the Dane particles and of the two types of sub-viral particles contain all three HBsAgs, but their relative ratios are not identical. The S protein represents the majority in Dane particles with equal amounts of the M and L proteins, whereas the spheres contain mainly S and M proteins, with trace amounts of the L protein. Filaments have S-HBsAg as the majority protein with M and L proteins being present in equal amounts, which, however, are not as high as those in the Dane particle [41].

The fourth and smallest ORF encodes the 17 kD HBx protein, which is translated from the shortest transcript of 0.7 kb in length. It consists of 154 amino-acids and appears to modulate host-cell signal transduction, acts as a gene transactivator under experimental conditions, can activate transcription factors, and is therefore implicated in binding the covalently closed circular DNA (cccDNA) minichromosome (reviewed in [42, 43]).
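The relationship between promoters, transcripts and proteins described in this section can be summarised as a small lookup table. The sketch below is an illustrative data structure only: the transcripts are grouped by the size classes named in the text, which gives five rows, and the field names are assumptions rather than standard nomenclature.

```python
# Each entry: transcript name, approximate size, driving promoter, encoded proteins.
TRANSCRIPTS = [
    {"name": "precore mRNA", "size_kb": 3.5, "promoter": "core",
     "proteins": ["precore protein (HBeAg precursor)"]},
    {"name": "pgRNA", "size_kb": 3.5, "promoter": "core",
     "proteins": ["core (HBcAg)", "polymerase"]},
    {"name": "Pre-S1 mRNA", "size_kb": 2.4, "promoter": "S1",
     "proteins": ["L-HBsAg"]},
    {"name": "Pre-S2/S mRNA", "size_kb": 2.1, "promoter": "S2",
     "proteins": ["M-HBsAg", "S-HBsAg"]},
    {"name": "X mRNA", "size_kb": 0.7, "promoter": "X",
     "proteins": ["HBx"]},
]


def proteins_encoded(transcripts):
    """Flatten the table into a list of all encoded proteins."""
    return [protein for t in transcripts for protein in t["proteins"]]


def transcripts_from(promoter, transcripts=TRANSCRIPTS):
    """Names of the transcripts driven by a given promoter."""
    return [t["name"] for t in transcripts if t["promoter"] == promoter]


if __name__ == "__main__":
    print(len(proteins_encoded(TRANSCRIPTS)), "proteins in total")
    print(transcripts_from("core"))
```

The flattened list contains the seven proteins mentioned at the start of the section, and the two bicistronic or dual-product transcripts (pgRNA and the 2.1 kb class) account for the difference between the number of rows and the number of proteins.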

Viral life cycle

Attachment

It is now clear that species-specificity and hepatotropism are determined by the requirement for transcription factors enriched in hepatocytes, as mentioned above, and by the expression on human hepatocytes of the recently described sodium taurocholate co-transporting polypeptide (NTCP), which constitutes the HBV receptor (Fig. 2a, [44]). NTCP is a bile acid transporter expressed at the basolateral membrane of hepatocytes. The receptor binds the N-terminal end of L-HBsAg as described above, and in fact the region involved may include the first 75 amino-acids [45]. In addition, subsequent studies have indicated that heparan sulfate proteoglycans may be involved in the initial stages of binding [46], as well as glypican 5 [47], thus suggesting co-operative binding in the process of attachment and uptake.

Fig. 2

Diagrammatic representation of hepatocyte infection with HBV. a The various stages of the life cycle of the virus from attachment to release, as explained in the text and numbered as: 1 attachment; 2 endocytosis; 3 capsid release; 4 rcDNA entry into the nucleus; 5 cccDNA synthesis; 6 transcription; 7 mRNA transfer to the cytoplasm; 8 encapsidation; 9 (−)-DNA strand synthesis by reverse transcription; 10 (+)-DNA strand synthesis; 11 budding of virions into the ER lumen; 12 virus release through multivesicular body (MVB) transfer to hepatocyte surface. b Basolateral release of the virus for cell-to-cell spread. Part modified from Baltayiannis and Karayiannis [28]

Penetration and uncoating

Evidence suggests that, following binding, the virion is internalised through clathrin-mediated endocytosis [48]. However, information on the removal of the viral envelope and the trafficking of the nucleocapsid to the nuclear pores is still lacking. Transport factors such as importin alpha and beta and component nucleoporin 153 ensure nucleocapsid delivery to the nuclear basket [49, 50]. Disassembly of nucleocapsids ensues leading to the release into the nucleoplasm of the rcDNA genome of the virus with its covalently attached polymerase.

Transcription/translation

The rcDNA is converted into the cccDNA form in a process involving a number of stages whereby the viral polymerase covalently attached to the 5′ end of the negative (−)-DNA strand and the short RNA oligomer from the 5′ end of the positive (+)-DNA strand which is used to prime (+)-DNA strand synthesis are removed, the variable positive strand is completed and finally the ends of the two now complete strands are ligated together [29, 51]. This process likely involves the use of specific cellular factors that are presently unknown. In this form, cccDNA is quite stable and behaves as a minichromosome. Moreover, a number of epigenetic factors which are recruited onto the cccDNA, such as histones H3 and H4, transcription factors that include CREB, ATF, STAT1 and STAT2, chromatin-modifying enzymes, histone acetyltransferases and deacetylases, and the HBc and HBx proteins [52,53,54] appear to modulate cccDNA transcriptional activity. Thus, the cccDNA constitutes the template for viral transcript synthesis by the host RNA polymerase II. The synthesised transcripts are then exported to the cytoplasm where they are translated into the various viral proteins described above.

Encapsidation

The pgRNA, in addition to being the transcript for core and polymerase synthesis, also serves as the template for DNA synthesis by reverse transcription in the first instance. Being longer than genome length, it has a terminal redundancy which is the result of readthrough, passing the start of the transcript synthesis by about 120 nt, terminating with the poly-A tail. This redundancy contains a second copy each of the direct repeat 1 (DR1) and the encapsidation signal ε, a secondary RNA structure that encompasses the precore nucleotide sequence [55, 56].
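The origin of the terminal redundancy can be illustrated with a toy model of transcription around a circular template. The sketch below assumes a placeholder genome made of filler bases with a single hypothetical marker motif standing in for DR1/ε; neither the sequence nor the motif is real HBV sequence, and the poly-A tail is not modelled. Reading once around the circle and then about 120 nt past the start duplicates any element lying within that first stretch.

```python
GENOME_LENGTH = 3200   # approximate genome size quoted in the text (3.2 kb)
READTHROUGH = 120      # approximate readthrough past the transcription start (from the text)
MARKER = "CGTCG"       # hypothetical stand-in for DR1/epsilon, NOT a real HBV motif


def build_toy_genome(marker_offset: int = 50) -> str:
    """Placeholder circular genome: filler 'A's with one marker at marker_offset."""
    filler_after = GENOME_LENGTH - marker_offset - len(MARKER)
    return "A" * marker_offset + MARKER + "A" * filler_after


def transcribe(circular_genome: str, start: int = 0,
               readthrough: int = READTHROUGH) -> str:
    """Read one full circle from `start`, then continue for `readthrough` bases."""
    n = len(circular_genome)
    return "".join(circular_genome[(start + i) % n] for i in range(n + readthrough))


if __name__ == "__main__":
    genome = build_toy_genome()
    transcript = transcribe(genome)
    print(len(transcript))            # genome length plus readthrough
    print(transcript.count(MARKER))   # copies of the marker in the transcript
```

A marker placed beyond the readthrough window (for example at offset 200) appears only once in the transcript, which is why only elements close to the transcription start, such as DR1 and ε, are present in two copies.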

Encapsidation of the pgRNA into the nucleocapsid is the next step in the virus life cycle and involves a series of events employing both viral and host factors. The polymerase has three functional domains, which are involved in DNA priming (terminal protein), reverse transcription (rt) and pgRNA degradation (RNase H), respectively. The terminal protein is separated from the rt domain by a spacer region of unknown function. The polymerase engages the ε at the 5′ end of the pgRNA, a process that triggers encapsidation of the complex by the core protein (Fig. 3). It appears that the cap structure is also involved in this process [58], as well as eIF4E and heat shock proteins, which are thought to be instrumental in aiding encapsidation, stabilisation and activation of the polymerase [59, 60]. The C-terminus of the core protein, which as mentioned before is arginine-rich and in which certain amino-acids are phosphorylated, is involved in pgRNA binding, thus facilitating encapsidation. The core is also involved in reverse transcription initiation and nucleocapsid envelopment, and fulfils other roles, as recently reviewed [61].

Fig. 3

Replication strategy of HBV. a Primer synthesis; b translocation and binding to DR1; c synthesis of the (−)-DNA strand with concurrent degradation of the pgRNA; d circularisation of the (−)-DNA strand through covalent attachment of the polymerase at its 5′ end and preservation of the RNA primer (DR1) from the 5′ end of the pgRNA; e hybridisation of the DR1 RNA primer with DR2 at the 5′ end of the (−)-DNA strand and extension for (+)-DNA strand synthesis. Modified from Karayiannis [57]

The subsequent steps in virus nucleic acid synthesis then take place within the nucleocapsid. The ε encapsidation structure consists of a lower stem, an upper stem, a side bulge and an apical loop, formed through base pairing of palindromic nucleotide sequences. Part of the side bulge of ε serves as a template for the synthesis of a 4-nucleotide-long DNA primer [62], covalently attached to the terminal protein domain of the polymerase through a phosphodiester linkage between dTTP and the hydroxyl group of a tyrosine residue in the terminal protein (position 63) [63, 64]. The polymerase–primer complex next translocates to the 3′ end of the pgRNA, where it hybridises with part of the DR1 region with which the primer is homologous. This translocation event to the correct site, i.e. DR1 at the 3′ end of the pgRNA, is likely assisted by two other elements, ϕ and ω (Fig. 3), which interact with ε [65, 66]. (−)-DNA strand synthesis is thus initiated by reverse transcription as the complex advances towards the 5′ end of the pgRNA, generating a (−)-DNA strand with a terminal redundancy of about 10 nucleotides. The RNA template is concurrently degraded by the RNase H activity of the polymerase, except for the final 11–16 or so ribonucleotides which encompass the DR1 region at the 5′ end of the pgRNA. This ribonucleotide fragment serves as the primer for (+)-DNA strand synthesis [67]. It hybridises with the homologous DR2 at the 5′ end of the newly synthesised (−)-DNA strand, necessitating a second translocation event. (+)-DNA strand synthesis then proceeds to the 5′ end of the (−)-DNA strand. Circularisation, facilitated by the short terminal redundancy of the (−)-DNA strand, allows template exchange and continuation of (+)-DNA strand synthesis along the 3′ end of the (−)-DNA strand [68]. Synthesis of both DNA strands occurs within the nucleocapsid and is facilitated by pores in the capsid that allow passage of nucleotides. However, once the maturing nucleocapsid is enveloped by budding through the endoplasmic reticulum membrane, the nucleotide pool within the capsid is depleted, leaving an incomplete (+)-DNA strand, hence the partially double-stranded nature of the HBV genome [69].

Maturation

Mature nucleocapsids, that is, those containing the newly synthesised partially double-stranded rcDNA with the polymerase still bound to the 5′ end of the (−)-DNA strand, can follow one of two pathways. Early in infection and until sufficient amounts of HBsAg accumulate, nucleocapsids are shuttled back to the nucleus in order to replenish the cccDNA pool [52, 70]. In the final stages of morphogenesis, virions bud through the ER membrane, where HBsAg proteins are already localised, into the lumen, acquiring in the process their outer envelope [71]. A crucial determinant of these processes is the topology and conformational arrangement of L-HBsAg. Studies have established that about half of the protein lies with its N-terminus on the cytosolic side of the ER, whence it favors binding to the nucleocapsid and therefore budding. L-HBsAg with its N-terminus in the lumen allows its exposure on the surface of the virion and therefore easy access to the NTCP receptor during binding to hepatocytes [41].

Egress

It was generally assumed that, as HBsAg proteins accumulate in the ER-Golgi intermediate compartment, virions would accumulate in the lumen and follow the secretory pathway during cell exit, akin to the route that sub-viral particles (SVPs) follow. It is now known that virions exit by using a different pathway which relies on proteins associated with the endosomal sorting complex required for transport, which form multivesicular bodies [72].

Infection through cell-to-cell spread

The life cycle described above is initiated following receptor-mediated attachment of a cell-free virus which moves from the sinusoidal lumen filled with blood into the Space of Disse. In view of the large size of the liver, the mode of viral spread must be efficient and of course dependent on the size of the inoculum, both of which determine the length of the incubation period. Clusters of virus-infected cells are frequently observed following immunohistochemical staining of liver sections from patients and experimentally infected animals [73]. Such observations suggest that HBV may also spread through cell-to-cell infection. Thus, movement of infectious virions to remotely situated hepatocytes, which is possible only through their secretion into the extracellular milieu, may not be the only means of virus spread.

It is generally accepted that HBV is not directly cytopathic, but that the histological pathology observed is the result of activation of the adaptive immune response and, in particular, the production of virus-specific cytotoxic T-cells [74,75,76]. Lysis of infected hepatocytes, evidenced by a rise in transaminase levels during acute infection or persistently raised levels in chronic infection, entails their replacement through regeneration. Hepatocytes, although terminally differentiated, retain their capacity for substantial proliferation in response to liver injury. Their normal lifespan is longer than 6 months [77]. Replacement of lost hepatocytes may also occur through stem cell differentiation, as these cells may be resident in the region of portal tracts [78]. These events allow the liver to maintain its mass, but at the same time have implications when one considers which determinants are needed to facilitate cell-to-cell spread of HBV.

As described above, chronic HBV infection is maintained by the presence of the cccDNA minichromosome, which persists throughout the lifespan of the infected hepatocyte. During liver regeneration as a result of immune-mediated hepatocyte lysis and hepatocyte proliferation to compensate for this, cell division may result in cccDNA decline and generation of cccDNA-free cells. Indeed, a recent study using the urokinase-type plasminogen activator/severe combined immune-deficiency mouse model with transplanted tupaia hepatocytes infected with woolly monkey HBV indicated that, although there was increased hepatocyte proliferation, this was accompanied by a 75% reduction in virion production, as well as reduced pgRNA synthesis and core protein production [79]. It appears, therefore, that during cell division the cccDNA pool is reduced through loss of cccDNA, and that only a fraction of daughter cells carry cccDNA. Therefore, this mechanism only accounts for limited cell-to-cell spread of infection.

Cell-to-cell spread employed by other viruses often involves complex inter-cellular adhesion and is safeguarded by the presence of the appropriate receptors; cellular polarity which determines whether virus is released into the extracellular compartment or basolaterally may contribute to pathogenicity, and finally intra-cellular trafficking (Fig. 2b, [80]). What is more, cell-to-cell spread is favored by viruses which are released from the infected cell through budding from the cell membrane or use an exocytic route such as that of multivesicular bodies described above for hepadnaviruses. This mode of virus spread/transmission avoids immune attack, in particular antibody-mediated blocking of receptor binding. In the case of HBV, experimental evidence suggests that cell-to-cell spread is favored by polarised release of virions [73, 81]. In fact, hepatocytes traffic and export HBV basolaterally by polarity-dependent mechanisms, which in the case of DHBV is associated with sphingolipid structures [82].

Exosomes are extracellular vesicles that originate from multivesicular bodies, 40–150 nm in diameter, which are produced by most cell types. They have been implicated in a number of processes, and, following their secretion into the extracellular space, they can mediate indirect cell-to-cell communication through the transfer of macromolecules, miRNAs and other RNAs, but also viruses [83]. Cell-to-cell spread of hepatitis C virus is well documented [84] and, indeed, exosomes appear to transmit the virus to hepatocytes in a receptor-independent manner [85].

The impact of cell-to-cell spread on HBV infection and persistence is not well understood. However, mathematical modelling has recently been employed in an attempt to shed light on these events and to study how potential actions of the immune system affect the outcome of acute HBV infection, namely clearance, non-clearance or fulminant hepatitis. Cell-to-cell transmission, although not impacting establishment of infection, appeared to hinder its clearance, and the model suggested that it might be a factor in causing fulminant hepatitis. The model showed that it is the combination of cell-to-cell transmission strength, cytokine production and the T cell clearance number that decides the fate of acute HBV infection [86]. Clearly, further work is needed to shed more light on the mechanisms involved in cell-to-cell spread of HBV.
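To give a sense of how such models are constructed, the following is a minimal sketch of a generic target-cell-limited model extended with a cell-to-cell transmission term. It is not the model of [86]: it omits cytokine production, T cell responses and hepatocyte regeneration, and all parameter names and values (beta, gamma, delta, p, c) are hypothetical and chosen only for illustration. Uninfected cells (T) become infected (I) either by free virus (V) or by direct contact with infected cells.

```python
def simulate(beta, gamma, delta=0.1, p=1.0, c=1.0,
             t0_cells=1.0, v0=0.01, days=50.0, dt=0.01):
    """Forward-Euler integration of a toy infection model (arbitrary units).

    dT/dt = -(beta*V + gamma*I) * T      infection by free virus and by contact
    dI/dt =  (beta*V + gamma*I) * T - delta*I
    dV/dt =  p*I - c*V
    All parameter values are hypothetical, not fitted to HBV data.
    """
    T, I, V = t0_cells, 0.0, v0
    history = [(0.0, T, I, V)]
    steps = int(round(days / dt))
    for k in range(1, steps + 1):
        new_infections = (beta * V + gamma * I) * T
        dT = -new_infections
        dI = new_infections - delta * I
        dV = p * I - c * V
        T, I, V = T + dt * dT, I + dt * dI, V + dt * dV
        history.append((k * dt, T, I, V))
    return history


if __name__ == "__main__":
    cell_free_only = simulate(beta=0.5, gamma=0.0)   # no cell-to-cell spread
    with_cell_to_cell = simulate(beta=0.5, gamma=1.0)
    print("uninfected fraction at end, cell-free only:", cell_free_only[-1][1])
    print("uninfected fraction at end, with cell-to-cell:", with_cell_to_cell[-1][1])
```

Setting gamma to zero recovers spread by cell-free virus alone, so comparing the two runs isolates the contribution of the cell-to-cell term under these simplifying assumptions. Any conclusion about clearance or fulminant hepatitis requires the immune components that this sketch deliberately leaves out.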

Concluding remarks

Our knowledge with regard to the replication strategy of HBV and its life cycle in general has increased significantly in recent years. There are still, however, missing pieces in this jigsaw, the completion of which requires a reliable and as near authentic as possible in vitro model of HBV infection. This will allow the finer dissection of the various steps involved in the replication process of the virus described above, including the initial interactions of the polymerase with the pgRNA template and subsequent DNA strand synthesis. Moreover, the successful expression of the polymerase by recombinant DNA technology and its crystallisation will allow studies on its multifunctional role and active conformation during replication, all of which may reveal possible targets for future antiviral design. Finally, such cell culture systems will elucidate and map in greater detail the steps involved in cell-to-cell spread of the virus.