Background

Adeno-associated viruses (AAV) are small (~26 nm), nonenveloped viruses belonging to the family Parvoviridae [1]. They have been widely used in gene therapies for hereditary diseases such as neuromuscular, neurodegenerative, ocular, hemophilic, and lysosomal storage disorders [2,3,4,5,6]. Compared with other gene delivery platforms, such as lipid nanoparticles (LNPs) and lentiviruses, AAV vectors offer a favorable safety profile, long-lasting gene expression, broad tissue tropism, and the ability to transduce nondividing cells [7,8,9,10].

The genome structure and the transduction cycle of AAV are schematically depicted in Fig. 1. AAV carries a single-stranded DNA (ssDNA) genome of approximately 4.7 kilobases (kb) that encodes the rep and cap genes, flanked by two inverted terminal repeats (ITRs) [1]. The cap gene is transcribed from the p40 promoter, and alternative splicing together with alternative start-codon usage gives rise to three viral proteins (VP1, VP2, and VP3) [11]. VP1 contains the entire VP2 sequence plus an additional N-terminal region of ~130 amino acids, and VP2 contains the entire VP3 sequence plus an additional N-terminal region of ~60 amino acids. Sixty VP copies, at an approximate VP1:VP2:VP3 ratio of 1:1:10, assemble into the characteristic icosahedral capsid [11]. The assembly-activating protein (AAP) is expressed from an alternative reading frame within the cap gene [12]. The rep gene encodes four nonstructural Rep proteins (Rep78, Rep68, Rep52, and Rep40) from the p5 and p19 promoters. The Rep proteins have endonuclease, DNA helicase, and ATPase activities that are essential for AAV DNA replication and packaging [13]. Each ITR folds into a T-shaped hairpin that serves as a primer for second-strand DNA synthesis in the nucleus [14]. For AAV-based gene therapy, the rep and cap genes are replaced with the therapeutic gene of interest [15].
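As a worked example of the stoichiometry described above, a nominal 1:1:10 VP1:VP2:VP3 ratio distributed over 60 subunits corresponds to about 5 VP1, 5 VP2, and 50 VP3 copies per capsid. The short Python sketch below does that arithmetic; the function name `vp_copies` is hypothetical, and the result is an idealized average rather than the measured composition of any individual capsid.

```python
from fractions import Fraction


def vp_copies(total_subunits: int, ratio: tuple[int, int, int]) -> dict[str, Fraction]:
    """Split a capsid's subunits among VP1, VP2, and VP3 in proportion to a nominal ratio.

    Exact fractions are used so that a non-integer split would remain visible
    instead of being silently rounded.
    """
    parts = sum(ratio)
    return {
        name: Fraction(total_subunits * r, parts)
        for name, r in zip(("VP1", "VP2", "VP3"), ratio)
    }


copies = vp_copies(60, (1, 1, 10))
print({name: int(n) for name, n in copies.items()})  # {'VP1': 5, 'VP2': 5, 'VP3': 50}
```

Because this split assumes the ratio holds exactly for every particle, it should be read as an average; real capsids can deviate from it, as discussed under capsid assembly.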

Fig. 1

Schematic diagram of the transduction life cycle of AAV virions. A Genome structure of AAV. A single-stranded DNA genome of ~4.7 kb is flanked by two inverted terminal repeats (ITRs). Multiple Rep and Cap proteins are expressed from two open reading frames through different promoters and alternative splicing. B Surface rendering of the crystal structure of AAV2. The icosahedral capsid displays key structural features including the fivefold channel, the threefold protrusion, and the twofold depression. C Model of cellular entry and trafficking of AAV vectors. Following binding to a receptor/co-receptor complex, AAV enters the target cell through endocytosis. Virions traffic through the endosomal system, including early and late endosomes, and the trans-Golgi network. A conformational change in the capsid exposes the phospholipase A2 (PLA2) domain, enabling endosomal escape and nuclear import via the nuclear pore complex (NPC). After nuclear import, intact capsids accumulate in the nucleolus, followed by genome release in the nucleoplasm. The single-stranded DNA is converted to double-stranded DNA, primed by the ITR and extended by host-cell DNA polymerase. The rep and cap genes are then transcribed, and new virion particles are assembled. The capsid can also be neutralized by antibodies in the extracellular space or degraded by the ubiquitin–proteasome system in the cytosol

The highly symmetric geometry of the viral capsid might suggest a simple, static role in enclosing the genome. In fact, the capsid is a dynamic entity whose structural rearrangements and protein interactions enable a range of functions throughout the viral life cycle. These include cell surface receptor recognition, endosomal trafficking and escape, nuclear entry, genome uncoating, viral replication and assembly, and immune neutralization and proteasomal degradation [16, 17]. In this brief review, we summarize and discuss the current literature on the structural characterization of the capsid during each phase. A structure–function understanding of the capsid proteins will aid the development of AAV vectors with improved efficiency for gene therapy.

Discussion

Overall capsid structure

The capsid structures of multiple AAV serotypes have been solved by X-ray crystallography and cryogenic electron microscopy (cEM) [18,19,20,21,22,23,24]. These studies reveal features that are highly conserved across serotypes, including a core eight-stranded antiparallel β-barrel and connecting loops. The 60 copies of the monomeric VPs assemble into the icosahedral shell through two-, three-, and fivefold symmetry-related interactions. These interactions define the surface characteristics of AAV, including depressions at the twofold (2F) axes, protrusions at the threefold (3F) axes, and cylindrical channels at the fivefold (5F) axes (Fig. 1B). The connecting loops vary in length from a few to ~200 amino acids. Variable regions in the surface loops confer distinct cellular tropisms on different serotypes. Notably, only the conserved C-terminal VP residues are resolved in the solved AAV structures, whereas the N-terminal sequences are not observed. The lack of defined N-terminal structure is likely due to the low copy numbers of VP1 and VP2 in the icosahedral assembly as well as their conformational flexibility. Interestingly, cEM studies of empty AAV2 capsids identified fuzzy globules of density inside the capsid that were postulated to be the N-terminal regions of VP1 and VP2 [25]. These globules were absent in the structure of a mutant AAV2 in which the N-terminal regions of VP1 and VP2 were deleted [26]. In addition, a nucleotide-binding pocket occupied by an ordered nucleotide has been observed in the crystal structures of several serotypes, indicating structural conservation of the genome-anchoring region [20, 27].
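The arithmetic behind these symmetry axes can be made explicit. AAV forms a T = 1 icosahedral shell, so each VP occupies one asymmetric unit that touches one fivefold, one threefold, and one twofold position, and n subunits meet at each n-fold position. The number of positions of each type is therefore 60/n. The Python sketch below (the function name `axis_positions` is hypothetical) computes these counts and checks them against the vertices, faces, and edges of an icosahedron.

```python
def axis_positions(subunits: int = 60) -> dict[int, int]:
    """Count symmetry positions for each n-fold axis type of an icosahedral shell.

    Assumes one subunit per asymmetric unit (T = 1), so each subunit contacts
    one fivefold, one threefold, and one twofold position, and n subunits
    meet at every n-fold position.
    """
    return {fold: subunits // fold for fold in (5, 3, 2)}


counts = axis_positions()
vertices, faces, edges = counts[5], counts[3], counts[2]

# Sanity check with Euler's formula for a convex polyhedron: V - E + F = 2
assert vertices - edges + faces == 2
print(counts)  # {5: 12, 3: 20, 2: 30}
```

The 12 fivefold positions correspond to the 5F channels, the 20 threefold positions to the 3F protrusions, and the 30 twofold positions to the 2F depressions described above.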

Capsid receptor complex structure

AAV transduction is initiated by attachment to cell-surface glycan receptors. The primary receptors vary among AAV serotypes and include α2,3/α2,6 N-linked sialic acid (SIA) (AAV1), heparan sulfate proteoglycan (HS) (AAV2, AAV3, and AAV6), α2,3 O-linked SIA (AAV4), α2,3 N-linked SIA (AAV5 and AAV6), and galactose (AAV9) [28]. The structures of AAV in complex with glycan receptors have been solved by crystallography and cEM [29,30,31]. These studies reveal that the 3F protrusion is the primary glycan-binding site across serotypes. The HS-binding site of AAV2 maps to basic residues on the 3F protrusions. Additional conformational changes were observed at the 2F and 5F axes, suggesting that the capsid rearranges during cellular entry. The SIA-binding site of AAV1 is likewise located at the base of the 3F protrusion. For AAV5, two SIA molecules interact with the depression at the 3F axis and with the surface loop close to the 5F axis. AAV-DJ binds a heparinoid pentasaccharide on the right side of the 3F spike.

Importantly, recent studies identified a universal AAV co-receptor (AAVR) that is required for AAV cellular entry in addition to the primary glycan receptors [32]. AAVR is a transmembrane protein comprising five tandem polycystic kidney disease-like (PKD) domains (PKD1–PKD5). The structures of AAV in complex with AAVR have been solved by cEM for multiple serotypes [33,34,35]. AAV1, AAV2, and AAV9 bind to the PKD2 domain of AAVR, whereas AAV5 binds to the PKD1 domain. PKD2 sits on the plateau of the AAV2 capsid and interacts with the inner facet of the 3F spike. In contrast, PKD1 contacts the outer rim of the 3F spike on the AAV5 capsid. The high-resolution structures reveal no long-range conformational change in AAVR upon AAV binding. Cryogenic electron tomography (cET) was applied to probe the conformations of the AAVR domains that do not engage the capsid [36]. For AAV5, the nonbinding PKD2 domain adopts three configurations extending away from the virus. For AAV2, the nonbinding PKD1 domain adopts four configurations, all different from the AAV5-bound PKD1 conformation. Unlike other serotypes, AAV4 is unable to interact with AAVR, and its cellular co-receptor remains elusive. Comparison of the AAV4 structure with those of AAV1, AAV2, and AAV5 in complex with AAVR explains the lack of AAVR affinity of the AAV4 clade [37].

Capsid antibody complex structure

The clinical application of AAV as a gene therapy vector has been hindered by neutralizing antibodies (Abs) elicited by the humoral immune response, which often limit AAV therapeutics to a single administration [8, 38]. cEM followed by 3D image reconstruction has been used to define the epitopes of neutralizing antibodies against AAV1, AAV2, AAV5, and AAV8 [39,40,41,42]. These studies reveal substantially conserved epitope footprints on the capsid surface across serotypes despite variations in amino acid sequence. Ab A20 binds AAV2 on the plateau between the 2F and 5F axes and on the floor surrounding the 5F channel. Ab C37-B binds the 3F protrusions of AAV2, overlapping the HS receptor-binding site. Abs 4E4 and 5H7 bind the 3F protrusions of AAV1. Ab ADK8, which targets AAV8, also binds the 3F protrusion. The structure of AAV5 in complex with four Abs has been reported, including two neutralizing Abs, ADK5b and H2476, and two non-neutralizing Abs, ADK5a and 3C5. The epitopes of ADK5a and 3C5 partially overlap the PKD1 footprint at the floor of the 2F depression. The footprint of H2476 lies near the SIA-binding site close to the 3F protrusion, whereas the epitope of ADK5b resides near the 5F channel. These data provide a structural basis for neutralization, including steric interference with receptor binding as well as interference with steps after cell attachment but before nuclear entry.

Endosome trafficking and escape

After AAV enters the host cell via receptor-mediated endocytosis, it traffics through the endosomal system before escaping into the cytosol. At the acidic pH of the late endosome, the capsid undergoes a conformational shift that exposes the N-terminal unique region of VP1 (VP1u). The phospholipase A2 (PLA2) activity of this N-terminal region disrupts the lipid membrane, leading to endosomal escape of the viral particle [43, 44]. Capsid structures at the different pH conditions found along the endosomal pathway have been investigated by a variety of biophysical methods. The structure of AAV2 was determined by cEM and 3D image reconstruction at pH values encountered in the extracellular space, early endosome, late endosome, and lysosome [45]. The study reveals conformational rearrangements of the variable loops as the pH drops, whereas the core structure remains unchanged. Differential scanning calorimetry (DSC) and small-angle scattering (SAS) analyses also indicate mild structural changes in response to the pH shift. Small-angle neutron scattering (SANS) analysis confirmed a rearrangement of the packaged genome that accompanies the capsid structural changes. Circular dichroism (CD) showed that VP1u is ordered in solution and predominantly α-helical, with a gradual loss of secondary structure at increasing temperature and/or decreasing pH [46]. When the pH was restored to 7.5, the secondary structure returned to its original state. The crystal structures of AAV8 packaging a green fluorescent protein (GFP) transgene have been determined at pH 6.0, 5.5, and 4.0, and at pH 7.5 after incubation at pH 4.0, to mimic the conditions encountered during endosomal trafficking [47]. While the overall capsid topology remains similar, significant side-chain movements were observed on the interior surface of the capsid near the ordered nucleic acid density.

The AAV9 capsid structure was determined by cEM and three-dimensional image reconstruction at pH 7.4, 6.0, 5.5, and 4.0, as well as in galactose-bound form at pH 7.4 and 5.5 [48]. The study observed capsid conformational changes at the 5F channel during externalization of the VP2/VP3 N termini. In addition, it showed that the AAV9–glycan receptor complex remains intact at the late endosomal pH of 5.5. Taken together, these studies support externalization of the VP N termini at acidic pH and identify conformational changes at the 5F channel as a possible prelude to genome uncoating.

Ubiquitylation and proteosome degradation

Within the cytosol, the capsid is subject to ubiquitylation followed by proteasomal degradation [49]. The resulting peptides are presented by major histocompatibility complex (MHC) class I molecules to CD8+ T cells. Capsid-specific CD8+ T cells can then exert cytotoxic effects that eliminate rAAV-transduced cells, leading to loss of transgene expression. Western blots of AAV2 and AAV5 capsid proteins immunoprecipitated from infected HeLa cell lysates revealed ubiquitin conjugation [48]. Mutation of surface tyrosine, serine, threonine, or lysine residues significantly reduced ubiquitylation and enhanced AAV2 transduction efficiency [49,50,51].

Nucleus entry

AAV enters the nucleus through the nuclear pore complex (NPC), a process believed to be mediated by nuclear localization signals in the VP N-terminal region. The BC2 and BC3 domains in the N-terminal region shared by VP1 and VP2 are separated by a 23-amino-acid linker; this arrangement classifies them as a nonclassical nuclear localization signal, which can confer nuclear localization on a heterologous fusion protein [52]. A study using high-speed super-resolution single-point edge-excitation sub-diffraction (SPEED) microscopy revealed that AAV2 particles are imported into the nucleus through NPCs rather than by budding through the nuclear membrane [53]. Moreover, only approximately 17% of the rAAV2 particles starting from the cytoplasm successfully traverse the NPCs to reach the nucleoplasm, indicating that the NPC acts as a strict selective step in AAV delivery.

Genome uncoating

The viral particle releases its genome in the nucleus. The physical process of genome ejection has been studied in vitro under thermal stress. A study of AAV1, AAV2, AAV5, and AAV8 using differential scanning fluorimetry (DSF), differential scanning calorimetry (DSC), and EM showed that capsid melting temperatures differed by more than 20 °C between the least and most stable serotypes [54]. Ultrastructural differences in genome release between AAV containing single-stranded DNA (ssDNA) and AAV containing self-complementary DNA (scDNA) were compared by atomic force and electron microscopy [55]. Self-complementary AAV (scAAV) vectors required a significantly higher temperature to release their genomes than ssDNA vectors. Genome release was also monitored by a fluorometric method, which showed that acidic pH and high osmotic pressure inhibit genome release and revealed an inverse correlation between genome size and uncoating temperature. In another study [56], the stability of AAV8 and AAV9 was profiled by atomic force microscopy (AFM) and physical modeling. The study indicates that genome release can proceed via two alternative pathways: either an intact capsid ejects linear ssDNA, or a ruptured capsid releases entangled ssDNA. To date, the exact cellular conditions that trigger AAV genome release remain elusive. It has been proposed that disassembly of AAV2 capsids is induced by cell cycle-dependent structural reorganization of the nucleolus [57]. Interestingly, AAV2 virions rapidly and completely disassemble upon incubation with a liver nuclear extract [58]. More systematic investigation of the physiological conditions that AAV encounters in the nucleus is required for a complete understanding of the genome uncoating process.
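Melting temperatures such as those compared above are commonly read from a DSF fluorescence curve as the temperature at which the first derivative of the signal peaks. The Python sketch below applies that derivative-maximum approach to a synthetic logistic curve. The curve shape, its 70 °C midpoint, its width, and the helper names are hypothetical choices for illustration; they are not measured AAV data and not necessarily the analysis used in [54].

```python
import math


def synthetic_melt_curve(temps: list[float], tm: float, width: float) -> list[float]:
    """Hypothetical two-state (logistic) fluorescence signal rising around tm."""
    return [1.0 / (1.0 + math.exp(-(t - tm) / width)) for t in temps]


def estimate_tm(temps: list[float], signal: list[float]) -> float:
    """Return the temperature at which dF/dT, by central differences, is largest."""
    best_t, best_slope = temps[1], -math.inf
    for i in range(1, len(temps) - 1):
        slope = (signal[i + 1] - signal[i - 1]) / (temps[i + 1] - temps[i - 1])
        if slope > best_slope:
            best_t, best_slope = temps[i], slope
    return best_t


# Synthetic scan from 40.0 to 100.0 °C in 0.5 °C steps (hypothetical settings)
temps = [40 + 0.5 * i for i in range(121)]
signal = synthetic_melt_curve(temps, tm=70.0, width=1.5)
print(estimate_tm(temps, signal))
```

The derivative maximum coincides with the midpoint of a symmetric two-state transition; with real, noisy data the signal would usually be smoothed before differentiation.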

Replication and assembly

Following genome uncoating, the viral life cycle enters the reproductive stage of replication and assembly. Although therapeutic AAV constructs lack self-replication and assembly functions, understanding these processes is important for improving production yields in manufacturing.

Sucrose density gradient sedimentation experiments showed that VPs oligomerize into trimers or pentamers in the cytoplasm but do not form capsids there [59]. In the nucleus, the vast majority of VPs are assembled into capsids with sedimentation coefficients between 60S and 110S. Mutational studies in which the start codons of VP1 and VP2 were knocked out showed that the VP3 sequence is sufficient for the assembly of AAV capsids [60]. The composition of capsids from several AAV serotypes was investigated by high-resolution native mass spectrometry [61]. The data reveal that capsid assembly is a stochastic process that yields a heterogeneous population of capsids with variable VP stoichiometry. Residues important for capsid formation were identified by mutating residues distributed over the entire VP1 sequence and testing whether each mutation abolished capsid assembly [62]. Residues at the 2F and 5F interfaces between VPs and close to the 5F channel play important roles in capsid assembly. In addition to VP3, the viral cofactor AAP is essential for capsid assembly [63]. AAP was reported to bind VP multimers across the 2F axis to facilitate capsid assembly. The role of AAP was evaluated in 12 naturally occurring AAVs [64]. The results demonstrate that AAP enhances capsid protein stability and interactions. Moreover, the study showed that the dependence on AAP can be partly overcome by strengthening interactions between VP monomers within the capsid.
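To illustrate what a stochastic assembly process implies for stoichiometry, the Python sketch below uses a deliberately simplified model in which each of the 60 capsid positions is filled independently by VP1, VP2, or VP3 with weights 1:1:10. The independence assumption, the weights, and the function name `assemble_capsid` are hypothetical; the model ignores AAP, assembly intermediates, and structural constraints, and it is not the analysis performed in [61]. It only shows that independent draws produce capsids with variable composition whose average approaches the nominal 5:5:50 split.

```python
import random
from collections import Counter


def assemble_capsid(rng: random.Random, weights=(1, 1, 10), subunits: int = 60) -> Counter:
    """Toy model: draw each of the 60 subunits independently as VP1, VP2, or VP3."""
    return Counter(rng.choices(("VP1", "VP2", "VP3"), weights=weights, k=subunits))


rng = random.Random(0)  # fixed seed so the simulated population is reproducible
capsids = [assemble_capsid(rng) for _ in range(1000)]

# Count distinct (VP1, VP2, VP3) compositions to show population heterogeneity.
compositions = Counter((c["VP1"], c["VP2"], c["VP3"]) for c in capsids)
mean_vp3 = sum(c["VP3"] for c in capsids) / len(capsids)
print(f"{len(compositions)} distinct compositions; mean VP3 per capsid = {mean_vp3:.1f}")
```

A purely random draw is the simplest model consistent with variable stoichiometry; any real assembly pathway would add constraints that this sketch leaves out.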

Genome packaging occurs in the nucleus and depends on the presence of, and interactions among, assembled empty capsids, replicated AAV genomes, and Rep proteins. DNA encapsidation is directed by protein–protein interactions between empty capsids and Rep78/68–genome complexes. During genome replication, Rep78/68 remains covalently attached to the nascent ssDNA and docks on the 5F axis of the assembled capsid [65]. The Rep52/40 proteins are responsible for translocating the AAV genome into empty particles through the 5F channel. The roles of various capsid residues in genome packaging were investigated by site-directed mutagenesis [66]. Mutations of residues around the 5F channel abolished Rep–capsid interaction and genome packaging. Mutation of a residue near the 3F axis also prevented packaging. This residue is not exposed on the capsid surface but affects capsid stability, suggesting that a stable capsid is required to withstand the increasing internal pressure during packaging. cEM analysis of AAV1 revealed conformational changes induced by genome packaging [67]. These rearrangements occur at the inner capsid surface and constrict the 5F channel.

Conclusion

Collectively, biophysical and structural studies over the past decade have greatly advanced our understanding of the AAV life cycle. Capsid structural dynamics clearly play a central role throughout viral infection and amplification. Nonetheless, many key questions remain unanswered, such as the exact mechanism that triggers genome release in the nucleus and the high-resolution structure of the Rep proteins in complex with the genome and newly assembled capsids. In-depth knowledge of the structural and functional properties of the capsid will aid the development of next-generation AAV vectors for gene therapy.