DNA double-strand breaks (DSBs) are the most severe form of DNA damage in eukaryotic cells and are generated by ionising radiation, reactive oxygen species and DNA replication across nicks [1]. DSBs can lead to cell death, and erroneous repair may result in carcinogenesis through chromosomal translocation or modification. Non-homologous end joining (NHEJ) and homologous recombination (HR) are the two major pathways of DSB repair in human cells. HR functions mainly in the late S/G2 phases of the cell cycle due to the requirement for a sister chromatid. In contrast, NHEJ repairs DSBs directly without a DNA template and can function throughout G1/early S phases [1].

The NHEJ process comprises DNA synapsis, end processing and ligation [2]. During synapsis, Ku70/Ku80 heterodimers assemble around and maintain proximity of broken DNA ends [3]. DNA-dependent protein kinase catalytic subunit (DNA-PKcs), a PI3-kinase-related kinase, is recruited through interaction with the Ku80 C-terminus to form DNA-PK [4, 5]. Two DNA-PK complexes are thought to hold DNA ends close together [6]. DNA-PKcs phosphorylates itself and various other proteins, including NHEJ components. End processing involves nucleases such as Artemis, which exhibits endonuclease activity after activation by DNA-PKcs phosphorylation [7]. The final ligation step is mediated by DNA ligase IV (LigIV) in a stable complex with dimeric XRCC4 [8]. XLF/Cernunnos also interacts with XRCC4 and enhances LigIV DNA ligation, promoting NHEJ [9, 10].

Structural and functional studies of individual components in NHEJ previously reported include DNA-PKcs [11], LigIV [12], XLF [13] and XRCC4 [14]. We have investigated the following binary complexes: LigIV/XRCC4 [14], XRCC4/XLF [15], LigIV/Artemis [16] and DNA-PKcs/Ku80 C-terminus [11; Sibanda BL, Chirgadze DY, Ascher D, Blundell TL, In preparation]. Last year we discovered PAXX [17], an additional component involved in NHEJ (Fig. 1).

Fig. 1

NHEJ protein network. Arrows indicate a known interaction. Blue labels indicate the structure has been determined. Green arrows indicate complexes with experimentally defined structural interactions (Color figure online)

These studies and others (e.g. [18, 19]) still leave unanswered questions, including how NHEJ components are coordinated across space and time throughout synapsis, end processing and ligation. We observed three different mechanisms that may ensure appropriate spatial colocation of components: stages, which are preformed stable structures where the main actors gather and engage; scaffolds, which are constructed quickly and disassembled easily to facilitate cell responses; and strings, which tether components together. For NHEJ, these include:

  1. A stage composed of the Ku heterodimer, which interacts with broken DNA ends and binds a variety of components, including DNA-PKcs, aprataxin- and PNKP-like factor (APLF), XLF, BRCA1 and PAXX.

  2. A further stage, DNA-PKcs, which interacts with the Ku heterodimer, DNA, PARP1, BRCA1, Artemis and other components, often depending on post-translational modification and the binding of other factors.

  3. A scaffold assembled from protein filaments of alternating XRCC4 and XLF dimers that bridge Ku heterodimers. This bridging is regulated by LigIV, which is tightly bound by the XRCC4 tails and is thereby brought close to the broken DNA ends.

  4. An intrinsically disordered string comprising the 300-residue C-terminal region of Artemis. Foldable regions of the Artemis C-terminus interact with DNA-PKcs, LigIV and PTIP, which may keep these components close to the DNA repair process so that they are available to function when needed.

Apart from ensuring correct colocation at the appropriate time, we have contended that the complexity of such assemblies is selectively advantageous [20–22]. Binary interactions in regulatory or signalling systems would occur opportunistically in the crowded environment of the cell, giving rise to noise in the system. Conversely, multiprotein assemblies that form cooperatively are less likely to arise opportunistically, especially if they have many components and ordered-assembly mechanisms.
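As a rough illustration of why multicomponent assembly suppresses noise, consider a toy model that is our own assumption rather than a result of the cited studies: each opportunistic pairwise contact occurs independently with probability p within a given time window, and a specific n-component complex needs n − 1 such contacts at once. Then, taking p = 0.1 purely for illustration:

```latex
P_2 = p, \qquad P_n \approx p^{\,n-1}, \qquad
p = 0.1:\quad \frac{P_5}{P_2} = \frac{(0.1)^{4}}{0.1} = 10^{-3}
```

Under these assumptions, a five-component complex forms opportunistically about a thousand times less often than a single binary complex. The model deliberately ignores cooperativity and ordered assembly, which in real systems favour formation of the correct complex when it is needed.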

Here, we show that such stages, scaffolds and strings have complementary roles, often binding to the same partners but operating in different ways over space and time. Understanding spatial and temporal organisation of NHEJ may provide insight into whether such mechanisms are universal for mediating colocation in other complex regulatory systems.

Assembling and coordinating the actors in NHEJ

Spatial organisation of molecular assemblies in NHEJ has been studied using nanoelectrospray mass spectrometry, small-angle X-ray scattering (SAXS), X-ray crystallography, NMR and cryo-electron microscopy (cryo-EM). Cumulative insight from these techniques allows us to understand further how NHEJ proceeds efficiently over time.

A stage: Ku heterodimer forms a platform with DNA

The Ku heterodimer assembles around broken DNA ends, forming a stage in which the core ring structure is responsible for interacting with APLF, WRN, XLF, PAXX and BRCA1, and the C-terminal domain of Ku80 interacts with DNA-PKcs [4, 5, 23]. We are using X-ray crystal-structure analysis, cryo-EM and biophysical approaches to define the interactions between components and describe the subsequent structural changes. We comment here on two of the proteins that assemble on this stage and whose interactions have been topics of recent discussion.

PAXX, which we recently identified as a PAralog of XRCC4 and XLF using structure-based homology searches, interacts with Ku to promote DNA DSB repair [17, 24, 25] (Fig. 2). Using an in vitro pull-down assay and biochemical analysis, we showed that the PAXX C-terminus (residues 177–204) interacts with the core ring of Ku in a DNA-dependent manner. We suggested that this interaction might be regulated by phosphorylation of the PAXX C-terminus by DNA-PK. In collaboration with the laboratory of Dr. Steve Jackson [17], we demonstrated that PAXX is recruited to DNA-damage sites in cells, and that PAXX depletion leads to DSB-repair defects and hypersensitivity to ionising radiation. Using RNA interference and CRISPR-Cas9-generated PAXX−/− human cells, we showed that PAXX functions with XRCC4 and XLF to mediate DSB repair and cell survival in response to DSB-inducing agents. PAXX promotes Ku-dependent DNA ligation in vitro and the assembly of core NHEJ components on damaged chromatin in cells.

Fig. 2

Structure of PAXX. The crystal structure of PAXX homodimer (PDB code: 3WTD) is presented using a cartoon representation. One of the chains is indicated by rainbow colour; the N- and C-termini are blue and red, respectively (Color figure online)

Two groups have reported an interaction between BRCA1 and Ku [26, 27]. BRCA1 is a tumour suppressor gene, and patients carrying germline or somatic mutations in it have a high risk of developing breast and ovarian cancer [28]. The expression and phosphorylation of BRCA1 are cell-cycle dependent, with the highest levels in S and M phases [29]. BRCA1 recruitment is important for Ku70/80 retention at DNA-damage sites and for promoting NHEJ of chromosomal DSBs in G1 phase. The region of BRCA1 responsible for the interaction with Ku was reported as amino acids 1–200 in one study [27] but, more recently, as residues 262–552 in the other [26]; these two regions do not overlap. Our ongoing objective is to define the region of BRCA1 responsible for the interaction with Ku.

A second stage: DNA-PKcs

In 2010, we described the crystal structure of the ~4000 amino-acid chain of DNA-PKcs complexed with the C-terminus of Ku80 at low resolution, in which the two molecules in the asymmetric unit have slightly different relative positions of the head domain and the large circular structure of HEAT repeats that comprise the stage [11]. More recently, we extended the study to a higher resolution, allowing the chain to be traced. We are using both X-ray crystallography and cryo-EM to define how other molecules, such as Ku and DNA, assemble on this stage. Interestingly, in the S phase, BRCT domains of BRCA1 (see “The strings” section) are reported to interact with DNA-PKcs (near S2056) in a phosphorylation-independent manner to block DNA-PKcs autophosphorylation, reducing NHEJ and increasing HR [30].

A scaffold: XRCC4/XLF filaments

XRCC4 and XLF are both homodimers with similar architectures, comprising N-terminal head domains and C-terminal tail regions [13, 14]. Our group [12, 15] and others [31–35] have shown by mutagenesis, size-exclusion chromatography, SAXS, native mass spectrometry and X-ray crystallography that XRCC4 and XLF dimers interact through their head domains, forming XRCC4/XLF filaments whose elongation is regulated by LigIV [12]. The alternating XRCC4/XLF helical polymers are flexible, and multiple filaments can assemble into cylinders with a diameter that may accommodate and colocate other components of NHEJ (Fig. 3). As a result, XRCC4/XLF filaments may function as a scaffold that allows alignment of the two broken DNA ends and local assembly of NHEJ components to enhance LigIV function.

Fig. 3

XRCC4 and XLF. In our crystal structure, alternating XRCC4/XLF helical polymers can assemble into cylinders with diameters that may accommodate and colocate other components of NHEJ (PDB ID: 3W03). a XLF and XRCC4 dimers interact through their head domains to form a helical XRCC4/XLF filament. b The filaments assemble as a cylinder in the crystal structure. The view perpendicular to the filament axis is shown with a single filament. c The central space created by the filaments could potentially accommodate a nucleosome, Ku and DNA-PKcs

The strings

Artemis C-terminal region

Artemis, which belongs to the metallo-β-lactamase superfamily, is the major nuclease involved in the end-processing step of NHEJ. It has been estimated that Artemis repairs 10–15 % of DSBs generated by ionising radiation (IR) [36]. The Artemis gene, DCLRE1C, was first identified through mutations in patients with radiosensitive severe combined immunodeficiency (RS-SCID) and Athabascan SCID (SCIDA) [7]. It was later found that Artemis is the essential endonuclease that opens the hairpin intermediate upon stimulation by DNA-PKcs during V(D)J recombination [37]. The nuclease also has intrinsic 5′ exonuclease activity [38] and weak endonuclease activity on ssDNA in vitro, which is likely stimulated on complex formation with DNA-PKcs [39].

The Artemis polypeptide can be divided into two regions, each accounting for approximately half of its length (Fig. 4): an N-terminal nuclease, comprising metallo-β-lactamase and β-CASP domains [40] homologous to those of RNase J, Apollo and SNM1A, and a C-terminal region that has no enzymatic activity but is nevertheless indispensable for normal Artemis function [41]. IUPred (Prediction of Intrinsically Unstructured Proteins) [42] predicts the C-terminal region of Artemis to be mostly intrinsically disordered, and ANCHOR (Prediction of Protein Binding Regions in Disordered Proteins) [43] predicts that several short segments within this disordered region bind globular or structured protein partners (Fig. 5).
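The step from per-residue scores to the segments highlighted in Fig. 5 is simple post-processing. The sketch below is our own illustration, not part of IUPred or ANCHOR: it assumes the scores have already been computed (one value per residue) and reports contiguous runs above a cut-off, here 0.5, the value conventionally used to call residues disordered with IUPred. The function name `segments_above` and the toy scores are hypothetical.

```python
from typing import List, Tuple


def segments_above(scores: List[float], threshold: float = 0.5,
                   min_len: int = 1, start: int = 1) -> List[Tuple[int, int]]:
    """Return inclusive (first, last) residue numbers of runs with score > threshold.

    scores  -- one precomputed score per residue (e.g. from a disorder or
               binding-region predictor); this helper does not compute them
    min_len -- shortest run to report
    start   -- residue number of scores[0], so fragments keep native numbering
    """
    runs: List[Tuple[int, int]] = []
    run_start = None
    for i, score in enumerate(scores):
        pos = start + i
        if score > threshold:
            if run_start is None:
                run_start = pos          # a new run begins here
        elif run_start is not None:
            if pos - run_start >= min_len:
                runs.append((run_start, pos - 1))
            run_start = None             # run ended at the previous residue
    if run_start is not None:            # close a run that reaches the last residue
        last = start + len(scores) - 1
        if last - run_start + 1 >= min_len:
            runs.append((run_start, last))
    return runs


# Hypothetical per-residue scores, not real IUPred/ANCHOR output for Artemis.
toy_scores = [0.2, 0.3, 0.6, 0.7, 0.8, 0.4, 0.9, 0.95]
print(segments_above(toy_scores, min_len=2))   # [(3, 5), (7, 8)]
```

The `start` parameter matters when a fragment such as the Artemis C-terminal region is analysed on its own, so that reported segments keep full-length residue numbers.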

Fig. 4

Schematic of the structure of Artemis: the nuclease in the N-terminal region and the intrinsically disordered C-terminal region. Artemis recognises other NHEJ components including DNA-PKcs and LigIV via the indicated regions that undergo concerted folding and binding

Fig. 5

IUPred and ANCHOR analysis of Artemis with sequence alignment. The C-terminal region of Artemis is predicted to be extensively intrinsically disordered according to IUPred. ANCHOR analysis shows that within the intrinsically disordered region, several segments are likely to interact with other proteins. Sequence alignment also shows that many of these segments are conserved

We previously showed that Artemis residues 485–495 undergo concerted folding and binding with the first two helices of LigIV to form a three-helix bundle (Fig. 6). Residue W489 of Artemis forms a hydrogen bond with D18 of LigIV, stabilised by interactions between F492/F493 of Artemis and F49/F42 of LigIV. Moreover, P487 of Artemis resides in the hydrophobic pocket formed by L53, A52 and F49 of LigIV [16].

Fig. 6

Interaction between LigIV and Artemis. The crystal structure of the catalytic domains of LigIV (orange) and the Artemis fragment (amino-acid residues 485–495; silver) is shown on the left. The interaction of the first two helices of LigIV with Artemis is shown in close view on the right (Color figure online)

Although Artemis was predicted to interact with DNA-PKcs via residues 399–404 [44], recent pull-down results from our group show that these residues are not sufficient. Peptides containing Artemis residues 399–408 or 399–426 could not pull down DNA-PKcs, whereas a peptide spanning residues 385–413 could, and even bound DNA-PKcs more strongly than full-length Artemis. This indicates that residues N-terminal to 399–404 are needed for the Artemis/DNA-PKcs interaction. Furthermore, residues 641–660 of Artemis may bind the second BRCT domain of the adaptor protein PTIP in a phosphorylation-dependent manner [43].
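The inference from these pull-downs can be made explicit. The following is a minimal sketch under a simple all-or-none assumption that we introduce for illustration: binding requires one fixed set of residues, and any construct containing all of them binds. Real peptides can also fail to bind because of folding or context effects. The helper names are hypothetical. Full-length Artemis, which also binds, contains residues 385–413 and so would not change the result.

```python
from typing import Dict, List, Set, Tuple

Construct = Tuple[int, int]  # inclusive residue range of a peptide or fragment


def residues(construct: Construct) -> Set[int]:
    first, last = construct
    return set(range(first, last + 1))


def map_binding_region(results: Dict[Construct, bool]) -> Tuple[Set[int], List[Set[int]]]:
    """Deletion-mapping logic under the all-or-none model described above.

    Returns the region common to all binders and, for each non-binder,
    the residues of that common region it lacks; under the model, at
    least one required residue must lie in each such set.
    """
    binders = [residues(c) for c, bound in results.items() if bound]
    if not binders:
        raise ValueError("need at least one binding construct")
    common = set.intersection(*binders)
    constraints = [common - residues(c) for c, bound in results.items() if not bound]
    return common, constraints


# Outcomes quoted in the text for Artemis peptides and DNA-PKcs.
pull_down = {(385, 413): True, (399, 408): False, (399, 426): False}
common, constraints = map_binding_region(pull_down)
tightest = min(constraints, key=len)
print(min(common), max(common))      # 385 413
print(min(tightest), max(tightest))  # 385 398
```

Under these assumptions, at least one residue required for DNA-PKcs binding lies within 385–398, consistent with the conclusion that residues N-terminal to 399–404 are needed.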

Thus, it is likely that the Artemis C-terminus binds several key components, including DNA-PKcs, LigIV and PTIP [45], facilitating colocation of these components with broken DNA ends. The list may be longer, since other studies show that the Artemis C-terminal region interacts with further proteins. For example, Artemis binds the Mre11/Rad50/Nbs1 (MRN) complex in an ATM-dependent manner after ionising radiation [46]. Although the precise binding region has not been defined, the Artemis C-terminus is likely to be involved, given that hyperphosphorylation of Artemis, especially at S645, an SQ/TQ site in the C-terminal region, depends on ATM. Additionally, Artemis co-immunoprecipitated with Nbs1 and, reciprocally, Nbs1 and Mre11 co-immunoprecipitated with Artemis [46].
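Candidate ATM target sites such as S645 can be located by scanning a sequence for the SQ/TQ motif, in which serine or threonine is immediately followed by glutamine, the consensus favoured by ATM-family kinases. This is a minimal sketch with a hypothetical function name and a toy sequence (not the Artemis sequence); a motif match indicates only a potential site, not that the residue is phosphorylated in cells.

```python
import re
from typing import List, Tuple

# S or T immediately followed by Q: the consensus favoured by ATM-family kinases.
SQ_TQ = re.compile(r"[ST]Q")


def sq_tq_sites(seq: str, offset: int = 1) -> List[Tuple[str, int]]:
    """Return (residue, position) for each candidate SQ/TQ phospho-acceptor.

    offset is the residue number of seq[0], so a fragment can keep the
    numbering of the full-length protein.
    """
    return [(m.group()[0], m.start() + offset) for m in SQ_TQ.finditer(seq.upper())]


# Hypothetical toy fragment, not the Artemis sequence.
print(sq_tq_sites("MASQTTQLPSE"))   # [('S', 3), ('T', 6)]
```

Because Q can never itself be an S or T, successive SQ/TQ matches cannot overlap, so a plain `finditer` is sufficient here.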


APLF (Aprataxin- and PNK-Like Factor) similarly contains multiple binding sites throughout the protein, which support its function both as a stabiliser of the NHEJ holo-complex at DNA-damage sites and as a stimulator of DNA-end ligation [23, 47]. The N-terminal FHA domain of APLF interacts with XRCC4 in a phosphorylation-dependent manner [48]. A large region of APLF between the structured N- and C-terminal regions is predicted to be intrinsically disordered and is poorly conserved in evolution (Fig. 7). However, a short sequence in this region (residues 182–192) is conserved from the sponge (Amphimedon queenslandica) to human. This conserved sequence is important for the interaction with Ku [23, 49–51], likely through concerted folding and binding. Consequently, this binding site within the intrinsically disordered region of APLF facilitates colocation of the Ku heterodimer with the globular domains of APLF.
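Conservation of a short segment such as APLF residues 182–192 within an otherwise poorly conserved region can be detected from a multiple sequence alignment. The sketch below uses plain per-column identity as a deliberately simple stand-in for the substitution-aware scores that alignment tools normally report; the sequences are hypothetical, and the helper names, window width and cut-off are arbitrary choices for illustration.

```python
from collections import Counter
from typing import List


def column_identity(alignment: List[str]) -> List[float]:
    """Per-column fraction of sequences carrying the column's most common residue.

    Gaps ('-') never count as a match but do count in the denominator.
    """
    width = len(alignment[0])
    if any(len(seq) != width for seq in alignment):
        raise ValueError("aligned sequences must all have the same length")
    scores = []
    for column in zip(*alignment):
        residues = [aa for aa in column if aa != "-"]
        top = Counter(residues).most_common(1)[0][1] if residues else 0
        scores.append(top / len(column))
    return scores


def conserved_windows(scores: List[float], width: int, min_mean: float) -> List[int]:
    """0-based start columns of windows whose mean identity is at least min_mean."""
    return [i for i in range(len(scores) - width + 1)
            if sum(scores[i:i + width]) / width >= min_mean]


# Hypothetical toy alignment (not APLF orthologues).
toy = ["MKTWGDPQ",
       "MRSWGDPA",
       "LKTWGDPE"]
ident = column_identity(toy)
print(conserved_windows(ident, width=4, min_mean=1.0))   # [3]
```

Here the fully conserved block starts at column 3 (0-based); in a real analysis, alignment columns would still need to be mapped back to residue numbers of the human protein.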

Fig. 7

Domains and interaction sites of APLF. Crystal structures of the FHA domain [48] and PAR-binding zinc finger (PBZ) [74] of APLF are shown. The Ku-interaction region with sequence alignment is indicated, along with other potential interaction regions predicted by ANCHOR [75]


BRCA1 comprises 1863 residues, and its two most conserved domains are located at the N- and C-termini. The N-terminal RING domain, which forms a heterodimer with the RING domain of BARD1 [52], functions as an E3 ubiquitin ligase [53, 54]. The N-terminal region (residues 1–304) was also recently found to bind the protein OLA1 during centrosome regulation [55]. The C-terminal region of BRCA1 contains tandem BRCT domains, which appear to bind DNA-PKcs in a phosphorylation-independent manner, as discussed above, and also bind phosphopeptides containing the pSXXF motif [56–58]. We recently determined the structure of the BRCA1 BRCT domains in complex with the C-terminal phosphopeptide of Abraxas (Fig. 8) [59]. Abraxas is part of the BRCA1-A complex, which facilitates the recruitment of BRCA1 to sites of DNA damage. In collaboration with Dr. Bin Wang (MD Anderson Cancer Center, Houston, TX), we demonstrated that ionising-radiation-induced, ATM-dependent phosphorylation of a serine residue of Abraxas next to the pSPxF motif induces dimerisation of the BRCT/Abraxas complex. Mutation of this ionising-radiation-induced phosphorylation site of Abraxas leads to deficient BRCA1 accumulation and cellular sensitivity to ionising radiation. These findings suggest a novel role for the BRCT/Abraxas complex in the DNA-damage response [46].

Fig. 8

Dimerisation of two BRCT/Abraxas complexes. The surface representations of the two BRCT/Abraxas complexes are shown in pink and blue, respectively (PDB code: 4Y18). The motif-interaction site, which contains phosphorylated S406 (pS406), is circled with a dashed line. Phosphorylation at S404 is indicated as pS404 (Color figure online)

Protein interactions within the N-terminal and C-terminal domains of BRCA1 support its role in the DNA-damage response and DNA repair, while the long, less conserved, flexible region between these domains provides additional sites for more dynamic and regulated protein interactions (Fig. 9). This disordered region contains the nuclear localisation signal (NLS; amino-acid residues 500–508) [60], the PALB2-interaction site (amino-acid residues 1397–1424) [61, 62] and phosphorylation sites that are targeted by ATM, Chk2, Cdk2 and DNA-PK at different phases of the cell cycle [63–67]. Although the function of these phosphorylation events is not completely understood, key events include phosphorylation of S1423 by ATM, which contributes to G2/M checkpoint function [66], and phosphorylation of S988 by Chk2, which promotes release of BRCA1 from Chk2 and HR activity [65, 68]. Thus, the flexible inter-domain region of BRCA1 functions as a string that tethers proteins for use as needed.
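A string of this kind can be pictured as a sequence axis carrying annotated intervals and point sites. The minimal sketch below is our own illustration, using only the residue numbers quoted above; the dictionaries and helper functions are hypothetical. With these numbers, S1423 falls inside the PALB2-interaction region, whereas S988 lies outside both listed regions; whether that overlap matters functionally is not something the sketch can establish.

```python
from typing import Dict, List, Tuple

# BRCA1 regions and phosphorylation sites exactly as quoted in the text.
REGIONS: Dict[str, Tuple[int, int]] = {
    "NLS": (500, 508),
    "PALB2 interaction": (1397, 1424),
}
SITES: Dict[str, str] = {"S988": "Chk2", "S1423": "ATM"}  # site -> kinase


def regions_containing(position: int, regions: Dict[str, Tuple[int, int]]) -> List[str]:
    """Names of the inclusive regions that contain a residue position."""
    return [name for name, (first, last) in regions.items() if first <= position <= last]


def annotate(sites: Dict[str, str],
             regions: Dict[str, Tuple[int, int]]) -> Dict[str, Tuple[str, List[str]]]:
    """Map each site label such as 'S1423' to (kinase, overlapping regions)."""
    return {site: (kinase, regions_containing(int(site[1:]), regions))
            for site, kinase in sites.items()}


print(annotate(SITES, REGIONS))
# {'S988': ('Chk2', []), 'S1423': ('ATM', ['PALB2 interaction'])}
```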

Fig. 9

Domains and interaction sites of BRCA1. The protein sequence alignment and crystal structures of the RING domain [76] and tandem BRCT domains [77] of BRCA1 are shown. The PALB2 interaction region is indicated, along with other potential interaction regions predicted by ANCHOR [75]


For NHEJ to function accurately, the spatial and temporal organisation of NHEJ proteins should be as efficient as possible. This is achieved by colocation of components through macromolecular stages, scaffolds and strings. Despite efforts to understand the spatial assembly of NHEJ complexes from their structures, the exact coordination of NHEJ proteins at a DSB site remains unclear. This is due to the dynamic and complex nature of NHEJ, whose proteins must gather in a DSB-dependent manner and disassemble quickly once the DNA damage is repaired. Consequently, many NHEJ protein–protein interactions are weak or require the presence of DNA. Moreover, the proteins may assemble differently depending on the type of DNA damage. To observe such a flexible and dynamic biological system, we need to design DNA substrates representing the structures we wish to study, “freeze” intermediate states of the NHEJ process, for instance using mutations or crosslinking, and sort heterogeneous samples according to structural state. These are extremely challenging tasks for X-ray crystallography. However, recent advances in cryo-EM offer an attractive approach to overcome these difficulties.

Nevertheless, the spatial organisation of NHEJ complexes is better understood than their temporal organisation. Varying models for NHEJ recruitment have been proposed; our view is that NHEJ is not a pre-programmed process, although it does require the sequential recruitment of stages, scaffolds and strings. The scaffold proteins may make multiple weak interactions with the stage proteins, permitting stabilisation of the scaffolds only when the complete stage is assembled. Alternatively, the interactions may be regulated allosterically through conformational change; DNA-PKcs conformational changes appear to be triggered by post-translational modification, including autophosphorylation. To study the temporal organisation of NHEJ proteins, it will be necessary to exploit single-molecule approaches. Indeed, synapsis by core NHEJ proteins [69] and the dynamic formation of XRCC4/XLF filaments, which are difficult to visualise with current structural techniques, have been investigated using such approaches [70, 71]. In combination with structural studies of NHEJ complexes, single-molecule approaches may reveal how NHEJ proteins are recruited to DSB sites. However, one pitfall of such in vitro approaches is the assumption that all NHEJ proteins are known. The discovery of PAXX in the past 2 years underlines the possibility that some components of NHEJ remain unknown.

Apart from advancing our knowledge of this important DNA DSB-repair system, understanding the interactions of NHEJ components across space and time will indicate novel druggable targets for therapeutic intervention in cancer and a number of rare genetic diseases. Linear foldable regions within intrinsically disordered polypeptides bind to ligandable sites with well-defined pockets [72, 73]. In HR, we have exploited this feature by using fragment-based design to target Rad51-BRCA2 and develop nanomolar drug candidates [73]. We continue to explore the ligandability of Artemis-LigIV and other interactions mediated by the intrinsically disordered C-terminus of Artemis (Esswein SE, Ochi T and Blundell TL, unpublished).

Our research not only sheds light on the spatial and temporal organisation of NHEJ, but also invites speculation as to whether similar stages, scaffolds and strings operate in other complex regulatory systems. The existence of many globular proteins or complexes that bind multiple components (stages), of systems that assemble into rods and other cellular structural elements (scaffolds), and of regions of concerted folding and binding within intrinsically disordered polypeptides that facilitate colocation of globular proteins (strings) suggests that our model may be generic. If so, defining these properties will further explain how other multicomponent complexes are coordinated across space and time in cells and will also guide future efforts to regulate these pathways. Such regulation may be possible through modulation of component colocation, particularly by targeting the sites where intrinsically disordered regions interact with globular proteins.