Solution structures of the DNA-binding domains of immune-related zinc-finger protein ZFAT

ZFAT is a transcriptional regulator, containing eighteen C2H2-type zinc-fingers and one AT-hook, involved in autoimmune thyroid disease, apoptosis, and immune-related cell survival. We determined the solution structures of the thirteen individual ZFAT zinc-fingers (ZF) and the tandemly arrayed zinc-fingers in the regions from ZF2 to ZF5, by NMR spectroscopy. ZFAT has eight uncommon bulged-out helix-containing zinc-fingers, and six of their structures (ZF4, ZF5, ZF6, ZF10, ZF11, and ZF13) were determined. The distribution patterns of the putative DNA-binding surface residues differ among the ZFAT zinc-fingers, suggesting distinct DNA sequence preferences of the N-terminal and C-terminal zinc-fingers. Since ZFAT has three to five consecutive tandem zinc-fingers, which may cooperatively function as a unit, we also determined two tandemly arrayed zinc-finger structures, spanning ZF2 to ZF4 and ZF3 to ZF5. Our NMR spectroscopic analysis detected the interaction between ZF4 and ZF5, which are connected by an uncommon linker sequence, KKIK. The ZF4–ZF5 linker restrained the relative structural space between the two zinc-fingers in solution, unlike the other linker regions with determined structures, suggesting the involvement of the ZF4–ZF5 interfinger linker in the regulation of ZFAT function.

Electronic supplementary material
The online version of this article (doi:10.1007/s10969-015-9196-3) contains supplementary material, which is available to authorized users.


Introduction
Autoimmune thyroid disease (AITD) is a general disease caused by the immune system responding to its own normal cells or organs, as well as to foreign antigens such as bacteria, viruses, and tumors [1–3]. The development of antibodies to antigenic thyroid components is a main feature of autoimmune diseases. ZFAT (Zinc finger gene in AITD susceptibility region; also known as ZNF406) was identified as a gene involved in the regulation of the autoimmune system [4]. The ZFAT protein is conserved from fish to human, and the human ZFAT protein is composed of eighteen C2H2-type zinc-fingers (ZFs) and one AT-hook motif between ZF1 and ZF2 [5] (Fig. 1a). ZFAT is expressed in peripheral B and T lymphocytes, and is also found in the human acute T lymphoblastic leukaemia cell line MOLT-4 and human umbilical vein endothelial cells [6,7]. Notably, ZFAT knockdown in MOLT-4 cells induces apoptosis via the activation of caspases, suggesting that ZFAT is a transcriptional regulator involved in apoptosis and cell survival in immune-related cells [6]. Furthermore, ZFAT is an essential transcriptional regulator for hematopoietic differentiation and is indispensable for mouse embryonic development [8,9], which indicates the critical role of ZFAT not only in AITD but also in a broad range of developmental and differentiation processes.
The transcriptional activity of ZFAT is considered to be mediated by its DNA-binding ZFs. The C2H2-type ZF, consisting of 20-30 residues, forms one N-terminal short antiparallel β-sheet and one helix [10]. The canonical C2H2 ZFs bind to specific DNA sequences, and the amino acids located at positions -1, +2, +3 and +6, relative to the N-terminal residue of the helix, directly contact DNA bases. The DNA recognition modes of these base-contacting residues were predicted from previous structural analyses, and the relationships between the base-contacting residues and the predicted DNA bases have been summarized as the recognition code [11]. In most cases, the C2H2-type ZF is repeated from two to more than thirty times in a protein [10]. Such tandem sets of ZFs are typically connected by a well-conserved TGEKP linker sequence [11–15]. These consecutive ZFs are known to bind to cognate DNA sequences as one functional unit [10,16–19]. To understand the functional role of ZFAT in the regulation of the immune system, we determined the solution structures of single or consecutive tandem ZFs of ZFAT by NMR spectroscopy. We describe the structural features of the ZFAT ZFs, including the structural differences on the putative DNA recognition surfaces among the ZFAT ZFs, and the unique interaction mode within the tandem ZFs of ZF4 and ZF5, which are connected by an uncommon linker sequence.
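The helix position numbering used for the base-contacting residues can be illustrated with a short Python sketch. It is only an illustration under stated assumptions: the function name `contact_residues`, the `helix_start` argument, and the seven-letter input string are hypothetical and are not taken from the ZFAT data. The numbering has no position 0, so position -1 is the residue immediately preceding the first helix residue (+1).

```python
# Canonical base-contacting positions, numbered relative to the first
# residue of the helix (+1). There is no position 0, so -1 is the residue
# immediately before the helix.
CONTACT_POSITIONS = (-1, 2, 3, 6)


def contact_residues(sequence, helix_start):
    """Return {position: residue} for the base-contacting positions.

    sequence    -- one-letter amino acid string (illustrative input)
    helix_start -- 0-based index of helix position +1 within `sequence`
    """
    residues = {}
    for position in CONTACT_POSITIONS:
        # Negative positions count back from the helix start; positive
        # positions are 1-based along the helix.
        offset = position if position < 0 else position - 1
        index = helix_start + offset
        if not 0 <= index < len(sequence):
            raise ValueError(f"position {position:+d} falls outside the sequence")
        residues[position] = sequence[index]
    return residues


# Illustrative input: a seven-residue stretch covering positions -1 to +6.
print(contact_residues("RSDELTR", helix_start=1))
```

For the illustrative string, the function returns the residues at indices 0, 2, 3 and 6, corresponding to positions -1, +2, +3 and +6.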

Protein expression and purification
The DNA sequences encoding the ZFs of the human and mouse ZFAT proteins (SwissProt accession numbers: Q9P243 and Q7TS63) were subcloned from the human and mouse cDNA clones by the two-step PCR method [20]. The individual domain regions used in this study are listed in Table 1. The cDNA fragments encoding these regions, along with those containing tandem ZF sequences, were cloned into the expression vector pCR2.1-TOPO (Invitrogen, Carlsbad, CA), as a fusion with an N-terminal poly-histidine affinity tag, a tobacco etch virus (TEV) protease cleavage site, and an artificial linker sequence (GSSGSSG) [20]. The actual sequences of the NMR samples contain these seven extra residues at their N-termini. The 13C/15N-labeled fusion proteins were synthesized by the cell-free protein expression system [21,22], and were purified using a chelating column, as described previously [23,24]. The purified proteins were concentrated to 0.1-1.2 mM in 20 mM Tris-d11-HCl buffer (pH 7.0), containing 100 mM NaCl, 1 mM dithiothreitol-d10, 50 µM ZnCl2, 1 mM iminodiacetic acid (IDA), 10 % D2O, and 0.02 % NaN3.

NMR spectroscopy and spectral assignments
All spectra were recorded on Bruker Avance 600, 700, 800, and 900 spectrometers at 296 or 298 K. Samples were first screened by 1H,15N-HSQC spectroscopy [25]. The resonance assignments were accomplished using a conventional set of triple resonance spectra, as described previously [23,24], and have been deposited in the Biological Magnetic Resonance Data Bank (BMRB; Table 1).
Inter-proton distance restraints were obtained from 15N- and 13C-edited NOESY spectra, both recorded with a mixing time of 80 ms. All spectra were processed using NMRPipe [26], and the programs Kujira [27] and NMRView [28] were employed for optimal visualization and spectral analyses.

Fig. 1 a The green and blue boxes indicate the C2H2 zinc-finger and the AT-hook motif, respectively. The positions of the zinc-fingers with solved structures are marked by asterisks (black human; violet mouse). b Sequence alignment of the ZFAT zinc-fingers. All of the human ZFAT zinc-fingers and mouse ZFAT zinc-fingers (mZF5 and mZF8 in violet) with solved structures are listed, with h and m indicating human and mouse, respectively. The zinc-coordinating Cys and His residues are colored cyan and magenta, respectively. The hash mark indicates the residues expected to be involved in DNA recognition. Secondary structures corresponding to the sequence are shown at the bottom

Structure calculations
Automated NOE cross-peak assignments and structure calculations with torsion angle dynamics were performed using the software package CYANA [29,30]. The backbone dihedral angle restraints from the TALOS program [31] were also included in the calculations, with allowed ranges of ±30°. The final structure calculations with CYANA were started from 100 conformers with random torsion angle values. The 20 conformers with the lowest final CYANA target function values were further refined with the AMBER12 program, using the Amber ff99SB force field and a generalized Born model, as described previously [32]. The tetrahedral zinc coordination was restrained by lower and upper distance limits, with force constants of 1000 kcal/mol/Å². All of the structures were validated using MolProbity [33,34] and PROCHECK-NMR [35].
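Two of the bookkeeping steps described above, keeping the 20 conformers with the lowest target function out of 100 and checking a dihedral angle against a ±30° allowed range, can be sketched in Python as follows. This is a minimal illustration and not the CYANA or TALOS implementation; the function names are hypothetical and the target-function values are random placeholders rather than data from this study.

```python
import random


def select_lowest(target_functions, n_keep=20):
    """Return the indices of the n_keep conformers with the lowest target function."""
    ranked = sorted(range(len(target_functions)), key=lambda i: target_functions[i])
    return ranked[:n_keep]


def within_allowed_range(angle, predicted, tolerance=30.0):
    """True if a dihedral angle (degrees) lies within predicted +/- tolerance.

    The difference is wrapped into [-180, 180) so that, for example,
    170 and -175 degrees are treated as 15 degrees apart.
    """
    difference = (angle - predicted + 180.0) % 360.0 - 180.0
    return abs(difference) <= tolerance


# Placeholder target-function values for 100 conformers (random numbers,
# not data from the study).
rng = random.Random(0)
target_functions = [rng.uniform(0.0, 5.0) for _ in range(100)]
best = select_lowest(target_functions)
print(len(best), "conformers kept for refinement")
```

Wrapping the angular difference matters because dihedral angles are periodic; a naive subtraction would treat values on either side of ±180° as far apart.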

Results and discussion
Structural overview of the ZFAT zinc-fingers
The domain architecture of ZFAT is shown in Fig. 1a. The 1H, 15N and 13C assignments of each individual ZFAT ZF (Fig. 1b) expressed in the cell-free system were obtained by combining selected triple-resonance spectra. By screening the candidate protein samples for expression, solubility, and folding, we finally determined the following thirteen ZFAT ZF solution structures: human ZF2, ZF3, ZF4, ZF5, ZF6, ZF10, ZF11, ZF12, ZF13, ZF14 and ZF15; and mouse ZF5 and ZF8. All of the individual ZFs consisted of one N-terminal short antiparallel β-sheet and one helix (Figs. 1b, 2; Table 1), and their overall structures were similar to each other. On the other hand, the compositions of the putative DNA base-contacting surfaces differed among the solved zinc-finger structures (Fig. 3), suggesting functional divergence regarding their involvement and sequence specificity in DNA recognition. Among the determined ZFAT ZF structures, six ZFs (i.e. hZF4, hZF5, mZF5, mZF8, hZF10, and hZF12) have a bulged-out helix structure, instead of a canonical helix structure (Fig. 4a). In a canonical helix, the zinc atom is held in a tetrahedral complex by the two Sγ atoms of the C-X2-4-C sequence and the two Nε2 atoms of the H-X3-H sequence, where C represents Cys, H is His, X is any amino acid residue, and the subscript number represents the number of amino acid residues (Figs. 1b, 4a). On the other hand, the zinc atom in a bulged-out helix is held by the two Sγ atoms of the C-X2-C sequence and the two Nε2 atoms of the H-X4-H sequence (Figs. 1b, 4a). The bulged-out helix is amphipathic, with the side chains of its hydrophobic face packing into the core of the domain and its exposed surface bearing the hydrophilic residues involved in DNA recognition [37]. These structural features of the canonical or bulged-out helices are common among all of the ZFAT ZFs (Fig. 1b), and are also similar to those of other canonical C2H2 ZFs.
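The two zinc-coordination patterns described above can, in principle, be distinguished from sequence alone. The following Python sketch is an illustration only and was not part of the original analysis. It assumes the commonly quoted C2H2 consensus spacing of 12 residues between the second Cys and the first His, which is not stated in the text, and the helper names (`classify_zinc_finger`, `toy_finger`) and the poly-alanine toy sequences are hypothetical.

```python
import re

# C-X(2-4)-C, then an assumed 12-residue spacer, then H-X(3)-H (canonical
# helix) or H-X(4)-H (bulged-out helix). The 12-residue spacer is an
# assumption taken from the commonly quoted C2H2 consensus.
ZF_PATTERN = re.compile(r"C(.{2,4})C.{12}H(.{3,4})H")


def classify_zinc_finger(sequence):
    """Return (cys_spacer, his_spacer, helix_type), or None if no motif is found."""
    match = ZF_PATTERN.search(sequence)
    if match is None:
        return None
    cys_spacer = len(match.group(1))
    his_spacer = len(match.group(2))
    helix_type = "canonical" if his_spacer == 3 else "bulged-out"
    return cys_spacer, his_spacer, helix_type


def toy_finger(cys_spacer, his_spacer):
    """Build a synthetic poly-alanine finger with the requested spacers."""
    return "C" + "A" * cys_spacer + "C" + "A" * 12 + "H" + "A" * his_spacer + "H"


print(classify_zinc_finger(toy_finger(2, 3)))  # C-X2-C with H-X3-H
print(classify_zinc_finger(toy_finger(2, 4)))  # C-X2-C with H-X4-H
print(classify_zinc_finger(toy_finger(4, 3)))  # C-X4-C with H-X3-H
```

On real sequences that contain additional Cys or His residues, such a pattern can match ambiguously, so any sequence-based assignment would need to be checked against the determined structures.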
C-X4-C type zinc-finger and bulged-out helix-containing zinc-fingers
There are two interesting structural features in the folds of the ZFAT ZFs. The first is that all of the ZFAT ZFs, except ZF11, have a short two-residue spacer between the two zinc-coordinating cysteines (i.e. C-X2-C), which is typically observed in Krüppel-type ZFs. On the other hand, ZF11 has a long four-residue spacer in the corresponding region (i.e. C-X4-C; Fig. 1b), yielding an extended β loop structure between the two antiparallel β strands (Fig. 4b). The N-terminal antiparallel β loop structure, which is formed by the interaction of zinc with the two zinc-coordinating cysteines, is essential for the stability of the overall ZF structure. When the zinc ion binds to an unfolded apo-form finger, it first interacts with the Cys residues and subsequently with the His residues [10]. The difference in the spacing between the two Cys residues is assumed to either modulate the stability or facilitate the interactions with other intramolecular ZFs [38]. The β loop structure may also function as a scaffold and affect the DNA-binding activity [39].
The second feature is that the ZFAT ZFs have an abundance of the abovementioned bulged-out helix structures (Figs. 1b, 4a). In the SMART database (containing 274,117 C2H2 ZFs), approximately 80 % of the C2H2 ZFs (212,646) have the canonical H-X3-H motif, while only 15 % of the ZFs (40,767) have the H-X4-H motif. Notably, the ZFAT protein has eight bulged-out helix ZFs (44 %), including five with determined structures (i.e. hZF4, h/mZF5, mZF8, hZF10, and hZF12) and three putative bulged-out helix-containing ZFs (i.e. ZF1, ZF17, and ZF18), as judged from their amino acid sequences (Figs. 1b, 2). The percentage of bulged-out helix-containing ZFs in ZFAT, 44 %, is higher than that in frog TFIIIA (30 %), another known bulged-out helix ZF-containing protein [40]. The four-residue spacing between the two histidines of the bulged-out helix, in which one amino acid is inserted into the canonical helix, is assumed to be critical for the structure and function of ZFAT. In order to maintain an ideal position for zinc coordination (see the two histidines in Fig. 4a), the additionally inserted residue causes the helix to bulge out slightly relative to those of the canonical helix ZFs. Consequently, this H-X4-H region forms a slightly larger and looser helical structure, as compared with the canonical H-X3-H helix, without distorting the overall ZF structure (Fig. 4a).
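The proportions quoted above follow directly from the counts given in the text. As a minimal check, the Python sketch below recomputes them; the variable and function names are illustrative, and only the counts themselves come from the text.

```python
# Counts quoted in the text for the SMART database and for ZFAT.
SMART_TOTAL = 274_117      # all C2H2 ZFs
SMART_HX3H = 212_646       # canonical H-X3-H motif
SMART_HX4H = 40_767        # H-X4-H motif
ZFAT_TOTAL = 18            # ZFs in human ZFAT
ZFAT_BULGED = 8            # bulged-out helix ZFs in ZFAT


def percent(part, whole):
    """Return part/whole as a percentage."""
    return 100.0 * part / whole


print(f"SMART H-X3-H: {percent(SMART_HX3H, SMART_TOTAL):.1f} %")
print(f"SMART H-X4-H: {percent(SMART_HX4H, SMART_TOTAL):.1f} %")
print(f"ZFAT bulged-out: {percent(ZFAT_BULGED, ZFAT_TOTAL):.1f} %")
```

The computed values (about 77.6 %, 14.9 % and 44.4 %) agree with the rounded figures of approximately 80 %, 15 % and 44 % given in the text.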
Although the backbone i-i-5 hydrogen bond, known as a π hydrogen bond, was formed between two His residues in each bulged-out helix, the backbone dihedral angles were quite different from those of the ideal π-helix (φ = -57.1; ψ = -69.7) [41,42], as well as those of the ideal α-helix (φ = -65.0; ψ = -40.0) [43,44] (e.g., φ = -111.0 ± 16.1 for His349; and ψ = -36.6 ± 12.7 for Val348, respectively, in hZF4; see also Table 2). Therefore, as defined in the structural study of TFIIIA by Wuttke et al. [40], we used the term 'bulged-out helix' to describe an H-X4-H ZFAT ZF helix in this study, rather than the term 'π-helix'. The unique bulged-out helix structure can allow distinct non-coordinating amino acids located in invariant positions to form hydrogen bonds with specific nucleotide bases in the major groove of DNA [10]. It also allows the canonical ZF helix, which is located adjacent to the bulged-out helix, to form extensive interactions with DNA [40].
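The comparison with the ideal helix geometries amounts to taking angular differences. The Python sketch below does this for the two example values quoted for hZF4, using the mean values only and ignoring the reported standard deviations. The function name `angular_difference` and the data layout are illustrative assumptions; the angles are those given in the text.

```python
def angular_difference(observed, reference):
    """Signed difference observed - reference, wrapped into [-180, 180) degrees."""
    return (observed - reference + 180.0) % 360.0 - 180.0


# Ideal backbone dihedral angles quoted in the text (degrees).
IDEAL = {
    "pi-helix": {"phi": -57.1, "psi": -69.7},
    "alpha-helix": {"phi": -65.0, "psi": -40.0},
}

# Example mean values quoted in the text for hZF4 (degrees).
OBSERVED = {
    "phi": ("His349", -111.0),
    "psi": ("Val348", -36.6),
}

for angle, (residue, value) in OBSERVED.items():
    for helix, ideal in IDEAL.items():
        delta = angular_difference(value, ideal[angle])
        print(f"{angle} of {residue}: {delta:+.1f} degrees from the ideal {helix}")
```

A full comparison would consider both angles for every residue in the H-X4-H region (Table 2), rather than the two example values used here.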

Expected DNA recognition sequences of the ZFAT zinc-fingers
As for the molecular surfaces of the ZFAT ZFs, although all of the folds of the ZFAT ZFs are well conserved, the exposed surface of the helix for putative DNA-binding has a wide variety of physicochemical properties in the individual ZFAT ZFs (Fig. 3; see also Table 1). This suggested that some of the ZFAT ZFs contribute to the recognition of different DNA sequences or to protein interactions [19]. In order to predict the DNA sequences recognized by the ZFAT ZFs, we applied the DNA-ZF recognition code [11], using our structural data. The possible DNA-contacting residues of the ZFAT ZFs at the key positions within the canonical or bulged-out helices (left panels), and the nucleotide preferred by each key residue of each ZF (right panels), are shown in Fig. 5. As for the bulged-out helix-containing ZFs, the residues involved in the extended interaction with DNA [40] are also shown in Fig. 5 (see the +10 residues in ZF4 and ZF17). From this prediction, it is plausible that the N-terminal half of the ZFAT ZFs may prefer DNA subsites containing AT-rich sequences (Fig. 5). This assumption is consistent with the fact that the AT-hook, which is located between ZF1 and ZF2 (Fig. 1a), preferentially binds AT-rich sequences. On the other hand, the C-terminal half of the ZFAT ZFs may prefer DNA subsites containing GC-rich sequences (Fig. 5).

Fig. 4 a [...] (hZF2, blue). The zinc atom is depicted by a yellow ball. The zinc-coordinating Cys and His residues are depicted by cyan and magenta sticks, respectively. b Comparison of the loop structure of the C-X4-C type (hZF11, green) with that of the standard C-X2-C type zinc-finger (hZF12, blue). Other color codes are the same as in (a). The Cα positions of the protein-DNA complex structure of the DNA-binding zinc-finger of GLI (PDB ID: 2GLI) are used as the reference Cα positions. The position of the DNA (orange) is also from the GLI structure, for reference
Since consecutive ZFs bind to their corresponding DNA sequences in an anti-parallel fashion, where one ZF binds to one triplet DNA sequence and the adjacent C-terminal ZF binds to another triplet on the 5′-side [45,46], the DNA sequence preferentially recognized by ZFAT may be a GC-rich sequence followed by an AT-rich sequence. Many C2H2 ZF proteins contain tandemly arrayed ZFs connected by specific linker sequences, while other members contain single or duplicated pairs of ZFs [10,17]. Since the C2H2 ZFs are frequently involved in DNA-binding, variations in the numbers of ZFs and their spacing may affect DNA recognition [47]. In particular, multiple tandemly arrayed C2H2 ZFs can bind to the cognate DNA through two to three consecutive fingers [10,16–19]. Based on the ZFAT domain architecture and the amino acid lengths of the linkers between the individual ZFAT ZFs, the following [...] (Fig. 1a). However, the precise target DNA sequence of ZFAT could not be identified, because of the lack of information about how these tandem ZF units cooperate with each other in recognizing a particular DNA sequence and how the bulged-out helix recognizes bases in a particular DNA sequence.
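The anti-parallel arrangement described above, in which each successive C-terminal finger reads the next triplet on the 5′-side, can be sketched as a simple ordering rule. The Python example below is a hypothetical illustration: the function name `predicted_site` and the triplets are made up, and it does not predict the actual ZFAT target sequence.

```python
def predicted_site(triplets_n_to_c):
    """Assemble a DNA site (5'->3') from per-finger triplets.

    triplets_n_to_c -- triplets listed in finger order from the N-terminal
                       to the C-terminal ZF, each written 5'->3'.

    Because the adjacent C-terminal finger binds the triplet on the
    5'-side, the finger order is reversed along the DNA strand.
    """
    for triplet in triplets_n_to_c:
        if len(triplet) != 3:
            raise ValueError(f"expected a 3-nucleotide subsite, got {triplet!r}")
    return "".join(reversed(triplets_n_to_c))


# Hypothetical example: AT-rich subsites for the N-terminal fingers and a
# GC-rich subsite for the C-terminal finger.
print(predicted_site(["AAT", "TTA", "GCC"]))
```

With these illustrative triplets, the GC-rich subsite of the C-terminal finger ends up at the 5′-end of the assembled site, followed by the AT-rich subsites, which mirrors the ordering proposed in the text.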

Structural analysis of tandemly arrayed ZFAT zinc-fingers
In order to reveal the structural features of the tandemly arrayed ZFAT ZFs, we determined the solution structures of the tandem repeats ZF2-ZF3-ZF4 and ZF3-ZF4-ZF5 (Figs. 6a, 7; Table 1). The structures of the individual ZFs in the tandem ZF regions are quite similar to those of the corresponding isolated ZFs. Furthermore, the chemical shifts of almost all of the signals were unchanged between the tandem ZF regions and the isolated ZFs, except for those detected in the terminal regions (data not shown). However, we found that the chemical shifts of the Ile352 (in the interfinger linker connecting ZF4 and ZF5) and Tyr330 (in ZF4) residues were quite different between the isolated ZF and the tandem ZF (Fig. 6b). Additionally, we observed several NOEs from Ile352 (in the interfinger linker connecting ZF4 and ZF5) to Tyr330 (in the β loop of ZF4), His349 (in the helix of ZF4), and Gln354 (in the β strand of ZF5). Although we could not determine the position of ZF5 relative to ZF4, because of the lack of clear interfinger NOEs between ZF4 and ZF5, these NOEs suggested that Ile352 may function as a clamp to limit the interdomain mobility between ZF4 and ZF5 (Figs. 6a, 7, 8a).
The linker sequences between the canonical DNA-binding C2H2 ZFs are highly conserved, and are typically TGEKP. This sequence is necessary for DNA-binding and the interactions between two neighboring ZFs [11–15]. This canonical TGEKP linker is flexible in solution in the absence of DNA, whereas the linker in the DNA-bound complex forms a compact structure with a "snap-lock" helix-cap for stabilization of the DNA complex structure [13]. In addition, the TGEKP linker can be phosphorylated or acetylated, to regulate the DNA-binding activity of the tandemly arrayed C2H2 ZFs [49–51]. Intriguingly, this canonical linker sequence is not conserved in several of the ZFAT interfinger regions. The linker sequence intervening between ZF4 and ZF5 is KKIK, which is completely different from the canonical linker sequence (Fig. 8b). In the case of the two tandem ZFs in Tramtrack, in which the linker sequence is KRNVKV (Fig. 8b), this linker is more flexible than the canonical TGEKP linker sequence, even upon DNA binding [52]. This flexibility reflected the absence of the helix-cap formed by the interfinger linker upon DNA binding, and might contribute to the DNA binding mode in which the DNA structure was distorted from the B form [13,52].

Fig. 6 The uncommon interfinger linker reduces the flexibility. a The solution structures of the tandem ZF regions, ZF2-ZF3-ZF4 (top) and ZF3-ZF4-ZF5 (bottom). ZF2, ZF3, ZF4, and ZF5 are colored magenta, green, blue, and orange, respectively. In each structure, the central ZF is used for fitting. b Comparison of the 1H,13C HSQC spectra between ZF3-ZF4-ZF5 (top, black) and ZF5 (top, red), and ZF3-ZF4-ZF5 (bottom, black) and ZF4 (bottom, red). Signal assignments are labeled in the spectra
In contrast, the structure of the KKIK linker between ZF4 and ZF5, which is another atypical linker sequence, was slightly restrained, even in the absence of DNA (Figs. 6a, 8). In the case of the three tandem ZF repeats of TZFP, which has a more rigid linker between ZF2 and ZF3, the mutation of the native GAAP linker to the canonical TGEKP linker clearly decreased the DNA binding of ZF3 [48]. The other interfinger linker sequences of ZFAT also differ from the highly conserved canonical TGEKP linker sequence, which may be related to the functions of the ZFAT ZFs in gene regulation. Further structural and biochemical analyses involving DNA-bound forms of ZFAT with tandem ZFAT ZFs, bulged-out helix-containing ZFs, and ZFAT interfinger linker sequences will be necessary to understand the molecular function of ZFAT.