Structural insights into Arabidopsis ethylene response factor 96 with an extended N-terminal binding to GCC box

The phytohormone ethylene is widely involved in many developmental processes and is a crucial regulator of defense responses against biotic and abiotic stresses in plants. Ethylene-responsive element binding protein, a member of the APETALA2/ethylene response factor (AP2/ERF) superfamily, is a transcription factor that regulates stress-responsive genes by recognizing a specific cis-acting element of target DNA. A previous study showed only the NMR structure of the AP2/ERF domain of AtERF100 in complex with a GCC box DNA motif. In this report, we determined the crystal structure of AtERF96 in complex with a GCC box at atomic resolution. We analyzed the binding residues of the conserved AP2/ERF domain in the DNA recognition sequence. In addition to the AP2/ERF domain, an N-terminal α-helix of AtERF96 participates in DNA interaction in the flanking region. We also demonstrated the structure of AtERF96 EDLL motif, a unique conserved motif in the group IX of AP2/ERF family, might involve in the transactivation of defense-related genes. Our study establishes the structural basis of the AtERF96 transcription factor in complex with the GCC box, as well as the DNA binding mechanisms of the N-terminal α-helix and AP2/ERF domain. Electronic supplementary material The online version of this article (doi:10.1007/s11103-020-01052-5) contains supplementary material, which is available to authorized users.


Introduction
Plants are exposed to natural environments that may negatively affect their growth and development. To rapidly adapt to environmental change, numerous genes are regulated in response to biotic and abiotic stresses (Xie et al. 2019), such as herbivore damage, pathogenic infection, UV irradiation, temperature variation, drought, and high salt content (Bechtold and Field 2018;Kazan 2015;Penninckx et al. 1996Penninckx et al. , 1998. These stresses can induce the biosynthesis of ethylene, a gaseous plant hormone confirmed as a mediator of plant stress responses (Ecker 1995). A cis-acting element GCC box motif that constitutes the conserved sequence AGC CGC C is widely present in the promoter region of ethylene-inducible genes and has been suggested as a core sequence of the ethylene-responsive element (ERE) for ethylene signaling in plants (Buttner and Singh 1997;Hao et al. 1998;Sessa et al. 1995;Xu et al. 1998).
Such a group of transcription factors in plants is activated by ethylene signaling under biotic or abiotic stresses. APETALA2/ethylene response factors (AP2/ERFs) are a superfamily of plant-exclusive transcription factors involved in the ethylene-inducible response (Gutterson and Reuber 2004;Mizoi et al. 2012;Najafi et al. 2018). APETALA2 was first isolated from the floral development-related proteins in Arabidopsis (Jofuku et al. 1994). Additionally, an AP2-LIKE ERE BINDING FACTOR (ERF), also known as ERE BINDING PROTEIN (EREBP), was first isolated as the GCC box-binding protein from tobacco . AP2/ERFs contain a highly conserved DNA-binding domain consisting of approximately 60 amino acids (Nakano et al. 2006), specifically to recognize the GCC box at the upstream operators of defense-related genes, such as pathogen-inducible PLANT DEFENSIN 1.2 (PDF1.2) and PATHOGENESIS-RELATED PROTEIN 4 (PR4) Shinshi et al. 1995). Therefore, AP2/ERFs play an important role in the regulation of defense responses in plants (Chandler 2018).
The AP2/ERF family has 147 members that can be divided into three subfamilies: AP2, RAV, and ERF (Nakano et al. 2006). The AP2 subfamily is composed of one or two AP2 domains, which primarily participate in the process of floral development (Elliott et al. 1996). The RAV subfamily contains an ERF domain and a B3 DNAbinding domain involved in ethylene and brassinosteroid responses (Hu et al. 2004;Witthoft and Harter 2011). The largest ERF subfamily comprises 122 members and mainly involves an ERF domain that interacts with the GCC box DNA sequence (Brown et al. 2003;Hao et al. 1998). The ERF family can also recognize non-GCC box cis elements. For instance, the Dehydration-Responsive Element Binding Proteins (DREB1A and DREB2A) bind to the DRE box (5′-[A/G]CCGAC-3′) of dehydration-and cold stress-related genes (Liu et al. 1998). AtERF13,RAP2.4,and RAP2.4 L are involved in the abscisic acid (ABA) response by binding to the ABA-related cis-acting coupling element (CE1) upstream of the CE1 BINDING FACTOR (CEBF) in Arabidopsis (Lee et al. 2010;Novillo et al. 2007). The ERF subfamily can be further divided into ten groups (groups I-X) based on sequence similarity. Among these, groups VIII and IX play an important role in the interaction between plant and pathogen (Nakano et al. 2006). The members of group IX-c, such as Octadecanoid-Responsive AP2/ERF 59 (ORA59), Transcriptional Regulator of Defense Response 1 (TDR1), AtERF14, AtERF15, as well as members of group IX-a, such as AtERF1 and AtERF13, positively regulate pathogen defense responses by recognizing the GCC box of defense-related genes basic chitinase (Chi-B) and PDF1.2 (Berrocal-Lobo et al. 2002;Onate-Sanchez et al. 2007;Pre et al. 2008;Zhang et al. 2015). Furthermore, group IX-c members of the AP2/ERF family interact with Mediator25 (MED25) using a highly conserved sequence EDLL motif (also known as CMIX-1) at the C-terminus (Li et al. 2017;Nakano et al. 2006;Tiwari et al. 2012). Yeast two-hybrid analysis revealed that AtERF98 loses its binding function to MED25 when the conserved leucine is replaced by valine in the EDLL motif (Tiwari et al. 2012).
Arabidopsis Ethylene Response Factor 96 (AtERF96, AT5G43410), a member of ERF group IX, contains an AP2/ERF domain and an EDLL motif, and positively regulates defense responses against the necrotrophic pathogens Botrytis cinerea and Pectobacterium carotovorum (Catinot et al. 2015). Previous studies indicate that the expression level of AtERF96 can be induced via ethylene and jasmonate (JA) signaling pathways, but is antagonized by the salicylic acid (SA) response (Catinot et al. 2015;Zander et al. 2014). The impact of TGACG sequence-specific binding proteins (TGA2, TGA5, TGA6) on AtERF96 mRNA levels has been confirmed (Phukan et al. 2017;Zander et al. 2014). Additionally, microarray analysis revealed that 45% of the 126 up-regulated genes in transgenic overexpressing AtERF96 are strongly related to the plant defense response (Catinot et al. 2015).
Until now, we lack a full-length protein structure of the AP2/ERF transcription factor. Only the GCC box-binding domain (GBD, namely, the AP2/ERF domain) of AtERF100 (previously designated as ERF1 and renamed At4g17500 by Nakano et al. 2006) was previously determined using nuclear magnetic resonance spectroscopy (NMR; PDB ID: 1GCC) (Allen et al. 1998). The AP2/ERF domain is composed of three β-sheets and an α-helix; the β-sheets interact monomerically with the target 11 base pairs of double-strand DNA (5′-GCT AGC CGC CAG C-3′). Even if the classification of AP2/ERFs provides some information on their potential role in plants, only a limited number of AP2/ERF transcription factors have been functionally characterized for their unstable manner. Considering the protein stability and smaller size properties, AtERF96 was chosen as the candidate to study its protein structure and DNA binding ability. Here, we resolved the crystal structure of the AtERF96-DNA complex, including an AP2/ERF domain and a unique EDLL motif. We determined the AtERF96-GCC11 binding mechanism, demonstrated that the N-terminal α-helix of AtERF96 binds to the DNA minor groove, and clarified the influence of AtERF96 on the target gene expression by transactivation analysis.

Cloning, expression, and purification of AtERF96
The cDNA of full-length Arabidopsis thaliana ERF96 (AT5G43410) was cloned into pET28a expression vector (Novagen) with a hexahistidine tag (6xHis-tag) at the N-terminus. The expression vector was transformed into Escherichia coli strain BL21 (DE3) (Novagen), and then incubated at 37 °C in a 2 L flask with shaking until 0.4-0.6 absorbance at 600 nm was achieved. The expression of AtERF96 protein was induced by 0.1 mM isopropyl β-D-1thiogalactopyranoside (IPTG) for 18 h at 16 °C. The cells were harvested by centrifugation at 9820×g for 30 min at 4 °C, then resuspended and lysed in lysis buffer (30 mM HEPES pH 8.0, 500 mM NaCl, 20 mM imidazole, 0.5 mg/ mL DNase I). The cells were lysed by sonication and the cell debris was removed via centrifugation at 18,900×g for 25 min at 4 °C. Protein purification was performed by Fast Protein Liquid Chromatography (FPLC) using the AKTA prime plus system (GE Healthcare). The filtered supernatant was applied to a 5 mL HisTrap™FF column (GE Healthcare) pre-equilibrated with binding buffer (30 mM HEPES pH 8.0, 500 mM NaCl, 20 mM imidazole). Proteins were eluted with elution buffer (30 mM HEPES pH 8.0, 500 mM NaCl, 500 mM imidazole). The purified protein solution was applied to a HiTrap Heparin HP column (GE Healthcare) to remove endogenous DNA fragments of the host cells. The column was pre-equilibrated with buffer A (30 mM HEPES pH 8.0), and the protein was eluted using the linear gradient of buffer exchange from 0 to 100% with buffer B (30 mM HEPES pH 8.0, 1 M NaCl). The eluted proteins were further desalted to storage buffer (30 mM HEPES pH 8.0, 250 mM NaCl, 10% glycerol) using a HiTrap™ Desalting column (GE Healthcare). Purified protein concentrations were determined at 595 nm absorbance using an ELISA reader (FlexStation 3, Molecular Devices) and the standard curve method. The Coomassie brilliant blue G250 (CBB) protein assay solution (5×) (Bio-Rad) was used as the blank solution, and a dilution series of CBB mixed with 5 µL bovine serum albumin (BSA) at 1000, 500, 250, and 125 µg/mL were used as standards. The purified AtERF96 proteins were concentrated to 10 mg/mL using 10 kDa centrifuge tubes (Amicon Ultra-15 Centrifugal Filter; Merck Millipore) and stored at − 80 °C.

Site-directed mutagenesis
The AtERF96 single and double mutations were generated using the QuikChange Lightning Site-Directed Mutagenesis Kit (Agilent). The AtERF96 mutations were introduced into cDNA fragments through PCR using the primers listed in Table S1, and the fragments were cloned into the pET28a expression vector. Protein expression and purification were performed as for wild-type ERF96.

Preparation of fluorescein-labelled double-stranded DNA probes
Individual single-stranded oligonucleotides of GCC-box fragments were synthesized by commercial gene synthesis (Genomics). The 5′ end of forwarding oligonucleotides were labeled with fluorescein. Annealing of the two complementary strands was performed in 30 mM HEPES (pH 7.4) by heating at 95 °C for 1 min, followed by slowly cooling to room temperature for 20 min. The concentration of the annealed DNA probes was measured by spectrophotometry DeNoVix) and stored at − 20 °C.

Size-exclusion chromatography (SEC)
The SEC column was pre-equilibrated with one column volume of GF1 buffer (30 mM HEPES pH 8.0, 500 mM NaCl, 10% glycerol). To determine the molecular weight of the target protein, 300 µL of protein standard (Bio-Rad) containing γ-globulin (bovine, 158 kDa), ovalbumin (chicken, 44 kDa), myoglobin (horse, 17 kDa), and vitamin B12 (1.35 kDa) was applied to the column, and the elution volumes of the standards were plotted against the logarithm of the standards' molecular weights.
The polymer characterization of AtERF96 proteins was performed by the AKTA prime plus system (GE Healthcare) using a Superdex 75 column (GE Healthcare). The AtERF96 protein sample (< 5 mL volume) was applied to the column, and the fractions of elution peaks, including monomeric AtERF96 proteins, were pooled and concentrated to 10 mg/ mL using a 10 kDa centrifuge tube.
The binding analysis of ERF96-GCC-box was performed by the AKTA prime plus system (GE Healthcare) using a Superdex 75 column (GE Healthcare). A mixture of AtERF96 protein and double-stranded GCC12 DNA fragments were pre-incubated on ice at a 1:2 molar ratio for 30 min. The AtERF96 protein, double-stranded GCC12 DNA fragments, and the ERF96-GCC12 complex were applied to the column with GF2 buffer (30 mM HEPES pH 8.0, 62.5 mM NaCl, 10% glycerol). The fractions of each elution peak were pooled and concentrated to 10 mg/mL using a 10 kDa centrifuge tube.

Fluorescein-based electrophoretic mobility shift assay (fEMSA)
The fluorescein-labeled probes and AtERF96 recombinant proteins were incubated in 30 mM HEPES (pH 7.4), 62.5 mM NaCl, and 5% glycerol for 30 min at room temperature in the dark. A 10% polyacrylamide gel was pre-run at 120 V for 40 min at 4 °C. Samples were mixed with 5× loading dye and run at 120 V for 90 min at 4 °C in the dark. The gel was scanned for fluorescent band shift using the FluorChem™ M system (ProteinSimple) and a luminescence imaging system (Fuji LAS-3000). Quantitative analysis of fluorescent band shift was performed using ImageJ software, with the band intensity of the AtERF96-GCC box set as a baseline (100%) to determine the relative binding level.

Dynamic light scattering (DLS) assay
The purified AtERF96 protein sample (20 µL at 1 mg/mL) was loaded into the cuvette, and the particles of the protein molecules in the solutions were measured at 25 °C. The batch light scattering data were recorded using the DynaPro Plate Reader I (Wyatt Technology) and analyzed with 1 3 DYNAMICS 7.0 software (Wyatt Technology). The diffusion coefficient was calculated from the intensities of light scatter from the molecule particles, and further analysis of the hydrodynamic radius, diameters, and molecular weights of the target protein particles by the Stokes-Einstein Law is described as follows: where D is the diffusion constant (m 2 /s), k B is Boltzmann constant, T is the absolute temperature, π is the ratio of a circle's circumference to its diameter, η is the dynamic viscosity, and r is the radius of the spherical particle.

Fluorescence polarization (FP) assay
Purified AtERF96 proteins were desalted to FP buffer (30 mM HEPES pH 7.4, 250 mM NaCl, 10% glycerol) using a HiTrap™ Desalting column (GE Healthcare), and the concentration was determined by the Bradford protein assay. The AtERF96 protein samples were two-fold serially diluted in FP butter to 16 or 24 concentrations, and 50 µL of each diluted protein sample was added to a 96-well plate, along with 50 µL of 20 nM GCC12 probes in each sample well. A set of wells containing 100 µL FP buffer, and another set of wells containing 50 µL FP buffer mixed with 50 µL of 20 nM GCC12 probes were used as blanks and controls, respectively. FP enables the study of molecular interactions by monitoring changes in the apparent size of fluorescentlylabelled or inherently fluorescent molecules, which are often referred to as the tracer or the ligand (Checovich et al. 1995;Heyduk et al. 1996;Moerke 2009). The samples of fluorescent molecules were excited by plane-polarized light, and the emission spectra were recorded and analyzed by PARA-DIGM™ (Beckman Coulter/Molecular Devices). Quantification of fluorescence polarization (FP) is defined as the difference between the emission intensities of horizontally ( F ∥ ) and perpendicularly polarized light ( F ⟂ ) to the excitation light plane normalized by the total fluorescence emission intensity (Moerke 2009). The formula of FP is described as follows: where P is the polarization obtained by subtracting the blank value of both the horizontally and perpendicularly polarized light. The anisotropic levels of polarized fluorescence were plotted against the concentrations of protein samples using Prism 7 (GraphPad Software, Inc.) with the two-site binding equation. The dissociation constant (K d ) is determined by the correlation between polarizations and sample concentrations, and the formula of two-site binding is described as follows: where x is the protein concentration, y is the polarized value. B maxHi and B maxLo are the maximum specific bindings to the two sites in the same units as y. K dHi and K dLo are the equilibrium binding constants, in the same units as x. All experiments were done from three independent replicates.

Protoplast transactivation analysis (PTA)
The experimental procedure is described in a previous report (Yoo et al. 2007). Two kinds of plasmids were used: (1) A plasmid contained CaMV 35S promoter with AtERF96 or AtERF96 mutant genes as the effector.
(2) The other plasmid contained PDF1.2 GCC box promoter with firefly LUX (fLUC) gene as the reporter. Two plasmids were co-transfection into protoplasts at the same time. Wild-type Arabidopsis plants were grown on sterile soil in an environment-controlled chamber. True leaves number 5-7 from 4-week-old plants were chosen before flowering and 1-mm leaf strips were cut from the middle part of a leaf. Leaf strips were quickly and gently transferred into the prepared enzyme solution (20 mM 2-(N-morpholino) ethanesulfonic acid (MES) pH 5.7, 1.5% (w/v) cellulase R10, 0.4% (w/v) macerozyme R10, 0.4 m mannitol, 20 mM KCl). To enhance enzyme solubility, the solution was heated at 55 °C for 10 min to inactivate DNase and proteases. While the solution was cooling to room temperature (25 °C), 10 mM CaCl 2 , 1 mM β-mercaptoethanol, and 0.1% BSA were added. Leaf strips were vacuum-infiltrated for 30 min in the dark using a desiccator, then the digestion was continued in the dark at room temperature for at least 3 h. The enzyme/protoplast solution was diluted with an equal volume of W5 solution (2 mM MES pH 5.7, 154 mM NaCl, 125 mM CaCl 2 , 5 mM KCl) before filtration to remove undigested leaf tissues. The enzyme/protoplast solution was filtered using 75-µm nylon mesh wetted with W5 solution, centrifuged at 200×g for 2 min to pellet the protoplasts, then the supernatant was removed and the pellet was re-suspended in W5 solution with gentle swirling. Protoplasts were centrifuged again for 15 min to remove W5 solution, and the protoplast pellet was re-suspended with MMG solution (4 mM MES pH 5.7, 0.4 m mannitol, 15 mM MgCl 2 ) at room temperature. Ten microliter of DNA plasmids and 100 µL of protoplasts were gently mixed in the microfuge tube. Then, 110 µL polyethylene glycol (PEG) solution (40% (w/v) PEG4000, 0.2 m mannitol, 100 mM CaCl 2 ) was added and mixed gently, and the transfection mixture was incubated at room temperature for 10 min. The transfection was stopped by diluting the mixture with 400 µL W5 solution and gentle mixing. The protoplast mixture was centrifuged at 100×g for 2 min to remove the supernatant and re-suspended gently with 1 mL WI solution (4 mM MES pH 5.7, 0.5 m mannitol, 20 mM KCl). Protoplasts were transferred to a tissue culture plate and incubated at room temperature for 8 h, then re-suspended and harvested by centrifugation at 100×g for 2 min to remove the supernatant. Protoplast lysis buffer (100 µL) was added to the protoplasts and mixed vigorously by vortexing for 10 s, then incubated on ice for 5 min and centrifuged at 1000×g for 2 min. Twenty microliter of lysate was added to 100 µL luciferase mix (Dual-Luciferase® Reporter Assay System, Promega), and the luciferase activity was measured firstly with firefly luciferase (LUC) and then measured renilla luciferase (REN) as control reporter. All luminance was detected by a luminometer (Infinite M200 pro, TECAN). Final relative luminescence unit (RLU) could be obtained by LUC/ REN. All experiments were obtained from at least three independent replicates.

Protein crystallization and data collection
The AtERF96 protein and GCC11 double-stranded DNA probe (5′-TAG CCG CCAGC-3′) were incubated in a tube at a 1:2 molar ratio, and concentrated to 6 mg/mL with GF2 buffer for crystallization. Screening for suitable crystallization conditions was performed using the Crystal Phoenix Liquid Handling System robot (Art Robbins Instruments, LLC). The program was set to a sitting-drop method, which dispensed an equal volume of the protein-DNA mixture and screening buffer to a volume of 1 µL to each well of a 96-well plate. The AtERF96-GCC11 complex crystals were observed at a temperature of 295 K at four crystal-

Structure determination and refinement
Diffraction data were indexed, integrated, and scaled using the HKL2000 package (Otwinowski and Minor 1997). The crystallographic structure was solved by the PHENIX platform (Adams et al. 2010). The assessment of data quality was analyzed by the phenix.xtriage program. Data from the crystal that grew in Natrix™ 2 No. 6 had the best diffraction quality, and the resolution limit reached 1.76 Å. The AtERF96-GCC11 complex was co-crystallized in space group P1 2 1 1, which comprised of one AtERF96 and one GCC11 DNA fragment in an asymmetric unit. Twinning analysis by the phenix.xtriage program showed that the data consists of five pseudo-merohedral twins with 3-fold axes (− h − l, k, h/l, k, − h − l) and 2-fold axes (l, − k, h/h, − k, − h − l/− h − l, − k, l).  (Allen et al. 1998) as a template model. The unknown region of the AtERF96 structure was built manually using COOT software, according to the F o -F c electron density map. The resulting electron density map was sharpened by density modification using RESOLVE (Afonine et al. 2012). Refinement was continued with several cycles of positional, B-factor, occupancies, and TLS (Translation-Libration-Screw-rotation) refinement. Data was detwinned against the twin operators by phenix.xtriage, and further improvement of the density map was achieved by using the twin fraction refinement by REFMAC5 of the CCP4 platform (Murshudov et al. 2011;Winn et al. 2011) to filter out those small twin fractions so that the major twin domain remains. The revised structure factor data was refined again by phenix.refine, using new R-free flag for several cycles. Validation was performed by the Mol-Probity program (Chen et al. 2010) to check the real-space correlation, molecular geometry, and Ramachandran plots. Several Ramachandran outliers might be caused by the weak electron density in disordered regions. All structural models were generated using PyMOL (Schrödinger, LLC).

AtERF96 recognizes the core sequence of the GCC box motif
The full-length AtERF96 protein consists of 131 amino acids. We constructed and expressed a series of AtERF96 proteins, including wild-type and different mutants, in an E. coli system, and purified these using fast protein liquid chromatography (FPLC). Two elution peaks of the AtERF96 protein from the size-exclusion chromatography (SEC) analysis were determined at approximately 286.5 and 26.7 kDa (Fig. S1A). However, the molecular weight of the AtERF96 protein ranges between 15 and 20 kDa based on SDS-PAGE (Fig. S1B). We used dynamic light 1 3 scattering (DLS) to further confirm protein homogeneity and the size distribution profile. The results showed that the precise monomeric form (62.7 mL) was 17 kDa, which is consistent with the results of the SDS-PAGE analysis ( Fig. S1D and E). The AtERF family widely regulates defense-related genes by recognizing the GC-rich sequences at the upstream promoter. Hence, we designed the SEC experiments to clarify whether the AtERF96 protein interacts with the GCC box motif. An earlier elution volume of the SEC trace indicated that the AtERF96 protein interacts with the GCC12 DNA probe composed of 12 base pairs (Fig. 1a). To determine whether the length of the GCC box sequence influences the binding ability of AtERF96, we designed different lengths of GCC probes comprised of a core sequence and a variable flanking region according to the GC-rich promoter sequence in Arabidopsis (Table S2). Fluorescence-based electrophoretic mobility shift assay (fEMSA) analysis showed that all GCC probes bound to the AtERF96 protein, especially GCC11 and GCC12 (Fig. 1b). Therefore, we co-crystallized AtERF96 with the GCC11 DNA site and determined the structure to a resolution of 1.76 Å with final R work /R free values of 20.7%/22.7% (Fig. S1C, Table 1).

Complex structure of the AtERF96-GCC11
The complex structure consists of the AtERF96 protein with all 131 amino acids, and a double-stranded GCC box motif with 11 base pairs (Fig. 1c). The AtERF96 structure is composed of five α-helices and three β-sheets, including an AP2/ ERF domain (K14-E74, β1-β3) for target gene recognition, as well as an EDLL motif (F105-L119, α5) for transcriptional activation. The three-stranded antiparallel β-sheet fragment of the AP2/ERF domain binds to the GCC11 motif and crosses the adjacent major groove region. We found that AtERF96 and AtERF100 could be superimposed with a backbone root-mean-square deviation of 1.31 Å across 55 Cα atoms in the AP2/ERF domain (Fig. S2). The front of the AP2/ERF domain is the N-terminal α-helix (M1-G9, α1), which docks into the minor groove of the GCC11 motif. A linker consisting of eight residues (A10-G17) connects the α1 helix and β1 sheet, gripping one strand of the DNA double helix between the α1 helix and the β1 sheet of the Fig. 1 Characterization and crystal structure of the AtERF96-GCC box complex. a SEC traces of the AtERF96 protein, GCC12 probe, and AtERF96-GCC12 complex are shown as orange, gray, and blue lines. b EMSA binding analysis of AtERF96 proteins with vari-ous GCC probes. c Ribbon representation of crystal structure of the AtERF96-GCC11 complex. The N-terminal binding region, AP2/ ERF domain, and C-terminal EDLL motif are colored in green, red, and yellow AP2/ERF domain (Fig. 1c). Extending from the AP2/ERF domain, three α-helices (α3-α5), including an EDLL motif, constitute the C-terminal region (Y75-K131) in a triangular-shaped architectural design. Most residues of the AP2/ ERF domain have a positively charged electric potential and are highly conserved in group IX of the AP2/ERF family (Fig. 2a, Fig. S3). The AP2/ERF domain consists of several arginines in the three-stranded antiparallel β-sheets bound to the thymines or guanines of the GCC box, which generates a protein-DNA binding network (Fig. 2b, Table S3). Residues R16, R31, and R39 of the AP2/ERF domain interact with the phosphate group of base G15, as well as the guanine group of bases G6, G15, and G16 (Fig. 2c). Residues R19 and R21 interact with bases T1, G3, G18, and G19 of the GCC11 motif at the adjacent interface (Fig. 2d). Furthermore, the conserved tryptophans W23 and W41 provide hydrophobic interactions to stabilize nucleobases T1, G3, and C4 of the GCC11 motif (Fig. 2e). We observed that residues D2, Q3 and R6 of the N-terminal α1 helix bind to nucleotides G10, C11, and T14 of the GCC11 motif and partially disturb the interactions of DNA base pairs in the 3′ flanking region (Fig. 2f). We therefore analyzed the nucleic acid structure of AtERF96-GCC11 using the w3DNA server (Zheng et al. 2009). The conformational analysis indicated that the GCC11 structure shows an obvious shift and twist in the base step C7/C8, as well as a large tilt and roll in the base step T14/G15 (Table S4). The parameters imply hydrogen bond disruption of DNA base pairs C8-G15 and A9-T14 (Table S5 and S6). The results suggest that the AtERF96 protein binds the GCC box core sequence through the AP2/ ERF domain, and the N-terminal α1 helix binds to the 3′ flanking region of GCC11.

Effect of mutations on the AtERF96-GCC box interaction
In view of the structural information about the AtERF96-GCC box complex, we investigated the importance of conserved residues in the AP2/ERF domain of AtERF96 for GCC box binding. We present a series of AtERF96 mutants corresponding to the binding residues of the structural data and analyzed the dissociation constant (K d ) with a fluorescently labeled GCC box probe using a fluorescence polarization (FP) assay (Fig. 3a). We chose the GCC12 probe to perform the analysis due to its significant binding shift in the fEMSA assay (Fig. 1b). The results showed that the curves of concentration-dependent polarization fit two sites binding with two independent K d values (K dHi and K dLo ) (Table S7). AtERF96 mutants had a significant reduction in the binding ability of R16A, R19A, R21A, R39A, and W41A to the GCC12 probes (Fig. 3, Table S7). All of above mutants showed raised levels of K dHi value, implying that these residues are necessary for specific binding in the GCC box. Except for the R16A, W23A, and double-mutant proteins, most mutants remained roughly at the same level of K dLo relative to the wild-type (Fig. 3, Table S7). The raised K dLo levels of R16A and W23A reflected that these residues are involved in non-specific binding in the GCC box, including the π-π stacking of the indole ring and phosphate group binding (Fig. 2c and e). The R19A/R21A and R31A/R39A mutants showed a severe interference in the binding to the GCC12 probes ( Fig. 3j and k, Table S7), indicating that these double mutants nearly lost their ability to recognize the core sequence. We noticed that the values of K dHi in the R39A, R19A/R21A, and R31A/ R39A mutants were approximately equal to the K dLo values ( Fig. 3j and k). Thus, we further analyzed all the polarization data using the equation of one-site binding. The results showed that the polarized curves of R19A, R21A, R39A, W41A and double mutants could be also fitted by the onesite binding (Fig. S4). In addition, R39A, R19A/R21A, and R31A/R39A mutants revealed the similar K d value to the K dHi and K dLo , respectively (Fig. S4G, I, and J, Table S7).  This indicates that the binding specificity of these mutants was weakened as the features of two-site binding became insignificant. We also performed a fEMSA assay to verify the binding ability of various AtERF96 mutants with different lengths of the GCC box probe. Irrespective of the GCC box probe length used, the binding affinities of the R19A, R21A, R31A, R39A, and W41A mutants were severely decreased ( Fig. S5 and S6). The R16A mutant showed minor affinities with the shorter GCC8 and GCC10 probes ( Fig. S5A and B), whereas the R19A/R21A and R31A/R39A double mutants barely had the ability to bind any probes (Fig. S7A). Similar to the results of the FP analysis, the W23A mutant showed a lower binding ability with the GCC8, GCC11, and GCC15 probes, and the W41A mutant revealed a more severely reduced interaction with all of the GCC probes ( Fig. S5 and  S6). We further investigated the importance of AtERF96 N-terminus in GCC box binding, and designed a series of AtERF96 mutants in view of the probable DNA-binding residues in the α1 helix (Fig. 4a). All N-terminal mutants had limited influence on K dHi levels, except for the N-terminal truncated protein (Fig. 4b-f). However, the K dLo levels of D2A/Q3A/R6A and ND10 significantly increased ( Fig. 4e and f, Table S7). The results indicate that the residues in the α1 helix are involved in non-specific binding. Overall, most of the conserved arginines and tryptophans in the AP2/ERF domain of the AtERF96 protein are crucial for recognizing the GCC box motif.

Protoplast transactivation analysis
Our structural data showed that both the AP2/ERF domain and the N-terminal region of AtERF96 interact with the GCC box motif (Fig. 2e). We compared two GCC box motifs of the AtERF96-GCC11 and AtERF100-GCC11 complexes ( Fig. 5a and b) and noticed that the α1 helix (M1-G9) of AtERF96 binds to the flanking region of the GCC11 DNA motif (Fig. 2e). The α1 helix contacts the template strand of the GCC box in the 3′ end region, resulting in a conformational change of the template strand and slight flipping of the ten nucleotide base pairs from C7 to G16 (Fig. 5c and d). In view of the N-terminal region of AtERF96 altering the DNA architecture of GCC11 in the crystals, we performed a transactivation analysis in AtERF96-overexpressing protoplasts to investigate whether the N-terminal region of AtERF96 is involved in transcription regulation. A luciferase (LUC)encoding reporter gene, PDF1.2 pro:LUC, which contains two copies of the GCC box sequence from the PDF1.2 promoter, and an effector plasmid consisting of each AtERF under the control of the cauliflower mosaic virus (CaMV) 35 s promoter, were co-transfected into the Arabidopsis protoplasts (Fig. 6a). The results showed that LUC activity was not affected by the N-terminal-mutated or N-terminal-truncated AtERF96 proteins. However, decreased LUC activity was detected when the reporter plasmids were coexpressed with the effector plasmids of the AtERF96 R6A mutant (Fig. 6b). The data indicate that the N-terminal region of AtERF96 has a minor effect on transcription of the target gene. By contrast, the LUC activities affected by the AP2/ ERF domain mutants of the AtERF96 effector plasmids were significantly reduced (Fig. 6c). These data coincide with the results of the binding between the AtERF96 mutants and the GCC box probes from the FP analysis. Coexpression of the EDLL-truncated AtERF96 with the reporter construct resulted in significant LUC inactivation (Fig. 6c). The mutants W23A and W41A showed a reduced binding ability to the GCC box in the transactivation assay, consistent with the results of the FP analysis and fEMSA (Fig. 3, S5 and S6). Furthermore, we observed two regions of the EDLL motif with high B-factor distributions, including the residues G80 to S84 and V104 to Y109 (Fig. 6d) (Cevik et al. 2012;Tiwari et al. 2012). The sequence alignment showed that glutamate, aspartate, and leucine are enriched and highly conserved in the EDLL motif of group IX of the AP2/ERF family (Fig. 6e, Fig. S3). The results suggest that the EDLL motif of AtERF96 is important for transactivation.

DNA binding specificity of AtERF96
To determine whether AtERF96 proteins interact with non-GCC box motifs, we tested three DNA motifs with GC-rich sequences: P box, CS1 box, and DRE box (A/GCC GAC ) (Hao et al. 2002). We designed these three probes with fluorescein fused to the 5′-or 3′-end and tested the binding ability between AtERF96 proteins and these DNA motifs using fEMSA and FP analyses. The GCC12 probe and the W box (TTG ACC ) probe were used as positive and negative controls for fEMSA analysis, respectively. (Fig. S7B, S7C).
The fEMSA results showed that AtERF96 protein has a slight binding ability to P box, CS1 box, and DRE box motifs (Fig. 7a). The FP assay also revealed that the K dHi levels of these motifs to AtERF96 protein were much weaker than the GCC box by 8 to 25 fold, respectively ( Fig. 7b-d, Table S8). By contrast, the influence of the K dLo levels on the P box and DRE box motifs was insignificant (Table S8). These data suggest that the AtERF96 protein retains a limited binding ability for other DNA motifs through partial specific interactions.

Discussion
AP2/ERF-family proteins regulate transcription by recognizing the GCC box sequence in the promoters of target genes (Gu et al. 2017;Mizoi et al. 2012;Xie et al. 2019). Group IX of the AP2/ERF family is composed of three subgroups, IX-a, IX-b, and IX-c, characterized by the conserved motifs (CM) CMIX-3, CMIX-2, and CMIX-1 (specifically, the EDLL motif), respectively (Nakano et al. 2006). Among these, the members of group IX of AP2/ERFs have been linked to defensive gene expression in response to pathogen infection (Cao et al. 2018;Gu et al. 2002;Gutterson and Reuber 2004). AtERF96 phylogenetically belongs to group  (Catinot et al. 2015). A microarray assay coupled to chromatin immunoprecipitation-PCR of overexpressed AtERF96 revealed that AtERF96 regulates the activation of JA/ET-responsive genes, such as PDF1.2a, , as well as the transcription factor ORA59, through direct binding to existing GCC elements in their promoters (Berrocal-Lobo et al. 2002;Catinot et al. 2015;Pre et al. 2008).
In this study, we determined the crystal structure of the AtERF96-GCC11 complex, including an AP2/ERF domain Fig. 6 Transient expression assay and EDLL motif of AtERF96. a Schematic diagram of the reporter and effector plasmids used in transient assays. Effector plasmids were under the control of the cauliflower mosaic virus (CaMV) 35 s promoter. The plasmids constructed with the Arabidopsis PDF1.2a promoter, which contains the two GCC boxes, were fused to a firefly luciferase gene as the reporter. HA tag, human influenza hemagglutinin tag. 35 s ter, CaMV 35 s terminator. Nos, the terminator signal of the gene for nopaline synthase. b and c Relative LUC activity from transient expression analysis of PDF1.2a promoter co-infiltrated with a plasmid containing AtERF96 genes fused to the 35 s promoter. Plots of the LUC activity level influenced by AtERF96 genes with the mutations of N-terminal region (b) or AP2/ERF domain region (c) are shown, and the ERF9 is a negative control. Experiments were obtained from three independent replicates. EV empty vector, WT wild type, double D2A/Q3A double mutations, triple D2A/Q3A/R6A triple mutations, ND10 N-terminal deletion of first 10 residues, dEDLL C-terminal deletion of the residues R102 to K131. Multiple comparisons of group vectors were performed using Fisher's least-significant-difference (LSD) procedure. d Crystallographic B-factor distribution of the AtERF96-GCC11 complex. The residues of relatively higher B-factor are highlighted from high to low values as red > orange > yellow colors. e Ribbon representation and sequence logo of the AtERF96 EDLL motif. The designated EDLL region and conserved residues are indicated as yellow ribbons and cyan sticks. Sequence logo of the EDLL motif is shown through the full-length alignment of the paralogues from AtERF95 to AtERF98. The bit score indicates the information content for each position in the sequence and an EDLL motif at a resolution of 1.76 Å (Fig. 1c, Table 1). The conformation of the AP2/ERF domain in AtERF96 shows a similar framework to AtERF100 upon binding to the target DNA (Fig. S2) (Allen et al. 1998). Nevertheless, the potential propensity of residue-nucleotide interactions shows some differences between these two structures. For example, residue R31 of AtERF96 (R162 of AtERF100) contacts the phosphate group of nucleotide G15; at the same structural position, the arginine of AtERF100 binds to the guanine base. Residue R21 of AtERF96 (R152 of AtERF100) contacts nucleotides T1 and G3 at the 5′-end, but residue R152 of AtERF100 binds to nucleotide G19 closer to the 3′-end of another strand. Residue R39 of AtERF96 (R170 of AtERF100) contacts three guanines, G6, G15, and G16, instead of the sugar-phosphate backbone of nucleotide C5. There are two causes for these differences: one is the discrepancy of the polar residues from the few non-conserved amino acids between these two AP2/ERF domains; the other is the influence of the neighboring α1 helix at the N-terminus of AtERF96. The N-terminal α1 helix interacts with the flanking region following the core sequence of the GCC11 motif at the minor groove. Interestingly, residue Q3 of the α1 helix provides polar interactions with nearby nucleotides, especially G10 and T14, resulting in unpairing and unstacking of base pairs from C7 to G16 ( Fig. 5c and d). We used the 3DNA suite of programs to analyze the conformation of DNA base pairs in the residuebinding region. Results indicated that nucleotides C8, T14, and G15 exhibit shifting, tilting, and rolling (Table S4). Disruption of base stacking in single-stranded polynucleotides significantly alters the base pair conformation, leading to a lack of information on the spatial configurations of base pairs C8-G15 and A9-T14 (Table S5 and S6). The residuebase interaction of R39-G16 combined with the shifting and twisting of base C7/C8 directly leads to a shear in the base pair C7-G16 (Table S5). Thus, the unpaired and unstacked nucleotides further affect the interaction with the residues of the β1-β2 strands at the major groove. This result explains why few conserved arginines in the AP2/ERF domain of AtERF96 show a binding mode distinct from AtERF100 using target nucleotides. No previous reports indicate that binding of the ethylene-responsive element binding factors with the GCC box motif results in the unstacking of DNA bases. Regarding AtERF96 acting as a positive regulator of target gene transcription, we suggest that binding of the N-terminal α1 helix with the 3' flanking region may facilitate DNA unwinding for further transcription initiation.
Mutagenesis coupled with binding experiments confirmed the relevance of the protein-DNA contacts identified in our structure and helped us delineate the residue conservation in both the AP2/ERF proteins and the GCC box DNA targets. We analyzed the binding efficiency of AtERF96 wild-type and various mutants via fluorescence polarization analysis. We noticed that the interaction of AtERF96-GCC box showed the capacity for both specific and non-specific binding in our FP analysis. The polarization curve of wildtype AtERF96 showed a clear trend with two rises, which can be also observed in the results of R16A, W23A, and R31A mutants. (Fig. 3b, c, f and g). Interestingly, when attempting to analyze the data using the one-site binding Fig. 7 Characterization of binding ability between AtERF96 proteins and various DNA motifs. a EMSA binding analysis of AtERF96 proteins with the GCC12, P box, CS1 box, and DRE box probes. b-d Fluorescence polarization analysis of AtERF96 proteins with the P box (b), CS1 box (c), and DRE box (d) DNA probes. All data are representative of three independent experiments with the two-site binding equation, and the error bars are calculated as the standard deviation equation: y = B max × x/K dHi + x, the data of wild-type and mutants again were difficult to fit to the sigmoid curve (Fig.  S4). However, the equation can fit the data of R19A, R21A, R39A, W41A, and double mutants (Fig. S4). The results imply that some residues are crucial for the recognition of the GCC box, and that the mutations caused the functional loss of specific binding. Among these, R19 and R39 showed the major influence in the specific binding, due to their interactions with the bases G7, G16, G17, G19 and G20 in the GCC12 probe (G6, G15, G16, G18 and G19 in the GCC11 probe) (Fig. 3, Table S7). At the N-terminus of AtERF96, all mutants retained the two-site binding feature in the raw data (Fig. 4). The ND10 truncated protein showed a limited effect on specific binding, accompanied by a raised K dHi level compared to wild-type, suggesting that the N-terminal region is not involved in GCC box recognition (Fig. 4f). In view of the above, we suggest that the K dHi is implicated in the residue-base conservation, whereas the K dLo reflects the stabilization of residue-sugar phosphate backbone, according to the effects caused by mutations of conserved residues of their specific functions. To better understand the impact of AtERF96 mutations in vivo, we designed a transactivation analysis in Arabidopsis protoplasts with an overexpressing effector and a luciferase-fused reporter. Although the N-terminal α1 helix of AtERF96 made contact with the 5′ end of the template strand and structurally disrupted DNA base pairing, the N-terminal mutants only showed limited influence on the transactivation analysis (Fig. 5b). These results show that the α1 helix acts as an auxiliary domain in promoting transcription initiation. The transactivation assay revealed that most mutations of conserved arginines in the AP2/ERF domain seriously disrupted protein-DNA interactions, including the conserved tryptophans W23 and W41 (Fig. 6c). However, we noticed that the sequence region excluding the AP2/ERF domain is highly diverse in all of group IX members in the AP2/ERF family, meaning that the N-terminal regions of other ERFs are structurally distinct from AtERF96 (Fig. S3). In addition, the EDLL-truncated AtERF96 lost its transactivation function in vivo, suggesting that the EDLL motif might interact with the MEDIATOR25 subunit of the eukaryotic Mediator complex (Cevik et al. 2012;Kazan 2017). Previous work showed that MED25 interacts with the four members of group IX of the AP2/ ERF family, i.e., AtERF92, AtERF93, ORA59, and TDR1/ AtERF98. Interestingly, the α5 helix of the EDLL motif exhibited a higher B-factor in the whole structural data, implying that this region probably plays an important role in attaching to the MED25 subunit (Fig. 6d). Nevertheless, structural studies and mechanistic insights into MED25 are still needed.
In summary, we have shown that AtERF96, an AP2/ERFfamily regulator recognizing the GCC box DNA motif, is an ethylene-responsive transcription factor that directly modulates the defense-related gene PDF1.2a. Our studies of the AtERF96-GCC11 complex provide a structural framework for AP2/ERF transcription factors, including the binding capability of the AP2/ERF domain, together with the influence of DNA base-pair opening via the N-terminal helix of AtERF96.