Review

Genome sequencing projects have inundated us with information regarding the genetic basis of life. While this wealth of information provides a foundation for our understanding of biology, it has become clear that the DNA code alone does not hold all the answers. Epigenetic modifications and higher order DNA structures beyond the double helix also contribute to basic biological processes and to the maintenance of cellular stability. Local alternative DNA structures are known to exist in all life forms [1]. The negative supercoiling of DNA can induce local nucleotide sequence-dependent conformational changes that give rise to cruciforms, left-handed DNA, triplexes and quadruplexes [2-4]. The formation of cruciforms is strongly dependent on base sequence and requires perfect or imperfect inverted repeats of 6 or more nucleotides in the DNA sequence [5, 6]. Over-representation of inverted repeats, which occurs nonrandomly in the DNA of all organisms, has been noted in the vicinity of breakpoint junctions, promoter regions, and at sites of replication initiation [3, 7, 8]. Cruciform structures may affect the degree of DNA supercoiling, the positioning of nucleosomes in vivo [9], and the formation of other secondary structures of DNA. Cruciforms contain a number of structural elements that serve as direct protein-DNA targets. Numerous proteins have been shown to interact with cruciforms, recognizing features such as DNA crossovers, four-way junctions, and curved or bent DNA. Structural transitions in chromatin occur concomitantly with DNA replication or transcription and in processes that involve a local separation of DNA strands. Such transitions are believed to facilitate the formation of alternative DNA structures [10, 11]. Transient supercoils are formed in the eukaryotic genome during DNA replication and transcription, and these often involve protein binding [12].
Indeed, active chromatin remodeling is a typical feature for many promoters and is essential for gene transcription [13]. Notably, DNA supercoiling can have a strong impact on gene expression [14]. Using microarrays covering the E. coli genome, it was recently shown that expression of 7% of genes was rapidly and significantly affected by a loss of chromosomal supercoiling [15]. Several complexes that involve extensive DNA-protein interactions, whereby the DNA wraps around the protein, can only occur under conditions of negative DNA supercoiling [10]. Other proteins are reported to interact with the supercoiled DNA (scDNA) at crossing points or on longer segments of the interwound supercoil [16, 17]. Interestingly, the eukaryotic genome has been shown to contain a percentage of unconstrained supercoils, part of which can be attributed to transcriptional regulation [3]. The spontaneous generation of DNA supercoiling is also a requirement for genome organization [18]. Transient supercoils are formed both in front of and behind replication forks as superhelical stress is distributed throughout the entire replicating DNA molecule [19]. A number of additional processes may operate to create transient and localized superhelical stresses in eukaryotic DNA.

The recognition of cruciform DNA seems to be critical not only for the stability of the genome, but also for numerous, basic biological processes. As such, it is not surprising that many proteins have been shown to exhibit cruciform structure-specific binding properties. In this review, we focus on these proteins, many of which are involved in chromatin organization, transcription, replication, DNA repair, and other processes. To organize our review, we have divided cruciform binding proteins into four groups (see Table 1) according to their primary functions: (a) junction-resolving enzymes, (b) transcription factors and DNA repair proteins, (c) replication machinery, and (d) chromatin-associated proteins. For each group, we describe in detail recent examples of research findings. Lastly, we review how dysregulation of cruciform binding proteins is associated with the pathology of certain diseases found in humans.

Table 1 Proteins involved in interactions with cruciform structures

Formation and presence of cruciform structures in the genome

Cruciform structures are important regulators of biological processes [3, 5]. Both stem-loops and cruciforms are capable of forming from inverted repeats. Cruciform structures consist of a branch point, a stem and a loop, where the size of the loop is dependent on the length of the gap between inverted repeats (Figure 1). Direct inverted repeats lead to formation of a cruciform with a minimal single-stranded loop. The formation of cruciforms from indirect inverted repeats containing gaps is dependent not only on the length of the gap, but also on the sequence in the gap. In general, AT-rich gap sequences increase the probability of cruciform formation. It is also possible that the gap sequence can form an alternative DNA structure. The formation of DNA cruciforms has a strong influence on DNA geometry, whereby sequences that are normally distal from one another can be brought into close proximity [20, 21]. The structure of cruciforms has been studied by atomic force microscopy [22-24]. These studies have identified two distinct classes of cruciforms. One class of cruciforms, denoted as unfolded, has a square planar conformation characterized by a 4-fold symmetry in which adjacent arms are nearly perpendicular to one another. The second class comprises a folded (or stacked) conformation where the adjacent arms form an acute angle with the main DNA strands (Figure 2). Two of the three structural motifs inherent to cruciforms, the branch point and stem, are also found in Holliday junctions. Holliday junctions are formed during recombination, double-strand break repair, and fork reversal during replication. Resolving Holliday junctions is a critical process for maintaining genomic stability [25, 26]. These junctions are resolved by a class of structure-specific nucleases: the junction-resolving enzymes.
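The relationship between an inverted repeat, its stem and its gap (loop) can be illustrated with a short script. The following is only a minimal sketch: it reports perfect inverted repeats, whereas imperfect repeats can also extrude cruciforms, and the function name `find_inverted_repeats`, the 10-nucleotide gap limit and the example sequence are assumptions made for illustration. The default minimum stem of 6 follows the repeat length quoted in the introduction.

```python
# Watson-Crick partner of each base; any other symbol (e.g. N) never pairs.
PAIR = {"A": "T", "T": "A", "G": "C", "C": "G"}


def find_inverted_repeats(seq, min_stem=6, max_gap=10):
    """Return (start, stem_length, gap_length) for perfect inverted repeats.

    start is the 0-based index of the first base of the left arm.
    max_gap is an illustrative limit, not a value taken from the text.
    """
    seq = seq.upper()
    n = len(seq)
    hits = []
    for i in range(n):                    # i: innermost base of the left arm
        for gap in range(max_gap + 1):
            j = i + 1 + gap               # j: innermost base of the right arm
            stem = 0
            # Extend the stem outwards while the two arms stay complementary.
            while (i - stem >= 0 and j + stem < n
                   and PAIR.get(seq[i - stem]) == seq[j + stem]):
                stem += 1
            if stem >= min_stem:
                hits.append((i - stem + 1, stem, gap))
    return hits


# Hypothetical example: a 6 bp stem (GGACTC / GAGTCC) around a 3 nt gap (TTT).
example = "AA" + "GGACTC" + "TTT" + "GAGTCC" + "AA"
print(find_inverted_repeats(example))
```

In this example the left arm starts at index 2, so the repeat is reported with a stem of 6 and a gap of 3. A long stem can also be reported a second time as a shorter stem with a wider gap, so overlapping hits would need to be merged in any serious analysis, and sequence alone does not show whether a cruciform actually forms, since extrusion also depends on superhelical density.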

Figure 1

Changes associated with transition from the linear to cruciform state in the p53 target sequence from the p21 promoter. The promoter sequence contains a 20 bp p53 target sequence with 7 bp long inverted repeat (red), (A) as linear DNA and (B) as an inverted repeat as a cruciform structure. In the cruciform structure, the p53 target sequence is presented as stems and loops.

Figure 2

Conformations of a cruciform structure. Conformations of a cruciform can vary from (A) "unfolded" with 4-fold symmetry to (B) bent, and to (C) "stacked" with 4 chains of DNA in close vicinity. (D) Topology of a Holliday junction stabilized by a psoralen cross-linking agent (PDBID 467D). Here, the junction takes the form of an anti-parallel stacked X-structure.

Cruciforms are not thermodynamically stable in naked linear DNA due to branch migration [27]. Cruciform structure formation in vivo has been shown in both prokaryotes and eukaryotes using several methodological approaches. The presence of the cruciform structure was first described in circular plasmid DNA where the negative superhelix density can stabilize cruciform formation. Plasmids with native superhelical density usually contain cruciform structures in vitro and in vivo[28]. For example, higher order structure in the pT181 plasmid was shown to exist in vivo using bromoacetaldehyde treatment [29]. Deletion of the sequence which forms this structure at the ori site leads either to a reduction or failure in replication [30]. Similarly, deletion of the cruciform binding domain in 14-3-3 proteins results in reduced origin binding which affects the initiation of DNA replication in budding yeast [31]. Monoclonal antibodies against cruciform structures have also been used successfully to isolate cruciform-containing segments of genomic DNA. Furthermore, these sequences were able to replicate autonomously when transfected into HeLa cells [32]. Stabilization of the cruciform structures by monoclonal antibodies 2D3 and 4B4, with anti-cruciform DNA specificity, resulted in a 2- to 6-fold enhancement of replication in vivo[33]. 14-3-3 sigma was found to associate in vivo with the monkey origins of DNA replication ors8 and ors12 in a cell cycle-dependent manner, as assayed by a chromatin immunoprecipitation (ChIP) assay that involved formaldehyde cross-linking, followed by immunoprecipitation with anti-14-3-3 sigma antibody and quantitative PCR [34]. Similarly, the 14-3-3 protein homologs from Saccharomyces cerevisiae, Bmh1p and Bmh2p, have cruciform DNA-binding activity and associate in vivo with ARS307 [35]. Several studies show that transcription is regulated directly by the presence of cruciform structure in vivo. 
Another example includes the ability of the d(AT)n-d(AT)n insert to spontaneously adopt a cruciform state in E. coli, resulting in a block of protein synthesis [36]. Using site-directed mutational analysis and P1 nuclease mapping, it was demonstrated that the formation of a cruciform structure is required for the repression of enhancer function in transient transfection assays and that Alu elements may contribute to regulation of the CD8 alpha gene enhancer through the formation of secondary structure that disrupts enhancer function [37]. Transcriptionally driven negative supercoiling also mediates cruciform formation in vivo, and enhanced cruciform formation correlates with an elevation in promoter activity [38]. It was also shown that the secondary DNA structures of the ATF/CREB element play a vital role in protein-DNA interactions and that its cognate transcription factors play a predominant role in the promoter activity of the RNMTL1 gene [39]. Hypo-methylation of inverted repeats by the Dam methylase indicates that these sequences adopt an unusual secondary structure, such as a DNA cruciform or hairpin, in vivo [40]. The in vivo effects of cruciform formation during transcription have been studied in detail by Krasilnikov et al. [4]. Interestingly, hairpin-capped linear DNA (in which the replication of hairpin-capped DNA and cruciform formation and resolution play central roles) was stably maintained for months in a human cancer cell line as numerous extra-chromosomal episomes [41]. Long palindromes can also induce DNA breaks after assuming a cruciform structure. Palindromes in S. cerevisiae are resolved, in vivo, by structure-specific enzymes. In vivo resolution requires either the Mus81 endonuclease or, as a substitute, the bacterial HJ resolvase RusA. These findings provide confirmation of cruciform extrusion and resolution in the context of eukaryotic chromatin [42].
Taken together, these studies show that cruciforms have been detected in vivo using a variety of independent techniques and that they are an intriguing and integral phenomenon of DNA biology and biochemistry.

Proteins involved in interactions with cruciform structures

Junction-resolving enzymes

A large number of proteins recognize cruciforms (summarized in Table 1) and, of these, the junction-resolving enzymes have been studied extensively. These proteins have been identified in many organisms, from bacteria (and their phages) to yeast, archaea and mammals [43]. The majority of the junction-resolving enzymes can be divided into one of two superfamilies [44]. Those in the first superfamily target specific DNA sequences for enzymatic activity, although they will bind equally well to junctions of any sequence. This superfamily includes E. coli RuvC, the yeast integrases, Cce1, Ydc2, and RNase H. The second superfamily includes the phage T7 endonuclease I, RecU, the Hjc and Hje resolving enzymes, the MutH protein family and related restriction enzymes. The X-ray structures of the junction-resolving enzymes in complex with 4-way junctions highlight the flexibility inherent to DNA (Figure 3) [25] in that these enzymes recognize and distort the junction. This enables them to carry out such key roles as the cleavage of allogene DNAs and maintenance of genomic stability, to name but a few. The recognition of non-B-DNA structure by junction-resolving enzymes has been the subject of several reviews [25, 43, 45, 46].

Figure 3

Crystal structure of the E. coli RuvA tetramer in complex with a Holliday junction (PDBID 1C7Y). (A) The Holliday junction is depressed at the center where it makes close contacts with RuvA. Each of the arms outside of the junction center takes on a standard B-DNA conformation. (B) Rotation of (A) by 90°.

Proteins involved in transcription and DNA repair

The maintenance of a cell's genomic stability is achieved through several independent mechanisms. Arguably, the most important of these mechanisms is DNA repair. Protein binding to damaged DNA and to the local alternative DNA structures is therefore a key function of these processes. The promoter regions of genes are often characterized by the presence of inverted repeats that are capable of forming cruciforms in vivo. A number of DNA-binding proteins, such as those of the HMGB-box family [47], Rad54 [48], BRCA1 protein [49, 50], as well as PARP-1 (poly(ADP-ribose) polymerase-1) [51], display only a weak sequence preference but bind preferentially to cruciform structures. Moreover, some proteins can induce the formation of cruciform structures upon DNA binding [51, 52]. Among the DNA repair proteins which bind to cruciforms are the junction-resolving enzymes Ruv and RuvB [53, 54], DNA helicases [55], XPG protein [56], and multifunctional proteins like HMG-box proteins [57], BRCA1, and the 14-3-3 protein family, including the homologs Bmh1 and Bmh2 from S. cerevisiae and GF14 from plants. Footprinting analysis of the gonadotropin-releasing hormone gene promoter region indicated the human estrogen receptor (ER) to be another potential cruciform binding protein. In this case, extrusion of the cruciform structure allowed the estrogen response element motifs to be accessed by the ER protein [58].

PARP-1

PARP-1 is an abundant, nuclear, zinc-finger protein present at approximately one enzyme molecule per 50 nucleosomes. It has a high affinity for damaged DNA and becomes catalytically active upon binding to DNA breaks [59]. In the absence of DNA damage, the presence of PARP-1 leads to the perturbation of histone-DNA contacts, allowing DNA to be accessible to regulatory factors [60]. PARP-1 activity is also linked to the coordination of chromatin structure and gene expression in Drosophila [61]. It was reported that PARP can bind to the DNA hairpins in heteroduplex DNA and that the auto-modification of PARP in the presence of NAD+ inhibited its hairpin binding activity. Atomic force microscopy studies revealed that, in vitro, PARP protein has a preference for the promoter region of the PARP gene in superhelical DNA where the dyad symmetry elements form hairpins (Figure 4) [62]. PARP-1 recognizes distortions in the DNA backbone, allowing it to bind to three- and four-way junctions [63]. Kinetic analysis has revealed that the structural features of non-B form DNA are important for PARP-1 catalysis activated by undamaged DNA. The order of PARP-1's substrate preference has been shown to be: cruciforms > loops > linear DNA. These results suggest a link between PARP-1 binding to cruciform structures in the genome and its function in the modulation of chromatin structure in cellular processes. Moreover, it was shown that the binding of PARP-1 to DNA can induce changes in DNA topology, as was demonstrated using plasmid DNA targets [51].

Figure 4

AFM and SFM images of proteins binding to a cruciform structure. (A) AFM images of PARP-1 binding to supercoiled pUC8F14 plasmid DNA containing a 106 bp inverted repeat. PARP-1 binds to the end of the hairpin arm (white arrow). Images show 300 × 300 nm² surface areas (reprinted with permission from [51]). (B) The interaction between p53CD and supercoiled DNA gives rise to cruciform structures. Shown is an SFM image of the complex formed between p53CD and sc pXG(AT)34 plasmid DNA at a molar ratio of 2.5; the complexes were mounted in the presence of 10 mM MgAc2. The scale bars represent 200 nm (reprinted with permission from [132]).

P53

P53 is arguably one of the most intensively studied tumor suppressor genes. More than 50% of all human tumors contain p53 mutations, and the inactivation of this gene plays a critical role in the induction of malignant transformation [64]. Sequence-specific DNA binding is crucial for p53 function. P53 target sequences, which consist of two copies of the sequence 5'-RRRC(A/T)(T/A)GYYY-3', often form inverted repeats [65]. It was reported that p53 binding is temperature sensitive and dependent on DNA fragment length [66, 67]. Moreover, it was demonstrated, in vivo, that p53 binding to its target sequence is highly dependent on the presence of an inverted repeat at the target site. Preferential binding of p53 to superhelical DNA has also been described [68, 69]. Non-canonical DNA structures such as mismatched duplexes, cruciform structures [70], bent DNA [71], structurally flexible chromatin DNA [13], hemicatenated DNA [72], DNA bulges, three- and four-way junctions [73], or telomeric t-loops [74] can all be bound selectively by p53. There is a strong correlation between the cruciform-forming targets and an enhancement of p53 DNA binding [75]. Target sequences capable of forming cruciform structures in topologically constrained DNA bound p53 with a remarkably higher affinity than did the internally asymmetrical target site [76]. These results implicate DNA topology as having an important role in the complex, with possible implications in modulation of the p53 regulon.
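The consensus quoted above uses the standard IUPAC nucleotide codes (R = A or G, Y = C or T), so it can be written as a regular expression. The sketch below is illustrative only: the function name `find_p53_sites` and the test sequences are hypothetical, and the optional spacer between the two half-sites is an assumption that the text does not specify (the default of 0 requires two directly adjacent copies).

```python
import re

# One decamer half-site, 5'-RRRC(A/T)(T/A)GYYY-3', with R = [AG] and Y = [CT].
HALF_SITE = "[AG]{3}C[AT][AT]G[CT]{3}"


def find_p53_sites(seq, max_spacer=0):
    """Return (start, half_site_1, spacer, half_site_2) for each match.

    max_spacer is an assumed tolerance for bases between the two copies;
    the default of 0 demands two directly adjacent half-sites.
    """
    seq = seq.upper()
    # The lookahead lets overlapping candidate sites be reported as well.
    pattern = re.compile(
        "(?=(%s)([ACGT]{0,%d}?)(%s))" % (HALF_SITE, max_spacer, HALF_SITE)
    )
    return [(m.start(), m.group(1), m.group(2), m.group(3))
            for m in pattern.finditer(seq)]


# Synthetic test sequence (not a real promoter): two adjacent half-sites.
synthetic = "TT" + "GGGCATGCCC" + "AGACTTGTCT" + "TT"
print(find_p53_sites(synthetic))
```

A strict consensus match is a deliberately simple choice. Natural response elements may deviate from the consensus at individual positions, so a scoring scheme such as a position weight matrix would be needed to find them, and a sequence match by itself says nothing about whether the site adopts a cruciform conformation.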

Chromatin-associated proteins

The chromatin-associated proteins cover a broad spectrum of the proteins localized in the cell nucleus. They are partly involved in modulating chromatin structure, but are also implicated in a range of processes associated with DNA function. They fine-tune transcriptional events (DEK, BRCA1) and are involved in both DNA repair and replication (HMG proteins, Rad51, Rad51ap, topoisomerases). Another family of enzymes deemed important in these processes is that of topoisomerases. These enzymes occur in all known organisms and play crucial roles in the remodeling of DNA topology. Topoisomerase I binds to Holliday junctions [77], and topoisomerase II recognizes and cleaves cruciform structures [78] and interacts with the HMGB1 protein [57]. These processes are particularly important for maintaining genomic stability due to their ability to diffuse the stresses that are levied upon a DNA molecule during transcription, replication and the resolving of long cruciforms that would otherwise hinder DNA chain separation. The Rad54 protein plays an important role during homologous recombination in eukaryotes [79]. Yeast and human Rad54 bind specifically to Holliday junctions and promote branch migration [80]. The binding preference for the open conformation of the X-junction appears to be common for many proteins that bind to Holliday junctions. Human Rad54 binds preferentially to the open conformation of branched DNA as opposed to the stacked conformation [48]. Similarly, RAD51AP1, the RAD51 accessory protein, specifically stimulates joint molecule formation through the combination of structure-specific DNA binding and by interacting with RAD51. RAD51AP1 has a particular affinity for branched-DNA structures that are obligatory intermediates during joint molecule formation [81]. The recognition of branched structures during homologous recombination is a critical step in this process.

DEK

The human DEK protein is an abundant nuclear protein of 375 amino acids that occurs in numbers greater than 1 million copies per nucleus [82]. Its interactions with transcriptional activators and repressors suggest that DEK may have a role in the formation of transcription complexes at promoter and enhancer sites [reviewed in [83]]. The binding of DEK to DNA is not sequence specific, and DEK has a clear preference for supercoiled DNA and four-way junctions [84]. Work with isolated and recombinant DEK has shown that it has intrinsic DNA-binding activity with a preference for four-way junction and superhelical DNA over linear DNA, and that it introduces positive supercoils into relaxed circular DNA [83, 85]. DEK has two DNA-binding domains. The first domain is centrally located and harbors a conserved sequence element, the SAF (scaffold attachment factor) box. The second DNA-binding domain is located at the C-terminus of DEK, which is also post-translationally modified by phosphorylation. In fact, the DNA-binding properties of DEK are clearly influenced by phosphorylation: phosphorylated DEK binds DNA with a weaker affinity than does unmodified DEK, and phosphorylation induces the formation of DEK multimers [86, 87]. DEK's monomeric SAF box (residues 137-187) does not appear to interact with DNA in solution. However, when many SAF boxes are brought into close proximity, they bind DNA cooperatively. A DEK construct spanning amino acids 87-187 binds to DNA much like the intact DEK, preferring four-way DNA junctions over linear DNA. This fragment forms large aggregates in the presence of DNA and is also able to introduce supercoils into relaxed circular DNA. Interestingly, the 87-187 amino acid peptide induces negative DNA supercoils [88].

BRCA1

BRCA1 is a multifunctional tumor suppressor protein having roles in cell cycle progression, transcription, DNA repair and chromatin remodeling. Mutations to the BRCA1 gene are associated with a significant increase in the risk of breast cancer. The function of BRCA1 likely involves interactions with both DNA and an array of proteins. BRCA1 associates directly with RAD51 and both proteins co-localize to discrete subnuclear foci that redistribute to sites of DNA damage under genotoxic stress [89]. BRCA1 also co-localizes with phosphorylated H2AX (γH2AX) in response to double strand breaks [90].

The central region of human BRCA1 binds strongly to negatively supercoiled plasmid DNA with native superhelical density [50] and binds with high affinity to cruciform DNA [91]. The BRCA1 cruciform DNA complex must dissociate to allow the nuclease complex to work in DNA recombinational repair of double stranded breaks. BRCA1 also acts as a scaffold for assembly of the Rad51 ATPase which is responsible for homologous recombination in somatic cells. The full-length BRCA1 protein binds strongly to supercoiled plasmid DNA and to junction DNA. The difference in affinity was on the order of 6- to 7-fold between linear and junction DNA in reactions containing physiological levels of magnesium [92]. BRCA1 230-534 binds with a higher affinity to four-way junction DNA as compared to duplex and single-stranded DNA [91]. Residues 340-554 of BRCA1 have been identified as the minimal DNA-binding region [93]. The highest affinity among the different DNA targets which mimic damaged DNA (four-way junction DNA, DNA mismatches, DNA bulges and linear DNA) was for DNA four-way junctions. To this end, a 20-fold excess of linear DNA was unable to compete off any of the BRCA1 230-534 bound to DNA molecules mimicking damaged DNA [49]. Furthermore, the loss of the BRCA1 gene prevents cell survival after exposure to DNA cross-linkers such as mitomycin C [94]. These results speak to the importance of BRCA1's ability to recognize cruciform structures.

HMGB family

The high mobility-group (HMG) proteins are a family of abundant and ubiquitous non-histone proteins that are known to bind to eukaryotic chromatin. The three HMG protein families comprise the (a) HMGA proteins (formerly HMGI/Y) containing A/T-hook DNA-binding motifs, (b) HMGB proteins (formerly HMG1/2) containing HMG-box domain(s), and (c) HMGN proteins (formerly HMG14/17) containing a nucleosome-binding domain [95].

HMGB proteins bind DNA in a sequence-independent manner and are known to bind to certain DNA structures (four-way junctions, DNA minicircles, cis-platinated DNA, etc.) with high affinity as compared to linear DNA [96, 97]. The chromatin architectural protein HMGB1 can bind with extremely high affinity to DNA structures that form DNA loops [72], while other studies have shown that the HMG box of different proteins can induce DNA bending [98-100]. The HMG box is an 80 amino acid domain found in a variety of eukaryotic chromosomal proteins and transcription factors. HMG box binding to DNA is associated with distortions in DNA structure. Members of the HMG protein family are involved in transcription [101-103] and DNA repair [57, 104, 105]. The HMG protein T160 was found to be co-localized with DNA replication foci [106]. The fact that all HMG box domains bind to four-way DNA junctions suggests that a common feature in the binding targets of this protein family must exist. Single HMG box domains interact exclusively with the open square form of the junction, and conditions that stabilize the stacked X-structure conformation significantly weaken the HMG box DNA interaction [107]. Binding of the isolated A domain of HMGB1 protein to four-way junction DNA substrates is abolished by mutation of both Lys2 and Lys11 together to alanine, indicating that these residues play an important role in DNA binding [108].

Proteins involved in replication

Transient transitions from B-DNA to cruciform structures are correlated with DNA replication and transcription [109]. It has been shown that cruciforms serve as recognition signals at or near eukaryotic origins of DNA replication [110-112]. A large number of proteins involved in replication bind to cruciform structures (see Table 1). We focus here primarily on the 14-3-3 protein family and the MLL and WRN proteins. We will comment briefly on other systems of interest.

S16 is a structure-specific DNA-binding protein displaying preferential binding for cruciform DNA structures [113]. The AF10 protein binds cruciform DNA via a specific interaction with an AT-hook motif and is localized to the nucleus by a defined bipartite nuclear localization signal in the N-terminal region [114]. The structural maintenance of chromosomes (SMC) protein family, with members from lower and higher eukaryotes, may be divided into four subfamilies (SMC1 to SMC4) and two SMC-like protein subfamilies (SMC5 and SMC6) [115-117]. Members of this family are implicated in a large range of activities that modulate chromosome structure and organization. Smc1 and smc2 proteins have a high affinity for cruciform DNA molecules and for AT-rich DNA fragments including fragments from the scaffold-associated regions [118]. The baculovirus very late expression factor 1 (VLF-1), a member of the integrase protein family, does not bind to single and double strand structures, but it does bind (listed with increasing affinity) to Y-forks, three-way junctions and cruciform structures. This protein is involved in the processing of branched DNA molecules at the late stages of viral genome replication [119].

14-3-3

The 14-3-3 protein family consists of a highly conserved and widely distributed group of dimeric proteins which occur as multiple isoforms in eukaryotes [120]. There are at least seven distinct 14-3-3 genes in vertebrates, giving rise to nine isoforms (α, β, γ, δ, ε, ζ, η, σ and τ) and at least another 20 have been identified in yeast, plants, amphibians and invertebrates [110]. A striking feature of the 14-3-3 proteins is their ability to bind a multitude of functionally diverse signaling proteins, including kinases, phosphatases, and transmembrane receptors. This plethora of proteins allows 14-3-3s to modulate a wide variety of vital regulatory processes, including mitogenic signal transduction, apoptosis and cell cycle regulation [121]. The 14-3-3 proteins are found mainly within the nucleus and are involved in eukaryotic DNA replication via binding to the cruciform DNA that forms transiently at replication origins at the onset of the S phase [122].

14-3-3 cruciform binding activity was first observed in proteins purified from sheep's brain. More recently, immunofluorescence analyses showed that 14-3-3 isoforms with cruciform-binding activity are present in HeLa cells [123]. The direct interaction with cruciform DNA was confirmed with 14-3-3 isoforms β, γ, σ, ε, and ζ [34]. 14-3-3 analogs with cruciform-specific binding are also found in yeast (Bmh1 and Bmh2) and plants (GF14) [35].

The prevalence of the 14-3-3 family proteins in all eukaryotes, combined with a high degree of sequence conservation between species, is indicative of their importance. Genetic studies have shown that knocking out the yeast homologs of the 14-3-3 proteins is lethal [124]. Moreover, 14-3-3 proteins are involved in interactions with numerous transcription factors, and it has been reported that several functions of the 14-3-3 proteins are associated with their cruciform binding properties.

Mixed lineage leukemia (MLL) protein

The MLL gene encodes a putative transcription factor with regions of homology to several other proteins including the zinc fingers and the so-called "AT-hook" DNA-binding motif of high mobility group proteins [125]. The 11q23 chromosomal translocation, found in both acute lymphoid and myeloid leukemias, results in disruption of the MLL gene. Leukemogenesis is often correlated with alterations in chromatin structure brought about by either a gain or loss of function of regulatory factors that have been disrupted by chromosomal translocations. The MLL gene, a target of such translocation events, forms a chimeric fusion product with a variety of partner genes [126].

The MLL AT-hook domain binds cruciform DNA, recognizing the structure rather than the sequence of the target DNA. This interaction can be antagonized both by Hoechst 33258 dye and distamycin. In a nitrocellulose protein-DNA binding assay, the MLL AT-hook domain was shown to bind to AT-rich SARs, but not to non-SAR DNA fragments [125]. MLL appears to be involved in chromatin-mediated gene regulation. In translocations involving MLL, the loss of the activation domain combined with the retention of a repression domain alters the expression of downstream target genes, thus suggesting a potential mechanism of action for MLL in leukemia [126]. AF10 translocations to the vicinity of genes other than MLL also result in myeloid leukemia. A biochemical analysis of the MLL partner gene AF10 showed that its AT-hook motif is able to bind to cruciform DNA, but not to double-stranded DNA, and that it forms a homo-tetramer in vitro [114].

WRN

The Werner syndrome protein belongs to the RecQ family of evolutionarily conserved 3' → 5' DNA helicases [127]. WRN encodes a single polypeptide of 162 kDa that contains 1432 amino acids. Prokaryotes and lower eukaryotes generally have one RecQ member, while higher eukaryotes possess multiple members; five homologs have been identified in human cells. All RecQ members share a conserved helicase core with one or two additional C-terminal domains, the RQC (RecQ C-terminal) and HRDC (helicase and RNaseD C-terminal) domains. These domains bind both to proteins and DNA. Eukaryotic RecQ helicases have N- and C-terminal extensions that are involved in protein-protein interactions and have been postulated to lend unique functional characteristics to these proteins [55, 128]. WRN has been shown to bind at replication fork junctions and to Holliday junction structures. Binding to junction DNA is highly specific because little or no WRN binding is visualized at other sites along these substrates [129]. Upon binding to DNA, WRN assembles into a large complex composed of four monomers.

Cruciform binding proteins and disease

The recognition of DNA junctions and cruciform structures is critical for genomic stability and for the regulation of basic cellular processes. The resolution of Holliday junctions and long cruciforms is necessary for genomic stability, and dysregulation of the proteins involved can lead to DNA translocations, deletions, loss of genomic stability and carcinogenesis. The large number of proteins which bind to these DNA structures work together to keep the genome intact. We believe that the formation of cruciform structures serves as a marker for the proper timing and initiation of some very basic biological processes. Mutations and epigenetic modifications that alter the propensity for cruciform formation can have drastic consequences for cellular processes. Thus, it is unsurprising that the dysregulation of cruciform binding proteins is often associated with the pathology of disease.

As stated above, the cruciform binding proteins including p53, BRCA1, WRN and the proto-oncogenes DEK, MLL and HMG are also associated with cancer development and/or progression. Some of these proteins play such important roles that their mutation and/or inactivation results in severe genomic instability and sometimes lethality. For example, Brca1 -/- mouse embryonic stem cells show spontaneous chromosome breakage, profound genomic instability and hypersensitivity to a variety of damaging agents (e.g. γ radiation), all of which suggest a defect in DNA repair. The connection between BRCA1 mutations and breast cancer is well known. Transcriptional regulation by p53 is fine-tuned by its timely binding to promoter elements. The formation of a cruciform structure in p53 recognition elements may be an important determinant of p53 transcriptional activity.

The dHMGI(Y) family of "high mobility group" non-histone proteins comprises architectural transcription factors whose overexpression is highly correlated with carcinogenesis, increased malignancy and metastatic potential of tumors in vivo [95]. 14-3-3 proteins are related to several diseases, including cancer, Alzheimer's disease, the neurological disorders Miller-Dieker syndrome and spinocerebellar ataxia type 1, and spongiform encephalopathy. The deletion of 14-3-3σ in human colorectal cancer cells leads to the loss of the DNA damage checkpoint control [130]. The human DEK protein was discovered as a fusion with a nuclear pore protein in a subset of patients with acute myeloid leukemia. It was also identified as an autoantigen in a relatively high percentage of patients with autoimmune diseases. In addition, DEK mRNA levels are higher in transcriptionally active and proliferating cells than in resting cells, and elevated mRNA levels are found in several transformed and cancer cells [6, 7]. Werner syndrome is an autosomal recessive disorder characterized by features of premature aging and a high incidence of uncommon cancers [127]. The Werner syndrome protein (WRN) plays central roles in maintaining the genomic stability of organisms [131]. Individuals harboring mutations in WRN have a rare, autosomal recessive genetic disorder manifested by early onset of symptoms characteristic of aged individuals.

Conclusions

Cruciform structures are fundamentally important for a wide range of biological processes, including DNA transcription, replication, recombination, control of gene expression and genome organization. The putative mechanistic roles of cruciform binding proteins in transcription, DNA replication, and DNA repair are shown in Figure 5. Alternative DNA structures, including cruciforms, are often formed at sites of negatively supercoiled DNA by perfect or imperfect inverted repeats of 6 or more nucleotides. Longer DNA palindromes present a threat to genomic stability as they are recognized by junction-resolving enzymes. Shorter palindromic sequences are essential for basic processes like DNA replication and transcription. The presence of cruciform structures may also play an important role in epigenetics, in that cruciform structures are protected from DNA methylation. For example, the Dam methylase is not able to modify its GATC target site when it occurs in a cruciform or hairpin conformation. The center of a long perfect palindrome located in bacteriophage lambda has also been shown to be methylation-resistant in vivo [40]. Moreover, the centers of long palindromes are hypomethylated as compared to identical sequences in non-palindromic conformations [40]. Thus, transient cruciforms can directly influence DNA methylation and therefore provide another layer for regulation of the DNA code. Proteins that bind to cruciforms can be divided into several categories. In addition to a well-defined group of junction-resolving enzymes, we have classified cruciform binding proteins into groups involved in transcription and DNA repair (PARP, BRCA1, p53, 14-3-3), chromatin-associated proteins (DEK, BRCA1, HMG protein family, topoisomerases), and proteins involved in replication (MLL, WRN, 14-3-3, helicases) (see Table 1). Within these groups are proteins indispensable for cell viability, as well as tumor suppressors, proto-oncogenes and DNA remodeling proteins.
Similarly, triplet repeat expansion, a phenomenon important in several genetic diseases, including Friedreich's ataxia, cardiomyopathy, myotonic dystrophy type I and other neurological disorders, can change the spectrum of cruciform binding proteins. Lastly, single nucleotide polymorphisms and/or insertion/deletion mutations at inverted repeats located in promoter sites can also influence cruciform formation, which might be manifested through altered gene regulation. A deeper understanding of the processes related to the formation and function of alternative DNA structures will be an important component to consider in the post-genomic era.
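The sequence requirement noted in these conclusions (an inverted repeat with arms of 6 or more nucleotides) can be illustrated with a short script. The following is only a minimal sketch and not a method used in the work reviewed here: the function names, the fixed arm length, and the `max_loop` spacer limit are illustrative assumptions, and the scan detects perfect inverted repeats only, whereas imperfect repeats can also form cruciforms.

```python
# Minimal sketch: scan a DNA sequence for perfect inverted repeats.
# All names and the max_loop default are illustrative assumptions.

COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(seq):
    """Return the reverse complement of an A/C/G/T sequence."""
    return "".join(COMPLEMENT[base] for base in reversed(seq.upper()))


def find_inverted_repeats(seq, stem=6, max_loop=4):
    """Return (start, loop_length) pairs for perfect inverted repeats.

    An arm of `stem` nucleotides beginning at `start` must be followed,
    after a spacer of 0..max_loop nucleotides, by its reverse complement.
    """
    seq = seq.upper()
    hits = []
    for start in range(len(seq) - 2 * stem + 1):
        left = seq[start:start + stem]
        # Skip windows containing ambiguous bases such as N.
        if any(base not in COMPLEMENT for base in left):
            continue
        target = reverse_complement(left)
        for loop in range(max_loop + 1):
            right_start = start + stem + loop
            right = seq[right_start:right_start + stem]
            if len(right) < stem:
                break  # right arm would run past the end of the sequence
            if right == target:
                hits.append((start, loop))
    return hits


if __name__ == "__main__":
    # Arm AGCTGA, spacer TTT, then the reverse complement TCAGCT.
    example = "AGCTGATTTTCAGCT"
    print(find_inverted_repeats(example))
```

Longer arms can be examined by raising `stem`. A scan of this kind only flags candidate sequences; whether a cruciform is actually extruded depends on supercoiling and protein binding, as discussed above.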

Figure 5

Scheme of the putative mechanistic roles of cruciform binding proteins in transcription, DNA replication, and DNA repair. A) A model for the structure-specific binding of transcription factors to a cognate palindrome-type cruciform implicated in transcription. The equilibrium between classic B-DNA and the higher order cruciform favors duplex DNA, but, when cruciform binding proteins are present, they either preferentially bind to and stabilize the cruciform or bind to the classic form and convert it to the cruciform. This interaction results in both an initial melting of the DNA region covered by the transcription factor and an extension of the melted region in both directions. The melted region continues to extend in response to the needs of the active transcription machinery. B) A model for the initiation of replication enhanced by extrusion to a cruciform structure. Dimeric cruciform binding proteins interact with and stabilize the cruciform structure. The replisome is assembled concomitantly and is assumed to include polymerases, single-strand binding proteins and helicases. C) A model for the influence of cruciform binding proteins on DNA structure in DNA damage regulation. Naked cruciforms are sensitive to DNA damage and are covered by proteins in order to protect these sequences from being cleaved. In these cases, a deficiency in cruciform binding proteins can lead to DNA breaks. Here, cruciform-DNA complexes can also serve as scaffolds to recruit the DNA damage machinery.