The architecture of interphase chromosomes presents a major challenge for our understanding of the functioning of the mammalian genome. Chromosomes are composed of hierarchical levels of chromatin loops or folds. Several models have attempted to describe chromatin organization above the level of the nucleosomal fiber [1-3]. Of these, the 'multi-loop subcompartment' model, in which rosettes of approximately 1-2 Mb are built up from smaller chromatin loops of 50-200 kb, is compatible with most of the recent experimental findings [3]. Although there is no definitive proof so far for this or any other model of higher-order chromatin architecture, it is clear that the folding and looping of chromatin leads to the formation of discrete 'territories' for individual chromosomes in the interphase nucleus [4]. Accumulating experimental evidence suggests that these chromatin loops or folds are maintained by attachments to the nuclear matrix [5].

The nuclear matrix extends throughout the nucleus and consists of proteins that are retained after unbound chromatin and soluble proteins are removed using buffers of high ionic strength [6-9]. Although the nature of the nuclear matrix is still under debate [7], it has achieved prominence as many of its best-characterized components, including lamins, topoisomerase II, special AT-rich sequence binding protein 1 (SATB1) and scaffold attachment factor-B1 (SAFB1), are key players in fundamental nuclear processes [10-13]. In eukaryotic organisms, chromatin is anchored to the nuclear matrix by short DNA sequences of about 100-2,000 bp called matrix attachment regions (MARs) [5, 14]. The strong interaction between MARs and the insoluble proteins of the nuclear matrix protects these sequences from extraction with buffers of high ionic strength and from nuclease digestion [9]. In general, MARs are rich in AT and repetitive sequences, and map to regions where the DNA is intrinsically curved or kinked and has a propensity for base unpairing [15-19]. The spacing of AT sequences is crucial for matrix binding, but there is no consensus DNA motif for the estimated 30,000-80,000 MARs in the human genome [6, 20].
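Because MARs are described above as AT-rich stretches of roughly 100-2,000 bp with no consensus motif, a first-pass computational screen can only flag AT-rich windows as candidates. The Python sketch below is purely illustrative: the function names, the 100 bp window, the 50 bp step and the 70% AT threshold are assumptions chosen for the example, not values taken from the studies cited here, and AT content alone does not establish matrix binding.

```python
def at_fraction(seq: str) -> float:
    """Fraction of A/T bases in seq (case-insensitive); 0.0 for empty input."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return sum(base in "AT" for base in seq) / len(seq)


def at_rich_windows(seq, window=100, step=50, threshold=0.7):
    """Return (start, end, at_fraction) for each window at or above threshold.

    window, step and threshold are illustrative assumptions, not
    experimentally derived parameters.
    """
    hits = []
    last_start = max(len(seq) - window, 0)
    for start in range(0, last_start + 1, step):
        chunk = seq[start:start + window]
        frac = at_fraction(chunk)
        if frac >= threshold:
            hits.append((start, start + len(chunk), frac))
    return hits


def merge_windows(hits):
    """Merge overlapping or adjacent windows into (start, end) regions."""
    regions = []
    for start, end, _ in hits:
        if regions and start <= regions[-1][1]:
            # Extend the previous region when windows touch or overlap.
            regions[-1] = (regions[-1][0], max(regions[-1][1], end))
        else:
            regions.append((start, end))
    return regions


if __name__ == "__main__":
    # Toy sequence: GC-rich flank, 300 bp AT-rich core, GC-rich flank.
    toy = "GC" * 100 + "AATTTATAAT" * 30 + "GC" * 100
    print(merge_windows(at_rich_windows(toy)))
```

Regions flagged this way would still need to be checked against the other features mentioned above, such as intrinsic curvature and base-unpairing propensity, and ultimately against experimental matrix-binding data.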

MARs are bound to the nuclear matrix either constitutively or transiently. The higher-order chromatin structure of interphase and metaphase chromosomes is likely to be maintained by constitutive MARs. The dynamic associations of transient MARs are more likely to be implicated in genomic function, as they correlate with transcription or replication of the genetic loci with which they are associated [9]. In this review, we draw together evidence from higher eukaryotes that, further to their role in chromosome structure, MARs are key mediators of genome regulation, and we will discuss their roles in human disease.

MARs and transcriptional regulation

The tethering of DNA to the nuclear matrix plays a vital role in transcription [9, 21, 22]. Using T-cell differentiation as a model, we describe how MARs facilitate transcription and how they shape chromatin architecture to insulate chromatin domains from the effects of flanking chromatin.

Upon stimulation by antigen, naive CD4 helper T cells differentiate into effector Th1 and Th2 cells. In mice, Ifng (the gene for the cytokine interferon-γ) is silenced in naive T cells but transcribed in activated Th1 cells. The architecture of the Ifng locus has been analyzed in these two cell types by a combination of chromosome conformation capture and microarray technology [22]. In naive T cells Ifng was found to exist in a linear conformation, but in Th1 cells it is present in a chromatin loop, due to tethering of DNA to the nuclear matrix by MARs 7 kb upstream and 14 kb downstream of the locus. The absence of this selective DNA attachment to the nuclear matrix in naive T cells suggests that dynamic DNA anchors mediate the formation of the looped structure and the expression of the Ifng locus [22].

The molecular mechanisms by which MARs reorganize higher-order chromatin structure have been investigated in detail at the murine Th2 cytokine locus, which contains the cluster of coordinately regulated genes Il4, Il13 and Il5 in a region of about 120 kb [23]. These genes are expressed in Th2 cells but are silent in naive T cells. Following Th2 activation, expression of the nuclear matrix protein SATB1 is rapidly induced, and MARs within the locus mediate the formation of small loops by anchoring the loops onto a common protein core associated with SATB1 [12]. Down-regulation of SATB1 expression by RNA interference prevents both the formation of this looped structure and transcriptional activation of the locus [12]. In SATB1-null thymocytes (developing T cells) the expression of many genes is spatially and temporally misregulated, and T-cell development in SATB1-deficient mice is prematurely blocked. These results indicate that the binding of SATB1 at MARs regulates the expression of T-cell differentiation genes by reorganizing higher-order chromatin architecture [24, 25]. A similar MAR-mediated loop-formation mechanism regulates expression of the human β-globin gene cluster [26, 27].

Cai et al. [25] reported that SATB1 recruits several chromatin-remodeling enzymes at MARs to activate or repress the expression of nearby sequences. Other studies have shown that MARs interact dynamically with basal components of the transcription machinery and with splicing factors [28, 29]. In eukaryotic cells, mRNA synthesis is concentrated at discrete transcription 'factories' or foci within the nucleus, which contain RNA polymerases, RNA transcripts, transcription factors and mRNA-processing factors [30]. The retention of RNA polymerase II and general transcription factors in nuclei after extraction of soluble proteins and nuclease digestion suggests that transcription factories are assembled onto the nuclear matrix [31, 32]. As MARs associate with components of transcription factories as well as the nuclear matrix, it is tempting to speculate that dynamic interactions between MARs and the matrix bring together proximal and distal regulatory sequences and localize them close to transcription factories, thus promoting efficient regulation of gene expression (Figure 1).

Figure 1

A simplified model depicting the function of matrix-attachment regions (MARs) in gene regulation. Activation of transcription is accompanied by the anchoring of MARs to the nuclear matrix. This results in the formation of an anchored chromatin loop that is insulated from the stimulatory or repressive effects of the flanking chromatin. The transcription machinery is assembled at the site of the MAR-nuclear matrix attachments. Interaction of MARs with the nuclear matrix brings together gene coding sequences, regulatory DNA elements and the transcription machinery, thus enabling specific genes to be coordinately regulated.

Many genes are known to be shielded by so-called 'insulator' elements from stimulatory or repressive effects attributable to the chromatin state and regulatory elements in flanking regions. MARs commonly map to sequences flanking genes, and co-localize with some of the most extensively analyzed insulator elements, including gypsy, a retrotransposon in Drosophila melanogaster, suggesting that MARs have an insulator function [33]. In Drosophila, the nuclear matrix protein Su(Hw) binds to gypsy, creating chromatin loops [34]. Certain mutations in Su(Hw) that disrupt the loop structures render the insulator non-functional [34, 35]. This suggests that the tethering of MARs to the nuclear matrix topologically constrains the DNA into looped structures, protecting the intervening DNA from the influence of cis-regulatory elements outside the loop. In vertebrates, CTCF, a ubiquitous nuclear matrix protein, binds to insulators and has also been shown to interact with MARs [36]. While the precise mechanisms of CTCF insulation remain unclear, the binding of CTCF to MARs might block interactions between promoters and unrelated enhancers and create looped structures that delimit different chromosomal domains [37]. Experiments in a wide variety of higher eukaryotes have shown that in stably transfected cells, MAR-containing transgenes were expressed at higher levels compared with transgenes lacking MARs, indicating that the MARs shield the transgenes from the effects of the neighboring host chromatin [38, 39].

Taken together, the experimental evidence described above supports the view that MARs function as landing platforms for a wide range of matrix proteins. Such interactions form complex higher-order nucleoprotein structures, which insulate chromatin domains and also control gene expression by forming bridges between components of the basal transcription machinery and distal and proximal regulatory elements. MARs can thus be defined as cis-acting elements constituting a critical layer of transcriptional regulation.

MARs and DNA replication

To ensure that the genome is copied accurately, and only once per cell cycle, eukaryotes have evolved intricate mechanisms to regulate DNA replication. Some of the best-characterized origins of replication (ORIs) have been mapped to AT-rich genomic regions with base-unpairing elements. Furthermore, sequences at or near the ORIs for the human lamin B2 gene, the Chinese hamster dihydrofolate reductase β and β' genes, the human β-globin gene, the chicken α-globin and lysozyme genes, and the Xenopus and mouse c-myc genes, function as dynamic MARs during the cell cycle [40-46].

These findings are in agreement with observations that DNA replication is temporally and spatially ordered in the nuclei of animal cells. Several replicons are coordinately replicated at foci in the S-phase nucleus [47, 48]. Evidence that replication foci are associated with the nuclear matrix came first from electron microscopy [49]. Further support came from a study of nuclear matrix structures where DNA synthesis occurred at replication sites that were indistinguishable from those found in intact cells [50]. Radichev and colleagues [51] found that DNA replication initiates at discrete chromosomal sites attached to the nuclear matrix.

At replication foci, the nuclear matrix houses factors necessary for DNA replication, such as DNA polymerases, the sliding clamp (PCNA) and single-strand binding protein (RPA), and provides structural support throughout the replication process. Wu and Gilbert [52] proposed that origins are selected and replicon size is determined in early G1 phase of the cell cycle. Experiments in an in vitro system subsequently showed that MCM2, a component of the pre-replicative complex, is loaded onto chromatin gradually and cumulatively throughout G1, but is rapidly excluded from active replication foci in S phase [53]. Tatsumi and colleagues [54] reported a similar cycle of events for ORC1, a component of the replication initiation complex at ORIs. This coincides with recruitment of the chromatin-bound ORC2-5 complex to a structure likely to be the nuclear matrix [55], suggesting a link between the accumulation of ORC1 and the assembly of the replication complex in human nuclei.

These observations fit a model in which MARs stably anchor the replicon ends and, during G1, small-scale sub-chromosomal chromatin refolding recruits ORIs to the nuclear matrix, where factors accumulate to form the pre-replicative complexes (Figure 2). Subsequently, as ORIs begin to replicate in S phase, certain protein factors dissociate from the chromatin and undergo proteolysis (as part of a control mechanism to prevent re-replication), thus releasing the ORIs from the nuclear matrix. In the meantime, replication continues at the initial location as DNA is reeled through the replication machinery or replication factory [49]. At the ends of replicons, stable MARs could act as barriers between adjacent replicons by preventing the accumulation of supercoiled DNA structures, while providing binding sites for topoisomerase II, which can resolve replication intermediates.

Figure 2

DNA replication is organized at the nuclear matrix. (a) Replicons are defined in early G1 phase of the cell cycle by attachment of MARs to the nuclear matrix. (b) In late G1, origins of replication (ORIs) are recruited to the nuclear matrix and replication factors assemble at these sites, licensing the chromatin for replication. (c) Once the appropriate mitogenic stimuli have been received, cells enter S phase, at which point ORIs become activated. Following initiation of replication at a particular locus, the two identical newly replicated ORIs probably dissociate from the nuclear matrix. Two loops of replicated DNA gradually emerge (shown in blue), while the yet-to-be replicated DNA of the replicon moves through the replication factory. (d) At the end of S phase, the replication machinery is dismantled. Adapted from [71].

Genome anchoring and disease

Integration of retroviral DNA into the host genome is essential for viral replication. Although retroviral integration sites lack a consensus sequence, they are often AT-rich with base unpairing and DNA-bending and unwinding elements [56, 57]. DNA sequence analysis indicates that both DNA tumor viruses and retroviruses integrate within or close to MARs (Figure 3) [58, 59]. Furthermore, the efficiency of transcription of the retrovirus HIV-1 is determined by the proximity of its integration to MARs [57]. As SATB1 binds to MARs flanking HIV-1 integration sites and silencing of SATB1 gene expression alters the pattern of integration sites, it has been suggested that retroviruses use MARs to form viral pre-integration complexes [60].

Figure 3

Schematic representation of viral genome integration. Tumor viruses and HIV-1 integrate near MARs attached to the nuclear matrix, where the transcription and DNA replication machinery is assembled. The viral genome is thus integrated near the machinery required for its transcription and replication. Adapted from [56].

MARs also appear to play a role in some cancers. Chromosome rearrangements are hallmarks of certain malignancies and inherited genetic disorders. The breakpoints of recurrent translocations in leukemia as well as deletions involving the breast-cancer susceptibility genes BRCA1 and BRCA2 occur at MARs, indicating that the bringing together of these sequences at the nuclear matrix facilitates their illegitimate recombination [61, 62]. Patients who develop leukemia following treatment of a primary tumor with inhibitors of topoisomerase II often have specific chromosome translocations in their cancer cells whose breakpoints contain MARs, emphasizing the importance of the chromatin environment in the generation of chromosome aberrations [63, 64].

Fragile sites are hypervariable regions that generate genomic instability in tumors. Certain fragile sites contain long AT-rich minisatellites, called AT-islands, which function as MARs [65]. AT-islands are susceptible to considerable repeat expansion, which, in the fragile site FRA16B associated with leukemia, appears to strengthen their attachment to the nuclear matrix [65]. The presence of abnormal transcripts of the tumor suppressor gene WWOX (which spans FRA16B) in the absence of detectable mutations or deletions may be caused by aberrant chromatin architecture due to enhanced MAR anchoring by expanded AT-islands [66].

Identification of AT-islands has led to the emergence of a new class of drugs that specifically alkylate them [67]. These drugs exhibit an extraordinary cytotoxicity, which is likely to be due to their disruption of replication and transcription, the two essential nuclear processes organized at MARs (Figure 4). One of these drugs, bizelesin, binds specifically to the minor groove of DNA at AT-rich regions and generates interstrand crosslinks. It has high cytotoxic activity in vitro towards a broad spectrum of human cancer cell lines and, more importantly, high activity against various tumors engrafted in mice [68, 69]. While extensive development will be needed to make these compounds safe anti-cancer drugs for clinical use, their DNA-sequence specificity might offer a novel approach for targeting tumor cells containing expanded AT-repeat sequences.

Figure 4

Proposed mechanism for the cytotoxic action of AT-specific drugs. The drugs bind to AT-rich MARs in chromatin, crosslinking the two strands of the DNA. This leads to the disruption of processes such as transcription and DNA replication that are initiated at or in the vicinity of MARs.

Our understanding of how the genome functions in the context of the nucleus has been propelled by evidence that distinct genomic sites bind to regulatory proteins at the nuclear matrix. The emerging picture is that these genomic anchors regulate transcription and replication by dynamically organizing chromatin in three-dimensional space. The recognition that these essential nuclear processes are compartmentalized into microenvironments that are compromised in diseases such as cancer [70] emphasizes the need to define chromatin architecture more accurately in relation to the various nuclear domains. In reaching beyond the linear genome, we will approach a more comprehensive view of genomic function and are likely to identify truly novel targets for therapy.