Introduction

Human chromosomes are arguably the largest and most important biomolecules, yet we still know relatively little about how they are folded or how folding affects function. For example, we cannot yet predict how active a gene might be when inserted into a new genomic location. This contrasts with our knowledge of protein structure, and the way it determines function. Thus, biologists are all taught about fundamental architectural motifs like α-helices and β-sheets that are used to build proteins. They also understand the principles that drive folding into tertiary structures (e.g., by positioning hydrophobic residues in the interior), and assembly into quaternary complexes (e.g., through interactions between complementary surfaces). They expect to be able to tinker with the structure and alter enzyme activity. Our purpose here is to tease out analogous principles that underlie genome structure, with the idea that one day we will be able to use these to modulate gene expression. (For reviews, see Lanctôt et al. 2007; Langowski and Heermann 2007; Misteli 2007.) We concentrate on the roles played by non-specific, and particularly entropic, forces, as we believe their importance has been underestimated.

Some principles

One might hope that proteins and genomes would be folded using the same principles; although many are certainly shared, there is one over-riding difference (Misteli 2001; Karsenti 2008). Consider a protein like one in an icosahedral capsid of a virus. Such a protein first folds into a structure in which we can specify exactly where one particular residue is relative to another, before “self-assembling” into the larger capsid where again every component has a well-defined position (Fig. 1a; Zandi et al. 2004). Then, X-ray crystallography provides the gold standard method for determining structure. Contrast this with the cytoskeleton. This lacks a rigid architecture and is intrinsically unstable; it persists only by exchanging subunits with others in its surroundings, and—if those subunits are removed or active turnover is prevented—it collapses and disappears (Fig. 1b). No two skeletons have exactly the same shape, and the structure changes from moment to moment. Then, statements about the position of any residue are necessarily probabilistic, and it is foolish to attempt to “solve” the structure by crystallography.

Fig. 1

Two design principles. a During assembly of a virus capsid, components come together into a stable and static structure that reaches thermodynamic equilibrium (through thermodynamic forces acting alone); the position of individual residues can be specified precisely in the larger structure. b During the self-organization of the cytoskeleton, components in the complex exchange continuously with others in the soluble pool; the continued existence of the structure depends on a continuing supply of energy (so the system is not in thermodynamic equilibrium), and statements about residue position are inevitably probabilistic

We will argue that genome structure is dependent upon both self-assembly and self-organization. Then, current shape during interphase will depend on past history; for example, in which cell lineage that genome is found, and what the “initial” conditions were (e.g., the conformation during the previous mitosis). It will also depend on a continuing energy input (e.g., through transcription and its effects on looping; Cook 2002; Marenduzzo et al. 2007). As a result, long genomic polymers are probably metastable, balanced on a cusp by several competing forces (depending on thermodynamics, activity, and kinetics; Rosa and Everaers 2008; Cook and Marenduzzo 2009). They will also be exquisitely sensitive to the slightest perturbation (which facilitates regulation over a greater dynamic range).

We might imagine that the critical interactions involved in maintaining the structure would be conserved. It seems they are. Thus, segments of bacterial and yeast DNA can integrate into mammalian DNA and fold correctly to give viable interphase and mitotic structures (Heng et al. 1994; McManus et al. 1994). This implies that the integrated DNA encodes the required structural cues, and that mammalian proteins can interpret these appropriately. In other words, the critical interactions cannot be kingdom specific. If so, many biologists would then expect those interactions to involve conserved proteins bound to conserved DNA targets, but the various genome sequencing projects have signally failed to uncover any obvious candidates. (The structural maintenance of chromosomes complex proteins (e.g., condensin, cohesin, MukBEF complexes) are found in all three domains of life, and conditional knockdowns show that no single one is the sole organizer of mitotic structure in eukaryotes (Belmont 2006; Peters et al. 2008).) We will suggest that some critical interactions involve non-specific (entropic) forces acting on polymerases and their transcription units.

One way of thinking about genome structure

One might hope that the same set of four forces that determine protein structure (i.e., hydrogen or H-bonds, van der Waals forces, hydrophobic, and charge interactions) would also shape genomes—and they certainly do. But before the effect of these forces is considered, it is important to ask what shape a chromosome might take in their absence. In other words, what would be the structure of an “unorganized” chromosome? It is often implicitly assumed that chromosomes, if unperturbed by other forces, would intermingle freely into an unintelligible mess (like cooked spaghetti). But the mass and contour length of a typical 100-Mbp human chromosome are both ~7 orders of magnitude larger than those of an average-sized (5-nm) protein, so additional forces can become significant. Computer modeling shows that the structures formed by highly confined DNA are non-trivial, and respond in surprising ways to non-specific perturbations such as protein binding and macromolecular crowding. Then, some aspects of chromosome structure reflect the chromosome's tendency to “self-assemble” into thermodynamically preferable conformations, and this can occur even in the absence of all four specific forces and any direct energy input.

Segregating bacterial chromosomes

A simple thought experiment can demonstrate that DNA can adopt a surprisingly organized structure. Consider the red and blue spheres representing proteins in Fig. 2 (Jun 2008). If the wall between the two compartments is removed, diffusion allied to the entropy of mixing drives the system towards greater disorder and complete intermixing. But if we now connect particles of the same colour to give two long strings (now representing red and blue chromosomes), continued diffusion can now drive de-mixing. This follows because one string in a mixture provides a set of obstacles to the other, so restricting the number of possible conformations. But on de-mixing, each string can access more conformations by occupying a smaller volume free of obstacles. A related effect ensures that circular polymers de-mix even better than their linear counterparts, and that supercoiled circular polymers de-mix better than both (Frank-Kamenetskii et al. 1975; Müller et al. 2000; Jun and Mulder 2006).

Fig. 2

Entropic forces can drive chromosome segregation (de-mixing). Red and blue spheres (representing proteins) are confined in separate compartments; when the interconnecting wall is removed, diffusion allied to the entropy of mixing drives the system towards greater disorder and complete intermixing. If spheres of the same colour are now connected (to give two long strings representing chromosomes), continued diffusion can drive de-mixing (when each string can access more conformations by occupying a smaller volume free of obstacles provided by the other string). This effect can drive the segregation of chromosomes in rod-shaped bacteria (Jun 2008)—but not in the spherical nuclei of eukaryotes

One might imagine that this effect could have important implications for chromosome structure. Indeed, Monte Carlo simulations of bacteria have shown that replicated DNA will rapidly segregate to the cell poles as it is synthesized (just as it does in vivo) in the absence of any mechanism other than random diffusion (Jun and Mulder 2006). A similar effect explains why a DNA hairpin confined in a nano-slit (originally introduced by electrophoresis) spontaneously de-mixes to give a linear molecule (Levy et al. 2008). (This effect does not occur at high-volume fractions in a spherical container (Jun 2008).)

Forming chromosomal territories

Chromosomes are not intermixed within the nuclei of higher eukaryotes; they are confined to discrete “territories” (Gilbert et al. 2005; Cremer and Cremer 2006; Misteli 2007). But does this conformation represent anything more than another unexpected manifestation of a random configuration (e.g., like self-avoiding or worm-like chains), at least during most of the cell cycle when they are not being organized by the division apparatus? It is clear that it does. Thus, computer simulations indicate that random walks generate intermingled fibres with many inter-fibre contacts (as in cooked spaghetti) and not compact territories with many intra-fibre ones (e.g., Münkel et al. 1999; Bohn et al. 2007; Jhunjhunwala et al. 2008). Moreover, the physical separation between any two human genes in 3D nuclear space (determined by fluorescence in situ hybridization (FISH)) depends on the number of intervening base-pairs according to a power law (i.e., with an exponent of ~0.5 below genomic separations of ~4 Mbp, and ~0.32 above) that is inconsistent with a random walk; rather, the fibre must fold back on itself to give the required compaction, and the best fit is given by models involving mixtures of local and giant loops (of ~0.1 and ~1 Mbp). This kind of modeling also shows that looped polymers—but not linear ones—also form discrete territories, with the appropriate aspherical shapes (Khalil et al. 2007; Cook and Marenduzzo 2009; de Nooijer et al. 2009). Although looping seems at first like a relatively specific mechanism, we will argue later that it could arise inevitably and non-specifically due to basic aspects of primary chromatin structure.
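The two-regime power law quoted above can be written down as a small function. This is only a sketch under stated assumptions: the exponents (~0.5 and ~0.32) and the ~4 Mbp crossover are the values given in the text, whereas the `prefactor` (arbitrary distance units) and the choice to join the two regimes continuously at the crossover are illustrative assumptions, not fitted values.

```python
def spatial_separation(genomic_mbp, prefactor=1.0, crossover_mbp=4.0,
                       exponent_short=0.5, exponent_long=0.32):
    """Illustrative piecewise power law for 3D separation between two loci.

    genomic_mbp    : genomic separation in Mbp (must be positive)
    prefactor      : hypothetical scale factor (arbitrary distance units)
    crossover_mbp  : genomic separation at which the exponent changes
    exponent_short : exponent below the crossover (random-walk-like, ~0.5)
    exponent_long  : exponent above the crossover (more compact, ~0.32)
    """
    if genomic_mbp <= 0:
        raise ValueError("genomic separation must be positive")
    if genomic_mbp <= crossover_mbp:
        return prefactor * genomic_mbp ** exponent_short
    # Match the short-range branch at the crossover so the curve is continuous.
    at_crossover = prefactor * crossover_mbp ** exponent_short
    return at_crossover * (genomic_mbp / crossover_mbp) ** exponent_long


if __name__ == "__main__":
    for mbp in (0.5, 1, 2, 4, 8, 16, 32, 64):
        print(f"{mbp:5.1f} Mbp -> {spatial_separation(mbp):.3f} (arbitrary units)")
```

With these assumptions, an eightfold increase in genomic separation beyond the crossover increases the spatial separation by less than a factor of two, whereas a pure random walk (exponent 0.5 throughout) would give almost a factor of three.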

Positioning human chromosomes

Chromosome territories are often positioned non-randomly in nuclei (Gilbert et al. 2005; Cremer and Cremer 2006; Misteli 2007). For example, gene-poor chromosomes in human lymphocytes tend to be peripheral and gene-rich ones internal, inactive heterochromatin often aggregates at the periphery, and centromeres may cluster into chromocenters. For reasons that remain unclear, G/C content proves to be one of the best predictors of such radial positioning; a high value correlates with an interior position, a high gene content, transcriptional activity, and an increased flexibility and decompaction of the chromatin fibre (Gilbert et al. 2005; Cremer and Cremer 2006; Misteli 2007). Such positioning has important consequences, for example in repressing genes by bringing them closer to inactive heterochromatin. It is again usually assumed that chromosomal shape and positioning results from the action of the four specific forces acting locally—for example, between one nucleosome and another, or between nucleosomes and the lamina.

Monte Carlo simulations were used to demonstrate that entropic forces acting alone can position and shape self-avoiding polymers within crowded nuclei in the ways seen experimentally (Fig. 3; Cook and Marenduzzo 2009; de Nooijer et al. 2009). Polymers composed of strings of beads were allowed to “diffuse” in a confining sphere until they reached equilibrium. Flexible polymers (like GC-rich and gene-rich chromosomes) tend to be found more towards the centre, probably through the resolution of forces acting in an “entropic” centrifuge. Thus, at low packing fractions, a stiff polymer statistically occupies more volume than a flexible one (it has a larger radius of gyration), and—when it approaches the confining wall—it “feels” the wall sooner to lose more entropy; therefore, it tends to be found more towards the interior. Thus, in the cartoon on the right of Fig. 3a(i) the stiffer blue polymer is larger and more surface-phobic, and so tends to be excluded from the (larger) grey volume at the periphery; as a result, it is more frequently found towards the centre in the smaller yellow volume. But at high packing fractions, the entropic effect illustrated in the cartoon in Fig. 3a(iii) probably becomes significant. Here, one end of the first persistence length in a stiff polymer (represented by the blue rod) abuts the confining wall. If we imagine this blue rod is tethered to the wall, it can access all conformations in the light blue volume—but not the grey volume outside the confining wall. If the rod is now divided into two (shown in red) to increase flexibility, the light blue volume still remains accessible, but the particular conformation shown is not permissible (as the second half of the red rod penetrates the wall). This qualitatively suggests that flexible (red) polymers lose more configurations (and so entropy) when squashed against the wall. Then, they have become the most surface-phobic, and so tend to be found internally where they lose less entropy. 
This is what is seen experimentally (indicated by the tick in Fig. 3a(iii)), where heterochromatic (stiffer) regions are often—albeit not always—peripheral (Solovei et al. 2009).
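The statement that a stiff polymer has a larger radius of gyration than a flexible one of the same length can be illustrated with the textbook worm-like-chain expression. This is a rough sketch only: it assumes an ideal chain without self-avoidance or confinement, unlike the simulated polymers, and it borrows the contour length (1.5 μm) and persistence lengths (40 and 90 nm) from the legend to Fig. 3. The function name is hypothetical.

```python
import math


def wlc_radius_of_gyration(contour_nm, persistence_nm):
    """Root-mean-square radius of gyration of an ideal worm-like chain.

    Uses the standard closed-form result
        Rg^2 = L*lp/3 - lp^2 + 2*lp^3/L - (2*lp^4/L^2) * (1 - exp(-L/lp))
    which ignores excluded volume and confinement (an assumption here).
    Not numerically reliable when the chain is much shorter than lp.
    """
    L, lp = float(contour_nm), float(persistence_nm)
    rg_squared = (L * lp / 3.0
                  - lp ** 2
                  + 2.0 * lp ** 3 / L
                  - (2.0 * lp ** 4 / L ** 2) * (1.0 - math.exp(-L / lp)))
    return math.sqrt(rg_squared)


if __name__ == "__main__":
    contour = 1500.0  # nm, contour length taken from the Fig. 3 legend
    for label, lp in (("flexible", 40.0), ("stiff", 90.0)):
        rg = wlc_radius_of_gyration(contour, lp)
        print(f"{label:8s} (persistence length {lp:.0f} nm): Rg = {rg:.1f} nm")
```

Under these idealized assumptions the stiffer chain is noticeably larger, which is the starting point for the low-density argument in the text; the reversal at high packing fractions is not captured by this single-chain formula.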

Fig. 3

Monte Carlo simulations of two sets of five polymers confined within a sphere. Polymers composed of strings of beads were allowed to “diffuse” in the computer until they reached equilibrium. In each case, typical configurations (left), normalized radial probabilities (middle), and cartoons illustrating major determinants of position (right) are shown for the volume density (φ) indicated. All polymers are self-avoiding (i.e., no bead occupies the same space as another) and, unless stated otherwise, there are 50 beads (diameter 30 nm) in a polymer with a contour length of 1.5 μm (representing 150 kbp) and a persistence length of 40 nm. a Five stiff (blue) and five flexible (red) polymers, with persistence lengths (ξ) of 90 and 40 nm. i At low volume fractions, stiff polymers tend to lie more internally than flexible ones. Cartoon: Stiff polymers statistically occupy more volume and so lose more entropy when positioned in the (larger) grey volume close to the wall; as a consequence, their centres of mass tend to be concentrated in the yellow volume at the centre. ii, iii At volume fractions above 11%, this trend reverses and stiff polymers now tend to be peripheral. This is what is seen experimentally (tick), where stiff (heterochromatic) regions are often peripheral. Cartoon: see text. b Five compact (blue) and five open/swollen (red) polymers. Compaction is achieved by allowing monomers in one set to interact with other monomers in the same polymer with an attractive potential of 1 kBT at centre-to-centre distances of 30–50 nm. i, ii Compact polymers are more peripheral at both low and high volume fractions, in accord with what is seen experimentally (as heterochromatic regions are more compact; tick). Cartoon: This entropic bias is due to compact fibres being able to approach closer to the confining wall (i.e., the inaccessible grey volume is less, and the accessible yellow volume is more). c Five thick (30-nm beads; blue) and five thin (25-nm beads; red) polymers. i At low volume fractions, thicker polymers tend to be slightly more internal. Cartoon: thicker polymers tend to be excluded from a larger grey volume. ii At high volume fractions thicker polymers tend to be peripheral; this is what is seen experimentally (tick). Cartoon: see main text for discussion. From Cook and Marenduzzo (2009)

Entropic forces can also position compact/thick fibres (like heterochromatin) towards the periphery, in accord with experimental observations; compact fibres can approach closer to the confining wall (Fig. 3b), while a larger depletion attraction (below) favours packing of thicker fibres against the wall (Fig. 3c; Cook and Marenduzzo 2009). This is also in accord with experimental observations: compact heterochromatin is often peripheral (Solovei et al. 2009). In addition, flexible territories tend to intermingle less with others, in accord with gene-dense (and so flexible) chromosomes being poor translocation partners (Bickmore and Teague 2002). If the polymers carry a large terminal bead (to represent centromeric heterochromatin at one end of a telocentric chromosome), beads are found both at the edge of their own territories and clustered at the nuclear periphery—again as found in vivo (Cook and Marenduzzo 2009; de Nooijer et al. 2009).

Entropic forces can drive looping

Entropy can also create order through the “depletion attraction” (Asakura and Oosawa 1958; Marenduzzo et al. 2006b). Consider a few large multi-protein complexes and many smaller macromolecules (mainly soluble proteins of ~5 nm) crowded into the nucleus. The mega-complexes will be bombarded from all sides by the smaller macromolecules (Fig. 4a(i)). As two mega-complexes approach, the smaller macromolecules are sterically prevented from entering the volume immediately between the two. As a result, the small macromolecules exert an unopposed force equivalent to their osmotic pressure on opposite sides of the two, keeping them together. Again, ordering the minority increases the disorder of the majority. The energy involved scales with the ratio of the diameters of the large and small spheres, and, in our case of the crowded cell, becomes equivalent to that of several H-bonds when the large spheres are >10 nm in diameter. Measurements (using “optical tweezers”) of the force required to pull apart two large plastic beads in a crowded solution of smaller spheres confirm the accuracy of the underlying theory (Yodh et al. 2001).
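The scaling of the depletion attraction with the ratio of sphere diameters can be made concrete with the standard Asakura–Oosawa hard-sphere picture, in which the attraction equals the osmotic pressure of the small spheres multiplied by the overlap of the two excluded-volume shells. The sketch below assumes ideal (non-interacting) small spheres and perfectly hard large spheres; the function names and the volume fraction of 0.2 used in the example are illustrative assumptions, not values taken from the text.

```python
import math


def lens_volume(radius, centre_distance):
    """Overlap volume of two equal spheres of given radius (0 if they do not overlap)."""
    if centre_distance >= 2.0 * radius:
        return 0.0
    return (math.pi * (4.0 * radius + centre_distance)
            * (2.0 * radius - centre_distance) ** 2 / 12.0)


def depletion_energy_kT(large_diam, small_diam, volume_fraction, centre_distance=None):
    """Asakura-Oosawa depletion energy (in units of kT) between two large hard spheres.

    centre_distance defaults to contact (one large-sphere diameter).
    Small spheres are treated as an ideal gas at the given volume fraction.
    """
    if centre_distance is None:
        centre_distance = large_diam
    # Centres of small spheres are excluded from a shell of this radius.
    exclusion_radius = (large_diam + small_diam) / 2.0
    small_sphere_volume = math.pi * small_diam ** 3 / 6.0
    number_density = volume_fraction / small_sphere_volume
    return -number_density * lens_volume(exclusion_radius, centre_distance)


def contact_energy_closed_form(large_diam, small_diam, volume_fraction):
    """Closed form of the same model at contact: -phi * (1 + 3D / (2d))."""
    return -volume_fraction * (1.0 + 1.5 * large_diam / small_diam)


if __name__ == "__main__":
    phi = 0.2  # hypothetical volume fraction of small crowders
    for large in (10.0, 15.0, 25.0):
        energy = depletion_energy_kT(large, 5.0, phi)
        print(f"D = {large:4.1f} nm, d = 5 nm: contact energy = {energy:.2f} kT")
```

In this idealized model the contact energy grows linearly with the diameter ratio, so larger complexes are held together more strongly, and the attraction vanishes once the gap between the large spheres exceeds one small-sphere diameter.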

Fig. 4

Some local effects. a The depletion attraction. i In a crowded volume, many small soluble macromolecules (purple) bombard large complexes from all sides (grey arrows). When two complexes come into contact, small macromolecules are sterically excluded from the green volume between the two and so cannot knock the two large complexes apart; as a result, a “depletion attraction” (equivalent to the osmotic pressure exerted by small macromolecules on opposite sides of the two large complexes), keeps the large complexes together. A similar force drives the large spheres to the surrounding wall (not shown). ii When the large spheres (polymerases) are threaded on a string (DNA or chromatin), this depletion attraction is only partially countered by the entropic cost of looping. It has the strength of a few H-bonds, and will act for as long as polymerases remain engaged. iii The attraction can also drive large beads (e.g., NORs, centromeric heterochromatin) into clusters (e.g., nucleoli, chromocentres). Modified from Marenduzzo et al. (2006b) with permission. b Two preferred conformations of tubes in which inter-segment interactions are stronger than segment:solvent ones; maximizing the buried surface area of the tube drives compaction into single or double helices. This kind of force transcends chemical detail, and acts over many scales (e.g., it probably underlies the formation of α-helices in proteins, double helices in DNA, and the coiling of chromatin fibres in vitro)

Now imagine the mega-complex is bound to DNA in a crowded nucleus (Rippe 2007). As an example, we will use an RNA polymerase II complex containing the multi-subunit enzyme (~15 nm), and its nascent RNA (compacted diameter ~14 nm) with associated proteins (which might include a ~25 nm spliceosome); however, the argument applies equally to any cluster of factors bound to DNA. Such a mega-complex will be subject to the depletion attraction, and when two bound to the same DNA molecule come together they will inevitably loop the intervening DNA. Such looping has an entropic cost, and a cost/benefit analysis shows that the attraction can make a significant contribution towards stabilizing 20-kbp loops (Fig. 4a(ii); Marenduzzo et al. 2006a). Theory also suggests crowding affects the rate at which equilibrium is attained (Toan et al. 2006), by speeding looping (by reducing effective loop length and so increasing diffusive encounters) and slowing unlooping (by increasing viscosity). This effect is generalizable to any imperfection in the fibre (e.g., due to bound transcription factors, or more open chromatin; St-Jean et al. 2008), and the resulting loops will persist for as long as the mega-complexes remain bound. In our example, this is the time taken to transcribe a gene, which is usually many minutes, and can be hours if polymerases pause (as many do; Margaritis and Holstege 2008). Because the looped structure persists only as long as the polymerase continues to transcribe, it is self-organizing: it depends on an energy input. There is now good experimental evidence (e.g., provided by FISH, chromosome conformation capture and its derivatives) that genomes are looped, and that active polymerases are the ties (Fullwood et al. 2009; Cook 2010). Although individual loops may cluster together to give a neat rosette, and so into strings of rosettes (Cook 2002, 2010), simulations suggest that more complicated structures will form (Marenduzzo et al. 2006a; Junier et al. 2010). The attraction can also bring together large heterochromatic clumps (which might be nucleolar organizing regions (NORs) or centromeres) into nucleoli or chromocentres (Fig. 4a(iii)).

Solenoids

We now consider one architectural motif often depicted in our textbooks—the solenoid. This is usually considered to be the first member in a helical hierarchy in which nucleosomal strings are coiled into 30-nm solenoids, solenoids into 200–300 nm fibres, and so on. There are powerful theoretical reasons why helices might form. Consider a long flexible tube in which different segments have a higher affinity for each other than the solvent (justifiable as nucleosomes so often aggregate). This affinity could arise through action of some/all of our four specific forces and/or the depletion attraction (Snir and Kamien 2005; Hansen-Goos et al. 2007). Both analytic treatments and Monte Carlo simulations show the tube will spontaneously fold into a helix, with different helices packing side-by-side into double and triple helices; this arrangement maximizes segment/segment interactions and minimizes segment/solvent ones (Fig. 4b; Banavar et al. 2007). This kind of analysis transcends chemical detail, and the resulting folding probably underlies the formation of α-helices in proteins, double helices in DNA, and the coiling of chromatin fibres in vitro. Experiments also show that cations compact nucleosomal strings into 30-nm fibres in vitro, while X-ray crystallography confirms these fibres are helical (van Holde and Zlatanova 2007). Unfortunately, there remains little agreement as to whether equivalent coils exist in vivo, and—if they do—about nearly all details of their structure (van Holde and Zlatanova 2007). For example, what is the path of the DNA (is it a one- or two-start helix?), how wide is the helix, and what are the major contacts between turns? Moreover, nucleosomes seen in electron micrographs generally follow random zig-zagging paths, and not helical ones (Horowitz-Scherer and Woodcock 2006); most disturbingly, no solenoids or 30-nm fibres are seen in vitrified cell sections prepared under conditions where they should be detectable (Eltsov et al. 2008). 
If solenoids and higher-order coils are important architectural motifs, we are still some way from describing their details. It may even be that Nature prevents solenoids from forming, simply because the forces described above are so strong that they would generate (if acting unopposed) structures so stable that they could never be uncoiled (a requirement if polymerases are to read the DNA sequence). Indeed, canonical 30-nm solenoids are only seen in electron micrographs of inert cells devoid of polymerases (e.g., sea urchin sperm, chicken erythrocytes; Horowitz-Scherer and Woodcock 2006).

Conclusions

Biologists typically focus on four types of (specific) force (i.e., H-bonds, van der Waals forces, hydrophobic and charge interactions) that shape many biological structures, and many assume they will be the sole determinants of all structures in the cell. We have reviewed evidence that other non-specific (entropic) forces are major players in shaping and positioning chromosomes. We do not wish to suggest that these entropic forces act alone, without contribution from the four familiar ones; rather, we imagine the ultimate outcome is determined by resolution of the combined forces—which sometimes may be conflicting. For example, charge interactions acting between bound transcription factors (Rippe 2001) will undoubtedly contribute with the depletion attraction to looping. And although an entropic centrifuge may drive heterochromatin to the periphery in higher animals and plants, specific nucleosome:lamin interactions are probably also involved in animals (Polioudaki et al. 2001)—but not in plants (as they lack lamin proteins). Similarly, heterochromatin may often be peripheral, but the rod cells in the retinas of diurnal mammals provide a striking exception (Solovei et al. 2009), so here specific forces may become the most significant.

It has now become a relatively trivial matter to modify a specific site within a protein and so change that protein’s function, and this ability is built on a secure knowledge of the major forces that determine the structure. We believe that we will only be able to modify specific sites within the genome and so change function once we have an equally secure knowledge of the relevant forces, and this will only be gained by combining experimental approaches with the theoretical ones described here.