Consideration of gene regulation as an interaction between the basal transcription machinery, regulators, and a segment of naked DNA containing a gene has been an extremely successful model for understanding how genetic information is read [1]. It is impossible to separate this model from the great success of modern molecular genetics. Naked DNA does not exist in nature, however. Packing DNA into the nucleus, segregating it during cell division, and making sure that it is readily available when needed for transcription is an enormously complex task for the cell and needs extensive interactions of proteins with DNA. Structural considerations have dominated our views of nuclear architecture but have not greatly influenced our concepts of transcription, other than by the broad assumption that such structure can hinder transcription. As with a building, however, the structure of the nucleus has a significant, albeit possibly subtle, influence on the work performed within it.

Recent work in many different eukaryotes has suggested that genes with particular expression patterns are sometimes found in contiguous regions of the genome. We call these regions gene-expression neighborhoods; we avoid using the term 'cluster' because this usually refers to potentially co-regulated genes, regardless of their genomic position. The human eye is exceedingly adept at finding patterns and does so even in randomness (the constellations of stars are an example of patterns that humans have imposed on a random distribution). Here, we review the evidence that gene-expression neighborhoods are real. Although there are some lingering questions about the best methods for finding them and about how to avoid being tricked by spurious or artifactual patterns, the body of available evidence leaves little doubt that they exist. With a few exceptions from classical molecular biology, the regulatory mechanisms underlying gene-expression neighborhoods are not understood. Investigating these mechanisms will be a major challenge, but we believe that some will involve the structure of the nucleus. We review the evidence for non-random positioning of chromosomes and genes in the nucleus and suggest ways in which the regulation of gene-expression neighborhoods can be studied.

Evidence for gene-expression neighborhoods

It has long been known that prokaryotic genes are organized into operons [2]. Even in eukaryotes, it has been recognized that some genes encoding related functions are near neighbors in the genome: the clusters of genes encoding histones and tRNAs and the arrays of ribosomal RNA (rRNA) genes were early examples [3]. Although the lists of known gene neighborhoods like these have grown and now include important regulatory genes such as the Hox and microRNA genes [4, 5], most researchers have considered gene neighborhoods oddities.

There have been hints that gene-expression neighborhoods are widespread. For example, direct examination of Drosophila polytene chromosomes showed nascent transcripts being generated from many genes in some multigene segments [6], and the systematic analysis of the effect of position on transgene expression supported the idea that there are regions, both euchromatic and heterochromatic, that are broadly repressive [7]. Functional assays like these, in conjunction with structural studies, raise the possibility that loop domains within chromosomes (the domains between matrix-attachment regions) are units of transcriptional regulation [8]. Genome-scale analyses of gene expression using microarrays have resulted in an explosion of papers describing gene-expression neighborhoods. Non-random expression of pairs of adjacent genes has been reported in yeast [9] and plants [10], and among highly expressed human genes [11], genes expressed in Caenorhabditis elegans muscle [12], genes differentially expressed in embryonic versus adult Drosophila [13, 14], genes differing in expression between the sexes [15, 16], and genes expressed in cancerous cells [17].
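One simple way to quantify the non-random expression of adjacent gene pairs reported above is to correlate each gene's expression profile with that of its chromosomal neighbor. The Python sketch below assumes a hypothetical input: a list of expression vectors, one per gene, already ordered by genomic position along a single chromosome. The function names are illustrative and are not taken from the cited studies.

```python
import statistics


def pearson(x, y):
    """Pearson correlation between two equal-length expression profiles.

    Raises ZeroDivisionError if either profile is constant."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def adjacent_pair_correlation(profiles):
    """Mean correlation between each gene and its next neighbor.

    `profiles` (hypothetical layout): one expression vector per gene,
    ordered by position along a single chromosome."""
    pairs = zip(profiles, profiles[1:])
    return statistics.fmean(pearson(a, b) for a, b in pairs)


# Toy data: the first three genes rise together across three samples.
profiles = [[1, 2, 3], [2, 4, 6], [3, 6, 9], [3, 2, 1]]
print(adjacent_pair_correlation(profiles))
```

On its own this mean says little; it becomes informative when compared with the same quantity computed for randomly chosen gene pairs or for shuffled gene orders.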

There are a number of pitfalls in the analysis of neighborhoods of contiguous genes, calling for great analytical rigor. For example, the physical position of the probes on the array must be taken into account, because hybridization is often non-random across the array surface; some early arrays had probes printed in the same order as the genes appear in the genome [18], so spatial artifacts on the slide could masquerade as genomic neighborhoods. But expression data generated using arrays with probes printed randomly with respect to chromosome position, analyzed under very stringent criteria comparing real data to 100,000 random sets, still show gene-expression neighborhoods (an example from our own analyses on Drosophila [16] is shown in Figure 1a). Such non-random patterns are also not an artifact of any one technique. Sequencing-based assays such as expressed sequence tags (ESTs) and serial analysis of gene expression (SAGE) are not subject to array artifacts of any kind; experiments using these techniques have shown in mammals that highly expressed genes [19] and genes with testis-biased [20] or organ-biased [21] expression are organized into neighborhoods. Finally, a computational analysis of the correlation between gene location and inferred gene function in many eukaryotes also reveals neighborhood structure [19].
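The logic of comparing real data with random sets can be sketched as a permutation test, under simplifying assumptions: each gene is reduced to a yes/no call for the expression bias of interest, genes are listed in chromosomal order, and the test statistic is the longest run of consecutive biased genes. This is a generic permutation test, not necessarily the exact statistic used in [16]; the analyses described above used 100,000 random sets, whereas the default here is smaller for speed.

```python
import random


def longest_run(flags):
    """Length of the longest stretch of consecutive True values."""
    best = current = 0
    for flag in flags:
        current = current + 1 if flag else 0
        best = max(best, current)
    return best


def neighborhood_permutation_test(flags, n_perm=10_000, seed=0):
    """Permutation p-value for clustering of 'biased' genes.

    `flags`: booleans in chromosomal order (True = the gene shows the bias of
    interest, e.g. testis-biased expression). The null model shuffles gene
    order while keeping the number of biased genes fixed."""
    rng = random.Random(seed)
    observed = longest_run(flags)
    shuffled = list(flags)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if longest_run(shuffled) >= observed:
            hits += 1
    # Add-one correction: the observed order counts as one permutation,
    # so the p-value is never reported as exactly zero.
    return observed, (hits + 1) / (n_perm + 1)


# Toy chromosome: 5 contiguous biased genes among 100.
flags = [False] * 40 + [True] * 5 + [False] * 55
observed, p_value = neighborhood_permutation_test(flags, n_perm=2000)
print(observed, p_value)
```

The longest run is only one possible statistic; counting adjacent pairs of biased genes is a simple alternative that is less dominated by a single large neighborhood.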

Figure 1

An illustration of gene neighborhoods and chromosome territories. (a) A heat map showing normalized hybridization intensities along a segment of a Drosophila chromosome. Samples are arrayed from left to right, and samples from testis and from whole males (which include testis) are indicated; genes that are adjacent on chromosome 3R are listed from top to bottom. Four contiguous genes, including the don juan gene (dj), which encodes a sperm tail protein, show testis-biased and male-biased expression. Figure generated using data from [16]. (b) A micrograph of liver cells, showing the positions that two chromosomes preferentially occupy within the nucleus. Chromosome 12 (green) is frequently found towards the periphery, whereas chromosome 15 (red) tends to localize towards the center of the nucleus. Blue indicates total DNA staining.

The fact that genes can be moved to new locations in the genome and often behave more or less as expected in the new location suggests that the effects of neighborhoods on gene expression are subtle. But subtle is not synonymous with unimportant. Reverse-genetic studies have shown that many genes have little overt phenotype when mutated or deleted - usually because of redundancy with other genes - but even such apparently redundant duplicated genes cannot persist over evolutionary timescales unless they evolve functions that differentiate them from other genes [22]. Our inability to assay small differences in fitness limits the identification of subtle effects. Conservation of neighborhood structure over a long evolutionary history, as seen for the Hox genes [4] and in prokaryotic genomes [23], is a very powerful indicator that neighborhoods are functional. Gene expression within the blocks of the Drosophila melanogaster genome that are conserved with Drosophila pseudoobscura is highly correlated, suggesting that neighborhood structure is generally conserved [15].

Non-random positioning of chromosomes in the nucleus

In the interphase nucleus of virtually all eukaryotes, the genetic material of each chromosome occupies a spatially limited, roughly spherical volume, with a diameter about a tenth of that of the nucleus, referred to as a chromosome territory [24, 25] (Figure 1b). Despite the homogeneous, dense appearance of these territories when visualized by in situ hybridization methods, their interiors are accessible to regulatory factors, and they are open enough to allow transport of mRNA and proteins through the nucleus. Chromosome territories are arranged non-randomly within the volume of the nucleus. In plants, flies and yeast, chromosomes are often arranged with their telomeres clustered at one end of the nucleus and their centromeres at the other [26-29]. An extreme example is the polarized nuclei of Drosophila embryos, in which chromosomes are aligned in an apical-basal orientation, with each gene localized according to its chromosomal position [30]. In mammalian cells, chromosomes are not aligned in this way, but they do occupy non-random positions. Analysis of human lymphocytes and fibroblasts suggests preferential localization of chromosomes relative to the center of the nucleus [31, 32]: in this radial arrangement, gene-dense chromosomes tend to localize towards the center of the nucleus, whereas chromosomes with low gene density tend to associate with the nuclear periphery [32]. Other studies provide evidence for a correlation between chromosome size and radial position, with small chromosomes clustering towards the center of the nucleus and larger chromosomes towards the periphery [33]. We are largely ignorant about the rules that determine this organization.
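Radial position is often summarized as a normalized distance from the nuclear center. The sketch below assumes a spherical nucleus and territory centroids already extracted from 3D images, using a hypothetical input layout; real analyses frequently use more elaborate measures, such as binning into concentric shells, to cope with irregular nuclear shapes.

```python
import math
import statistics
from collections import defaultdict


def normalized_radial_position(center, radius, point):
    """Distance of `point` from the nuclear center divided by the nuclear
    radius: 0 = center, 1 = nuclear edge. Assumes a spherical nucleus."""
    return math.dist(center, point) / radius


def mean_radial_by_chromosome(measurements):
    """Average normalized radial position per chromosome.

    `measurements` (hypothetical layout): iterable of
    (chromosome, nucleus_center, nucleus_radius, territory_centroid)
    tuples, one per territory per imaged cell."""
    by_chrom = defaultdict(list)
    for chrom, center, radius, centroid in measurements:
        by_chrom[chrom].append(
            normalized_radial_position(center, radius, centroid))
    return {chrom: statistics.fmean(vals) for chrom, vals in by_chrom.items()}


# Toy measurements (coordinates in micrometers).
measurements = [
    ("A", (0, 0, 0), 10.0, (2, 0, 0)),
    ("A", (0, 0, 0), 10.0, (4, 0, 0)),
    ("B", (0, 0, 0), 10.0, (0, 8, 0)),
]
print(mean_radial_by_chromosome(measurements))
```

Ranking chromosomes by this mean and, separately, by gene density or size is one way to test the correlations described above.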

Although the molecular mechanisms determining chromosome position are unknown, it seems unlikely that the positioning of entire chromosomes is controlled by a precise positioning mechanism involving dedicated machinery, because chromosome position differs between cell types and even varies widely within a cell population [31, 34]. It seems more probable that preferential radial chromosome positions reflect the global physical properties of a chromosome, such as its size, the degree of chromatin condensation, and levels of gene expression. For example, the correlation between radial position and gene density agrees with findings that gene-poor chromosome regions are generally more condensed than gene-rich regions. This is consistent with the idea that the physical nature of a chromosome contributes to its position [35]. A report that highly transcribed genes are in neighborhoods on human chromosomes [11] suggests either that transcription of these genes helps drive the positioning of their host chromosomes within the nucleus, or that certain positions permit higher expression.

Chromosomes are also non-randomly positioned with respect to each other within the nuclear space [36]. The classic example is the clustering of chromosomes bearing the genes encoding rRNAs. In mammalian cells, these chromosomes congregate to form a nucleolus, where ribosomal RNA is transcribed from the tandemly repeated rRNA genes [37]. Although it can be argued that this is a special case, chromosomes that do not carry rDNA also associate near the nucleolus. For example, in mouse cells, the rDNA-bearing chromosomes 12 and 15 form a triplet cluster with chromosome 14, which lacks rDNA, at high frequency [25]. This kind of higher-order arrangement could link gene-expression neighborhoods that are far apart along the linear genome.
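Association frequencies such as that of the chromosome 12-14-15 cluster are estimated by scoring many cells. A minimal sketch follows, assuming one centroid per chromosome per cell and an analyst-chosen distance cutoff; both are simplifications, since diploid cells carry two homologs of each chromosome, and the input layout is hypothetical.

```python
import itertools
import math


def cluster_frequency(cells, chromosomes, max_dist):
    """Fraction of cells in which all listed territories lie close together.

    `cells` (hypothetical layout): list of dicts mapping a chromosome name to
    one 3D territory centroid. A cell counts as clustered when every pair of
    centroids is closer than `max_dist`, an analyst-chosen cutoff in the same
    units as the coordinates."""
    clustered = 0
    for centroids in cells:
        points = [centroids[name] for name in chromosomes]
        if all(math.dist(a, b) < max_dist
               for a, b in itertools.combinations(points, 2)):
            clustered += 1
    return clustered / len(cells)


cells = [
    {"12": (0, 0, 0), "14": (1, 0, 0), "15": (0, 1, 0)},  # clustered
    {"12": (0, 0, 0), "14": (5, 0, 0), "15": (0, 1, 0)},  # chr14 too far
]
print(cluster_frequency(cells, ["12", "14", "15"], max_dist=2.0))
```

An observed frequency is meaningful only relative to a null expectation, for example the same statistic computed after randomizing territory positions within each nucleus.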

Is the chromosome a unit of expression, such that all genes on a chromosome share some aspect of their regulation? Although this might seem unlikely, the sex chromosomes of many organisms show unusual expression patterns. The mammalian Y chromosome is highly heterochromatic and has gene-expression neighborhoods that are required for testis function [38]. In mammalian females, one of the two X chromosomes undergoes inactivation [39], and the inactive X is characteristically positioned at the nuclear periphery. In Drosophila males, the X chromosome associates with chromosome-specific chromatin-remodeling machines that upregulate its expression [40]. More recently, it has been shown that the X chromosome carries fewer genes with testis-biased expression than other chromosomes [41-43]. Although sex chromosomes are likely to be the exceptions in this respect, the small chromosome 4 in Drosophila is decorated with a specific chromatin-associated protein of unknown function [44], which might regulate its expression and/or positioning.

Non-random positioning of genes

In contrast to entire chromosomes, a gene's position relative to various nuclear landmarks is emerging as an important contributor to its function [45]. Association of genes with the nuclear periphery is a hallmark of silencing. Transcriptionally silent heterochromatin is enriched at the edges of the nucleus in many organisms; for example, silenced Saccharomyces cerevisiae telomeres are always at the nuclear periphery [46]. But the story is more complicated. It is now clear that silencing of telomere regions does not require association with the periphery but occurs throughout the nucleus [47]. In addition, a genome-wide survey shows that a large number of S. cerevisiae genes appear to translocate towards the nuclear periphery upon activation and associate with a nuclear pore complex in their active state, suggesting that the nuclear periphery is not a silencing compartment per se but rather a general gene-regulatory environment [45, 48].

It is not clear how closely these observations in yeast apply to mammalian cells, given that a typical mammalian nucleus is 50-100 times larger than a yeast nucleus. Because a gene locus in both yeast and mammals undergoes similar random motion, exploring a sphere about 1 μm in diameter, the probability that a locus encounters the periphery is substantially lower in mammalian cells than in yeast cells. Association with the periphery may therefore have a different functional meaning in yeast from that in higher eukaryotes [49, 50]. There is evidence, however, that radial position is a regulatory mechanism in mammals [51]. In cell types where the locus encoding the cystic fibrosis transmembrane conductance regulator (CFTR) is silent, it is generally closely associated with peripheral heterochromatin, but in cell types where CFTR is expressed it dissociates from the periphery. Importantly, this behavior appears to be a property of the locus itself, as gene neighbors within 50 kilobases show a different association behavior, correlating with their own transcriptional activity [51]. Additional evidence for a role of peripheral localization in gene function comes from B-cell and T-cell differentiation, in which several differentiation-specific marker genes are preferentially found nearer the periphery in their inactive state [52-55].
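The size argument can be made concrete with a back-of-the-envelope calculation. If loci are placed uniformly in a spherical nucleus of radius R and each explores a sphere of radius r = 0.5 μm, a locus can reach the envelope only if it lies within r of it, so the relevant probability is the volume fraction of that outer shell, 1 - ((R - r)/R)^3. The yeast nuclear radius of 1 μm and the reading of "50-100 times larger" as a ratio of volumes are assumptions made for this illustration, as is the uniform placement of loci.

```python
def periphery_shell_fraction(nucleus_radius, exploration_radius):
    """Fraction of a spherical nucleus's volume lying within
    `exploration_radius` of the nuclear envelope, i.e. the chance that a
    uniformly placed locus can reach the periphery. Units must match."""
    inner = max(nucleus_radius - exploration_radius, 0.0)
    return 1.0 - (inner / nucleus_radius) ** 3


YEAST_RADIUS_UM = 1.0        # assumption: ~2 um diameter yeast nucleus
EXPLORATION_RADIUS_UM = 0.5  # ~1 um diameter exploration sphere (from text)

for volume_factor in (1, 50, 100):
    # "50-100 times larger" is read here as a ratio of nuclear volumes,
    # so the radius scales with the cube root of the factor.
    radius = YEAST_RADIUS_UM * volume_factor ** (1 / 3)
    frac = periphery_shell_fraction(radius, EXPLORATION_RADIUS_UM)
    print(f"{volume_factor:>3}x volume: R = {radius:.2f} um, fraction = {frac:.2f}")
```

Under these assumptions the reachable fraction falls from 0.875 for the yeast-sized nucleus to roughly 0.29-0.35 for nuclei 50-100 times larger in volume. Real loci are not uniformly distributed, so these numbers indicate only the direction and rough size of the effect.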

A clear correlation between gene activity and positioning has been established for the association of loci with heterochromatin domains [56]. Inactive genes are frequently found associated with centromeric heterochromatin and dissociate from it upon activation. Well-characterized examples include several genes specific to particular stages of B-cell and T-cell development [57, 58]. In Drosophila, the insertion of a heterochromatin block near the brown locus causes this normally euchromatic region to associate with heterochromatin and become silenced (an example of the position effect); because the wild-type brown allele on the paired homologous chromosome is silenced as well, this makes it clear that heterochromatin can silence loci in trans [59]. It is not known whether genes dissociate from heterochromatin before their reactivation or as a consequence of renewed transcription [45].

As for chromosomes, one can ask whether the arrangement of gene loci relative to each other relates to their function. The clearest example of such functional spatial grouping is the previously mentioned organization of rDNA at nucleoli. Similar spatial association has been observed for tRNA genes in S. cerevisiae, whose loci congregate near the nucleolus [60]. Such grouping may arise because concentrating loci with similar requirements for transcriptional regulators facilitates their coordinated and efficient expression. This model is attractive, but there is limited experimental evidence for spatial grouping of genes transcribed by RNA polymerase II. One example is the β-globin-like gene Hbb-b1 and the gene encoding the α-hemoglobin-stabilizing protein, Eraf. These genes are separated by more than 20 megabases on the same chromosome, but they converge onto a shared transcription site upon their activation in erythroid progenitor cells [61]. How generally applicable this finding is, and whether it also applies to genes located on different chromosomes, remain to be seen.

New methods for getting the complete picture

It is increasingly clear that genes in neighborhoods are co-regulated. The broad correlation between gene activity and spatial positioning suggests that the spatial position of a gene in the nucleus is important for its function and regulation. To move towards a better understanding of how gene neighborhoods are regulated, we will need to map chromatin status and nuclear structure onto the genome, in addition to expression data [62]. Scaffold-attachment sites, origins of replication, and RNA polymerase binding sites will need to be mapped, in addition to histone modifications and transcription-factor binding sites. These efforts are underway.

To codify the rules that link positioning with genome function, systematic analysis of whole genomes must be extended to three dimensions. As a first step, the positions of all chromosomes must be analyzed simultaneously in cells whose expression profiles have been determined and for which the chromatin status of the whole genome has been carefully mapped. Systematic positional analysis of gene-expression neighborhoods and individual genes will be required, especially under various physiological conditions, such as differentiation, development and disease progression. Such visualization of the whole genome using multicolor microscopy methods has recently been accomplished and will provide an invaluable tool for comparing the precise higher-order arrangement of genomes.

Unfortunately, state-of-the-art spatial mapping of a single locus is highly labor-intensive and involves the acquisition and analysis of imaging data from several hundred cells. Clearly, spatial mapping of neighborhoods and genes will require the development of automated microscopy systems and image-analysis methods - a revolution in scale analogous to the development of the microarray. We have become accustomed to image-analysis packages that can find spots on a microarray, but these pattern-recognition methodologies are primitive compared with what will be required for the three-dimensional analysis of the genome. At present only simple spatial relationships such as pairing, clustering or association of a chromosome or a gene with a cellular structure can be visualized, and the more complex patterns involving multiple genes, each present in two alleles that are by definition indistinguishable, are currently not amenable to analysis.
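The allele problem can be illustrated for the simplest case of two genes in a diploid nucleus. Each gene gives two spots with identical signals, so which spot of gene A belongs with which spot of gene B is unknown. The sketch below applies one simple heuristic, chosen here as an assumption rather than an established standard: keep the one-to-one pairing with the smaller summed distance. The number of possible assignments grows quickly as more genes are added, which is part of why multigene patterns are hard to analyze.

```python
import math


def pair_alleles(a_alleles, b_alleles):
    """Distances between gene A and gene B after matching their alleles.

    Each argument holds the two 3D spot positions of one gene in a diploid
    nucleus. Because the two alleles give identical signals, which A spot
    belongs with which B spot is unknown; this heuristic keeps the one-to-one
    pairing with the smaller summed distance."""
    a1, a2 = a_alleles
    b1, b2 = b_alleles
    straight = [(a1, b1), (a2, b2)]
    crossed = [(a1, b2), (a2, b1)]

    def total_distance(pairs):
        return sum(math.dist(a, b) for a, b in pairs)

    best = min(straight, crossed, key=total_distance)
    return [math.dist(a, b) for a, b in best]


gene_a = [(0, 0, 0), (10, 0, 0)]
gene_b = [(9, 0, 0), (1, 0, 0)]
print(pair_alleles(gene_a, gene_b))
```

A minimum-distance rule can itself create apparent proximity, so any such assignment should be checked against randomized spot positions.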

Reciprocal mapping of expression, structure and position onto the genome sequence and the interphase nucleus will undoubtedly be complicated by biological realities. Genome expression and position will be cell-type specific, so the work performed by multiple research groups will need to be coordinated. Indeed, this is an important aspect of new efforts to map all kinds of DNA elements onto the genome [63]. More importantly, none of the positioning patterns is absolute; they are probabilistic, most likely reflecting the stochastic nature of genome expression programs. Gene expression may be probabilistic as well. The probabilistic nature of such events highlights the increasing need for statistical analysis. But none of the limitations is insurmountable. The advent of full genome sequences and the capacity to probe expression of whole genomes using microarray analysis, together with the ongoing development of fully automated imaging systems, has laid the foundation to map the genome in space and time. Although this is a colossal challenge, the promise of understanding how genomes are organized and function in their natural environment, the cell nucleus, is worth the persistent pursuit. It will be a fun ride.
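Because positioning is probabilistic, any positional measurement is a frequency estimated from a finite number of scored cells and should carry an uncertainty estimate. As one illustration, chosen here rather than taken from the text, the Wilson score interval gives a confidence interval for such a frequency; the counts below are hypothetical.

```python
import math


def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a proportion (z = 1.96 gives ~95%).

    Example use: the fraction of imaged cells in which a locus is scored at
    the nuclear periphery."""
    if n <= 0:
        raise ValueError("n must be positive")
    p = successes / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return center - half_width, center + half_width


# Hypothetical counts: locus at the periphery in 120 of 300 scored cells.
low, high = wilson_interval(120, 300)
print(f"{low:.3f}-{high:.3f}")
```

Comparing such intervals, or applying a formal two-proportion test, between cell types, for example cells that express and cells that silence a locus such as CFTR, is one way to judge whether an apparent shift in position exceeds cell-to-cell variability.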