Homologous recombination is the process whereby two DNA sequence substrates that share a significant stretch of identity are brought together, in an enzyme-catalyzed reaction, and undergo strand exchange to give a product that is a novel amalgamation of the two substrates. It occurs during meiosis, leading to crossovers between alleles (allelic homologous recombination, AHR), and during repair of double-strand breaks in DNA and other processes, leading to recombination between paralogous sequences (non-allelic homologous recombination, NAHR, also known as ectopic recombination). The intermediates of NAHR can be resolved to give several products, including deletions, duplications, and inversion rearrangements or, as in the case of AHR, the replacement of one sequence by a homologous one (gene conversion). When NAHR results in a duplication in one product it is usually accompanied by a reciprocal deletion in the other. Low-copy repeats that can induce NAHR account for 5-10% of the human genome [1], and rearrangements between them can result in a class of diseases known as genomic disorders [2, 3].
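The following Python sketch models a single crossover between misaligned direct repeats on two copies of a chromosome. It shows why an NAHR duplication is usually paired with a reciprocal deletion. It is a toy illustration with simplifying assumptions. Segments are opaque labels, and the names A, R1, B, R2, C and the function `nahr_products` are hypothetical. The repeats are assumed to be in direct orientation, and inversions and gene conversion are not modelled.

```python
from __future__ import annotations

# Toy model of NAHR between two directly oriented low-copy repeats.
# Segment names (A, R1, B, R2, C) are hypothetical placeholders.


def nahr_products(chrom: list[str], rep_left: str, rep_right: str) -> tuple[list[str], list[str]]:
    """Return the (deletion, duplication) products of one crossover between
    misaligned copies of the same chromosome.

    Copy 1's left repeat pairs with copy 2's right repeat; strand exchange
    inside the paired repeats yields hybrid repeats written "X/Y"
    (left part from repeat X, right part from repeat Y).
    """
    i = chrom.index(rep_left)
    j = chrom.index(rep_right)
    if i >= j:
        raise ValueError("rep_left must lie upstream of rep_right")
    # Copy 1 up to the exchange, then copy 2 from the exchange onward:
    # the segment between the repeats (and one repeat copy) is lost.
    deletion = chrom[:i] + [f"{rep_left}/{rep_right}"] + chrom[j + 1:]
    # Copy 2 up to the exchange, then copy 1 from the exchange onward:
    # the segment between the repeats is now present twice.
    duplication = chrom[:j] + [f"{rep_right}/{rep_left}"] + chrom[i + 1:]
    return deletion, duplication


deletion, duplication = nahr_products(["A", "R1", "B", "R2", "C"], "R1", "R2")
print(deletion)     # ['A', 'R1/R2', 'C']
print(duplication)  # ['A', 'R1', 'B', 'R2/R1', 'B', 'R2', 'C']
```

Together the two products contain every segment of the two input copies. This is why, in this simple model, a duplication in one product is matched by a deletion in the other.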

Finding hotspots

It might be thought that homologous recombination is driven only by shared sequence identity among substrates. If this were the case, strand exchange would be expected to occur with equal frequency all the way along a segment of homology. Experimental observations suggest, however, that this is not the case and have provided evidence for local 'hotspots' - short regions of the genome where strand exchanges are more common than elsewhere. These observations come from pedigree studies that examined the parent-to-offspring transmission of alleles, linkage disequilibrium (LD) studies and, more recently, direct DNA sequencing of the products of recombination using either sperm (which represent a large number of independent meiotic products from a single man) or junction fragments from ectopic recombination (NAHR) [4, 5]. These recombination hotspots are a common feature of both AHR and NAHR. Such hotspots have important implications for how linked genes and other markers are inherited in haplotypes (their amount of LD [4, 6-8]) and for studies of LD and haplotypes, including the International HapMap Project [9]. They may also matter for disease-association studies and for susceptibility to rearrangements causing genomic disorders in different world populations.
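A simple way to ask whether exchanges are spread evenly along a homologous segment is a chi-square goodness-of-fit test. It compares crossover counts in equal-length bins against a uniform expectation. The sketch below uses hypothetical counts and a hypothetical helper `chi_square_uniform`. It is not the analysis used in the cited pedigree, LD or sperm studies.

```python
from __future__ import annotations


def chi_square_uniform(counts: list[int]) -> float:
    """Pearson chi-square statistic for counts in equal-length bins
    against a uniform expectation (df = number of bins - 1)."""
    total = sum(counts)
    expected = total / len(counts)  # same expected count in every bin
    return sum((obs - expected) ** 2 / expected for obs in counts)


# Critical value of chi-square with 5 degrees of freedom at alpha = 0.05.
CRITICAL_DF5_ALPHA05 = 11.07

# Hypothetical crossovers in six equal-length bins spanning a region of homology.
counts = [2, 3, 25, 4, 3, 3]
stat = chi_square_uniform(counts)
print(round(stat, 1), stat > CRITICAL_DF5_ALPHA05)
```

Here one bin holds 25 of 40 exchanges, which drives the statistic far above the critical value. That is the signature of a hotspot rather than uniform exchange. The chi-square approximation assumes expected counts of roughly five or more per bin.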

The distribution of meiotic recombination events along chromosomes has been examined at several levels of resolution, from the megabase (Mb) scales of genetic mapping (1 Mb is approximately equal to 1 centiMorgan (cM) for average recombination rates) to the nucleotide levels of resolution afforded by sequencing of strand-exchange products. High-resolution examination, at the nucleotide sequence level, defines hotspots as localized sites of recombination and enables recombination hotspots to be examined for common features. The mechanism underlying the formation of recombination hotspots remains obscure, but recent studies suggest that a 'punctate' distribution of recombination events (in other words, a hotspot-like pattern of recombination) occurs throughout the human genome [6, 10]. Furthermore, the local positions of recombination hotspots may not be conserved among closely related primate species [11], and in some cases hotspots are characterized by signatures of concerted evolution [7], whereby duplicated sequences are more similar to one another than to their orthologs in a closely related species.
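The 1 Mb ≈ 1 cM rule of thumb can be turned into an expected number of crossovers, because 100 cM (1 Morgan) corresponds to one expected crossover per meiosis. The sketch assumes the genome-average rate stated above. The constant and function names are illustrative only.

```python
# Assumption: genome-average rate of roughly 1 cM per Mb, as quoted in the text.
AVERAGE_CM_PER_MB = 1.0


def expected_crossovers(length_mb: float, cm_per_mb: float = AVERAGE_CM_PER_MB) -> float:
    """Expected crossovers per meiosis (per transmitted chromosome) in an
    interval: genetic length in cM divided by 100 (1 Morgan = 1 crossover)."""
    return length_mb * cm_per_mb / 100.0


print(expected_crossovers(10.0))  # a 10 Mb interval
print(expected_crossovers(0.21))  # a 210 kb interval
```

At this average rate, a 10 Mb interval expects about 0.1 crossovers per meiosis and a 210 kb interval about 0.002. Because hotspots concentrate exchanges into short intervals, local rates can depart sharply from such averages.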

The distribution of AHR across the genome has been reviewed recently [4, 8]. Initial high-resolution analyses of human crossover hotspots used sperm DNA. They identified a 1.5 kb region adjacent to the MS32 minisatellite [12] and several 1-2 kb intervals containing hotspots across the 210 kb class II region of the major histocompatibility complex [13, 14]. Sperm analysis also detected a hotspot that had initially been inferred from the nonuniform distribution of recombination observed within the human β-globin gene cluster [15, 16]. These and other AHR hotspots share three features. First, they cluster within small regions (1-2 kb), with crossover breakpoints following a normal distribution within the narrow hotspot. Second, they have no obvious sequence similarities with one another. Third, they coincide with gene-conversion hotspots [4]. The location of AHR hotspots is not conserved between distantly related mammalian species (human and mouse) [4]. This is consistent with the absence of a shared primary sequence motif at hotspots.
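If breakpoints within a hotspot follow a normal distribution, the hotspot centre and width can be summarized by the mean and standard deviation of breakpoint positions. This minimal sketch uses invented positions. In practice each breakpoint is known only to lie between two informative markers, so using interval midpoints, as assumed here, is a simplification. `hotspot_summary` is a hypothetical helper.

```python
from __future__ import annotations

import statistics

# Hypothetical breakpoint positions (bp from the start of a sequenced interval),
# e.g. midpoints between the last marker matching one haplotype and the first
# marker matching the other.
breakpoints = [900, 950, 1000, 1000, 1050, 1100]


def hotspot_summary(positions: list[float]) -> tuple[float, float, tuple[float, float]]:
    """Treat breakpoints as draws from a normal distribution and return the
    estimated centre, sample standard deviation, and central ~95% interval."""
    centre = statistics.fmean(positions)
    sd = statistics.stdev(positions)
    # About 95% of a normal distribution lies within 1.96 SD of the mean.
    return centre, sd, (centre - 1.96 * sd, centre + 1.96 * sd)


centre, sd, (low, high) = hotspot_summary(breakpoints)
print(f"centre {centre:.0f} bp, SD {sd:.1f} bp, ~95% of exchanges within {high - low:.0f} bp")
```

With these made-up positions the centre is at 1,000 bp and the central ~95% interval is under 300 bp wide. That width falls within the 1-2 kb scale of the hotspots described above.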

Jeffreys and colleagues [4] have pointed out that the punctate distribution of human recombination hotspots is very similar to that of meiotic double-strand breaks in budding yeast [17]; the latter are sequence-nonspecific and occur at yeast recombination hotspots [18, 19], suggesting that hotspots could reflect where recombination is initiated by double-strand breaks. Furthermore, the observation that a recombination reporter placed in different positions in the yeast genome acquires properties of its location is argued [4] to support a model in which higher-order chromatin structures and/or chromosome dynamics contribute to the control of the local frequency of recombination-initiation events.

Hotspots have also been observed in association with NAHR (reviewed in [5]). Such recombination events can be readily ascertained because the resulting rearrangement (deletion or duplication) conveys a phenotype or produces a genomic disorder. Also, because NAHR uses paralogous rather than allelic homologous sequences, paralogous sequence variations (also known as cis-morphisms [3]) can be used to map crossover sites precisely. NAHR hotspots were first observed, in diverse populations, for the recombination events underlying the duplication and deletion rearrangements responsible for two common dominant peripheral neuropathies [5, 20-23]. DNA structures that have been shown to induce double-strand breaks (such as palindromes, minisatellites and DNA transposons) have often been reported near NAHR hotspots (reviewed in [5, 23]). Sequence analyses of the NAHR hotspots [21, 22] revealed proximity to some of these structures, suggesting a link between double-strand breaks and NAHR hotspots [24]. Hotspots were subsequently observed in all NAHR crossovers examined at the nucleotide sequence level [25-28]. As with AHR hotspots, NAHR hotspots cluster within small regions (under 1 kb). They also show no obvious sequence similarities with one another and coincide with apparent gene-conversion events. Interestingly, the hotspots associated with reciprocal deletion and duplication events coincide. The hotspot position for either the deletion or the duplication can therefore be used to predict the hotspot position for the reciprocal event [20, 26].
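Mapping a crossover with paralogous sequence variations amounts to finding where a junction fragment stops matching one repeat copy and starts matching the other. The function below is a hypothetical sketch of that logic. It assumes the PSV bases of both copies are already known from an alignment. It also assumes the junction reads from the proximal copy into the distal copy, as in a deletion product.

```python
from __future__ import annotations


def map_crossover(psv_positions: list[int], proximal: str, distal: str, product: str) -> tuple[int, int]:
    """Locate the strand-exchange interval in an NAHR junction fragment.

    psv_positions: sorted coordinates (in the aligned repeat) of PSVs;
    proximal / distal: base carried by each repeat copy at each PSV;
    product: bases observed in the rearranged junction at the same PSVs.
    Returns (last PSV matching proximal, first PSV matching distal).
    """
    if not len(psv_positions) == len(proximal) == len(distal) == len(product):
        raise ValueError("inputs must have one entry per PSV")
    origin = []
    for pos, p, d, obs in zip(psv_positions, proximal, distal, product):
        if p == d:
            raise ValueError(f"position {pos} is not a PSV (copies agree)")
        if obs == p:
            origin.append("P")
        elif obs == d:
            origin.append("D")
        else:
            raise ValueError(f"base {obs!r} at {pos} matches neither copy")
    if not origin or origin[0] != "P" or "D" not in origin:
        raise ValueError("junction does not switch from proximal to distal copy")
    switch = origin.index("D")
    if "P" in origin[switch:]:
        # Further switches (P, D, P, ...) may reflect a gene-conversion tract.
        raise ValueError("more than one switch between paralogs")
    return psv_positions[switch - 1], psv_positions[switch]


# Hypothetical PSVs in a deletion junction fragment.
print(map_crossover([120, 340, 610, 880, 1150], "ACGTA", "GTACG", "ACGCG"))
```

Here the first three PSVs match the proximal copy and the last two match the distal copy, so the exchange lies between positions 610 and 880. Rejecting patterns with more than one switch is a deliberate simplification. Real junctions with such patterns may carry an associated gene-conversion tract.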

Studying hotspot distribution systematically

The fine-scale structure of recombination-rate variation throughout the human genome was reported recently [6, 10]. Both studies used surveys of single-nucleotide polymorphisms (SNPs) in different populations, and both developed novel statistical methods to infer patterns of fine-scale variation in the recombination rate along the genome. One study [10] focused on a 10 Mb region of chromosome 20 in European (Caucasian) and African-American populations, whereas the other [6] examined 74 candidate genes to search for hotspots by resequencing DNA from 23 European-Americans and 24 African-Americans. Both studies [6, 10] found evidence for recombination-rate variation, with hotspots occurring at least every 200 kb and potentially as frequently as every 50 kb, the latter value being the same as has been observed in yeast [29]. No single factor was consistently associated with the presence of hotspots - neither GC content, the frequency of CpG dinucleotides, the presence of (AC)n repeats, nor any primary DNA sequence motif that had previously been hypothesized to influence the existence of hotspots. Whereas one fine-scale study [6] found extensive recombination-rate variations both within and between genes, the other [10] suggested that recombination occurs preferentially outside genes. The degree to which SNPs residing within segmental duplications (paralogous sequence variations or cis-morphisms [3, 30-32]) influence the interpretation of these analyses remains to be determined.
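The cited studies inferred fine-scale rate maps from SNP data with dedicated statistical methods. The sketch below covers only a much simpler later step: calling hotspots from an already-estimated rate map and measuring their spacing. The 10 kb windows, the five-fold-above-median threshold, the helper names and all rates are arbitrary assumptions. None of them are parameters used in [6, 10].

```python
from __future__ import annotations

import statistics


def call_hotspots(window_starts_kb: list[int], rates: list[float], fold: float = 5.0) -> list[int]:
    """Return the start (kb) of each run of consecutive windows whose rate
    exceeds `fold` times the median (taken as background) rate."""
    background = statistics.median(rates)
    hotspots = []
    in_hotspot = False
    for start, rate in zip(window_starts_kb, rates):
        if rate > fold * background:
            if not in_hotspot:  # first window of a new hotspot
                hotspots.append(start)
            in_hotspot = True
        else:
            in_hotspot = False
    return hotspots


def mean_spacing_kb(hotspot_starts: list[int]) -> float:
    """Mean distance between consecutive hotspot starts."""
    gaps = [b - a for a, b in zip(hotspot_starts, hotspot_starts[1:])]
    return statistics.fmean(gaps) if gaps else float("inf")


# Hypothetical rate map: 30 windows of 10 kb, background 0.5 cM/Mb.
starts = list(range(0, 300, 10))
rates = [0.5] * 30
rates[5] = rates[6] = 10.0  # one hotspot spanning two adjacent windows
rates[12] = 8.0
rates[25] = 12.0
hot = call_hotspots(starts, rates)
print(hot, mean_spacing_kb(hot))
```

On this invented map three hotspots are called, spaced 100 kb apart on average. That spacing falls within the 50-200 kb range reported above.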

Both studies [6, 10] provided some evidence for differences in recombination-rate variation among different populations, but to what extent this reflects differences in the genetic background of the populations is not clear. The absence in the chimpanzee of a hotspot in the region homologous to the human recombination hotspot in the major histocompatibility complex TAP2 gene suggests that recombination rates can change between very closely related species and raises the possibility that recombination rates may differ among human populations [11].

What is the origin of recombination hotspots in the human genome? One recent study [7] of NAHR between two paralogous sequences that mediate deletions causing male infertility - human endogenous retrovirus (HERV) proviral sequences flanking the Y-chromosome locus Azoospermia factor a (AZFa) - provided evidence that several hominid-specific gene-conversion events have rendered the associated hotspots better substrates for chromosomal rearrangements in humans than in chimpanzees or gorillas. But, as the authors state [7], because gene conversion and chromosomal rearrangement reflect the alternative products of a common intermediate, it may be that a recombinogenic sequence motif or structure underpins the association, and increased sequence identity may play only a minor role in determining the frequency of chromosomal rearrangement. Nevertheless, the coincidence of the signatures of concerted evolution and recurrent breakpoints of chromosomal rearrangements (mapped at the DNA sequence level) may enable the identification of putative rearrangement hotspots from analysis of comparative sequences from great apes.
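The signature of concerted evolution can be expressed as a comparison of sequence identities. The two human paralogs are more similar to each other than each is to its ortholog in a related species. The sketch assumes the sequences are already aligned (gaps as '-') and uses invented toy sequences. It treats a plain identity comparison as the test. A real analysis would require much longer alignments and a proper evolutionary model. The function names are hypothetical.

```python
def percent_identity(seq1: str, seq2: str) -> float:
    """Percent identical bases over alignment columns where neither sequence has a gap."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be pre-aligned to the same length")
    columns = [(a, b) for a, b in zip(seq1.upper(), seq2.upper()) if a != "-" and b != "-"]
    if not columns:
        raise ValueError("no gap-free columns to compare")
    matches = sum(a == b for a, b in columns)
    return 100.0 * matches / len(columns)


def concerted_evolution_signal(human_a: str, human_b: str, chimp_a: str, chimp_b: str) -> bool:
    """True if the human paralogs are more similar to each other than
    either is to its own chimpanzee ortholog."""
    between_paralogs = percent_identity(human_a, human_b)
    return (between_paralogs > percent_identity(human_a, chimp_a)
            and between_paralogs > percent_identity(human_b, chimp_b))


# Invented toy alignment; real repeats span kilobases.
human_a = "ACGTACGTAC"
human_b = "ACGTACGTAT"
chimp_a = "ACGAACGTTC"
chimp_b = "ACCTACGAAT"
print(concerted_evolution_signal(human_a, human_b, chimp_a, chimp_b))
```

In this toy case the paralogs share 90% identity, while each shares only 80% with its ortholog. That is the pattern the text describes as a signature of concerted evolution.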

What causes hotspots?

What is the signal for recombination hotspots in the human genome? Does it reflect only where the recombination machinery preferentially makes double-strand breaks? If so, is this preference dictated by access to the DNA through a particular chromatin structure, or is the signal contained within the DNA itself? The signal is unlikely to be a cis-acting primary sequence motif similar to the chi site of Escherichia coli, which stimulates recombination [33]. No such common motif has been identified in the many AHR [4] and NAHR [5, 28] hotspots studied to date. In addition, the position of hotspots does not appear to be conserved between closely related primate species (at least for the TAP2 hotspot) [11]. Such a signal could, however, be embedded in a non-B DNA conformation (such as Z DNA) [34]. Alternatively, it could reflect an epigenetic mark in the hotspot region, such as methylation or its absence.
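Testing for a shared primary sequence motif, such as the E. coli chi site (5'-GCTGGTGG-3'), comes down to scanning both strands of each hotspot sequence. The sketch below is a minimal illustration on an invented sequence, and the helper names are hypothetical. A motif that is its own reverse complement would be reported once per strand.

```python
from __future__ import annotations

COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")
CHI = "GCTGGTGG"  # E. coli chi site, written 5' to 3'


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def find_motif(seq: str, motif: str) -> list[tuple[int, str]]:
    """0-based start positions of `motif` on the forward (+) strand and of its
    reverse complement (the motif on the reverse (-) strand), in forward-strand
    coordinates. Overlapping occurrences are reported."""
    seq, motif = seq.upper(), motif.upper()
    hits = []
    for strand, pattern in (("+", motif), ("-", reverse_complement(motif))):
        start = seq.find(pattern)
        while start != -1:
            hits.append((start, strand))
            start = seq.find(pattern, start + 1)
    return sorted(hits)


# Invented sequence carrying one chi site on each strand.
print(find_motif("TTGCTGGTGGAACCACCAGCTT", CHI))
```

Running such a scan over many hotspot sequences and finding no shared motif is the kind of negative result summarized above. A non-B DNA conformation or an epigenetic mark would not be detectable this way.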

Recombination hotspots are being revealed as a global feature of the human genome [6, 10]. Such hotspots have implications for studies of LD [6-8], the International HapMap Project [9], and for disease association studies in different world populations, because meiotic recombination exerts a profound influence on genome diversity and evolution [4]. They may also contribute to susceptibility, within a population, to NAHR-induced rearrangements associated with genomic disorders. Thus, functional studies to delineate the precise molecular mechanisms responsible for hotspots in the human genome are essential and are likely to enable further insights into the most basic properties of homologous recombination.