Abstract
Cre-lox and other systems are used as genetic tools to control site-specific recombination (SSR) events in genomic DNA. If multiple recombination sites are organized in a compact cluster within the same genome, a series of random recombination events may generate substantial cell-specific genomic diversity. This diversity is used, for example, in the brainbow approach to the neuronal connectome, to distinguish neurons in the brain of the same multicellular mosaic organism. In this paper, we study an exactly solvable statistical model for SSR operating on a cluster of recombination sites. We consider two types of recombination events: inversions and excisions. Both types of events are available in the Cre-lox system. We derive three properties of the sequences generated by multiple recombination events. First, we describe the set of sequences that can in principle be generated by multiple inversions operating on a given initial sequence. We call this description the ergodicity theorem. On the basis of this description, we calculate the number of sequences that can be generated from an initial sequence. This number is experimentally testable. Second, we demonstrate that after a large number of random inversions, every sequence that can be generated is generated with equal probability. Lastly, we derive the equations for the probability of finding a given sequence as a function of time in the limit in which excisions are much less frequent than inversions, as in shufflon sequences.
References
Cao, G., Oyibo, H. H., Zhan, H., Znamenskiy, P., Koulakov, A., Enquist, L., Dubnau, J., & Zador, A. (2011). Neural connectivity as a DNA sequencing problem in vitro. In Society for neuroscience annual meeting (p. 840.11/ZZ63).
Hampel, S., et al. (2011). Drosophila brainbow: a recombinase-based fluorescence labeling technique to subdivide neural expression patterns. Nat. Methods, 8, 253.
Horn, R. A., & Johnson, C. R. (1994). Topics in matrix analysis (1st paperback ed., pp. viii, 607). Cambridge: Cambridge University Press.
Komano, T. (1999). Shufflons: multiple inversion systems and integrons. Annu. Rev. Genet., 33, 171.
Lichtman, J. W., Livet, J., & Sanes, J. R. (2008). A technicolour approach to the connectome. Nat. Rev. Neurosci., 9, 417.
Livet, J., et al. (2007). Transgenic strategies for combinatorial expression of fluorescent proteins in the nervous system. Nature, 450, 56.
Lu, R., Neff, N. F., Quake, S. R., & Weissman, I. L. (2011). Tracking single hematopoietic stem cells in vivo using high-throughput sequencing in conjunction with viral genetic barcoding. Nat. Biotechnol., 29, 928.
Nagy, A. (2000). Cre recombinase: the universal reagent for genome tailoring. Genesis, 26, 99.
Norris, J. R. (1997). Markov chains. Cambridge series in statistical and probabilistic mathematics (pp. xvi, 237). Cambridge: Cambridge University Press.
Oyibo, H. H., Cao, G., Zhan, H., Znamenskiy, P. C., Koulakov, A., Enquist, L., Dubnau, J., & Zador, A. (2011). Neural connectivity as a DNA sequencing problem in vivo. In Society for neuroscience annual meeting (p. 617.25/XX57).
Sauer, B. (1987). Functional expression of the cre-lox site-specific recombination system in the yeast Saccharomyces cerevisiae. Mol. Cell. Biol., 7, 2087.
Sauer, B., & Henderson, N. (1988). Site-specific DNA recombination in mammalian cells by the cre recombinase of bacteriophage P1. Proc. Natl. Acad. Sci. USA, 85, 5166.
Van Duyne, G. D. (2001). A structural view of cre-loxp site-specific recombination. Annu. Rev. Biophys. Biomol. Struct., 30, 87.
Acknowledgement
The authors thank Tony Zador for suggesting this problem to us and for multiple valuable comments. The authors thank Dawen Cai, Jeff Lichtman, and Teruya Komano for helpful communications. This work was supported by NIH R01EY018068 and R01MH092928 and the Swartz Foundation. A.K. acknowledges the hospitality of the Aspen Center for Physics, which is supported in part by NSF Grant No. PHY-1066293.
Appendix: Constrained Inversions
In this section, we generalize the results obtained in Sects. 2 and 3 to systems in which inversions can happen between inverted SSR-sites (RL), but not between matching SSR-sites (LR) (Komano, personal communication). We call this type of recombination, in which only one type of inversion is possible, the case of constrained inversions. As before, we assume that the DNA sequence starts with an R and ends with an L SSR-site. Here, we show that most of the results obtained in the present study also apply in the case of constrained inversions. However, some sequences cannot be obtained because of the constraint, as detailed below.
Before we present our results, we illustrate the effects of the constraint with a simple example (Fig. 15). This sequence has M=0 LL or RR sites, and N=3 LR or RL sites. Equation (2) yields \(Z_{M,N} = 2^{N}M! [\frac{N + 1}{2}]! [\frac{N}{2}]!d_{(M,N)} = 16\) sequences. Here, as above, [x] means the largest integer smaller than or equal to x. However, as illustrated in Fig. 15, only 8 of these sequences can be reached using the constrained inversions. We show here that this result is general, i.e., with constrained inversions, the number of possible sequences is always equal to one-half of that for the case in which all inversions are possible:
\[Z^{\mathrm{c}}_{M,N} = \frac{1}{2}Z_{M,N} = 2^{N - 1}M! [\frac{N + 1}{2}]! [\frac{N}{2}]!d_{(M,N)} \tag{15}\]
Here, \(Z^{\mathrm{c}}_{M,N}\) denotes the number of sequences reachable with constrained inversions, and \(d_{(M,N)}\) is given in Eq. (1).
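As a quick numerical check of the counting above, the following minimal Python sketch evaluates Eq. (2) and its halved, constrained counterpart. The function name `count_sequences` is hypothetical, and the factor \(d_{(M,N)}\) is passed in as an argument because Eq. (1) is not reproduced in this appendix; the value d=1 for M=0, N=3 is an assumption inferred from the quoted total of 16.

```python
from math import factorial


def count_sequences(M, N, d, constrained=False):
    """Evaluate Z_{M,N} = 2^N * M! * [(N+1)/2]! * [N/2]! * d_(M,N).

    d is the value of d_(M,N) from Eq. (1), supplied by the caller.
    If constrained is True, return one-half of Z_{M,N} (assumes N >= 1,
    so that Z_{M,N} is even).
    """
    z = 2**N * factorial(M) * factorial((N + 1) // 2) * factorial(N // 2) * d
    return z // 2 if constrained else z


# Example from the text: M=0, N=3, with d assumed equal to 1.
print(count_sequences(0, 3, d=1))                    # 16
print(count_sequences(0, 3, d=1, constrained=True))  # 8
```

Integer division `//` implements the bracket [x] for the non-negative arguments used here.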
Below we sketch the proof of Eq. (15). Consider a DNA sequence that includes two LR units. It can be written as >A<B>C<D>E<. Here, <B> and <D> are LR units, while the other units can themselves contain arbitrary combinations of units. <B> and <D> cannot be inverted individually without affecting the rest of the sequence. It is easy to check, by enumerating all possible inversions, that the impossible combinations satisfy a simple constraint. Let t be the number of LR units that are reverse-complemented with respect to their initial orientation. Thus, for the sequence >A<D′>C<B>E< (<D′> denotes the reverse-complement of <D>), t=1. Let s be the number of exchanges in the <B> and <D> pair. For the sequence >A<D′>C<B>E<, s=1, while for >A<B′>C<D′>E<, s=0. One can check that only the sequences for which s+t is even can be obtained from the initial sequence >A<B>C<D>E<; the sequences for which s+t is odd cannot be reached through constrained inversions. This holds for a fixed remainder of the sequence, i.e., for sequences of the form >A<∗>C<∗>E<, where ‘∗’ denotes B, D, or one of their reverse-complements. We therefore call the number χ=s+t the index of the sequence. One can obtain >A<∗>C<∗>E< from >A<B>C<D>E< if χ(>A<∗>C<∗>E<) is even.
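The index bookkeeping for the two-unit case can be tabulated with a short Python sketch. It only applies the parity rule stated above to each arrangement of <B> and <D>; it does not simulate the inversions themselves. The names `index_chi`, `configs`, and `reachable` are hypothetical, as is the encoding, in which `flips[i]` records whether the unit at position i is reverse-complemented.

```python
from itertools import product


def index_chi(order, flips):
    """Return chi = s + t for one arrangement of the two LR units.

    order: ("B", "D") for the original order, ("D", "B") if exchanged.
    flips: pair of booleans, True where the unit at that position is
           reverse-complemented relative to its initial orientation.
    """
    s = 0 if order == ("B", "D") else 1  # number of exchanges in the pair
    t = sum(flips)                       # number of reverse-complemented units
    return s + t


# All 2 orders x 4 orientation patterns for a fixed remainder >A<*>C<*>E<.
configs = [(order, flips)
           for order in (("B", "D"), ("D", "B"))
           for flips in product((False, True), repeat=2)]

# Arrangements with an even index, i.e. those the text states are obtainable.
reachable = [c for c in configs if index_chi(*c) % 2 == 0]

for order, flips in reachable:
    print(order, flips, index_chi(order, flips))
```

Under this encoding, >A<D′>C<B>E< corresponds to `(("D", "B"), (True, False))` with χ=2, and four of the eight arrangements have an even index.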
Let us now consider a sequence with more than two LR elements, >A<B>C<D>E<⋯>Z<. An arrangement of the LR units can be obtained by a permutation P of the original sequence. Permutations form a group of transformations called the symmetric group. Every permutation can be written as a product of several neighboring exchanges. The permutation is called even or odd if it can be written as a product of an even or odd number of neighboring exchanges, respectively. Even and odd permutations are assigned index s equal to 0 and 1, respectively. Although there are several ways to implement P as a product of neighboring exchanges, they all have the same index s. The number t of reverse-complemented LR elements can be defined as above, as can the index χ=s+t. We showed above that a possible exchange does not change the parity of the index χ. Thus, the impossible configurations are those for which χ is odd, because the original configuration has an even index. Because for unconstrained inversions both even and odd χ are possible for the same fixed residual sequence, the number of configurations is reduced by a factor of 2 in the case of constrained inversions. Therefore, Eq. (15) describes the number of configurations in the constrained case. This describes the modifications to Theorem 2.
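For more than two LR units, the parity argument can be illustrated with the following Python sketch. It computes s from the number of inversions of the permutation (whose parity equals that of any decomposition into neighboring exchanges) and counts how many arrangements have even or odd χ. This is bookkeeping over all formal arrangements of n units for a fixed remainder, not a simulation of recombination; the function names and the encoding of P as a tuple of unit indices are assumptions made for illustration.

```python
from itertools import permutations, product


def permutation_parity(perm):
    """Return s: 0 for an even permutation, 1 for an odd one.

    perm is a tuple such as (1, 0, 2) giving the original index of the
    unit found at each position. The parity of the inversion count equals
    the parity of the number of neighboring exchanges.
    """
    n = len(perm)
    inversions = sum(1 for i in range(n) for j in range(i + 1, n)
                     if perm[i] > perm[j])
    return inversions % 2


def has_even_index(perm, flips):
    """True if chi = s + t is even for this arrangement."""
    return (permutation_parity(perm) + sum(flips)) % 2 == 0


def count_even_odd(n):
    """Count arrangements of n LR units with even and with odd chi."""
    even = odd = 0
    for perm in permutations(range(n)):
        for flips in product((0, 1), repeat=n):
            if has_even_index(perm, flips):
                even += 1
            else:
                odd += 1
    return even, odd


print(count_even_odd(3))  # (24, 24)
```

For every n ≥ 1 the two counts coincide, which is the factor of 2 that enters the constrained count.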
It is possible to show that in the case of constrained inversions, Properties 1–4 and Theorem 1 still hold. In the proof of Theorem 1, the first step remains the same while in the second step, we use the operation I=(c,e)(a,c) shown in Fig. 16.
Theorem 3 holds as before; therefore, all possible DNA sequences counted in Eq. (15) are still observed with equal probability.
Cite this article
Wei, Y., Koulakov, A.A. An Exactly Solvable Model of Random Site-Specific Recombinations. Bull Math Biol 74, 2897–2916 (2012). https://doi.org/10.1007/s11538-012-9788-z
DOI: https://doi.org/10.1007/s11538-012-9788-z