Abstract
Pedigrees are directed acyclic graphs that represent ancestral relationships between individuals in a population. Based on a schematic recombination process, we describe two simple Markov models for sequences evolving on pedigrees—Model R (recombinations without mutations) and Model RM (recombinations with mutations). For these models, we ask an identifiability question: is it possible to construct a pedigree from the joint probability distribution of extant sequences? We present partial identifiability results for general pedigrees: we show that when the crossover probabilities are sufficiently small, certain spanning subgraph sequences can be counted from the joint distribution of extant sequences. We demonstrate how pedigrees that earlier seemed difficult to distinguish are distinguished by counting their spanning subgraph sequences.
Abbreviations
- (ΣXL : P, RM(p, μ)):
-
The space of alignments on the extant vertices of P as a probability space under model RM(p, μ); other probability spaces are denoted analogously
- [a, b], [a, b), . . .:
-
Intervals in integers and reals
- [m]:
-
The set {1, 2, . . . , m}
- ≅:
-
Isomorphism between pedigrees, X-forests, graphs, etc.
- \({{||\mathcal{T}_P||, ||\mathcal{U}_P||}}\) :
-
The sets of isomorphism classes of directed and undirected X-forests (or distinct directed or undirected X-forests) in a pedigree P, respectively
- ||S||:
-
If S is a class of objects, then it is the set of isomorphism classes of objects in the class
- ||T||:
-
The isomorphism class of an X-forest T
- \({f(A) := (f_1, f_2, \ldots, f_N)}\) :
-
Vector of fractional site pattern frequencies in an alignment A
- \({G := (G_1, G_2, \ldots, G_m)}\) :
-
A spanning forest sequence of length m
- \({p(T, \mu) := (p_1, p_2, \ldots, p_N)}\) :
-
Defined as \({p_i := \texttt{Pr}\{C_i | T,M(\mu) \}}\)
- \({T := (T_1, T_2, \ldots, T_m)}\) :
-
An X-forest sequence of length m
- \({\mathcal{A} := \mathcal{A}_{1}:\mathcal{A}_{2}:\ldots:\mathcal{A}_m}\) :
-
A set of alignments obtained by concatenating alignments from sets \({\mathcal{A}_{i}, i = 1, 2, \ldots, m }\)
- \({\mathcal{A}(T_i, r_0, L)}\) :
-
The set of alignments of length L whose fractional site pattern frequencies are within a radius \({r_0}\) of \({p(T_i)}\)
- \({\mathcal{A}, \mathcal{A}_{i}}\) :
-
Subsets of an alignment space such as ΣXL
- \({\mathcal{G}_P}\) :
-
The set of spanning forests in a pedigree P
- \({\mathcal{T}_P, \mathcal{U}_P}\) :
-
The sets of directed and undirected X-forests in a pedigree P, respectively
- μ :
-
The substitution probability in models RM(p, μ) and M(μ)
- \({\mathbb{N}}\) :
-
The set of natural numbers
- \({\texttt{Pr}\{.\}}\) :
-
The probability of an event
- ρ(s, r):
-
A ball of radius r centred at s
- Σ:
-
Finite alphabet
- ΣX :
-
The set of site patterns on X
- ΣXL :
-
The set of alignments of length L on X
- \({\mathbb{Z}}\) :
-
The set of integers
- \({\mathbb{Z}_+}\) :
-
The set of positive integers
- \({A, A_i, \ldots}\) :
-
Alignments on X, i.e., maps from X to \({\Sigma^L}\) or elements of ΣXL
- \({A := A_1 : A_2 : \ldots : A_m}\) :
-
An alignment obtained by concatenating alignments \({A_i, i = 1, 2, \ldots, m}\)
- \({C, C_i, \ldots}\) :
-
Characters on X, i.e., maps C : X → Σ
- d(x, y):
-
1-Norm distance between x and y
- \({d^-(u), d^+(u), d(u)}\) :
-
In-degree, out-degree, degree (or total degree) of a vertex u
- G ≅ H :
-
G and H are isomorphic
- G ≤ H or H ≥ G :
-
When used for graphs (or isomorphism classes of graphs) G and H, it means G is isomorphic to a subgraph of H
- \({G \subseteq H}\) or \({H \supseteq G}\) :
-
When used for labelled graphs G and H, it means G is a subgraph of H
- \({G, G_i, \ldots}\) :
-
Spanning forests in a pedigree
- n(G > T : P):
-
Number of sequences G of spanning forests in P for which \({T_u(G_i) \cong T_i}\) for all \({G_i}\) in G, and consecutive \({G_i}\) are separated by exactly 1 recombination
- p :
-
The crossover probability in models R(p) and RM(p, μ)
- P, Q, P(X, Y, U, E), . . .:
-
Pedigrees
- r(G):
-
The number of recombinations in G, see Definition 7
- s(G):
-
The number of points of no recombination in G, see Definition 7
- \({S^k}\) :
-
The set of k-tuples of elements of a set S
- \({S^X}\) :
-
The set of all functions from X to S
- \({T, T_i, \ldots}\) :
-
X-forests in a pedigree or X-forests
- \({T_d(G)}\) :
-
The unique directed X-forest in a spanning forest G in a pedigree
- \({T_u(G)}\) :
-
The unique undirected X-forest in a spanning forest G in a pedigree
- u ≤ v :
-
(for vertices u and v in a pedigree) There is a directed path from v to u
- V(G), E(G):
-
Vertex and edge sets of a graph, respectively
- v(G), e(G):
-
Cardinalities of vertex and edge sets of a graph, respectively
- X :
-
The set of extant vertices of a pedigree
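Several of the entries above (f(A), d(x, y), the concatenation A := A 1 : A 2 : . . . : A m, and the set of alignments whose frequencies lie within a radius of a given vector) can be illustrated with a small Python sketch. It is not taken from the article. The function names, the representation of an alignment as a dictionary from extant vertices to strings, the lexicographic ordering of site patterns, and the use of a closed ball (distance at most the radius) are all assumptions made only for illustration.

```python
from collections import Counter
from itertools import product


def site_pattern_frequencies(alignment, taxa, alphabet):
    """Return f(A): the fraction of sites showing each site pattern.

    Assumption: site patterns are ordered lexicographically over `alphabet`,
    with taxa taken in the order given by `taxa`.
    """
    length = len(alignment[taxa[0]])
    if length == 0:
        raise ValueError("alignment must have at least one site")
    # Column j of the alignment is the site pattern (A[x][j] for x in taxa).
    counts = Counter(tuple(alignment[x][j] for x in taxa) for j in range(length))
    patterns = product(alphabet, repeat=len(taxa))
    return [counts[c] / length for c in patterns]


def one_norm_distance(x, y):
    """Return d(x, y), the 1-norm distance between two equal-length vectors."""
    return sum(abs(a - b) for a, b in zip(x, y))


def concatenate(alignments, taxa):
    """Return A_1 : A_2 : ... : A_m, joining the sequences taxon by taxon."""
    return {x: "".join(a[x] for a in alignments) for x in taxa}


def within_radius(alignment, taxa, alphabet, centre, radius):
    """Check whether f(A) lies within `radius` of `centre` in 1-norm.

    Assumption: the ball is closed (distance <= radius).
    """
    f = site_pattern_frequencies(alignment, taxa, alphabet)
    return one_norm_distance(f, centre) <= radius


# Hypothetical toy data: two extant vertices over the alphabet {0, 1}.
taxa = ["x1", "x2"]
a1 = {"x1": "00", "x2": "01"}
a2 = {"x1": "11", "x2": "01"}
a = concatenate([a1, a2], taxa)
f = site_pattern_frequencies(a, taxa, "01")
```

Here `a` is the alignment of length 4 obtained by concatenating `a1` and `a2`, and `f` lists the frequencies of the patterns 00, 01, 10, 11 in that order. A call such as `within_radius(a, taxa, "01", centre, r0)` then tests membership in a set of the form \({\mathcal{A}(T_i, r_0, L)}\), provided `centre` holds the pattern probabilities for the X-forest in the same pattern order.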
Cite this article
Thatte, B.D. Reconstructing pedigrees: some identifiability questions for a recombination-mutation model. J. Math. Biol. 66, 37–74 (2013). https://doi.org/10.1007/s00285-011-0503-8
DOI: https://doi.org/10.1007/s00285-011-0503-8