Reconstructing pedigrees: some identifiability questions for a recombination-mutation model

Journal of Mathematical Biology

Abstract

Pedigrees are directed acyclic graphs that represent ancestral relationships between individuals in a population. Based on a schematic recombination process, we describe two simple Markov models for sequences evolving on pedigrees—Model R (recombinations without mutations) and Model RM (recombinations with mutations). For these models, we ask an identifiability question: is it possible to construct a pedigree from the joint probability distribution of extant sequences? We present partial identifiability results for general pedigrees: we show that when the crossover probabilities are sufficiently small, certain spanning subgraph sequences can be counted from the joint distribution of extant sequences. We demonstrate how pedigrees that earlier seemed difficult to distinguish are distinguished by counting their spanning subgraph sequences.

Abbreviations

(ΣXL : P, RM(p, μ)):

The space of alignments on the extant vertices of P as a probability space under model RM(p, μ); other probability spaces are denoted analogously

[a, b], [a, b), . . .:

Intervals in integers and reals

[m]:

The set {1, 2, . . . , m}

≅:

Isomorphism between pedigrees, X-forests, graphs, etc.

\({{||\mathcal{T}_P||, ||\mathcal{U}_P||}}\) :

The sets of isomorphism classes of directed and undirected X-forests (or distinct directed or undirected X-forests) in a pedigree P, respectively

||S||:

When S is a class of objects, the set of isomorphism classes of objects in S

||T||:

The isomorphism class of an X-forest T

\({f(A) := (f_1, f_2, \ldots, f_N)}\) :

Vector of fractional site pattern frequencies in an alignment A

\({G := (G_1, G_2, \ldots, G_m)}\) :

A spanning forest sequence of length m

\({p(T, \mu) := (p_1, p_2, \ldots, p_N)}\) :

Defined as \({p_i := \texttt{Pr}\{C_i | T,M(\mu) \}}\)

\({T := (T_1, T_2, \ldots, T_m)}\) :

An X-forest sequence of length m

\({\mathcal{A} := \mathcal{A}_{1}:\mathcal{A}_{2}:\ldots:\mathcal{A}_m}\) :

A set of alignments obtained by concatenating alignments from sets \({\mathcal{A}_{i}, i = 1, 2, \ldots, m }\)

\({\mathcal{A}(T_i, r_0, L)}\) :

The set of alignments of length L whose fractional site pattern frequencies are within a radius \({r_0}\) from \({p(T_i)}\)

\({\mathcal{A}, \mathcal{A}_{i}}\) :

Subsets of an alignment space such as ΣXL

\({\mathcal{G}_P}\) :

The set of spanning forests in a pedigree P

\({\mathcal{T}_P, \mathcal{U}_P}\) :

The sets of directed and undirected X-forests in a pedigree P, respectively

μ :

The substitution probability in models RM(p, μ) and M(μ)

\({\mathbb{N}}\) :

The set of natural numbers

\({\texttt{Pr}\{.\}}\) :

The probability of an event

ρ(s, r):

A ball of radius r centred at s

Σ:

Finite alphabet

ΣX :

The set of site patterns on X

ΣXL :

The set of alignments of length L on X

\({\mathbb{Z}}\) :

The set of integers

\({\mathbb{Z}_+}\) :

The set of positive integers

\({A, A_i, \ldots}\) :

Alignments on X, i.e., maps from X to \({\Sigma^L}\), or elements of ΣXL

\({A := A_1 : A_2 : \ldots : A_m}\) :

An alignment obtained by concatenating alignments \({A_i, i = 1, 2, \ldots, m}\)

\({C, C_i, \ldots}\) :

Characters on X, i.e., maps C : X → Σ

d(x, y):

1-Norm distance between x and y

\({d^{-}(u), d^{+}(u), d(u)}\) :

In-degree, out-degree, degree (or total degree) of a vertex u

G ≅ H :

G and H are isomorphic

G ≤ H or H ≥ G :

When used for graphs (or isomorphism classes of graphs) G and H, it means G is isomorphic to a subgraph of H

\({G \subseteq H}\) or \({H \supseteq G}\) :

When used for labelled graphs G and H, it means G is a subgraph of H

\({G, G_i, \ldots}\) :

Spanning forests in a pedigree

n(G > T : P):

Number of sequences G of spanning forests in P for which \({T_u(G_i) \cong T_i}\) for all \({G_i}\) in G and consecutive \({G_i}\) are separated by exactly 1 recombination

p :

The crossover probability in models R(p) and RM(p, μ)

P, Q, P(X, Y, U, E), . . .:

Pedigrees

r(G):

The number of recombinations in G, see Definition 7

s(G):

The number of points of no recombination in G, see Definition 7

\({S^k}\) :

The set of k-tuples of elements of a set S

\({S^X}\) :

The set of all functions from X to S

\({T, T_i, \ldots}\) :

X-forests in a pedigree or X-forests

\({T_d(G)}\) :

The unique directed X-forest in a spanning forest G in a pedigree

\({T_u(G)}\) :

The unique undirected X-forest in a spanning forest G in a pedigree

u ≤ v :

(for vertices u and v in a pedigree) There is a directed path from v to u

V(G), E(G):

Vertex and edge sets of a graph, respectively

v(G), e(G):

Cardinalities of vertex and edge sets of a graph, respectively

X :

The set of extant vertices of a pedigree
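
The definitions of f(A), d(x, y) and \({\mathcal{A}(T_i, r_0, L)}\) in the list above can be illustrated with a small Python sketch. It is only an illustration under stated assumptions, not the article's method: the function names are hypothetical, the site patterns are enumerated in an arbitrary fixed order (the order of `itertools.product`), and the ball is taken to be closed (distance at most the radius), which the list above does not specify.

```python
from collections import Counter
from itertools import product


def site_pattern_frequencies(alignment, taxa, alphabet):
    """Return f(A): the fraction of sites showing each site pattern on X.

    `alignment` maps each extant vertex in `taxa` to a string of length L.
    The ordering of patterns (C_1, ..., C_N) is an assumption of this sketch.
    """
    length = len(alignment[taxa[0]])
    # A site pattern assigns one letter to every extant vertex, so each
    # column of the alignment is one pattern.
    counts = Counter(
        tuple(alignment[x][i] for x in taxa) for i in range(length)
    )
    patterns = product(alphabet, repeat=len(taxa))
    return [counts[c] / length for c in patterns]


def one_norm_distance(x, y):
    """Return d(x, y), the 1-norm distance between two vectors."""
    return sum(abs(a - b) for a, b in zip(x, y))


def in_ball(alignment, taxa, alphabet, centre, radius):
    """Check whether f(A) lies within `radius` of `centre` in 1-norm.

    Using <= (a closed ball) is an assumption of this sketch.
    """
    f = site_pattern_frequencies(alignment, taxa, alphabet)
    return one_norm_distance(f, centre) <= radius


# Hypothetical two-vertex alignment of length 4 over the alphabet {0, 1}.
example = {"x": "0011", "y": "0101"}
freqs = site_pattern_frequencies(example, ["x", "y"], "01")
```

Here each of the four possible site patterns occurs in exactly one column of `example`, so `freqs` is uniform over the patterns.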

Author information

Correspondence to Bhalchandra D. Thatte.

About this article

Cite this article

Thatte, B.D. Reconstructing pedigrees: some identifiability questions for a recombination-mutation model. J. Math. Biol. 66, 37–74 (2013). https://doi.org/10.1007/s00285-011-0503-8

  • DOI: https://doi.org/10.1007/s00285-011-0503-8
