Abstract
Several applications require an unbiased determination of relatedness, be it in linkage or association studies or in a forensic setting. This requires an appropriate model to compute the joint probability of genetic data for a set of persons given a hypothesis about the pedigree structure. The increasing number of markers available through high-density SNP microarray typing and NGS technologies intensifies this need, because using a large number of markers may lead to biased results due to strong dependencies between closely located loci, both within pedigrees (linkage) and in the population (allelic association or linkage disequilibrium (LD)). We present a new general model, based on one Markov chain for inheritance patterns and another for founder allele patterns, the latter allowing us to account for LD. We also demonstrate a specific implementation for X-chromosomal markers that allows for computation of likelihoods based on hypotheses of alleged relationships and genetic marker data. The algorithm can simultaneously account for linkage, LD, and mutations. We demonstrate its feasibility using simulated examples. The algorithm is implemented in the software FamLinkX, which provides a user-friendly GUI for Windows systems (FamLinkX, as well as further usage instructions, is freely available at www.famlink.se). Our software provides the necessary means to solve cases where no previous implementation exists. In addition, the software can perform simulations to further study the impact of linkage and LD on computed likelihoods for an arbitrary set of markers.
Appendix
The following section includes a more detailed description of the notation used in the paper. First, we assume locus $i$ ($i = 1, \dots, I$) has $A_i$ possible alleles, and let $p_i$ be a vector specifying the probabilities of a haplotype’s alleles at locus $i$ given the haplotype’s alleles at lower indexes. We let $r_2, \dots, r_I$ denote the recombination rates between the loci, which are assumed known. For a locus $i$, let $t$ be a transmission, specifying a start allele in the parent, a resulting allele in the child, and whether the parent is a mother or a father. We then denote with $m_i(t)$ the probability that the child obtains the resulting allele, given that the parent has the start allele. This function specifies the mutation model at locus $i$. The parameters of our model are $p = (p_1, \dots, p_I)$, $r = (r_2, \dots, r_I)$, and $m = (m_1, \dots, m_I)$.
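One way to make the mutation function $m_i(t)$ concrete is to store it, per locus, as a row-stochastic matrix indexed by start allele and resulting allele. The sketch below assumes a simple "equal" mutation model with a single hypothetical rate `mu` that ignores whether the parent is a mother or a father. The function names and the numbers are illustrative assumptions, not taken from the paper or from FamLinkX.

```python
import numpy as np


def equal_mutation_matrix(n_alleles: int, mu: float) -> np.ndarray:
    """Return M with M[a, b] = P(child allele b | parent allele a).

    Assumed model (not from the paper): an allele is copied unchanged with
    probability 1 - mu, otherwise it mutates to one of the other alleles,
    each chosen with equal probability.
    """
    if n_alleles < 2:
        raise ValueError("need at least two alleles")
    M = np.full((n_alleles, n_alleles), mu / (n_alleles - 1))
    np.fill_diagonal(M, 1.0 - mu)
    return M


def transmitted_distribution(parent_freqs: np.ndarray, M: np.ndarray) -> np.ndarray:
    """Allele distribution in the child after a single transmission."""
    return parent_freqs @ M


# Hypothetical locus with three alleles and made-up frequencies.
freqs = np.array([0.7, 0.2, 0.1])
M = equal_mutation_matrix(3, mu=0.01)
child = transmitted_distribution(freqs, M)
```

Under this assumed model, `child` is close to but not equal to `freqs`, since only the uniform allele distribution is left unchanged by a symmetric matrix of this form.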
If parents’ alleles follow the population frequencies, the probabilities for a child to have various alleles are not given by the population frequencies, unless the process represented by the mutation model happens to have the population frequencies as its stationary distribution. This means that adding the untyped father or mother of a person to the pedigree may change the probabilities we are computing. To avoid this nuisance, we recommend that all untyped founders with only one child in the pedigree are (recursively) removed prior to computations. In our pedigree, a person may have no parents specified, only a mother, only a father, or both parents. Founders are those who have no parents in the pedigree. We also assume the pedigree does not contain untyped children with no descendants, as such children cannot affect the result.
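The recommended recursive removal of untyped founders with only one child can be sketched as follows. The pedigree representation (a dictionary mapping each person to a `(father, mother)` pair, with `None` for an unspecified parent) and the function name are hypothetical choices made for this illustration, and the sketch assumes every named parent also appears as a key.

```python
from collections import Counter
from typing import Dict, Optional, Set, Tuple

# Hypothetical representation: person -> (father, mother); None = not specified.
Pedigree = Dict[str, Tuple[Optional[str], Optional[str]]]


def prune_untyped_single_child_founders(pedigree: Pedigree, typed: Set[str]) -> Pedigree:
    """Repeatedly drop untyped founders that have exactly one child."""
    ped = dict(pedigree)
    while True:
        # Count how many children each person has in the current pedigree.
        n_children = Counter(
            parent
            for parents in ped.values()
            for parent in parents
            if parent is not None
        )
        removable = {
            person
            for person, (father, mother) in ped.items()
            if father is None and mother is None  # founder
            and person not in typed               # untyped
            and n_children[person] == 1           # only one child
        }
        if not removable:
            return ped
        # Remove them and clear the parent links that pointed at them.
        # A child may thereby become a founder, hence the loop.
        ped = {
            person: (
                None if father in removable else father,
                None if mother in removable else mother,
            )
            for person, (father, mother) in ped.items()
            if person not in removable
        }


# Example: untyped grandparents GF and GM, untyped father F, typed mother M and child C.
example = {
    "GF": (None, None),
    "GM": (None, None),
    "F": ("GF", "GM"),
    "M": (None, None),
    "C": ("F", "M"),
}
pruned = prune_untyped_single_child_founders(example, typed={"M", "C"})
```

In the example, the grandparents are removed in the first pass, which turns the untyped father into a founder with one child, so he is removed in the second pass. The sketch does not handle the separate assumption about untyped children with no descendants.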
Our observed data is divided into data $s$ for $S$ typed founders and data $d$ for $M$ typed non-founders: Let $s_{ij}$ for $i = 1, \dots, I$, $j = 1, \dots, S$ denote the observed allele or alleles of typed founder $j$ at locus $i$. For males and X-chromosomal data, $s_{ij}$ specifies only one allele, otherwise $s_{ij}$ specifies the two observed alleles in no particular order. For the typed non-founders, let $d_{ij}$ specify the similar data. We write $s_i = (s_{i1}, \dots, s_{iS})$, $s = (s_1, \dots, s_I)$, $d_i = (d_{i1}, \dots, d_{iM})$, and $d = (d_1, \dots, d_I)$.
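A minimal sketch of how a single observation $s_{ij}$ or $d_{ij}$ might be stored is given below. It assumes alleles are coded as integers and uses a sorted tuple to express "in no particular order". The function name and the encoding are assumptions made for illustration only.

```python
from typing import Sequence, Tuple


def observed_genotype(
    alleles: Sequence[int], is_male: bool, x_chromosomal: bool
) -> Tuple[int, ...]:
    """Normalise one observation at one locus for one typed person.

    Males at X-chromosomal loci carry one allele; everyone else carries two.
    Sorting removes any ordering, so (12, 14) and (14, 12) compare equal.
    """
    expected = 1 if (is_male and x_chromosomal) else 2
    if len(alleles) != expected:
        raise ValueError(f"expected {expected} allele(s), got {len(alleles)}")
    return tuple(sorted(alleles))


# Hypothetical X-chromosomal observations.
female_obs = observed_genotype([14, 12], is_male=False, x_chromosomal=True)
male_obs = observed_genotype([13], is_male=True, x_chromosomal=True)
```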
We also need a number of ancillary variables: The inheritance pattern at locus $i$ can be described as a vector $v_i$ of length $N$, with one component for each parent-child relationship in the pedigree when the locus is autosomal, and one for each mother-child relationship for X-chromosomal loci. Each component is 0 or 1 depending on whether the paternal or maternal allele is inherited; we write $v = (v_1, \dots, v_I)$. We also need to describe the founder alleles of the pedigree: These are maternal or paternal alleles whose relevant parent is not in the pedigree. First, there are founder alleles belonging to typed founders: Let $g_{ij}$ be the allele or alleles of typed founder $j$ at locus $i$, listed with the paternal allele first. Write $g_i = (g_{i1}, \dots, g_{iS})$ and $g = (g_1, \dots, g_I)$. For the remaining $F$ founder alleles, let $f_{ij}$ denote the $j$th founder allele at locus $i$. Finally, we write $f_i = (f_{i1}, \dots, f_{iF})$ and $f = (f_1, \dots, f_I)$.
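The length $N$ of an inheritance vector follows directly from the pedigree. The sketch below counts the relevant relationships and enumerates all $2^N$ candidate patterns at one locus, using a hypothetical pedigree representation that maps each person to a `(father, mother)` pair with `None` for a parent not in the pedigree. The function names are assumptions made for this illustration.

```python
from itertools import product
from typing import Dict, Iterator, Optional, Tuple

# Hypothetical representation: person -> (father, mother); None = not in pedigree.
Pedigree = Dict[str, Tuple[Optional[str], Optional[str]]]


def inheritance_vector_length(pedigree: Pedigree, x_chromosomal: bool) -> int:
    """Count the components N of an inheritance vector v_i.

    Autosomal loci: one component per parent-child relationship.
    X-chromosomal loci: one component per mother-child relationship only.
    """
    n = 0
    for father, mother in pedigree.values():
        if mother is not None:
            n += 1
        if father is not None and not x_chromosomal:
            n += 1
    return n


def inheritance_patterns(n: int) -> Iterator[Tuple[int, ...]]:
    """Yield every 0/1 vector of length n, i.e. all 2**n patterns at one locus."""
    return product((0, 1), repeat=n)


# Example: a trio with both parents in the pedigree.
trio = {"F": (None, None), "M": (None, None), "C": ("F", "M")}
n_autosomal = inheritance_vector_length(trio, x_chromosomal=False)
n_x = inheritance_vector_length(trio, x_chromosomal=True)
patterns = list(inheritance_patterns(n_autosomal))
```

Because the number of patterns doubles with every added relationship, this kind of brute-force enumeration is only practical for small pedigrees.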
Cite this article
Kling, D., Tillmar, A., Egeland, T. et al. A general model for likelihood computations of genetic marker data accounting for linkage, linkage disequilibrium, and mutations. Int J Legal Med 129, 943–954 (2015). https://doi.org/10.1007/s00414-014-1117-7