Research in Computational Molecular Biology

Volume 3500 of the series Lecture Notes in Computer Science pp 569-584

Improved Recombination Lower Bounds for Haplotype Data

  • Vineet Bafna, Department of Computer Science and Engineering, University of California at San Diego
  • Vikas Bansal, Department of Computer Science and Engineering, University of California at San Diego


Recombination is an important evolutionary mechanism responsible for the genetic diversity in humans and other organisms. Recently, there has been extensive research on understanding the fine scale variation in recombination rates across the human genome using DNA polymorphism data. A combinatorial approach toward this is to estimate the minimum number of recombination events in any history of the sample. Recently, Myers and Griffiths [1] proposed two measures, R_h and R_s, that give lower bounds on the minimum number of recombination events. In this paper, we provide new and improved methods (both in terms of running time and ability to detect past recombination events) for computing recombination lower bounds. Our principal results include:
  • We show that computing the lower bound R_h is NP-hard and adapt the greedy algorithm for the set cover problem [2] to obtain a polynomial time algorithm for computing a diversity-based bound R_g. This algorithm is several orders of magnitude faster than the Recmin program [1] and the bound R_g matches the bound R_h almost always.

  • We also show that computing the lower bound R_s is NP-hard, using a reduction from MAX-2SAT. We give an O(m 2^n) time algorithm for computing R_s for a dataset with n haplotypes and m SNPs. We propose a new bound R_I which extends the history-based bound R_s using the notion of intermediate haplotypes. This bound detects more recombination events than both the R_h and R_s bounds on many real datasets.

  • We extend our algorithms for computing R_g and R_s to obtain lower bounds for haplotypes with missing data. These methods can detect more recombination events for the LPL dataset [3] than previous bounds and provide stronger evidence for the presence of a recombination hotspot.

  • We apply our lower bounds to a real dataset [4] and demonstrate that these can provide a good indication for the presence and the location of recombination hotspots.
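
The first result above adapts the greedy algorithm for the set cover problem. As a point of reference only, the generic greedy heuristic can be sketched as follows. This is not the authors' algorithm for R_g: the abstract does not say how the recombination bound is mapped onto set cover, and the function name greedy_set_cover and its inputs are hypothetical, chosen for illustration.

```python
def greedy_set_cover(universe, subsets):
    """Generic greedy set cover heuristic (illustrative sketch only).

    universe: iterable of elements to cover.
    subsets:  list of Python sets.
    Returns the indices of the chosen subsets, in the order they were picked.
    """
    uncovered = set(universe)
    chosen = []
    while uncovered:
        # Pick the subset that covers the most still-uncovered elements.
        # Ties go to the lowest index, since max() returns the first maximum.
        best = max(range(len(subsets)), key=lambda i: len(uncovered & subsets[i]))
        gained = uncovered & subsets[best]
        if not gained:
            # No subset makes progress, so the universe cannot be covered.
            raise ValueError("subsets do not cover the universe")
        chosen.append(best)
        uncovered -= gained
    return chosen


if __name__ == "__main__":
    sets = [{1, 2, 3}, {2, 4}, {3, 4}, {4, 5}]
    print(greedy_set_cover({1, 2, 3, 4, 5}, sets))
```

Each round scans every subset once, so the running time is polynomial in the input size. The greedy choice is a heuristic and does not always return a minimum cover.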