Improved Recombination Lower Bounds for Haplotype Data
We show that computing the lower bound Rh is NP-hard and adapt the greedy algorithm for the set cover problem to obtain a polynomial-time algorithm for computing a diversity-based bound Rg. This algorithm is several orders of magnitude faster than the Recmin program, and the bound Rg almost always matches the bound Rh.
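For orientation, the classical greedy heuristic for set cover can be sketched as follows. This is only the textbook algorithm, not the adaptation used to compute Rg, whose details are not given here; the function name `greedy_set_cover` and the toy input are hypothetical and chosen purely for illustration.

```python
def greedy_set_cover(universe, subsets):
    """Textbook greedy set cover (illustrative sketch, not the Rg algorithm).

    Returns the indices of the chosen subsets, in the order they were picked.
    """
    sets = [set(s) for s in subsets]
    uncovered = set(universe)
    chosen = []
    while uncovered:
        # Pick the subset that covers the most still-uncovered elements;
        # ties go to the lowest index.
        best = max(range(len(sets)), key=lambda i: len(sets[i] & uncovered))
        gained = sets[best] & uncovered
        if not gained:
            raise ValueError("universe cannot be covered by the given subsets")
        chosen.append(best)
        uncovered -= gained
    return chosen


# Toy input (hypothetical): five elements, four candidate subsets.
cover = greedy_set_cover(range(1, 6), [{1, 2, 3}, {2, 4}, {3, 4}, {4, 5}])
```

Each round takes time polynomial in the input size and covers at least one new element, so the loop runs at most once per element of the universe. That is why a greedy strategy of this kind yields a polynomial-time procedure even though the underlying optimization problem is NP-hard.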
We also show that computing the lower bound Rs is NP-hard, using a reduction from MAX-2SAT. We give an O(m 2^n) time algorithm for computing Rs for a dataset with n haplotypes and m SNPs. We propose a new bound RI, which extends the history-based bound Rs using the notion of intermediate haplotypes. This bound detects more recombination events than both the Rh and Rs bounds on many real datasets.
We extend our algorithms for computing Rg and Rs to obtain lower bounds for haplotypes with missing data. These methods can detect more recombination events for the LPL dataset than previous bounds and provide stronger evidence for the presence of a recombination hotspot.
We apply our lower bounds to a real dataset and demonstrate that they can provide a good indication of the presence and location of recombination hotspots.