On the Fixed Parameter Tractability and Approximability of the Minimum Error Correction Problem
Haplotype assembly is the computational problem of reconstructing the two parental copies, called haplotypes, of each chromosome starting from sequencing reads, called fragments, possibly affected by sequencing errors. Minimum Error Correction (MEC) is a prominent computational problem for haplotype assembly and, given a set of fragments, aims at reconstructing the two haplotypes by applying the minimum number of base corrections.
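To make the MEC objective concrete, the following is a minimal sketch and not the algorithm of this paper. It assumes each fragment is a string of equal length over `0`, `1`, and `-`, where `-` marks a position the fragment does not cover. The function names `part_cost` and `mec_brute_force` are hypothetical. The sketch tries every bipartition of the fragments and, within each group, corrects the minority entries of every column, so it runs in time exponential in the number of fragments and is only meant for tiny instances.

```python
from itertools import product


def part_cost(fragments):
    """Fewest corrections that make all fragments of one group agree.

    Each column is corrected independently: the minority value among
    the covered entries is flipped; holes ('-') cost nothing.
    """
    cost = 0
    for column in zip(*fragments):
        zeros = column.count("0")
        ones = column.count("1")
        cost += min(zeros, ones)
    return cost


def mec_brute_force(fragments):
    """Return the optimal MEC cost by enumerating all bipartitions.

    Assumption: all fragments have the same length (one symbol per
    SNP position). Exponential in len(fragments).
    """
    n = len(fragments)
    best = None
    for assignment in product((0, 1), repeat=n):
        # Swapping the two groups gives the same cost, so pin the
        # first fragment to group 0 to halve the search space.
        if n and assignment[0] == 1:
            continue
        groups = ([], [])
        for fragment, side in zip(fragments, assignment):
            groups[side].append(fragment)
        cost = part_cost(groups[0]) + part_cost(groups[1])
        if best is None or cost < best:
            best = cost
    return best


if __name__ == "__main__":
    example = ["0011", "0-11", "1100", "110-", "0111"]
    print(mec_brute_force(example))
```

On the example instance, the fragments `0011`, `1100`, and `0111` pairwise conflict, so no bipartition is error-free. Grouping `0011`, `0-11`, `0111` against `1100`, `110-` needs a single correction, so the optimum is 1.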
Using novel combinatorial properties of MEC instances, we provide new results on the fixed-parameter tractability and approximability of MEC. In particular, we show that MEC is in FPT when parameterized by the number of corrections and that, on “gapless” instances, it is also in FPT when parameterized by the length of the fragments, whereas the result known in the literature forces the reconstruction of complementary haplotypes. Then, we show that MEC cannot be approximated within any constant factor, while it is approximable within factor \(O(\log nm)\), where \(nm\) is the size of the input. Finally, we provide a practical 2-approximation algorithm for Binary MEC, a variant of MEC that has been applied in the framework of clustering binary data.
This work has been stimulated by discussions between PB, GK, and NP during the No.045 NII Shonan workshop on Exact Algorithms for Bioinformatics Research, March 2014, Japan.
The authors acknowledge the support of the MIUR PRIN 2010-2011 grant 2010LYA9RH (Automi e Linguaggi Formali: Aspetti Matematici e Applicativi), of the Cariplo Foundation grant 2013-0955 (Modulation of anti cancer immune response by regulatory non-coding RNAs), and of the FA 2013 grant (Metodi algoritmici e modelli: aspetti teorici e applicazioni in bioinformatica).