On a Fixed Haplotype Variant of the Minimum Error Correction Problem

Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 10976)


Haplotype assembly is the problem of reconstructing the two parental chromosomes of an individual from a set of sampled DNA sequences. A combinatorial optimization problem that models haplotype assembly is the Minimum Error Correction problem (MEC). This problem has been intensively studied in the computational biology literature and is also known in the clustering literature: essentially, we are required to find two cluster centres such that the sum of the distances from the input rows to their nearest centre is minimized. We introduce here the problem Fixed Haplotype-Minimum Error Correction (FH-MEC), a new variant of MEC which corresponds to instances where one of the haplotypes/centres is already given. We provide hardness results for the problem on various restricted instances. We also propose a new and very simple 2-approximation algorithm for MEC on binary input matrices.
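To make the clustering view of MEC and FH-MEC concrete, here is a minimal sketch of the objective, not the paper's method. It assumes a binary fragment matrix in which a missing entry (a "hole") is encoded as `None`, and that the distance from a row to a haplotype is the Hamming distance counted only over non-missing entries. The helper `fh_mec_brute_force` is a hypothetical illustration that tries every candidate second haplotype. It is exponential in the number of columns and is neither the hardness construction nor the 2-approximation algorithm described in the paper.

```python
from itertools import product
from typing import List, Optional, Sequence, Tuple

# Assumption: None marks a hole (an unsampled position in a fragment).
Row = Sequence[Optional[int]]


def dist(row: Row, centre: Sequence[int]) -> int:
    """Hamming distance between a fragment and a haplotype, ignoring holes."""
    return sum(1 for r, c in zip(row, centre) if r is not None and r != c)


def mec_cost(rows: Sequence[Row], h1: Sequence[int], h2: Sequence[int]) -> int:
    """MEC objective: each row pays its distance to the nearer of the two centres."""
    return sum(min(dist(r, h1), dist(r, h2)) for r in rows)


def fh_mec_brute_force(rows: Sequence[Row], h1: Sequence[int]) -> Tuple[int, List[int]]:
    """Hypothetical exhaustive FH-MEC solver: h1 is fixed, search all binary h2.

    Runs in O(2^m * n * m) time, so it is only usable on tiny instances.
    """
    best: Optional[Tuple[int, List[int]]] = None
    for h2 in product((0, 1), repeat=len(h1)):
        cost = mec_cost(rows, h1, h2)
        if best is None or cost < best[0]:
            best = (cost, list(h2))
    assert best is not None
    return best


rows = [[0, 0, 1], [0, None, 1], [1, 1, 0], [1, 1, None]]
h1 = [0, 0, 1]
print(mec_cost(rows, h1, h1))        # both centres equal h1: cost 0 + 0 + 3 + 2 = 5
print(fh_mec_brute_force(rows, h1))  # best second haplotype is [1, 1, 0] with cost 0
```

With `h1` fixed, each row's cost can only drop when `h2` is chosen well, which is why FH-MEC reduces to picking a single centre against a fixed competitor.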



The last author acknowledges the support of an NWO TOP 2 grant.

Copyright information

© Springer International Publishing AG, part of Springer Nature 2018

Authors and Affiliations

  1. Department of Data Science and Knowledge Engineering, Maastricht University, Maastricht, The Netherlands
