PLEACH: a new heuristic algorithm for pure parsimony haplotyping problem

The Journal of Supercomputing

Abstract

Haplotype inference is an important problem in computational biology because of its applications in diagnosing and treating genetic diseases such as diabetes, Alzheimer's disease, and heart defects. Several criteria exist for choosing a solution among the alternatives. Parsimony is one of the most important; under this criterion, the problem is known as the Pure Parsimony Haplotyping (PPH) problem. Approaches to solving PPH fall into two groups: exact and non-exact. Exact approaches often model the problem as a Mixed Integer Linear Programming (MILP) problem. Although these models find the optimal solution for small instances in reasonable time, the NP-hardness of PPH makes them ineffective on very large instances. Non-exact algorithms compensate for this deficiency. In this paper, we present a non-exact algorithm for large instances of the PPH problem based on the divide-and-conquer technique. The algorithm first divides the problem into small sub-problems, each of which is solved by one of the existing exact approaches; the sub-problem solutions are then combined by solving an MILP. The MILPs that arise in solving the sub-problems and in combining their solutions are small enough to be solved rapidly. The algorithm is evaluated on real and simulated instances and compared with two well-known methods, PHASE and WinHap2.
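To make the problem statement concrete, the following is a minimal sketch of PPH on a toy instance. It is not the PLEACH algorithm and not one of the MILP formulations mentioned above; it is a brute-force enumeration that stands in for an exact method. The encoding (haplotypes over {0, 1}, genotypes over {0, 1, 2} with 2 marking a heterozygous site) is a common convention assumed here, and all function names are hypothetical.

```python
from itertools import combinations, product

# Assumed encoding (not taken from the paper):
# haplotypes are tuples over {0, 1}; genotypes are tuples over {0, 1, 2},
# where 2 marks a heterozygous site.

def resolves(h1, h2, g):
    """Return True if the haplotype pair (h1, h2) explains genotype g."""
    for a, b, s in zip(h1, h2, g):
        if s == 2:
            if a == b:          # heterozygous site needs differing alleles
                return False
        elif not (a == b == s): # homozygous site needs both alleles equal to s
            return False
    return True

def candidate_haplotypes(g):
    """All haplotypes compatible with g: fixed at homozygous sites, free elsewhere."""
    options = [(0, 1) if s == 2 else (s,) for s in g]
    return set(product(*options))

def complement(h, g):
    """The unique partner of h such that (h, partner) resolves g."""
    return tuple(1 - a if s == 2 else a for a, s in zip(h, g))

def brute_force_pph(genotypes):
    """Smallest haplotype set resolving every genotype (exponential; tiny inputs only)."""
    pool = sorted(set().union(*(candidate_haplotypes(g) for g in genotypes)))
    for k in range(1, len(pool) + 1):          # try set sizes in increasing order
        for subset in combinations(pool, k):
            chosen = set(subset)
            if all(any(h in chosen and complement(h, g) in chosen
                       for h in candidate_haplotypes(g))
                   for g in genotypes):
                return chosen
    return set()

genotypes = [(2, 1, 2), (0, 1, 2), (2, 1, 1)]
print(sorted(brute_force_pph(genotypes)))
# [(0, 1, 0), (0, 1, 1), (1, 1, 1)]
```

The candidate pool doubles with every heterozygous site, and the number of subsets tried grows exponentially in the pool size, which is the kind of blow-up that motivates restricting exact methods to small sub-problems.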

Availability of data and materials

The data are available upon request.

Acknowledgements

The authors are grateful to the anonymous reviewers for their helpful comments, which led to this improved version of the paper.

Funding

This research did not receive any specific grant from funding agencies in the public, commercial, or not-for-profit sectors.

Author information

Contributions

All authors were involved in proposing and writing the paper. R. Feizabadi implemented the code.

Corresponding author

Correspondence to Mehri Bagherian.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Ethical approval

Not applicable.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Feizabadi, R., Bagherian, M., Vaziri, H. et al. PLEACH: a new heuristic algorithm for pure parsimony haplotyping problem. J Supercomput 80, 8236–8258 (2024). https://doi.org/10.1007/s11227-023-05746-7

  • DOI: https://doi.org/10.1007/s11227-023-05746-7
