A Nearly Linear-Time General Algorithm for Genome-Wide Bi-allele Haplotype Phasing

  • Conference paper
High Performance Computing - HiPC 2003 (HiPC 2003)

Part of the book series: Lecture Notes in Computer Science ((LNCS,volume 2913))

The determination of feature maps, such as STS (sequence-tagged site), SNP (single nucleotide polymorphism) or RFLP (restriction fragment length polymorphism) maps, for each chromosome copy or haplotype in an individual has important potential applications in genetics, clinical biology and association studies. We consider the problem of reconstructing the two haplotypes of a diploid individual from genotype data generated by mapping experiments, and present an algorithm to recover them. The problem of optimizing existing methods of SNP phasing with a population of diploid genotypes was investigated in [7] and found to be NP-hard. In contrast, using single-molecule methods, we show that although the haplotypes are unknown and the data are further confounded by the mapping error model, reasonable assumptions on the mapping process allow us to recover the co-associations of allele types across consecutive loci and to estimate the haplotypes with an efficient algorithm. The haplotype reconstruction algorithm has two stages. Stage I detects polymorphic marker types by modifying an EM algorithm for Gaussian mixture models; an example is given for RFLP sizing. Stage II addresses phasing and presents a local maximum likelihood method for inferring the haplotypes of an individual. The algorithm is nearly linear in the number of polymorphic loci. Results on simulated RFLP sizing data are encouraging and suggest that the method will prove practical for haplotype phasing.
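To make Stage I more concrete, the following is a minimal sketch of the standard textbook EM algorithm for a two-component, one-dimensional Gaussian mixture, applied to fragment sizes observed at a single locus. It is not the modified EM algorithm of the paper, whose details the abstract does not give. The function names `em_two_gaussians` and `looks_heterozygous`, the initialization, and the thresholds `min_weight` and `min_separation` are all assumptions made for illustration.

```python
import math


def normal_pdf(x, mean, std):
    """Density of a normal distribution with the given mean and std at x."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def em_two_gaussians(sizes, iterations=100, min_std=1e-3):
    """Fit a two-component 1-D Gaussian mixture with plain EM.

    Returns (weights, means, stds), each a two-element list.
    Initialization (assumed, not from the paper): means at the smallest
    and largest observation, both stds equal to the overall spread.
    """
    n = len(sizes)
    overall_mean = sum(sizes) / n
    spread = math.sqrt(sum((x - overall_mean) ** 2 for x in sizes) / n)
    means = [min(sizes), max(sizes)]
    stds = [max(spread, min_std)] * 2
    weights = [0.5, 0.5]

    for _ in range(iterations):
        # E-step: posterior probability that each size came from each component.
        resp = []
        for x in sizes:
            p = [weights[k] * normal_pdf(x, means[k], stds[k]) for k in range(2)]
            total = p[0] + p[1]
            resp.append([p[0] / total, p[1] / total] if total > 0 else [0.5, 0.5])

        # M-step: re-estimate weight, mean and std of each component.
        for k in range(2):
            nk = sum(r[k] for r in resp)
            if nk < 1e-12:
                continue  # component has collapsed; keep its old parameters
            weights[k] = nk / n
            means[k] = sum(r[k] * x for r, x in zip(resp, sizes)) / nk
            var = sum(r[k] * (x - means[k]) ** 2 for r, x in zip(resp, sizes)) / nk
            stds[k] = max(math.sqrt(var), min_std)

    return weights, means, stds


def looks_heterozygous(weights, means, stds, min_weight=0.2, min_separation=3.0):
    """Hypothetical heuristic: call the locus polymorphic when both components
    carry real weight and their means are well separated relative to the
    larger of the two standard deviations."""
    separation = abs(means[1] - means[0]) / max(stds)
    return min(weights) >= min_weight and separation >= min_separation


if __name__ == "__main__":
    # Synthetic fragment sizes (arbitrary units) clustered near 10 and 12.
    data = [10.0 + 0.1 * ((i % 7) - 3) for i in range(70)]
    data += [12.0 + 0.1 * ((i % 7) - 3) for i in range(70)]
    w, m, s = em_two_gaussians(data)
    print("weights:", w, "means:", m, "stds:", s)
    print("polymorphic:", looks_heterozygous(w, m, s))
```

A two-component mixture is used here because a diploid individual carries at most two alleles per locus; the decision rule in `looks_heterozygous` is only one plausible way to turn the fitted parameters into a polymorphic/non-polymorphic call.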

Work reported in this paper is funded by grants from the NSF Qubic program, DARPA, an HHMI biomedical support research grant, the US DOE, the US Air Force, NIH, and the New York office of Science and Technology & Academic Research.


References

  1. Anantharaman, T.S., Mishra, B., Schwartz, D.C.: Genomics via Optical Mapping II: Ordered Restriction Maps. Journal of Computational Biology 4(2), 91–118 (1997)
  2. Bafna, V., Gusfield, D., Lancia, G., Yooseph, S.: Haplotyping as Perfect Phylogeny, A Direct Approach. Technical Report UC Davis CSE–2002–21
  3. Casey, W., Mishra, B., Wigler, M.: Placing Probes on the Genome with Pairwise Distance Data. In: Gascuel, O., Moret, B.M.E. (eds.) WABI 2001. LNCS, vol. 2149, pp. 52–68. Springer, Heidelberg (2001)
  4. Clark, A.: Inference of Haplotypes from PCR-Amplified Samples of Diploid Populations. Mol. Biol. Evol. 7, 111–122 (1990)
  5. Dempster, A., Laird, N.M., Rubin, D.: Maximum likelihood from incomplete data via the EM algorithm. J.R. Stat. Soc. 39, 1–38 (1977)
  6. Excoffier, L., Slatkin, M.: Maximum-Likelihood Estimation of Molecular Haplotype Frequencies in a Diploid Population. Mol. Biol. Evol. 12, 921–927 (1995)
  7. Gusfield, D.: Inference of Haplotypes from Samples of Diploid Populations: Complexity and Algorithms. Journal of Computational Biology 8(3), 305–323 (2001)
  8. Ma, J., Xu, L., Jordan, M.: Asymptotic Convergence Rate of the EM Algorithm for Gaussian Mixtures. Neural Computation 12(12), 2881–2907 (2000)
  9. Mitra, R., Church, G.: In situ localized amplification and contact replication of many individual DNA molecules. Nucleic Acids Research 27(24), e34-e34 (1999)
  10. Niu, T., Qin, Z., Xu, X., Liu, J.: Bayesian Haplotype Inference for Multiple Linked Single-Nucleotide Polymorphisms. Am. J. Hum. Genet. 70, 156–169 (2002)
  11. Parida, L., Mishra, B.: Partitioning Single-Molecule Maps into Multiple Populations: Algorithms and Probabilistic Analysis. Discrete Applied Mathematics (The Computational Molecular Biology Series) 104(1–3), 203–227 (2000)
  12. Roweis, S., Ghahramani, Z.: A Unifying Review of Linear Gaussian Models. Neural Computation 11(2), 305–345 (1999)
  13. Stephens, M., Smith, N., Donnelly, P.: A new statistical method for haplotype reconstruction from population data. Am. J. Hum. Genet. 68, 978–989 (2001)
  14. Tarjan, R.E.: Data Structures and Network Algorithms, CBMS 44. SIAM, Philadelphia (1983)
  15. Weir, B.: Genetic Data Analysis II. Sinauer Associates, Sunderland (1996)

Copyright information

© 2003 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Casey, W., Mishra, B. (2003). A Nearly Linear-Time General Algorithm for Genome-Wide Bi-allele Haplotype Phasing. In: Pinkston, T.M., Prasanna, V.K. (eds) High Performance Computing - HiPC 2003. HiPC 2003. Lecture Notes in Computer Science, vol 2913. Springer, Berlin, Heidelberg.

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-20626-2

  • Online ISBN: 978-3-540-24596-4

  • eBook Packages: Springer Book Archive
