Associating Genotype Sequence Properties to Haplotype Inference Errors

  • Conference paper
Advances in Bioinformatics and Computational Biology (BSB 2012)

Part of the book series: Lecture Notes in Computer Science (LNBI, volume 7409)

Abstract

Haplotype analysis has become an important tool in studying species traits and susceptibility to diseases. Several computational methods for determining haplotype information from genotype data have been developed, but none is perfect. Haplotype Inference (HI) approaches based on different strategies or biological principles tend to fail at different loci. In this work we apply Multiple Linear Regression to explore how relevant several biologically meaningful properties of the genotype sequences are to the occurrence of errors in the results of three HI methods based on different principles. We develop models for databases on different elements, using two error metrics, and assess the accuracy of our results through statistical analysis. Our models reveal genotype properties that are relevant in general and others that are suited to particular scenarios. We also show that the Regression models perform statistically better than Neural Network models developed for the same databases and properties.
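
As a rough illustration of the kind of model the abstract refers to, the sketch below fits a multiple linear regression by ordinary least squares, relating a few genotype properties to an error rate. Everything specific here is an assumption made for illustration: the two property names (fraction of heterozygous sites, fraction of missing sites), the synthetic data, and the helper names `fit_linear_regression` and `predict` do not come from the paper, which does not list its properties or error metrics in the abstract.

```python
from typing import List, Sequence


def fit_linear_regression(
    rows: Sequence[Sequence[float]], targets: Sequence[float]
) -> List[float]:
    """Ordinary least squares via the normal equations (X^T X) b = X^T y.

    Returns [intercept, coef_1, ..., coef_k].
    """
    # Prepend a constant 1.0 to each row so the first coefficient is the intercept.
    X = [[1.0, *row] for row in rows]
    n_cols = len(X[0])
    # Build the augmented matrix [X^T X | X^T y].
    A = [
        [sum(x[i] * x[j] for x in X) for j in range(n_cols)]
        + [sum(x[i] * y for x, y in zip(X, targets))]
        for i in range(n_cols)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(n_cols):
        pivot = max(range(col, n_cols), key=lambda r: abs(A[r][col]))
        if abs(A[pivot][col]) < 1e-12:
            raise ValueError("properties are collinear; the model cannot be fitted")
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(n_cols):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    return [A[i][n_cols] / A[i][i] for i in range(n_cols)]


def predict(coefs: Sequence[float], row: Sequence[float]) -> float:
    """Predicted error rate for one genotype's property vector."""
    return coefs[0] + sum(c * v for c, v in zip(coefs[1:], row))


# Hypothetical properties per genotype: (heterozygous fraction, missing fraction).
rows = [(0.1, 0.0), (0.2, 0.1), (0.4, 0.05), (0.3, 0.2), (0.5, 0.1)]
# Synthetic error rates generated as 0.02 + 0.5 * het + 0.3 * missing (not real data).
targets = [0.02 + 0.5 * het + 0.3 * missing for het, missing in rows]

coefs = fit_linear_regression(rows, targets)
print("intercept and coefficients:", coefs)
print("predicted error for (0.25, 0.1):", predict(coefs, (0.25, 0.1)))
```

Because the synthetic targets are an exact linear function of the two properties, the fit recovers the generating coefficients up to floating-point error. With real data, the magnitude and statistical significance of each coefficient would indicate how relevant a property is to the errors of a given HI method.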

Copyright information

© 2012 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Rosa, R.S., Santos, R.H.S., Guimarães, K.S. (2012). Associating Genotype Sequence Properties to Haplotype Inference Errors. In: de Souto, M.C., Kann, M.G. (eds) Advances in Bioinformatics and Computational Biology. BSB 2012. Lecture Notes in Computer Science, vol 7409. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-31927-3_12

  • DOI: https://doi.org/10.1007/978-3-642-31927-3_12

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-31926-6

  • Online ISBN: 978-3-642-31927-3

  • eBook Packages: Computer Science, Computer Science (R0)
