Abstract
Haplotype analysis has become an important tool in studying species traits and susceptibility to diseases. Several computational methods for determining haplotype information from genotype data have been developed, but none is perfect. Haplotype Inference (HI) approaches based on different strategies or biological principles tend to fail at different loci. In this work we apply Multiple Linear Regression to explore how relevant several biologically meaningful properties of the genotype sequences are to the occurrence of errors in the results of three HI methods based on different principles. We develop models for databases on different elements, using two error metrics. We assess the accuracy of our results through statistical analysis. Our models reveal genotype properties that are relevant in general and others that are suited to particular scenarios. We also show that the Regression models perform statistically better than Neural Network models developed for the same databases and properties.
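As a rough illustration of the kind of model the abstract describes, the sketch below fits a multiple linear regression that maps per-genotype properties to an inference error rate. It is not the authors' implementation: the two features (fraction of heterozygous sites, fraction of missing sites), the helper names `fit_ols` and `predict`, and all numbers are hypothetical placeholders chosen only so that the fit can be checked by hand.

```python
# Sketch only: features, names, and numbers are made-up placeholders,
# not properties or data taken from the paper.

def fit_ols(rows, targets):
    """Fit y = b0 + b1*x1 + ... + bk*xk via the normal equations (X^T X) b = X^T y."""
    X = [[1.0] + list(r) for r in rows]  # prepend the intercept column
    k = len(X[0])
    # Augmented matrix [X^T X | X^T y].
    A = [[sum(x[i] * x[j] for x in X) for j in range(k)]
         + [sum(x[i] * y for x, y in zip(X, targets))]
         for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        if abs(A[pivot][col]) < 1e-12:
            raise ValueError("features are collinear; normal equations are singular")
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(k):
            if r != col:
                f = A[r][col] / A[col][col]
                A[r] = [a - f * b for a, b in zip(A[r], A[col])]
    return [A[i][k] / A[i][i] for i in range(k)]


def predict(coeffs, row):
    """Evaluate the fitted linear model on one feature row."""
    return coeffs[0] + sum(c * v for c, v in zip(coeffs[1:], row))


# Hypothetical features per genotype: (heterozygous fraction, missing fraction).
features = [(0.10, 0.00), (0.20, 0.05), (0.30, 0.00),
            (0.40, 0.10), (0.50, 0.05), (0.60, 0.10)]
# Synthetic error rates generated as 0.01 + 0.2*het + 0.5*missing,
# so the fit should recover those three coefficients.
errors = [0.01 + 0.2 * h + 0.5 * m for h, m in features]

coeffs = fit_ols(features, errors)
```

In such a model, the size and sign of each fitted coefficient indicate how strongly the corresponding property is associated with the error metric. A real analysis would also need significance tests and variable selection, which this sketch omits.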
Copyright information
© 2012 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Rosa, R.S., Santos, R.H.S., Guimarães, K.S. (2012). Associating Genotype Sequence Properties to Haplotype Inference Errors. In: de Souto, M.C., Kann, M.G. (eds) Advances in Bioinformatics and Computational Biology. BSB 2012. Lecture Notes in Computer Science, vol 7409. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-31927-3_12
DOI: https://doi.org/10.1007/978-3-642-31927-3_12
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-31926-6
Online ISBN: 978-3-642-31927-3
eBook Packages: Computer Science, Computer Science (R0)