Abstract
This chapter brings together three problems in the coarse analysis of biological sequences at the subsequence level: gene finding, genome rearrangement, and haplotype inference. Locating genes in DNA is a prerequisite for analyzing them efficiently. Gene finding algorithms can be classified as statistical or comparison-based methods. Statistical methods search for sequences that appear with statistically higher frequency in or around genes, such as start and stop codons and sequence repeats. Comparison-based methods compare sequences to find similar structures, on the assumption that genes are more conserved than other regions throughout evolution. Subsequences of a genome undergo mutations, and discovering this chain of mutations provides information about the evolutionary process and about the disease state of an organism, since some of these rearrangements are assumed to cause certain diseases. We review a few algorithms that discover the rearrangement events in a genome. The last problem we study is haplotype inference, the extraction of the data of a single chromosome from the combined data of two chromosomes. We describe a simple algorithm based on maximum parsimony and a probabilistic algorithm for this purpose. Distributed algorithms for these three important tasks are scarce; we propose one distributed algorithm for genome rearrangement and two distributed algorithms for haplotype inference.
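To make the signal-based idea concrete, the following is a minimal sketch of an open reading frame (ORF) scan: it looks for a start codon followed, in the same reading frame, by a stop codon. It is an illustration only, not the chapter's algorithm. The function name `find_orfs` and the `min_codons` parameter are hypothetical, the scan covers the forward strand only, and it assumes the standard genetic code (start codon ATG; stop codons TAA, TAG, TGA).

```python
# Illustrative sketch only: names and parameters are assumptions,
# not taken from the chapter.
START_CODON = "ATG"
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(dna, min_codons=2):
    """Return sorted (start, end) index pairs of ORFs on the forward strand.

    `end` is exclusive and includes the stop codon. `min_codons` counts
    the start and stop codons as part of the ORF length.
    """
    dna = dna.upper()
    orfs = []
    for frame in range(3):                        # three forward reading frames
        start = None
        for i in range(frame, len(dna) - 2, 3):   # step codon by codon
            codon = dna[i:i + 3]
            if start is None and codon == START_CODON:
                start = i                         # open a candidate ORF
            elif start is not None and codon in STOP_CODONS:
                if (i + 3 - start) // 3 >= min_codons:
                    orfs.append((start, i + 3))   # close it at the stop codon
                start = None
    return sorted(orfs)
```

For example, `find_orfs("CCATGAAATAGGG")` reports the single frame-2 ORF `ATGAAATAG` as the index pair `(2, 11)`. A practical gene finder would also scan the reverse complement and score candidates statistically rather than accept every ORF.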
Copyright information
© 2015 Springer International Publishing Switzerland
About this chapter
Cite this chapter
Erciyes, K. (2015). Genome Analysis. In: Distributed and Sequential Algorithms for Bioinformatics. Computational Biology, vol 23. Springer, Cham. https://doi.org/10.1007/978-3-319-24966-7_9
DOI: https://doi.org/10.1007/978-3-319-24966-7_9
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-24964-3
Online ISBN: 978-3-319-24966-7
eBook Packages: Computer Science, Computer Science (R0)