Genome Analysis

Erciyes, K.

doi:10.1007/978-3-319-24966-7_9

Part of the book series: Computational Biology (COBO, volume 23)

Abstract

This chapter brings together three problems in the coarse analysis of biological sequences at the subsequence level: gene finding, genome rearrangement, and haplotype inference. Locating genes in DNA is a prerequisite for analyzing them efficiently. Gene finding algorithms can be classified as statistical or comparison-based methods. Statistical methods search for sequences that appear with statistically higher frequency, such as start and terminating codons and sequence repeats, in or around the locations of genes. Comparison-based methods compare sequences to find similar structures, on the assumption that genes are more conserved throughout evolution. Subsequences of a genome undergo mutations, and discovering these chains of mutations provides information about the evolutionary process and about the disease state of an organism, since some of these rearrangements are assumed to cause certain diseases. We review a few algorithms that discover rearrangement events in a genome. The last problem we study is haplotype inference: the extraction of the data of a single chromosome from the combined data of two chromosomes. We describe a simple algorithm based on maximum parsimony and a probabilistic algorithm for this purpose. Unfortunately, there are hardly any distributed algorithms for these three important tasks; we propose a distributed algorithm for genome rearrangement and two distributed algorithms for haplotype inference.
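
As a rough illustration of the signal-based idea mentioned above (looking for start and terminating codons), the following minimal sketch scans the three forward reading frames of a DNA string for open reading frames. It is not the chapter's algorithm: the function name `find_orfs`, the `min_codons` threshold, and the restriction to the forward strand with the standard codons ATG, TAA, TAG, and TGA are assumptions made only for this example.

```python
START = "ATG"                    # standard start codon (assumed for this sketch)
STOPS = {"TAA", "TAG", "TGA"}    # standard terminating codons


def find_orfs(dna, min_codons=2):
    """Return sorted (start, end) index pairs of forward-strand ORFs.

    An ORF here runs from a start codon to the first in-frame stop codon;
    `end` is exclusive and includes the stop codon. `min_codons` is a
    hypothetical length filter counting codons before the stop.
    """
    dna = dna.upper()
    orfs = []
    for frame in range(3):
        start = None  # index of the currently open start codon, if any
        for i in range(frame, len(dna) - 2, 3):
            codon = dna[i:i + 3]
            if start is None and codon == START:
                start = i
            elif start is not None and codon in STOPS:
                if (i - start) // 3 >= min_codons:
                    orfs.append((start, i + 3))
                start = None  # close this ORF and look for the next one
    return sorted(orfs)


if __name__ == "__main__":
    seq = "CCATGAAATTTTAGCC"
    for begin, end in find_orfs(seq):
        print(begin, end, seq[begin:end])
```

Practical gene finders combine such signals with statistical models of coding regions and also scan the reverse strand; this sketch only shows the codon search itself.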

Author information

Authors and Affiliations

Computer Engineering Department, Izmir University, Uckuyular, Izmir, Turkey
K. Erciyes

Corresponding author

Correspondence to K. Erciyes.

About this chapter

Cite this chapter

Erciyes, K. (2015). Genome Analysis. In: Distributed and Sequential Algorithms for Bioinformatics. Computational Biology, vol 23. Springer, Cham. https://doi.org/10.1007/978-3-319-24966-7_9

DOI: https://doi.org/10.1007/978-3-319-24966-7_9
Published: 01 November 2015
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-24964-3
Online ISBN: 978-3-319-24966-7
eBook Packages: Computer Science, Computer Science (R0)