Part of the book series: Computational Biology ((COBO,volume 23))

Abstract

This chapter brings together three problems in the coarse-grained analysis of biological sequences at the subsequence level: gene finding, genome rearrangement, and haplotype inference. Genes must first be located in DNA before they can be analyzed efficiently. Gene finding algorithms fall into two classes: statistical methods and comparison-based methods. Statistical methods search for sequences that occur with statistically higher frequency in or around genes, such as start and stop codons and sequence repeats. Comparison-based methods compare sequences to find similar structures, on the assumption that genes are more conserved throughout evolution. Subsequences of a genome undergo mutations, and discovering these chains of mutations provides information about the evolutionary process and about the disease state of an organism, since some of these rearrangements are thought to cause certain diseases. We review a few algorithms that discover the rearrangement events in a genome. The last problem we study is haplotype inference, the extraction of the data of a single chromosome from the combined data of two chromosomes. We describe a simple algorithm based on maximum parsimony and a probabilistic algorithm for this purpose. Distributed algorithms for these three important tasks are scarce; we propose one distributed algorithm for genome rearrangement and two distributed algorithms for haplotype inference.
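As a rough illustration of the signal-based idea mentioned in the abstract (looking for start and stop codons), the following Python sketch scans the three forward reading frames of a DNA string for open reading frames. It is not the chapter's algorithm: the function name `find_orfs`, the `min_codons` threshold, and the restriction to the forward strand are assumptions made for this example. Only the standard start codon ATG and the stop codons TAA, TAG, and TGA are used.

```python
# Minimal sketch (assumed names, forward strand only): report open reading
# frames that begin with ATG and end at the first in-frame stop codon.
START = "ATG"
STOPS = {"TAA", "TAG", "TGA"}


def find_orfs(dna, min_codons=2):
    """Return sorted (start, end) pairs; end is exclusive and includes the stop codon.

    min_codons is a hypothetical filter: the number of codons before the
    stop codon (counting the start codon) must be at least this value.
    """
    dna = dna.upper()
    orfs = []
    for frame in range(3):
        start = None  # index of the currently open ATG in this frame, if any
        for i in range(frame, len(dna) - 2, 3):
            codon = dna[i:i + 3]
            if start is None and codon == START:
                start = i
            elif start is not None and codon in STOPS:
                if (i - start) // 3 >= min_codons:
                    orfs.append((start, i + 3))
                start = None  # close the ORF and look for the next ATG
    return sorted(orfs)


if __name__ == "__main__":
    seq = "CCATGAAATTTTAGGG"
    for begin, end in find_orfs(seq):
        print(begin, end, seq[begin:end])
```

A practical gene finder would also scan the reverse complement and score candidates statistically; this sketch only shows the codon-signal step.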

Author information

Corresponding author

Correspondence to K. Erciyes.

Copyright information

© 2015 Springer International Publishing Switzerland

About this chapter

Cite this chapter

Erciyes, K. (2015). Genome Analysis. In: Distributed and Sequential Algorithms for Bioinformatics. Computational Biology, vol 23. Springer, Cham. https://doi.org/10.1007/978-3-319-24966-7_9

  • DOI: https://doi.org/10.1007/978-3-319-24966-7_9

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-24964-3

  • Online ISBN: 978-3-319-24966-7

  • eBook Packages: Computer Science, Computer Science (R0)
