An integrated algorithm for local sequence alignment

Abstract

Local sequence alignment (LSA) is an essential part of DNA sequencing. It supports biological identification, criminal investigation, disease identification, drug design and research. The large volume of biological data makes efficient analysis difficult, and managing the data in limited space has become a serious issue. We subdivide the data sets into segments, which reduces their size and allows more efficient memory use. This subdivision step is named the data reducing process (DRP) and is applied before dynamic programming (DP) and the Chapman–Kolmogorov equations (CKE); integrating DP and CKE makes the analysis faster. The approach needs less space than other methods, and its running time is also improved.
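The two ingredients the abstract names, subdividing the data and scoring by dynamic programming, can be illustrated with a minimal sketch. It is not the authors' DRP or their DP/CKE integration, which this abstract does not specify. The function names `split_segments` and `local_alignment_score`, the fixed segment size, and the scoring values (match 2, mismatch -1, gap -1) are all assumptions made for illustration. The scoring routine is a standard Smith–Waterman-style recurrence that keeps only two rows of the table, which is one common way to reduce memory use.

```python
def split_segments(seq, size):
    """Cut a sequence into consecutive chunks of at most `size` symbols.

    Hypothetical stand-in for a data-reducing step; the paper's actual
    subdivision rule is not given in the abstract.
    """
    if size <= 0:
        raise ValueError("size must be positive")
    return [seq[i:i + size] for i in range(0, len(seq), size)]


def local_alignment_score(a, b, match=2, mismatch=-1, gap=-1):
    """Best local alignment score of a and b (Smith-Waterman recurrence).

    Only the previous and current rows are stored, so memory grows with
    len(b) rather than len(a) * len(b). Scoring values are assumed.
    """
    prev = [0] * (len(b) + 1)
    best = 0
    for ch_a in a:
        curr = [0]
        for j, ch_b in enumerate(b, start=1):
            diag = prev[j - 1] + (match if ch_a == ch_b else mismatch)
            # A local alignment may restart anywhere, hence the floor at 0.
            score = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            curr.append(score)
            best = max(best, score)
        prev = curr
    return best


def best_segment_score(reference, query, size):
    """Score the query against each reference segment and keep the maximum."""
    return max(
        (local_alignment_score(seg, query) for seg in split_segments(reference, size)),
        default=0,
    )
```

Note that naive fixed-size segmentation, as used here, can miss an alignment that spans a segment boundary; a practical scheme would need overlapping segments or some other safeguard.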

Author information

Corresponding author

Correspondence to Sarwar Kamal.

About this article

Cite this article

Kamal, S., Khan, M.I. An integrated algorithm for local sequence alignment. Netw Model Anal Health Inform Bioinforma 3, 68 (2014). https://doi.org/10.1007/s13721-014-0068-8

  • DOI: https://doi.org/10.1007/s13721-014-0068-8

Keywords

  • Dynamic programming
  • Chapman–Kolmogorov equations
  • Data reducing process