Abstract
Local sequence alignment (LSA) is an essential part of DNA sequencing. LSA helps to identify the facts in biological identity, criminal investigations, disease identification, drug design and research. Large volume of biological data makes difficulties to the performance of efficient analysis and proper management of data in small space has become a serious issue. We have subdivided the data sets into various segments to reduce the data sets as well as for efficient memory use. The integration of dynamic programming (DP) and Chapman–Kolmogorov equations (CKE) makes the analysis faster. The subdivision process is named data reducing process (DRP). DRP is imposed before DP and CKE. This approach needs less space compared with other methods and the time requirement is also improved.
Similar content being viewed by others
References
Akl S (1985) Parallel sorting algorithms. Academic Press, USA
Alexandersson M, Cawley S, Pachter L (2003) SLAM: cross species gene finding and alignment with a generalized pair hidden Markov model. Genome Res 13:496–502
Altschul SF, Gish Miller W, Myers EW, Lipman DJ (1990) Basic local alignment search tool. J Mol Biol 215:403–410
Arratia R, Morris P, Waterman MS (1988) Stochastic scrabbles: a law of large numbers for sequence matching with scores. J Appl Probab 25:106–119
Batzoglou S, Pachter L, Mesirov JP, Berger B, Lander ES (2000a) Human and mouse gene structure: comparative analysis and application to exon prediction. Genome Res 10:950–958
Batzoglou S, Pachter L, Mesirov JP, Berger B, Lander ES (2000b) Human and mouse gene structure: comparative analysis and application to exon prediction. Genome Res 10:950–958
Bray N, Dubchak I, Pachter L (2003) Avid: a global alignment program. Genome Res 13:97–102
Claverie JM, Poirot O, Lopez F (1997) The difficulty of identifying genes in anonymous vertebrate sequences. Comput Chem 21:203–214
Delcher AL et al (2002) Fast algorithms for large-scale genome alignment and comparison. Nucleic Acids Res 30(11):2478–2483
Delcher AL et al (2007) Identifying bacterial genes and endosymbiont DNA with Glimmer. Bioinformatics 23:673–679
Dembo A, Karlin S (1990) Strong limit theorems of empirical functional for large exceedances of partial sums of IID variables. Ann Probab 19(4):1737–1755
Dhar PK, Thwin ST, Tun K, Tsumoto Y, Maurer-Stroh S, Eisenhaber F, Surana U (2009) Synthesizing non-natural parts from natural genomic template. J Biol Eng 3:2
Doolittle RF (1996) Methods in enzymology, vol 266. Academic Press, San Diego
Ewing B, Green P (1998) Base-calling of automated sequencer traces using phred. II. Error probabilities. Genome Res 8:186–194
Fleischmann RD, Adams MD, White O, Clayton RA, Kirkness EF, Kerlavage AR (1995) Whole-genome random sequencing and assembly of Haemophilus influenza Rd. Science 269:496–512
Furey T, Kent WJ, Sugnet C, Roskin K, Pringle T, Zahler A, Haussler D (2002) The human genome browser at UCSC. Genome Res 12:996–1006
Karlin S, Altschul SF (1990) Methods for assessing the statistical significance of molecular sequence features by using general scoring schemes. Proc Natl Acad Sci USA 87:2264–2268
Karlin S, Altschul SF (1993) Applications and statistics for multiple high-scoring segments in molecular sequences. Proc Natl Acad Sci USA 90:5873–5877
Keller O, Kollmar M, Stanke M, Waack S (2011) A novel hybrid gene prediction method employing protein multiple sequence alignments. Bioinformatics 27:757–763
Kent WJ (2002) BLAT—the BLAST-like alignment tool. Genome Res 12:656–664
Khan MI, Kamal MS (2013a) RSAM: an integrated algorithm for local sequence alignment. Arch Des Sci 66(5):395–412 (ISSN 1661-464X)
Khan MI, Kamal MS (2013b) Sequencing ontology alignment for DNA annotation and damage identification. Eur J Sci Res 103(3):441–450
Lewis D (1992) An evaluation of phrasal and clustered representations on a text categorization task. In: Proceedings of the 15th annual international ACM SIGIR conference on Research and development in information retrieval, ACM, vol 15, pp 37–50
Lewis D, Schapire R, Callan J, Papka R (1996) Training algorithms for linear text classifiers. In: Proceedings of the 19th annual international ACM SIGIR conference on Research and development in information retrieval, ACM, pp 298–306
Lipman DJ, Pearson WR (1985) Rapid and sensitive protein similarity searches. Science 227:1435–1441
Lipman DJ, Pearson WR (1988) Improved tools for biological sequence comparison. Proc Natl Acad Sci USA 85:2444–2448
Needleman SB, Wunsch CD (1970) A general method applicable to the search for similarities in the amino acid sequence of two proteins. J Mol Biol 48:443–453
Ning Z, Cox AJ, Mullikin JC (2001) SSAHA: “A fast search method for large DNA databases”. Genome Res 11:1725–1729
Pati A, Ivanova NN, Mikhailova N, Ovchinnikova G, Hooper SD, Lykidis A (2010) GenePRIMP: a gene prediction improvement pipeline for prokaryotic genomes. Nat Methods 7:455–457
Robert WF (2002) Molecular biology, 2nd edn. McGraw-Hill, New York, pp 7105–7107 (ISBN: 0-07-112287-7)
Ruiz M, Srinivasan P (1999) Hierarchical neural networks for text categorization (poster abstract). In: Proceedings of the 22nd annual international ACM SIGIR conference on Research and development in information retrieval, ACM, pp 281–282
Schatz MC et al (2007) High-throughput sequence alignment using Graphics Processing Units. BMC Bioinform 8:474
Schwarz DS, Hutvagner G, Du T, Xu Z, Aronin N, Zamore PD (2003) Asymmetry in the assembly of the RNAi enzyme complex. Cell 115:199
Smith TF, Waterman MS (1981) Comparison of bio-sequences. Adv Appl Math 2:482–489
Stephens M, Sloan JS, Robertson PD, Scheet P, Nickerson DA (2006) Automating sequence-based detection and genotyping of SNPs from diploid samples. Nat Genet 38:375–381
Stormo GD, Schneider TD, Gold L, Ehrenfeucht A (1982) A use of the ‘Perceptron’ algorithm to distinguish translation initiation site in E. coli. Nucleic Acids Res. 10:2997–3011
Tech M, Meinicke P (2006) An unsupervised classification scheme for improving predictions of prokaryotic TIS. BMC Bioinformatics 7:121
van Baren MJ, Brent MR (2006) Iterative gene prediction and pseudo gene removal improves genome annotation. Genome Res 16:678–685
Waqaar H, Alex A, Bharath R (2008) An efficient algorithm for local sequence alignment. In: 30th Annual international IEEE EMBS conference vancouver, British Columbia, Canada, August 20–24, 2008
Watanabe T, Takeda A, Mise K, Okuno T, Suzuki T, Minami N, Imai H (2005) Stage-specific expression of microRNAs during Xenopus development. FEBS Lett 579:318
Waterman MS (1989) Mathematical methods for DNA sequences. CRC Press, Boca Raton
Waterman MS (1994) Introduction to computational biology. Chapman & Hall, London
Weckx S, Del-Favero J, Rademakers R, Claes L, Cruts M, De Jonghe P, Van Broeckhoven C, De Rijk P (2005) novoSNP, a novel computational tool for sequence variation discovery. Genome Res 15:436–442
Wu WS et al (2006) Computational reconstruction of transcriptional regulatory modules of the yeast cell cycle. BMC Bioinform 7:421
Yetisgen-Yildiz M, Pratt W (2005) The effect of feature representation on Medline document classification. In AMIA Annual Symposium Proceedings. American Medical Informatics Association, vol 23, p 849
Yok NG, Rosen GL (2011) Combining gene prediction methods to improve meta genomic gene annotation. BMC Bioinform 12:20
Yu GX, Snyder EE, Boyle SM, Crasta OR, Czar M, Mane SP et al (2007) A versatile computational pipeline for bacterial genome annotation improvement and comparative analysis, with Brucella as a use case. Nucleic Acids Res 35:3953–3962
Zhang J, Wheeler DA, Yakub I, Wei S, Sood R, Rowe W, Liu PP, Gibbs RA, Buetow KH (2005) SNP detector: a software tool for sensitive and accurate SNP detection. PLoS Comput Biol 1(5):e53
Zhu HQ, Hu GQ, Ouyang ZQ, Wang J, She ZS (2004) Accuracy improvement for identifying translation initiation sites in microbial genomes. Bioinformatics 20:3308–3317
Zhu HQ, Hu GQ, Yang YF, Wang J, She ZS (2007) MED: a new non-supervised gene prediction algorithm for bacterial and archaeal genomes”. BMC Bioinform 8:97
Author information
Authors and Affiliations
Corresponding author
Rights and permissions
About this article
Cite this article
Kamal, S., Khan, M.I. An integrated algorithm for local sequence alignment. Netw Model Anal Health Inform Bioinforma 3, 68 (2014). https://doi.org/10.1007/s13721-014-0068-8
Received:
Revised:
Accepted:
Published:
DOI: https://doi.org/10.1007/s13721-014-0068-8