An integrated algorithm for local sequence alignment

Kamal, Sarwar; Khan, Mohammad Ibrahim

doi:10.1007/s13721-014-0068-8

An integrated algorithm for local sequence alignment

Original Article
Published: 19 August 2014

Volume 3, article number 68, (2014)
Cite this article

Network Modeling Analysis in Health Informatics and Bioinformatics Aims and scope Submit manuscript

Sarwar Kamal¹ &
Mohammad Ibrahim Khan¹

224 Accesses
4 Citations
Explore all metrics

Abstract

Local sequence alignment (LSA) is an essential part of DNA sequencing. LSA helps to identify the facts in biological identity, criminal investigations, disease identification, drug design and research. Large volume of biological data makes difficulties to the performance of efficient analysis and proper management of data in small space has become a serious issue. We have subdivided the data sets into various segments to reduce the data sets as well as for efficient memory use. The integration of dynamic programming (DP) and Chapman–Kolmogorov equations (CKE) makes the analysis faster. The subdivision process is named data reducing process (DRP). DRP is imposed before DP and CKE. This approach needs less space compared with other methods and the time requirement is also improved.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Log in via an institution

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Institutional subscriptions

References

Akl S (1985) Parallel sorting algorithms. Academic Press, USA
MATH Google Scholar
Alexandersson M, Cawley S, Pachter L (2003) SLAM: cross species gene finding and alignment with a generalized pair hidden Markov model. Genome Res 13:496–502
Article Google Scholar
Altschul SF, Gish Miller W, Myers EW, Lipman DJ (1990) Basic local alignment search tool. J Mol Biol 215:403–410
Article Google Scholar
Arratia R, Morris P, Waterman MS (1988) Stochastic scrabbles: a law of large numbers for sequence matching with scores. J Appl Probab 25:106–119
Article MATH MathSciNet Google Scholar
Batzoglou S, Pachter L, Mesirov JP, Berger B, Lander ES (2000a) Human and mouse gene structure: comparative analysis and application to exon prediction. Genome Res 10:950–958
Article Google Scholar
Batzoglou S, Pachter L, Mesirov JP, Berger B, Lander ES (2000b) Human and mouse gene structure: comparative analysis and application to exon prediction. Genome Res 10:950–958
Article Google Scholar
Bray N, Dubchak I, Pachter L (2003) Avid: a global alignment program. Genome Res 13:97–102
Article Google Scholar
Claverie JM, Poirot O, Lopez F (1997) The difficulty of identifying genes in anonymous vertebrate sequences. Comput Chem 21:203–214
Article Google Scholar
Delcher AL et al (2002) Fast algorithms for large-scale genome alignment and comparison. Nucleic Acids Res 30(11):2478–2483
Article Google Scholar
Delcher AL et al (2007) Identifying bacterial genes and endosymbiont DNA with Glimmer. Bioinformatics 23:673–679
Article Google Scholar
Dembo A, Karlin S (1990) Strong limit theorems of empirical functional for large exceedances of partial sums of IID variables. Ann Probab 19(4):1737–1755
Article MathSciNet Google Scholar
Dhar PK, Thwin ST, Tun K, Tsumoto Y, Maurer-Stroh S, Eisenhaber F, Surana U (2009) Synthesizing non-natural parts from natural genomic template. J Biol Eng 3:2
Article Google Scholar
Doolittle RF (1996) Methods in enzymology, vol 266. Academic Press, San Diego
Google Scholar
Ewing B, Green P (1998) Base-calling of automated sequencer traces using phred. II. Error probabilities. Genome Res 8:186–194
Article Google Scholar
Fleischmann RD, Adams MD, White O, Clayton RA, Kirkness EF, Kerlavage AR (1995) Whole-genome random sequencing and assembly of Haemophilus influenza Rd. Science 269:496–512
Article Google Scholar
Furey T, Kent WJ, Sugnet C, Roskin K, Pringle T, Zahler A, Haussler D (2002) The human genome browser at UCSC. Genome Res 12:996–1006
Article Google Scholar
Karlin S, Altschul SF (1990) Methods for assessing the statistical significance of molecular sequence features by using general scoring schemes. Proc Natl Acad Sci USA 87:2264–2268
Article MATH Google Scholar
Karlin S, Altschul SF (1993) Applications and statistics for multiple high-scoring segments in molecular sequences. Proc Natl Acad Sci USA 90:5873–5877
Article Google Scholar
Keller O, Kollmar M, Stanke M, Waack S (2011) A novel hybrid gene prediction method employing protein multiple sequence alignments. Bioinformatics 27:757–763
Article Google Scholar
Kent WJ (2002) BLAT—the BLAST-like alignment tool. Genome Res 12:656–664
Article MathSciNet Google Scholar
Khan MI, Kamal MS (2013a) RSAM: an integrated algorithm for local sequence alignment. Arch Des Sci 66(5):395–412 (ISSN 1661-464X)
Google Scholar
Khan MI, Kamal MS (2013b) Sequencing ontology alignment for DNA annotation and damage identification. Eur J Sci Res 103(3):441–450
Google Scholar
Lewis D (1992) An evaluation of phrasal and clustered representations on a text categorization task. In: Proceedings of the 15th annual international ACM SIGIR conference on Research and development in information retrieval, ACM, vol 15, pp 37–50
Lewis D, Schapire R, Callan J, Papka R (1996) Training algorithms for linear text classifiers. In: Proceedings of the 19th annual international ACM SIGIR conference on Research and development in information retrieval, ACM, pp 298–306
Lipman DJ, Pearson WR (1985) Rapid and sensitive protein similarity searches. Science 227:1435–1441
Article Google Scholar
Lipman DJ, Pearson WR (1988) Improved tools for biological sequence comparison. Proc Natl Acad Sci USA 85:2444–2448
Article Google Scholar
Needleman SB, Wunsch CD (1970) A general method applicable to the search for similarities in the amino acid sequence of two proteins. J Mol Biol 48:443–453
Article Google Scholar
Ning Z, Cox AJ, Mullikin JC (2001) SSAHA: “A fast search method for large DNA databases”. Genome Res 11:1725–1729
Article Google Scholar
Pati A, Ivanova NN, Mikhailova N, Ovchinnikova G, Hooper SD, Lykidis A (2010) GenePRIMP: a gene prediction improvement pipeline for prokaryotic genomes. Nat Methods 7:455–457
Article Google Scholar
Robert WF (2002) Molecular biology, 2nd edn. McGraw-Hill, New York, pp 7105–7107 (ISBN: 0-07-112287-7)
Google Scholar
Ruiz M, Srinivasan P (1999) Hierarchical neural networks for text categorization (poster abstract). In: Proceedings of the 22nd annual international ACM SIGIR conference on Research and development in information retrieval, ACM, pp 281–282
Schatz MC et al (2007) High-throughput sequence alignment using Graphics Processing Units. BMC Bioinform 8:474
Article MathSciNet Google Scholar
Schwarz DS, Hutvagner G, Du T, Xu Z, Aronin N, Zamore PD (2003) Asymmetry in the assembly of the RNAi enzyme complex. Cell 115:199
Article Google Scholar
Smith TF, Waterman MS (1981) Comparison of bio-sequences. Adv Appl Math 2:482–489
Article MATH MathSciNet Google Scholar
Stephens M, Sloan JS, Robertson PD, Scheet P, Nickerson DA (2006) Automating sequence-based detection and genotyping of SNPs from diploid samples. Nat Genet 38:375–381
Article Google Scholar
Stormo GD, Schneider TD, Gold L, Ehrenfeucht A (1982) A use of the ‘Perceptron’ algorithm to distinguish translation initiation site in E. coli. Nucleic Acids Res. 10:2997–3011
Article Google Scholar
Tech M, Meinicke P (2006) An unsupervised classification scheme for improving predictions of prokaryotic TIS. BMC Bioinformatics 7:121
Article Google Scholar
van Baren MJ, Brent MR (2006) Iterative gene prediction and pseudo gene removal improves genome annotation. Genome Res 16:678–685
Article Google Scholar
Waqaar H, Alex A, Bharath R (2008) An efficient algorithm for local sequence alignment. In: 30th Annual international IEEE EMBS conference vancouver, British Columbia, Canada, August 20–24, 2008
Watanabe T, Takeda A, Mise K, Okuno T, Suzuki T, Minami N, Imai H (2005) Stage-specific expression of microRNAs during Xenopus development. FEBS Lett 579:318
Article Google Scholar
Waterman MS (1989) Mathematical methods for DNA sequences. CRC Press, Boca Raton
MATH Google Scholar
Waterman MS (1994) Introduction to computational biology. Chapman & Hall, London
Google Scholar
Weckx S, Del-Favero J, Rademakers R, Claes L, Cruts M, De Jonghe P, Van Broeckhoven C, De Rijk P (2005) novoSNP, a novel computational tool for sequence variation discovery. Genome Res 15:436–442
Article Google Scholar
Wu WS et al (2006) Computational reconstruction of transcriptional regulatory modules of the yeast cell cycle. BMC Bioinform 7:421
Article Google Scholar
Yetisgen-Yildiz M, Pratt W (2005) The effect of feature representation on Medline document classification. In AMIA Annual Symposium Proceedings. American Medical Informatics Association, vol 23, p 849
Yok NG, Rosen GL (2011) Combining gene prediction methods to improve meta genomic gene annotation. BMC Bioinform 12:20
Article Google Scholar
Yu GX, Snyder EE, Boyle SM, Crasta OR, Czar M, Mane SP et al (2007) A versatile computational pipeline for bacterial genome annotation improvement and comparative analysis, with Brucella as a use case. Nucleic Acids Res 35:3953–3962
Article Google Scholar
Zhang J, Wheeler DA, Yakub I, Wei S, Sood R, Rowe W, Liu PP, Gibbs RA, Buetow KH (2005) SNP detector: a software tool for sensitive and accurate SNP detection. PLoS Comput Biol 1(5):e53
Zhu HQ, Hu GQ, Ouyang ZQ, Wang J, She ZS (2004) Accuracy improvement for identifying translation initiation sites in microbial genomes. Bioinformatics 20:3308–3317
Article Google Scholar
Zhu HQ, Hu GQ, Yang YF, Wang J, She ZS (2007) MED: a new non-supervised gene prediction algorithm for bacterial and archaeal genomes”. BMC Bioinform 8:97
Article Google Scholar

Download references

Author information

Authors and Affiliations

Department of Computer Science and Engineering, Chittagong University of Engineering and Technology, Chittagong, Bangladesh
Sarwar Kamal & Mohammad Ibrahim Khan

Authors

Sarwar Kamal
View author publications
You can also search for this author in PubMed Google Scholar
Mohammad Ibrahim Khan
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to Sarwar Kamal.

Rights and permissions

Reprints and permissions

About this article

Cite this article

Kamal, S., Khan, M.I. An integrated algorithm for local sequence alignment. Netw Model Anal Health Inform Bioinforma 3, 68 (2014). https://doi.org/10.1007/s13721-014-0068-8

Download citation

Received: 27 April 2014
Revised: 22 July 2014
Accepted: 25 July 2014
Published: 19 August 2014
DOI: https://doi.org/10.1007/s13721-014-0068-8

Keywords

Access this article

Log in via an institution

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Institutional subscriptions

An integrated algorithm for local sequence alignment

Abstract

Access this article

Similar content being viewed by others

Global Common Sequence Alignment Using Dynamic Window Algorithm

Sequence Alignment, Analysis, and Bioinformatic Pipelines

Multiple Sequence Alignment

References

Author information

Authors and Affiliations

Corresponding author

Rights and permissions

About this article

Cite this article

Keywords

Navigation

An integrated algorithm for local sequence alignment

Abstract

Access this article

Similar content being viewed by others

Global Common Sequence Alignment Using Dynamic Window Algorithm

Sequence Alignment, Analysis, and Bioinformatic Pipelines

Multiple Sequence Alignment

References

Author information

Authors and Affiliations

Corresponding author

Rights and permissions

About this article

Cite this article

Share this article

Keywords

Search

Navigation