A Dominant Point-Based Algorithm for Finding Multiple Longest Common Subsequences in Comparative Genomics

Motghare, Manish M.; Voditel, Preeti S.

doi:10.1007/978-981-13-5953-8_25

Part of the book series: Advances in Intelligent Systems and Computing (AISC, volume 882)

Abstract

Finding the longest common subsequence (LCS) is a classic, well-studied problem in computer science; for an arbitrary number of input strings it is NP-hard. LCS has many applications in bioinformatics, computational genomics, image processing, file comparison, and other fields. Many algorithms exist for measuring the similarity between given strings, both for the general problem and for its special cases. Because biological data are growing tremendously and require efficient mechanisms to handle them, considerable effort has gone into reducing the time and space complexity of the problem. In this paper, we present a novel algorithm for the general case of the multiple LCS problem, i.e., finding a longest common subsequence of the two given strings. Our algorithm uses a dominant point approach to compute the LCS of the given strings. When applied to multiple strings of 1000, 2000, 3000, 4000, and 5000 characters each, our algorithm ran two to three orders of magnitude faster than existing algorithms and required less space.
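
To make the dominant point idea concrete, the following is a minimal Python sketch of a generic level-by-level dominant point search. It is not the algorithm of this chapter, whose details are not given in the abstract; it only assumes the textbook scheme in which each level keeps the non-dominated match points and extends them by one character. The names `successor_tables`, `minima`, and `mlcs_dominant_points` are hypothetical, and the quadratic `minima` filter is chosen for readability, not speed.

```python
from typing import Dict, List, Sequence, Tuple

Point = Tuple[int, ...]  # one index per input string


def successor_tables(strings: Sequence[str], alphabet: Sequence[str]) -> List[Dict[str, List[int]]]:
    """tables[i][c][j] = smallest index >= j in strings[i] holding c, or -1."""
    tables = []
    for s in strings:
        n = len(s)
        table = {c: [-1] * (n + 1) for c in alphabet}
        for j in range(n - 1, -1, -1):
            for c in alphabet:
                table[c][j] = j if s[j] == c else table[c][j + 1]
        tables.append(table)
    return tables


def minima(points: Sequence[Point]) -> List[Point]:
    """Keep the points that no other point dominates (componentwise <=)."""
    pts = list(points)
    return [
        p for p in pts
        if not any(q != p and all(a <= b for a, b in zip(q, p)) for q in pts)
    ]


def mlcs_dominant_points(strings: Sequence[str]) -> str:
    """Return one longest common subsequence of all input strings."""
    if not strings:
        return ""
    # Only characters present in every string can appear in the result.
    alphabet = sorted(set(strings[0]).intersection(*map(set, strings[1:])))
    tables = successor_tables(strings, alphabet)

    start: Point = (-1,) * len(strings)
    # Each level maps a dominant point to one common subsequence ending there.
    level: Dict[Point, str] = {start: ""}
    answer = ""
    while level:
        answer = next(iter(level.values()))
        candidates: Dict[Point, str] = {}
        for p, prefix in level.items():
            for c in alphabet:
                # Next occurrence of c strictly after p in every string.
                q = tuple(tables[i][c][p[i] + 1] for i in range(len(strings)))
                if -1 not in q:
                    candidates.setdefault(q, prefix + c)
        # Dominated candidates can never lead to a longer result; drop them.
        level = {q: candidates[q] for q in minima(list(candidates))}
    return answer


def is_subsequence(sub: str, s: str) -> bool:
    """True if sub can be obtained from s by deleting characters."""
    it = iter(s)
    return all(ch in it for ch in sub)


if __name__ == "__main__":
    print(mlcs_dominant_points(["ACGT", "AGT", "ACT"]))
```

The search stops at the first empty level, so the number of non-empty levels after the start equals the LCS length. Storing a full prefix per point keeps the sketch short; a practical implementation would more likely store parent pointers and use a faster minima computation.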

Author information

Authors and Affiliations

Department of Computer Application, Shri Ramdeobaba College of Engineering and Management, Nagpur, India
Manish M. Motghare & Preeti S. Voditel

Corresponding author

Correspondence to Manish M. Motghare.

Editor information

Editors and Affiliations

Chancellor, Central University of Karnataka, Kalaburagi, Karnataka, India
N. R. Shetty
INSA Senior Scientist, National Institute of Advanced Studies, Bangalore, Karnataka, India
L. M. Patnaik
Principal, Nitte Meenakshi Institute of Technology, Bangalore, Karnataka, India
H. C. Nagaraj
Professor, Nitte Meenakshi Institute of Technology, Bangalore, Karnataka, India
Prasad Naik Hamsavath
Professor, Nitte Meenakshi Institute of Technology, Bangalore, Karnataka, India
N. Nalini

Cite this paper

Motghare, M.M., Voditel, P.S. (2019). A Dominant Point-Based Algorithm for Finding Multiple Longest Common Subsequences in Comparative Genomics. In: Shetty, N., Patnaik, L., Nagaraj, H., Hamsavath, P., Nalini, N. (eds) Emerging Research in Computing, Information, Communication and Applications. Advances in Intelligent Systems and Computing, vol 882. Springer, Singapore. https://doi.org/10.1007/978-981-13-5953-8_25

DOI: https://doi.org/10.1007/978-981-13-5953-8_25
Published: 03 May 2019
Publisher Name: Springer, Singapore
Print ISBN: 978-981-13-5952-1
Online ISBN: 978-981-13-5953-8
eBook Packages: Engineering, Engineering (R0)
