A Dominant Point-Based Algorithm for Finding Multiple Longest Common Subsequences in Comparative Genomics

  • Conference paper
Emerging Research in Computing, Information, Communication and Applications

Part of the book series: Advances in Intelligent Systems and Computing (AISC, volume 882)

Abstract

Finding the longest common subsequence (LCS) is a classic, well-studied problem in computer science; for an arbitrary number of input strings it is NP-hard. LCS has many applications in bioinformatics, computational genomics, image processing, file comparison, and other fields. Many algorithms exist for measuring the similarity between given strings and for special cases of the problem. Because the volume of biological data is increasing tremendously and efficient mechanisms are required to handle it, considerable effort has gone into reducing the time and space complexity of the problem. In this paper, we present a novel algorithm for the general case of the multiple LCS (MLCS) problem, i.e., finding a longest common subsequence of the given strings. Our algorithm uses the dominant point approach to compute the LCS of the given strings. When applied to multiple strings of 1000, 2000, 3000, 4000, and 5000 characters each, our algorithm runs two to three orders of magnitude faster than existing algorithms and requires less space.
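
To make the dominant point idea concrete, the following is a minimal Python sketch of a generic level-by-level dominant point search. It is not the algorithm proposed in the paper, whose details are not given in the abstract. The function names (build_positions, successor, dominates, mlcs) are hypothetical, and the quadratic pruning step is chosen for readability rather than speed. A point is a tuple holding one index per string; point q dominates point p when q is no larger than p in every coordinate, so any common subsequence that can be extended from p can also be extended from q, and p can be discarded.

```python
from bisect import bisect_right


def build_positions(s):
    """Map each character to the sorted list of indices where it occurs in s."""
    pos = {}
    for i, ch in enumerate(s):
        pos.setdefault(ch, []).append(i)
    return pos


def successor(positions, point, ch):
    """Return the nearest match point for ch strictly after `point`, or None."""
    nxt = []
    for pos, p in zip(positions, point):
        occ = pos.get(ch)
        if occ is None:
            return None
        k = bisect_right(occ, p)  # first occurrence with index > p
        if k == len(occ):
            return None
        nxt.append(occ[k])
    return tuple(nxt)


def dominates(q, p):
    """True if q is a different point that is <= p in every coordinate."""
    return q != p and all(a <= b for a, b in zip(q, p))


def mlcs(strings):
    """Return one longest common subsequence of all the given strings."""
    if not strings:
        return ""
    positions = [build_positions(s) for s in strings]
    # Only characters present in every string can appear in the answer.
    alphabet = sorted(set(strings[0]).intersection(*map(set, strings[1:])))
    # Level k maps each non-dominated point to one common subsequence of
    # length k ending at that point. Level 0 is a virtual start point.
    level = {(-1,) * len(strings): ""}
    best = ""
    while level:
        candidates = {}
        for point, seq in level.items():
            for ch in alphabet:
                nxt = successor(positions, point, ch)
                if nxt is not None and nxt not in candidates:
                    candidates[nxt] = seq + ch
        # Keep only the dominant (non-dominated) points of the next level.
        level = {p: s for p, s in candidates.items()
                 if not any(dominates(q, p) for q in candidates)}
        if level:
            best = next(iter(level.values()))
    return best
```

For example, mlcs(["AGGTAB", "GXTXAYB"]) explores four levels and returns "GTAB". The search keeps only the current level in memory, which is the usual motivation for dominant point methods over a full dynamic programming table when the number of strings grows.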

Author information

Corresponding author

Correspondence to Manish M. Motghare.

Copyright information

© 2019 Springer Nature Singapore Pte Ltd.

About this paper

Cite this paper

Motghare, M.M., Voditel, P.S. (2019). A Dominant Point-Based Algorithm for Finding Multiple Longest Common Subsequences in Comparative Genomics. In: Shetty, N., Patnaik, L., Nagaraj, H., Hamsavath, P., Nalini, N. (eds) Emerging Research in Computing, Information, Communication and Applications. Advances in Intelligent Systems and Computing, vol 882. Springer, Singapore. https://doi.org/10.1007/978-981-13-5953-8_25

  • DOI: https://doi.org/10.1007/978-981-13-5953-8_25

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-13-5952-1

  • Online ISBN: 978-981-13-5953-8

  • eBook Packages: Engineering, Engineering (R0)
