Abstract
Finding the longest common subsequence (LCS) is a classic, well-studied problem in computer science; for an arbitrary number of strings it is NP-hard. LCS has many applications in bioinformatics, computational genomics, image processing, file comparison, and related fields. Many algorithms exist for measuring the similarity between given strings and for special cases of the problem. Because biological data is growing rapidly and demands efficient processing, considerable effort has gone into reducing the time and space complexity of the problem. In this paper, we present a novel algorithm for the general case of the multiple LCS problem, i.e., finding a longest common subsequence of the given strings. Our algorithm uses the dominant point approach to compute the LCS of the given strings. When applied to multiple strings of length 1000, 2000, 3000, 4000, and 5000 characters each, our algorithm runs two to three orders of magnitude faster than the existing algorithms and requires less space.
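To make the terminology concrete, the following is a minimal sketch of the textbook dynamic-programming LCS for two strings, together with a brute-force listing of the dominant points at each level. It is not the algorithm proposed in this chapter, which the abstract does not specify. The function names (`lcs_table`, `lcs`, `dominant_points`) are hypothetical, and the definition of dominance assumed here is the common one: a match (i, j) at level k is k-dominant if no other match at level k has both coordinates less than or equal to it.

```python
def lcs_table(a, b):
    """Textbook O(len(a) * len(b)) table; L[i][j] is the LCS length of a[:i] and b[:j]."""
    n, m = len(a), len(b)
    L = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            if a[i - 1] == b[j - 1]:
                L[i][j] = L[i - 1][j - 1] + 1
            else:
                L[i][j] = max(L[i - 1][j], L[i][j - 1])
    return L


def lcs(a, b):
    """Recover one longest common subsequence by tracing back through the table."""
    L = lcs_table(a, b)
    i, j = len(a), len(b)
    out = []
    while i > 0 and j > 0:
        if a[i - 1] == b[j - 1]:
            out.append(a[i - 1])
            i -= 1
            j -= 1
        elif L[i - 1][j] >= L[i][j - 1]:
            i -= 1
        else:
            j -= 1
    return "".join(reversed(out))


def dominant_points(a, b):
    """Map each level k to its k-dominant matches (1-based positions).

    Assumed definition: a match (i, j) with L[i][j] == k is k-dominant if no
    other match at level k has both coordinates <= (i, j). This brute-force
    filter is for illustration only, not for efficiency.
    """
    L = lcs_table(a, b)
    levels = {}
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                levels.setdefault(L[i][j], []).append((i, j))
    return {
        k: [p for p in pts
            if not any(q != p and q[0] <= p[0] and q[1] <= p[1] for q in pts)]
        for k, pts in levels.items()
    }


if __name__ == "__main__":
    print(lcs("AGGTAB", "GXTXAYB"))            # GTAB
    print(dominant_points("AGGTAB", "GXTXAYB"))
```

Dominant-point methods in general avoid filling the whole table and instead work only with the dominant points of each level; the sketch above derives them from the full table purely to show what they are.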
Copyright information
© 2019 Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Motghare, M.M., Voditel, P.S. (2019). A Dominant Point-Based Algorithm for Finding Multiple Longest Common Subsequences in Comparative Genomics. In: Shetty, N., Patnaik, L., Nagaraj, H., Hamsavath, P., Nalini, N. (eds) Emerging Research in Computing, Information, Communication and Applications. Advances in Intelligent Systems and Computing, vol 882. Springer, Singapore. https://doi.org/10.1007/978-981-13-5953-8_25
DOI: https://doi.org/10.1007/978-981-13-5953-8_25
Published:
Publisher Name: Springer, Singapore
Print ISBN: 978-981-13-5952-1
Online ISBN: 978-981-13-5953-8
eBook Packages: Engineering, Engineering (R0)