Finding identical sequence repeats in multiple protein sequences: An algorithm

Maurya, Vikas Kumar; Sanjeevi, Madhumathi; Rahul, Chandrasekar Narayanan; Mohan, Ajitha; Ramachandran, Dhanalakshmi; Siddalingappa, Rashmi; Rauniyar, Roshan; Kanagaraj, Sekar

doi:10.1007/s12038-023-00410-x


Published: 28 February 2024

Journal of Biosciences, volume 49, article number 41 (2024)

Corresponding author: Sekar Kanagaraj (ORCID: orcid.org/0000-0002-9755-862X)


Abstract

In recent years, a growing body of experimental evidence has suggested that amino acid repeats are closely linked to many disease conditions, as they play a significant role in the evolution of disordered regions of polypeptide segments. Although many algorithms and databases have been developed for such analyses, each has caveats, such as limits on the number of amino acids within a repeat pattern and on the number of query protein sequences. To this end, the present work proposes, for the first time, a new method called internal sequence repeats across multiple protein sequences (ISRMPS) to identify identical repeats across multiple protein sequences. It also identifies distantly located repeat patterns in various protein sequences. Our method can be applied to studies of evolutionary relationships, epitope mapping, CRISPR-Cas sequencing methods and other comparative analyses of protein sequences.
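The abstract does not describe how ISRMPS works internally, so the following is only a minimal sketch of the general task it addresses, not the authors' algorithm. Under stated assumptions, it finds every identical substring of at least `min_len` residues that occurs in at least two of the input sequences, together with its 0-based start positions. The function name `shared_repeats`, the `min_len` parameter and the level-wise extension strategy are hypothetical, illustrative choices. The approach relies on one observation: if a string of length L+1 is shared by two sequences, its length-L prefix is shared as well, so only occurrences of already-shared patterns need to be extended.

```python
from collections import defaultdict

# Hypothetical illustration only; this is NOT the published ISRMPS method.
Occurrence = tuple[int, int]  # (sequence index, 0-based start position)


def shared_repeats(seqs: list[str], min_len: int = 3) -> dict[str, list[Occurrence]]:
    """Return identical substrings (length >= min_len) present in >= 2 distinct sequences."""
    if min_len < 1:
        raise ValueError("min_len must be at least 1")

    # Seed level: index every substring of length min_len in every sequence.
    level: dict[str, list[Occurrence]] = defaultdict(list)
    for s_idx, seq in enumerate(seqs):
        for start in range(len(seq) - min_len + 1):
            level[seq[start:start + min_len]].append((s_idx, start))

    result: dict[str, list[Occurrence]] = {}
    length = min_len
    while level:
        # Keep only patterns that occur in at least two different sequences.
        shared = {
            pattern: occ
            for pattern, occ in level.items()
            if len({s_idx for s_idx, _ in occ}) >= 2
        }
        result.update(shared)

        # Extend every occurrence of a shared pattern by one residue to the right.
        # A longer shared repeat must start at an occurrence of its shared prefix,
        # so no candidate is missed by extending only these occurrences.
        nxt: dict[str, list[Occurrence]] = defaultdict(list)
        for occ in shared.values():
            for s_idx, start in occ:
                end = start + length
                if end < len(seqs[s_idx]):
                    nxt[seqs[s_idx][start:end + 1]].append((s_idx, start))
        level = nxt
        length += 1
    return result


if __name__ == "__main__":
    repeats = shared_repeats(["MKTAYIAK", "GGTAYIAQ", "PPPP"])
    longest = max(repeats, key=len)
    print(longest, repeats[longest])
```

For example, on the three toy sequences above the sketch reports `TAYIA` starting at position 2 in the first two sequences, while `PPP`, which repeats only within the third sequence, is excluded. Because each occurrence is recorded by position, repeats are found however far apart they lie. Note that every shared substring is reported (`TAY` as well as `TAYIA`); restricting the output to maximal repeats would need an additional filtering step.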



Acknowledgements

KS thanks the ICMR for funding the project ‘Do protein sequence repeats play a role in biological process and disease conditions’ (ISRM/12(34)/2020). KS, RS and DR thank the Center for Development of Advanced Computing (CDAC) for funding the project ‘An Indian Initiative on setting up a high-fidelity structural data archival/retrieval system for Life Sciences-(PDBi)’. RS thanks the Department of Science and Technology-Science and Engineering Research Board (DST-SERB), New Delhi, India, for providing a research grant and a postdoctoral fellowship (PDF/2019/000254). AM acknowledges the Dr D S Kothari Postdoctoral Fellowship (BL/18-19/0320), funded by the University Grants Commission (UGC). All authors thank the Department of Computational and Data Sciences, Indian Institute of Science, Bengaluru, India, for providing the necessary support.

Author information

Authors and Affiliations

Department of Computational and Data Sciences, Indian Institute of Science, Bengaluru, 560 012, India
Vikas Kumar Maurya, Madhumathi Sanjeevi, Chandrasekar Narayanan Rahul, Ajitha Mohan, Dhanalakshmi Ramachandran, Rashmi Siddalingappa, Roshan Rauniyar & Sekar Kanagaraj


Contributions

Prof. SK conceptualized the study. VKM and RS devised the methodology. AM, MS, RS and DR assessed the algorithm and methods. CNR and AM contributed the case studies. AM and RR wrote the manuscript. MS and CNR reviewed the manuscript.

Corresponding author

Correspondence to Sekar Kanagaraj.

Ethics declarations

Conflict of interest

The authors declare that they have no financial or research conflicts of interest related to this manuscript.

Additional information

Corresponding editor: Deepesh Nagarajan

Supplementary Information

The electronic supplementary material consists of the following files.

Supplementary file1 (PDF 128 KB)

Supplementary file2 (PDF 1530 KB)

Supplementary file3 (PDF 147 KB)

Supplementary file4 (PDF 4490 KB)

Supplementary file5 (PDF 25100 KB)

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article

Cite this article

Maurya, V.K., Sanjeevi, M., Rahul, C.N. et al. Finding identical sequence repeats in multiple protein sequences: An algorithm. J Biosci 49, 41 (2024). https://doi.org/10.1007/s12038-023-00410-x


Received: 31 October 2022
Accepted: 16 October 2023
Published: 28 February 2024
DOI: https://doi.org/10.1007/s12038-023-00410-x
