Abstract
Over the last decade, numerous studies have demonstrated the fundamental importance of tandem repeat (TR) proteins in many biological processes, and a plethora of new repeat structures have been solved. The recently published RepeatsDB provides information on TR protein structures. However, a detailed structural characterization of repetitive elements is largely missing, as repeat unit annotation is manually curated and currently covers only 3 % of the bona fide TR proteins. Repeat Protein Unit Predictor (ReUPred) is a novel method for the fast automatic prediction of repeat units and the classification of repeats, using an extensive Structure Repeat Unit Library (SRUL) derived from RepeatsDB. ReUPred performs an iterative structural search against the SRUL to find repetitive units. On a test set of solenoid proteins, ReUPred correctly detects 92 % of the proteins and, unlike previous methods, also correctly classifies solenoid repeats in 89 % of cases. It also outperforms two recent state-of-the-art methods on the repeat unit identification problem. Accurate prediction of repeat units increases the number of annotated repeat units by an order of magnitude compared to the sequence-based Pfam classification. ReUPred is implemented in Python for Linux and is freely available at http://protein.bio.unipd.it/reupred/.
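The idea of an iterative structural search against a library of repeat units can be illustrated with a minimal Python sketch. It is not ReUPred's implementation, and several parts are assumptions: structural similarity is scored with a simple distance RMSD (dRMSD) over Cα coordinates rather than a full structural aligner, the SRUL is modelled as a hypothetical dictionary of labelled template units, and the threshold, class labels and helper names (`drmsd`, `find_units`, `predict_units`) are illustrative only. For each template, the search repeatedly takes the best-scoring window of still-unassigned residues, masks it, and stops when no window scores within the threshold; the template whose units cover the most residues supplies the predicted class.

```python
import math
from itertools import combinations
from typing import Dict, List, Optional, Sequence, Tuple

Coord = Tuple[float, float, float]  # one C-alpha position (x, y, z) in angstroms
Unit = Tuple[int, int]              # inclusive (start, end) residue indices


def drmsd(a: Sequence[Coord], b: Sequence[Coord]) -> float:
    """Distance RMSD: compares internal CA-CA distances, so no superposition
    is needed. Assumed stand-in for a real structural aligner."""
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("traces must have equal length >= 2")
    sq = [(math.dist(a[i], a[j]) - math.dist(b[i], b[j])) ** 2
          for i, j in combinations(range(len(a)), 2)]
    return math.sqrt(sum(sq) / len(sq))


def find_units(query: Sequence[Coord], template: Sequence[Coord],
               threshold: float) -> List[Unit]:
    """Greedy iterative search: take the best-scoring window of unused residues,
    mask it, and repeat until no window scores within the threshold."""
    n, k = len(query), len(template)
    free = [True] * n
    units: List[Unit] = []
    while True:
        best: Optional[Tuple[float, int]] = None
        for start in range(n - k + 1):
            if not all(free[start:start + k]):
                continue  # window overlaps an already assigned unit
            score = drmsd(query[start:start + k], template)
            if score <= threshold and (best is None or score < best[0]):
                best = (score, start)
        if best is None:
            break
        start = best[1]
        free[start:start + k] = [False] * k
        units.append((start, start + k - 1))
    return sorted(units)


def predict_units(query: Sequence[Coord],
                  library: Dict[str, Tuple[str, List[Coord]]],
                  threshold: float) -> Tuple[Optional[str], List[Unit]]:
    """Search with every (hypothetical) library unit; keep the one whose units
    cover the most residues, requiring at least two units (a tandem repeat),
    and report that template's class label."""
    best_class: Optional[str] = None
    best_units: List[Unit] = []
    best_cov = 0
    for repeat_class, template in library.values():
        units = find_units(query, template, threshold)
        cov = sum(end - start + 1 for start, end in units)
        if len(units) >= 2 and cov > best_cov:
            best_class, best_units, best_cov = repeat_class, units, cov
    return best_class, best_units


if __name__ == "__main__":
    # Toy query: one 4-residue unit translated three times along x.
    unit = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (1.5, 1.5, 0.0), (0.0, 1.5, 1.5)]
    query = [(x + 5.0 * r, y, z) for r in range(3) for (x, y, z) in unit]
    # Hypothetical two-entry "library": a straight-line decoy and the true unit.
    library = {
        "decoy": ("straight_line", [(float(i), 0.0, 0.0) for i in range(4)]),
        "toy": ("toy_solenoid", unit),
    }
    print(predict_units(query, library, threshold=0.5))
```

On this toy input the decoy template matches no window, while the true unit is recovered three times, so the sketch reports the `toy_solenoid` label with units (0, 3), (4, 7) and (8, 11). dRMSD was chosen only because it needs no superposition and runs with the standard library (Python 3.8+ for `math.dist`); a realistic pipeline would use a proper structural alignment score and a curated unit library.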
Acknowledgments
The authors are grateful to members of the BioComputing UP lab for insightful discussions. D.P. is funded by FIRC project no. 16621. This project was partially supported by AIRC grant IG17753 and Elixir-Ita.
Ethics declarations
Conflict of interest
The authors declare no conflict of interest.
Research involving human participants and/or animals
This article does not contain any studies involving human participants or animals.
Additional information
Layla Hirsh and Damiano Piovesan contributed equally.
About this article
Cite this article
Hirsh, L., Piovesan, D., Paladin, L. et al. Identification of repetitive units in protein structures with ReUPred. Amino Acids 48, 1391–1400 (2016). https://doi.org/10.1007/s00726-016-2187-2
DOI: https://doi.org/10.1007/s00726-016-2187-2