Abstract
Genome projects are steadily producing a massive amount of sequence data that must be annotated in terms of structure and of molecular and biological function. Structural genomics aims to resolve many protein structures efficiently and to exploit the solved structures for assigning biological function to theoretically solved protein structures. Protein structures were initially classified manually with success, but manual classification now struggles to stay up to date because of the high throughput of newly solved protein structures. To overcome this issue, several data mining techniques have been examined for the structural classification of proteins. This review article presents an overview of the existing classification techniques, databases, tools, and performance metrics used to evaluate protein structure classification algorithms.
Keywords
- Protein structure
- Classification techniques
- Tools
- Databases
- Computational biology
- Challenges
Acknowledgements
The authors would like to thank the Department of Science and Technology (DST), New Delhi (DST/INSPIRE Fellowship/2015/IF150093), for financial support of this research work under the INSPIRE Fellowship.
Copyright information
© 2019 Springer Nature Switzerland AG
About this paper
Cite this paper
Sajithra, N., Ramyachitra, D., Manikandan, P. (2019). A Review on Protein Structure Classification. In: Pandian, D., Fernando, X., Baig, Z., Shi, F. (eds) Proceedings of the International Conference on ISMAC in Computational Vision and Bio-Engineering 2018 (ISMAC-CVB). ISMAC 2018. Lecture Notes in Computational Vision and Biomechanics, vol 30. Springer, Cham. https://doi.org/10.1007/978-3-030-00665-5_10
DOI: https://doi.org/10.1007/978-3-030-00665-5_10
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-00664-8
Online ISBN: 978-3-030-00665-5
eBook Packages: Engineering, Engineering (R0)