Abstract
A quantitative feature-vector representation (model) of protein tertiary structural motifs is presented. Using this representation, multiclass logistic regression and a probabilistic neural network were applied to large data sets to classify motifs into major families of distinct motif types, including functionally important ones, with high statistical confidence. Scatter plots of random samples of these motifs, obtained by projecting the feature vectors into two dimensions with metric multidimensional scaling (MDS), showed distinct clusters and shapes for different families. This demonstrates the relevance and importance of the proposed feature-vector representation for characterizing protein tertiary structural motifs. The relative importance of the individual features was also analyzed. Finally, the potential of this approach for investigating how Nature prioritizes and optimizes the structures of functional motifs is highlighted.
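A probabilistic neural network (PNN) in Specht's sense is essentially a Parzen-window classifier. For each class, it estimates the density at a query point by averaging Gaussian kernels centred on that class's training vectors, then assigns the query to the class with the largest estimate. The pure-Python sketch below illustrates only that general technique. It assumes motifs have already been reduced to numeric feature vectors and that all classes have equal priors. The feature values, the family labels (`family_A`, `family_B`), and the kernel width `sigma` are hypothetical and are not taken from the paper. The authors' actual software, features, and settings may differ.

```python
import math
from collections import defaultdict


def pnn_scores(train, query, sigma=1.0):
    """Parzen-window (Gaussian kernel) density estimate of the query for each class.

    train: list of (feature_vector, label) pairs
    query: feature vector to classify
    sigma: kernel width (smoothing parameter); value here is illustrative only
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for x, label in train:
        # Squared Euclidean distance between a training vector and the query
        d2 = sum((a - b) ** 2 for a, b in zip(x, query))
        sums[label] += math.exp(-d2 / (2.0 * sigma ** 2))
        counts[label] += 1
    # Average kernel value per class; the shared normalising constant
    # (2*pi*sigma^2)^(-d/2) is omitted because it does not change the ranking
    return {label: sums[label] / counts[label] for label in sums}


def pnn_posteriors(train, query, sigma=1.0):
    """Class posterior probabilities, assuming equal class priors."""
    scores = pnn_scores(train, query, sigma)
    total = sum(scores.values())
    if total == 0.0:
        raise ValueError("all kernel values underflowed; try a larger sigma")
    return {label: s / total for label, s in scores.items()}


def pnn_classify(train, query, sigma=1.0):
    """Assign the query to the class with the largest estimated density."""
    scores = pnn_scores(train, query, sigma)
    return max(scores, key=scores.get)


# Hypothetical two-feature vectors for two hypothetical motif families
example_train = [
    ((0.0, 0.1), "family_A"),
    ((0.2, 0.0), "family_A"),
    ((1.0, 1.1), "family_B"),
    ((1.2, 0.9), "family_B"),
]

if __name__ == "__main__":
    print(pnn_classify(example_train, (0.1, 0.1), sigma=0.3))
    print(pnn_posteriors(example_train, (0.1, 0.1), sigma=0.3))
```

The smoothing parameter `sigma` plays the role of the PNN's single tunable parameter. Too small a value makes the classifier behave like a nearest-neighbour rule. Too large a value blurs the class densities together. In practice it would be chosen by cross-validation.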
Acknowledgments
The authors thank the anonymous reviewers for their constructive comments and suggestions, which helped improve the quality and scope of the paper. Special thanks go to the reviewer who suggested representing the structural variation in a 3D structure space spanned by the eigenspace; this suggestion points to important extensions of the present study.
About this article
Cite this article
Joshi, R.R., Sreenath, S. Quantitative characterization of protein tertiary motifs. J Mol Model 20, 2077 (2014). https://doi.org/10.1007/s00894-014-2077-z
DOI: https://doi.org/10.1007/s00894-014-2077-z