Abstract
Understanding the mechanism of protein stability change remains one of the most challenging tasks. Predicting the change in protein stability caused by single point mutations has recently become an active topic in molecular biology. Beyond prediction alone, however, it is desirable to extract knowledge from large databases that provides new insight into the nature of these changes. This paper presents an interpretable prediction tree method, named iPTREE-2, that accurately predicts changes in protein stability upon mutation from sequence-based information and analyzes sequence characteristics from the viewpoint of composition and order. Because it is based on a regression tree algorithm, iPTREE-2 can identify important factors and derive rules for the purpose of data mining. On a dataset of 1859 different single point mutations from the thermodynamic database ProTherm, iPTREE-2 yields a correlation coefficient of 0.70 between predicted and experimental values. In the data mining task, detailed sequence analysis suggests that residue composition is specific to different ranges of stability change and implies the existence of certain patterns. In building rules, we found that the residue at the mutation site in the wild type and the residue that replaces it in the mutant protein play an important role. The present study demonstrates that iPTREE-2 can serve the purpose of predicting protein stability change, especially when more understandable knowledge is required.
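To make the idea of a regression tree scored by a correlation coefficient concrete, the following is a minimal sketch in plain Python. It is not the iPTREE-2 implementation: the two numeric features (hypothetical descriptors of the wild-type and mutant residues), the toy stability values, and all function names are assumptions introduced only for illustration, and none of the numbers come from ProTherm.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(vx * vy)


def sse(values):
    """Sum of squared deviations from the mean (0.0 for an empty list)."""
    if not values:
        return 0.0
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values)


def best_split(rows, targets):
    """Return (error, feature, threshold) minimizing the summed SSE, or None."""
    best = None
    for j in range(len(rows[0])):
        for t in sorted({r[j] for r in rows}):
            left = [y for r, y in zip(rows, targets) if r[j] <= t]
            right = [y for r, y in zip(rows, targets) if r[j] > t]
            if not left or not right:
                continue  # a split must leave samples on both sides
            err = sse(left) + sse(right)
            if best is None or err < best[0]:
                best = (err, j, t)
    return best


def build_tree(rows, targets, depth=0, max_depth=2):
    """Grow a binary regression tree; each leaf stores the mean target."""
    split = best_split(rows, targets) if depth < max_depth else None
    if split is None:
        return {"leaf": sum(targets) / len(targets)}
    _, j, t = split
    left = [i for i, r in enumerate(rows) if r[j] <= t]
    right = [i for i, r in enumerate(rows) if r[j] > t]
    return {
        "feature": j,
        "threshold": t,
        "left": build_tree([rows[i] for i in left],
                           [targets[i] for i in left], depth + 1, max_depth),
        "right": build_tree([rows[i] for i in right],
                            [targets[i] for i in right], depth + 1, max_depth),
    }


def predict(tree, row):
    """Follow the if/else rules from the root down to a leaf value."""
    while "leaf" not in tree:
        branch = "left" if row[tree["feature"]] <= tree["threshold"] else "right"
        tree = tree[branch]
    return tree["leaf"]


# Synthetic toy data (assumption): each row is
# (wild-type residue descriptor, mutant residue descriptor).
rows = [(1.0, 0.2), (0.9, 0.1), (0.2, 0.8), (0.1, 0.9), (0.5, 0.5), (0.6, 0.4)]
# Synthetic stability changes, invented for illustration only.
targets = [-2.0, -1.8, 1.0, 1.2, 0.0, -0.2]

tree = build_tree(rows, targets)
predictions = [predict(tree, r) for r in rows]
r = pearson_r(predictions, targets)
print("training-set correlation:", round(r, 3))
```

Each path from the root to a leaf reads as a chain of threshold tests, which is what makes a tree of this kind usable for rule development. The correlation here is computed on the training data of a toy example and says nothing about the accuracy reported for iPTREE-2.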
Cite this article
Huang, LT., Gromiha, M.M. & Ho, SY. Sequence analysis and rule development of predicting protein stability change upon mutation using decision tree model. J Mol Model 13, 879–890 (2007). https://doi.org/10.1007/s00894-007-0197-4
DOI: https://doi.org/10.1007/s00894-007-0197-4