
Sequence analysis and rule development of predicting protein stability change upon mutation using decision tree model

  • Original Paper
Journal of Molecular Modeling

Abstract

Understanding the mechanism behind changes in protein stability is one of the most challenging tasks. Recently, predicting the protein stability change caused by single point mutations has become an active topic in molecular biology. Beyond prediction, however, it is desirable to extract knowledge from large databases that provides new insight into the nature of these changes. This paper presents an interpretable prediction tree method, named iPTREE-2, that can accurately predict changes in protein stability upon mutation from sequence-based information and analyze sequence characteristics in terms of residue composition and order. Because it is built on a regression tree algorithm, iPTREE-2 can also identify important factors and develop rules for data mining. On a dataset of 1859 different single point mutations from the thermodynamic database ProTherm, iPTREE-2 yields a correlation coefficient of 0.70 between predicted and experimental values. In the data-mining task, detailed sequence analysis suggests that residue composition may be specific to different ranges of stability change, implying the existence of certain patterns. When building rules, we found that the mutated residue in the wild-type protein and the substituting residue in the mutant protein play an important role. The present study demonstrates that iPTREE-2 can serve the purpose of predicting protein stability change, especially when more understandable knowledge is required.
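The abstract rests on two technical pieces: a regression tree that both predicts stability change and yields readable rules, and a correlation coefficient between predicted and experimental values. The sketch below illustrates both in plain Python under stated assumptions. It grows a tree with the standard CART least-squares criterion (each split minimizes the summed squared error of the two children), turns each root-to-leaf path into an IF-THEN rule, and scores predictions with the Pearson correlation. It is not the iPTREE-2 implementation: the feature columns `x0` and `x1` are hypothetical numeric descriptors of a mutation, the toy values are made up rather than taken from ProTherm, and the paper's actual inputs, depth limits and pruning are not described in this abstract.

```python
from math import sqrt
from statistics import mean


def sse(ys):
    """Sum of squared deviations from the mean: the node impurity for regression."""
    if not ys:
        return 0.0
    m = mean(ys)
    return sum((v - m) ** 2 for v in ys)


def best_split(X, y):
    """Return the (feature, threshold) that most reduces SSE, or None if no split helps."""
    best, best_cost = None, sse(y)
    for f in range(len(X[0])):
        for t in sorted({row[f] for row in X}):
            left = [yi for row, yi in zip(X, y) if row[f] <= t]
            right = [yi for row, yi in zip(X, y) if row[f] > t]
            if not left or not right:
                continue  # a split must send samples both ways
            cost = sse(left) + sse(right)
            if cost < best_cost:
                best, best_cost = (f, t), cost
    return best


def build_tree(X, y, depth=0, max_depth=3, min_samples=2):
    """Grow a CART-style regression tree; each leaf stores the mean target of its samples."""
    can_split = depth < max_depth and len(y) >= min_samples
    split = best_split(X, y) if can_split else None
    if split is None:
        return {"leaf": mean(y)}
    f, t = split
    li = [i for i, row in enumerate(X) if row[f] <= t]
    ri = [i for i, row in enumerate(X) if row[f] > t]
    return {
        "feature": f,
        "threshold": t,
        "left": build_tree([X[i] for i in li], [y[i] for i in li], depth + 1, max_depth, min_samples),
        "right": build_tree([X[i] for i in ri], [y[i] for i in ri], depth + 1, max_depth, min_samples),
    }


def predict(node, row):
    """Follow the splits from the root down to a leaf and return its mean value."""
    while "leaf" not in node:
        node = node["left"] if row[node["feature"]] <= node["threshold"] else node["right"]
    return node["leaf"]


def rules(node, path=()):
    """Yield (condition, predicted value) pairs, one IF-THEN rule per leaf."""
    if "leaf" in node:
        yield " AND ".join(path) or "TRUE", node["leaf"]
        return
    f, t = node["feature"], node["threshold"]
    yield from rules(node["left"], path + (f"x{f} <= {t}",))
    yield from rules(node["right"], path + (f"x{f} > {t}",))


def pearson(a, b):
    """Pearson correlation coefficient (undefined if either input is constant)."""
    ma, mb = mean(a), mean(b)
    cov = sum((p - ma) * (q - mb) for p, q in zip(a, b))
    return cov / sqrt(sum((p - ma) ** 2 for p in a) * sum((q - mb) ** 2 for q in b))


# Made-up toy data: two hypothetical descriptors per mutation and a toy stability change.
X = [[-1.0, 0], [-0.5, 0], [0.2, 1], [0.8, 1], [1.5, 1], [-1.2, 0]]
y = [-2.0, -1.5, 0.1, 0.4, 0.9, -2.2]

if __name__ == "__main__":
    tree = build_tree(X, y, max_depth=1)
    for condition, value in rules(tree):
        print(f"IF {condition} THEN predicted change = {value:.2f}")
    print("r =", round(pearson([predict(tree, row) for row in X], y), 3))
```

Limiting `max_depth` trades accuracy for shorter, more readable rules; an unrestricted tree can fit training data perfectly, which is why a reported correlation such as the abstract's 0.70 is only meaningful when measured on data the tree did not see during training.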


Fig. 1
Fig. 2



Author information

Correspondence to Shinn-Ying Ho.

About this article

Cite this article

Huang, LT., Gromiha, M.M. & Ho, SY. Sequence analysis and rule development of predicting protein stability change upon mutation using decision tree model. J Mol Model 13, 879–890 (2007). https://doi.org/10.1007/s00894-007-0197-4

  • DOI: https://doi.org/10.1007/s00894-007-0197-4
