Support vector machine based classification of 3-dimensional protein physicochemical environments for automated function annotation

  • Research Articles
  • Drug Actions
Archives of Pharmacal Research

Abstract

Knowledge of protein functions and structures is critical for drug discovery and development. The FEATURE system developed at Stanford is an effective tool for characterizing and classifying local environments in proteins. FEATURE represents the physicochemical properties around a residue as a vector of fixed dimension, and identifies functional sites and non-sites by classifying such vectors with a naïve Bayes classifier. In this paper, we improve the FEATURE framework in several ways to make it more flexible, robust, and accurate. The new tool handles vectors of a user-specified dimension and employs dimensionality reduction to suppress noise effectively with little loss of important signal. Furthermore, our approach uses a support vector machine for more accurate classification. In our thorough experiments, the proposed approach outperformed the original tool by 20.13% and 13.42% with respect to true and false positive rates, respectively.
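The comparison above is stated in terms of true and false positive rates. As a reminder of what those quantities measure, here is a minimal sketch assuming the standard definitions, TPR = TP / (TP + FN) and FPR = FP / (FP + TN), with label 1 for a functional site and 0 for a non-site. The function name `tpr_fpr` and the label lists are hypothetical illustrations, not the paper's code or data.

```python
from typing import Sequence, Tuple


def tpr_fpr(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[float, float]:
    """Return (true positive rate, false positive rate) for binary labels.

    Label 1 marks a functional site, label 0 a non-site.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    # Count the four cells of the confusion matrix.
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    if tp + fn == 0 or fp + tn == 0:
        raise ValueError("need at least one site and one non-site")
    return tp / (tp + fn), fp / (fp + tn)


# Made-up labels for eight environments: four sites, four non-sites.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
baseline_pred = [1, 1, 0, 0, 1, 1, 0, 0]  # hypothetical baseline classifier
improved_pred = [1, 1, 1, 0, 1, 0, 0, 0]  # hypothetical improved classifier

baseline = tpr_fpr(y_true, baseline_pred)
improved = tpr_fpr(y_true, improved_pred)
```

A better classifier raises the first value and lowers the second; in this toy example the hypothetical improved predictions find one more site and raise one fewer false alarm than the baseline.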

Author information

Correspondence to Sungroh Yoon.

Additional information

These authors contributed equally to this work.

About this article

Cite this article

Min, H., Yu, S., Lee, T. et al. Support vector machine based classification of 3-dimensional protein physicochemical environments for automated function annotation. Arch. Pharm. Res. 33, 1451–1459 (2010). https://doi.org/10.1007/s12272-010-0920-z

  • DOI: https://doi.org/10.1007/s12272-010-0920-z
