Improving Multi-Relief for Detecting Specificity Residues from Multiple Sequence Alignments

  • Elena Marchiori
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 6023)

Abstract

A challenging problem in bioinformatics is the detection of residues that account for protein function specificity, both to gain deeper insight into the nature of functional specificity and to guide protein engineering experiments aimed at switching the specificity of an enzyme, regulator, or transporter. Most state-of-the-art algorithms for this task use multiple sequence alignments (MSAs) to identify residue positions that are conserved within protein subfamilies and divergent between them. In this study, we focus on multi-RELIEF, a recent method based on this approach. We analyze and modify the two core parts of the method in order to improve its predictive performance. A parametric generalization of the popular RELIEF machine learning algorithm for weighting residues is introduced and incorporated in multi-RELIEF. The ensemble criterion that multi-RELIEF uses to merge the weights of multiple runs is simplified. Finally, the method used by multi-RELIEF for exploiting tertiary structure information is modified by incorporating prior information describing the confidence of the original scores assigned to residues. Extensive computational experiments on six real-life datasets show that the new multi-RELIEF improves on the original method in both robustness and detection capability.
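
To illustrate the kind of residue weighting the abstract refers to, the following is a minimal sketch of the basic RELIEF update applied to an alignment with two subfamilies. It is not the parametric generalization or the ensemble procedure of the paper. It assumes Hamming distance between sequences, a 0/1 difference per position, a single nearest hit and nearest miss, and that every sequence is visited once instead of being randomly sampled. The function names and the toy alignment are hypothetical.

```python
from typing import List, Sequence


def hamming(a: str, b: str) -> int:
    """Number of aligned positions at which two sequences differ."""
    return sum(x != y for x, y in zip(a, b))


def relief_weights(seqs: Sequence[str], labels: Sequence[str]) -> List[float]:
    """Basic RELIEF weights, one per alignment column (illustrative only).

    A column gains weight when it differs from the nearest sequence of
    another subfamily (nearest miss) and loses weight when it differs
    from the nearest sequence of the same subfamily (nearest hit).
    """
    n_pos = len(seqs[0])
    weights = [0.0] * n_pos
    for i, (s, lab) in enumerate(zip(seqs, labels)):
        hits = [t for j, (t, l) in enumerate(zip(seqs, labels))
                if j != i and l == lab]
        misses = [t for t, l in zip(seqs, labels) if l != lab]
        if not hits or not misses:
            continue  # no neighbour of one kind: skip this sequence
        hit = min(hits, key=lambda t: hamming(s, t))
        miss = min(misses, key=lambda t: hamming(s, t))
        for p in range(n_pos):
            weights[p] += (s[p] != miss[p]) - (s[p] != hit[p])
    return [w / len(seqs) for w in weights]


# Toy alignment (assumed data): column 1 separates the subfamilies,
# column 3 varies inside each subfamily, columns 0 and 2 are conserved.
toy_seqs = ["ACDA", "ACDG", "ATDA", "ATDG"]
toy_labels = ["x", "x", "y", "y"]
toy_weights = relief_weights(toy_seqs, toy_labels)
```

In this toy case the column that is conserved within each subfamily and divergent between them receives the highest weight, while the column that varies inside a subfamily is penalized.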

Copyright information

© Springer-Verlag Berlin Heidelberg 2010

Authors and Affiliations

  • Elena Marchiori (1)
  1. Radboud University Nijmegen, The Netherlands
