Neural Computing and Applications, Volume 32, Issue 2, pp 323–334

Combining feature engineering and feature selection to improve the prediction of methionine oxidation sites in proteins

  • Francisco J. Veredas
  • Daniel Urda
  • José L. Subirats
  • Francisco R. Cantón
  • Juan C. Aledo
S.I.: IWANN2017: Learning algorithms with real world applications


Abstract

Methionine is a proteinogenic amino acid that can be post-translationally modified. It is now well established that reactive oxygen species can oxidise methionine residues within living cells. This modification was long thought to be merely inevitable damage arising from aerobic metabolism. However, several authors have begun to consider a possible role for methionine oxidation in cell signalling. In recent years, a number of proteomic studies have been carried out to detect proteins containing oxidised methionines. Although these studies can pinpoint which methionines are oxidised, they are arduous, expensive and time-consuming. Computational approaches that predict methionine oxidation sites in proteins are therefore an appealing alternative. In the current work, we address methionine oxidation prediction by combining computational intelligence methods with feature engineering and feature selection techniques, improving the efficacy of several machine learning models while reducing the number of input features needed to reach high accuracy. We compare random forests, support vector machines, neural networks and flexible discriminant analysis models. Random forests give the best AUC (\(0.8124 \pm 0.0334\)) and accuracy (\(0.7590 \pm 0.0551\)) using a reduced set of only 16 features. These results surpass those of previous works. In addition, we present an end-user script that takes a protein ID as input and returns a list with the predicted oxidation state of all the methionine residues found in the analysed protein. Finally, to illustrate the applicability of this tool, we selected the human \(\alpha 1\)-antitrypsin protein as a case study. This protein was not among the set of proteins used to build the predictive models, yet its methionine oxidation has been well characterised experimentally. The prediction returned by our script fully matches the empirical evidence: of the nine methionine residues found in this protein, our model predicts the oxidation of only two, M351 and M358, which have been reported, on the basis of mass spectrometry analyses, to be particularly susceptible to oxidation.
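The per-residue workflow of such an end-user tool (scan a protein, then report a call for each methionine) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' script: the names `methionine_sites`, `placeholder_predictor` and `predict_oxidation`, the default window half-width of 5 residues, and the placeholder decision rule are all hypothetical. A real tool would first resolve a protein ID to its sequence and would replace the placeholder with a trained model, such as a random forest.

```python
from typing import Callable, Iterator, List, Tuple


def methionine_sites(sequence: str, flank: int = 5) -> Iterator[Tuple[int, str]]:
    """Yield (1-based position, flanking window) for every methionine (M).

    Windows are padded with '-' where they run past either end of the
    sequence, so each window has length 2 * flank + 1 with the M centred.
    """
    seq = sequence.upper()
    padded = "-" * flank + seq + "-" * flank
    for i, residue in enumerate(seq):
        if residue == "M":
            # Index i in seq corresponds to index i + flank in padded,
            # so the slice below is centred on the methionine.
            yield i + 1, padded[i : i + 2 * flank + 1]


def placeholder_predictor(window: str) -> bool:
    """Hypothetical stand-in for a trained classifier (NOT the paper's model).

    It has no biological meaning and exists only so the sketch runs end
    to end: it flags a site when its window contains two or more M.
    """
    return window.count("M") >= 2


def predict_oxidation(
    sequence: str,
    predictor: Callable[[str], bool] = placeholder_predictor,
    flank: int = 5,
) -> List[Tuple[str, bool]]:
    """Return (site label, predicted-oxidised flag) for each methionine."""
    return [
        (f"M{position}", predictor(window))
        for position, window in methionine_sites(sequence, flank)
    ]


if __name__ == "__main__":
    # Made-up sequence, used only to exercise the functions.
    for site, oxidised in predict_oxidation("MKTAYMAM", flank=2):
        print(site, "oxidised" if oxidised else "not oxidised")
```

Passing the predictor as an argument keeps the residue scan independent of the model, so any classifier that maps a sequence window (or features derived from it) to a boolean can be plugged in without changing the rest of the sketch.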


Keywords

Protein prediction · Post-translational modification · Methionine oxidation · Predictive computational model



This work was partially supported by the project TIN2017-88728-C2-1-R, MINECO, Plan Nacional de I+D+I.

Compliance with ethical standards

Conflict of interest

The authors declare that they have no conflict of interest.

Supplementary material

Supplementary material 1: 521_2018_3655_MOESM1_ESM.pdf (PDF, 343 KB)


Copyright information

© The Natural Computing Applications Forum 2018

Authors and Affiliations

  1. Dpto. Lenguajes y Ciencias de la Computación, Universidad de Málaga, Málaga, Spain
  2. Dpto. de Ingeniería Informática, Escuela Superior de Ingeniería, Universidad de Cádiz, Puerto Real, Spain
  3. Dpto. de Biología Molecular y Bioquímica, Facultad de Ciencias, Universidad de Málaga, Málaga, Spain