Abstract
The stability of SCF ubiquitin ligases is worth investigating because these complexes are involved in many cellular processes, including cell cycle regulation, DNA repair, and gene expression. Interactions between two or more proteins are, in turn, governed by their domains, the compact functional units of proteins. In this study, we therefore analyze the role of Pfam domain interactions in predicting the stability of protein-protein interactions (PPIs) that are known or predicted to occur involving subunit components of the SCF ligase complex. Using the most relevant and discriminating features is essential for successful prediction at low computational cost. Although various feature selection methods have recently been developed for this purpose, feature grouping is a better option, especially for high-dimensional sparse feature vectors, because it yields a better interpretation of the data. In this paper, we propose a correlation-based feature grouping (CFG) method to group and combine features. To demonstrate the strength of CFG, two filter methods, χ² and correlation, are also used for feature selection, and prediction is performed with different classifiers, including a support vector machine (SVM) and k-nearest neighbor (k-NN). Experimental results on a dataset of SCF ligases indicate that feature grouping achieves significant increases of 10% for SVM and 13% for k-NN, and is more efficient than feature selection at identifying a set of relevant features.
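The abstract does not spell out how CFG forms its groups, so the following is only a minimal sketch of the general idea of correlation-based feature grouping, not the authors' algorithm. It assumes a greedy rule (a feature joins the first group whose seed feature it correlates with at or above a threshold) and assumes groups are combined by averaging. The toy matrix, the threshold of 0.9, and the function names `pearson`, `group_features`, and `combine` are all hypothetical; rows stand for PPIs and columns for binary domain-interaction features.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation of two equal-length sequences; 0.0 if either is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / sqrt(sxx * syy)

def group_features(rows, threshold=0.9):
    """Greedy grouping (an assumption, not the paper's rule): a feature joins
    the first group whose seed feature it correlates with at r >= threshold;
    otherwise it starts a new group."""
    columns = list(zip(*rows))
    groups = []  # each group is a list of column indices; index 0 is the seed
    for j, col in enumerate(columns):
        for g in groups:
            if pearson(columns[g[0]], col) >= threshold:
                g.append(j)
                break
        else:
            groups.append([j])
    return groups

def combine(rows, groups):
    """Replace each group with the mean of its member features (assumed combiner)."""
    return [[sum(r[j] for j in g) / len(g) for g in groups] for r in rows]

# Hypothetical toy data: 4 PPIs x 4 sparse binary features.
rows = [
    [1, 1, 0, 0],
    [1, 1, 0, 1],
    [0, 0, 1, 0],
    [0, 0, 1, 1],
]
groups = group_features(rows)
reduced = combine(rows, groups)
```

Here the first two columns are identical and collapse into one combined feature, so the four-dimensional vectors shrink to three dimensions. The reduced matrix would then be passed to a classifier such as an SVM or k-NN in place of the original features.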
Copyright information
© 2015 Springer International Publishing Switzerland
About this paper
Cite this paper
Maleki, M., Dezfulian, M.H., Rueda, L. (2015). A Computational Domain-Based Feature Grouping Approach for Prediction of Stability of SCF Ligases. In: Ortuño, F., Rojas, I. (eds) Bioinformatics and Biomedical Engineering. IWBBIO 2015. Lecture Notes in Computer Science, vol 9043. Springer, Cham. https://doi.org/10.1007/978-3-319-16483-0_61
DOI: https://doi.org/10.1007/978-3-319-16483-0_61
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-16482-3
Online ISBN: 978-3-319-16483-0
eBook Packages: Computer Science, Computer Science (R0)