A Computational Domain-Based Feature Grouping Approach for Prediction of Stability of SCF Ligases

  • Conference paper
Bioinformatics and Biomedical Engineering (IWBBIO 2015)

Part of the book series: Lecture Notes in Computer Science (LNBI, volume 9043)

Abstract

The stability of SCF ubiquitin ligases is worth investigating because these complexes are involved in many cellular processes, including cell cycle regulation, DNA repair, and gene expression. Interactions between two or more proteins are governed by their domains, the compact functional units of proteins. In this study, we therefore analyze the role of Pfam domain interactions in predicting the stability of protein-protein interactions (PPIs) that are known or predicted to involve subunit components of the SCF ligase complex. Employing the most relevant and discriminating features is important for achieving successful prediction at low computational cost. Although different feature selection methods have recently been developed for this purpose, feature grouping is a better approach, especially for high-dimensional sparse feature vectors, because it yields a better interpretation of the data. In this paper, a correlation-based feature grouping (CFG) method is proposed to group and combine the features. To demonstrate the strength of CFG, two filter methods, χ² and correlation, are also employed for feature selection, and prediction is performed using different classifiers, including a support vector machine (SVM) and k-nearest neighbor (k-NN). Experimental results on a dataset of SCF ligases indicate that feature grouping achieves significant increases of 10% for SVM and 13% for k-NN, and is more efficient than feature selection at identifying a set of relevant features.
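The abstract does not spell out how CFG forms or combines its groups, so the following is only a minimal sketch of one way correlation-based grouping could work, not the authors' method. It assumes a greedy pass that places each feature in the first group whose seed feature it correlates with at or above a threshold, and it combines each group by averaging its members. The function names (`pearson`, `group_features`, `combine`), the 0.8 threshold, and the toy data are all hypothetical.

```python
from math import sqrt
from typing import List


def pearson(x: List[float], y: List[float]) -> float:
    """Pearson correlation of two equal-length columns (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / sqrt(vx * vy)


def group_features(rows: List[List[float]], threshold: float = 0.8) -> List[List[int]]:
    """Greedy grouping (an assumption): a feature joins the first group whose
    seed (first member) it correlates with at or above `threshold`;
    otherwise it starts a new group."""
    n_features = len(rows[0])
    columns = [[row[j] for row in rows] for j in range(n_features)]
    groups: List[List[int]] = []
    for j in range(n_features):
        for group in groups:
            if pearson(columns[group[0]], columns[j]) >= threshold:
                group.append(j)
                break
        else:
            groups.append([j])
    return groups


def combine(rows: List[List[float]], groups: List[List[int]]) -> List[List[float]]:
    """Replace each group by the mean of its member features (an assumption)."""
    return [[sum(row[j] for j in g) / len(g) for g in groups] for row in rows]


# Toy sparse binary feature vectors (hypothetical data, one row per complex).
rows = [
    [1, 1, 0, 0],
    [1, 1, 0, 1],
    [0, 0, 1, 0],
    [0, 0, 1, 1],
]
groups = group_features(rows)
reduced = combine(rows, groups)
print(groups)
print(reduced)
```

Only positively correlated features are merged here, so averaging does not cancel members against each other. The reduced vectors could then be passed to any classifier, such as an SVM or k-NN, in place of the original features.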

Copyright information

© 2015 Springer International Publishing Switzerland

About this paper

Cite this paper

Maleki, M., Dezfulian, M.H., Rueda, L. (2015). A Computational Domain-Based Feature Grouping Approach for Prediction of Stability of SCF Ligases. In: Ortuño, F., Rojas, I. (eds) Bioinformatics and Biomedical Engineering. IWBBIO 2015. Lecture Notes in Computer Science, vol 9043. Springer, Cham. https://doi.org/10.1007/978-3-319-16483-0_61

  • DOI: https://doi.org/10.1007/978-3-319-16483-0_61

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-16482-3

  • Online ISBN: 978-3-319-16483-0

  • eBook Packages: Computer Science, Computer Science (R0)
