Prediction of interactiveness of proteins and nucleic acids based on feature selections

  • Full-Length Paper
Molecular Diversity

Abstract

Identifying which proteins can interact with nucleic acids is important for protein annotation, since interactions between nucleic acids and proteins are involved in numerous cellular processes such as replication, transcription, splicing, and DNA repair. This research aims to identify proteins that can interact with DNA, RNA, and rRNA, respectively. mRMR (minimum redundancy maximum relevance), with its elegant mathematical formulation, has been applied widely in processing biological data and feature analysis since its introduction in 2005. mRMR combined with incremental feature selection (IFS) is known to be very efficient in feature selection and analysis, and can improve both the effectiveness and the efficiency of a prediction model. IFS is applied to decide how many features should be selected from the ranked feature list provided by mRMR. Finally, the features selected by mRMR and IFS are further refined by a conventional feature selection method, the forward feature wrapper (FFW), which reorders the features. Each protein is encoded by 132 features, including amino acid compositions and physicochemical properties. After feature selection, the k-nearest neighbor algorithm, the adopted prediction model, is trained and tested. As a result, the optimized prediction accuracies for DNA, RNA, and rRNA are 82.0, 83.4, and 92.3%, respectively. Furthermore, the most important features that contribute to the prediction are identified and analyzed biologically. The predictor developed for this research is available for public access at http://chemdata.shu.edu.cn/protein_na_mrmr/.
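The mRMR ranking and IFS steps described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation, and it rests on several assumptions that the abstract does not state: features are already discretized, redundancy is the mean mutual information with the features selected so far, the classifier is 1-nearest neighbor with squared Euclidean distance, and accuracy is estimated by leave-one-out testing. All function names and the toy data are hypothetical, and the FFW refinement step is omitted.

```python
import math
from collections import Counter


def mutual_information(xs, ys):
    """Mutual information (in nats) between two discrete sequences."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    return sum(
        (c / n) * math.log(c * n / (px[x] * py[y]))
        for (x, y), c in pxy.items()
    )


def mrmr_rank(columns, labels):
    """Greedy mRMR ordering: relevance to the labels minus mean redundancy."""
    relevance = [mutual_information(col, labels) for col in columns]
    selected, remaining = [], list(range(len(columns)))

    def score(j):
        if not selected:
            return relevance[j]
        redundancy = sum(
            mutual_information(columns[j], columns[s]) for s in selected
        ) / len(selected)
        return relevance[j] - redundancy

    while remaining:
        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected


def loo_accuracy_1nn(rows, labels, features):
    """Leave-one-out accuracy of a 1-nearest-neighbor classifier (assumed setup)."""
    correct = 0
    for i, row in enumerate(rows):
        nearest = min(
            (j for j in range(len(rows)) if j != i),
            key=lambda j: sum((row[f] - rows[j][f]) ** 2 for f in features),
        )
        correct += labels[nearest] == labels[i]
    return correct / len(rows)


def incremental_feature_selection(rows, labels, ranking):
    """Evaluate the top-1, top-2, ... prefixes of the ranking; keep the best one."""
    best_acc, best_k = -1.0, 0
    for k in range(1, len(ranking) + 1):
        acc = loo_accuracy_1nn(rows, labels, ranking[:k])
        if acc > best_acc:  # strict '>' prefers the shortest best prefix
            best_acc, best_k = acc, k
    return ranking[:best_k], best_acc


# Hypothetical toy data: feature 0 copies the label, feature 1 is noise,
# feature 2 is partially informative.
labels = [0, 0, 0, 0, 1, 1, 1, 1]
rows = [
    (0, 0, 0), (0, 1, 0), (0, 0, 0), (0, 1, 1),
    (1, 0, 1), (1, 1, 1), (1, 0, 1), (1, 1, 0),
]
columns = list(zip(*rows))

ranking = mrmr_rank(columns, labels)
selected, accuracy = incremental_feature_selection(rows, labels, ranking)
print("mRMR ranking:", ranking)
print("IFS selection:", selected, "accuracy:", accuracy)
```

On this toy data the fully informative feature is ranked first, and IFS stops at a single feature because adding more does not raise the leave-one-out accuracy. With real-valued descriptors such as the 132 features mentioned in the abstract, a discretization step would be needed before the mutual information estimates above apply.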

Author information

Correspondence to WenCong Lu or YuDong Cai.

Additional information

YouLang Yuan, XiaoHe Shi and XinLei Li are regarded as joint first authors.

About this article

Cite this article

Yuan, Y., Shi, X., Li, X. et al. Prediction of interactiveness of proteins and nucleic acids based on feature selections. Mol Divers 14, 627–633 (2010). https://doi.org/10.1007/s11030-009-9198-9

  • DOI: https://doi.org/10.1007/s11030-009-9198-9
