Prediction of interactiveness of proteins and nucleic acids based on feature selections

Yuan, YouLang; Shi, XiaoHe; Li, XinLei; Lu, WenCong; Cai, YuDong; Gu, Lei; Liu, Liang; Li, MinJie; Kong, XiangYin; Xing, Meng

doi:10.1007/s11030-009-9198-9

Full-Length Paper
Published: 09 October 2009

Molecular Diversity, Volume 14, pages 627–633 (2010)

YouLang Yuan¹,
XiaoHe Shi²,³,
XinLei Li²,³,
WenCong Lu¹,
YuDong Cai⁴,
Lei Gu⁵,⁶,
Liang Liu¹,
MinJie Li¹,
XiangYin Kong²,³ &
…
Meng Xing⁴

Abstract

Identifying which proteins can interact with nucleic acids is important for protein annotation, since interactions between nucleic acids and proteins are involved in numerous cellular processes such as replication, transcription, splicing, and DNA repair. This study aims to identify proteins that can interact with DNA, RNA, and rRNA, respectively. mRMR (minimum redundancy and maximum relevance) has been widely applied to biological data processing and feature analysis since its introduction in 2005. Combined with incremental feature selection (IFS), mRMR is known to be very efficient for feature selection and analysis, and it can improve both the effectiveness and the efficiency of a prediction model. IFS is applied to decide how many features should be selected from the feature list provided by mRMR. Finally, the features selected by mRMR and IFS are further refined by reordering them with a conventional feature selection method, the forward feature wrapper (FFW). Each protein is encoded by 132 features, including amino acid compositions and physicochemical properties. After feature selection, the k-nearest neighbor algorithm, the adopted prediction model, is trained and tested. The optimized prediction accuracies for DNA, RNA, and rRNA are 82.0%, 83.4%, and 92.3%, respectively. Furthermore, the most important features contributing to the prediction are identified and analyzed biologically. The predictor developed for this research is publicly available at http://chemdata.shu.edu.cn/protein_na_mrmr/.
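
The mRMR ranking and IFS steps summarized in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names, the toy data, the use of already-discretized features, the difference form of the mRMR score, leave-one-out evaluation, and k = 1 are all assumptions made for illustration, and the FFW refinement step is omitted.

```python
import math
from collections import Counter


def mutual_information(xs, ys):
    """Mutual information (in nats) between two equal-length discrete sequences."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    return sum(
        (c / n) * math.log(c * n / (px[x] * py[y]))
        for (x, y), c in pxy.items()
    )


def mrmr_rank(columns, labels):
    """Greedily rank feature indices by relevance minus mean redundancy."""
    relevance = [mutual_information(col, labels) for col in columns]
    remaining = set(range(len(columns)))
    ranked = []
    while remaining:
        def score(j):
            if not ranked:
                return relevance[j]
            redundancy = sum(
                mutual_information(columns[j], columns[s]) for s in ranked
            ) / len(ranked)
            return relevance[j] - redundancy
        # sorted() makes tie-breaking deterministic (lowest index wins).
        best = max(sorted(remaining), key=score)
        ranked.append(best)
        remaining.remove(best)
    return ranked


def loo_knn_accuracy(rows, labels, feature_idx, k=1):
    """Leave-one-out accuracy of a k-nearest-neighbor vote on chosen features."""
    correct = 0
    for i, row in enumerate(rows):
        dists = sorted(
            (sum((row[f] - other[f]) ** 2 for f in feature_idx), j)
            for j, other in enumerate(rows) if j != i
        )
        votes = Counter(labels[j] for _, j in dists[:k])
        if votes.most_common(1)[0][0] == labels[i]:
            correct += 1
    return correct / len(rows)


def incremental_feature_selection(rows, labels, ranked, k=1):
    """Score the top-1, top-2, ... ranked features; keep the best prefix."""
    best_acc, best_n = -1.0, 0
    for n in range(1, len(ranked) + 1):
        acc = loo_knn_accuracy(rows, labels, ranked[:n], k)
        if acc > best_acc:
            best_acc, best_n = acc, n
    return ranked[:best_n], best_acc


# Hypothetical toy data: feature 1 duplicates feature 0, and
# feature 2 is less relevant but carries complementary information.
columns = [
    (0, 0, 0, 1, 1, 1, 1, 1),
    (0, 0, 0, 1, 1, 1, 1, 1),
    (0, 0, 1, 0, 0, 1, 1, 1),
]
labels = [0, 0, 0, 0, 1, 1, 1, 1]
rows = list(zip(*columns))

ranked = mrmr_rank(columns, labels)
selected, accuracy = incremental_feature_selection(rows, labels, ranked, k=1)
print(ranked, selected, accuracy)
```

On this toy data the duplicated feature is ranked last, because its redundancy with the already selected feature outweighs its relevance. In practice, continuous descriptors such as amino acid compositions would have to be discretized before mutual information can be estimated this way.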

Author information

Authors and Affiliations

Chemical Data mining Laboratory, Department of Chemistry, College of Sciences, Shanghai University, 99 Shang-Da Road, Shanghai, 200444, People’s Republic of China
YouLang Yuan, WenCong Lu, Liang Liu & MinJie Li
Institute of Health Sciences, Shanghai Jiao Tong University, School of Medicine, Shanghai, People’s Republic of China
XiaoHe Shi, XinLei Li & XiangYin Kong
Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai, People’s Republic of China
XiaoHe Shi, XinLei Li & XiangYin Kong
Institute of System Biology, Shanghai University, 99 Shang-Da Road, Shanghai, 200444, People’s Republic of China
YuDong Cai & Meng Xing
Department of Life Science Informatics, B-IT, Rheinische Friedrich-Wilhelms-University Bonn, Dahlmannstr. 2, 53113, Bonn, Germany
Lei Gu
Department of Bioinformatics, Fraunhofer Institute for Algorithms and Scientific Computing (SCAI), Schloss Birlinghoven, 53754, Sankt Augustin, Germany
Lei Gu

Corresponding authors

Correspondence to WenCong Lu or YuDong Cai.

Additional information

YouLang Yuan, XiaoHe Shi and XinLei Li are regarded as joint first authors.

Electronic supplementary material

ESM (TXT 216 kb)

ESM (TXT 92 kb)

ESM (TXT 26 kb)

ESM (DOC 35 kb)

ESM (DOC 37 kb)

ESM (DOC 36 kb)

About this article

Cite this article

Yuan, Y., Shi, X., Li, X. et al. Prediction of interactiveness of proteins and nucleic acids based on feature selections. Mol Divers 14, 627–633 (2010). https://doi.org/10.1007/s11030-009-9198-9

Received: 18 May 2009
Accepted: 07 September 2009
Published: 09 October 2009
Issue Date: November 2010
DOI: https://doi.org/10.1007/s11030-009-9198-9
