Predicting subcellular location of proteins using integrated-algorithm method

Cai, Yu-Dong; Lu, Lin; Chen, Lei; He, Jian-Feng

doi:10.1007/s11030-009-9182-4

Full-Length Paper
Published: 07 August 2009

Molecular Diversity, Volume 14, pages 551–558 (2010)

Yu-Dong Cai^1,2, Lin Lu^3, Lei Chen^4 & Jian-Feng He^3

Abstract

A protein's subcellular location, which indicates where the protein resides in a cell, is one of its important characteristics. Correctly assigning proteins to their subcellular locations would greatly help the prediction of protein function, genome annotation, and drug design. Yet, in spite of great technical advances in the past decades, it is still time-consuming and laborious to determine protein subcellular locations experimentally on a high-throughput scale. Hence, four integrated-algorithm methods were developed in this article to perform such high-throughput prediction. Two data sets taken from the literature (Chou and Elrod, Protein Eng 12:107–118, 1999), consisting of 2,391 and 2,598 proteins, were used as the training set and the test set, respectively. Amino acid composition was used to represent the protein sequences. Jackknife cross-validation was used to evaluate performance on the training set. The best integrated-algorithm predictor was constructed by integrating 10 algorithms in Weka (a software tool for tackling data mining tasks, http://www.cs.waikato.ac.nz/ml/weka/) based on the mRMR (Minimum Redundancy Maximum Relevance, http://research.janelia.org/peng/proj/mRMR/) method. It achieves correct prediction rates of 77.83% and 80.56% for the training set and the test set, respectively, which is better than all of the 60 algorithms collected in Weka. The prediction software is available upon request.
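To make two of the ingredients named in the abstract concrete, the following is a minimal Python sketch of amino acid composition as a sequence representation and of a jackknife (leave-one-out) test. It is not the authors' predictor: the nearest-neighbor classifier, the function names, and the toy sequences and labels are all assumptions chosen for illustration, and neither Weka nor mRMR is used here.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def amino_acid_composition(sequence):
    """Return the 20-dimensional vector of residue frequencies for a sequence."""
    counts = Counter(sequence.upper())
    total = sum(counts[aa] for aa in AMINO_ACIDS)
    if total == 0:
        raise ValueError("sequence contains no standard amino acids")
    return [counts[aa] / total for aa in AMINO_ACIDS]


def nearest_neighbor_label(query, vectors, labels):
    """Label of the vector closest to query (squared Euclidean distance).

    A stand-in classifier, assumed here only to keep the sketch short.
    """
    def dist(v):
        return sum((a - b) ** 2 for a, b in zip(query, v))

    best = min(range(len(vectors)), key=lambda i: dist(vectors[i]))
    return labels[best]


def jackknife_accuracy(vectors, labels):
    """Leave-one-out test: each sample is predicted from all the others."""
    correct = 0
    for i in range(len(vectors)):
        rest_vectors = vectors[:i] + vectors[i + 1:]
        rest_labels = labels[:i] + labels[i + 1:]
        if nearest_neighbor_label(vectors[i], rest_vectors, rest_labels) == labels[i]:
            correct += 1
    return correct / len(vectors)


# Toy, made-up sequences and labels, for illustration only.
toy = [
    ("AAAAGGGLLL", "cytoplasm"),
    ("AAAGGGGLLL", "cytoplasm"),
    ("KKKRRRDDEE", "nucleus"),
    ("KKRRRRDDEE", "nucleus"),
]
X = [amino_acid_composition(seq) for seq, _ in toy]
y = [label for _, label in toy]
accuracy = jackknife_accuracy(X, y)
```

In a jackknife test every protein is held out exactly once, so the reported rate does not depend on a random split; the cost is one round of training per sample.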

References

Chou KC, Elrod DW (1999) Protein subcellular location prediction. Protein Eng 12:107–118
Eisenhaber F, Bork P (1998) Wanted: subcellular localization of proteins based on sequence. Trends Cell Biol 8:169–170
Hua S, Sun Z (2001) Support vector machine approach for protein subcellular localization prediction. Bioinformatics 17:721–728
Yuan Z (1999) Prediction of protein subcellular locations using Markov chain models. FEBS Lett 451:23–26
Reinhardt A, Hubbard T (1998) Using neural networks for prediction of the subcellular location of proteins. Nucleic Acids Res 26:2230–2236
Frank E, Witten IH (2005) Data mining: practical machine learning tools and techniques. Morgan Kaufmann, San Francisco
Gewehr JE, Szugat M, Zimmer R (2007) BioWeka—extending the Weka framework for bioinformatics. Bioinformatics 23:651–653
Gonzalez-Diaz H, Aguero-Chapin G, Varona J, Molina R, Delogu G, Santana L, Uriarte E, Podda G (2007) 2D-RNA-coupling numbers: a new computational chemistry approach to link secondary structure topology with biological function. J Comput Chem 28:1049–1056
Munteanu CR, Gonzalez-Diaz H, Magalhaes AL (2008) Enzymes/non-enzymes classification model complexity based on composition, sequence, 3D and topological indices. J Theor Biol 254:476–482
Peng HC, Long FH, Ding C (2005) Feature selection based on mutual information: criteria of max-dependency, max-relevance, and min-redundancy. IEEE Trans Pattern Analysis Mach Intell 27:1226–1238
Cai YD, Chou KC (2006) Predicting membrane protein type by functional domain composition and pseudo-amino acid composition. J Theor Biol 238:395–400
Won HH, Kim MJ, Kim S, Kim JW (2008) EnsemPro: an ensemble approach to predicting transcription start sites in human genomic DNA sequences. Genomics 91:259–266
Cedano J, Aloy P, Perez-Pons JA, Querol E (1997) Relation between amino acid composition and cellular location of proteins. J Mol Biol 266:594–600

Author information

Authors and Affiliations

Institute of System Biology, Shanghai University, 99 ShangDa Road, 200244, Shanghai, China
Yu-Dong Cai
Centre for Computational Systems Biology, Fudan University, 220 HanDan Road, 200433, Shanghai, China
Yu-Dong Cai
Department of Biomedical Engineering, Shanghai Jiao Tong University, 200040, Shanghai, China
Lin Lu & Jian-Feng He
Shanghai Key Laboratory of Trustworthy Computing, East China Normal University, 200062, Shanghai, China
Lei Chen

Corresponding author

Correspondence to Yu-Dong Cai.

Additional information

Yu-Dong Cai and Lin Lu contributed equally to this work.

Electronic Supplementary Material

The Electronic Supplementary Material consists of four files: ESM 1 (TXT 778 kb), ESM (DOC 53.5 kb), ESM (DOC 398 kb), and ESM (DOC 432 kb).

About this article

Cite this article

Cai, YD., Lu, L., Chen, L. et al. Predicting subcellular location of proteins using integrated-algorithm method. Mol Divers 14, 551–558 (2010). https://doi.org/10.1007/s11030-009-9182-4

Received: 29 April 2009
Accepted: 11 July 2009
Published: 07 August 2009
Issue Date: August 2010
DOI: https://doi.org/10.1007/s11030-009-9182-4
