The subcellular localization of a protein is closely correlated with its function. Automatic prediction of subcellular localization from protein sequence properties remains a challenging problem. Here, we propose a machine learning approach, based on proteomic screening, for interpreting the differential detection of proteins in isolated organellar compartments by high-throughput mass spectrometry. The method addresses several core limitations of previous approaches, such as multi-compartmental ambiguity. When applied to a global-scale proteomic study, our method achieved an overall accuracy of 80.5% and a precision of 75.1% for four major organellar compartments (cytosol, membranes, mitochondria, and nucleus). The classifiers predicted the subcellular localization of 2390 previously uncharacterized proteins, 1370 of which were assigned to one or more compartments with at least 80% confidence.
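The idea of assigning a protein to "one or more compartments with at least 80% confidence" can be illustrated with a minimal sketch. This is not the chapter's implementation: the function name `assign_compartments`, the protein names, and the confidence values are all hypothetical, and the sketch simply assumes that each classifier yields an independent per-compartment confidence between 0 and 1.

```python
# Illustrative sketch only: the names and scores below are hypothetical,
# not data or code from the chapter.
COMPARTMENTS = ("cytosol", "membranes", "mitochondria", "nucleus")


def assign_compartments(confidences, threshold=0.80):
    """Return every compartment whose confidence meets the threshold.

    Each compartment is tested independently, so a protein may receive
    zero, one, or several labels (the multi-compartment case).
    """
    return [c for c in COMPARTMENTS if confidences.get(c, 0.0) >= threshold]


# Made-up per-compartment confidences for two made-up proteins.
example_scores = {
    "protein_A": {"cytosol": 0.91, "membranes": 0.12,
                  "mitochondria": 0.85, "nucleus": 0.05},
    "protein_B": {"cytosol": 0.55, "membranes": 0.40,
                  "mitochondria": 0.10, "nucleus": 0.62},
}

assignments = {name: assign_compartments(scores)
               for name, scores in example_scores.items()}

for name, labels in assignments.items():
    print(name, labels if labels else "unassigned at this threshold")
```

Under these assumptions, a protein with two high-confidence compartments keeps both labels, while a protein with no score above the threshold is left unassigned rather than forced into a single class.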
Copyright information
© 2009 Springer Science+Business Media, LLC
About this chapter
Cite this chapter
Hu, P., Chung, C., Jiang, H., Emili, A. (2009). Bioinformatics Application: Predicting Protein Subcellular Localization by Applying Machine Learning. In: Mathura, V.S., Kangueane, P. (eds) Bioinformatics: A Concept-Based Introduction. Springer, Boston, MA. https://doi.org/10.1007/978-0-387-84870-9_13
DOI: https://doi.org/10.1007/978-0-387-84870-9_13
Publisher Name: Springer, Boston, MA
Print ISBN: 978-0-387-84869-3
Online ISBN: 978-0-387-84870-9
eBook Packages: Biomedical and Life Sciences (R0)