Bioinformatics Application: Predicting Protein Subcellular Localization by Applying Machine Learning

  • Chapter
Bioinformatics: A Concept-Based Introduction

The subcellular localization of a protein is closely correlated with its function, yet automatic prediction of localization from protein sequence properties remains a challenging problem. Here, we propose a machine learning approach based on proteomic screening that interprets the differential detection of proteins in isolated organellar compartments by high-throughput mass spectrometry. The method addresses several core limitations of previous approaches, such as multi-compartmental ambiguity. When applied to a global-scale proteomic study, it achieved an overall accuracy of 80.5% and a precision of 75.1% for four major organellar compartments (cytosol, membranes, mitochondria, and nucleus). The classifiers predicted the subcellular localization of 2390 previously uncharacterized proteins, 1370 of which were assigned to one or more compartments with at least 80% confidence.
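The idea of assigning a protein to "one or more compartments with at least 80% confidence" can be illustrated with a minimal sketch. The chapter's actual classifiers and features are not shown in this preview, so everything below is an assumption made for illustration: the function name `assign_compartments`, the dictionary of per-compartment confidence scores, and the example values are all hypothetical. Only the four compartment names and the 0.8 cutoff come from the abstract.

```python
from typing import Dict, List

# The four organellar compartments named in the abstract.
COMPARTMENTS = ("cytosol", "membranes", "mitochondria", "nucleus")


def assign_compartments(confidences: Dict[str, float],
                        threshold: float = 0.8) -> List[str]:
    """Return every compartment whose confidence meets the threshold.

    Hypothetical helper: `confidences` maps a compartment name to a score
    in [0, 1], as might be produced by one classifier per compartment.
    A protein can receive zero, one, or several labels, which is one way
    to represent multi-compartmental proteins.
    """
    return [c for c in COMPARTMENTS if confidences.get(c, 0.0) >= threshold]


# Invented scores for a single protein (not data from the chapter).
example = {"cytosol": 0.91, "membranes": 0.12,
           "mitochondria": 0.85, "nucleus": 0.30}
labels = assign_compartments(example)
print(labels)
```

Scoring each compartment independently, rather than forcing a single best label, is what allows a protein to be reported in more than one compartment. Whether the chapter uses exactly this scheme cannot be confirmed from the abstract alone.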

Copyright information

© 2009 Springer Science+Business Media, LLC

About this chapter

Cite this chapter

Hu, P., Chung, C., Jiang, H., Emili, A. (2009). Bioinformatics Application: Predicting Protein Subcellular Localization by Applying Machine Learning. In: Mathura, V.S., Kangueane, P. (eds) Bioinformatics: A Concept-Based Introduction. Springer, Boston, MA. https://doi.org/10.1007/978-0-387-84870-9_13
