Protein (Multi-)Location Prediction: Using Location Inter-dependencies in a Probabilistic Framework

  • Ramanuja Simha
  • Hagit Shatkay
Part of the Lecture Notes in Computer Science book series (LNCS, volume 8126)


Knowing the location of a protein within the cell is important for understanding its function, its role in biological processes, and its potential use as a drug target. Much progress has been made in developing computational methods that predict a single location per protein, on the assumption that each protein localizes to one location only. However, it has been shown that proteins can localize to multiple locations. While a few recent systems have attempted to predict multiple locations of proteins, they typically either treat locations as independent or capture inter-dependencies by treating each location combination present in the training set as an individual location class. We present a new method, and a preliminary system we have developed, that directly incorporates inter-dependencies among locations into the multiple-location-prediction process, using a collection of Bayesian network classifiers. We evaluate our system on a dataset of single- and multi-localized proteins. Our results, obtained by incorporating inter-dependencies, are significantly higher than those obtained by classifiers that do not use inter-dependencies. The performance of our system on multi-localized proteins is comparable to that of a top-performing system (YLoc+), without restricting predictions to location combinations present in the training set.
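To make the two ideas in the abstract concrete, the following minimal Python sketch may help. It is not the authors' system and does not implement a Bayesian network classifier. It only illustrates, on a hypothetical toy label set, (a) why treating each observed location combination as its own class cannot produce an unseen combination, and (b) what an inter-dependency between two locations looks like as a conditional probability estimated from co-occurrence counts. The location names, the data, and all function names are assumptions made for illustration.

```python
# Hypothetical toy training labels: the set of locations annotated for each protein.
train_labels = [
    {"cytoplasm", "nucleus"},
    {"cytoplasm", "nucleus"},
    {"cytoplasm", "nucleus"},
    {"cytoplasm"},
    {"nucleus"},
    {"mitochondrion"},
    {"mitochondrion"},
    {"mitochondrion"},
]

# Fixed ordering of locations, used to build 0/1 location-indicator vectors.
locations = sorted(set().union(*train_labels))


def indicator_vector(label_set):
    """Encode a set of locations as a tuple of 0/1 indicators, one per location."""
    return tuple(int(loc in label_set) for loc in locations)


def label_powerset_classes(label_sets):
    """Classes available when every observed combination is its own class."""
    return {frozenset(s) for s in label_sets}


def marginal_prob(target, label_sets, alpha=1.0):
    """Laplace-smoothed estimate of P(target present), ignoring other locations."""
    n_target = sum(1 for s in label_sets if target in s)
    return (n_target + alpha) / (len(label_sets) + 2 * alpha)


def conditional_prob(target, given, label_sets, alpha=1.0):
    """Laplace-smoothed estimate of P(target present | given present)."""
    n_given = sum(1 for s in label_sets if given in s)
    n_both = sum(1 for s in label_sets if given in s and target in s)
    return (n_both + alpha) / (n_given + 2 * alpha)


if __name__ == "__main__":
    print("locations:", locations)
    print("indicator vector:", indicator_vector({"cytoplasm", "nucleus"}))
    print("P(nucleus):", marginal_prob("nucleus", train_labels))
    print("P(nucleus | cytoplasm):",
          conditional_prob("nucleus", "cytoplasm", train_labels))
    print("P(nucleus | mitochondrion):",
          conditional_prob("nucleus", "mitochondrion", train_labels))
    unseen = frozenset({"mitochondrion", "nucleus"})
    print("unseen combination available as a class:",
          unseen in label_powerset_classes(train_labels))
```

In this toy data, knowing that a protein is in the cytoplasm raises the estimated probability of nuclear localization above its marginal value, while mitochondrial localization lowers it. An independent per-location classifier would ignore that signal. A combination-as-class scheme would capture it but could never output the pair mitochondrion and nucleus, because that pair never occurs in the training set, whereas an indicator-vector representation can express any combination.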


Bayesian Network, Multiple Location, Location Prediction, Location Indicator, Bayesian Network Structure
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.





Copyright information

© Springer-Verlag Berlin Heidelberg 2013

Authors and Affiliations

  • Ramanuja Simha (1)
  • Hagit Shatkay (1, 2, 3)
  1. Department of Computer and Information Sciences, University of Delaware, Newark, USA
  2. Center for Bioinformatics and Computational Biology, DBI, University of Delaware, Newark, USA
  3. School of Computing, Queen’s University, Kingston, Canada
