Learning Cellular Sorting Pathways Using Protein Interactions and Sequence Motifs

  • Tien-ho Lin
  • Ziv Bar-Joseph
  • Robert F. Murphy
Part of the Lecture Notes in Computer Science book series (LNCS, volume 6577)


Proper subcellular localization is critical for proteins to perform their roles in cellular functions. Proteins are transported by different cellular sorting pathways, some of which take a protein through several intermediate locations until reaching its final destination. The pathway a protein is transported through is determined by carrier proteins that bind to specific sequence motifs. In this paper we present a new method that integrates sequence, motif and protein interaction data to model how proteins are sorted through these targeting pathways. We use a hidden Markov model (HMM) to represent protein targeting pathways. The model is able to determine intermediate sorting states and to assign carrier proteins and motifs to the sorting pathways. In simulation studies, we show that the method can accurately recover an underlying sorting model. Using data for yeast, we show that our model leads to accurate prediction of subcellular localization. We also show that the pathways learned by our model recover many known sorting pathways and correctly assign proteins to the path they utilize. The learned model identified new pathways and their putative carriers and motifs and these may represent novel protein sorting mechanisms.
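To make the idea of scoring a protein against a pathway HMM concrete, the following is a minimal sketch of the standard forward algorithm on a toy two-state model. It is not the authors' model: the state names (`cytosol`, `ER`), the feature symbols (`signal_peptide`, `no_motif`), and all probabilities are hypothetical values chosen for illustration, and the structure and parameter learning described in the abstract is not shown.

```python
from typing import Dict, List, Tuple


def forward(
    obs: List[str],
    states: List[str],
    start_p: Dict[str, float],
    trans_p: Dict[str, Dict[str, float]],
    emit_p: Dict[str, Dict[str, float]],
) -> Tuple[float, Dict[str, float]]:
    """Forward algorithm: return P(obs) and the posterior over the last hidden state."""
    # Initialisation: probability of starting in each state and emitting obs[0].
    alpha = {s: start_p[s] * emit_p[s][obs[0]] for s in states}
    # Recursion: sum over all predecessor states for each later observation.
    for o in obs[1:]:
        alpha = {
            s: emit_p[s][o] * sum(alpha[r] * trans_p[r][s] for r in states)
            for s in states
        }
    likelihood = sum(alpha.values())
    posterior = {s: alpha[s] / likelihood for s in states}
    return likelihood, posterior


# Hypothetical toy model: every protein starts in the cytosol and may move to the ER.
states = ["cytosol", "ER"]
start_p = {"cytosol": 1.0, "ER": 0.0}
trans_p = {
    "cytosol": {"cytosol": 0.6, "ER": 0.4},
    "ER": {"cytosol": 0.0, "ER": 1.0},
}
emit_p = {
    "cytosol": {"signal_peptide": 0.2, "no_motif": 0.8},
    "ER": {"signal_peptide": 0.9, "no_motif": 0.1},
}

# A protein whose (made-up) features show a signal peptide at both sorting steps.
likelihood, posterior = forward(
    ["signal_peptide", "signal_peptide"], states, start_p, trans_p, emit_p
)
print(f"P(obs) = {likelihood:.3f}")
print({s: round(p, 3) for s, p in posterior.items()})
```

In this toy setting the posterior over the last state plays the role of a predicted final location, and the intermediate values of `alpha` correspond to the intermediate sorting states the abstract mentions.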

Supplementary results and software implementation are available from


Hidden Markov Model; Emission Probability; Primary Hyperoxaluria; Protein Interaction Data
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.





Copyright information

© Springer-Verlag Berlin Heidelberg 2011

Authors and Affiliations

  • Tien-ho Lin (1)
  • Ziv Bar-Joseph (1)
  • Robert F. Murphy (1)
  1. Lane Center for Computational Biology, School of Computer Science, Carnegie Mellon University, Pittsburgh, USA
