Bioinformatics Predictions of Localization and Targeting

  • Shruti Rastogi
  • Burkhard Rost
Protocol
Part of the Methods in Molecular Biology book series (MIMB, volume 619)

Abstract

One of the major challenges in the post-genomic era, with hundreds of genomes sequenced, is the annotation of protein structure and function. Computational predictions of subcellular localization are an important step toward this end. The development of computational tools that predict targeting and localization has therefore been a very active area of research, in particular since the first release of the groundbreaking program PSORT in 1991. The most reliable means of annotating protein structure and function remains homology-based inference, i.e., the transfer of experimental annotations from one protein to its homologs. However, annotations of localization demonstrate how much can be gained from advanced machine learning: more proteins can be annotated more reliably. Contemporary computational tools for the annotation of protein targeting include automatic methods that mine textual information from the biological literature and molecular biology databases. Some machine learning-based methods that accurately predict features of sorting signals, and that use sequence-derived features to predict localization, have reached remarkable levels of performance. Sustained prediction accuracy has increased by more than 30 percentage points over the last decade. Here, we review some of the most recent methods for the prediction of subcellular localization and protein targeting that contributed toward this breakthrough.

Key words

Protein subcellular localization prediction; sorting signals; neural networks; support vector machines; hidden Markov models; amino acid composition; text analysis

Abbreviations

NLS: nuclear localization signal
NN: neural network
HMM: hidden Markov model
SP: signal peptide
SVM: support vector machine

Notes

Acknowledgements

The work of SR and BR was supported by the grant R01-GM079767 from the National Institute of General Medical Sciences (NIGMS) at the NIH. Last but not least, thanks to Amos Bairoch (SIB, Geneva), Rolf Apweiler (EBI, Hinxton), Phil Bourne (San Diego Univ.), and their crews for maintaining excellent databases and to all experimentalists who enabled this analysis by making their data publicly available.

Copyright information

© Springer Science+Business Media, LLC 2010

Authors and Affiliations

  • Shruti Rastogi (1)
  • Burkhard Rost (2)
  1. Department of Biochemistry and Molecular Biophysics, Columbia University and Columbia University Center for Computational Biology and Bioinformatics (C2B2), New York, USA
  2. Department of Biochemistry and Molecular Biophysics, Columbia University and Columbia University Center for Computational Biology and Bioinformatics (C2B2) and NorthEast Structural Genomics Consortium (NESG) & New York Consortium on Membrane Protein Structure (NYCOMPS), New York, USA
